opendevreview | Merged openstack/openstack-ansible-galera_server master: Convert xinetd clustercheck to systemd socket service https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/824042 | 00:44 |
---|---|---|
*** dviroel|ruck|afk is now known as dviroel|ruck | 00:48 | |
*** dviroel|ruck is now known as dviroel|ruck|out | 00:57 | |
*** dviroel|ruck|out is now known as dviroel|out | 00:57 | |
opendevreview | Bhagyashri Shewale proposed openstack/openstack-ansible-os_tempest master: Move zuul jobs layout to centos9 only for master branch https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/828449 | 03:27 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Drop nova_glance_api_servers variable https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/828460 | 06:55 |
jrosser | calico is broken on victoria "oslo_config.cfg.NoSuchOptError: no such option report_interval in group [AGENT]" | 07:00 |
noonedeadpunk | I'd say it's broken everywhere. Just NV now | 07:02 |
noonedeadpunk | I was trying to dig into it one day but didn't find where it gets (or it was some lazy loading with no way to overcome it) | 07:03 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_neutron stable/victoria: Remove legacy centos-8 jobs https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/827483 | 07:03 |
jrosser | maybe time to think if we keep support or not | 07:04 |
noonedeadpunk | the only occurrence was https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/templates/metering_agent.ini.j2#L15 but even dropping this file didn't help | 07:04 |
jrosser | it is not really compatible with internal VIP ssl either | 07:04 |
noonedeadpunk | I tend to agree here | 07:04 |
jrosser | because of instances wanting metadata on http and calico not running an haproxy for metadata | 07:04 |
noonedeadpunk | I think ovn kind of same? | 07:05 |
jrosser | potentially, i really dont know much about it | 07:06 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Rename RBD cinder backend https://review.opendev.org/c/openstack/openstack-ansible/+/828463 | 07:11 |
noonedeadpunk | but calico interest is really limited I believe. | 07:11 |
noonedeadpunk | well, evrardjp was talking about it recently, so likely need to double check before saying for sure :) | 07:12 |
jrosser | ok, well, like all this stuff it needs maintenance effort | 07:35 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Remove secure_proxy_ssl_header logic https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/828467 | 07:42 |
noonedeadpunk | I think this needs to be double checked, as maybe we just need to apply the logic in another place ^ | 07:42 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Switch keystone logging to syslog https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/828469 | 07:58 |
jrosser | i'm getting good value out of the infra scenario tests for the ssh keypairs stuff | 08:34 |
jrosser | it's already testing the repo sync as part of that, so it shows up some bugs on centos-8 | 08:35 |
noonedeadpunk | who was surprised about centos-related hiccups | 08:40 |
*** sshnaidm|afk is now known as sshnaidm | 08:54 | |
opendevreview | Merged openstack/openstack-ansible-openstack_hosts stable/victoria: Assume centos version is at least 8.3 https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/828346 | 10:06 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Use uwsgi role for keystone https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/828510 | 10:10 |
opendevreview | Merged openstack/openstack-ansible-lxc_hosts stable/xena: Replace CentOS 8 with Stream jobs https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/828095 | 10:21 |
opendevreview | Merged openstack/openstack-ansible-lxc_hosts stable/wallaby: Ensure that the legacy network-scripts package is present https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/828236 | 10:27 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_horizon master: Move Listen definition to VHosts https://review.opendev.org/c/openstack/openstack-ansible-os_horizon/+/828515 | 10:49 |
opendevreview | Merged openstack/openstack-ansible stable/xena: Fix additional facts gathering in ceph-install.yml https://review.opendev.org/c/openstack/openstack-ansible/+/828392 | 11:10 |
*** dviroel|out is now known as dviroel|ruck | 11:10 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Define X-Forwarded-Proto for keystone https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/828518 | 11:19 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Drop ProxyPass out of VHost https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/828519 | 11:44 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_horizon master: Move Listen definition to VHosts https://review.opendev.org/c/openstack/openstack-ansible-os_horizon/+/828515 | 11:49 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Do not run rsyslog against RabbitMQ https://review.opendev.org/c/openstack/openstack-ansible/+/826347 | 12:29 |
noonedeadpunk | would be awesome to get another review on https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/826338/ :) | 12:30 |
*** akahat|rover is now known as akahat|PTO | 14:11 | |
jrosser | is this a thing? lsyncd[7554]: rsync: failed to open "/var/www/repo/repo_prepost_cmd.sh", continuing: Permission denied (13) | 14:23 |
opendevreview | Merged openstack/openstack-ansible-lxc_hosts stable/wallaby: Replace CentOS 8 with Stream jobs https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/827966 | 14:28 |
jrosser | oh that's confusing, lsyncd writes some stuff to the journal and most of it to /var/log/lsyncd/lsyncd.log | 14:29 |
noonedeadpunk | whaaat | 14:54 |
jamesdenton | good morning | 14:55 |
jrosser | o/ hello | 14:55 |
damiandabrowski[m] | hey! | 14:55 |
jamesdenton | my bouncer died, and i didn't really notice | 14:55 |
jamesdenton | :| | 14:56 |
jamesdenton | anything new? | 14:57 |
jrosser | well i would make some centos related comment, but that's just nothing new :) | 14:59 |
jrosser | this maybe https://review.opendev.org/c/openstack/openstack-ansible/+/828386 | 14:59 |
jrosser | ^ that blew up quite badly on stable branches | 14:59 |
jamesdenton | hmm | 15:00 |
noonedeadpunk | should we wait for master patch before merging it? | 15:02 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible stable/xena: Remove enablement of neutron tempest plugin in scenario templates https://review.opendev.org/c/openstack/openstack-ansible/+/828548 | 15:02 |
jrosser | tada! | 15:02 |
jamesdenton | was it some particular test causing issues? | 15:03 |
noonedeadpunk | it was like neutron-lib and tempest plugin being incompatible I guess | 15:03 |
noonedeadpunk | as it didn't even come to tests) | 15:04 |
jrosser | it installed master version of the plugin which then tries to test non existing things in older neutron iirc | 15:04 |
jamesdenton | i don't really know how their tags work, seems like the latest one stops ~train | 15:05 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_neutron stable/xena: DNM - test https://review.opendev.org/c/openstack/openstack-ansible/+/828548 https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/828549 | 15:07 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible stable/xena: Remove enablement of neutron tempest plugin in scenario templates https://review.opendev.org/c/openstack/openstack-ansible/+/828548 | 15:09 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible stable/wallaby: Remove enablement of neutron tempest plugin in scenario templates https://review.opendev.org/c/openstack/openstack-ansible/+/828551 | 15:10 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_neutron stable/xena: DNM - test https://review.opendev.org/c/openstack/openstack-ansible/+/828548 https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/828549 | 15:10 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_neutron stable/wallaby: DNM - test https://review.opendev.org/c/openstack/openstack-ansible/+/828551 https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/828552 | 15:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove enablement of neutron tempest plugin in scenario templates https://review.opendev.org/c/openstack/openstack-ansible/+/828553 | 15:47 |
spatel | jamesdenton around? | 16:00 |
jamesdenton | yes | 16:00 |
spatel | I have a question related to STP enable/disable on a bridge with ubuntu netplan - https://paste.opendev.org/show/bzIbTv4XYyKh6oySYFfI/ | 16:01 |
spatel | brctl show - saying STP is not enable | 16:01 |
spatel | netplan - default config saying stp is enabled | 16:01 |
spatel | netplan doc saying STP is enabled by default | 16:02 |
spatel | how should i prove that it's really, really disabled | 16:02 |
jamesdenton | hmm, you might try 'bridge -d link show <br>' | 16:03 |
jamesdenton | i believe 'state' reflects STP state | 16:04 |
spatel | here is the output - https://paste.opendev.org/show/bFHvFO0JBEgn5LzVmOKn/ | 16:05 |
spatel | trying to understand which flag indicates stp is active | 16:06 |
spatel | learning on flood on ??? | 16:06 |
spatel | state forwarding priority 32 cost 2 | 16:07 |
spatel | does that means STP is enabled? | 16:07 |
spatel | jamesdenton we had network loop and i believe this could be the issue.. | 16:08 |
NeilHanlon | spatel: if state is anything but 0 (DISABLED), then STP is enabled | 16:12 |
NeilHanlon | `state forwarding` is spanning tree forwarding | 16:12 |
spatel | hmm very odd then.. | 16:13 |
NeilHanlon | if you're using a bridge with two interfaces, or if you bridged two interfaces on the same LAN, then you can cause loops, yes | 16:14 |
spatel | the tap interfaces neutron creates always show STP on | 16:14 |
spatel | i do have bond interface active-backup mode | 16:14 |
NeilHanlon | The best thing to do is to never flood BPDUs to the devices unless you have to for some reason | 16:14 |
spatel | what is the best practice to disable STP for everything on compute node? | 16:15 |
spatel | if i disable STP on bond0 then it should disable underlying bridges/vlans or not? | 16:18 |
jrosser | heres a little something we cooked up with openstack-ansible https://superuser.openstack.org/articles/environmental-reporting-dashboards-for-openstack-from-bbc-rd/ | 16:20 |
jamesdenton | spatel it's probably in your best interest to leave the default (stp on) | 16:21 |
spatel | hmm | 16:22 |
jamesdenton | i wouldn't trust brctl for accurate info, i think it was deprecated a while back in favor of iproute2 (bridge) | 16:22 |
spatel | we have noticed that one of our compute nodes locked up because of memory, and at the same time the switch blocked the entire vlan on that rack | 16:23 |
spatel | now i started thinking about STP in the bridge.. maybe it created some kind of loop because i have a bond interface, and if stp is enabled then it will do damage, correct? | 16:23 |
jamesdenton | nice article jrosser | 16:24 |
spatel | i don't know i am just making up some story | 16:24 |
jrosser | jamesdenton: thankyou :) | 16:24 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add ssh_keypairs role https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/825113 | 16:40 |
NeilHanlon | spatel: linux will dutifully process and flood Spanning Tree Bridge Protocol Data Units (BPDUs) out other interfaces in a bridge--that's what it's supposed to do because it has to ensure that the data is flooded through the entire tree. I've seen (and caused) broadcast storms due to this exact thing ;) | 16:40 |
opendevreview | Merged openstack/openstack-ansible master: Remove symlinking of selinux libraries into the ansible-runtime venv https://review.opendev.org/c/openstack/openstack-ansible/+/827556 | 16:40 |
spatel | I do have BPDU-Protection on the edge interfaces of my switch, but still not sure what happened to that box when it crashed | 16:41 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Use ssh_keypairs role to generate keys for repo sync https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/827100 | 16:42 |
spatel | looking for some kernel watchdog config; if the kernel shut down the machine during any crash, that would be good | 16:42 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_nova master: Use ssh_keypairs role to generate cold migration ssh keys https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/825306 | 16:44 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_keystone master: Use ssh_keypairs role to generate fernet sync ssh keys https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/827090 | 16:45 |
jamesdenton | sorry spatel - just finished digging myself out of a hole i created with OVN. | 19:46 |
spatel | :) tell me the story | 19:47 |
spatel | jamesdenton ^ | 19:48 |
spatel | I am running OVN on production so i would like to know that | 19:49 |
jamesdenton | i swapped out a node but kept the same name/ips/etc | 19:52 |
jamesdenton | chassis id changed | 19:53 |
jamesdenton | and the new node didn't rejoin the cluster properly | 19:53 |
jamesdenton | i have notes but it's a mess. would likely be better off recreating the situation and walking through the fix properly | 19:53 |
spatel | hmm... that is interesting.. | 19:57 |
spatel | worth testing in lab and see.. | 19:58 |
spatel | did you try this - https://github.com/amorenoz/ovsdb-mon | 20:02 |
spatel | this is good tool for debug OVN | 20:02 |
spatel | i am planning to play with it and see how we can make things easy | 20:02 |
noonedeadpunk | debug OVN sounds like pain.... | 21:18 |
noonedeadpunk | I have huge concerns about it from an operations perspective... | 21:18 |
noonedeadpunk | oh, btw, spatel do you run ovn already somewhere in prod?:) | 21:18 |
spatel | I told you i am deploying HPC on openstack so that is where i am running OVN | 21:19 |
spatel | it has 30 compute nodes and yes its production | 21:19 |
noonedeadpunk | mmm, and do you use tenant routers there?:) | 21:20 |
spatel | OVN is not that bad only problem is we don't have enough knowledge to debug and fix quickly :( | 21:20 |
spatel | Yes, we do tenant routers and VxLAN etc.. | 21:21 |
noonedeadpunk | does it break? :D | 21:21 |
noonedeadpunk | So eventually why I'm asking - I'm super unhappy about l3 routers with ovs | 21:21 |
noonedeadpunk | it's really a pita to do maintenance on net nodes | 21:21 |
noonedeadpunk | But ovn doesn't have net nodes as concept:) | 21:22 |
noonedeadpunk | as it's DVR | 21:22 |
noonedeadpunk | But not sure if it made things less painful | 21:22 |
spatel | Yes OVN doesn't have net node and it works smooth | 21:22 |
spatel | I am running in HA mode so if node is down it will automatically shift load to next machine.. | 21:23 |
noonedeadpunk | like we recently had big issues with l3s just because of rabbit falling apart... | 21:23 |
spatel | what is the connection with rabbit? | 21:23 |
damiandabrowski[m] | noonedeadpunk: thanks for reminding me about it, now I'll have a nightmares :D | 21:24 |
noonedeadpunk | you see ?:) | 21:24 |
noonedeadpunk | ah, always welcome damiandabrowski[m]! | 21:24 |
noonedeadpunk | I actually already know why that all happened :D | 21:24 |
spatel | OVN is very simple compared to a traditional L3 deployment in namespaces :) | 21:25 |
noonedeadpunk | (kidding) | 21:25 |
noonedeadpunk | spatel: so I'm mainly concerned whether things will go south, for example when the ovs package or glibc gets updated | 21:25 |
noonedeadpunk | (the connection to rabbit, btw, is that the l3 agent, when losing the connection for $timeout, starts re-syncing and causes tons of other issues) | 21:26 |
spatel | hmm the beauty of OVN is it has zero dependency with rabbitMQ | 21:27 |
spatel | noonedeadpunk agreed, upgrading stuff in OVN is not great i would say.. but again we need to keep doing it, otherwise we are never going to learn :( | 21:27 |
noonedeadpunk | yeah, I know... | 21:27 |
spatel | just need to push hard | 21:28 |
noonedeadpunk | So I was more kind of interested if you're happy overall comparing to your ovs setup with dpdk , blackjack and... you know:) | 21:29 |
spatel | I can see people developing tools to debug OVN so that is good | 21:29 |
noonedeadpunk | well, when having tool is only option for debug... | 21:29 |
spatel | i stopped using dpdk :( i didn't see any performance gain | 21:29 |
noonedeadpunk | so just regular ovs? | 21:30 |
spatel | Yes OVN+OVS | 21:30 |
spatel | I found that unless you run a DPDK-aware application there is no advantage :( | 21:31 |
noonedeadpunk | I see | 21:31 |
spatel | i did lots of load testing and the result is the same, DPDK vs non-DPDK | 21:31 |
spatel | because VM virtio is not going to improve performance just because you are running OVS+DPDK on host | 21:32 |
spatel | no one can beat SRIOV that is fact | 21:32 |
noonedeadpunk | likely also depends on network cards, as modern ones cover gap with offloading | 21:33 |
spatel | noonedeadpunk also i have successfully setup my infiniband network to run MPI job :) | 21:33 |
noonedeadpunk | oh! | 21:33 |
spatel | i passed the Mellanox card through to the vm, then my VM was able to see the VF and i successfully ran an MPI job | 21:33 |
noonedeadpunk | has it worked out as you expected with the subnet manager? :D | 21:34 |
spatel | i am able to get 100Gbps inside VM | 21:34 |
spatel | Yes i configured subnet manager inside infiniband switch :) | 21:34 |
spatel | soon i am going to write up my blog about ib fun | 21:34 |
noonedeadpunk | IB is always fun. I'm glad I'm not dealing with it anymore :D | 21:35 |
spatel | I am not doing any IPoIB stuff | 21:35 |
noonedeadpunk | Oh, yes, that's actually nice thing | 21:35 |
noonedeadpunk | as otherwise it's nightmare | 21:35 |
spatel | I am getting 100Gbps speed between two VM so that is awesome :) | 21:35 |
noonedeadpunk | also - don't install any ceph packages with OFED :D | 21:35 |
spatel | hmm what do you mean ? | 21:36 |
noonedeadpunk | yeah, I can imagine. I had only 40Gbps with rubbish ConnectX-2 and then upgraded to ConnectX-3 Pro, which were soooo amazing back then :) | 21:37 |
noonedeadpunk | If you upgrade OFED for example, it will drop all ceph packages on host | 21:37 |
spatel | I have ConnectX-5 | 21:37 |
noonedeadpunk | so if you was running OSD node.... | 21:37 |
noonedeadpunk | As there's some dependency on ubuntu between ofed built packages and ceph-common | 21:38 |
noonedeadpunk | maybe it's fixed today... | 21:38 |
spatel | I have noticed when i install OFED then it does compile module for kernel and upgrade kernel also | 21:38 |
noonedeadpunk | yeah, with dkms usually... | 21:38 |
spatel | maybe because of that ceph doesn't like it | 21:38 |
noonedeadpunk | it was more about package cross-dependency I guess... but yeah. dunno how valid that is nowadays | 21:39 |
spatel | I don't have ceph storage in this environment (I do have glusterFS ) | 21:39 |
noonedeadpunk | yeah, I do recall | 21:40 |
spatel | in each compute node we have 384GB memory :D | 21:41 |
spatel | I think it's the most costly openstack i have ever built | 21:41 |
noonedeadpunk | heh, yeah, tiny computes :D | 21:41 |
spatel | 15 Tesla GPUs, each costing around $20,000 | 21:41 |
noonedeadpunk | btw | 21:42 |
noonedeadpunk | you just passthrough tesla inside vms? | 21:42 |
spatel | from 64GB to 384G is big deal for me.. hehe | 21:42 |
spatel | Yes i did passthrough | 21:42 |
noonedeadpunk | and you don't do licensing? OR you don't use cuda? | 21:42 |
spatel | We don't have license :( | 21:43 |
spatel | This HPC is for research and not for public service so we don't need virtualization | 21:43 |
spatel | I can understand for public cloud | 21:43 |
noonedeadpunk | Well it was more about some confusion coming from https://docs.nvidia.com/grid/13.0/grid-licensing-user-guide/index.html#software-enforcement-grid-licensing | 21:44 |
noonedeadpunk | `When licensing is enforced through software, the performance of the virtual GPU or physical GPU is degraded over time if the VM fails to obtain a license.` | 21:44 |
spatel | hmm | 21:44 |
noonedeadpunk | and just in the previous paragraph they say `GPU pass through for compute-intensive virtual servers requires vCS` | 21:45 |
spatel | hehe.. | 21:45 |
spatel | do you guys running GPU in your cloud? | 21:45 |
noonedeadpunk | I bet with T4 I was also passing through without any issues, but I'm not sure if they were working inside VMs tbh | 21:45 |
spatel | hmm | 21:46 |
noonedeadpunk | but yeah, likely it only raises when gridd is installed on compute node | 21:46 |
noonedeadpunk | but it's confusing... | 21:47 |
spatel | i am also new in GPU and so learning for me | 21:47 |
spatel | i found a bug in the OSA /etc/hosts file | 21:48 |
spatel | it has container name with _ underscores | 21:48 |
spatel | that is not a valid hostname for the /etc/hosts file | 21:48 |
spatel | https://paste.opendev.org/show/bMFRNBU2jhEOgbi6k0ov/ | 21:49 |
spatel | not sure if it has been fixed in Xena but i am seeing the error in wallaby | 21:50 |
noonedeadpunk | we haven't changed it for a while now | 21:50 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-openstack_hosts/src/branch/master/tasks/openstack_update_hosts_file.yml < that is responsible for generating | 21:50 |
noonedeadpunk | so likely it comes from `hostvars[item]['ansible_facts']['hostname']` | 21:51 |
spatel | yes.. during debugging i saw lots of errors in the logs saying invalid hostname, so i freaked out and noticed this issue | 21:51 |
noonedeadpunk | I have that everywhere on V as well | 21:51 |
spatel | we should fix it (no rush but) just noice | 21:52 |
noonedeadpunk | I haven't seen issues in logs though( | 21:52 |
spatel | i have seen in /var/log/syslog file | 21:52 |
spatel | may be during reboot of system | 21:53 |
noonedeadpunk | yeah, might be | 21:53 |
spatel | did you work on openstack masakari ? | 21:54 |
noonedeadpunk | I did | 21:54 |
spatel | i am looking for HA solution for some critical application | 21:54 |
noonedeadpunk | I want to add it to current workloads as well | 21:55 |
spatel | How does it work and how good is it? | 21:55 |
spatel | last week one of my VMs went down, which breached our SLA :( | 21:55 |
noonedeadpunk | well I never really used instancemonitor tbh | 21:55 |
noonedeadpunk | But it would help with that I believe | 21:56 |
spatel | i am planning to play with this in the lab to test it out and see how we can use it to improve our SLA | 21:56 |
spatel | I don't have shared storage, does it need one? | 21:56 |
spatel | currently i have developed IP_TAKE_OVER.sh script | 21:57 |
noonedeadpunk | all depends, you know. So instancemonitor tracks vm by virsh log and if it sees VM down tries to re-spawn it locally first | 21:57 |
noonedeadpunk | if not - tries evacuate iirc. | 21:57 |
spatel | whenever a vm goes down or anything happens, someone from the NOC runs the IP_TAKE_OVER.sh script and attaches the vif to my standby VM | 21:57 |
spatel | I don't have shared storage, so evacuate won't help | 21:57 |
noonedeadpunk | yeah, and hostmonitor actually does just evacuate | 21:58 |
noonedeadpunk | when it finds that compute went down | 21:58 |
noonedeadpunk | I'm not really sure what is the app, but it sounds like you more need loadbalancer? | 21:58 |
spatel | We have a very complex application running for our customer, which has many components talking to each other | 21:59 |
noonedeadpunk | ah | 21:59 |
spatel | if one of the applications or VMs is down then i need to replace it with the *SAME* ip | 21:59 |
spatel | keeping same IP is very important for us | 22:00 |
spatel | otherwise i have to reboot every single machine in that application | 22:00 |
spatel | Engineering is working to fix the legacy code, but in the meantime i need some hack :) | 22:01 |
damiandabrowski[m] | maybe You can disable port_security for these ports and replace IP_TAKE_OVER.sh with pacemaker/keepalived? | 22:01 |
damiandabrowski[m] | or better: add more allowed address pairs | 22:01 |
noonedeadpunk | well masakari is more about revive what you already have | 22:02 |
noonedeadpunk | you can define custom workflows there in case of failovers ofc | 22:02 |
noonedeadpunk | but it needs writing code | 22:02 |
noonedeadpunk | and also it monitors on qemu/libvirt level | 22:03 |
noonedeadpunk | not app inside vm | 22:03 |
spatel | i need to test in lab and see how it can fit in my deployment | 22:03 |
*** dviroel|ruck is now known as dviroel|out | 22:33 | |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_keystone master: Use ssh_keypairs role to generate fernet sync ssh keys https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/827090 | 22:59 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Define X-Forwarded-Proto for keystone https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/828518 | 23:03 |
jrosser | feels like this has strange side effects https://github.com/openstack/openstack-ansible/commit/6e9da4753af83e5b1c34f6ee7c35854c15a72bb0#diff-8c199e8e49846eb701be959066e29d5279fbde49ce2e92ce4a3ca274af3e3d9cR25 | 23:17 |
noonedeadpunk | like what? | 23:17 |
jrosser | makes it hard when writing a role like ssh_keypairs, that it runs the whole play on repo_servers[0] then again on all the rest | 23:18 |
noonedeadpunk | we have run_once somewhere in role? | 23:18 |
jrosser | so the role tasks are not run against all the nodes at the same time | 23:18 |
jrosser | so for example, it deploys the keys and lsyncd onto node[0] | 23:19 |
jrosser | then starts again and puts the keys on nodes [1] and [2] | 23:19 |
jrosser | but somehow on centos lsyncd already fails because it cannot ssh to [1] and [2] when the service starts | 23:20 |
noonedeadpunk | why it starts though? As handlers should basically run after all play done? | 23:20 |
noonedeadpunk | Well, deb needs a hook to prevent the service from starting | 23:21 |
noonedeadpunk | but centos by default doesn't start services on install in general... | 23:21 |
noonedeadpunk | or we have flush_handlers there somewhere? | 23:22 |
jrosser | handler once https://zuul.opendev.org/t/openstack/build/f3ae5dc016cd423987842b70c8801485/log/job-output.txt#13347-13350 | 23:22 |
jrosser | then later handler twice https://zuul.opendev.org/t/openstack/build/f3ae5dc016cd423987842b70c8801485/log/job-output.txt#13881-13886 | 23:23 |
jrosser | idk why this is different on focal | 23:24 |
jrosser | well, tasks run in the same order, but the end result works | 23:24 |
noonedeadpunk | I wonder why we need flush_handlers at end of tasks/main.yml | 23:26 |
noonedeadpunk | to restart based on serial I guess... | 23:26 |
noonedeadpunk | so like we basically need to run keypairs in pre_tasks for lsyncd | 23:29 |
noonedeadpunk | or just do a rolling restart of lsyncd in post tasks | 23:29 |
noonedeadpunk | on the other hand, serial used this way doesn't make real sense | 23:30 |
jrosser | currently the data is in role defaults | 23:30 |
jrosser | so playbook pre_tasks would need that moving | 23:31 |
noonedeadpunk | as if we think about it, we can have several group of hosts for repo | 23:31 |
noonedeadpunk | (if we have multiple OS) | 23:32 |
noonedeadpunk | so 1, 100% is just wrong | 23:32 |
noonedeadpunk | but also I think we miss smth like that https://opendev.org/openstack/openstack-ansible/src/branch/stable/rocky/playbooks/repo-build.yml#L33-L41 to set group of repo containers per OS | 23:33 |
noonedeadpunk | maybe we should just do a rolling restart of lsyncd in post-tasks? | 23:35 |
jrosser | well right now its only on [0] | 23:35 |
jrosser | perhaps the flush handlers is wrong | 23:36 |
noonedeadpunk | mmmm.... | 23:36 |
jrosser | but i also see the idea there when using serial | 23:36 |
noonedeadpunk | what if centos systemd unit missing restart on failure? | 23:36 |
jrosser | oh maybe | 23:37 |
noonedeadpunk | so we can just add override | 23:37 |
jrosser | behaviour is just different on focal https://zuul.opendev.org/t/openstack/build/c490c74c8c774d6490685a498f04bedf/log/logs/openstack/aio1_repo_container-a31176e7/lsyncd/lsyncd.log.txt | 23:38 |
jrosser | it doesn't bail out on error | 23:38 |
noonedeadpunk | so it just retrying... | 23:39 |
noonedeadpunk | `Terminating since "insist" is not set` hm. | 23:40 |
jrosser | helpful https://github.com/lsyncd/lsyncd/issues/632 | 23:41 |
noonedeadpunk | some Russian blog post suggests adding insist = true, in /etc/lsyncd.conf | 23:41 |
noonedeadpunk | specifically for centos btw | 23:42 |
jrosser | `Continues startup even if a startup rsync cannot connect.` | 23:42 |
jrosser | looks like what we need | 23:42 |
noonedeadpunk | which is exactly the case | 23:43 |
noonedeadpunk | so somewhere here https://opendev.org/openstack/openstack-ansible-repo_server/src/branch/master/templates/lsyncd.lua.j2#L611 ? | 23:44 |
jrosser | huh https://github.com/openstack/openstack-ansible-repo_server/blob/master/templates/lsyncd.defaults.j2#L2 | 23:45 |
noonedeadpunk | but for redhat we pass only config | 23:46 |
noonedeadpunk | so that explains :) | 23:46 |
jrosser | does DAEMON_ARGS even make sense with systemd? | 23:47 |
noonedeadpunk | considering -insist applies for ubuntu.... | 23:48 |
jrosser | /etc/lsyncd.conf exists on centos and we don't try to manage it | 23:48 |
noonedeadpunk | I'd tried to move to config... | 23:48 |
noonedeadpunk | but we pass another conf file? | 23:49 |
noonedeadpunk | https://github.com/openstack/openstack-ansible-repo_server/blob/master/templates/lsyncd.defaults.j2#L4 | 23:49 |
jrosser | the lua file | 23:50 |
noonedeadpunk | How is the systemd inotify thing going? :D | 23:50 |
jrosser | haha -ENOTIME | 23:50 |
jrosser | like this is turning into yak shaving again | 23:50 |
noonedeadpunk | yeah, but we set LSYNCD_OPTIONS to repo_lsyncd_config_file, which should just replace /etc/lsyncd.conf with our PATH | 23:50 |
noonedeadpunk | so i bet it's not taken into account | 23:51 |
jrosser | is that different though https://github.com/openstack/openstack-ansible-repo_server/blob/master/vars/debian.yml#L30 | 23:51 |
jrosser | the horrid horrid file here https://github.com/openstack/openstack-ansible-repo_server/blob/master/templates/lsyncd.lua.j2 | 23:52 |
noonedeadpunk | except for debian we don't override path I believe | 23:52 |
noonedeadpunk | as I said - we should put insist here https://opendev.org/openstack/openstack-ansible-repo_server/src/branch/master/templates/lsyncd.lua.j2#L611 | 23:52 |
jrosser | it still ships an init script /o\ https://packages.ubuntu.com/focal/amd64/lsyncd/filelist | 23:52 |
noonedeadpunk | I guess | 23:52 |
noonedeadpunk | no wonder - the last lsyncd release was years ago | 23:53 |
noonedeadpunk | to be precise, almost 4 years ago | 23:53 |
jrosser | oh i see what you mean now | 23:54 |
* jrosser didn't spot you could put config in the lua file | 23:54 | |
noonedeadpunk | not sure if we should drop defaults for ubuntu... | 23:55 |
* noonedeadpunk is quite drunk and clock shows almost 2am.... | 23:55 | |
jrosser | yeah late | 23:55 |
* jrosser sleeps | 23:55 |
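
This note concerns the STP question in the 16:00 thread. A port reported as `state forwarding` by `bridge -d link show` does not by itself prove that STP is running. When STP is disabled, the Linux bridge moves its ports straight to forwarding. The bridge-level switch is the `stp_state` value, which `ip -d link show <bridge>` also prints. It is exposed in sysfs as `/sys/class/net/<bridge>/bridge/stp_state`: 0 means off, 1 means kernel STP, 2 means userspace STP.

Below is a minimal sketch that reads that file. It assumes a Linux host with sysfs at its usual mount point. The function names are hypothetical helpers, not part of any existing tool.

```python
from pathlib import Path
from typing import Dict

# Values of /sys/class/net/<bridge>/bridge/stp_state on Linux.
STP_STATES = {0: "disabled", 1: "enabled (kernel STP)", 2: "enabled (userspace STP)"}


def bridge_stp_state(bridge: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return a readable STP state for one Linux bridge."""
    path = Path(sysfs_root) / bridge / "bridge" / "stp_state"
    value = int(path.read_text().strip())
    return STP_STATES.get(value, f"unknown ({value})")


def all_bridges(sysfs_root: str = "/sys/class/net") -> Dict[str, str]:
    """Map every bridge device under sysfs_root to its STP state.

    Only bridge devices expose a bridge/stp_state file, so plain NICs,
    bonds and tap devices (which have brport/ instead) are skipped.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    return {
        dev.name: bridge_stp_state(dev.name, sysfs_root)
        for dev in sorted(root.iterdir())
        if (dev / "bridge" / "stp_state").is_file()
    }
```

Calling `all_bridges()` on a compute node reports each bridge once. Checking the bridge, rather than the per-port state, sidesteps the brctl-versus-iproute2 ambiguity raised in the chat.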
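
This note concerns the `/etc/hosts` entries with underscores (21:48). Hostname syntax (RFC 952, relaxed by RFC 1123) allows only letters, digits and hyphens in each label. A name such as `aio1_repo_container-a31176e7` (from the lsyncd log linked at 23:38) is therefore not a valid hostname, while its hyphenated form is.

The following is a minimal sketch of a checker that flags such entries. The helper names and the sample IP address are illustrative assumptions, not part of openstack-ansible.

```python
import re
from typing import List, Tuple

# One label per RFC 1123: 1-63 letters, digits or hyphens,
# not starting or ending with a hyphen. Underscores are not allowed.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """Check a (possibly dotted) hostname against RFC 1123 syntax."""
    if name.endswith("."):  # tolerate a single trailing root dot
        name = name[:-1]
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def invalid_hosts_entries(hosts_text: str) -> List[Tuple[int, str]]:
    """Return (line number, name) for each invalid name in hosts-file text."""
    problems = []
    for lineno, line in enumerate(hosts_text.splitlines(), start=1):
        fields = line.split("#", 1)[0].split()  # drop comments
        # The first field is the address; the rest are the host name and aliases.
        for name in fields[1:]:
            if not is_valid_hostname(name):
                problems.append((lineno, name))
    return problems


sample = (
    "127.0.0.1 localhost\n"
    "# illustrative entry, IP address assumed\n"
    "172.29.236.100 aio1_repo_container-a31176e7 aio1-repo-container-a31176e7.openstack.local\n"
)
print(invalid_hosts_entries(sample))
```

Passing it the contents of `/etc/hosts` from an affected wallaby host would list each underscore name with its line number. How the generated file should change is left to the openstack_hosts role.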
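
This note concerns the `serial` discussion from 23:17 onwards. Assume the repo_server play uses `serial: [1, '100%']`, as the 23:32 remark suggests. Ansible then runs the entire play, handlers included, on the first repo host before it starts the remaining ones. That explains why lsyncd on node [0] can try to reach nodes [1] and [2] before their keys are deployed.

The function below approximates how Ansible turns a serial list into batches:

- percentages are taken of the total host count, rounded down with a minimum of 1;
- the last entry repeats until every host is scheduled.

It is an illustrative model, not Ansible's own code, and the host names are made up.

```python
from typing import List, Sequence, Union


def serial_batches(hosts: Sequence[str], serial: List[Union[int, str]]) -> List[List[str]]:
    """Approximate how Ansible splits a play's hosts into serial batches."""
    total = len(hosts)
    if not serial:
        # No serial: every host is handled in a single batch.
        return [list(hosts)] if hosts else []
    sizes = []
    for value in serial:
        if isinstance(value, str) and value.endswith("%"):
            # Percentage of the total host count, rounded down.
            size = int(total * int(value[:-1]) / 100)
        else:
            size = int(value)
        sizes.append(max(size, 1))  # a batch always holds at least one host
    batches, start, i = [], 0, 0
    while start < total:
        size = sizes[min(i, len(sizes) - 1)]  # the last size repeats
        batches.append(list(hosts[start:start + size]))
        start += size
        i += 1
    return batches


print(serial_batches(["repo1", "repo2", "repo3"], [1, "100%"]))
```

With three repo hosts this yields one batch holding only `repo1`, followed by one holding `repo2` and `repo3`. Handlers therefore fire on `repo1` before the other two hosts have been touched. This is the ordering that the `insist` setting, or a rolling restart in post-tasks, is meant to tolerate.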
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!