Wednesday, 2022-02-09

opendevreview	Merged openstack/openstack-ansible-galera_server master: Convert xinetd clustercheck to systemd socket service https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/824042	00:44
*** dviroel|ruck|afk is now known as dviroel|ruck		00:48
*** dviroel|ruck is now known as dviroel|ruck|out		00:57
*** dviroel|ruck|out is now known as dviroel|out		00:57
opendevreview	Bhagyashri Shewale proposed openstack/openstack-ansible-os_tempest master: Move zuul jobs layout to centos9 only for master branch https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/828449	03:27
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Drop nova_glance_api_servers variable https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/828460	06:55
jrosser	calico is broken on victoria "oslo_config.cfg.NoSuchOptError: no such option report_interval in group [AGENT]"	07:00
noonedeadpunk	I'd say it's broken everywhere. Just NV now	07:02
noonedeadpunk	I was trying to dig one day but didn't find where it get's (or it was some lazy loading with no way to overcome)	07:03
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-os_neutron stable/victoria: Remove legacy centos-8 jobs https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/827483	07:03
jrosser	maybe time to think if we keep support or not	07:04
noonedeadpunk	the only occurrence was https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/templates/metering_agent.ini.j2#L15 but even dropping this file didn't help	07:04
jrosser	it is not really compatible with internal VIP ssl either	07:04
noonedeadpunk	I tend to agree here	07:04
jrosser	because of instances wanting metadata on http and calico not running an haproxy for metadata	07:04
noonedeadpunk	I think ovn kind of same?	07:05
jrosser	potentially, i really dont know much about it	07:06
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Rename RBD cinder backend https://review.opendev.org/c/openstack/openstack-ansible/+/828463	07:11
noonedeadpunk	but calico interest is really limited I believe.	07:11
noonedeadpunk	well, evrardjp was talking about it recently, so likely need to double check before saying for sure :)	07:12
jrosser	ok well like all this stuff it needs maintenance effort	07:35
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Remove secure_proxy_ssl_header logic https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/828467	07:42
noonedeadpunk	I think this needs to be double checked as maybe we need to just apply logic in other place ^	07:42
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Switch keystone logging to syslog https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/828469	07:58
jrosser	i'm getting good value out of the infra scenario tests for the ssh keypairs stuff	08:34
jrosser	its already testing the repo sync as part of that so shows up some bugs on centos-8	08:35
noonedeadpunk	who was surprised about centos-related hickups	08:40
noonedeadpunk	*hiccups	08:40
*** sshnaidm|afk is now known as sshnaidm		08:54
opendevreview	Merged openstack/openstack-ansible-openstack_hosts stable/victoria: Assume centos version is at least 8.3 https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/828346	10:06
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Use uwsgi role for keystone https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/828510	10:10
opendevreview	Merged openstack/openstack-ansible-lxc_hosts stable/xena: Replace CentOS 8 with Stream jobs https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/828095	10:21
opendevreview	Merged openstack/openstack-ansible-lxc_hosts stable/wallaby: Ensure that the legacy network-scripts package is present https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/828236	10:27
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_horizon master: Move Listen definition to VHosts https://review.opendev.org/c/openstack/openstack-ansible-os_horizon/+/828515	10:49
opendevreview	Merged openstack/openstack-ansible stable/xena: Fix additional facts gathering in ceph-install.yml https://review.opendev.org/c/openstack/openstack-ansible/+/828392	11:10
*** dviroel|out is now known as dviroel|ruck		11:10
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Define X-Forwarded-Proto for keystone https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/828518	11:19
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Drop ProxyPass out of VHost https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/828519	11:44
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_horizon master: Move Listen definition to VHosts https://review.opendev.org/c/openstack/openstack-ansible-os_horizon/+/828515	11:49
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Do not run rsyslog against RabbitMQ https://review.opendev.org/c/openstack/openstack-ansible/+/826347	12:29
noonedeadpunk	would be awesome to get another review on https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/826338/ :)	12:30
*** akahat|rover is now known as akahat|PTO		14:11
jrosser	is this a thing? lsyncd[7554]: rsync: failed to open "/var/www/repo/repo_prepost_cmd.sh", continuing: Permission denied (13)	14:23
opendevreview	Merged openstack/openstack-ansible-lxc_hosts stable/wallaby: Replace CentOS 8 with Stream jobs https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/827966	14:28
jrosser	oh thats confusing, lsyncd writes some stuff to the journal and the most of it to /var/log/lsyncd/lsyncd.log	14:29
noonedeadpunk	whaaat	14:54
jamesdenton	good morning	14:55
jrosser	o/ hello	14:55
damiandabrowski[m]	hey!	14:55
jamesdenton	my bouncer died, and i didn't really notice	14:55
jamesdenton	:|	14:56
jamesdenton	anything new?	14:57
jrosser	well i would make some centos related comment, but thats just nothing new :)	14:59
jrosser	this maybe https://review.opendev.org/c/openstack/openstack-ansible/+/828386	14:59
jrosser	^ that blew up quite badly on stable branches	14:59
jamesdenton	hmm	15:00
noonedeadpunk	should we wait for master patch before merging it?	15:02
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible stable/xena: Remove enablement of neutron tempest plugin in scenario templates https://review.opendev.org/c/openstack/openstack-ansible/+/828548	15:02
jrosser	tada!	15:02
jamesdenton	was it some particular test causing issues?	15:03
noonedeadpunk	it was like neutron-lib and tempest plugin being incompatible I guess	15:03
noonedeadpunk	as it didn't even come to tests)	15:04
jrosser	it installed master version of the plugin which then tries to test non existing things in older neutron iirc	15:04
jamesdenton	i don't really know how their tags work, seems like the latest one stops ~train	15:05
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-os_neutron stable/xena: DNM - test https://review.opendev.org/c/openstack/openstack-ansible/+/828548 https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/828549	15:07
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible stable/xena: Remove enablement of neutron tempest plugin in scenario templates https://review.opendev.org/c/openstack/openstack-ansible/+/828548	15:09
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible stable/wallaby: Remove enablement of neutron tempest plugin in scenario templates https://review.opendev.org/c/openstack/openstack-ansible/+/828551	15:10
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-os_neutron stable/xena: DNM - test https://review.opendev.org/c/openstack/openstack-ansible/+/828548 https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/828549	15:10
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-os_neutron stable/wallaby: DNM - test https://review.opendev.org/c/openstack/openstack-ansible/+/828551 https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/828552	15:12
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove enablement of neutron tempest plugin in scenario templates https://review.opendev.org/c/openstack/openstack-ansible/+/828553	15:47
spatel	jamesdenton around?	16:00
jamesdenton	yes	16:00
spatel	I have question related STP enable/disable on bridge with ubuntu netplan - https://paste.opendev.org/show/bzIbTv4XYyKh6oySYFfI/	16:01
spatel	brctl show - saying STP is not enable	16:01
spatel	netplan - default config saying stp is enabled	16:01
spatel	netplan doc saying STP is enabled by default	16:02
spatel	how should i prove that its really really disabled	16:02
jamesdenton	hmm, you might try 'bridge -d link show <br>'	16:03
jamesdenton	i believe 'state' reflects STP state	16:04
spatel	here is the output - https://paste.opendev.org/show/bFHvFO0JBEgn5LzVmOKn/	16:05
spatel	trying to understand what flag indicate stp is active	16:06
spatel	learning on flood on ???	16:06
spatel	state forwarding priority 32 cost 2	16:07
spatel	does that means STP is enabled?	16:07
spatel	jamesdenton we had network loop and i believe this could be the issue..	16:08
NeilHanlon	spatel: if state is anything but 0 (DISABLED), then STP is enabled	16:12
NeilHanlon	`state forwarding` is spanning tree forwarding	16:12
spatel	hmm very odd then..	16:13
NeilHanlon	if you're using a bridge with two interfaces, or if you bridged two interfaces on the same LAN, then you can cause loops, yes	16:14
spatel	neutron create tap interface they are always showing STP on	16:14
spatel	i do have bond interface active-backup mode	16:14
NeilHanlon	The best thing to do is to never flood BPDUs to the devices unless you have to for some reason	16:14
spatel	what is the best practice to disable STP for everything on compute node?	16:15
spatel	if i disable STP on bond0 then it should disable underlying bridges/vlans or not?	16:18
jrosser	heres a little something we cooked up with openstack-ansible https://superuser.openstack.org/articles/environmental-reporting-dashboards-for-openstack-from-bbc-rd/	16:20
jamesdenton	spatel it's probably in your best interest to leave the default (stp on)	16:21
spatel	hmm	16:22
jamesdenton	i wouldn't trust brctl for accurate info, i think it was deprecated a while back in favor of iproute2 (bridge)	16:22
spatel	we have noticed one of our compute node lock up because of memory and same time switch block entire vlan on that rack	16:23
spatel	now i started thinking about STP in bridge.. may be it created some kind of loop because i have bond interface and if stp is enable then it will do damage correct?	16:23
jamesdenton	nice article jrosser	16:24
spatel	i don't know i am just making up some story	16:24
jrosser	jamesdenton: thankyou :)	16:24
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add ssh_keypairs role https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/825113	16:40
NeilHanlon	spatel: linux will dutifully process and flood Spanning Tree Bridge Protocol Data Units (BPDUs) out other interfaces in a bridge--that's what it's supposed to do because it has to ensure that the data is flooded through the entire tree. I've seen (and caused) broadcast storms due to this exact thing ;)	16:40
opendevreview	Merged openstack/openstack-ansible master: Remove symlinking of selinux libraries into the ansible-runtime venv https://review.opendev.org/c/openstack/openstack-ansible/+/827556	16:40
spatel	I do have BPDU-Protection on my edge interface of switch but still no sure what happened to that box when i crashed	16:41
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Use ssh_keypairs role to generate keys for repo sync https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/827100	16:42
spatel	looking for some kernel watchdog config if kernel shutdown machine during any crash then it would be good	16:42
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-os_nova master: Use ssh_keypairs role to generate cold migration ssh keys https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/825306	16:44
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-os_keystone master: Use ssh_keypairs role to generate fernet sync ssh keys https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/827090	16:45
jamesdenton	sorry spatel - just finished digging myself out of a hole i created with OVN.	19:46
spatel	:) tell me the story	19:47
spatel	jamesdenton ^	19:48
spatel	I am running OVN on production so i would like to know that	19:49
jamesdenton	i swapped out a node but kept the same name/ips/etc	19:52
jamesdenton	chassis id changed	19:53
jamesdenton	and the new node didn't rejoin the cluster properly	19:53
jamesdenton	i have notes but it's a mess. would likely be better off recreating the situation and walk through the fix properly	19:53
spatel	hmm... that is interesting..	19:57
spatel	worth testing in lab and see..	19:58
spatel	did you try this - https://github.com/amorenoz/ovsdb-mon	20:02
spatel	this is good tool for debug OVN	20:02
spatel	i am playing to play and see how we can make thing easy	20:02
noonedeadpunk	debug OVN sounds like pain....	21:18
noonedeadpunk	have huge concerns about it from an operations perspective...	21:18
noonedeadpunk	oh, btw, spatel do you run ovn already somewhere in prod?:)	21:18
spatel	I told you i am deploying HPC on openstack so that is where i am running OVN	21:19
spatel	it has 30 compute nodes and yes its production	21:19
noonedeadpunk	mmm, and do you use tenant routers there?:)	21:20
spatel	OVN is not that bad only problem is we don't have enough knowledge to debug and fix quickly :(	21:20
spatel	Yes we do tenant router and VxLAN etc..	21:21
noonedeadpunk	is it breaks ? :D	21:21
noonedeadpunk	So eventually why I'm asking - I'm super unhappy about l3 routers with ovs	21:21
noonedeadpunk	it's really a pita to do maintenance on net nodes	21:21
noonedeadpunk	But ovn doesn't have net nodes as concept:)	21:22
noonedeadpunk	as it's DVR	21:22
noonedeadpunk	But not sure if it made things less painful	21:22
spatel	Yes OVN doesn't have net node and it works smooth	21:22
spatel	I am running in HA mode so if node is down it will automatically shift load to next machine..	21:23
noonedeadpunk	like we recently had big issues with l3s just because of rabbit fallen apart...	21:23
spatel	what is the connection with rabbit?	21:23
damiandabrowski[m]	noonedeadpunk: thanks for reminding me about it, now I'll have a nightmares :D	21:24
noonedeadpunk	you see ?:)	21:24
noonedeadpunk	ah, always welcome damiandabrowski[m]!	21:24
noonedeadpunk	I actually already know why that all happened :D	21:24
spatel	OVN is very simple compare to traditional L3 deployment in namespace :)	21:25
noonedeadpunk	(kidding)	21:25
noonedeadpunk	spatel: so I mainly concerned if things won't go south for example when ovs package got updated or glibc	21:25
noonedeadpunk	(connection to rabbit btw is that l3 when losing connection for $timeout starts re-syncing and causes tons of other issues)	21:26
spatel	hmm the beauty of OVN is it has zero dependency with rabbitMQ	21:27
spatel	noonedeadpunk agreed upgrading stuff in OVN not great i would say.. but again we need to keep doing otherwise never going to learn :(	21:27
noonedeadpunk	yeah, I know...	21:27
spatel	just need to push hard	21:28
noonedeadpunk	So I was more kind of interested if you're happy overall comparing to your ovs setup with dpdk , blackjack and... you know:)	21:29
spatel	I can see people developing tools to debug OVN so that is good	21:29
noonedeadpunk	well, when having tool is only option for debug...	21:29
spatel	i stopped using dpdk :( i didn't see any performance gain	21:29
noonedeadpunk	so just regular ovs?	21:30
spatel	Yes OVN+OVS	21:30
spatel	I found until unless you run DPDK aware application there is no advantage :(	21:31
noonedeadpunk	I see	21:31
spatel	i did lots of loadtesting and result is same DPDK vs non-DPDK	21:31
spatel	because VM virtio is not going to improve performance just because you are running OVS+DPDK on host	21:32
spatel	no one can beat SRIOV that is fact	21:32
noonedeadpunk	likely also depends on network cards, as modern ones cover gap with offloading	21:33
spatel	noonedeadpunk also i have successfully setup my infiniband network to run MPI job :)	21:33
noonedeadpunk	oh!	21:33
spatel	i did pass through Mellanox to vm and then my VM able to see VF and i successfully run MPI job	21:33
noonedeadpunk	has it worked out as you expected with subnet manager ? :D	21:34
spatel	i am able to get 100Gbps inside VM	21:34
spatel	Yes i configured subnet manager inside infiniband switch :)	21:34
spatel	soon i am going to write up my blog about ib fun	21:34
noonedeadpunk	IB always fun. I'm glad I'm not dealing with it anymore :D	21:35
spatel	I am not doing any IPoIB stuff	21:35
noonedeadpunk	Oh, yes, that's actually nice thing	21:35
noonedeadpunk	as otherwise it's nightmare	21:35
spatel	I am getting 100Gbps speed between two VM so that is awesome :)	21:35
noonedeadpunk	also - don't install any ceph packages with OFED :D	21:35
spatel	hmm what do you mean ?	21:36
noonedeadpunk	yeah, I can imagine. I had only 40Gbps with rubbish ConnectX-2 and then upgraded to ConnectX-3 Pro that were soooo amazing back then:)	21:37
noonedeadpunk	If you upgrade OFED for example, it will drop all ceph packages on host	21:37
spatel	I have ConnectX-5	21:37
noonedeadpunk	so if you was running OSD node....	21:37
noonedeadpunk	As there's some dependency on ubuntu between ofed built packages and ceph-common	21:38
noonedeadpunk	maybe it's fixed today...	21:38
spatel	I have noticed when i install OFED then it does compile module for kernel and upgrade kernel also	21:38
noonedeadpunk	yeah, with dkms usually...	21:38
spatel	may be because of that ceph doesn't like it	21:38
noonedeadpunk	it was more about package cross-dependency I guess... but yeah. dunno how valid that is nowadays	21:39
spatel	I don't have ceph storage in this environment (I do have glusterFS )	21:39
noonedeadpunk	yeah, I do recall	21:40
spatel	in each compute node we have 384GB memory :D	21:41
spatel	I think most costly openstack i have ever build	21:41
noonedeadpunk	heh, yeah, tiny computes :D	21:41
spatel	15 Tesla GPU each cost $20,000 around	21:41
noonedeadpunk	btw	21:42
noonedeadpunk	you just passthrough tesla inside vms?	21:42
spatel	from 64GB to 384G is big deal for me.. hehe	21:42
spatel	Yes i did passthrough	21:42
noonedeadpunk	and you don't do licensing? OR you don't use cuda?	21:42
spatel	We don't have license :(	21:43
spatel	This HPC is for research and not for public service so we don't need virtualization	21:43
spatel	I can understand for public cloud	21:43
noonedeadpunk	Well it was more about some confusion coming from https://docs.nvidia.com/grid/13.0/grid-licensing-user-guide/index.html#software-enforcement-grid-licensing	21:44
noonedeadpunk	`When licensing is enforced through software, the performance of the virtual GPU or physical GPU is degraded over time if the VM fails to obtain a license.`	21:44
spatel	hmm	21:44
noonedeadpunk	and just in previous paragraph they say `GPU pass through for compute-intensive virtual servers requires vCS`	21:45
spatel	hehe..	21:45
spatel	do you guys running GPU in your cloud?	21:45
noonedeadpunk	I bet with T4 I was also passing through without any issues, but I'm not sure if they were working inside VMs tbh	21:45
spatel	hmm	21:46
noonedeadpunk	but yeah, likely it only raises when gridd is installed on compute node	21:46
noonedeadpunk	but it's confusing...	21:47
spatel	i am also new in GPU and so learning for me	21:47
spatel	i found but in OSA /etc/hosts file	21:48
spatel	it has container name with _ underscores	21:48
spatel	that is not valid hostname for /etc/hosts file	21:48
spatel	https://paste.opendev.org/show/bMFRNBU2jhEOgbi6k0ov/	21:49
spatel	not sure if it has been fix in Xena but i am seeing error in wallaby	21:50
noonedeadpunk	we haven't changed it for a while now	21:50
noonedeadpunk	https://opendev.org/openstack/openstack-ansible-openstack_hosts/src/branch/master/tasks/openstack_update_hosts_file.yml < that is responsible for generating	21:50
noonedeadpunk	so likely it comes from `hostvars[item]['ansible_facts']['hostname']`	21:51
spatel	yes.. during debug i saw lots of error in logs saying invalid hostname so i freaked out and noticed this issue	21:51
noonedeadpunk	I have that everywhere on V as well	21:51
spatel	we should fix it (no rush but) just noice	21:52
noonedeadpunk	I haven't seen issues in logs though(	21:52
spatel	i have seen in /var/log/syslog file	21:52
spatel	may be during reboot of system	21:53
noonedeadpunk	yeah, might be	21:53
spatel	did you work on openstack masakari ?	21:54
noonedeadpunk	I did	21:54
spatel	i am looking for HA solution for some critical application	21:54
noonedeadpunk	I want to add it to current workloads as well	21:55
spatel	How do it work and how good its?	21:55
spatel	last week one of my vm down which breach SLA :(	21:55
noonedeadpunk	well I never really used instancemonitor tbh	21:55
noonedeadpunk	But it would help with that I believe	21:56
spatel	i am planning to play with this in LAB to test our and see how we can use it to improve SLA	21:56
spatel	I don't have shared storage, does it need one?	21:56
spatel	currently i have developed IP_TAKE_OVER.sh script	21:57
noonedeadpunk	all depends, you know. So instancemonitor tracks vm by virsh log and if it sees VM down tries to re-spawn it locally first	21:57
noonedeadpunk	if not - tries evacuate iirc.	21:57
spatel	whenever vm down or anything happened someone from NOC run IP_TAKE_OVER.sh script and attach vif to my standby VM	21:57
spatel	I don't have shared storage for evacuate won't help	21:57
noonedeadpunk	yeah, and hostmonitor actually does just evacuate	21:58
noonedeadpunk	when it finds that compute went down	21:58
noonedeadpunk	I'm not really sure what is the app, but it sounds like you more need loadbalancer?	21:58
spatel	We have very complex application running for our customer which has many components talking to each other	21:59
noonedeadpunk	ah	21:59
spatel	if one of application or vm is down then i need to replace that with with SAME ip	21:59
spatel	keeping same IP is very important for us	22:00
spatel	otherwise i have to reboot every single machine in that application	22:00
spatel	Engineering working to fix legacy code but mean time i need some hack :)	22:01
damiandabrowski[m]	maybe You can disable port_security for these ports and replace IP_TAKE_OVER.sh with pacemaker/keepalived?	22:01
damiandabrowski[m]	or better: add more allowed ip pairs	22:01
noonedeadpunk	well masakari is more about revive what you already have	22:02
noonedeadpunk	you can define custom workflows there in case of failovers ofc	22:02
noonedeadpunk	but it needs writing code	22:02
noonedeadpunk	and also it monitors on qemu/libvirt level	22:03
noonedeadpunk	not app inside vm	22:03
spatel	i need to test in lab and see how it can fit in my deployment	22:03
*** dviroel|ruck is now known as dviroel|out		22:33
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-os_keystone master: Use ssh_keypairs role to generate fernet sync ssh keys https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/827090	22:59
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Define X-Forwarded-Proto for keystone https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/828518	23:03
jrosser	feels like this has strange side effects https://github.com/openstack/openstack-ansible/commit/6e9da4753af83e5b1c34f6ee7c35854c15a72bb0#diff-8c199e8e49846eb701be959066e29d5279fbde49ce2e92ce4a3ca274af3e3d9cR25	23:17
noonedeadpunk	like what?	23:17
jrosser	makes it hard when writing a role like ssh_keypairs, that it runs the whole play on repo_servers[0] then again on all the rest	23:18
noonedeadpunk	we have run_once somewhere in role?	23:18
jrosser	so the role tasks are not run against all the nodes at the same time	23:18
jrosser	so for example, it deploys the keys and lsyncd onto node[0]	23:19
jrosser	then starts again and puts the keys on nodes [1] and [2]	23:19
jrosser	but somehow on centos lsyncd already fails because it cannot ssh to [1] and [2] when the service starts	23:20
noonedeadpunk	why it starts though? As handlers should basically run after all play done?	23:20
noonedeadpunk	Well , deb needs hook to prevent service from starting	23:21
noonedeadpunk	but centos by default doesn't start in general...	23:21
noonedeadpunk	or we have flush_handlers there somewhere?	23:22
jrosser	handler once https://zuul.opendev.org/t/openstack/build/f3ae5dc016cd423987842b70c8801485/log/job-output.txt#13347-13350	23:22
jrosser	then later handler twice https://zuul.opendev.org/t/openstack/build/f3ae5dc016cd423987842b70c8801485/log/job-output.txt#13881-13886	23:23
jrosser	idk why this is different on focal	23:24
jrosser	well, tasks sun in the same order, but end result works	23:24
jrosser	*run	23:24
noonedeadpunk	I wonder why we need flush_handlers at end of tasks/main.yml	23:26
noonedeadpunk	to restart based on serial I guess...	23:26
noonedeadpunk	so like we basically need to run keypairs in pre_tasks for lsyncd	23:29
noonedeadpunk	or just do rolling restart of lsyncd in post tasks	23:29
noonedeadpunk	from other side serial in this way doesn't make real sense	23:30
jrosser	currently the data is in role defaults	23:30
jrosser	so playbook pre_tasks would need that moving	23:31
noonedeadpunk	as if we think about it, we can have several group of hosts for repo	23:31
noonedeadpunk	(if we have multiple OS)	23:32
noonedeadpunk	so 1, 100% is just wrong	23:32
noonedeadpunk	but also I think we miss smth like that https://opendev.org/openstack/openstack-ansible/src/branch/stable/rocky/playbooks/repo-build.yml#L33-L41 to set group of repo containers per OS	23:33
noonedeadpunk	maybe we should just do rolling restart of lsync in post-tasks?	23:35
jrosser	well right now its only on [0]	23:35
jrosser	perhaps the flush handlers is wrong	23:36
noonedeadpunk	mmmm....	23:36
jrosser	but i also see the idea there when using serial	23:36
noonedeadpunk	what if centos systemd unit missing restart on failure?	23:36
jrosser	oh maybe	23:37
noonedeadpunk	so we can just add override	23:37
jrosser	behaviour is just different on focal https://zuul.opendev.org/t/openstack/build/c490c74c8c774d6490685a498f04bedf/log/logs/openstack/aio1_repo_container-a31176e7/lsyncd/lsyncd.log.txt	23:38
jrosser	it doesnt bail out on error	23:38
noonedeadpunk	so it just retrying...	23:39
noonedeadpunk	`Terminating since "insist" is not set` hm.	23:40
jrosser	helpful https://github.com/lsyncd/lsyncd/issues/632	23:41
noonedeadpunk	some russian blogpost suggesting adding insist = true, in /etc/lsyncd.conf	23:41
noonedeadpunk	specifically for centos btw	23:42
jrosser	`Continues startup even if a startup rsync cannot connect.`	23:42
jrosser	looks like what we need	23:42
noonedeadpunk	which is exactly the case	23:43
noonedeadpunk	so somewhere here https://opendev.org/openstack/openstack-ansible-repo_server/src/branch/master/templates/lsyncd.lua.j2#L611 ?	23:44
jrosser	huh https://github.com/openstack/openstack-ansible-repo_server/blob/master/templates/lsyncd.defaults.j2#L2	23:45
noonedeadpunk	but for redhat we pass only config	23:46
noonedeadpunk	so that explains :)	23:46
jrosser	does DAEMON_ARGS even make sense with systemd?	23:47
noonedeadpunk	considering -insist applies for ubuntu....	23:48
jrosser	/etc/lsyncd.conf exists on centos and we don't try to manage it	23:48
noonedeadpunk	I'd tried to move to config...	23:48
noonedeadpunk	but we pass another conf file?	23:49
noonedeadpunk	https://github.com/openstack/openstack-ansible-repo_server/blob/master/templates/lsyncd.defaults.j2#L4	23:49
jrosser	the lua file	23:50
noonedeadpunk	How is systemd inotify thing goes ?:D	23:50
jrosser	haha -ENOTIME	23:50
jrosser	like this is turning into yak shaving again	23:50
noonedeadpunk	yeah, but we define LSYNCD_OPTIONS to repo_lsyncd_config_file which should just replace /etc/lsyncd.conf with our PATH	23:50
noonedeadpunk	so i bet it's not taken into account	23:51
jrosser	is that different though https://github.com/openstack/openstack-ansible-repo_server/blob/master/vars/debian.yml#L30	23:51
jrosser	the horrid horrid file here https://github.com/openstack/openstack-ansible-repo_server/blob/master/templates/lsyncd.lua.j2	23:52
noonedeadpunk	except for debian we don't override path I believe	23:52
noonedeadpunk	as I said - we should put insist here https://opendev.org/openstack/openstack-ansible-repo_server/src/branch/master/templates/lsyncd.lua.j2#L611	23:52
jrosser	it still ships an init script /o\ https://packages.ubuntu.com/focal/amd64/lsyncd/filelist	23:52
noonedeadpunk	I guess	23:52
noonedeadpunk	no wonder - last lsync release was years ago	23:53
noonedeadpunk	to be correct almost 4 years ago	23:53
jrosser	oh i see what you mean now	23:54
* jrosser didnt spot you could put config in the lua file		23:54
noonedeadpunk	not sure if we should drop defaults for ubuntu...	23:55
* noonedeadpunk is quite drunk and clock shows almost 2am....		23:55
jrosser	yeah late	23:55
* jrosser sleeps		23:55
