*** ysandeep|out is now known as ysandeep | 01:55 | |
*** ysandeep is now known as ysandeep|afk | 03:21 | |
*** ysandeep|afk is now known as ysandeep | 05:03 | |
jrosser | morning | 07:38 |
*** ysandeep is now known as ysandeep|afk | 08:02 | |
noonedeadpunk | morning! | 08:07 |
damiandabrowski | hi! | 09:10 |
*** tosky_ is now known as tosky | 09:10 | |
*** dviroel__ is now known as dviroel | 11:05 | |
mgariepy | good morning everyone | 11:39 |
*** ysandeep|afk is now known as ysandeep | 12:30 | |
mgariepy | hmm ? ERROR! couldn't resolve module/action 'openstack.cloud.os_auth'. This often indicates a misspelling, missing collection, or incorrect module path. | 14:44 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible master: Fix cloud.openstack.auth module https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 14:47 |
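The review above renames the task to the module name the collection actually ships. A minimal sketch of the corrected task (the `cloud: default` value is an illustrative clouds.yaml entry, not from the patch):

```yaml
# Old, no longer resolvable:       openstack.cloud.os_auth
# Current FQCN in the collection:  openstack.cloud.auth
- name: Authenticate to the cloud and fetch the service catalog
  openstack.cloud.auth:
    cloud: default   # illustrative named cloud from clouds.yaml
```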
noonedeadpunk | do we need to backport that? ^ | 14:49 |
jrosser | whoops | 14:50 |
mgariepy | maybe we need to start managing the version of the modules we install ? | 14:50
jrosser | i think the zuul stuff is set up now to run the infra jobs when those playbooks are touched | 14:50 |
jrosser | we didnt do that before and could merge broken stuff | 14:50 |
jrosser | huh interesting https://github.com/openstack/openstack-ansible/blob/master/zuul.d/jobs.yaml#L267-L277 | 14:52 |
jrosser | yes so we don't actually run that anywhere in CI? | 14:53 |
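Making zuul run the infra jobs when the requirements files change means widening the job's `files` matcher. A rough sketch of what that stanza could look like (the job name and exact patterns are illustrative, not taken from the repo):

```yaml
- job:
    name: openstack-ansible-infra   # illustrative job name
    files:
      - ^playbooks/.*
      - ^ansible-role-requirements\.yml$
      - ^ansible-collection-requirements\.yml$
```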
*** dviroel is now known as dviroel|lunch | 14:55 | |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible master: Set the number of threads for processes to 2 https://review.opendev.org/c/openstack/openstack-ansible/+/850942 | 14:56 |
jrosser | also why are we running stream-9 distro jobs | 14:59 |
jrosser | that looks like a mistake | 14:59 |
*** ysandeep is now known as ysandeep|out | 15:00 | |
noonedeadpunk | #startmeeting openstack_ansible_meeting | 15:00 |
opendevmeet | Meeting started Tue Jul 26 15:00:59 2022 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:01 |
opendevmeet | The meeting name has been set to 'openstack_ansible_meeting' | 15:01 |
noonedeadpunk | #topic office hours | 15:01 |
mgariepy | right on time :D | 15:01 |
noonedeadpunk | so for infra jobs yeah, we probably should also add ansible-role-requirements and ansible-collection-requirements | 15:02 |
noonedeadpunk | to "files" | 15:02 |
noonedeadpunk | stream-9 distro jobs - well, I intended to fix them one day | 15:03 |
noonedeadpunk | they're NV as of today, so why not. But I had a dependency on smth to get them working, can't really recall | 15:03 |
* noonedeadpunk fixed alarm clock :D | 15:04 | |
damiandabrowski | :D | 15:04 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Remove centos-9-stream distro jobs https://review.opendev.org/c/openstack/openstack-ansible/+/851041 | 15:04 |
jrosser | o/ hello | 15:05 |
noonedeadpunk | Will try to spend some time on them maybe next week | 15:05 |
noonedeadpunk | oh, ok, so the reason why distro jobs are failing, is that zed packages are not published yet | 15:07 |
noonedeadpunk | and we need openstacksdk 0.99 with our collection versions | 15:07 |
noonedeadpunk | So. I was looking today at our lxc jobs. And wondering if we should try switching to overlayfs instead of dir by default? | 15:09 |
noonedeadpunk | maybe it's more a ptg topic though | 15:09 |
jrosser | is there a benefit? | 15:10 |
noonedeadpunk | But I guess as of today, all supported distros (well, except focal - do we support it?) have kernel 3.13 by default | 15:10 |
noonedeadpunk | Well, yes? You don't copy root each time, but just use cow and snapshot? | 15:10 |
noonedeadpunk | I have no idea if it's working at all tbh as of today, as I haven't touched that part for real | 15:11 |
noonedeadpunk | but you should save quite some space with that on controllers | 15:11 |
jrosser | i think we have good tests for all of these, so it should be easy to see | 15:12 |
noonedeadpunk | Each container is ~500MB | 15:12 |
noonedeadpunk | And diff would be like 20MB tops | 15:12 |
noonedeadpunk | Nah, more. Image is 423MB | 15:12 |
noonedeadpunk | So we would save 423MB per each container that runs | 15:12
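Trying the copy-on-write backend is essentially a one-variable override in the LXC roles; a sketch for user_variables.yml, assuming the role accepts an overlayfs value for the backing store the way it does for `dir`:

```yaml
# user_variables.yml -- sketch, assuming the lxc roles accept this value
# default is 'dir' (full copy of the base image per container);
# overlayfs snapshots the base image copy-on-write instead
lxc_container_backing_store: overlayfs
```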
noonedeadpunk | Another thing, that we totally forgot to mention during the previous PTG, is getting rid of dash-separated groups and using underscores only | 15:13
noonedeadpunk | We're postponing this for quite a while now to have that said... | 15:13 |
noonedeadpunk | But it might be not that big chunk of work. | 15:14 |
noonedeadpunk | I will try to look into that, but not now for sure | 15:14 |
noonedeadpunk | Right now I'm trying to work on AZ concept and what needs to be done from OSA side to get this working. Will publish a scenario to docs once done if there're no objections | 15:15 |
noonedeadpunk | env.d overrides are quite massive as of today to have that said | 15:15
jrosser | i wonder if we prune journals properly | 15:16 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible master: Fix cloud.openstack.* modules https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 15:16 |
jrosser | i am already using COW on my controllers (zfs) and the base size of the container is like tiny compared to the total | 15:16 |
noonedeadpunk | ah. well, then there's no profit in overlayfs for you :D | 15:17 |
noonedeadpunk | But I kind of wonder if there are consequences to using overlayfs with other bind-mounts, as it kind of changes the representation of these from what I read | 15:18
jrosser | i guess i mean that the delta doesnt always stay small | 15:18 |
jrosser | it will be when you first create them | 15:18 |
noonedeadpunk | well, delta would be basically venvs size I guess? | 15:19 |
jrosser | for $time-since-reinstall on one i'm looking at 25G for 13 containers even with COW | 15:20
noonedeadpunk | as logs and journals and databases are passed into the container and not root | 15:20 |
noonedeadpunk | that is kind of weird? | 15:20 |
mgariepy | wow.. https://github.com/ansible/ansible/issues/78344 | 15:28 |
jrosser | huh | 15:29 |
jrosser | they're going to complain next at the slightly unusual use of the setup module, i can just feel it | 15:30 |
mgariepy | let's make the control persist 4 hours.. ;p | 15:30 |
mgariepy | what good responses.. | 15:30
jrosser | noonedeadpunk: in my container /openstack is nearly 1G for keystone container, each keystone venv is ~200M so for a couple of upgrade this adds up pretty quick | 15:31 |
noonedeadpunk | btw, retention of old venvs is a good topic | 15:32 |
noonedeadpunk | should we create some playbook at least for ops repo for that? | 15:32 |
jrosser | it's pretty coupled to the inventory so might be hard for the ops repo? | 15:35 |
jrosser | mgariepy: i will reproduce with piplining=False and show that it handles -13 in that case | 15:36 |
jrosser | i'm sure i saw it doing that | 15:36 |
mgariepy | hmm i haven't been able to reproduce it with pipelining false. | 15:37 |
noonedeadpunk | because it re-tries :) | 15:37 |
mgariepy | no | 15:38 |
mgariepy | because scp catches it to transfer the file. | 15:38
mgariepy | then reactivates the error | 15:38
mgariepy | error / socket | 15:38 |
jrosser | well - question is if it throws AnsibleControlPersistBrokenPipeError and handles it properly when pipelining is false | 15:40 |
mgariepy | when pipelining is false | 15:40 |
mgariepy | it will transfer the file. | 15:40
mgariepy | if there is an error it most likely ends up being exit(255). | 15:40
mgariepy | then it retries. | 15:40 |
mgariepy | no ? | 15:40 |
jrosser | lets test it | 15:41 |
mgariepy | you never catch the same race with pipelining false. | 15:41 |
jrosser | no, but my code that was printing p.returncode was showing what was happening though | 15:41 |
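The race being chased here is the ControlMaster socket expiring mid-task; the half-joking mitigation floated above is simply stretching ControlPersist. A sketch of the relevant ansible.cfg knobs (the 3600s value is illustrative):

```ini
[ssh_connection]
pipelining = True
# keep the shared ControlMaster socket alive well past any long-running module,
# so a task started near the expiry window doesn't hit a broken pipe
ssh_args = -o ControlMaster=auto -o ControlPersist=3600s
```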
jrosser | anyway, let's meeting some more then look at that | 15:42
jrosser | we did a Y upgrade in the lab last week | 15:44 |
jrosser | which went really quite smoothly, just a couple of issues | 15:44 |
spotz_ | nice | 15:44 |
jrosser | galaxy seems to have some mishandling of http proxies so we had to override all the things to be https://github.... | 15:45 |
jrosser | also lxc_hosts_container_build_command is not taking account of lxc_apt_mirror | 15:46 |
anskiy | noonedeadpunk: there is already one: https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/ansible_tools/playbooks/cleanup-venvs.yml | 15:46 |
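The linked ops playbook automates the venv retention discussed above; the core idea is just pruning release-tagged venv directories that don't match the active release. A runnable sketch of that logic (the paths and release tag are illustrative stand-ins, not what the playbook uses):

```shell
#!/bin/sh
# Sketch: keep only the venv for the current release tag, prune older ones.
current="25.0.0"                 # illustrative release tag
venv_root="/tmp/demo-venvs"      # stand-in for /openstack/venvs
mkdir -p "$venv_root/keystone-24.0.0" "$venv_root/keystone-$current"

for d in "$venv_root"/*; do
  case "$d" in
    *-"$current") ;;             # active venv: keep
    *) rm -rf "$d" ;;            # older release: prune
  esac
done
ls "$venv_root"
```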
jrosser | but the biggest trouble was from `2022-07-22 08:04:35 upgrade lxc:all 1:4.0.6-0ubuntu1~20.04.1 1:4.0.12-0ubuntu1~20.04.1` | 15:47 |
noonedeadpunk | anskiy: ah, well :) | 15:47 |
noonedeadpunk | oh, well, it would restart all containers | 15:48 |
jrosser | yeah, pretty much | 15:48 |
jrosser | and i think setup-hosts does that across * at the same time | 15:48 |
noonedeadpunk | what I had to do with that, is to systemctl stop lxc, systemctl mask lxc, then update package, unmask and reboot | 15:49 |
noonedeadpunk | if we pass package_state: latest, then yeah... | 15:49 |
jrosser | i don't know if we want to manage that a bit more | 15:49 |
jrosser | but that might be a big surprise for people expecting the upgrades to not hose everything at once | 15:50
noonedeadpunk | I bet this is causing troubles https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/run-upgrade.sh#L186 | 15:51 |
jrosser | we also put that in the documentation https://opendev.org/openstack/openstack-ansible/src/branch/master/doc/source/admin/upgrades/minor-upgrades.rst#L52 | 15:53 |
noonedeadpunk | So we would need to add serial here https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/containers-lxc-host.yml#L24 and override it in script | 15:53 |
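A serial knob on that play would look roughly like the following; the variable name and default are illustrative, not taken from the actual change:

```yaml
# containers-lxc-host.yml -- sketch of an overridable serial
- name: Configure lxc hosts
  hosts: lxc_hosts
  # default keeps today's all-at-once behaviour; the upgrade script
  # could override it to e.g. '1' to roll hosts one at a time
  serial: "{{ lxc_hosts_serial | default('100%') }}"
```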
noonedeadpunk | ok, that is a good point. | 15:53
jrosser | well i don't know if serial is the right thing to do | 15:53 |
noonedeadpunk | are you thinking about putting lxc on hold? | 15:54 |
noonedeadpunk | I really can't recall how it's done on centos though | 15:55 |
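One way to stop a blanket package upgrade from restarting every container at once is pinning the lxc package at its installed version; a sketch of an apt preferences pin (the version string is the one from the lab upgrade above — on CentOS the dnf versionlock plugin serves the same purpose):

```
# /etc/apt/preferences.d/lxc-hold -- sketch: hold lxc at the installed version
Package: lxc
Pin: version 1:4.0.6-0ubuntu1~20.04.1
Pin-Priority: 1001
```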
noonedeadpunk | but yes, I do agree we need to fix that for ppl | 15:57 |
noonedeadpunk | not upgrading packages - well, it's also a thing. Maybe we can adjust docs to not mention `-e package_state=latest` as smth "default", but as smth that can be added if you want to update packages, with some precautions | 15:58
noonedeadpunk | At the very least this also affects neutron, as ovs being upgraded also causes disturbances | 15:58
jrosser | andrewbonney is back tomorrow - i'm sure we have some use of unattended upgrades here with config to prevent $bad-things but i can't find where that is just now | 15:59 |
noonedeadpunk | well, we just disabled unattended upgrades... | 16:00
jrosser | yeah i might be confused here, but it is the same sort of area | 16:00 |
noonedeadpunk | so my intention when I added `-e package_state=latest` was - you're planning an upgrade and thus maintenance. During maintenance interruptions happen. | 16:01
noonedeadpunk | But in the case of LXC you would need to re-bootstrap galera at the very least | 16:01
noonedeadpunk | as all goes down at the same time | 16:01 |
jrosser | right, it's not at all self-healing just by running some playbooks | 16:01 |
*** dviroel|lunch is now known as dviroel | 16:01 | |
jrosser | which is kind of what people expect | 16:01
noonedeadpunk | and what I see we can do - either manage serial, then restarting lxc will likely self-heal as the others are alive, or put some packages on hold and add another flag to upgrade LXC to make it intentional | 16:02
noonedeadpunk | as I don't think we are able to pin the version | 16:03 |
noonedeadpunk | ok, let's wait for andrewbonney then to make a decision | 16:04 |
noonedeadpunk | #endmeeting | 16:04 |
opendevmeet | Meeting ended Tue Jul 26 16:04:06 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:04 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.html | 16:04 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.txt | 16:04 |
opendevmeet | Log: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.log.html | 16:04 |
noonedeadpunk | oh, I can recall that we somehow prevented package triggers from running for galera I guess. | 16:08
noonedeadpunk | and we already have lxc_container_allow_restarts variable | 16:09 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/tasks/galera_install_apt.yml#L65-L71 | 16:09 |
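The usual Debian mechanism for suppressing service restarts from maintainer scripts, which may be what the linked galera tasks rely on, is a temporary `policy-rc.d`; a sketch of the file's content (install it before the upgrade, remove it after):

```
#!/bin/sh
# Sketch of /usr/sbin/policy-rc.d: while this file exists and exits 101,
# package maintainer scripts are denied service start/restart actions.
exit 101
```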
jrosser | mgariepy: without pipelining https://paste.opendev.org/show/bfdqBx2uOhGOr8LJRMGV/ | 16:15 |
jrosser | and with -vvv https://paste.opendev.org/show/bWCoeQuDgiFeR8msOpop/ | 16:17 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Allow to provide serial for lxc_hosts https://review.opendev.org/c/openstack/openstack-ansible/+/851049 | 16:17 |
jrosser | and with pipelining enabled https://paste.opendev.org/show/baJ8I2xKlAiovdwHvniM/ | 16:20 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Prevent lxc.service from being restarted on package update https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/851071 | 16:29 |
noonedeadpunk | huh, how does it re-try then.... | 16:30
jrosser | i don't know | 16:34 |
jrosser | mgariepy: noonedeadpunk the other thing i totally don't understand is why this always happens with the setup module | 16:43
jrosser | or is it really 'always', or are we just unlucky in the structure of our playbooks that T>ControlPersist has elapsed before we try to gather facts on the repo hosts | 16:44
mgariepy | i think it's reproducible by another module. | 17:02
mgariepy | but it would need not to exit 255.. | 17:03 |
mgariepy | we probably can trigger it with the command module also | 17:03
jrosser | perhaps there is something specific with the setup module | 17:06 |
mgariepy | maybe. | 17:06 |
mgariepy | but as long as we get the same race and an exit outside of 255 it should be failing. | 17:07 |
mgariepy | the fun part is to make the other modules take as much time as setup takes :D | 17:10
jrosser | ok here we go with the file: module https://paste.opendev.org/show/bFNbYfNTxDRBENHh6h9S/ | 17:11
mgariepy | how nice :D | 17:11 |
mgariepy | https://github.com/ansible/ansible/blob/e4087baa835df6046950a6579d2d2b56df75d12b/lib/ansible/plugins/connection/ssh.py#L1235 | 17:16 |
mgariepy | checkrc=False | 17:17 |
mgariepy | so if it's scp or sftp it does not care about the exit status and retries if so | 17:18
jrosser | here is another case where we get `muxclient: master hello exchange failed` https://paste.opendev.org/show/b3xllr0pREQ3vDvpyoax/ | 17:18 |
jrosser | i am still not able to make a MODULE FAILURE with anything else though | 17:38 |
mgariepy | the file module with pipelining, does it do the same as the setup module ? | 17:41
spatel | jamesdenton around? | 17:41 |
mgariepy | or it does transfer the file in another way ? | 17:42 |
jrosser | it's possible it's different, just look at the number of SSH steps in my paste | 17:42 |
jrosser | i'm just trying something else like package: | 17:42 |
mgariepy | some vote please :D https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 17:43 |
mgariepy | also any comments on : https://review.opendev.org/c/openstack/openstack-ansible/+/850942/ ? | 17:49 |
mgariepy | i think having it at 1 makes the process not respond to haproxy, and then it disconnects the endpoint server. | 17:50
jrosser | package: MODULE FAILURE https://paste.opendev.org/show/bGFkEBoURUovv3ktl9JJ/ | 17:51 |
jrosser | mgariepy: is it really threads - i always find this sooo confusing with python | 17:51 |
mgariepy | let's rewrite openstack in go. | 17:52
mgariepy | simple deployment, real threads. and so on. | 17:52 |
jrosser | i think what i mean is i don't know if that actually does what you expect :) | 17:54 |
mgariepy | so should we set the number of process to 2 instead ? | 17:56 |
mgariepy | i assume that with 2 threads it should be a bit better at responding to multiple requests. | 17:56
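The knobs being debated for review 850942 are the uWSGI ones; `processes` and `threads` are real uWSGI option names, the values here are the ones proposed. A sketch of the distinction:

```ini
[uwsgi]
; worker processes: real parallelism, each with its own interpreter and GIL
processes = 2
; threads per worker: GIL-bound, but let a worker keep answering
; (e.g. haproxy health checks) while another thread blocks on I/O
threads = 2
```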
jrosser | noonedeadpunk: ^ do you understand how this works? | 17:58 |
mgariepy | arf.. meeting now | 17:58 |
noonedeadpunk | I wonder what's wrong with linters actually. seems they always fail | 19:22
noonedeadpunk | um, not sure I got where this leads... | 19:28 |
mgariepy | the linter is fixed in the other patch. | 19:29 |
noonedeadpunk | (last was regarding python threads) | 19:30 |
mgariepy | i saw a few failure when tempest was running that the services were just not responding. | 19:38 |
mgariepy | not 100% sure having 2 threads will fix 100% of the cases. but maybe. | 19:39
mgariepy | maybe i could try to take the time to test if it actually helps. but i'm kinda short on time for the rest of the week. | 19:39
opendevreview | Merged openstack/openstack-ansible master: Fix cloud.openstack.* modules https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 20:14 |
*** anskiy1 is now known as anskiy | 21:28 | |
*** dviroel is now known as dviroel|out | 23:29 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!