opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/892726 | 03:06 |
opendevreview | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/892726 | 11:43 |
*** TheJulia is now known as needs_brains_and_sleep | 13:04 | |
*** needs_brains_and_sleep is now known as TheJulia | 13:23 | |
frickler | infra-root: kolla just saw four jobs failing in gate in parallel https://zuul.opendev.org/t/openstack/buildset/9a2af787cb62474c88536484142d607d , three of them ran on rax-ord. I'm kind of EODing, maybe you can have a look? mnasiadka might still be around a bit to answer possible kolla related questions | 14:02 |
fungi | looking | 14:12 |
fungi | during `kolla-ansible -i /etc/kolla/inventory -vvv bootstrap-servers` the ssh connection to the node was prematurely closed, in all 4 cases, looks like? | 14:13 |
fungi | since one of those four happened in ovh, i think we can rule out a provider-specific problem impacting the nodes themselves | 14:16 |
fungi | though it's possible something provider side is impacting the executors | 14:16 |
Clark[m] | Was it zuul's connection or Kolla ansible's connection? The latter is comms all within the same cloud | 14:19 |
fungi | with all four builds, the connection error occurred within 1-2 minutes of starting to run the setup_gate.sh script | 14:22 |
fungi | primary | ok: 19 changed: 15 unreachable: 0 failed: 1 skipped: 14 rescued: 0 ignored: 0 | 14:23 |
fungi | i think that's from the nested ansible? | 14:23 |
fungi | it's possible that stderr line about the shared connection being closed is not related to the cause. ultimately the task failed because the setup script exited 2 | 14:30 |
fungi | okay, here we go: https://zuul.opendev.org/t/openstack/build/72774530166546c1a96284f3312e0e36/log/primary/logs/ansible/bootstrap-servers#749 | 14:31 |
fungi | https://zuul.opendev.org/t/openstack/build/f7e6f2677ed9440db9f8a3c1b04f1868/log/primary/logs/ansible/bootstrap-servers#699 | 14:32 |
fungi | failing in different spots in that script | 14:33 |
fungi | "Failed to download metadata for repo 'docker': Yum repo downloading error: Downloading error(s): repodata/8e89c445039a4ff75bb98ab62bee6b6ae7c4c8ae853a61cab75de5e30c39d0bf-primary.xml.gz - Cannot download, all mirrors were already tried without success; repodata/abe464de7c144654302f1b3b46042d88f1d6550b46527f15a2cef794091f2b3c-filelists.xml.gz - Cannot download, all mirrors were already tried | 14:33 |
fungi | without success" | 14:33 |
fungi | "E:Failed to fetch https://download.docker.com/linux/debian/dists/bookworm/stable/binary-amd64/Packages.bz2 File has unexpected size (11933 != 12572). Mirror sync in progress?" | 14:36 |
frickler | 14:37 | |
fungi | seems like different mirror problems (for different distros, in different parts of the world), but maybe there is some relationship | 14:37 |
fungi | afs fileservers and database servers don't have anything new in dmesg for the past two months, so they don't seem to think they're in distress at least | 14:43 |
fungi | oh! i should have paid closer attention. those are direct download errors for things hosted on download.docker.com, nothing to do with our mirrors at all i don't think? | 14:44 |
fungi | i should have looked closer | 14:45 |
fungi | so my best guess is that docker.com is blowing up their package repositories | 14:45 |
fungi | mnasiadka: ^ let me know if that explanation doesn't match your observations | 14:45 |
frickler | oops, looks like that ssh session wasn't as dead as it looked, sorry | 15:09 |
frickler | fungi: thanks for debugging, seems I was misled by the non-fatal warnings about our mirror host in the same block | 15:10 |
fungi | frickler: some of the confusion is probably due to the fact that zuul is now separating display of stdout and stderr streams, which made a normal connection close jump out as if it were the cause | 15:11 |
fungi | simply because it was the only thing that task sent to the stderr stream | 15:12 |
fungi | it misled me at first too | 15:12 |
frickler | I thought that that feature wasn't enabled yet? | 15:13 |
frickler | kolla explicitly redirects logs for those tasks I think | 15:13 |
fungi | right, i think the connection closed was coming from the task run by the executor, looking at how it's split up. but maybe i'm misinterpreting that | 15:15 |
Clark[m] | Stdout and stderr shouldn't be split yet. And when the first change happens it will be opt in | 15:20 |
fungi | that "Shared connection to 23.253.56.132 closed." et cetera is clearly called out as being from stderr though | 15:27 |
fungi | https://zuul.opendev.org/t/openstack/build/72774530166546c1a96284f3312e0e36/console | 15:27 |
fungi | has separate boxes for stderr and stdout | 15:27 |
fungi | they're separate in the ansible json | 15:28 |
fungi | and so also in the task summary | 15:28 |
Clark[m] | It's not a command or shell task. I think it is a https://docs.ansible.com/ansible/latest/collections/ansible/builtin/script_module.html task | 15:33 |
Clark[m] | Only command and shell have the combined outputs | 15:33 |
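For readers following along, a rough illustration of the distinction Clark[m] draws above. The task names and script path below are hypothetical, not taken from the kolla jobs; the point is that a script task reports stdout and stderr as separate result fields, while command/shell output is the one case shown combined, which is why a lone "Shared connection ... closed." line can appear under stderr by itself.

```yaml
# Hypothetical tasks for illustration only -- not the actual kolla playbook.
# The script module returns stdout and stderr as separate result fields,
# so routine ssh teardown noise shows up in stderr on its own.
- name: Run the gate setup via the script module
  ansible.builtin.script: tools/setup_gate.sh

# Output from command/shell tasks is the case presented as one combined stream.
- name: Run the gate setup via the shell module
  ansible.builtin.shell: ./tools/setup_gate.sh
```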
fungi | oh, i see | 15:35 |
clarkb | fungi: if you're still ready for mailman3 I think we can send it in | 15:41 |
fungi | yeah, i'm braced and ready for impact | 15:42 |
clarkb | should I +A or do you want to do it? | 15:43 |
fungi | go for it | 15:44 |
frickler | the "Shared connection ... closed." is a red herring, you can also see it for that task when it (and the whole job) is passing https://zuul.opendev.org/t/openstack/build/89e86e76b0df4af1a19511f98a4fb323/console#2/1/28/primary | 15:44 |
clarkb | done | 15:44 |
fungi | thanks! | 15:47 |
fungi | frickler: yeah, i realized that was the case after i went hunting for the bootstrap-servers log and saw the errors in it | 15:48 |
fungi | the upload image job has been sitting queued for a while even though we've got tons of available capacity. i wonder if we're running into significant boot failures again | 16:08 |
clarkb | it is running now | 16:13 |
clarkb | ~5 minutes isn't abnormal for a node boot in some clouds. However I think it ended up being longer than that | 16:13 |
fungi | yeah, it was close to 30 minutes wait for the node after the registry job paused | 16:23 |
fungi | it's wrapping up now | 16:26 |
opendevreview | Merged opendev/system-config master: Upgrade to latest Mailman 3 releases https://review.opendev.org/c/opendev/system-config/+/869210 | 16:27 |
clarkb | looks like it is promoting the image but not triggering the lists3 job | 16:29 |
clarkb | ya that job doesn't trigger on mailman3 docker updates. Only on the playbook side | 16:30 |
clarkb | fungi: you could add the docker paths to the files list for the infra-prod-service-lists3 job and add a rebuild trigger comment to one or several of the images to have it run through again | 16:30 |
clarkb | or we could manually trigger the infra-prod-service-lists3 playbook on bridge instead | 16:30 |
fungi | sure, on it | 16:30 |
clarkb | probably better long term to have docker image builds trigger the job though | 16:31 |
clarkb | thinking back to when we built this out I think it wasn't clear if upgrades needed intervention like gerrit or not so we left that out | 16:34 |
clarkb | the current indication is that this should typically be automated so making it happen automatically makes sense to me. But if you think that isn't the case we can leave it as is and trigger the playbook manually | 16:34 |
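A minimal sketch of the approach being discussed, ahead of the actual change below; the job name and role path come from the conversation, but the exact matcher patterns and surrounding Zuul configuration are illustrative only.

```yaml
# Illustrative only -- the real infra-prod-service-lists3 definition lives
# in system-config's Zuul configuration and its file matchers may differ.
- job:
    name: infra-prod-service-lists3
    files:
      - playbooks/roles/mailman3/.*
      - docker/mailman/.*
```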
opendevreview | Jeremy Stanley proposed opendev/system-config master: Trigger mm3 deployment when containers change https://review.opendev.org/c/opendev/system-config/+/892807 | 16:36 |
fungi | clarkb: like that ^? | 16:36 |
clarkb | yup looks similar to etherpad for example | 16:37 |
clarkb | I've approved it | 16:37 |
fungi | thanks | 16:37 |
fungi | once again system-config-build-image-mailman is taking a surprisingly long time to get a single ubuntu-jammy node assigned | 16:48 |
fungi | there it finally goes | 16:48 |
fungi | that time was about 10 minutes after opendev-buildset-registry paused | 16:49 |
fungi | not as bad at least | 16:49 |
fungi | just surprising when we have so much available capacity at the moment | 16:49 |
fungi | this time it got a node the instant the registry paused | 17:09 |
fungi | error node and time to ready spikes on https://grafana.opendev.org/d/6c807ed8fd/nodepool suggest we may have some intermittent issues | 17:13 |
fungi | i think the errors are predominately in ovh regions: https://grafana.opendev.org/d/2b4dba9e25/nodepool%3a-ovh | 17:14 |
clarkb | should merge soon | 17:22 |
fungi | yup | 17:24 |
opendevreview | Merged opendev/system-config master: Trigger mm3 deployment when containers change https://review.opendev.org/c/opendev/system-config/+/892807 | 17:24 |
clarkb | I just cleaned up my etherpad and gitea autoholds in zuul | 17:25 |
fungi | infra-prod-service-lists3 is waiting in deploy this time, so looks like it worked | 17:25 |
fungi | i'll clean up the mm3 held node once we're sure the prod upgrade is good, just in case we need it for a comparison or something | 17:26 |
fungi | deployment is now in progress | 17:26 |
fungi | containers are restarting | 17:27 |
clarkb | ya I left those other two around for a bit for the same reason | 17:28 |
fungi | https://lists.opendev.org/ is up and still seems to be working | 17:28 |
fungi | "Postorius Version 1.3.8" on https://lists.opendev.org/mailman3/lists/ | 17:29 |
clarkb | and hyperkitty 1.3.7 on archive pages | 17:29 |
fungi | "HyperKitty version 1.3.7" on https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/ yep | 17:29 |
clarkb | I can read emails in the archive. vhosting still seems good. | 17:30 |
clarkb | Main thing we're missing is an email going through | 17:30 |
fungi | docker/mailman/web/requirements.txt contains postorius==1.3.8 and hyperkitty==1.3.7 | 17:31 |
fungi | i'm planning to send something to service-discuss next | 17:31 |
fungi | i've received my copy | 17:35 |
fungi | https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/EUE5GZNFTH22QAG5D2BMF3R56IEAXE4R/ | 17:36 |
clarkb | I got it too | 17:36 |
fungi | i think we're set | 17:36 |
clarkb | ++ | 17:36 |
fungi | i'm going to head out to a late lunch pretty soon, but will check back in once i get back | 17:36 |
clarkb | enjoy. I'm going to try and sneak a bike ride in before it gets super hot | 17:36 |
clarkb | we had thunderstorms overnight so temperatures never really dropped | 17:36 |
clarkb | was warm and humid. | 17:37 |
fungi | hot and muggy. sounds like here | 17:37 |
fungi | i mostly just want to get outside to escape the paint fumes | 17:37 |
fungi | now that they're done for the day | 17:37 |
clarkb | if you breathe deeply it is its own form of escape | 17:37 |
fungi | touché | 17:37 |
fungi | seems the universe didn't implode while i was out at the bar. good | 19:46 |
Clark[m] | I didn't expect fireworks. I'm eating lunch but wanted to follow up on whether or not we can unfork that config file now. Then maybe approve a bookworm container update or two | 20:01 |
fungi | oh, yep sounds good | 20:08 |
fungi | Clark[m]: which config file specifically were you thinking we can un-fork? | 20:13 |
Clark[m] | fungi: https://review.opendev.org/c/opendev/system-config/+/869210/8/docker/mailman/web/mailman-web/settings.py that one maybe | 20:22 |
fungi | i'll double check it against upstream | 20:23 |
Clark[m] | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mailman3/files/web-settings.py is the forked version | 20:23 |
Clark[m] | the one in the change is/should be in sync with upstream. Then in our role we bind mount over it | 20:23 |
fungi | we do force SITE_ID = 0 in it intentionally | 20:25 |
clarkb | ya so there are a couple of extra things that we would need to address upstream first | 20:27 |
clarkb | in that case we can't unfork and that's fine. This has worked well enough so far | 20:27 |
clarkb | fungi: https://review.opendev.org/c/opendev/system-config/+/892702 is a low risk bookworm update | 20:28 |
fungi | we can reset ours to what's in maxking/docker-mailman except for the SITE_ID override, i think | 20:28 |
fungi | yes, i agree. the rest seem like basically no-op changes | 20:31 |
fungi | at least in our case | 20:31 |
clarkb | I think the gethostbyname("mailman-web") hardcoded in the list of hosts is still a problem too | 20:35 |
clarkb | we use host networking so that name doesn't end up in magical dns for us | 20:36 |
fungi | ah, so we do still need the custom 127.0.0.1 entry? | 20:39 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: mailman3: re-sync custom web/settings.py https://review.opendev.org/c/opendev/system-config/+/892817 | 20:42 |
clarkb | fungi: I thought 127.0.0.1 came from upstream and I put the differences under my comment. We may not need it since localhost is there | 20:42 |
fungi | the 127.0.0.1 had a comment explicitly claiming to be an opendev edit | 20:43 |
clarkb | ah it did | 20:43 |
clarkb | fungi: note the mailman-web stuff needs to be commented out so I don't think^ will work | 20:43 |
fungi | i just tried to clarify the comment a bit | 20:43 |
clarkb | it will fail on mailman web startup trying to resolve that name | 20:43 |
fungi | ah, i'll comment it out again then | 20:44 |
clarkb | both the lines | 20:44 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: mailman3: re-sync custom web/settings.py https://review.opendev.org/c/opendev/system-config/+/892817 | 20:45 |
fungi | yep | 20:45 |
fungi | didn't know if it would just get ignored when lookups failed | 20:46 |
clarkb | fungi: oh you need to edit our DJANGO_ALLOWED_HOSTS value too | 20:49 |
clarkb | I didn't notice upstream changed the separator | 20:49 |
fungi | the new upstream separator won't work for us? | 20:50 |
clarkb | fungi: our values are separated by : not , so the split won't split things in a meaningful way for us | 20:50 |
clarkb | currently we reuse the exim mm_domains variable and exim wants : iirc | 20:50 |
clarkb | but we can define a new value or convert it in ansible before writing it out into the config for docker-compose | 20:50 |
fungi | oh, so we can't just change to , | 20:50 |
clarkb | correct | 20:51 |
clarkb | we need something slightly smarter. But still doable | 20:51 |
fungi | we still need the conditional there too? | 20:51 |
fungi | and use the ansible var instead of the envvar? | 20:52 |
clarkb | fungi: we don't need the condition. That was there to make the change more likely to be upstreamable. But they condensed it down in a safe way for us (even if they didn't it would work because we always set the value) | 20:52 |
clarkb | I wouldn't use the ansible var. I would keep using the envvar to stay in sync with upstream there | 20:53 |
clarkb | what we need to change is where we set the env var value | 20:53 |
clarkb | which is set in playbooks/roles/mailman3/templates/docker-compose.yaml.j2 | 20:53 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: mailman3: re-sync custom web/settings.py https://review.opendev.org/c/opendev/system-config/+/892817 | 20:53 |
clarkb | and we can do something like mm_domains | split(:) | join (,) | 20:53 |
clarkb | not valid ansible | 20:53 |
fungi | mmm, okay. but basically the only thing we could un-fork was to update the TIME_ZONE assignment? | 20:54 |
clarkb | we can unfork the DJANGO_ALLOWED_HOSTS code too. We just have to change how we set the DJANGO_ALLOWED_HOSTS value in docker-compose.yaml | 20:54 |
fungi | seems like we're not really un-forking the config, though it was a good exercise to confirm basically all the differences there were needed | 20:54 |
clarkb | basically upstream splits on , we split on : so we can change the input to the split and unfork that way | 20:55 |
fungi | ah, i guess the docker-compose is a jinja2 template so we can manipulate values there | 20:57 |
clarkb | yup exactly. I think doing that is worthwhile if we can pretty easily use jinja filters to change the separator value | 20:57 |
clarkb | that way it's less divergence from upstream in the settings file | 20:57 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: mailman3: re-sync custom web/settings.py https://review.opendev.org/c/opendev/system-config/+/892817 | 21:02 |
fungi | so like that? | 21:02 |
clarkb | in DJANGO_ALLOWED_HOSTS={{ mm_domains.split(':') | join(,) }} I don't think mm_domains.split() is valid. You need to use | filter() syntax? | 21:08 |
clarkb | but yes from a pseudo code perspective | 21:08 |
clarkb | huh google says I'm wrong | 21:12 |
clarkb | I guess it isn't clear to me which things are functions and which things are filters then | 21:12 |
clarkb | fungi: you might need quotes around the , ? otherwise I guess that may work | 21:13 |
fungi | oh, i thought i did, sorry. will fix | 21:14 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: mailman3: re-sync custom web/settings.py https://review.opendev.org/c/opendev/system-config/+/892817 | 21:15 |
fungi | i stole the foo.split() invocation from other templates we have in system-config, fwiw | 21:15 |
clarkb | ya so some string methods exist as directly invocable? | 21:16 |
clarkb | jinja is weird | 21:16 |
clarkb | I think that will work. CI should confirm | 21:16 |
fungi | playbooks/roles/base/exim/templates/exim4.conf.j2 roles/set-hostname/templates/hosts.j2 roles/set-hostname/templates/mailname.j2 | 21:17 |
fungi | were the examples i found of split() methods in templates | 21:17 |
fungi | cargo cult ftw | 21:18 |
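For reference, a minimal sketch of the docker-compose template line under discussion; the service and environment structure here is illustrative, and only the DJANGO_ALLOWED_HOSTS expression reflects what was worked out above (converting the colon-separated exim mm_domains value into the comma-separated list that upstream's settings.py splits on).

```yaml
# Sketch based on playbooks/roles/mailman3/templates/docker-compose.yaml.j2;
# only the DJANGO_ALLOWED_HOSTS line reflects the discussion above.
services:
  mailman-web:
    environment:
      - DJANGO_ALLOWED_HOSTS={{ mm_domains.split(':') | join(',') }}
```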
opendevreview | Merged opendev/system-config master: Update zookeeper-statsd image to bookworm https://review.opendev.org/c/opendev/system-config/+/892702 | 21:21 |
clarkb | the mm3 change passed testing. I was hoping we recorded the docker-compose.yaml file but it seems we don't | 22:02 |
clarkb | it's probably fine | 22:02 |
clarkb | looks like zookeeper-statsd won't update until our daily run later. Not a big deal, it's low impact if anything goes wrong (we lose zk stats until we fix it) | 22:03 |