ianw | so weird, because pip verbosity doesn't even seem to affect the verbosity of the uwsgi build bits | 00:05 |
---|---|---|
clarkb | no, but I think it does affect buffering due to python stuff | 00:07 |
ianw | could try reducing CPUCOUNT= to see if it's some sort of dependency thing ... but no error at all ... | 00:07 |
clarkb | that's the other weird thing: it says it failed but doesn't say how or why | 00:09 |
fungi | feels like maybe it's saying why on stderr and pip is bitbucketing that | 00:10 |
ianw | you could also try python uwsgiconfig.py --build directly maybe? | 00:11 |
clarkb | hrm ya we could try that. Would have to clone the repo rather than relying on pypi but that seems possible | 00:11 |
clarkb | let me try that | 00:11 |
opendevreview | Clark Boylan proposed opendev/system-config master: Try building uWSGI directly https://review.opendev.org/c/opendev/system-config/+/821631 | 00:23 |
clarkb | that isn't mergeable because `python uwsgiconfig.py --build` doesn't produce a wheel. But maybe it will give us insight if we can make it fail | 00:23 |
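A rough sketch of the direct-build experiment being proposed here, assuming a checkout of the upstream repo (the uwsgiconfig.py entry point and the CPUCOUNT knob are the ones discussed above; the clone URL and working directory are illustrative):

```shell
# Build uWSGI via its own build script instead of through pip, so any error
# output is not swallowed by pip's build isolation.
git clone https://github.com/unbit/uwsgi.git
cd uwsgi
CPUCOUNT=1 python3 uwsgiconfig.py --build
```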
ianw | ++ | 00:30 |
clarkb | updated gerritbot is running now. Anyone have a change to update? | 00:33 |
clarkb | All three of the uwsgi bullseye builds when built directly seem fine: https://zuul.opendev.org/t/openstack/build/9c401ac728ed44ab87ce77d368245c6d/log/job-output.txt#1767 https://zuul.opendev.org/t/openstack/build/1ca52f43ac834b5e95d047d91918b1da/log/job-output.txt#1764 https://zuul.opendev.org/t/openstack/build/74c2c9ff1b8948c18d7f5841430a7554/log/job-output.txt#1804 | 00:35 |
ianw | sigh | 00:36 |
ianw | the only other thing i can think is run it under strace with a really big -s value | 00:36 |
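A hedged sketch of what running the build under strace with a large -s value might look like (the output path and the exact pip invocation are assumptions):

```shell
# Follow forked children (-f), raise the string cutoff so long stderr writes
# are captured in full (-s), and write the trace somewhere inspectable.
strace -f -s 8192 -o /tmp/uwsgi-build.trace pip3 wheel uwsgi
# Afterwards, look at what was written to stderr (fd 2) near the failure:
grep 'write(2,' /tmp/uwsgi-build.trace | tail -n 50
```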
clarkb | I think we pushed an event for magnum. trying to verify with logs now (as I'm not in that channel) | 00:39 |
clarkb | hrm no these are all comment added events which we don't notify for | 00:40 |
clarkb | aha it logged we sent something to #tacker | 00:42 |
clarkb | yup its there. I'll link to it as soon as our htmlification runs | 00:42 |
clarkb | But I think gerritbot is good | 00:42 |
clarkb | https://meetings.opendev.org/irclogs/%23tacker/%23tacker.2021-12-14.log.html#t2021-12-14T00:39:41 this was from the new bot | 00:46 |
clarkb | s/new/updated | 00:46 |
clarkb | ianw: ya or maybe hold a node like fungi suggests and see if it is consistent on specific nodes (then we can try all manner of debugging) | 00:47 |
clarkb | But I'm running out of time today. I'll see if I can pick this up tomorrow | 00:47 |
clarkb | Need to figure out dinner now | 00:47 |
*** rlandy|ruck is now known as rlandy|out | 00:54 | |
ianw | ok, i'll keep thinking | 01:01 |
opendevreview | Ian Wienand proposed opendev/infra-specs master: zuul-credentials : new spec https://review.opendev.org/c/opendev/infra-specs/+/821645 | 03:58 |
opendevreview | Merged openstack/project-config master: Add openEuler 20.03 LTS SP2 node https://review.opendev.org/c/openstack/project-config/+/818723 | 04:56 |
opendevreview | Ian Wienand proposed opendev/base-jobs master: Update Fedora latest nodeset to 35 https://review.opendev.org/c/opendev/base-jobs/+/821649 | 05:00 |
opendevreview | Ian Wienand proposed opendev/base-jobs master: Add 8-stream-arm64 and 9-stream nodesets https://review.opendev.org/c/opendev/base-jobs/+/821650 | 05:00 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Switch 9-stream testing to use opendev mirrors https://review.opendev.org/c/openstack/diskimage-builder/+/821651 | 05:05 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Add debian-bullseye-arm64 build test https://review.opendev.org/c/openstack/diskimage-builder/+/821652 | 05:16 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Add debian-bullseye-arm64 build test https://review.opendev.org/c/openstack/diskimage-builder/+/821652 | 05:24 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Add 9-stream ARM64 testing https://review.opendev.org/c/openstack/diskimage-builder/+/821653 | 05:24 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: debian-minimal: remove old testing targets https://review.opendev.org/c/openstack/diskimage-builder/+/821654 | 05:24 |
*** ysandeep|out is now known as ysandeep | 05:27 | |
opendevreview | chandan kumar proposed openstack/diskimage-builder master: Revert "Fix BLS based bootloader installation" https://review.opendev.org/c/openstack/diskimage-builder/+/821526 | 06:14 |
*** sshnaidm|afk is now known as sshnaidm | 06:57 | |
*** ysandeep is now known as ysandeep|lunch | 07:26 | |
opendevreview | Merged openstack/diskimage-builder master: Use OpenDev mirrors for 8-stream CI builds https://review.opendev.org/c/openstack/diskimage-builder/+/820978 | 07:38 |
*** ysandeep|lunch is now known as ysandeep | 08:41 | |
*** ysandeep is now known as ysandeep|afk | 09:35 | |
opendevreview | Lajos Katona proposed opendev/elastic-recheck master: Add query for bug 1954663 https://review.opendev.org/c/opendev/elastic-recheck/+/821684 | 09:56 |
opendevreview | Lajos Katona proposed opendev/elastic-recheck master: Add query for bug 1799790 https://review.opendev.org/c/opendev/elastic-recheck/+/821684 | 10:22 |
*** ysandeep|afk is now known as ysandeep | 10:32 | |
*** rlandy is now known as rlandy|ruck | 11:13 | |
*** jpena|off is now known as jpena | 11:42 | |
dtantsur | hey folks! any issues with pypi mirrors? we see a ton of random errors today. | 12:15 |
dtantsur | see https://review.opendev.org/c/openstack/ironic/+/821010 for example | 12:16 |
ykarel | seeing a lot of those in neutron too, too many red in https://zuul.opendev.org/t/openstack/status, seems only some providers are impacted | 12:19 |
fungi | dtantsur: the first one i looked at seems to be complaining about a dependency conflict between openstackdocstheme and constraints over dulwich, are they all like that? | 12:19 |
dtantsur | different packages | 12:19 |
fungi | ykarel: providers in/around montreal canada again? | 12:19 |
ykarel | fungi, yes at least i noticed in those | 12:20 |
ykarel | iweb-mtl01 | 12:20 |
fungi | iweb mtl01, vexxhost ya-cmq-1, and ovh bhs1 are all in that area | 12:20 |
*** ysandeep is now known as ysandeep|brb | 12:21 | |
fungi | er, vexxhost ca-ymq-1 | 12:21 |
fungi | i'm still pre-coffee | 12:21 |
ykarel | also seen in rax-iad | 12:21 |
fungi | definitely not that region, that's virginia/washington dc | 12:22 |
fungi | so whatever's going on with pypi is probably more global | 12:22 |
*** outbrito_ is now known as outbrito | 12:22 | |
fungi | is it only pypi-related errors, or problems with other content too? | 12:23 |
ykarel | i noticed only pypi till now | 12:23 |
dtantsur | same | 12:23 |
fungi | pypi's just a caching proxy in our case, so it looks like pypi is probably serving us stale or incomplete indices again | 12:24 |
fungi | if we can figure out which specific package(s) is/are impacted, we can issue requests to their cdn to refetch from pypi's backend | 12:25 |
fungi | dulwich===0.20.26 seems to probably be one | 12:26 |
dtantsur | keystone 20.1.0.dev19 depends on PyJWT>=1.6.1 The user requested (constraint) pyjwt===2.3.0 | 12:27 |
fungi | yeah, it's usually whatever constraint it's complaining about that it couldn't find in those cases | 12:28 |
dtantsur | ironic 19.0.1.dev7 depends on pecan!=1.0.2, !=1.0.3, !=1.0.4, !=1.2 and >=1.0.0 The user requested (constraint) pecan===1.4.1 | 12:28 |
dtantsur | fungi: looks like dulwich, pyjwt and pecan in our case | 12:29 |
ykarel | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22The%20user%20requested%20(constraint)%5C%22 | 12:29 |
ykarel | yeap pecan/dulwich/pyjwt | 12:29 |
ykarel | and providers: rax-iad, iweb-mtl01, rax-dfw, | 12:31 |
fungi | thanks, those are the only packages i've seen in the errors so far as well. i'll dig up my notes on how to ask fastly to refresh those indices | 12:31 |
ykarel | and ovh-bhs1 | 12:32 |
frickler | fungi: curl -XPURGE https://pypi.org/simple , just did that | 12:34 |
fungi | i've done like `curl -XPURGE https://pypi.org/simple/dulwich` with and without a trailing / for each of the identified package names | 12:35 |
fungi | from each of the mirrors, in case it matters which endpoint cluster they're sending it to | 12:36 |
ykarel | also seen few failures for python-ironic-inspector-client===4.7.0 in provider airship-kna1 | 12:37 |
jrosser | we see the same keystone/pyjwt problem in some OSA jobs | 12:39 |
fungi | i've now done it for python-ironic-inspector-client as well | 12:39 |
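The purge requests described above, expressed as a small loop (a sketch; the package list is the one identified in the failing jobs, and both the with- and without-trailing-slash forms are sent since it is unclear which the CDN keys on):

```shell
for pkg in dulwich pyjwt pecan python-ironic-inspector-client; do
    # Ask Fastly to drop its cached copy of the simple index for this package
    # so the next request refetches from the PyPI backend.
    curl -XPURGE "https://pypi.org/simple/${pkg}"
    curl -XPURGE "https://pypi.org/simple/${pkg}/"
done
```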
ykarel | ack Thanks fungi | 12:45 |
fungi | here's hoping it helps. if some fastly endpoints simply refresh from the same stale backend again, then we're not any better off | 12:46 |
ykarel | ack lets see how it goes | 12:47 |
*** ysandeep|brb is now known as ysandeep | 12:48 | |
opendevreview | yatin proposed openstack/project-config master: Update Neutron's Grafana as per recent changes https://review.opendev.org/c/openstack/project-config/+/821706 | 13:30 |
jrosser | clarkb: fungi i may have reproduced the uwsgi build failure https://paste.opendev.org/show/811652/ | 13:43 |
jrosser | hacking the code a bit to import builtins and switching __builtins__.compile for builtins.compile makes it work | 13:44 |
jrosser | but that is now the limit of my python understanding | 13:44 |
fungi | oh weird! | 13:54 |
fungi | if it's that, i wonder why pip is eating the error details | 13:55 |
opendevreview | Merged openstack/project-config master: Update Neutron's Grafana as per recent changes https://review.opendev.org/c/openstack/project-config/+/821706 | 13:56 |
fungi | jrosser: and also i wonder why it only fails for us sometimes | 13:57 |
jrosser | fungi: i'm not sure what is going on tbh - if your build is run through a script or something and stderr gets lost? | 13:58 |
fungi | it's being built by pip which is downloading the sdist and installing it | 13:58 |
jrosser | so locally, when i build with the makefile it's completely fine | 13:58 |
jrosser | but if i `pip3 wheel .` in the same directory it looks like it fails exactly at the point you saw yesterday | 13:59 |
fungi | yeah, that seems like more than mere coincidence, i agree | 13:59 |
jrosser | and for $reasons, messing with how it finds __builtin__.compile fixes it | 14:00 |
jrosser | reason i had a dig was that we build uwsgi on every OSA bullseye job and never see anything like this | 14:00 |
jrosser | fungi: with CPUCOUNT=1 the output is not confused with threading, so you can see exactly where it fails https://paste.opendev.org/show/811661/ | 14:03 |
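jrosser's reproduction, roughly (assumed to be run from a checkout of the uwsgi source on bullseye; CPUCOUNT=1 serializes the build so the failure point is readable):

```shell
# From a uwsgi source checkout: the makefile path builds fine, while the pip
# path fails at the point shown in the paste above.
make                      # fine
CPUCOUNT=1 pip3 wheel .   # fails while building plugins/python
```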
fungi | it came up for us when switching from debian buster to bullseye based python container images | 14:03 |
jrosser | yeah, and i think it's when it enters plugins/python that it errors | 14:04 |
jrosser | which may point to python version | 14:04 |
fungi | interestingly, we used python3.7 built on both buster and bullseye in this case | 14:05 |
fungi | switching from the buster 3.7 to bullseye 3.7 images is when we started to run into it | 14:06 |
fungi | but yeah, i have a feeling it's something like a race related to concurrency because whether or not we hit it seems to be influenced by simple things like increasing verbosity | 14:07 |
fungi | classic heisenbug | 14:07 |
fungi | up the logging so you can observe, and you influence the outcome so it stops breaking | 14:07 |
fungi | i'm not finding any examples like yours via a web search, so probably not common | 14:10 |
fungi | their issue tracker is littered with people reporting linker errors on macos | 14:12 |
jrosser | no, i also had a search and didnt find anything | 14:13 |
*** ysandeep is now known as ysandeep|out | 14:13 | |
jrosser | there must be a detail difference between import builtins and __builtins__ in the context of the pip build | 14:13 |
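A minimal illustration of the difference being puzzled over here (this is not the uwsgi code, just documented CPython behaviour: __builtins__ is the builtins module when a file runs as __main__, but a plain dict when the same file is imported, so __builtins__.compile only works in the former case while builtins.compile works in both):

```shell
cat > /tmp/demo.py <<'EOF'
import builtins
print("type(__builtins__):", type(__builtins__))
print("builtins.compile available:", callable(builtins.compile))
EOF
python3 /tmp/demo.py                                                # run as __main__: a module
python3 -c 'import sys; sys.path.insert(0, "/tmp"); import demo'    # imported: a dict
```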
fungi | jrosser: maybe https://github.com/unbit/uwsgi/pull/2373 is a clue? | 14:14 |
jrosser | i've applied that here and there is no difference | 14:16 |
jrosser | i was really surprised they've built their own parallel build system out of python though | 14:16 |
fungi | yeah | 14:17 |
fungi | the comment in https://github.com/agdsn/pycroft/pull/508 does also mention bullseye | 14:18 |
fungi | jrosser: https://github.com/unbit/uwsgi/pull/2362 | 14:19 |
jrosser | oh! | 14:19 |
fungi | though that's with 3.10 | 14:20 |
fungi | web search engines do a poor job of indexing github comments, or so it seems | 14:21 |
jrosser | that has the same effect as switching to builtins.compile | 14:22 |
jrosser | i.e. it's no longer throwing an error | 14:23 |
fungi | yep | 14:23 |
fungi | more just pointing out that it seems to mention the same exception you got | 14:23 |
fungi | and that someone was seeing it at least as far back as 2021-11-02 | 14:24 |
jrosser | could adjust this patch to do the direct build with `pip3 wheel` instead of calling the build script directly | 14:24 |
jrosser | https://review.opendev.org/c/opendev/system-config/+/821631/1/docker/uwsgi-base/Dockerfile | 14:25 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Try building uWSGI directly https://review.opendev.org/c/opendev/system-config/+/821631 | 14:34 |
fungi | jrosser: clarkb: ^ like that? | 14:34 |
jrosser | yes - hopefully that will behave similarly to what i see | 14:35 |
noonedeadpunk | can I ask infra-root to abandon patches for retired repos? like https://review.opendev.org/c/openstack/openstack-ansible-pip_install/+/720133 and https://review.opendev.org/c/openstack/openstack-ansible-os_almanach/+/658585 ? | 15:30 |
fungi | noonedeadpunk: tc members should be able to abandon patches on retired repos | 15:31 |
noonedeadpunk | ok, gotcha | 15:31 |
fungi | the openstack retirement acl grants them rights to make changes to the repos for such purposes | 15:31 |
fungi | i would, but i'm in the middle of several things already | 15:32 |
fungi | and this is one of the reasons the tc has special acl access over retired repos in the openstack/ namespace | 15:32 |
clarkb | jrosser: fungi: thank you for the help debugging that. I've just returned from a number of early morning errands and it decided to snow just to make things more difficult :) | 16:22 |
clarkb | catching up now | 16:22 |
jrosser | o/ hello | 16:23 |
clarkb | fungi: jrosser: so one thing that makes this extra weird is we are trying to rely on our "assemble" script to do bindep and make wheels for us | 16:25 |
clarkb | running pip3 wheel doesn't quite work because you also need to install all the deps and their wheels | 16:26 |
clarkb | considering that upping the verbosity works and we've got a hint as to what is happening, maybe we keep the verbosity and link to https://github.com/unbit/uwsgi/pull/2362 ? As for why older python exhibits this, I bet python backported whatever caused that and since we get up to date python we see it | 16:27 |
clarkb | let me know what y'all think is reasonable and I'll try to update changes to accommodate | 16:28 |
clarkb | I've approved https://review.opendev.org/c/opendev/gerritbot/+/818494 and will monitor that as it goes in | 16:29 |
jrosser | instinct says that you are seeing a failure due to https://github.com/unbit/uwsgi/pull/2362 even though the stderr has gone missing | 16:29 |
jrosser | as it stops in exactly the same point as mine did | 16:30 |
clarkb | jrosser: ya I wouldn't be surprised | 16:30 |
clarkb | and strongly suspect python backported whatever change did that in 3.10 on our images | 16:30 |
jrosser | fwiw i had 3.9.2-3 on a bullseye vm | 16:32 |
clarkb | My thought is to link to that pull request and stick with the verbose flag for now. Or just stick the pull request in there as a note for why we don't have bullseye yet. Except we thought we were already on bullseye with those images so I think hacking it to work is probably best | 16:33 |
fungi | yeah, i agree it's quite likely something happening with more recent point releases of python interpreters of varying minor revs | 16:37 |
opendevreview | Merged opendev/gerritbot master: Update the docker image to run as uid 11000 https://review.opendev.org/c/opendev/gerritbot/+/818494 | 16:37 |
clarkb | fungi: do you think that is a reasonable compromise to just stick with the verbosity for now and land the update? | 16:39 |
clarkb | lodgeit in particular thought it was already on bullseye but since our uwsgi image is publishing bullseye with buster contents that isn't true. And this change will fix that | 16:39 |
fungi | clarkb: yeah, that seems fine to me. if the problem begins to crop up for us again we have more to go on and hopefully more detail captured in the build log | 16:40 |
clarkb | cool I'll make that update as soon as I've eaten something | 16:40 |
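For reference, the workaround being agreed on amounts to keeping pip verbose when it builds the wheel, something along these lines (the exact invocation in the image build differs; this is just the shape of it):

```shell
# -v makes pip pass through the uwsgiconfig.py build output, which both gives
# us detail if it fails again and, oddly, seems to avoid the failure itself.
pip3 wheel -v --wheel-dir /output uwsgi
```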
*** marios is now known as marios|out | 16:46 | |
jrosser | fungi: clarkb this does seem to be a bit self-inflicted by uwsgi, they've re-used a builtin function name `compile` and then had to reference the actual builtin version explicitly | 16:54 |
jrosser | and renaming the function away from the builtin also seems to resolve this trouble https://paste.opendev.org/show/811669/ | 16:55 |
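A toy demonstration of the shadowing problem jrosser describes (not the uwsgi source; it just shows why defining your own compile() forces the code to reach for the real builtin explicitly):

```shell
python3 - <<'EOF'
import builtins

def compile(target):
    # This local compile() shadows the builtin within this module.
    print("pretend-building", target)

# The real builtin is still reachable, but only via the builtins module
# (or, fragilely, via __builtins__ depending on how the file is executed).
code = builtins.compile("1 + 1", "<demo>", "eval")
print(eval(code))
compile("plugins/python")
EOF
```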
fungi | yeah, rolling their own parallel build system, as you observed, is a special kind of nih as well | 17:03 |
jrosser | maybe i make a PR for this as it's really odd what they've done | 17:04 |
opendevreview | Clark Boylan proposed opendev/system-config master: Properly build bullseye uwsgi-base docker images https://review.opendev.org/c/opendev/system-config/+/821339 | 17:06 |
clarkb | jrosser: ++ | 17:06 |
clarkb | also ^ there is the verbosity hack with appropriate details | 17:06 |
fungi | clarkb: your commit message includes a reminder to check with vexxhost, should we get noonedeadpunk to confirm it's fine? | 17:08 |
* noonedeadpunk not working for vexxhost for quite a while now | 17:09 | |
clarkb | fungi: not a bad idea. I'm not sure if they are using this image beyond lodgeit though. If it is just lodgeit then we should be able to confirm it works | 17:09 |
clarkb | https://review.opendev.org/c/opendev/lodgeit/+/821340 via recheck on that running some testing | 17:09 |
fungi | noonedeadpunk: what i meant was i wondered if it was really a reminder to check with you | 17:09 |
fungi | no idea if it was actually vexxhost using those lodgeit images | 17:10 |
clarkb | fungi: well it's for whoever at vexxhost is still using that image if at all | 17:10 |
clarkb | mnaser: are you using opendevorg/uwsgi-base docker images for anything? I think you proposed the image initially. We discovered that our bullseye images are actually buster images and https://review.opendev.org/c/opendev/system-config/+/821339 corrects this | 17:10 |
clarkb | Wanted to warn you if you are using them as this shift could be surprising depending on how you use it | 17:10 |
noonedeadpunk | fungi: yeah they used images for lodgeit one day. no idea if they are now. | 17:13 |
clarkb | fungi: is '*.foobar CNAME foobar' and 'foobar CNAME foobar01' a valid DNS configuration? | 17:19 |
clarkb | I guess we have CI for that so I can just push up the change I'm thinking of | 17:19 |
fungi | yeah, that should be fine | 17:21 |
fungi | it was traditionally considered poor form to point a cname to another cname (or an mx to a cname) simply because it results in more recursion to get to the intended address(es), but these days that's usually not the case because modern nameservers are smart enough to return related records when queried so that you don't have to ask again | 17:22 |
fungi | so when you ask for baz.foobar the response from the resolver is going to have not only the cname to foobar but also the cname from foobar to foobar01 and the address records for foobar01 if it has them | 17:23 |
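Illustrating the chain fungi describes, with the same placeholder names (addresses are from the documentation range; a modern resolver returns all of this in one answer):

```shell
dig +noall +answer baz.foobar.opendev.org
# baz.foobar.opendev.org.  300  IN  CNAME  foobar.opendev.org.
# foobar.opendev.org.      300  IN  CNAME  foobar01.opendev.org.
# foobar01.opendev.org.    300  IN  A      203.0.113.10
```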
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Try to make zuul-preview records more clear https://review.opendev.org/c/opendev/zone-opendev.org/+/821743 | 17:23 |
clarkb | fungi: ^ the context is zuul-preview and me getting all confused trying to figure out what the actual host to ssh into was | 17:24 |
clarkb | this was when I was auditing buster to bullseye image update needs | 17:24 |
clarkb | I had our inventory and was checking things in inventory but zp01 in our inventory wasn't in dns :/ | 17:25 |
clarkb | fungi: I responded to your question at https://review.opendev.org/c/opendev/zone-opendev.org/+/821743 | 17:36 |
*** jpena is now known as jpena|off | 17:37 | |
fungi | clarkb: maybe i wasn't clear with my question... what uses the zuul-preview.opendev.org name? anything? i know what we're using the *.zuul-preview.opendev.org names for | 17:40 |
clarkb | oh I have no idea. But that is all that was in DNS so I assume something | 17:41 |
fungi | i suspect the original server was named zuul-preview and we didn't reevaluate the need for that record when we replaced it with zp01 | 17:41 |
fungi | doesn't hurt to keep the old name around, i guess, i was just pointing out that it's probably cruft | 17:42 |
clarkb | hrm ya maybe check with corvus and mordred and we can shift the *.zuul-preview CNAME to zp01 instead of zuul-preview | 17:43 |
clarkb | mordred: corvus ^ does anything use the zuul-preview.opendev.org name? or should it just be zp01.opendev.org? | 17:43 |
clarkb | wow pytest loads configs out of tox.ini for massive confusion | 17:44 |
fungi | yes, i found that amazing | 17:45 |
fungi | granted, flake8 does as well | 17:45 |
clarkb | flake8 only does it from its flake8 section though right? | 17:45 |
clarkb | at least it is somewhat explicit in that case | 17:45 |
fungi | right | 17:47 |
fungi | the pytest solution is all extra sorts of nuts | 17:48 |
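What they are grumbling about, in concrete terms (contents are illustrative): both tools will read their configuration out of tox.ini alongside tox's own sections, pytest from a [pytest] section and flake8 from a [flake8] section.

```shell
cat tox.ini
# [tox]
# envlist = pep8,py39
#
# [flake8]              # read by flake8
# max-line-length = 99
#
# [pytest]              # read by pytest
# addopts = -v
```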
corvus | clarkb: i think the magic proxy is designed to use zuul-preview, but i'm not 100% sure | 18:02 |
clarkb | corvus: ya I think fungi's question is if we only need the *.zuul-preview.opendev.org for the proxy | 18:03 |
clarkb | but we can be safe and leave both records in place | 18:03 |
corvus | ooh... erm... yeah i'd guess we can remove it | 18:05 |
fungi | right, trying to determine if the bare name (not the subdomain records under it) is cruft | 18:05 |
corvus | still not 100% on that, but i agree, i can't think of a reason we need it | 18:05 |
fungi | but as mentioned, it's fine to keep it | 18:05 |
corvus | i think it was probably just to keep similarity with other hosts, even though nothing should reference it | 18:05 |
fungi | i did some digging in the git histories, and it doesn't seem like there was actually any server before zp01, so my theory that zuul-preview was an older server name is probably wrong | 18:06 |
clarkb | ok if you have a preference to keep or remove let me know and I can update the change | 18:07 |
fungi | i have no preference really, just making sure i understood whether the record was actually used by anything | 18:08 |
fungi | also the extra cname indirection is sort of pointless | 18:08 |
fungi | in fact, even the *.zuul-preview rr doesn't need to be a cname, it could be a/aaaa rrs instead | 18:09 |
fungi | but the cname makes it a little more convenient when we replace the server as it's fewer records to update in the zone | 18:10 |
frickler | corvus: regarding zuul processing multiple branch deletions serially: would it make sense to activate tracing while this is happening? maybe too late now but before the next deletions? | 18:11 |
frickler | (we were discussing it in #openstack-infra before) | 18:11 |
frickler | elodilles is currently doing some cleanups | 18:11 |
corvus | frickler: i found and reproduced the bug, so i shouldn't need any more info | 18:14 |
fungi | elodilles has a bunch of outstanding deletes still to apply, so there's an opportunity yet | 18:14 |
fungi | but doesn't sound necessary | 18:14 |
frickler | so how far are we from deploying the fix? does it make sense to delay outstanding deletions to verify it? | 18:15 |
corvus | no fix yet; many hours or maybe tomorrow | 18:19 |
elodilles | actually i can break the script and run again the deletions tomorrow if that makes sense | 18:23 |
elodilles | i mean, continue the branch deletions | 18:24 |
frickler | elodilles: thx, I was just going to ask: how much would it matter to you to delay the deletions? | 18:24 |
elodilles | it shouldn't be a problem | 18:24 |
elodilles | the branches are eol'd already, and the branch deletions are not run instantly anyway, so one extra day shouldn't cause any problem | 18:26 |
fungi | yeah, so you can either continue to trickle them in, or wait until our next rolling scheduler restart once a fix lands, or both | 18:26 |
elodilles | fungi: will 'rolling scheduler restart' happen tomorrow as well, after the fix has landed, or is it something that is scheduled, like, weekly, or so? | 18:38 |
fungi | elodilles: the fix doesn't exist yet, so hard to predict exactly | 18:41 |
fungi | but yes we should in theory be able to restart things once the fix merges | 18:41 |
fungi | now that we have highly-available schedulers, most restarts should be ~zero impact to zuul's operation (except in non-backward-compatible situations with changes to the state data generally) | 18:42 |
elodilles | ok, i understand that. it really shouldn't be a problem to wait a couple of days so that we can test the zuul fix as well. i just wondered if the restart would happen much later, like a couple of weeks for example, then it might not be worth waiting with the branch deletions | 18:46 |
frickler | relatedly we should also discuss at the meeting about whether and when to do some freeze period over the holidays | 18:46 |
frickler | but I think that wouldn't happen this week, so waiting until tomorrow and then deciding based upon fix progress would be my proposal | 18:47 |
elodilles | frickler: if you say it regarding the branch deletions, then it sounds good to me :) | 18:49 |
frickler | elodilles: yes, pause them until tomorrow and then re-evaluate the status, that's what I meant to say | 18:50 |
elodilles | frickler: ack, thanks, i will do like that :) | 18:52 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] boot test with centos 9-stream https://review.opendev.org/c/openstack/diskimage-builder/+/821772 | 19:07 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 19:34 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 19:34 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 19:37 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 19:37 |
clarkb | looks like gerritbot restarted about an hour ago on the uid image update. And ^ happened more recently so we should be good on that | 20:09 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 20:15 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 20:15 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader https://review.opendev.org/c/openstack/diskimage-builder/+/821772 | 20:25 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader https://review.opendev.org/c/openstack/diskimage-builder/+/821772 | 20:44 |
opendevreview | Merged opendev/system-config master: Block outbound SMTP connections from test jobs https://review.opendev.org/c/opendev/system-config/+/820900 | 20:46 |
fungi | interesting, looks like mailman is failing to start in our deploy tests for lists.k.i (but working on lists.o.o): https://zuul.opendev.org/t/openstack/build/5657946352694851926161489bfec28f/log/lists.katacontainers.io/syslog.txt#1521-1525 | 20:59 |
fungi | i think it may be due to the lack of a "mailman" meta-list in the config | 21:00 |
fungi | the production server has one | 21:01 |
fungi | so if i add it to the inventory, it'll be a no-op in prod | 21:01 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader https://review.opendev.org/c/openstack/diskimage-builder/+/821772 | 21:04 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 21:04 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 21:04 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add "mailman" meta-list to lists.katacontainers.io https://review.opendev.org/c/opendev/system-config/+/821775 | 21:04 |
ianw | is it just me or are there a lot more "second attempts" in zuul atm? | 21:18 |
fungi | i did see a post_failure on a zuul change moments ago where nodejs ran out of heap memory during yarn build | 21:24 |
fungi | no idea if that's typical | 21:24 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader https://review.opendev.org/c/openstack/diskimage-builder/+/821772 | 21:24 |
fungi | clarkb: ianw: should i move the new playbook for 821144 into playbooks/zuul/ instead? i noticed we have other playbooks/test-* files and so am unsure if there's a reason to keep them in one vs the other | 21:32 |
fungi | i guess it's a question of whether the playbook is run by the nested ansible as opposed to zuul's ansible? | 21:32 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 21:35 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 21:35 |
fungi | 820397 seems to have fixed the failures on the subsequent changes, at least | 21:36 |
ianw | fungi: i think most of them are in playbooks/test-blah.yaml | 21:59 |
fungi | yes, i concur | 21:59 |
fungi | i found only two playbooks/zuul/test_blah.yaml counterexamples | 21:59 |
corvus | the zuul fix merged; i think this is an excellent candidate for a rolling restart based on our analysis. i'm going to begin that shortly. | 22:18 |
fungi | thanks! i concur | 22:21 |
corvus | ianw: incidentally -- what's the latest on the load balancer prep -- did that change to generalize load balancer configs merge? so are we ready to make a zuul lb based on that? | 22:21 |
fungi | i'll be around for a while yet too | 22:21 |
clarkb | I'm back and around if I can help | 22:21 |
clarkb | corvus: they did merge | 22:21 |
clarkb | corvus: they were in the period of time where system-config wasn't running so I remember them going in | 22:21 |
corvus | cool, so next step is to make "zuul-lb.opendev.org" in the style of gitea-lb? | 22:21 |
clarkb | fungi: ianw: might be a good idea to move them under zuul/ to avoid confusion but I'm not sure if that affects role lookups and similar | 22:21 |
clarkb | corvus: ya I think so | 22:22 |
ianw | ++ afaik we're good to make new lb nodes | 22:22 |
corvus | running the pull playbook now | 22:23 |
corvus | done; that was not a noop | 22:26 |
corvus | i'd like to tempt fate again and hard-stop the schedulers instead of graceful... thoughts? | 22:27 |
corvus | (last time i did that, we found a bug) | 22:28 |
clarkb | oh I think that was the only way I did it before. I guess it should've been graceful? | 22:28 |
clarkb | the tripleo gate queue isn't short, might be better to try the least impactful thing if we can | 22:28 |
corvus | well, i'm being loose with terminology; by graceful i mean "run 'zuul-scheduler stop' and wait for it to idle before running 'docker-compose down'" | 22:28 |
corvus | by hard i mean "run docker-compose down" | 22:29 |
clarkb | got it | 22:29 |
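The two restart styles corvus is contrasting, roughly (a sketch assuming the usual docker-compose layout on the scheduler hosts; the exact exec invocation may differ):

```shell
# "graceful": ask the scheduler to finish what it is doing, wait for it to
# idle, then take the containers down.
docker-compose exec scheduler zuul-scheduler stop
docker-compose down

# "hard": just take the containers down immediately.
docker-compose down
```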
clarkb | I think last time I ran the stop playbook which probably does the down. Oh but we did a full shutdown then and deleted all data so we wouldn't have been caught by any issues | 22:30 |
clarkb | ya I guess I don't know how to judge so am indifferent :) | 22:30 |
fungi | i'm fine with either experiment | 22:30 |
fungi | whatever is likely to yield the most useful data/find the most bugs | 22:31 |
corvus | i know i can be around for > 2 hours, which is the longest i would expect a latent issue from a hard-restart to show up, so i like the idea of accepting a little more risk now to try to reduce it later | 22:32 |
clarkb | wfm | 22:32 |
corvus | okay. i'll make sure to save a copy of the queues in case something goes wrong | 22:33 |
corvus | zuul02 is stopped | 22:35 |
corvus | zuul01 still seems happy; i think i stopped zuul02 right as it was about to start processing openstack/check | 22:36 |
corvus | i'll restart zuul02 now | 22:37 |
corvus | and start peeling a mandarin | 22:39 |
clarkb | I might actually have some, but my fingers will get all oily and I don't want that on the keyboard :) | 22:39 |
corvus | i'll just throw it in the dishwasher if it's a problem | 22:40 |
corvus | zuul02 is back | 22:46 |
corvus | watching the logs, it's a bit like a car accelerating onto the highway... it handles more and more pipelines until it's fully synced... | 22:47 |
corvus | i'm going to kill zuul01 now | 22:47 |
corvus | starting zuul01 | 22:48 |
corvus | in retrospect, i don't think either of those stops were very disruptive. maybe next time i want to chaos monkey i should sigterm | 22:49 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Try to make zuul-preview records more clear https://review.opendev.org/c/opendev/zone-opendev.org/+/821743 | 22:49 |
clarkb | fungi: ^ I went ahead and updated the zone change to remove the unneeded record. This way we don't go through the same q&a in a year :) | 22:49 |
fungi | fair enough | 22:50 |
corvus | i saw a traceback scroll by... but it was just a 5xx from gerrit@google | 22:50 |
clarkb | ianw: if you have time for https://review.opendev.org/q/hashtag:%2522bullseye-image-update%2522+status:open today that would be great. In particular I'm thinking doing limnoria tomorrow unless you want to watch it would be good so that you can help debug should it have a sad (you did the previous fixup fork so have a good grasp of it I think) | 22:51 |
clarkb | Once the zuul updating is done I'll go ahead and approve the accessbot change since that should be super low impact if it breaks | 22:52 |
ianw | clarkb: will do. just trying to think through some bootloader issues with 9-stream but will look in a bit | 22:52 |
clarkb | ianw: ya no rush. I won't approve any you haven't already +2'd until tomorrow relative to me | 22:52 |
corvus | zuul01 is up | 22:54 |
corvus | zuul-web is next | 22:55 |
clarkb | fungi: fwiw your iptables update seems to have hit a lot of servers and it all seems to be working as expected | 22:55 |
corvus | my heart rate increased at the start of that sentence and decreased at the end | 22:56 |
clarkb | that's interesting though, it looks like the zookeeper and zuul jobs are running concurrently | 22:56 |
clarkb | corvus: sorry :) | 22:56 |
clarkb | I wonder if starting the jobs concurrently is an artifact of the zuul rolling restart | 22:56 |
fungi | clarkb: thanks, i was spot-checking too and don't see any unexpected new rules | 22:57 |
corvus | clarkb: what are the job names? | 22:57 |
corvus | (i have no status page) | 22:58 |
clarkb | corvus: infra-prod-service-zookeeper and infra-prod-service-zuul in deploy for change 820900,9 | 22:58 |
corvus | 2021-12-14 22:54:24,263 ERROR zuul.zk.SemaphoreHandler: Releasing leaked semaphore /zuul/semaphores/openstack/infra-prod-playbook held by a8b0a7c92aa1449b9eade0dbdf7f781e-infra-prod-service-zookeeper | 22:59 |
corvus | that could indicate a problem | 22:59 |
clarkb | in this case it is ok for those to run concurrently so we should be fine this instance | 23:00 |
clarkb | but ya might need to look into that for future rolling restarts if that was the cause | 23:00 |
corvus | i think it was the cause and is a bug | 23:01 |
corvus | we run the semaphore cleanup handler right after startup, and i think we can do that before restoring the pipeline state | 23:02 |
corvus | web is back up; that concludes the rolling restart | 23:03 |
fungi | thanks! | 23:04 |
clarkb | corvus: other than the concurrent builds due to the semaphore release any concerns? or are we looking happy? | 23:04 |
fungi | elodilles: ^ we're all set for more branch deletions the next time you want to try a batch | 23:04 |
corvus | clarkb: so far so good. and that should be a one-time issue; there shouldn't be continuing fallout from the semaphore cleanup. | 23:05 |
corvus | elodilles, fungi: i'd suggest doing at least 3-4 branches all around the same time if you want to confirm the behavior is fixed (it's possible the first 2 may not merge if it starts processing the first event quickly enough, so i'd make sure to submit a minimum of 3) | 23:07 |
corvus | and of course, if it is fixed as we suspect, the more done at once the better | 23:08 |
corvus | since things are looking food now, i'm going to take a short break and will check back in a bit | 23:08 |
fungi | freudian slip! | 23:10 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add firewall behavior assertions to test_bridge https://review.opendev.org/c/opendev/system-config/+/821780 | 23:29 |
corvus | guess it's time to wash my keyboard :) | 23:31 |
clarkb | ha | 23:31 |
opendevreview | Merged opendev/system-config master: Update the accessbot image to bullseye https://review.opendev.org/c/opendev/system-config/+/821328 | 23:40 |
clarkb | hrm the testinfra get_host doesn't seem to check the inventory as much as just give you what you want even if it isn't already there | 23:51 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add firewall behavior assertions to test_bridge https://review.opendev.org/c/opendev/system-config/+/821780 | 23:57 |