Tuesday, 2021-12-14

ianwso weird, because pip verbosity doesn't even seem to affect the verbosity of the uwsgi build bits00:05
clarkbno, but I think it does affect buffering due to python stuff00:07
ianwcould try reducing CPUCOUNT= to see if it's some sort of dependency thing ... but no error at all ...00:07
clarkbthats the other weird thing is it says it failed but doesn't say how or why00:09
fungifeels like maybe it's saying why on stderr and pip is bitbucketing that00:10
ianwyou could also try  python uwsgiconfig.py --build directly maybe?00:11
clarkbhrm ya we could try that. Would have to clone the repo rather than relying on pypi but that seems possible00:11
clarkblet me try that00:11
opendevreviewClark Boylan proposed opendev/system-config master: Try building uWSGI directly  https://review.opendev.org/c/opendev/system-config/+/82163100:23
clarkbthat isn't mergeable because `python uwsgiconfig.py --build` doesn't produce a wheel. But maybe it will give us insight if we can make it fail00:23
clarkbupdated gerritbot is running now. Anyone have a change to update?00:33
clarkbAll three of the uwsgi bullseye builds when built directly seem fine: https://zuul.opendev.org/t/openstack/build/9c401ac728ed44ab87ce77d368245c6d/log/job-output.txt#1767 https://zuul.opendev.org/t/openstack/build/1ca52f43ac834b5e95d047d91918b1da/log/job-output.txt#1764 https://zuul.opendev.org/t/openstack/build/74c2c9ff1b8948c18d7f5841430a7554/log/job-output.txt#180400:35
ianwthe only other thing i can think is run it under strace with a really big -s value00:36
clarkbI think we pushed an event for magnum. trying to verify with logs now (as I'm not in that channel)00:39
clarkbhrm no these are all comment added events which we don't notify for00:40
clarkbaha it logged we sent something to #tacker00:42
clarkbyup its there. I'll link to it as soon as our htmlification runs00:42
clarkbBut I think gerritbot is good00:42
clarkbhttps://meetings.opendev.org/irclogs/%23tacker/%23tacker.2021-12-14.log.html#t2021-12-14T00:39:41 this was from the new bot00:46
clarkbianw: ya or maybe hold a node like fungi suggests and see if it is consistent on specific nodes (then we can try all manner of debugging)00:47
clarkbBut I'm running out of time today. I'll see if I can pick this up tomorrow00:47
clarkbNeed to figure out dinner now00:47
*** rlandy|ruck is now known as rlandy|out00:54
ianwok, i'll keep thinking01:01
opendevreviewIan Wienand proposed opendev/infra-specs master: zuul-credentials : new spec  https://review.opendev.org/c/opendev/infra-specs/+/82164503:58
opendevreviewMerged openstack/project-config master: Add openEuler 20.03 LTS SP2 node  https://review.opendev.org/c/openstack/project-config/+/81872304:56
opendevreviewIan Wienand proposed opendev/base-jobs master: Update Fedora latest nodeset to 35  https://review.opendev.org/c/opendev/base-jobs/+/82164905:00
opendevreviewIan Wienand proposed opendev/base-jobs master: Add 8-stream-arm64 and 9-stream nodesets  https://review.opendev.org/c/opendev/base-jobs/+/82165005:00
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Switch 9-stream testing to use opendev mirrors  https://review.opendev.org/c/openstack/diskimage-builder/+/82165105:05
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Add debian-bullseye-arm64 build test  https://review.opendev.org/c/openstack/diskimage-builder/+/82165205:16
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Add debian-bullseye-arm64 build test  https://review.opendev.org/c/openstack/diskimage-builder/+/82165205:24
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Add 9-stream ARM64 testing  https://review.opendev.org/c/openstack/diskimage-builder/+/82165305:24
opendevreviewIan Wienand proposed openstack/diskimage-builder master: debian-minimal: remove old testing targets  https://review.opendev.org/c/openstack/diskimage-builder/+/82165405:24
*** ysandeep|out is now known as ysandeep05:27
opendevreviewchandan kumar proposed openstack/diskimage-builder master: Revert "Fix BLS based bootloader installation"  https://review.opendev.org/c/openstack/diskimage-builder/+/82152606:14
*** sshnaidm|afk is now known as sshnaidm06:57
*** ysandeep is now known as ysandeep|lunch07:26
opendevreviewMerged openstack/diskimage-builder master: Use OpenDev mirrors for 8-stream CI builds  https://review.opendev.org/c/openstack/diskimage-builder/+/82097807:38
*** ysandeep|lunch is now known as ysandeep08:41
*** ysandeep is now known as ysandeep|afk09:35
opendevreviewLajos Katona proposed opendev/elastic-recheck master: Add query for bug 1954663  https://review.opendev.org/c/opendev/elastic-recheck/+/82168409:56
opendevreviewLajos Katona proposed opendev/elastic-recheck master: Add query for bug 1799790  https://review.opendev.org/c/opendev/elastic-recheck/+/82168410:22
*** ysandeep|afk is now known as ysandeep10:32
*** rlandy is now known as rlandy|ruck11:13
*** jpena|off is now known as jpena11:42
dtantsurhey folks! any issues with pypi mirrors? we see a ton of random errors today.12:15
dtantsursee https://review.opendev.org/c/openstack/ironic/+/821010 for example12:16
ykarelseeing a lot of those in neutron too, too many red in https://zuul.opendev.org/t/openstack/status, seems only some providers are impacted12:19
fungidtantsur: the first one i looked at seems to be complaining about a dependency conflict between openstackdocstheme and constraints over dulwich, are they all like that?12:19
dtantsurdifferent packages12:19
fungiykarel: providers in/around montreal canada again?12:19
ykarelfungi, yes at least i noticed in those12:20
fungiiweb mtl01, vexxhost ya-cmq-1, and ovh bhs1 are all in that area12:20
*** ysandeep is now known as ysandeep|brb12:21
fungier, vexxhost ca-ymq-112:21
fungii'm still pre-coffee12:21
ykarelalso seen in rax-iad12:21
fungidefinitely not that region, that's virginia/washington dc12:22
fungiso whatever's going on with pypi is probably more global12:22
*** outbrito_ is now known as outbrito12:22
fungiis it only pypi-related errors, or problems with other content too?12:23
ykareli noticed only pypi till now12:23
fungipypi's just a caching proxy in our case, so it looks like pypi is probably serving us stale or incomplete indices again12:24
fungiif we can figure out which specific package(s) is/are impacted, we can issue requests to their cdn to refetch from pypi's backend12:25
fungidulwich===0.20.26 seems to probably be one12:26
dtantsurkeystone 20.1.0.dev19 depends on PyJWT>=1.6.1  The user requested (constraint) pyjwt===2.3.012:27
fungiyeah, it's usually whatever constraint it's complaining about that it couldn't find in those cases12:28
dtantsurironic 19.0.1.dev7 depends on pecan!=1.0.2, !=1.0.3, !=1.0.4, !=1.2 and >=1.0.0  The user requested (constraint) pecan===1.4.112:28
dtantsurfungi: looks like dulwich, pyjwt and pecan in our case12:29
ykarelyeap pecan/dulwich/pyjwt12:29
ykareland providers: rax-iad, iweb-mtl01, rax-dfw, 12:31
fungithanks, those are the only packages i've seen in the errors so far as well. i'll dig up my notes on how to ask fastly to refresh those indices12:31
ykareland ovh-bhs112:32
fricklerfungi: curl -XPURGE https://pypi.org/simple , just did that12:34
fungii've done like `curl -XPURGE https://pypi.org/simple/dulwich` with and without a trailing / for each of the identified package names12:35
fungifrom each of the mirrors, in case it matters which endpoint cluster they're sending it to12:36
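The purge described above (one PURGE request per affected package index, with and without a trailing slash) could be scripted roughly as follows. This is a minimal sketch only: the package list is taken from the names mentioned in the discussion, the helper names `purge_urls` and `build_purge_requests` are hypothetical, and the requests are built but deliberately not sent.

```python
from urllib.request import Request

# Package names mentioned in the discussion; adjust as needed.
PACKAGES = ["dulwich", "pyjwt", "pecan", "python-ironic-inspector-client"]


def purge_urls(package, base="https://pypi.org/simple"):
    """Return the simple-index URL for a package, without and with a trailing slash."""
    url = f"{base}/{package}"
    return [url, url + "/"]


def build_purge_requests(packages):
    """Build (but do not send) one PURGE request per URL variant."""
    return [
        Request(url, method="PURGE")
        for package in packages
        for url in purge_urls(package)
    ]


if __name__ == "__main__":
    for request in build_purge_requests(PACKAGES):
        # Only print what would be sent; no network traffic happens here.
        print(request.get_method(), request.full_url)
```

To actually issue a purge, each request would be passed to `urllib.request.urlopen`, run from each mirror host if the endpoint cluster matters as suggested above.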
ykarelalso seen few failures for python-ironic-inspector-client===4.7.0 in provider airship-kna112:37
jrosserwe see the same keystone/pyjwt problem in some OSA jobs12:39
fungii've now done it for python-ironic-inspector-client as well12:39
ykarelack Thanks fungi12:45
fungihere's hoping it helps. if some fastly endpoints simply refresh from the same stale backend again, then we're not any better off12:46
ykarelack lets see how it goes12:47
*** ysandeep|brb is now known as ysandeep12:48
opendevreviewyatin proposed openstack/project-config master: Update Neutron's Grafana as per recent changes  https://review.opendev.org/c/openstack/project-config/+/82170613:30
jrosserclarkb: fungi i may have reproduced the uwsgi build failure https://paste.opendev.org/show/811652/13:43
jrosserhacking the code a bit to import builtins and switching __builtins__.compile for builtins.compile makes it work13:44
jrosserbut that is now the limit of my python understanding13:44
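One possible explanation for the `__builtins__.compile` versus `builtins.compile` difference, offered as a sketch rather than a confirmed diagnosis of the uwsgi failure: in CPython, `__builtins__` is an implementation detail that is the `builtins` module in `__main__` but usually that module's dictionary elsewhere, including in code run through `exec()` with a fresh namespace. Attribute access such as `__builtins__.compile` then raises `AttributeError`, while `import builtins` works everywhere. The helper names below are hypothetical.

```python
import builtins


def builtins_kind_under_exec():
    """Report what __builtins__ is inside code run via exec() with a fresh namespace."""
    namespace = {}
    exec("kind = type(__builtins__).__name__", namespace)
    return namespace["kind"]


def fragile_lookup(builtins_obj):
    """Mimic `__builtins__.compile`; raises AttributeError when given a dict."""
    return builtins_obj.compile


def robust_lookup():
    """Use the importable `builtins` module, which does not depend on context."""
    return builtins.compile


if __name__ == "__main__":
    print("under exec, __builtins__ is a", builtins_kind_under_exec())
    try:
        # vars(builtins) is the dict form that non-__main__ code typically sees.
        fragile_lookup(vars(builtins))
    except AttributeError as exc:
        print("fragile lookup failed:", exc)
    print("robust lookup ok:", robust_lookup() is compile)
```

If the build script is executed by pip or setuptools rather than run directly, it may see the dictionary form, which would be consistent with the Makefile build succeeding while `pip3 wheel .` fails.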
fungioh weird!13:54
fungiif it's that, i wonder why pip is eating the error details13:55
opendevreviewMerged openstack/project-config master: Update Neutron's Grafana as per recent changes  https://review.opendev.org/c/openstack/project-config/+/82170613:56
fungijrosser: and also i wonder why it only fails for us sometimes13:57
jrosserfungi: i'm not sure what is going on tbh - if your build is run through a script or something and stderr gets lost?13:58
fungiit's being built by pip which is downloading the sdist and installing it13:58
jrosserso locally, when i build with the makefile it's completely fine13:58
jrosserbut if i `pip3 wheel .` in the same directory it looks like it fails exactly at the point you saw yesterday13:59
fungiyeah, that seems like more than mere coincidence, i agree13:59
jrosserand for $reasons, messing with how it finds __builtin__.compile fixes it14:00
jrosserreason i had a dig was that we build uwsgi on every OSA bullseye job and never see anything like this14:00
jrosserfungi: with CPUCOUNT=1 the output is not confused with threading, so you can see exactly where it fails https://paste.opendev.org/show/811661/14:03
fungiit came up for us when switching from debian buster to bullseye based python container images14:03
jrosseryeah, and i think it's when it enters plugins/python that errors14:04
jrosserwhich may point to python version14:04
fungiinterestingly, we used python3.7 built on both buster and bullseye in this case14:05
fungiswitching from the buster 3.7 to bullseye 3.7 images is when we started to run into it14:06
fungibut yeah, i have a feeling it's something like a race related to concurrency because whether or not we hit it seems to be influenced by simple things like increasing verbosity14:07
fungiclassic heisenbug14:07
fungiup the logging so you can observe, and you influence the outcome so it stops breaking14:07
fungii'm not finding any examples like yours via a web search, so probably not common14:10
fungitheir issue tracker is littered with people reporting linker errors on macos14:12
jrosserno, i also had a search and didnt find anything14:13
*** ysandeep is now known as ysandeep|out14:13
jrosserthere must be a detail difference between import builtins and __builtins__ in the context of the pip build14:13
fungijrosser: maybe https://github.com/unbit/uwsgi/pull/2373 is a clue?14:14
jrosseri've applied that here and there is no difference14:16
jrosseri was really surprised they've built their own parallel build system out of python though14:16
fungithe comment in https://github.com/agdsn/pycroft/pull/508 does also mention bullseye14:18
fungijrosser: https://github.com/unbit/uwsgi/pull/236214:19
fungithough that's with 3.1014:20
fungiweb search engines do a poor job of indexing github comments, or so it seems14:21
jrosserthat has the same effect as switching to builtins.compile14:22
jrosseri.e. it's no longer throwing an error14:23
fungimore just pointing out that it seems to mention the same exception you got14:23
fungiand that someone was seeing it at least as far back as 2021-11-0214:24
jrossercould adjust this patch to do the direct build with `pip3 wheel` instead of calling the build script directly14:24
opendevreviewJeremy Stanley proposed opendev/system-config master: Try building uWSGI directly  https://review.opendev.org/c/opendev/system-config/+/82163114:34
fungijrosser: clarkb: ^ like that?14:34
jrosseryes - hopefully that will behave similarly to what i see14:35
noonedeadpunkcan I ask infra-root to abandon patches for retired repos? like https://review.opendev.org/c/openstack/openstack-ansible-pip_install/+/720133 and https://review.opendev.org/c/openstack/openstack-ansible-os_almanach/+/658585 ?15:30
funginoonedeadpunk: tc members should be able to abandon patches on retired repos15:31
noonedeadpunkok, gotcha15:31
fungithe openstack retirement acl grants them rights to make changes to the repos for such purposes15:31
fungii would, but i'm in the middle of several things already15:32
fungiand this is one of the reasons the tc has special acl access over retired repos in the openstack/ namespace15:32
clarkbjrosser: fungi: thank you for the help debugging that. I've just returned from a number of early morning errands and it decided to snow just to make things more difficult :)16:22
clarkbcatching up now16:22
jrossero/ hello16:23
clarkbfungi: jrosser: so one thing that makes this extra weird is we are trying to rely on our "assemble" script to do bindep and make wheels for us16:25
clarkbrunning pip3 wheel doesn't quite work because you also need to install all the deps and their wheels16:26
clarkbconsidering that upping the verbosity works and we've got a hint as to what is happening maybe we keep the verbosity and link to https://github.com/unbit/uwsgi/pull/2362 ? As for why older python exhibits this I bet you python backported whatever caused that and since we get up to date python we see it16:27
clarkblet me know what yall think is reasonable and I'll try to update changes to accommodate16:28
clarkbI've approved https://review.opendev.org/c/opendev/gerritbot/+/818494 and will monitor that as it goes in16:29
jrosserinstinct says that you are seeing a failure due to https://github.com/unbit/uwsgi/pull/2362 even though the stderr has gone missing16:29
jrosseras it stops in exactly the same point as mine did16:30
clarkbjrosser: ya I wouldn't be surprised16:30
clarkband strongly suspect python backported whatever change did that in 3.10 on our images16:30
jrosserfwiw i had 3.9.2-3 on a bullseye vm16:32
clarkbMy thought is to link to that pull request and stick with the verbose flag for now. Or just stick the pull request in there as a note for why we don't have bullseye yet. Except we thought we were already on bullseye with those images so I think hacking it to work is probably best16:33
fungiyeah, i agree it's quite likely something happening with more recent point releases of python interpreters of varying minor revs16:37
opendevreviewMerged opendev/gerritbot master: Update the docker image to run as uid 11000  https://review.opendev.org/c/opendev/gerritbot/+/81849416:37
clarkbfungi: do you think that is a reasonable compromise to just stick with the verbosity for now and land the update?16:39
clarkblodgeit in particular thought it was already on bullseye but since our uwsgi image is publishing bullseye with buster contents that isn't true. And this change will fix that16:39
fungiclarkb: yeah, that seems fine to me. if the problem begins to crop up for us again we have more to go on and hopefully more detail captured in the build log16:40
clarkbcool I'll make that update as soon as I've eaten something16:40
*** marios is now known as marios|out16:46
jrosserfungi: clarkb this does seem to be a bit self-inflicted by uwsgi, they've re-used a builtin function name `compile` and then had to reference the actual builtin version explicitly16:54
jrosserand renaming the function away from the builtin also seems to resolve this trouble https://paste.opendev.org/show/811669/16:55
fungiyeah, rolling their own parallel build system, as you observed, is a special kind of nih as well17:03
jrossermaybe i make a PR for this as it's really odd what they've done17:04
opendevreviewClark Boylan proposed opendev/system-config master: Properly build bullseye uwsgi-base docker images  https://review.opendev.org/c/opendev/system-config/+/82133917:06
clarkbjrosser: ++17:06
clarkbalso ^ there is the verbosity hack with appropriate details17:06
fungiclarkb: your commit message includes a reminder to check with vexxhost, should we get noonedeadpunk to confirm it's fine?17:08
* noonedeadpunk not working for vexxhost for quite a while now17:09
clarkbfungi: not a bad idea. I'm not sure if they are using this image beyond lodgeit though. If it is just lodgeit then we should be able to confirm it works17:09
clarkbhttps://review.opendev.org/c/opendev/lodgeit/+/821340 via recheck on that running some testing17:09
funginoonedeadpunk: what i meant was i wondered if it was really a reminder to check with you17:09
fungino idea if it was actually vexxhost using those lodgeit images17:10
clarkbfungi: well its for whoever at vexxhost is still using that image if at all17:10
clarkbmnaser: are you using opendevorg/uwsgi-base docker images for anything? I think you proposed the image initially. We discovered that our bullseye images are actually buster images and https://review.opendev.org/c/opendev/system-config/+/821339 corrects this17:10
clarkbWanted to warn you if you are using them as this shift could be surprising depending on how you use it17:10
noonedeadpunkfungi: yeah they used images for lodgeit one day. no idea if they are now.17:13
clarkbfungi: is '*.foobar CNAME foobar' and 'foobar CNAME foobar01' a valid DNS configuration?17:19
clarkbI guess we have CI for that so I can just push up the change I'm thinking of17:19
fungiyeah, that should be fine17:21
fungiit was traditionally considered poor form to point a cname to another cname (or an mx to a cname) simply because it results in more recursion to get to the intended address(es), but these days that's usually not the case because modern nameservers are smart enough to return related records when queried so that you don't have to ask again17:22
fungiso when you ask for baz.foobar the response from the resolver is going to have not only the cname to foobar but also the cname from foobar to foobar01 and the address records for foobar01 if it has them17:23
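The chained lookup described above can be illustrated with a toy resolver. This is a simplified sketch under stated assumptions: the zone contents, the `192.0.2.10` documentation address, and the function names are hypothetical, and the wildcard handling only covers a single leftmost label, which is far simpler than real DNS wildcard matching.

```python
# Toy zone mirroring the records from the question; the address is a placeholder.
ZONE = {
    "*.foobar": ("CNAME", "foobar"),
    "foobar": ("CNAME", "foobar01"),
    "foobar01": ("A", "192.0.2.10"),
}


def lookup(name, zone):
    """Return the record for name, falling back to a one-label wildcard."""
    if name in zone:
        return zone[name]
    _, _, parent = name.partition(".")
    return zone.get("*." + parent) if parent else None


def resolve(name, zone, max_hops=8):
    """Follow CNAMEs until a non-CNAME record is found, returning every hop."""
    chain = []
    for _ in range(max_hops):
        record = lookup(name, zone)
        if record is None:
            raise LookupError(f"no record for {name}")
        rtype, value = record
        chain.append((name, rtype, value))
        if rtype != "CNAME":
            return chain
        name = value
    raise RuntimeError("CNAME chain too long")


if __name__ == "__main__":
    for hop in resolve("baz.foobar", ZONE):
        print(hop)
```

The returned chain corresponds to the set of related records a resolver can hand back in one response: the wildcard CNAME, the second CNAME, and the final address record.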
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Try to make zuul-preview records more clear  https://review.opendev.org/c/opendev/zone-opendev.org/+/82174317:23
clarkbfungi: ^ the context is zuul-preview and me getting all confused trying to figure out what the actual host to ssh into was17:24
clarkbthis was when I was auditing buster to bullseye image update needs17:24
clarkbI had our inventory and was checking things in inventory but zp01 in our inventory wasn't in dns :/17:25
clarkbfungi: I responded to your question at https://review.opendev.org/c/opendev/zone-opendev.org/+/82174317:36
*** jpena is now known as jpena|off17:37
fungiclarkb: maybe i wasn't clear with my question... what uses the zuul-preview.opendev.org name? anything? i know what we're using the *.zuul-preview.opendev.org names for17:40
clarkboh I have no idea. But that is all that was in DNS so I assume something17:41
fungii suspect the original server was named zuul-preview and we didn't reevaluate the need for that record when we replaced it with zp0117:41
fungidoesn't hurt to keep the old name around, i guess, i was just pointing out that it's probably cruft17:42
clarkbhrm ya maybe check with corvus and mordred and we can shift the *.zuul-preview CNAME to zp01 instead of zuul-preview17:43
clarkbmordred: corvus ^ does anything use the zuul-preview.opendev.org name? or should it just be zp01.opendev.org?17:43
clarkbwow pytest loads configs out of tox.ini for massive confusion17:44
fungiyes, i found that amazing17:45
fungigranted, flake8 does as well17:45
clarkbflake8 only does it from its flake8 section though right?17:45
clarkbat least it is somewhat explicit in that case17:45
fungithe pytest solution is all extra sorts of nuts17:48
corvusclarkb: i think the magic proxy is designed to use zuul-preview, but i'm not 100% sure18:02
clarkbcorvus: ya I think fungi's question is if we only need the *.zuul-preview.opendev.org for the proxy18:03
clarkbbut we can be safe and leave both records in place18:03
corvusooh... erm... yeah i'd guess we can remove it18:05
fungiright, trying to determine if the bare name (not the subdomain records under it) is cruft18:05
corvusstill not 100% on that, but i agree, i can't think of a reason we need it18:05
fungibut as mentioned, it's fine to keep it18:05
corvusi think it was probably just to keep similarity with other hosts, even though nothing should reference it18:05
fungii did some digging in the git histories, and it doesn't seem like there was actually any server before zp01, so my theory that zuul-preview was an older server name is probably wrong18:06
clarkbok if you have a preference to keep or remove let me know and I can update the change18:07
fungii have no preference really, just making sure i understood whether the record was actually used by anything18:08
fungialso the extra cname indirection is sort of pointless18:08
fungiin fact, even the *.zuul-preview rr doesn't need to be a cname, it could be a/aaaa rrs instead18:09
fungibut the cname makes it a little more convenient when we replace the server as it's fewer records to update in the zone18:10
fricklercorvus: regarding zuul processing multiple branch deletions serially: would it make sense to activate tracing while this is happening? maybe too late now but before the next deletions?18:11
frickler(we were discussing it in #openstack-infra before)18:11
fricklerelodilles is currently doing some cleanups18:11
corvusfrickler: i found and reproduced the bug, so i shouldn't need any more info18:14
fungielodilles has a bunch of outstanding deletes still to apply, so there's an opportunity yet18:14
fungibut doesn't sound necessary18:14
fricklerso how far are we from deploying the fix? does it make sense to delay outstanding deletions to verify it?18:15
corvusno fix yet; many hours or maybe tomorrow18:19
elodillesactually i can break the script and run again the deletions tomorrow if that makes sense18:23
elodillesi mean, continue the branch deletions18:24
fricklerelodilles: thx, I was just going to ask: how much would it matter to you to delay the deletions?18:24
elodillesit shouldn't be a problem18:24
elodillesthe branches are eol'd already, and the branch deletions are not run instantly anyway, so one extra day shouldn't cause any problem18:26
fungiyeah, so you can either continue to trickle them in, or wait until our next rolling scheduler restart once a fix lands, or both18:26
elodillesfungi: will 'rolling scheduler restart' happen tomorrow as well, after the fix has landed, or is it something that is scheduled, like, weekly, or so?18:38
fungielodilles: the fix doesn't exist yet, so hard to predict exactly18:41
fungibut yes we should in theory be able to restart things once the fix merges18:41
funginow that we have highly-available schedulers, most restarts should be ~zero impact to zuul's operation (except in non-backward-compatible situations with changes to the state data generally)18:42
elodillesok, i understand that. it really shouldn't be a problem to wait a couple of days so that we can test the zuul fix as well. i just wondered if the restart would happen much later, like in a couple of weeks for example, then it might not be worth waiting with the branch deletions18:46
fricklerrelatedly we should also discuss at the meeting about whether and when to do some freeze period over the holidays18:46
fricklerbut I think that wouldn't happen this week, so waiting until tomorrow and then deciding based upon fix progress would be my proposal18:47
elodillesfrickler: if you say it regarding the branch deletions, then it sounds good to me :)18:49
fricklerelodilles: yes, pause them until tomorrow and then re-evaluate the status, that's what I meant to say18:50
elodillesfrickler: ack, thanks, i will do like that :)18:52
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] boot test with centos 9-stream  https://review.opendev.org/c/openstack/diskimage-builder/+/82177219:07
opendevreviewJeremy Stanley proposed opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114419:34
opendevreviewJeremy Stanley proposed opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039719:34
opendevreviewJeremy Stanley proposed opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114419:37
opendevreviewJeremy Stanley proposed opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039719:37
clarkblooks like gerritbot restarted about an hour ago on the uid image update. And ^ happened more recently so we should be good on that20:09
opendevreviewJeremy Stanley proposed opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114420:15
opendevreviewJeremy Stanley proposed opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039720:15
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader  https://review.opendev.org/c/openstack/diskimage-builder/+/82177220:25
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader  https://review.opendev.org/c/openstack/diskimage-builder/+/82177220:44
opendevreviewMerged opendev/system-config master: Block outbound SMTP connections from test jobs  https://review.opendev.org/c/opendev/system-config/+/82090020:46
fungiinteresting, looks like mailman is failing to start in our deploy tests for lists.k.i (but working on lists.o.o): https://zuul.opendev.org/t/openstack/build/5657946352694851926161489bfec28f/log/lists.katacontainers.io/syslog.txt#1521-152520:59
fungii think it may be due to the lack of a "mailman" meta-list in the config21:00
fungithe production server has one21:01
fungiso if i add it to the inventory, it'll be a no-op in prod21:01
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader  https://review.opendev.org/c/openstack/diskimage-builder/+/82177221:04
opendevreviewJeremy Stanley proposed opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114421:04
opendevreviewJeremy Stanley proposed opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039721:04
opendevreviewJeremy Stanley proposed opendev/system-config master: Add "mailman" meta-list to lists.katacontainers.io  https://review.opendev.org/c/opendev/system-config/+/82177521:04
ianwis it just me or is there a lot more "second attempts" in zuul atm?21:18
fungii did see a post_failure on a zuul change moments ago where nodejs ran out of heap memory during yarn build21:24
fungino idea if that's typical21:24
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader  https://review.opendev.org/c/openstack/diskimage-builder/+/82177221:24
fungiclarkb: ianw: should i move the new playbook for 821144 into playbooks/zuul/ instead? i noticed we have other playbooks/test-* files and so am unsure if there's a reason to keep them in one vs the other21:32
fungii guess it's a question of whether the playbook is run by the nested ansible as opposed to zuul's ansible?21:32
opendevreviewJeremy Stanley proposed opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114421:35
opendevreviewJeremy Stanley proposed opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039721:35
fungi820397 seems to have fixed the failures on the subsequent changes, at least21:36
ianwfungi: i think most of them are in playbooks/test-blah.yaml21:59
fungiyes, i concur21:59
fungii found only two playbooks/zuul/test_blah.yaml counterexamples21:59
corvusthe zuul fix merged; i think this is an excellent candidate for a rolling restart based on our analysis.  i'm going to begin that shortly.22:18
fungithanks! i concur22:21
corvusianw: incidentally -- what's the latest on the load balancer prep -- did that change to generalize load balancer configs merge?  so are we ready to make a zuul lb based on that?22:21
fungii'll be around for a while yet too22:21
clarkbI'm back and around if I can help22:21
clarkbcorvus: they did merge22:21
clarkbcorvus: they were in the period of time where system-config wasn't running so I remember them going in22:21
corvuscool, so next step is to make "zuul-lb.opendev.org" in the style of gitea-lb?22:21
clarkbfungi: ianw: might be a good idea to move them under zuul/ to avoid confusion but I'm not sure if that affects role lookups and similar22:21
clarkbcorvus: ya I think so22:22
ianw++ afaik we're good to make new lb nodes22:22
corvusrunning the pull playbook now22:23
corvusdone; that was not a noop22:26
corvusi'd like to tempt fate again and hard-stop the schedulers instead of graceful... thoughts?22:27
corvus(last time i did that, we found a bug)22:28
clarkboh I think that was the only way I did it before. I guess it should've been graceful22:28
clarkbthe tripleo gate queue isn't short, might be better to try the least impactful thing if we can22:28
corvuswell, i'm being loose with terminology; by graceful i mean "run 'zuul-scheduler stop' and wait for it to idle before running 'docker-compose down'"22:28
corvusby hard i mean "run docker-compose down"22:29
clarkbgot it22:29
clarkbI think last time I ran the stop playbook which probably does the down. Oh but we did a full shutdown then and deleted all data so wouldn't have been caught by any issues22:30
clarkbya I guess I don't know how to judge so am indifferent :)22:30
fungii'm fine with either experiment22:30
fungiwhatever is likely to yield the most useful data/find the most bugs22:31
corvusi know i can be around for > 2 hours,  which is the longest i would expect a latent issue from a hard-restart to show up, so i like the idea of accepting a little more risk now to try to reduce it later22:32
corvusokay.  i'll make sure to save a copy of the queues in case something goes wrong22:33
corvuszuul02 is stopped22:35
corvuszuul01 still seems happy; i think i stopped zuul02 right as it was about to start processing openstack/check22:36
corvusi'll restart zuul02 now22:37
corvusand start peeling a mandarin22:39
clarkbI might actually have some, but my fingers will get all oily and I don't want that on the keyboard :)22:39
corvusi'll just throw it in the dishwasher if it's a problem22:40
corvuszuul02 is back22:46
corvuswatching the logs, it's a bit like a car accelerating onto the highway... it handles more and more pipelines until it's fully synced...22:47
corvusi'm going to kill zuul01 now22:47
corvusstarting zuul0122:48
corvusin retrospect, i don't think either of those stops were very disruptive.  maybe next time i want to chaos monkey i should sigterm22:49
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Try to make zuul-preview records more clear  https://review.opendev.org/c/opendev/zone-opendev.org/+/82174322:49
clarkbfungi: ^ I went ahead and updated the zone change to remove the unneeded record. This way we don't go through the same q&a in a year :)22:49
fungifair enough22:50
corvusi saw a traceback scroll by... but it was just a 5xx from gerrit@google22:50
clarkbianw: if you have time for https://review.opendev.org/q/hashtag:%2522bullseye-image-update%2522+status:open today that would be great. In particular I'm thinking doing limnoria tomorrow unless you want to watch it would be good so that you can help debug should it have a sad (you did the previous fixup fork so have a good grasp of it I think)22:51
clarkbOnce the zuul updating is done I'll go ahead and approve the accessbot change since that should be super low impact if it breaks22:52
ianwclarkb: will do.  just trying to think through some bootloader issues with 9-stream but will look in a bit22:52
clarkbianw: ya no rush. I won't approve any you haven't already +2'd until tomorrow relative to me22:52
corvuszuul01 is up22:54
corvuszuul-web is next22:55
clarkbfungi: fwiw your iptables update seems to have hit a lot of servers and it all seems to be working as expected22:55
corvusmy heart rate increased at the start of that sentence and decreased at the end22:56
clarkbthats interesting though it looks like the zookeeper and zuul jobs are running concurrently22:56
clarkbcorvus: sorry :)22:56
clarkbI wonder if the starting the jobs concurrently is an artifact of the zuul rolling restart22:56
fungiclarkb: thanks, i was spot-checking too and don't see any unexpected new rules22:57
corvusclarkb: what are the job names?22:57
corvus(i have no status page)22:58
clarkbcorvus: infra-prod-service-zookeeper and infra-prod-service-zuul in deploy for change 820900,922:58
corvus2021-12-14 22:54:24,263 ERROR zuul.zk.SemaphoreHandler: Releasing leaked semaphore /zuul/semaphores/openstack/infra-prod-playbook held by a8b0a7c92aa1449b9eade0dbdf7f781e-infra-prod-service-zookeeper22:59
corvusthat could indicate a problem22:59
clarkbin this case it is ok for those to run concurrently so we should be fine this instance23:00
clarkbbut ya might need to look into that for future rolling restarts if that was the cause23:00
corvusi think it was the cause and is a bug23:01
corvuswe run the semaphore cleanup handler right after startup, and i think we can do that before restoring the pipeline state23:02
corvusweb is back up; that concludes the rolling restart23:03
clarkbcorvus: other than the concurrent builds due to the semaphore release any concerns? or are we looking happy?23:04
fungielodilles: ^ we're all set for more branch deletions the next time you want to try a batch23:04
corvusclarkb: so far so good.  and that should be a one-time issue; there shouldn't be continuing fallout from the semaphore cleanup.23:05
corvuselodilles, fungi: i'd suggest doing at least 3-4 branches all around the same time if you want to confirm the behavior is fixed (it's possible the first 2 may not merge if it starts processing the first event quickly enough, so i'd make sure to submit a minimum of 3)23:07
corvusand of course, if it is fixed as we suspect, the more done at once the better23:08
corvussince things are looking food now, i'm going to take a short break and will check back in a bit23:08
fungifreudian slip!23:10
opendevreviewClark Boylan proposed opendev/system-config master: Add firewall behavior assertions to test_bridge  https://review.opendev.org/c/opendev/system-config/+/82178023:29
corvusguess it's time to wash my keyboard :)23:31
opendevreviewMerged opendev/system-config master: Update the accessbot image to bullseye  https://review.opendev.org/c/opendev/system-config/+/82132823:40
clarkbhrm the testinfra get_host doesn't seem to check the inventory as much as just give you what you want even if it isn't already there23:51
opendevreviewClark Boylan proposed opendev/system-config master: Add firewall behavior assertions to test_bridge  https://review.opendev.org/c/opendev/system-config/+/82178023:57
