Wednesday, 2021-07-28

opendevreviewMerged opendev/system-config master: Update docker-compose restart flags
opendevreviewIan Wienand proposed opendev/base-jobs master: Switch Debian Stable to Bullseye
opendevreviewIan Wienand proposed opendev/system-config master: Add Debian Bullseye testing
ianwclarkb: it looks like the playbook import worked, but skipped?01:26
ianwhosts: review in suggests the group matching isn't working as hoped01:28
ianwahh, no actually i think it is working01:39
ianwjust the group rename bit didn't do anything
opendevreviewIan Wienand proposed opendev/system-config master: [for squash] rename testing
opendevreviewTristan Cacqueray proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop
opendevreviewTristan Cacqueray proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop
*** ykarel|away is now known as ykarel04:44
opendevreviewMerged opendev/base-jobs master: Switch fedora-latest to use fedora-34
opendevreviewIan Wienand proposed openstack/project-config master: Remove debian-stretch disk images
opendevreviewIan Wienand proposed openstack/project-config master: Remove Debian stretch image builds
ianwfungi: i don't think we have too far to go with stretch removal; should do it04:53
ianwhowever, i've stacked that on top of mnaser's (mostly) work to remove fedora-32 as it will conflict.  that has a bit further to go.  we need to pull it out of the stable devstack branches it has got into, and fix up master for f3404:53
*** marios is now known as marios|ruck05:49
*** rpittau|afk is now known as rpittau07:02
*** amoralej|off is now known as amoralej07:07
*** ykarel is now known as ykarel|lunch08:34
*** zbr is now known as Guest254409:09
*** sshnaidm|afk is now known as sshnaidm09:45
*** ykarel|lunch is now known as ykarel10:17
opendevreviewTristan Cacqueray proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop
opendevreviewTristan Cacqueray proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop
*** artom_ is now known as artom12:58
*** amoralej is now known as amoralej|lunch13:16
fungi2021-07-27 20:14:22,121 DEBUG zuul.Pipeline.openstack.gate: [e: 4f75f02688cb4529b4c61e35eb7079d8] Adding node request <NodeRequest 199-0014882562 <NodeSet two-centos-8-nodes [<Node None ('primary',):centos-8-stream>, <Node None ('secondary',):centos-8-stream>]>> for job tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates to item <QueueItem c249608bba964523a9592e8defa507a2 for13:32
fungi<Change 0x7f03fd588eb0 openstack/tripleo-heat-templates 800848,2> in gate>13:32
fungii think it's that one (199-0014882562)13:32
opendevreviewTristan Cacqueray proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop
*** amoralej|lunch is now known as amoralej13:38
fungilooks like it's nl0313:39
fungi2021-07-27 20:21:30,856 DEBUG nodepool.driver.NodeRequestHandler[]: [e: 4f75f02688cb4529b4c61e35eb7079d8] [node_request: 199-0014882562] Declining node request because nodes failed13:39
fungiit seems to have never given up the lock on the node request13:39
fungii'll restart the nodepool-launcher container on nl03 to free that and any other stuck node requests it might be responsible for13:39
fungi#status log Restarted the nodepool-launcher container on in order to free stale node request locks13:40
opendevstatusfungi: finished logging13:41
fungilooks like there were some periodic jobs stuck for ~80 hours which are getting new node assignments now13:42
fungithere's also a stuck build in check for 99 hours i'm hoping this will take care of13:43
fungithat seems to have (finally) gotten it13:57
fungiapparently the node request in question eventually got replaced:14:01
fungi2021-07-28 06:18:54,574 DEBUG zuul.nodepool: [e: 4f75f02688cb4529b4c61e35eb7079d8] Resubmitting lost node request <NodeRequest 199-0014882562 <NodeSet two-centos-8-nodes [<Node None ('primary',):centos-8-stream>, <Node None ('secondary',):centos-8-stream>]>>14:01
fungi2021-07-28 06:18:57,225 DEBUG zuul.nodepool: [e: 4f75f02688cb4529b4c61e35eb7079d8] Updating node request <NodeRequest 199-0014886787 <NodeSet two-centos-8-nodes [<Node None ('primary',):centos-8-stream>, <Node None ('secondary',):centos-8-stream>]>>14:01
fungiand then replaced again:14:02
fungi2021-07-28 11:22:48,893 DEBUG zuul.nodepool: [e: 4f75f02688cb4529b4c61e35eb7079d8] Resubmitting lost node request <NodeRequest 199-0014886787 <NodeSet two-centos-8-nodes [<Node None ('primary',):centos-8-stream>, <Node None ('secondary',):centos-8-stream>]>>14:02
fungi2021-07-28 11:22:48,994 DEBUG zuul.nodepool: [e: 4f75f02688cb4529b4c61e35eb7079d8] Updating node request <NodeRequest 199-0014890476 <NodeSet two-centos-8-nodes [<Node None ('primary',):centos-8-stream>, <Node None ('secondary',):centos-8-stream>]>>14:02
fungibut each one ended up getting stuck in inap-mtl01 on nl0314:04
fungisheer luck i guess?14:04
sshnaidmfungi, hi, can you please delete tag 1.5.0-1 from openstack/ansible-collections-openstack? I pushed it by mistake, need 1.5.1-114:34
fungisshnaidm: "Tags can’t be effectively deleted once pushed, so make absolutely certain they’re correct (ideally by locally testing release artifact generation commands and inspecting the results between the tag and push steps above)."14:36
fungitag deletions don't propagate via pull or remote update14:37
sshnaidmfungi, ack14:38
fungiand even if we pushed a deletion to gerrit, zuul executor and merger caches would continue to have that tag as would users pulling updates to their local clones14:38
fungiso generally better to just assume that tags can't be deleted once pushed, and choose alternate means of correcting the problem14:39
sshnaidmfungi, yeah, no problem, I can handle it14:39
fungithis is one of the reasons the openstack/releases repo exists, to allow for collective review of proposed tags14:40
sshnaidmfungi, one more question, I have a pre-release job in repo - when I pushed the pre-release tag 1.5.0-1 it didn't start. Although acc. to semver it was lower than last tag 1.5.0 - so this is the reason? Or something else prevented it to trigger?14:42
fungiit's the -1 on the end, that's not semver nor pep 440 compliant14:45
*** ykarel is now known as ykarel|away14:45
fungithe regular expressions we match on for release and pre-release pipelines can be found here:
fungirelease: ^refs/tags/[0-9]+(\.[0-9]+)*$14:46
fungipre-release: ^refs/tags/[0-9]+(\.[0-9]+)*(a|b|rc)[0-9]+$14:46
sshnaidmit can include hyphen/dash14:48
sshnaidmwell, anyway, now I see why it's not triggered14:49
fungii should clarify, it's not (semver + pep 440) compliant14:50
fungiwe do have a pipeline called "tag" though which will match on any tag value14:50
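
[editor's note] The two patterns quoted above can be checked against candidate refs with a minimal sketch like the one below. The release and pre-release expressions are copied from the log; the "tag" pattern and the helper name matching_pipelines are assumptions made for illustration only, since the log says just that the tag pipeline matches any tag value.

```python
import re

# Patterns quoted by fungi in the log above.
RELEASE = re.compile(r"^refs/tags/[0-9]+(\.[0-9]+)*$")
PRE_RELEASE = re.compile(r"^refs/tags/[0-9]+(\.[0-9]+)*(a|b|rc)[0-9]+$")
# Assumption: the "tag" pipeline is modelled here as "any ref under refs/tags/".
ANY_TAG = re.compile(r"^refs/tags/.+$")


def matching_pipelines(ref):
    """Return the names of the pipelines whose pattern matches ref.

    Hypothetical helper for illustration; not part of Zuul.
    """
    candidates = [
        ("release", RELEASE),
        ("pre-release", PRE_RELEASE),
        ("tag", ANY_TAG),
    ]
    return [name for name, pattern in candidates if pattern.match(ref)]


if __name__ == "__main__":
    for ref in ("refs/tags/1.5.0", "refs/tags/1.5.0-1", "refs/tags/1.28.0a1"):
        print(ref, matching_pipelines(ref))
```

Under these assumptions, a tag such as 1.5.0-1 matches only the catch-all pattern, which is consistent with the pre-release job not being triggered.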
sshnaidmfungi, do you have example of pre-release tag which is compliant both with semver and zuul regexp?14:57
sshnaidmbecause I can't find such15:02
fungisshnaidm: sure, shows a and a which are pep 440 "alpha" prerelease versions for 1.27.1 and 1.28.0 respectively15:03
fungiswitch the "a" to "b" for a beta release, or to "rc" for a release candidate15:03
sshnaidmfungi, I mean a pure semver, not pep44015:03
fungii think only actual release versions are going to match in that case, not prereleases15:04
sshnaidmpython -c "import semver;semver.match('', '<1.0.0')"  - ValueError: is not valid SemVer string15:04
sshnaidmfungi, so maybe we can add semver release option to zuul regexp?15:05
fungiwe should probably discuss if the openstack tenant wants to expand the regular expressions for those pipelines15:05
sshnaidmfungi, yeah, would be great, because ansible galaxy for example can proceed only pure semver tags15:06
fungibut also, like i said, there's also the "tag" pipeline which will match on anything, so if your job is able to tell the difference between prereleases and releases (or doesn't need to care) then you can just run the job in the tag pipeline15:06
sshnaidmso we can't publish there pre-releases with zuul15:06
sshnaidmfungi, yeah, that's a good option I think15:07
sshnaidmthough with "release" and "pre-release" it's more beautiful :)15:07
fungisince openstack predominately publishes its software on pypi, it has opted to use strict pep 440 compliant versions for its prerelease pipeline15:07
sshnaidmI see15:08
fungisince semver prereleases wouldn't be handled properly by pip and related python packaging ecosystem tooling15:08
sshnaidmso on submitter to push a right tag15:09
sshnaidmnot sure zuul should be involved here..15:09
clarkbyou can push pre release via the tag pipeline15:11
clarkbthe tag pipeline allows you to process any tag15:11
sshnaidmbtw, maybe worth to update docs, they don't have "tag" pipeline:
sshnaidmclarkb, yeah, fungi pointed me to it, seems like a good solution for now15:13
fungisshnaidm: yeah, we probably need to do a better job of moving more of the info about the openstack tenant out of our documentation and to somewhere else like the openstack project teams guide15:13
clarkbyesterday we had an alert that the openstackid ssl cert hadn't refreshed but it's happy today15:17
clarkbtoday we have an email saying review's cert didn't update in a timely manner. I wonder if that will self correct like openstackid15:17
fungiyeah, saw it as well, another possibility is we're not restarting apache on cert refresh i guess and the apache workers aren't recycling in a timely fashion after graceful15:18
fungiand it's roulette as to whether you get a worker which is clinging to the old cert15:19
clarkbinfra-prod-letsencrypt does claim to have succeeded. I suppose either review02 is in the emergency file and we skipped it or it is the apache worker issue?15:22
* clarkb wanders off to grab keys and look15:22
clarkbreview02 is not in the emergency file and the le cert file has a timestamp from today. I suspect this is apache worker rollover15:24
fungiyeah, there's a bit of a race there since we don't have much of a grace period between our cert refresh period and where we start sending alerts for the cert getting close to expiry15:28
clarkbfungi: I'm still getting the old cert for review though so the race may not be so tight15:31
yoctozeptomorning infra - any idea whether gerrit can be configured to allow anyone (or at least cores) to set hashtags on changes? (currently only the change owner can)15:33
opendevreviewClark Boylan proposed opendev/system-config master: Test the rename_repos playbook
clarkbianw: fungi ^ I squashed ianw's fixup change in to that and also added a little more testing on the gitea side15:34
clarkbyoctozepto: it can be. Zuul and Ironic have done this. I think we might consider allowing it across the server though?15:34
yoctozeptoclarkb: OOH, GREAT! I am asking for Kolla today but it seems useful enough for everyone15:35
clarkbyou can definitely request in individual project acls for now. And we should probably start thinking about adding it to the all projects global acl but that may need some double checking? (I'm not sure where it would go in that acl today)15:35
yoctozeptook, I can handle the kolla part then, thank you very much; I will have a look at ironic15:36
opendevreviewRadosław Piliszek proposed openstack/project-config master: Allow kolla cores to edit kolla hashtags
*** amoralej is now known as amoralej|off15:57
clarkbyuriys: I'm going to find some breakfast, but then I'm pretty much free to do cloud surgery if today is still good for you. Feel free to ping if/when you want to do that16:16
yuriysSounds good!16:17
mnasergood morning infra-root.  i would appreciate a hold on (loci-keystone).  i'm unable to reproduce this failure locally no matter what i try (uwsgi fails to build inside the container for some reason)16:17
mnaserall i see in logs is "[thread 6][x86_64-linux-gnu-gcc -pthread] core/routing.o" that goes to "ERROR: Failed building wheel for uwsgi"16:19
*** rpittau is now known as rpittau|afk16:21
yoctozeptocould someone from infra merge this old fix ~>
yoctozeptothe missing branch name is sad16:27
clarkbmnaser: and you have installed the documented uwsgi build deps?16:32
mnaserclarkb: yeah.. the error seems that it just died midway through on focal only.  i'm trying to reproduce locally from a perfectly _clean_ ubuntu focal image with no deps .. and it works16:32
mnaseri got latest pip in an empty ubuntu focal with just python3-pip installed and `/tmp/test/bin/pip wheel --find-links /source-wheels --no-deps --wheel-dir / -c uwsgi` and it builds fine16:33
clarkbmnaser: the hold has been set16:34
mnaserrecheck'd, see ya in 15 minutes :P16:34
clarkbyuriys: I'm back now. Just ping when you want to dig into this stuff16:37
*** marios|ruck is now known as marios|out16:38
fungii'm around as well, but at some point i'll need to disappear for a bit to pick up cats from the vet (they're in for their 50k mile maintenance)16:41
yuriysGreat. I've already updated all the CTs.16:42
clarkbwoot passes with the extra group verification and gitea side checking16:42
clarkbianw: fungi: ^ I think that is ready for review now16:43
fungitristanC: you may want to take into account in your matrix gerritbot implementation16:45
mnaserinfra-root: is the ip for loci-keystone job that's failing if i can have snuck in? :) -- there's a hold on the system (see above)16:46
tristanCfungi: yes thank you, i'll make sure the event types are similar16:47
clarkbmnaser: one sec16:47
clarkbmnaser: done16:49
mnaserclarkb: thank you!16:51
opendevreviewMerged opendev/gerritbot master: Add branch to all the remaining event messages
fungiand now it's time to go get the cats out of hock, back in 30-ish17:00
mnaserclarkb: and of course.. it passes on loci-keystone on the run that i have a hold in.  can i just recheck and if it fails it'll catch ?17:08
clarkbmnaser: yes, the hold will stay open until a failure on that change for that job occurs17:09
mnaserok great, time to recheck.. i'll drop the other jobs in the change so im not wasting ci resources17:09
clarkbyoctozepto: fungi: fyi a different gerrit user is asking about the unpack errors from clients after they have upgraded on the gerrit mailing list17:12
clarkbI'm updating them with the info we've collected so far17:12
clarkbI'll share a traceback with them too if the thread originator doesn't respond with that info (it was requested)17:13
yoctozeptoclarkb: thanks, good to know17:14
clarkbI ended up responding with a sanitized traceback17:23
clarkbyoctozepto: fungi: that settles that17:52
fungiokay, back now17:55
fungiclarkb: thanks for spotting that report17:55
opendevreviewJeremy Stanley proposed openstack/project-config master: Make Sunny an operator on Kata IRC channels
*** mtreinish_ is now known as mtreinish18:52
clarkbfungi: +2'd ^ if you want to approve. Finishing up the inmotion cloud stuff now19:15
*** prometheanfire is now known as Guest262119:34
corvusi'd like to restart zuul -- i think it's at a good point for a release, and we should have one as a checkpoint19:34
corvususage looks tame, and i don't see any release stuff going on right now19:34
opendevreviewMerged openstack/project-config master: Make Sunny an operator on Kata IRC channels
fungicorvus: sounds fine to me19:36
fungii'll mention it in the #openstack-release channel too19:37
fungii'm helping a user who's waiting for that queued project-config deploy job, but i expect it's still a while behind the prod hourly jobs, and it'll probably still work after getting reenqueued19:41
corvusrestarting now19:42
corvusfungi: and yeah, i saw that and figured it'll re-enqueue ok19:43
fungiit's a good test at least19:43
corvusmay actually run sooner19:43
fungiyeah, if it gets enqueued ahead of the hourly19:44
fungiwhich it may if we reenqueue with the pipelines in alpha order19:44
corvus2021-07-28 19:44:19,633 WARNING zuul.ConfigLoader: Zuul encountered an error while accessing the repo19:45
corvusinaugust/  The error was:19:45
corvus  ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))19:45
fungior display order, whatever that ordering is in the status page (i guess it's the yaml file order)19:45
fungihuh, that was gerrit?19:45
corvusyeah i think so19:45
corvusthat was during opendev tenant configuration19:46
corvusso let's check that out when it's done.  it's still proceeding19:46
corvusmy guess is we get the tenant but without the inaugust jobs?19:46
corvusit's worth noting there is now a considerable amount of network between the zuul scheduler and gerrit19:47
fungii see it hitting the other inaugust repos in gerrit's ssh log, but not that one19:47
fungiand no mention in the error log19:48
corvusso the connection may not have made it to the vm (but hard to say for sure)19:48
corvus#status log restarted all of zuul on commit 8e4af0ce5e708ec6a8a2bf3a421b299f94704a7e19:50
opendevstatuscorvus: finished logging19:50
fungiif it exceeded the 100 simultaneous ssh connections conntrack overflow rule we have in iptables, that would have rejected the connection with icmp-port-unreachable19:51
fungianother possibility is it exceeded the max ssh connections for a single user set in gerrit itself (i think we have that at 64?)19:52
corvusit should only have 1 ssh connection19:52
corvusor maybe 219:52
fungiyeah, so highly unlikely to be either of those causes19:52
corvuswe ended up with 3 tenants19:53
fungishould full-reconfigure pick up the others?19:54
corvusi think so19:54
corvusi want to see if there are error logs for them19:54
corvusoh now there are others19:56
corvusmay have just not finished the initial load19:57
corvusit does look like it didn't load anything from there19:58
corvusi don't see any other connection errors19:58
corvusi'm running full-reconfigure on the scheduler19:59
corvusfungi: the accessbot job is running now20:00
corvusi'll check back in on the opendev tenant when the full-reconfiguration is complete20:01
fungiso maybe it just missed that one repo20:01
corvusit's correct now.  full-reconfig is still proceeding.20:03
fungicloud network blip?20:07
corvusmy guess yes20:07
fungiit is connecting from rackspace to vexxhost since last week20:07
fungiso crossing more of the wild internet20:07
mordredthe wild internet is dark and full of terrors20:13
fungithis land was green and good... until the crystal cracked20:20
corvus... until they made the prequel20:21
* fungi sighs20:22
fungii gave up on it the moment they showed the skeksis council chamber and there weren't even the right number of them20:23
fungii suppose that's what i get for rewatching the movie right before trying to watch the series, the details they got wrong were still very fresh in my mind20:23
mordredI tried watching ... but after the first episode I just didn't care. and I was excited when I hit play20:24
corvuseither they completely misunderstood the central premise of the original, or i did20:24
mordredmaybe they should have explained how the midichlorians related to the crystal20:25
corvuscould not have hurt20:25
opendevreviewYuriy Shyyan proposed opendev/system-config master: 1st commit
clarkbok we got the cloud back and then did some gerrit things :)20:32
fungivery cool!20:32
clarkbI need to eat lunch then I will make sure the mirror is properly patched and rebooted then we can reenable that cloud20:32
clarkbfungi: I can also push up the opendev/project-config change to track the tapaas rename if you want to work on the etherpad?20:33
clarkbyou indicated you'd like to continue with that in the tc channel yesterday so I guess we keep doing that. Not sure how much stepping on toes that is if the TC doesn't ack it properly first but I'm willing to pretend that will happen for now :)20:33
fungisure, i was going to do the reverse, but that works too. i should be able to find an old maintenance plan i can copy and hack up to suit20:34
clarkbnow I really need to eat lunch20:34
clarkbI've removed from the emergency.yaml file and did manual updates and rebooted it to ensure it was up to date there20:51
clarkbianw: I think we can remove the WIP From and land that to reenable the region20:52
yuriysis rate a manual calculation you guys do20:52
yuriysor is that like sleeps between requests20:53
clarkbyuriys: no, its sort of hand wavy. Some clouds had strong rate limits that we set to that value, and others we just set something reasonable20:53
clarkbyuriys: ya it basically ensures that you don't make more requests in $period than specified. In many cases I think the rtt on the previous request ends up being long enough that we just proceed with the next request without sleeping20:53
clarkbbut some clouds have enforced that pretty strongly so we have the ability to tune it if necessary20:53
yuriysmakes sense, esp on highly trafficked apis20:54
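
[editor's note] The behaviour clarkb describes (space requests out by a configured period, and skip the sleep when the previous request already took long enough) can be sketched roughly as follows. This is an illustrative throttle only, not nodepool's implementation; the class name MinIntervalThrottle and its parameters are hypothetical.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum spacing between request starts."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval      # minimum seconds between request starts
        self._clock = clock           # injectable so the logic can be tested
        self._sleep = sleep
        self._last_start = None

    def wait(self):
        """Block until interval seconds have passed since the last start.

        Returns the number of seconds slept (0.0 if no sleep was needed).
        """
        now = self._clock()
        slept = 0.0
        if self._last_start is not None:
            remaining = self.interval - (now - self._last_start)
            if remaining > 0:
                # The previous request finished quickly, so pause for the rest.
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last_start = now
        return slept
```

A caller would invoke wait() immediately before each API request. When the previous request's round trip already exceeded the interval, wait() returns without sleeping, which matches the "proceed with the next request without sleeping" case described above.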
*** timburke_ is now known as timburke20:55
opendevreviewClark Boylan proposed opendev/project-config master: Add tap-as-a-service rename records
clarkbfungi: ^ ok that is the recording change21:06
fungithanks, i'll record it in the plan21:07
clarkbI cut lunch a bit short to make headway on these things so I'm going to go take another break21:07
fungido that21:07
clarkbyuriys: have you followed the operator pain points discussion on the openstack-discuss mailing list?21:15
clarkbyuriys: I'm thinking we should add the struggles with cells and rabbitmq to that21:15
clarkbyuriys: is the document where they are capturing stuff. Would you prefer I try to capture it and then you can edit or would it be easier for you to add them? I'm thinking we can add orphaned instances in nova cells when rabbitmq breaks to nova and under kolla we can put something about how restarting rabbitmq isn't reliable?21:16
clarkbok taking that break now. I'll work on capturing that on the etherpad in a bit if you haven't gone ahead and done it (I assume your day is ending soon so I can add those items)21:17
yuriysThat's a good idea, I'll read through it, I have not heard of the pain point doc.21:20
clarkbyuriys: is the main thread, kolla started one too21:27
clarkbalright I left two notes around what we found21:32
yuriysawesome ty21:35
fungiinfra-root: i've begun drafting a maintenance plan for friday at with the skeleton steps taken from our process doc. i've added a few implied steps of my own and also left some commentary. whatever we hack up on this can be applied as an update to our official process21:52
fungii can flesh it out with some more specific commands/prose to cut and paste once the process is firmed up21:55
ianwclarkb: thanks!  i've approved that to reenable region21:58
opendevreviewMerged openstack/project-config master: Revert "nodepool: set inmotion cloud to zero"
clarkbI don't think we should make these changes to the rename playbook but I think we might want to consider adding group reindexes if practice shows this is necessary after a group rename and possibly add for each project we rename too22:08
clarkbI suspect its fine as is and we can manually run those should they be an issue and backfill the playbook after22:08
clarkb(I'd avoid adding more reindexing than necessary hence not adding these proactively)22:09
fungiyes, i concur22:12
opendevreviewMonty Taylor proposed opendev/system-config master: Run matrix-eavesdrop on eavesdrop
opendevreviewMonty Taylor proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop
mordredclarkb, corvus, ianw: I fixed ianw's issues with tristanC 's patch regarding file matchers - and also we'd missed a similar thing in corvus' patch. both of those should be GTG now ^^23:37 thanks!23:37
corvusmordred: both lgtm23:39
mordredalso - fwiw - we had a miss on docker/ircbot in the test job - I included that in fixing the corvus patch23:40
ianwfungi: if you have a chance to look @ that switches the stable jobs to bullseye23:42
ianwas you noted, i think they're lightly used23:43
clarkbcorvus: mordred  I +2'd the eavesdrop change as I had previously given that an in depth review but only +1'd the gerritbot one since I haven't had a chance to really dig into it yet but noted the issues with testing that were called out had been addressed. I'm not approving anything as I'm currently trying to make some ramen and won't be able to monitor23:52
