Wednesday, 2023-11-29

opendevreviewMerged opendev/system-config master: [testinfra] Update Ansible canary.
opendevreviewBrian Haley proposed openstack/project-config master: [Neutron-lib] Update Grafana Dashboard
johnsomOk, here is an odd one:
johnsomWorkflowed, gate jobs passed at 9:33 am, but the patch isn't marked as merged02:28
johnsomno cross repo dependencies, no parent patches02:28
johnsom(no un-merged parent patches I should say)02:29
johnsomI'm going to leave it in case you want to look into it. I guess I can try a recheck tomorrow02:30
Clark[m]It's the rebased on behalf of thing tripping a Gerrit bug. Should be fixed after we land the 3.8.3 update change and restart Gerrit03:00
fricklerClark[m]: what's the time frame for that? the current case seems no longer related to a missing email address, so I don't see a workaround like before. and we are talking about an urgent bug fix here, so having it blocked by a gerrit issue is kind of a critical situation04:45
fricklerah, the fix already made it into a new gerrit release and we only need to deploy ?04:51
tkajinam(I assume "the current case" is
fricklertkajinam: yes04:52
tkajinamah, it was already raised by johnsom. sorry I overlooked it04:52
fricklerseeing this issue and also the revert happening for 3.9.0, I'm wondering whether we may have moved too far to the edge with keeping gerrit close to the latest releases04:54
opendevreviewMerged openstack/project-config master: [check-release-approval] Fix distributed leadership handling
chenz_workgot a failed build here due to a key rotation issue on a node:
chenz_workwhat can be done to fix this?09:39
fricklerchenz_work: this sometimes happens due to issues in one of our cloud providers which are out of our control. however, this should happen seldom enough that when you do a recheck, the error is unlikely to repeat09:49
chenz_workfrickler: thanks, that seems to have worked :-)10:06
Clark[m]frickler: the day the fix that needed backporting was identified I reached out to Gerrit and they confirmed. I pushed a backport and it was approved shortly after. I also have a change up to update our gerrit installation which has been in review but needs a good time for us to restart Gerrit after. Landing this change and restarting gerrit will pull in the identified fix14:24
Clark[m]frickler: I don't think we are flying too close to the sun here. Our testing caught the bug with 3.9.0 immediately after it was released (if you ignore the holiday days I didn't bother to work on). The other issue was latent in Gerrit 3.8 since it released 6 months ago. I doubt waiting another 6 months would help much14:25
Clark[m]As for 901871 there may be another issue. I was unable to check logs when it was brought up last night, but the reading seemed to mimic the other identified problem so I associated it with that14:26
fricklerClark[m]: I checked that the gerrit error for 901871 looks the same as earlier, I just don't see where we get the null account from in that case14:34
Clark[m]I'll update the commit message on the 3.8.3 change after the school run this morning and we can plan a time to merge and restart. Seems possible to do that today14:47
fungiyes, i'll be out from ~16:00-17:00 utc but am otherwise available to help with a gerrit restart14:47
Clark[m]I have an appointment at 1900 that will probably last an hour as well14:49
*** blarnath is now known as d34dh0r5315:00
opendevreviewWill Szumski proposed openstack/diskimage-builder master: Adds support for setting build-args when using containerfile
opendevreviewWill Szumski proposed openstack/diskimage-builder master: Adds support for setting build-args when using containerfile
fungiokay, i'm heading out, should be back in roughly an hour15:58
opendevreviewClark Boylan proposed opendev/system-config master: Update Gerrit 3.8 images to 3.8.3
opendevreviewClark Boylan proposed opendev/system-config master: Add gerrit 3.9 image builds
opendevreviewClark Boylan proposed opendev/system-config master: Add gerrit 3.8 to 3.9 upgrade testing
clarkbthat updates the commit message for the 3.8.3 update. frickler does that look better to you?16:12
clarkbgoing back to why I think our current gerrit upgrade cadence is a good thing: We know what it is like to get left behind and stuck on old gerrit releases. We're basically stuck with no bugfixes and no features and an upstream that has no interest in helping us. With us staying about 1 release behind upstream we remain on a supported release, and we can easily have testing set up for16:13
clarkbthe latest release (as we do now). This means we can both contribute our own fixes/needs upstream and get things reviewed by the experts as well as work with them to influence the future direction16:13
clarkbEssentially we're in a good balance point between testing all the new stuff and being too stale to move forward effectively16:14
opendevreviewMerged zuul/zuul-jobs master: tox: Do not concat stdout and stderr in getting siblings
opendevreviewClark Boylan proposed opendev/system-config master: Switch Gerrit replication to using an ed25519 key
clarkbinfra-root ^ that is the gerrit side change for a gitea replication key rotation16:42
clarkbI think it would be good for people to be happy with both changes we can then land the gitea one and ensure it noops. Generate a new key and add it to key then add the key material to host var secrets and land the above change to have gerrit use it16:43
opendevreviewWill Szumski proposed openstack/diskimage-builder master: Adds support for setting build-args when using containerfile
fungiokay, i'm back and not going anywhere else for the day, so available when people want to do gerrit work17:13
clarkbfungi: I think we should go ahead and approve the change as soon as we're happy with it. Then we can do a restart during my afternoon? I don't want to do it before my appointment; that will feel too rushed17:16
clarkb that change to avoid confusion17:16
fungiwfm, yes17:16
fungii had already +2'd the commit message update, but approving now17:17
opendevreviewMerged opendev/system-config master: Update Gerrit 3.8 images to 3.8.3
clarkbI don't think that actually promoted the image arg17:52
clarkbwe can push a Dockerfile noop update and/or have the file matchers match the zuul.d/docker/gerrit.yaml file17:54
clarkbdo we have a preference? I kinda like the idea of updating the file matchers because I think this isn't the first time this has happened to us17:54
opendevreviewJeremy Stanley proposed opendev/system-config master: Add OpenInfra Asia mailing lists
clarkbfungi: ^ thoughts above since you approved the change17:57
fungiclarkb: yes, let's fix the matchers17:57
fungithen we don't have to worry about it again17:58
clarkb++ I'll leave the master image build out of that though since master is something that is less important17:59
opendevreviewClark Boylan proposed opendev/system-config master: Trigger gerrit image promotion when the gerrit image jobs update
clarkbI think ^ should do it18:01
fungiisn't zuul supposed to detect changes to job definitions and run them anyway?18:04
fungii guess it's that we're wanting a different job to run than what is changed?18:04
clarkbyes, but as noted in the commit message I think the issue here is the type of pipeline18:04
fungiyeah, okay18:04
clarkbwe're triggering on change update or ref updated and I forget which can't actually do the thing18:04
clarkbone can iirc but the other can't and we're using the one that can't18:05
clarkbbut if that isn't the case merging that change should also result in no promotion18:05
clarkband we can revert in that case and I guess go back to doing the noop dockerfile update18:05
fungiyep, that was my thinking as well18:06
fungiwe'll see either way18:06
opendevreviewClark Boylan proposed opendev/system-config master: Trigger gerrit image promotion when the gerrit image jobs update
clarkbfungi: ^ I realized we weren't actually testing the new gerrit only building it18:20
clarkbI think that will fix this problem18:20
clarkbI have to pop out in about 10-15 minutes to get to my appointment. Should be back around 20:00 UTC I hope18:32
fungisounds good, i'll be here18:47
opendevreviewMerged opendev/system-config master: Add OpenInfra Asia mailing lists
fungideploy failed for 902176 and it looks like it's because infra-prod-base failed TASK [base/server : Ensure required build packages for non-wheel architectures] due to an apt update failure on mirror01.regionone.osuosl (unexpected file size for
fungii'm able to apt update on that server cleanly now, so it must have been a transient issue19:49
tonybfungi: so we'd re-enqueue that job?19:50
fungitonyb: yep, i just ran this on zuul01: `sudo zuul-client enqueue --tenant=openstack --pipeline=deploy --project=opendev/system-config --change=902176,1`19:51
fungimainly because i'm not sure if the hourly deploy will run infra-prod-service-lists319:52
fungithe daily will, but would rather not wait for that19:52
tonybOkay.  For my own education I'll try to find answer to that question.19:53
fungialso, if i'd waited, then reenqueuing that would have to wait for the hourly buildset to finish, since they share a semaphore19:54
fungipretty easy to check19:55
TheJuliaAny chance you guys can hold the next failure of job "ironic-tempest-ipa-partition-uefi-pxe-grub2" for changeset 901182? Specifically we're seeing boot loader file sizes be stupidly off from what they should be when we try to download them, which is preventing us from validating if httpboot works or not.19:55
tonybAhh that's much easier than my process, but in my own defence I was starting from fundamentals ;P19:56
tonybfungi: So it looks infra-prod-service-lists3 isn't on that pipeline.  Should it be?19:57
fungiTheJulia: done19:57
TheJuliafungi: thanks!19:57
fungifor the record, i ran this on zuul01:19:57
fungisudo zuul-client autohold --tenant=openstack --job=ironic-tempest-ipa-partition-uefi-pxe-grub2 --ref='refs/changes/82/901182/.*' --reason='TheJulia investigating a discrepancy in boot loader file sizes'19:57
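Note on the `--ref` value above: Gerrit stores each change under `refs/changes/NN/CHANGE/PATCHSET`, where `NN` is the last two digits of the change number, which is why change 901182 maps to `refs/changes/82/901182/.*`. A minimal sketch of deriving that pattern follows. The function name `change_ref_pattern` is hypothetical and is not part of zuul-client.

```shell
#!/bin/sh
# change_ref_pattern is a hypothetical helper, not a zuul-client feature.
# It prints a regex that matches every patchset ref of one Gerrit change.
change_ref_pattern() {
    change=$1
    # Gerrit shards change refs by the last two digits of the change number.
    shard=$(printf '%02d' $(( change % 100 )))
    printf 'refs/changes/%s/%s/.*\n' "$shard" "$change"
}

# Prints: refs/changes/82/901182/.*
change_ref_pattern 901182
```

The output can be passed as the `--ref` argument of the autohold command quoted above.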
tonybfungi: perfect.  You are forgiven for doing it faster than I could figure it out ;P19:58
fungitonyb: no, i think we mainly keep the hourly buildset to infra-prod-base and a handful of other quick jobs. what i meant to say is i was pretty sure the listserv deploy job wasn't included there19:59
fungiso didn't want to wait for the daily deploy20:00
tonybAh okay got it20:00
* tonyb has started a 'cheatsheet' doc with those last 2 scenarios20:04
corvusclarkb: the promote pipeline does support auto job change detection, however the tenant reconfiguration with the change completes before adding the item to the pipeline, so there is no change from the running config (and that strict sequencing is deliberate -- we can now rely on that).20:04
tonybI should turn it into an etherpad or something else shareable20:04
corvustonyb: you can also use the web ui.20:04
opendevreviewBrian Haley proposed openstack/project-config master: [Neutron-tempest-plugin] Update Grafana Dashboard
corvusalso, if someone finishes hooking up keycloak to openstackid or whatever then we could start handing that out to other folks.20:05
tonybcorvus: I could that is true20:05
Clark[m]corvus we promote in the deploy pipeline20:06
Clark[m]Not sure if that changes anything20:06
tonybcorvus: I can do that, next year, at this stage.20:06
*** cloudnull0 is now known as cloudnull20:07
corvustonyb: that's awesome.  that would only be a 25% extension in the project timeline to date.  :)20:08
corvusi mean, it depends on where you start counting.  the server is like 4 years old... the idea is like 8 years old.20:09
tonybYou can't rush a good thing20:10
corvusClark: that makes my head explode but also it's the same.  :)20:10
fungitonyb: if you want to review... not set in stone, just where we were hopefully heading20:10
fungii think i wrote some of that spec while sitting in a teahouse in the middle of the garden at the tokyo summit venue20:11
tonybHow wonderful and terrible at the same time20:12
fungifueled by suntori highball in a can from the 7-eleven20:12
corvusi remember significant discussion at sydney when the monorail existed20:12
fungiwhat's that word? monorail!20:13
tonybThere is a monorail car inside the google office in Sydney20:14
fungii've sold monorails to brockway, ogdenville and north haverbrook20:14
corvusfungi: turns out it was more of a shelbyville thing after all20:15
fungievery visit to seattle gives me flashbacks of that episode20:16
Clark[m]corvus: so if I understand correctly the latest change to explicitly trigger on that file updating won't promote this time either but will for the next time?20:17
corvusClark: i think it should because the actual trigger will take effect by then20:17
corvuser actual files trigger20:17
corvus(it's running now because of auto-config change, and when it merges, the tenant reconfig will be strictly sequenced before anything else relying on that config change, so the files matcher will be in effect before it considers adding that change to any pipelines)20:18
* clarkb is finally back. Took a detour to eat some lunch20:33
clarkblooks like I'm still ahead of the image promotion though so I didn't miss anything20:33
opendevreviewMerged opendev/system-config master: Trigger gerrit image promotion when the gerrit image jobs update
clarkbcool promote did run20:42
clarkbso ya I think we can plan to do a gerrit restart whenever we like at this point. Did someone else want to drive it? The main thing is moving aside the replication waiting queue between the docker-compose down and docker-compose up -d20:43
fungii'm happy to do it unless tonyb wants a turn20:52
tonybI think I'd rather watch one more to verify I grok the process.20:56
fungino sweat, i got this20:57
clarkbin that case fungi: I think the process is basically docker-compose pull, docker-compose down, mv /whatever/the/path/is /home/gerrit/tmp/someplace/else, docker-compose up -d20:57
fungistep 0: check the image list on the server, but yeah20:57
tonyblet me know when you have a screen session 20:57
clarkbI've been moving things aside so that I have data to debug further though at this point that is probably less critical20:58
tonybokay cool20:58
fungii've initiated a root screen session on review.o.o20:58
clarkbI'm attached20:58
fungiwe do have a lot of old images we should probably clean up20:58
* tonyb too20:59
fungiopendevorg/gerrit   3.2       780c9efed97e   2 years ago     793MB20:59
fungiis there a good way to prune these? should i care?20:59
clarkbfungi: there is a way. Most of our services autoprune images but we don't do that for gerrit iirc. But even then autoprune won't prune tagged images so the last tag for each gerrit version lives on21:00
clarkbfungi: what I think we should do is worry less about the 3.2/3.3 etc images and instead manually run a prune command to clear out everything older than say 6 months and get the intermediate images that are left behind21:00
clarkband then later we can come through and manually delete the 3.2/3.3/ etc images21:01
clarkbfungi: its docker image prune iirc21:01
clarkbwe have examples in system-config21:01
fungicool, finding21:02
clarkb`docker image prune -f --filter "until=72h"` is what we do automatically on gitea21:02
fungiyeah, just found that in playbooks/roles/gitea/tasks/main.yaml21:02
fungiwe want longer i guess? 180d?21:02
clarkbyes though the comment in gitea says it doesn't take day filters21:03
clarkbin my command history on review02 I ran `sudo docker image prune --filter "until=2022-05-01T00:00:00" -f` once21:03
clarkbI ran that on 2022-10-13T22:52:5121:04
clarkbso I think I was doing 6 months ish then too21:04
fungidocker image prune -f --filter "until=2023-05-29T00:00:00"21:04
clarkbthat looks about right21:05
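Since the discussion above suggests the `until` filter is given an absolute timestamp rather than a day count, the cutoff has to be computed by hand. A minimal sketch of that arithmetic, assuming a plain POSIX shell: the function name `prune_cutoff` is hypothetical, and the day of month is copied unchanged, so a reference date such as the 31st may need manual adjustment. The script only prints the command and does not run docker.

```shell
#!/bin/sh
# prune_cutoff is a hypothetical helper: it takes a reference date
# (YYYY-MM-DD) and a number of months, and prints a timestamp that many
# months earlier. The day of month is copied as-is (not clamped).
prune_cutoff() {
    year=${1%%-*}
    rest=${1#*-}
    month=${rest%%-*}
    day=${rest#*-}
    month=${month#0}   # strip a leading zero so the shell does not read octal
    total=$(( year * 12 + month - 1 - $2 ))
    printf '%04d-%02d-%sT00:00:00\n' $(( total / 12 )) $(( total % 12 + 1 )) "$day"
}

# Dry run: print the prune command instead of executing it.
cutoff=$(prune_cutoff 2023-11-29 6)
printf 'docker image prune -f --filter "until=%s"\n' "$cutoff"
```

For the date of this log and a six-month window, this reproduces the `until=2023-05-29T00:00:00` value used above.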
fungistatus notice The Gerrit service on will be restarting momentarily for a patch update to address a recently observed regression preventing some changes from merging21:06
fungithat look like an okay announcement?21:07
tonybYup good to me21:07
clarkb/home/gerrit2/review_site/data/replication/ref-updates/waiting <- this is the path to move aside21:07
fungithanks, i almost forgot21:07
fungipulling images now21:07
fungiand done21:08
fungi#status notice The Gerrit service on will be restarting momentarily for a patch update to address a recently observed regression preventing some changes from merging21:08
opendevstatusfungi: sending notice21:08
clarkbyou can mv it to /home/gerrit2/tmp/fungi/waiting_20231129 to follow what I've done when I do it21:08
-opendevstatus- NOTICE: The Gerrit service on will be restarting momentarily for a patch update to address a recently observed regression preventing some changes from merging21:08
clarkbnote it doesn't matter for gerrit but it is better to run the docker-compose commands in the docker-compose yaml dir in case there is an env file21:08
fungiyeah, i normally add an env option21:09
fungihaven't we usually moved that dir somewhere in the gerrit homedir to make it atomic?21:09
clarkbfungi: yes /home/gerrit2/tmp/fungi is on the same disk21:09
clarkband that tmp path is excluded from backups21:09
fungidone and containers coming back up21:10
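The restart sequence discussed above (pull, down, move the replication waiting queue aside, up) can be summarized as a dry-run sketch. It only prints the steps and executes nothing. The function name `gerrit_restart_plan` and the compose directory `/path/to/compose/dir` are hypothetical placeholders, since the log does not name that directory; the two queue paths are the ones quoted in the conversation.

```shell
#!/bin/sh
# gerrit_restart_plan is a hypothetical dry-run helper: it prints the
# restart steps in order and does not run any of them.
gerrit_restart_plan() {
    compose_dir=$1   # directory holding the docker-compose yaml (placeholder)
    waiting=$2       # replication waiting queue to move aside
    aside=$3         # destination on the same filesystem, so mv is a rename
    printf 'cd %s\n' "$compose_dir"
    printf 'docker-compose pull\n'
    printf 'docker-compose down\n'
    printf 'mv %s %s\n' "$waiting" "$aside"
    printf 'docker-compose up -d\n'
}

gerrit_restart_plan /path/to/compose/dir \
    /home/gerrit2/review_site/data/replication/ref-updates/waiting \
    /home/gerrit2/tmp/fungi/waiting_20231129
```

Keeping the destination on the same disk matters because the move has to happen in the short window between `down` and `up -d`.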
fungii initially misread your suggestion as /home/fungi21:11
opendevstatusfungi: finished sending notice21:11
clarkbweb ui loads for me21:11
fungiPowered by Gerrit Code Review (3.8.3-2-gb446549261-dirty)21:11
clarkb shows b44... is the cherry pick of the fix I pushed21:12
clarkbjohnsom: frickler tkajinam I think you can try to merge those changes again21:12
fungii was mainly going off the 3.8.3 since that's newer than what we had before21:12
clarkb++ I also wanted to confirm the other bugfix we want was included too21:13
clarkband it appears to be21:13
johnsomOk, the patch I know of is staceyatorres@gmail.com21:13
johnsomSorry, bad paste,
fungioh, we have a test case?21:14
clarkbhrm looks like frickler -W'd it so we need to remove that first21:14
clarkbbut then you can reapprove and see if zuul can merge it 21:14
clarkbjohnsom: are you able to remove the -W as a core reviewer or do we need to do that as gerrit admin?21:14
johnsomIt doesn't look like I can....21:15
johnsomI thought we put ACLs in to allow that. hmmm21:15
fungi grants toggleWipState to designate-core but doesn't allow workflow vote deletion21:17
clarkbI think we need to promote our regular account to admin to do it through the web ui. The only option through ssh is to remove frickler as a reviewer entirely21:18
fungiprobably only admins can delete votes21:18
johnsomYeah, that must be what I was thinking of21:18
clarkbtonyb: do you want to do that?21:18
fungiin time, we should probably scale back the label-Workflow to only 0..+1 and start relying exclusively on wip state21:19
clarkbfungi: I actually prefer the label myself still. It resets on new patchsets is the main reason which maybe we can make a toggle for the other thing21:19
clarkbI'm also not sure it was helpful to set wip on this change. WIP is when you don't want something to merge but we want the opposite here :)21:21
fungialternatively, a core reviewer could rebase the change and then reapprove it, but that will take longer courtesy of the clean-check policy for the openstack tenant21:22
clarkband it may dirty our test case21:22
clarkbwe know it can merge in the current state but a rebase by one of us may make it mergeable under the old code21:22
clarkbanyway fungi tonyb or me who wants to become admin to remove the -W?21:23
* tonyb has been looking up how to do that21:23
clarkbtonyb: you need to use your ssh admin account to add your regular account to the Administrators group so that you can use the web ui to click the (x) next to the vote21:24
clarkbtonyb: so basically the process is `ssh -p29418 gerrit set-members Administrators --add your-regular-account`, refresh the gerrit web page for that change, click the button to remove the vote, then run set-members with --remove instead of --add21:25
tonybAh okay.21:26
clarkbyou can also do the gerrit ssh command `gerrit ls-members Administrators` to see current membership21:26
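The temporary-admin procedure described above can be written down as a dry-run checklist. This sketch only prints the commands. The function name `admin_toggle_plan` and both arguments are hypothetical placeholders: the log does not show the SSH host or account names, so they must be supplied by the operator. The `set-members` and `ls-members` invocations follow the form quoted in the conversation.

```shell
#!/bin/sh
# admin_toggle_plan is a hypothetical dry-run helper: it prints the steps
# for briefly granting a regular account Gerrit admin rights, then
# revoking them. Nothing is executed.
admin_toggle_plan() {
    ssh_target=$1   # admin SSH account and host (placeholder, not in the log)
    account=$2      # regular account to promote temporarily (placeholder)
    printf 'ssh -p 29418 %s gerrit set-members Administrators --add %s\n' \
        "$ssh_target" "$account"
    printf '# refresh the change page and delete the vote in the web UI\n'
    printf 'ssh -p 29418 %s gerrit set-members Administrators --remove %s\n' \
        "$ssh_target" "$account"
    printf 'ssh -p 29418 %s gerrit ls-members Administrators\n' "$ssh_target"
}

admin_toggle_plan ADMIN_USER@GERRIT_HOST REGULAR_ACCOUNT
```

Ending with `ls-members` gives a quick check that the regular account is no longer in the Administrators group.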
opendevreviewBrian Rosmaita proposed openstack/project-config master: Implement openstack-unmaintained-core group
tonybOkay frickler's -W has been removed21:29
clarkbjohnsom: if you remove your +W and then add it back again that should trigger zuul to load it straight into the gate21:29
clarkband then we wait for jobs to complete and hopefully succeed and see if zuul can merge the change21:30
tonyband my account is no longer an Admin21:30
clarkbtonyb: the web ui has updated a couple times since I had to do that myself. Was the (x) fairly obvious and easy to find?21:30
clarkbjohnsom: thanks! I see it in the gate now as expected21:31
tonybclarkb: There was a little 'trashcan' icon next to the vote.  I had to do a little clicking to get it to show up but generally it wasn't too puzzling.21:32
clarkbcool good to know it is a trashcan now and not an 'x'21:34
clarkbtonyb: there is probably a way to do it via the rest api as well. But figuring out the values to plugin to get the right vote will probably take much longer than simply using the ui21:38
tonybclarkb: It looks like we could do: ssh -p 29418 "gerrit set-reviewers --project foo/bar --remove $USER_WITH_VOTE 123456" based on our docs?21:41
clarkbtonyb: I think that removes all of that user's votes which is fine for removing a -2 Verified because zuul only votes on verified. But for humans it's more impactful?21:41
clarkbiirc the docs are talking specifically about removing a zuul -2 to make something mergeable21:41
clarkbthat would've worked here but then we would lose frickler's +2 code review21:42
tonybAhhh got it21:43
clarkb this change merged after the update. I have no reason to think it is a test case for the issue though. More just "merges generally work"21:48
tonybAny ideas on (Change 901468,6) failure?21:52
fungistill better than "more arbitrary changes are now refusing to merge" i guess21:52
clarkbtonyb: that's weird because we just built gerrit 3.8 like 4 times (we only wanted it to be built once)21:53
clarkbtonyb: it looks like the jvm itself crashed21:55
clarkbin libc21:56
clarkbI do notice that we seem to be building with java 11 but should build with 17 now21:56
clarkbhrm the job does do ensure-java against version 1721:57
clarkb and java 17 is our default java. Maybe ensure-bazelisk is running under java 11 anyway for some reason21:59
clarkbhrm no as far as I can tell bazelisk uses the default java by default. Maybe the build target needs a specific java version and it switches over (the joys of build tools that do whatever they want)22:00
clarkbah yup the command has external/remotejdk11_linux/bin/java in it so I think that is what has happened there.22:00
clarkbI think we might have to treat this as "cosmic rays" type of situation22:01
clarkbin reality it was maybe some sort of race in libc that created a fault? iirc those are becoming popular these days22:01
fungiwow, nice22:05
fungi"Modifications of environment variables are not allowed in multi-threaded programs. -- the glibc manual"22:06
clarkbI'm trying to think what sorts of stuff the jvm will go through libc for. It can't be much22:08
fungi"If you're on Linux, and you're using glibc, you're probably a passenger on this boat. Try not to drill any more holes."22:19
johnsomThe patch merged22:24
clarkbfungi: next on my ops todo list is and that first one I think is safe to merge whenever (it should noop because the key already exists in gitea)22:37
clarkbonce that first one is in I'll generate a new key and do a followup change to set the new pubkey in gitea and add the key to hostvars and we can merge the second if we are happy with it22:37
clarkband that should be all we need to do to rotate keys between gerrit and gitea other than cleaning up the old key22:38
fungiTheJulia: ssh root@
tonybFor creating replacement infra servers it's 0) test new config; 1) add server; 2) add DNS; 3) add to inventory  etc .  I assume cleanup is the reverse remove from inventory, remove from DNS, remove old server #profit22:58
clarkbtonyb: pretty much. The important thing is to remove from inventory first to avoid errors when we try to renew LE stuff on the server and DNS is gone22:59
clarkbbut I think you can remove dns after you delete the server if that lines up better22:59
fungiorder on cleanup is somewhat loose, but basically remove from dns and inventory (order irrelevant), possibly create an archival image of the server (if there's a chance we might need to recreate it), then delete it in the provider23:00
tonybGot it.23:43
tonybI'll try spinning up new jvb and meetpad servers tomorrow23:45
opendevreviewMerged opendev/system-config master: Add ssh key rotation to gitea ssh key management
opendevreviewBrian Rosmaita proposed openstack/project-config master: Implement openstack-unmaintained-core group
