Friday, 2026-01-09

opendevreviewJames E. Blair proposed opendev/base-jobs master: Update OVH log upload creds  https://review.opendev.org/c/opendev/base-jobs/+/97281303:38
opendevreviewMerged opendev/base-jobs master: Update OVH log upload creds  https://review.opendev.org/c/opendev/base-jobs/+/97281303:41
opendevreviewJames E. Blair proposed opendev/base-jobs master: Update Rax and Vexxhost log upload creds  https://review.opendev.org/c/opendev/base-jobs/+/97281403:54
opendevreviewMerged opendev/base-jobs master: Update Rax and Vexxhost log upload creds  https://review.opendev.org/c/opendev/base-jobs/+/97281403:54
opendevreviewJames E. Blair proposed opendev/system-config master: Remove opendaylight Zuul connection  https://review.opendev.org/c/opendev/system-config/+/97281504:00
opendevreviewJames E. Blair proposed opendev/base-jobs master: Fix OVH upload creds  https://review.opendev.org/c/opendev/base-jobs/+/97281604:19
opendevreviewMerged opendev/base-jobs master: Fix OVH upload creds  https://review.opendev.org/c/opendev/base-jobs/+/97281604:19
opendevreviewJames E. Blair proposed opendev/base-jobs master: Update pypi api token  https://review.opendev.org/c/opendev/base-jobs/+/97281704:25
opendevreviewJames E. Blair proposed opendev/base-jobs master: Update keytabs  https://review.opendev.org/c/opendev/base-jobs/+/97281904:38
opendevreviewMerged opendev/base-jobs master: Update pypi api token  https://review.opendev.org/c/opendev/base-jobs/+/97281704:39
opendevreviewMerged opendev/base-jobs master: Update keytabs  https://review.opendev.org/c/opendev/base-jobs/+/97281904:39
opendevreviewJames E. Blair proposed opendev/base-jobs master: Update intermediate registry password  https://review.opendev.org/c/opendev/base-jobs/+/97282104:46
opendevreviewMerged opendev/base-jobs master: Update intermediate registry password  https://review.opendev.org/c/opendev/base-jobs/+/97282104:57
clarkbcorvus: how about this: #status notice Zuul is up and running again and should report back to Gerrit successfully. Changes can also merge if there no other failures, but we expect that publication jobs like docs and tarballs updates will not work currently.04:59
clarkb#status notice Zuul is up and running again and should report back to Gerrit successfully. Changes can also merge if there are no other failures, but we expect that publication jobs like docs and tarballs updates will not work currently.05:00
opendevstatusclarkb: sending notice05:00
-opendevstatus- NOTICE: Zuul is up and running again and should report back to Gerrit successfully. Changes can also merge if there are no other failures, but we expect that publication jobs like docs and tarballs updates will not work currently.05:01
clarkbwe'll be back in the morning to finish the cleanup05:01
*** ykarel_ is now known as ykarel05:36
ykarelHi, is the RETRY_LIMIT issue a known one? Most jobs are failing with it https://zuul.openstack.org/builds05:38
ykarelstarted around ~ 03:30 UTC05:39
ykarellogs are not available, but checking the job console I just see05:53
ykarel2026-01-09 05:52:44.031501 | Preparing playbooks05:53
ykarel--- END OF STREAM ---05:53
fricklerykarel: I need to check details, but I expect most of this to be fallout from https://lists.zuul-ci.org/archives/list/zuul-announce@lists.zuul-ci.org/message/2WHXPBPRLFF6ZNSSZ3AKOBBBHMWY4YNR/ . I wouldn't bet on things getting better until much later today07:16
ykarelfrickler, ok thx07:19
* tkajinam was about to report the same problem07:54
tkajinamit's weird that the problem is seen only in devstack jobs so far.07:55
frickleryes, but not all of them. like https://zuul.opendev.org/t/openstack/build/497ed602e852445eb96960aa0494e5a6 worked (grenade on master)08:29
tkajinamI wonder if anyone can send a notification to channels to ask people to avoid recheck ?08:44
opendevreviewDr. Jens Harbott proposed openstack/project-config master: Replace ubuntu fips secret with dummy value  https://review.opendev.org/c/openstack/project-config/+/97282808:52
fricklerI hope ^^ will fix most of the issues08:52
fricklerargl, can't update the secret while others are not yet refreshed. guess I'll need to dummy out all of those to make progress _sigh_08:55
opendevreviewDr. Jens Harbott proposed openstack/project-config master: Replace all secrets with dummy value  https://review.opendev.org/c/openstack/project-config/+/97282809:05
opendevreviewMerged openstack/project-config master: Replace all secrets with dummy value  https://review.opendev.org/c/openstack/project-config/+/97282809:33
fricklerykarel: tkajinam: ^^ that should unblock check jobs. I would suggest to avoid merging stuff currently though, since we can not yet publish anything, neither docs nor code09:38
ykarelthx frickler 09:44
ykarelso next step will be to generate secret, right? who will be taking care of that?09:45
fricklerykarel: the afs secrets need to be done by some infra-root, not sure if I'll get to that today, so likely will be some more hours until others are awake again09:48
ykarelok thx09:48
tkajinamfrickler, thank you !09:55
opendevreviewDr. Jens Harbott proposed openstack/project-config master: Restore AFS secrets  https://review.opendev.org/c/openstack/project-config/+/97284711:38
tkajinamfrickler, just fyi. I suspect that the promote jobs are affected by that change. hopefully that revert fixes it. https://zuul.opendev.org/t/openstack/build/8f27264e80ce451d88e857e308afc88012:23
gthiemon1ehey folks, just in case, I just had a build that is not loaded in zuul: https://zuul.opendev.org/t/openstack/build/dad8442fe2844a6e8ceb0d1b33f7b8f2 ("Something went wrong"), it seems that zuul is looking for json-output.json.gz file, but the file doesn't exist in https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_dad/openstack/dad8442fe2844a6e8ceb0d1b33f7b8f2/12:25
fricklertkajinam: yes, that's working as expected, kind of. the above change should fix that, but I need to wait for reviews. the promote jobs didn't work with the old secrets anymore, either, that's why I suggested not to merge anything for now if possible12:26
opendevreviewDr. Jens Harbott proposed opendev/system-config master: Update zuul ssh keys  https://review.opendev.org/c/opendev/system-config/+/97289714:29
clarkbok bit later of a start than I had hoped for, but here now. Is there anything I should be looking at or help debug before I start working through the todo list?16:17
clarkbfrickler: for https://review.opendev.org/c/openstack/project-config/+/972847/1/zuul.d/secrets.yaml did you generate new keytabs or did you use the one I generated last night (to replace the old value) ? I don't think it matters too much either way I'm more curious16:22
clarkbinfra-root I left a review on 972897 with some notes on sequencing concerns16:27
corvusheh that "zuul-launcher" principal name is really going to confuse us in the future; at least there's a comment.16:28
corvus897 is WIP status16:29
clarkbI also wanted to note that last night my base-test test change recheck to check rax log uploads had one post failure. I didn't end up looking at the logs but I suspect that rax swift may still have been somewhat flaky at the time? So I think it was a good call to only use ovh for now16:29
clarkbcorvus: frickler ya maybe we can remove the WIP and then just have an understanding amongst infra-root that when we land that one we'll need to pay close attention and manually update authorized_keys on bridge?16:30
clarkbmy comment on that change tried to capture what I think are some of the main concerns16:30
clarkbgthiemon1e: thank you for the report. Log uploads are generally working, and when they fail zuul will typically render a build status page just without logs. So this is probably something other than a typical log upload failure16:31
clarkbgthiemon1e: that said there is a job-output.json file (not job-output.json.gz) and zuul should check both (there are compression differences amongst swift implementations iirc so we do both)16:32
clarkbthe json itself appears to be valid based on my browser being able to render it (but maybe firefox is generous in its interpretation)16:35
clarkbin positive news it looks like some of these secrets might be able to be cleaned up like openstackzuul_docker_login in openstack/project-config16:37
clarkbbut that is probably the least urgent thing right now16:37
clarkbI can work on updating system-config secrets shortly16:45
timburkehas anyone already reported that github mirroring seems to have broken? https://zuul.opendev.org/t/openstack/builds?job_name=openstack-upload-github-mirror16:49
timburkethere was a stretch where jobs failed to report results, but now they all die like 'Load key "/var/lib/zuul/.../ansible.zqir32ci": error in libcrypto'16:50
clarkbtimburke: both are known problems due to https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/WBBLBI6ZS6FA6Q5ZMH4C2MWPL3WG3H24/ (highly recommend subscribing to that list if you aren't already)16:50
clarkbtimburke: in the case of failing to report results and not having logs we've corrected that and things run now. But now projects (like openstack) need to update secrets for things like github mirroring16:51
timburkegot it, thanks -- i was just working with repos before checking email ;-)16:52
clarkbI think fungi mentioned that he planned to dig into those underlying secrets for openstack today16:55
corvusinfra-root: do we still care about zuul status backups?  i ask because i'm making a change where i need to move them, but wondering if i should just delete them instead.16:58
clarkbcorvus: I can't recall the last time we needed them. Zuul doing rolling upgrades largely negates the need I think16:58
clarkbI would be ok with dropping them16:58
corvus(the thing where we dump the status page json to a timestamped file)16:58
corvusk.  i'll propose that16:59
clarkbcorvus: I'm attempting to use `sudo docker run --rm -i --pull always quay.io/zuul-ci/zuul-client --zuul-url https://zuul.opendev.org encrypt --tenant openstack --project opendev/system-config --secret-name system-config-opendevmirror --field-name password` and it seems like I have to enter ^d twice to get it to write out the secret. Do you know if that is expected? I'm slightly17:04
clarkbworried that it will encode a literal ^d in the secret17:04
corvusclarkb: that is expected17:04
corvusthe docs even say to do that17:04
corvusso if you were typing the password "foobar" in, then you would type "foobar^D^D".  that will encode the exact string "foobar".17:05
clarkbthe help output says to enter ^d but doesn't say it needs to be done twice. I'm more concerned about it needing to go twice. Thanks for confirming. Maybe just need to update the help output17:05
corvusthat's standard shell behavior -- ^D normally only does EOF after newline. ^D^D is needed if there was no newline.17:06
clarkboh! til17:06
clarkbcool and hopefully that command is useful to anyone else trying to encrypt things (you'll need to change the parameters for your project and secret names but should be 80% there)17:06
corvusyeah, that's one of those things no one learns until well after they think they have learned everything about unix.  :)17:07
corvusyou think, "surely i must have hit ^D at the end of a line before" but no, no one does!17:07
corvusyou can even see the behavior with cat (a little better -- because you can see that the first one does a flush and the second one does an eof)17:08
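[Editor's sketch, not part of the channel discussion] The ^D behaviour corvus describes can be reproduced without a keyboard. The following is a minimal sketch, assuming a Linux-style pseudo-terminal that starts in canonical (line-buffered) mode with ^D (byte 0x04) as the EOF character; the helper name `reads_seen_by_program` is hypothetical and chosen only for illustration. It writes "keystrokes" to the controlling side of a pty and records what each successive read on the terminal side receives.

```python
import os
import pty

CTRL_D = b"\x04"  # the byte a terminal sends for ^D (assumed to be the EOF character)


def reads_seen_by_program(typed: bytes, reads: int = 2) -> list[bytes]:
    """Write `typed` to a pseudo-terminal as if it were keystrokes and
    return what each successive read() on the terminal side receives."""
    # Assumption: a freshly opened pty is in canonical (line) mode.
    master_fd, slave_fd = pty.openpty()
    try:
        os.write(master_fd, typed)
        # An empty bytes object from read() is how a program sees EOF.
        return [os.read(slave_fd, 1024) for _ in range(reads)]
    finally:
        os.close(master_fd)
        os.close(slave_fd)


if __name__ == "__main__":
    # No trailing newline: the first ^D only flushes "foobar" to the reader,
    # the second ^D arrives on an empty line and is seen as EOF.
    print(reads_seen_by_program(b"foobar" + CTRL_D + CTRL_D))
    # Trailing newline: the line is already delivered, so one ^D is EOF.
    print(reads_seen_by_program(b"foobar\n" + CTRL_D))
```

In both cases the ^D bytes themselves never reach the reading program, which matches the statement above that typing "foobar^D^D" encodes exactly "foobar".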
clarkblooks like someone has the credentials file in use/locked. Any chance you can let me know when I can look at it?17:11
corvusoops done.17:12
clarkbthank you17:12
opendevreviewJames E. Blair proposed opendev/system-config master: Move zuul scheduler backups to dedicated dir  https://review.opendev.org/c/opendev/system-config/+/97293017:13
opendevreviewJames E. Blair proposed opendev/system-config master: Remove /var/lib/zuul/times backup exclusion  https://review.opendev.org/c/opendev/system-config/+/97293117:13
opendevreviewJames E. Blair proposed opendev/system-config master: Remove zuul status backups  https://review.opendev.org/c/opendev/system-config/+/97293217:13
opendevreviewJames E. Blair proposed opendev/system-config master: Remove /var/lib/zuul/backup exclusion  https://review.opendev.org/c/opendev/system-config/+/97293317:13
corvusokay that change series addresses the thing that caused us to rotate the zuul secrets.17:13
corvusin the end, technically, we don't actually need to stop running the status backups, so i put that later in the series.  but i still think we should do it, because i think that last change is important too.17:14
corvusbecause if i'm reading this right, we weren't actually backing up our keystore to borg.17:14
opendevreviewClark Boylan proposed opendev/system-config master: Update zuul secrets for docker hub and quay.io  https://review.opendev.org/c/opendev/system-config/+/97293417:22
clarkbok I think that covers system-config. While this is fresh in my mind I'm going to hunt down the other projects we build images with that upload container images17:23
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Update image upload secret  https://review.opendev.org/c/opendev/zuul-providers/+/97293617:30
opendevreviewClark Boylan proposed opendev/lodgeit master: Rotate quay.io upload secret  https://review.opendev.org/c/opendev/lodgeit/+/97293717:30
opendevreviewClark Boylan proposed opendev/gerritbot master: Rotate quay.io upload secret  https://review.opendev.org/c/opendev/gerritbot/+/97294117:33
opendevreviewClark Boylan proposed opendev/statusbot master: Rotate quay.io upload secret  https://review.opendev.org/c/opendev/statusbot/+/97294217:38
opendevreviewClark Boylan proposed opendev/grafyaml master: Rotate quay.io upload secret  https://review.opendev.org/c/opendev/grafyaml/+/97294317:40
opendevreviewMerged openstack/project-config master: Restore AFS secrets  https://review.opendev.org/c/openstack/project-config/+/97284717:44
clarkbClark Boylan proposed openstack/ptgbot master: Rotate quay.io upload secret  https://review.opendev.org/c/openstack/ptgbot/+/972945 <- this was the last one I could find17:44
clarkbas far as getting system-config back up and running it doesn't look like updating secrets really tests much (probably a good thing in this particular case) so we probably want to go with https://review.opendev.org/c/opendev/system-config/+/972934 to update secrets, then frickler's change to reenable ssh access/management, then corvus' cleanups around zuul backups? I'm going to review17:46
clarkbcorvus' changes next17:46
corvusi think i've at least +2d everything i've seen17:48
corvusclarkb: https://review.opendev.org/972936 could use a +317:49
clarkbcorvus: I left a question on https://review.opendev.org/c/opendev/system-config/+/972930 looking at 972936 next17:49
fungitimburke: clarkb: yeah, github replication is next on my list after release signing17:50
clarkb972936 is approved17:50
opendevreviewJames E. Blair proposed opendev/system-config master: Move zuul scheduler backups to dedicated dir  https://review.opendev.org/c/opendev/system-config/+/97293017:50
opendevreviewJames E. Blair proposed opendev/system-config master: Remove /var/lib/zuul/times backup exclusion  https://review.opendev.org/c/opendev/system-config/+/97293117:50
opendevreviewJames E. Blair proposed opendev/system-config master: Remove zuul status backups  https://review.opendev.org/c/opendev/system-config/+/97293217:50
opendevreviewJames E. Blair proposed opendev/system-config master: Remove /var/lib/zuul/backup exclusion  https://review.opendev.org/c/opendev/system-config/+/97293317:50
corvusclarkb: thx, fixed17:51
opendevreviewMerged opendev/zuul-providers master: Update image upload secret  https://review.opendev.org/c/opendev/zuul-providers/+/97293617:51
clarkbcorvus: related to the next change in that stack I wonder if we need to put those exclusion rules in the zuul group vars rather than zuul02 host vars because we have a zuul02 now17:51
clarkboh maybe only zuul02 is backed up externally that might explain it17:53
corvusclarkb: i don't know why they're in the host and not group, but we have them for both hosts and they were different.  they're the same at the end of my stack. we may be able to refactor?17:53
corvusonly zuul02 had the times exclusion17:53
corvusthey both had the backup exclusion17:53
clarkbah. If they end up the same then ya maybe we can simply combine the values into a group vars value at the end of the stack17:54
clarkbyour stack lgtm as is though17:54
opendevreviewJames E. Blair proposed opendev/system-config master: Refactor zuul-scheduler borg backup excludes  https://review.opendev.org/c/opendev/system-config/+/97294617:55
corvusclarkb: ^ cherry on top17:55
clarkbinfra-root I think we're quickly approaching a state where we can/should look at reenabling system-config management. That said it is probably low on the priority list in terms of getting user facing things going. Maybe raise your hand when you think we are ready to keep an eye on that and we'll take it from there? frickler can you drop the work in progress status in the meantime17:57
clarkb(otherwise we may need to push a new change or something to get around that)17:57
timburkefungi, thanks! and yeah, makes sense that "be able to release" should take precedence ;-)17:59
fungiclarkb: i'm around for the rest of the day and happy to help monitor/work through deployment job failures or unexpected gotchas17:59
opendevreviewMerged opendev/system-config master: Update zuul secrets for docker hub and quay.io  https://review.opendev.org/c/opendev/system-config/+/97293418:00
clarkbfungi: ya it's mostly that I think there is a bit of a dance involved (see my comment on frickler's change for details) and i want to make sure we aren't distracted with the other stuff we're working on when we get there18:01
clarkbgerritbot is failing on a python2 only assert somehow18:01
fungitimburke: appreciated, and yeah we're not even really at the 24 hour mark from when we first learned there was a zuul vulnerability, so the progress is impressing me most of all18:01
clarkbhow did we ever add python3 testing in the first place? I'll just fix it as its a test and I'm not too worried about it18:01
opendevreviewMerged opendev/statusbot master: Rotate quay.io upload secret  https://review.opendev.org/c/opendev/statusbot/+/97294218:02
corvusclarkb: i'm in favor of getting the system-config train out of the station18:04
clarkbcorvus: ack. I think the main thing we need to do is ensure that when we manually add the new authorized_keys file to bridge that the next jobs that run are for the change that updates the key list and not another change or hourly runs18:06
clarkbI think the best way to do that is probably to approve the change and then see how things shake out in zuul and just add the file right before the jobs will start? If it looks like hourly will run instead then we can wait?18:06
fungiand frickler created the new authorized_keys file just didn't move it into place, right?18:07
corvushourly runs just finished (failed) so now's a good time18:07
fungimaking sure i didn't misread earlier discussion18:07
opendevreviewClark Boylan proposed opendev/gerritbot master: Rotate quay.io upload secret  https://review.opendev.org/c/opendev/gerritbot/+/97294118:08
clarkbfungi: yes aiui the file is staged but not in use18:09
corvushttps://review.opendev.org/972897 is the change we need to approve?  and it's still wip?18:10
clarkbcorvus: it was just flipped ready for review18:11
fricklersorry was away for a bit, un-wip that now18:11
clarkbI was going to double check key values too as I realized I didn't do that when I +2'd. I only reviewed the structure18:11
clarkbfungi: /home/zuul/.ssh/new_authorized_keys is the staging location18:11
frickler972847 uses the keytab files that were generated earlier18:12
clarkbfrickler: ack thanks18:12
corvusapproved18:19
clarkbI've double checked the keys in the change and in the staging file on bridge and I think they all lgtm18:20
opendevreviewJeremy Stanley proposed openstack/project-config master: Replace OpenPGP signing subkey  https://review.opendev.org/c/openstack/project-config/+/97296018:30
clarkbfungi: ^ +2 from me but I didn't approve it18:32
clarkbwasn't sure if you wanted the openstack release team to look at it first18:32
opendevreviewMerged opendev/lodgeit master: Rotate quay.io upload secret  https://review.opendev.org/c/opendev/lodgeit/+/97293718:32
fungithere's, unfortunately, not much to look at since it's an encrypted blob18:33
clarkbtrue I guess in that case we should probably approve it. Should I do that or do you want to?18:33
fungii filled them in during their weekly meeting earlier today that this was coming, and i'm going to coordinate test releases with them anyway18:33
fungifeel free to approve18:33
clarkbdone18:34
corvusthe system-config change estimates 55m remaining, which means it should merge well after the next hourly run18:34
clarkbperfect18:34
clarkbso in theory after the 1900 ish hourly run we can put the staged authorized_keys file in place and then monitor the jobs that run when the change merges18:34
clarkba reminder that it will probably run every single last job because the all.yaml group vars are modified18:35
fungiconvenient, if slow, smoketest18:38
opendevreviewMerged opendev/gerritbot master: Rotate quay.io upload secret  https://review.opendev.org/c/opendev/gerritbot/+/97294118:44
clarkbinfra-root the number corvus stated earlier is a bit skewed by the opendev-buildset-registry I think. The actual number is actually about 26 minutes from now18:47
clarkbwhich will be a bit closer to the hourly jobs18:47
clarkbbut still ok I think18:47
opendevreviewMerged openstack/project-config master: Replace OpenPGP signing subkey  https://review.opendev.org/c/openstack/project-config/+/97296018:51
clarkbstatusbot, gerritbot, ptgbot, and lodgeit all updated images on quay.io as part of having their secret updated. This is good; it means the secrets are working. But it also means those will be updated when the fix system-config deployments change lands18:56
clarkbnot a big deal but another behavior to keep an eye out for as we reenable things18:56
clarkbnone of the system-config hosted images appear to have rebuilt which is nice because I don't want to globally update everything all at once18:57
opendevreviewMerged opendev/grafyaml master: Rotate quay.io upload secret  https://review.opendev.org/c/opendev/grafyaml/+/97294318:58
corvusprod hourly has started19:04
clarkbcorvus: I wonder if we should go ahead and dequeue it since it's just going to retry limit everything?19:04
clarkbzuul estimated about 8 minutes to the ssh key fixup change merging19:05
corvusdone19:05
corvusdequeued i mean19:05
clarkbthanks. I'm going to move the staged authorized_keys file into the actual location now19:05
clarkbthat is done19:06
clarkbin theory that means we'll actually try to run things in a few minutes19:09
opendevreviewMerged opendev/system-config master: Update zuul ssh keys  https://review.opendev.org/c/opendev/system-config/+/97289719:13
clarkbhrm there may be another place the key is managed in the bootstrap bridge playbook arg19:15
clarkbbut that may just be for CI. So I don't think we need to change anything at the moment. Just monitor and see if things work and if not then unravel that more?19:15
clarkboh ya there is a comment that says in testing we are called with root_rsa_set19:16
clarkbso hopefully that is testing only19:16
clarkbansible is running on bridge19:16
clarkbbootstrap bridge paused and infra-prod-base is running now19:17
clarkbcorvus: do we want to disable the zuul upgrades and reboots that will start later today?19:18
clarkbmaybe that depends on how this deployment goes19:18
fungii mean, if things are generally working then it should be a non-event19:19
clarkbya19:19
fungii expect to be around all weekend, so if something goes sideways with the zuul upgrade then presumably one of us will notice fairly quickly19:19
clarkbauthorized keys for zuul on bridge updated I think to drop the comments from the old staged file19:19
clarkbthe key values all have the 20260109 comments in them19:19
corvusi'm leaning toward not disabling them19:21
clarkbwe have our first successful job (infra-prod-base) and the other ones look like they have been able to ssh in19:26
clarkbcorvus: ok I think we may have our first problem and it is super minor. I didn't update the mariadb password in the zuul-db group vars only the zuul connection group vars. But that means the docker-compose file wasn't updated and I think everything nooped when ansible ran on zuul-db0119:29
clarkbcorvus: I think I should go ahead and fix that now in the zuul-db group vars unless you have any concerns with that19:29
corvusi may be missing something -- why did it noop instead of revert to old creds?19:31
corvusoh wait19:31
clarkbcorvus: because i changed the password via sql statement in the running db not via the docker-compose file19:31
clarkbso the docker-compose file has stayed the same19:31
corvusah okay.  i think those passwords in compose are only used for bootstrapping19:32
clarkbyes I think that is the case which is why I ended up using sql statement instead19:32
corvusokay, then i agree this is minor and we should fix it and it should still noop after fixing19:32
clarkbcool I'll update the zuul-db group var now19:32
corvusi mean, it'll change the file but nothing else.  not exactly a noop but almost19:33
clarkbya let me double check the ansible playbook. Maybe it will restart the db if things change?19:33
clarkbbut even then the impact should be short and minor19:33
fungiand avoid a nasty surprise at some point in the future if we ever need to re-bootstrap things19:33
clarkbthe mariadb_run_compose_up variable controls whether docker compose up -d is run and it is set to false by default and only true in testing as far as I can tell19:34
clarkbso yes this should only update the docker-compose.yaml file and then do nothing else.19:35
clarkbI'm updating the group var data now19:35
clarkbthat is done19:39
clarkbzookeeper deployment jobs are running now. Zuul should run shortly19:39
clarkbapparently manage-projects goes first then zuul (makes sense so that zuul doesn't try to load projects that don't exist yet)19:42
clarkbfungi: any chance you might be interested in updating the opendevinfra gpg key for ppa package publishing? I feel like gnupg and I have a fight every time we talk to each other19:45
fungiyeah, i can work on that. i need a break from looking into all our github accounts anyway19:46
clarkbthank you19:47
clarkbmanage-projects was successful and now infra-prod-service-zuul is running19:47
clarkbthis is the last job in the buildset19:47
clarkbhttps://zuul.opendev.org/t/openstack/buildset/d51061e3ce2a4c93ab46f1f07b8dc048 we have success19:56
clarkbI'm going to pause for lunch now, but I suspect we can proceed with corvus's changes to system-config that update some of the zuul stuff now19:56
clarkbif anyone else wants to review those that would be great19:56
corvusw00t.  i am also going to lunch19:56
clarkbalso I see gerritbot reporting changes in other channels after its image update and deployment. statusbot rejoined this channel too19:57
fungii'm coming to you from the future just to let you know that lunch was indeed great, and you should enjoy it to the fullest19:57
corvusfungi: most important question: what was it?  cause i'd love to know that in advance19:58
clarkbI think I've got a turkey sandwich20:00
fungisea scallops on seaweed salad, followed by rockfish sandwich and cole slaw20:00
clarkbbut it hasn't been made yet so that might change. fungi's future lunch sounds better20:00
clarkbcorvus: for https://review.opendev.org/c/opendev/system-config/+/972930 we're going to need to manually restart the schedulers so that they pick up the new bind mounts so that the backup at 00:00 succeeds. Should I go ahead and approve the change knowing that is the case?21:01
clarkbin the meantime the opendev prod hourly jobs have just started. they should succeed now21:02
corvusclarkb: sgtm21:04
clarkbdone21:06
clarkbI haven't approved any of the others thinking it may be good to address this particular item first but happy for you to just send it and approve the others if you think that is best21:07
clarkbhttps://zuul.opendev.org/t/openstack/buildset/04123668b61f446387bad46fece29bfe hourly builds are successful now too21:09
corvusclarkb: stages sounds like a plan.21:10
clarkbI'm going to recheck my base-test test change to see how rax swift uploads are doing21:25
clarkbcorvus: one thing that occurs to me is we may want to check that each zuul launcher cloud is able to boot nodes. I guess grafana graphs may be good enough for that? Just to ensure we don't have any rotation bugs for a specific cloud or region that is masked by falling back to other clouds21:27
clarkbya looking at grafana I think we're using each cloud region except for rax flex sjc3 which we had intentionally disabled last month ish21:28
corvusshould be the same creds as the other flex21:29
clarkbyup I think it is likely fine and we can debug when we reenable it after they get back to us if anything doesn't work as expected21:30
clarkbtimburke: in addition to the github replication issues I notice that swift's container image publication is failing. I think you control that secret. You'll want to rotate the credential then encrypt the new value and push that to gerrit21:52
clarkbtimburke: basically don't reencrypt the old value21:52
clarkbcorvus: do we need to wait for the opendaylight removal to land before restarting schedulers? I guess we might not need to but it would make logs on startup cleaner?22:04
corvusi figure it's fine to restart again later... if they're close, sure wait, otherwise no need i think.22:06
clarkback22:06
clarkblooks like its an extra 15 minutes to wait so not that long22:12
clarkbheh nope it passed tests first so they go in together22:16
opendevreviewMerged opendev/system-config master: Move zuul scheduler backups to dedicated dir  https://review.opendev.org/c/opendev/system-config/+/97293022:21
opendevreviewMerged opendev/system-config master: Remove opendaylight Zuul connection  https://review.opendev.org/c/opendev/system-config/+/97281522:21
clarkbthe second change is deploying now22:31
clarkbcorvus: ok both deployments are done now22:50
clarkbcorvus: I can stop zuul-scheduler on zuul01 then start it then repeat the process for zuul02 after zuul01 is up and running again?22:51
clarkbor do you want to do it?22:51
corvusclarkb: i'll take care of it22:53
clarkbcool let me know if I can help22:53
corvusi'm going to go ahead and approve the rest of the changes22:55
clarkbsounds good22:55
corvusokay, well, maybe just the next 2.  up through removing the backup cron.  i think we could have done those earlier, whoops.22:56
corvuswe want to remove the backup json files between https://review.opendev.org/972932 and https://review.opendev.org/97293322:56
clarkbmakes sense22:57
clarkbthe base-test test change had no problems this time around. I'm going to make a note locally that I should retest it Monday and reenable rax swift log uploads then if it continues to look good. I would do it now but we're still landing fixups and I don't want those going sideways on a swift having problems if we can avoid it22:59
clarkbif it were not Friday I'd probably go for it now but its Friday and ya23:00
corvusboth schedulers are restarted and i have manually removed the keystore backup in /var/lib/zuul from both23:08
clarkbthanks!23:08
timburkeclarkb, thanks for the heads-up on the docker hub job -- i'd assumed that needed rotation, too, given the description but it's good to have the confirmation. might not get to it until after the weekend, though (it's already saturday for the guy with the email for the account)23:25
clarkbtimburke: ack just wanted to call it out since you were asking questions here earlier and I noticed it while looking at other jobs23:26
