| Nick | Message | Time |
|---|---|---|
| opendevreview | James E. Blair proposed opendev/base-jobs master: Update OVH log upload creds https://review.opendev.org/c/opendev/base-jobs/+/972813 | 03:38 |
| opendevreview | Merged opendev/base-jobs master: Update OVH log upload creds https://review.opendev.org/c/opendev/base-jobs/+/972813 | 03:41 |
| opendevreview | James E. Blair proposed opendev/base-jobs master: Update Rax and Vexxhost log upload creds https://review.opendev.org/c/opendev/base-jobs/+/972814 | 03:54 |
| opendevreview | Merged opendev/base-jobs master: Update Rax and Vexxhost log upload creds https://review.opendev.org/c/opendev/base-jobs/+/972814 | 03:54 |
| opendevreview | James E. Blair proposed opendev/system-config master: Remove opendaylight Zuul connection https://review.opendev.org/c/opendev/system-config/+/972815 | 04:00 |
| opendevreview | James E. Blair proposed opendev/base-jobs master: Fix OVH upload creds https://review.opendev.org/c/opendev/base-jobs/+/972816 | 04:19 |
| opendevreview | Merged opendev/base-jobs master: Fix OVH upload creds https://review.opendev.org/c/opendev/base-jobs/+/972816 | 04:19 |
| opendevreview | James E. Blair proposed opendev/base-jobs master: Update pypi api token https://review.opendev.org/c/opendev/base-jobs/+/972817 | 04:25 |
| opendevreview | James E. Blair proposed opendev/base-jobs master: Update keytabs https://review.opendev.org/c/opendev/base-jobs/+/972819 | 04:38 |
| opendevreview | Merged opendev/base-jobs master: Update pypi api token https://review.opendev.org/c/opendev/base-jobs/+/972817 | 04:39 |
| opendevreview | Merged opendev/base-jobs master: Update keytabs https://review.opendev.org/c/opendev/base-jobs/+/972819 | 04:39 |
| opendevreview | James E. Blair proposed opendev/base-jobs master: Update intermediate registry password https://review.opendev.org/c/opendev/base-jobs/+/972821 | 04:46 |
| opendevreview | Merged opendev/base-jobs master: Update intermediate registry password https://review.opendev.org/c/opendev/base-jobs/+/972821 | 04:57 |
| clarkb | corvus: how about this: #status notice Zuul is up and running again and should report back to Gerrit successfully. Changes can also merge if there no other failures, but we expect that publication jobs like docs and tarballs updates will not work currently. | 04:59 |
| clarkb | #status notice Zuul is up and running again and should report back to Gerrit successfully. Changes can also merge if there are no other failures, but we expect that publication jobs like docs and tarballs updates will not work currently. | 05:00 |
| opendevstatus | clarkb: sending notice | 05:00 |
| -opendevstatus- | NOTICE: Zuul is up and running again and should report back to Gerrit successfully. Changes can also merge if there are no other failures, but we expect that publication jobs like docs and tarballs updates will not work currently. | 05:01 |
| clarkb | we'll be back in the morning to finish the cleanup | 05:01 |
| *** | ykarel_ is now known as ykarel | 05:36 |
| ykarel | Hi, is the RETRY_LIMIT issue a known one? Most jobs are failing with it https://zuul.openstack.org/builds | 05:38 |
| ykarel | started around ~ 03:30 UTC | 05:39 |
| ykarel | logs are not available, but checking the job console I just see | 05:53 |
| ykarel | 2026-01-09 05:52:44.031501 | Preparing playbooks | 05:53 |
| ykarel | --- END OF STREAM --- | 05:53 |
| frickler | ykarel: I need to check details, but I expect most of this to be fallout from https://lists.zuul-ci.org/archives/list/zuul-announce@lists.zuul-ci.org/message/2WHXPBPRLFF6ZNSSZ3AKOBBBHMWY4YNR/ . I wouldn't bet on things getting better until much later today | 07:16 |
| ykarel | frickler, ok thx | 07:19 |
| * tkajinam | was about to report the same problem | 07:54 |
| tkajinam | it's weird that the problem is seen only in devstack jobs so far. | 07:55 |
| frickler | yes, but not all of them. like https://zuul.opendev.org/t/openstack/build/497ed602e852445eb96960aa0494e5a6 worked (grenade on master) | 08:29 |
| tkajinam | I wonder if anyone can send a notification to channels to ask people to avoid recheck ? | 08:44 |
| opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Replace ubuntu fips secret with dummy value https://review.opendev.org/c/openstack/project-config/+/972828 | 08:52 |
| frickler | I hope ^^ will fix most of the issues | 08:52 |
| frickler | argl, can't update the secret while others are not yet refreshed. guess I'll need to dummy out all of those to make progress _sigh_ | 08:55 |
| opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Replace all secrets with dummy value https://review.opendev.org/c/openstack/project-config/+/972828 | 09:05 |
| opendevreview | Merged openstack/project-config master: Replace all secrets with dummy value https://review.opendev.org/c/openstack/project-config/+/972828 | 09:33 |
| frickler | ykarel: tkajinam: ^^ that should unblock check jobs. I would suggest to avoid merging stuff currently though, since we can not yet publish anything, neither docs nor code | 09:38 |
| ykarel | thx frickler | 09:44 |
| ykarel | so next step will be to generate secret, right? who will be taking care of that? | 09:45 |
| frickler | ykarel: the afs secrets need to be done by some infra-root, not sure if I'll get to that today, so likely will be some more hours until others are awake again | 09:48 |
| ykarel | ok thx | 09:48 |
| tkajinam | frickler, thank you ! | 09:55 |
| opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Restore AFS secrets https://review.opendev.org/c/openstack/project-config/+/972847 | 11:38 |
| tkajinam | frickler, just fyi. I suspect that the promote jobs are affected by that change. hopefully that revert fixes it. https://zuul.opendev.org/t/openstack/build/8f27264e80ce451d88e857e308afc880 | 12:23 |
| gthiemon1e | hey folks, just in case, I just had a build that is not loaded in zuul: https://zuul.opendev.org/t/openstack/build/dad8442fe2844a6e8ceb0d1b33f7b8f2 ("Something went wrong"), it seems that zuul is looking for json-output.json.gz file, but the file doesn't exist in https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_dad/openstack/dad8442fe2844a6e8ceb0d1b33f7b8f2/ | 12:25 |
| frickler | tkajinam: yes, that's working as expected, kind of. the above change should fix that, but I need to wait for reviews. the promote jobs didn't work with the old secrets anymore, either, that's why I suggested not to merge anything for now if possible | 12:26 |
| opendevreview | Dr. Jens Harbott proposed opendev/system-config master: Update zuul ssh keys https://review.opendev.org/c/opendev/system-config/+/972897 | 14:29 |
| clarkb | ok bit later of a start than I had hoped for, but here now. Is there anything I should be looking at or help debug before I start working through the todo list? | 16:17 |
| clarkb | frickler: for https://review.opendev.org/c/openstack/project-config/+/972847/1/zuul.d/secrets.yaml did you generate new keytabs or did you use the one I generated last night (to replace the old value) ? I don't think it matters too much either way I'm more curious | 16:22 |
| clarkb | infra-root I left a review on 972897 with some notes on sequencing concerns | 16:27 |
| corvus | heh that "zuul-launcher" principal name is really going to confuse us in the future; at least there's a comment. | 16:28 |
| corvus | 897 is WIP status | 16:29 |
| clarkb | I also wanted to note that last night my base-test test change recheck to check rax log uploads had one post failure. I didn't end up looking at the logs but I suspect that rax swift may still have been somewhat flaky at the time? So I think it was a good call to only use ovh for now | 16:29 |
| clarkb | corvus: frickler ya maybe we can remove the WIP and then just have an understanding amongst infra-root that when we land that one we'll need to pay close attention and manually update authorized_keys on bridge? | 16:30 |
| clarkb | my comment on that change tried to capture what I think are some of the main concerns | 16:30 |
| clarkb | gthiemon1e: thank you for the report. Log uploads are generally working, and when they fail zuul will typically render a build status page just without logs. So this is probably something other than a typical log upload failure | 16:31 |
| clarkb | gthiemon1e: that said there is a job-output.json file (not job-output.json.gz) and zuul should check both (there are compression differences amongst swift implementations iirc so we do both) | 16:32 |
| clarkb | the json itself appears to be valid based on my browser being able to render it (but maybe firefox is generous in its interpretation) | 16:35 |
| clarkb | in positive news it looks like some of these secrets might be able to be cleaned up like openstackzuul_docker_login in openstack/project-config | 16:37 |
| clarkb | but that is probably the least urgent thing right now | 16:37 |
| clarkb | I can work on updating system-config secrets shortly | 16:45 |
| timburke | has anyone already reported that github mirroring seems to have broken? https://zuul.opendev.org/t/openstack/builds?job_name=openstack-upload-github-mirror | 16:49 |
| timburke | there was a stretch where jobs failed to report results, but now they all die like 'Load key "/var/lib/zuul/.../ansible.zqir32ci": error in libcrypto' | 16:50 |
| clarkb | timburke: both are known problems due to https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/WBBLBI6ZS6FA6Q5ZMH4C2MWPL3WG3H24/ (highly recommend subscribing to that list if you aren't already) | 16:50 |
| clarkb | timburke: in the case of failing to report results and not having logs we've corrected that and things run now. But now projects (like openstack) need to update secrets for things like github mirroring | 16:51 |
| timburke | got it, thanks -- i was just working with repos before checking email ;-) | 16:52 |
| clarkb | I think fungi mentioned that he planned to dig into those underlying secrets for openstack today | 16:55 |
| corvus | infra-root: do we still care about zuul status backups? i ask because i'm making a change where i need to move them, but wondering if i should just delete them instead. | 16:58 |
| clarkb | corvus: I can't recall the last time we needed them. Zuul doing rolling upgrades largely negates the need I think | 16:58 |
| clarkb | I would be ok with dropping them | 16:58 |
| corvus | (the thing where we dump the status page json to a timestamped file) | 16:58 |
| corvus | k. i'll propose that | 16:59 |
| clarkb | corvus: I'm attempting to use `sudo docker run --rm -i --pull always quay.io/zuul-ci/zuul-client --zuul-url https://zuul.opendev.org encrypt --tenant openstack --project opendev/system-config --secret-name system-config-opendevmirror --field-name password` and it seems like I have to enter ^d twice to get it to write out the secret. Do you know if that is expected? I'm slightly | 17:04 |
| clarkb | worried that it will encode a literal ^d in the secret | 17:04 |
| corvus | clarkb: that is expected | 17:04 |
| corvus | the docs even say to do that | 17:04 |
| corvus | so if you were typing the password "foobar" in, then you would type "foobar^D^D". that will encode the exact string "foobar". | 17:05 |
| clarkb | the help output says to enter ^d but doesn't say it needs to be done twice. I'm more concerned about it needing to go twice. Thanks for confirming. Maybe just need to update the help output | 17:05 |
| corvus | that's standard shell behavior -- ^D normally only does EOF after newline. ^D^D is needed if there was no newline. | 17:06 |
| clarkb | oh! til | 17:06 |
| clarkb | cool and hopefully that command is useful to anyone else trying to encrypt things (you'll need to change the parameters for your project and secret names but should be 80% there) | 17:06 |
| corvus | yeah, that's one of those things no one learns until well after they think they have learned everything about unix. :) | 17:07 |
| corvus | you think, "surely i must have hit ^D at the end of a line before" but no, no one does! | 17:07 |
| corvus | you can even see the behavior with cat (a little better -- because you can see that the first one does a flush and the second one does an eof) | 17:08 |
| clarkb | looks like someone has the credentials file in use/locked. Any chance you can let me know when I can look at it? | 17:11 |
| corvus | oops done. | 17:12 |
| clarkb | thank you | 17:12 |
| opendevreview | James E. Blair proposed opendev/system-config master: Move zuul scheduler backups to dedicated dir https://review.opendev.org/c/opendev/system-config/+/972930 | 17:13 |
| opendevreview | James E. Blair proposed opendev/system-config master: Remove /var/lib/zuul/times backup exclusion https://review.opendev.org/c/opendev/system-config/+/972931 | 17:13 |
| opendevreview | James E. Blair proposed opendev/system-config master: Remove zuul status backups https://review.opendev.org/c/opendev/system-config/+/972932 | 17:13 |
| opendevreview | James E. Blair proposed opendev/system-config master: Remove /var/lib/zuul/backup exclusion https://review.opendev.org/c/opendev/system-config/+/972933 | 17:13 |
| corvus | okay that change series addresses the thing that caused us to rotate the zuul secrets. | 17:13 |
| corvus | in the end, technically, we don't actually need to stop running the status backups, so i put that later in the series. but i still think we should do it, because i think that last change is important too. | 17:14 |
| corvus | because if i'm reading this right, we weren't actually backing up our keystore to borg. | 17:14 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Update zuul secrets for docker hub and quay.io https://review.opendev.org/c/opendev/system-config/+/972934 | 17:22 |
| clarkb | ok I think that covers system-config. While this is fresh in my mind I'm going to hunt down the other projects we build images with that upload container images | 17:23 |
| opendevreview | James E. Blair proposed opendev/zuul-providers master: Update image upload secret https://review.opendev.org/c/opendev/zuul-providers/+/972936 | 17:30 |
| opendevreview | Clark Boylan proposed opendev/lodgeit master: Rotate quay.io upload secret https://review.opendev.org/c/opendev/lodgeit/+/972937 | 17:30 |
| opendevreview | Clark Boylan proposed opendev/gerritbot master: Rotate quay.io upload secret https://review.opendev.org/c/opendev/gerritbot/+/972941 | 17:33 |
| opendevreview | Clark Boylan proposed opendev/statusbot master: Rotate quay.io upload secret https://review.opendev.org/c/opendev/statusbot/+/972942 | 17:38 |
| opendevreview | Clark Boylan proposed opendev/grafyaml master: Rotate quay.io upload secret https://review.opendev.org/c/opendev/grafyaml/+/972943 | 17:40 |
| opendevreview | Merged openstack/project-config master: Restore AFS secrets https://review.opendev.org/c/openstack/project-config/+/972847 | 17:44 |
| clarkb | Clark Boylan proposed openstack/ptgbot master: Rotate quay.io upload secret https://review.opendev.org/c/openstack/ptgbot/+/972945 <- this was the last one I could find | 17:44 |
| clarkb | as far as getting system-config back up and running it doesn't look like updating secrets really tests much (probably a good thing in this particular case) so we probably want to go with https://review.opendev.org/c/opendev/system-config/+/972934 to update secrets, then frickler's change to reenable ssh access/management, then corvus' cleanups around zuul backups? I'm going to review | 17:46 |
| clarkb | corvus' changes next | 17:46 |
| corvus | i think i've at least +2d everything i've seen | 17:48 |
| corvus | clarkb: https://review.opendev.org/972936 could use a +3 | 17:49 |
| clarkb | corvus: I left a question on https://review.opendev.org/c/opendev/system-config/+/972930 looking at 972936 next | 17:49 |
| fungi | timburke: clarkb: yeah, github replication is next on my list after release signing | 17:50 |
| clarkb | 972936 is approved | 17:50 |
| opendevreview | James E. Blair proposed opendev/system-config master: Move zuul scheduler backups to dedicated dir https://review.opendev.org/c/opendev/system-config/+/972930 | 17:50 |
| opendevreview | James E. Blair proposed opendev/system-config master: Remove /var/lib/zuul/times backup exclusion https://review.opendev.org/c/opendev/system-config/+/972931 | 17:50 |
| opendevreview | James E. Blair proposed opendev/system-config master: Remove zuul status backups https://review.opendev.org/c/opendev/system-config/+/972932 | 17:50 |
| opendevreview | James E. Blair proposed opendev/system-config master: Remove /var/lib/zuul/backup exclusion https://review.opendev.org/c/opendev/system-config/+/972933 | 17:50 |
| corvus | clarkb: thx, fixed | 17:51 |
| opendevreview | Merged opendev/zuul-providers master: Update image upload secret https://review.opendev.org/c/opendev/zuul-providers/+/972936 | 17:51 |
| clarkb | corvus: related to the next change in that stack I wonder if we need to put those exclusion rules in the zuul group vars rather than zuul02 host vars because we have a zuul02 now | 17:51 |
| clarkb | oh maybe only zuul02 is backed up externally that might explain it | 17:53 |
| corvus | clarkb: i don't know why they're in the host and not group, but we have them for both hosts and they were different. they're the same at the end of my stack. we may be able to refactor? | 17:53 |
| corvus | only zuul02 had the times exclusion | 17:53 |
| corvus | they both had the backup exclusion | 17:53 |
| clarkb | ah. If they end up the same then ya maybe we can simply combine the values into a group vars value at the end of the stack | 17:54 |
| clarkb | your stack lgtm as is though | 17:54 |
| opendevreview | James E. Blair proposed opendev/system-config master: Refactor zuul-scheduler borg backup excludes https://review.opendev.org/c/opendev/system-config/+/972946 | 17:55 |
| corvus | clarkb: ^ cherry on top | 17:55 |
| clarkb | infra-root I think we're quickly approaching a state where we can/should look at reenabling system-config management. That said it is probably low on the priority list in terms of getting user facing things going. Maybe raise your hand when you think we are ready to keep an eye on that and we'll take it from there? frickler can you drop the work in progress status in the meantime | 17:57 |
| clarkb | (otherwise we may need to push a new change or something to get around that) | 17:57 |
| timburke | fungi, thanks! and yeah, makes sense that "be able to release" should take precedence ;-) | 17:59 |
| fungi | clarkb: i'm around for the rest of the day and happy to help monitor/work through deployment job failures or unexpected gotchas | 17:59 |
| opendevreview | Merged opendev/system-config master: Update zuul secrets for docker hub and quay.io https://review.opendev.org/c/opendev/system-config/+/972934 | 18:00 |
| clarkb | fungi: ya it's mostly that I think there is a bit of a dance involved (see my comment on frickler's change for details) and i want to make sure we aren't distracted with the other stuff we're working on when we get there | 18:01 |
| clarkb | gerritbot is failing on a python2 only assert somehow | 18:01 |
| fungi | timburke: appreciated, and yeah we're not even really at the 24 hour mark from when we first learned there was a zuul vulnerability, so the progress is impressing me most of all | 18:01 |
| clarkb | how did we ever add python3 testing in the first place? I'll just fix it as it's a test and I'm not too worried about it | 18:01 |
| opendevreview | Merged opendev/statusbot master: Rotate quay.io upload secret https://review.opendev.org/c/opendev/statusbot/+/972942 | 18:02 |
| corvus | clarkb: i'm in favor of getting the system-config train out of the station | 18:04 |
| clarkb | corvus: ack. I think the main thing we need to do is ensure that when we manually add the new authorized_keys file to bridge that the next jobs that run are for the change that updates the key list and not another change or hourly runs | 18:06 |
| clarkb | I think the best way to do that is probably to approve the change and then see how things shake out in zuul and just add the file right before the jobs will start? If it looks like hourly will run instead then we can wait? | 18:06 |
| fungi | and frickler created the new authorized_keys file just didn't move it into place, right? | 18:07 |
| corvus | hourly runs just finished (failed) so now's a good time | 18:07 |
| fungi | making sure i didn't misread earlier discussion | 18:07 |
| opendevreview | Clark Boylan proposed opendev/gerritbot master: Rotate quay.io upload secret https://review.opendev.org/c/opendev/gerritbot/+/972941 | 18:08 |
| clarkb | fungi: yes aiui the file is staged but not in use | 18:09 |
| corvus | https://review.opendev.org/972897 is the change we need to approve? and it's still wip? | 18:10 |
| clarkb | corvus: it was just flipped ready for review | 18:11 |
| frickler | sorry was away for a bit, un-wip that now | 18:11 |
| clarkb | I was going to double check key values too as I realized I didn't do that when I +2'd. I only reviewed the structure | 18:11 |
| clarkb | fungi: /home/zuul/.ssh/new_authorized_keys is the staging location | 18:11 |
| frickler | 972847 uses the keytab files that were generated earlier | 18:12 |
| clarkb | frickler: ack thanks | 18:12 |
| corvus | approved | 18:19 |
| clarkb | I've double checked the keys in the change and in the staging file on bridge and I think they all lgtm | 18:20 |
| opendevreview | Jeremy Stanley proposed openstack/project-config master: Replace OpenPGP signing subkey https://review.opendev.org/c/openstack/project-config/+/972960 | 18:30 |
| clarkb | fungi: ^ +2 from me but I didn't approve it | 18:32 |
| clarkb | wasn't sure if you wanted the openstack release team to look at it first | 18:32 |
| opendevreview | Merged opendev/lodgeit master: Rotate quay.io upload secret https://review.opendev.org/c/opendev/lodgeit/+/972937 | 18:32 |
| fungi | there's, unfortunately, not much to look at since it's an encrypted blob | 18:33 |
| clarkb | true I guess in that case we should probably approve it. Should I do that or do you want to? | 18:33 |
| fungi | i filled them in during their weekly meeting earlier today that this was coming, and i'm going to coordinate test releases with them anyway | 18:33 |
| fungi | feel free to approve | 18:33 |
| clarkb | done | 18:34 |
| corvus | the system-config change estimates 55m remaining, which means it should merge well after the next hourly run | 18:34 |
| clarkb | perfect | 18:34 |
| clarkb | so in theory after the 1900 ish hourly run we can put the staged authorized_keys file in place and then monitor the jobs that run when the change merges | 18:34 |
| clarkb | a reminder that it will probably run every single last job because the all.yaml group vars are modified | 18:35 |
| fungi | convenient, if slow, smoketest | 18:38 |
| opendevreview | Merged opendev/gerritbot master: Rotate quay.io upload secret https://review.opendev.org/c/opendev/gerritbot/+/972941 | 18:44 |
| clarkb | infra-root the number corvus stated earlier is a bit skewed by the opendev-buildset-registry I think. The actual number is actually about 26 minutes from now | 18:47 |
| clarkb | which will be a bit closer to the hourly jobs | 18:47 |
| clarkb | but still ok I think | 18:47 |
| opendevreview | Merged openstack/project-config master: Replace OpenPGP signing subkey https://review.opendev.org/c/openstack/project-config/+/972960 | 18:51 |
| clarkb | statusbot, gerritbot, ptgbot, and lodgeit all updated images on quay.io as part of having their secret updated. This is good; it means the secrets are working. But it also means those will be updated when the fix system-config deployments change lands | 18:56 |
| clarkb | not a big deal but another behavior to keep an eye out for as we reenable things | 18:56 |
| clarkb | none of the system-config hosted images appear to have rebuilt which is nice because I don't want to globally update everything all at once | 18:57 |
| opendevreview | Merged opendev/grafyaml master: Rotate quay.io upload secret https://review.opendev.org/c/opendev/grafyaml/+/972943 | 18:58 |
| corvus | prod hourly has started | 19:04 |
| clarkb | corvus: I wonder if we should go ahead and dequeue it since it's just going to retry limit everything? | 19:04 |
| clarkb | zuul estimated about 8 minutes to the ssh key fixup change merging | 19:05 |
| corvus | done | 19:05 |
| corvus | dequeued i mean | 19:05 |
| clarkb | thanks. I'm going to move the staged authorized_keys file into the actual location now | 19:05 |
| clarkb | that is done | 19:06 |
| clarkb | in theory that means we'll actually try to run things in a few minutes | 19:09 |
| opendevreview | Merged opendev/system-config master: Update zuul ssh keys https://review.opendev.org/c/opendev/system-config/+/972897 | 19:13 |
| clarkb | hrm there may be another place the key is managed in the bootstrap bridge playbook arg | 19:15 |
| clarkb | but that may just be for CI. So I don't think we need to change anything at the moment. Just monitor and see if things work and if not then unravel that more? | 19:15 |
| clarkb | oh ya there is a comment that says in testing we are called with root_rsa_set | 19:16 |
| clarkb | so hopefully that is testing only | 19:16 |
| clarkb | ansible is running on bridge | 19:16 |
| clarkb | bootstrap bridge paused and infra-prod-base is running now | 19:17 |
| clarkb | corvus: do we want to disable the zuul upgrades and reboots that will start later today? | 19:18 |
| clarkb | maybe that depends on how this deployment goes | 19:18 |
| fungi | i mean, if things are generally working then it should be a non-event | 19:19 |
| clarkb | ya | 19:19 |
| fungi | i expect to be around all weekend, so if something goes sideways with the zuul upgrade then presumably one of us will notice fairly quickly | 19:19 |
| clarkb | authorized keys for zuul on bridge updated I think to drop the comments from the old staged file | 19:19 |
| clarkb | the key values all have the 20260109 comments in them | 19:19 |
| corvus | i'm leaning toward not disabling them | 19:21 |
| clarkb | we have our first successful job (infra-prod-base) and the other ones look like they have been able to ssh in | 19:26 |
| clarkb | corvus: ok I think we may have our first problem and it is super minor. I didn't update the mariadb password in the zuul-db group vars only the zuul connection group vars. But that means the docker-compose file wasn't updated and I think everything nooped when ansible ran on zuul-db01 | 19:29 |
| clarkb | corvus: I think I should go ahead and fix that now in the zuul-db group vars unless you have any concerns with that | 19:29 |
| corvus | i may be missing something -- why did it noop instead of revert to old creds? | 19:31 |
| corvus | oh wait | 19:31 |
| clarkb | corvus: because i changed the password via sql statement in the running db not via the docker-compose file | 19:31 |
| clarkb | so the docker-compose file has stayed the same | 19:31 |
| corvus | ah okay. i think those passwords in compose are only used for bootstrapping | 19:32 |
| clarkb | yes I think that is the case which is why I ended up using sql statement instead | 19:32 |
| corvus | okay, then i agree this is minor and we should fix it and it should still noop after fixing | 19:32 |
| clarkb | cool I'll update the zuul-db group var now | 19:32 |
| corvus | i mean, it'll change the file but nothing else. not exactly a noop but almost | 19:33 |
| clarkb | ya let me double check the ansible playbook. Maybe it will restart the db if things change? | 19:33 |
| clarkb | but even then the impact should be short and minor | 19:33 |
| fungi | and avoid a nasty surprise at some point in the future if we ever need to re-bootstrap things | 19:33 |
| clarkb | the mariadb_run_compose_up variable controls whether docker compose up -d is run and it is set to false by default and only true in testing as far as I can tell | 19:34 |
| clarkb | so yes this should only update the docker-compose.yaml file and then do nothing else. | 19:35 |
| clarkb | I'm updating the group var data now | 19:35 |
| clarkb | that is done | 19:39 |
| clarkb | zookeeper deployment jobs are running now. Zuul should run shortly | 19:39 |
| clarkb | apparently manage-projects goes first then zuul (makes sense so that zuul doesn't try to load projects that don't exist yet) | 19:42 |
| clarkb | fungi: any chance you might be interested in updating the opendevinfra gpg key for ppa package publishing? I feel like gnupg and I have a fight every time we talk to each other | 19:45 |
| fungi | yeah, i can work on that. i need a break from looking into all our github accounts anyway | 19:46 |
| clarkb | thank you | 19:47 |
| clarkb | manage-projects was successful and now infra-prod-service-zuul is running | 19:47 |
| clarkb | this is the last job in the buildset | 19:47 |
| clarkb | https://zuul.opendev.org/t/openstack/buildset/d51061e3ce2a4c93ab46f1f07b8dc048 we have success | 19:56 |
| clarkb | I'm going to pause for lunch now, but I suspect we can proceed with corvus's changes to system-config that update some of the zuul stuff now | 19:56 |
| clarkb | if anyone else wants to review those that would be great | 19:56 |
| corvus | w00t. i am also going to lunch | 19:56 |
| clarkb | also I see gerritbot reporting changes in other channels after its image update and deployment. statusbot rejoined this channel too | 19:57 |
| fungi | i'm coming to you from the future just to let you know that lunch was indeed great, and you should enjoy it to the fullest | 19:57 |
| corvus | fungi: most important question: what was it? cause i'd love to know that in advance | 19:58 |
| clarkb | I think I've got a turkey sandwich | 20:00 |
| fungi | sea scallops on seaweed salad, followed by rockfish sandwich and cole slaw | 20:00 |
| clarkb | but it hasn't been made yet so that might change. fungi's future lunch sounds better | 20:00 |
| clarkb | corvus: for https://review.opendev.org/c/opendev/system-config/+/972930 we're going to need to manually restart the schedulers so that they pick up the new bind mounts so that the backup at 00:00 succeeds. Should I go ahead and approve the change knowing that is the case? | 21:01 |
| clarkb | in the meantime the opendev prod hourly jobs have just started. they should succeed now | 21:02 |
| corvus | clarkb: sgtm | 21:04 |
| clarkb | done | 21:06 |
| clarkb | I haven't approved any of the others thinking it may be good to address this particular item first but happy for you to just send it and approve the others if you think that is best | 21:07 |
| clarkb | https://zuul.opendev.org/t/openstack/buildset/04123668b61f446387bad46fece29bfe hourly builds are successful now too | 21:09 |
| corvus | clarkb: stages sounds like a plan. | 21:10 |
| clarkb | I'm going to recheck my base-test test change to see how rax swift uploads are doing | 21:25 |
| clarkb | corvus: one thing that occurs to me is we may want to check that each zuul launcher cloud is able to boot nodes. I guess grafana graphs may be good enough for that? Just to ensure we don't have any rotation bugs for a specific cloud or region that is masked by falling back to other clouds | 21:27 |
| clarkb | ya looking at grafana I think we're using each cloud region except for rax flex sjc3 which we had intentionally disabled last month ish | 21:28 |
| corvus | should be the same creds as the other flex | 21:29 |
| clarkb | yup I think it is likely fine and we can debug when we reenable it after they get back to us if anything doesn't work as expected | 21:30 |
| clarkb | timburke: in addition to the github replication issues I notice that swift's container image publication is failing. I think you control that secret. You'll want to rotate the credential then encrypt the new value and push that to gerrit | 21:52 |
| clarkb | timburke: basically don't reencrypt the old value | 21:52 |
| clarkb | corvus: do we need to wait for the opendaylight removal to land before restarting schedulers? I guess we might not need to but it would make logs on startup cleaner? | 22:04 |
| corvus | i figure it's fine to restart again later... if they're close, sure wait, otherwise no need i think. | 22:06 |
| clarkb | ack | 22:06 |
| clarkb | looks like it's an extra 15 minutes to wait so not that long | 22:12 |
| clarkb | heh nope it passed tests first so they go in together | 22:16 |
| opendevreview | Merged opendev/system-config master: Move zuul scheduler backups to dedicated dir https://review.opendev.org/c/opendev/system-config/+/972930 | 22:21 |
| opendevreview | Merged opendev/system-config master: Remove opendaylight Zuul connection https://review.opendev.org/c/opendev/system-config/+/972815 | 22:21 |
| clarkb | the second change is deploying now | 22:31 |
| clarkb | corvus: ok both deployments are done now | 22:50 |
| clarkb | corvus: I can stop zuul-scheduler on zuul01 then start it then repeat the process for zuul02 after zuul01 is up and running again? | 22:51 |
| clarkb | or do you want to do it? | 22:51 |
| corvus | clarkb: i'll take care of it | 22:53 |
| clarkb | cool let me know if I can help | 22:53 |
| corvus | i'm going to go ahead and approve the rest of the changes | 22:55 |
| clarkb | sounds good | 22:55 |
| corvus | okay, well, maybe just the next 2. up through removing the backup cron. i think we could have done those earlier, whoops. | 22:56 |
| corvus | we want to remove the backup json files between https://review.opendev.org/972932 and https://review.opendev.org/972933 | 22:56 |
| clarkb | makes sense | 22:57 |
| clarkb | the base-test test change had no problems this time around. I'm going to make a note locally that I should retest it Monday and reenable rax swift log uploads then if it continues to look good. I would do it now but we're still landing fixups and I don't want those going sideways on a swift having problems if we can avoid it | 22:59 |
| clarkb | if it were not Friday I'd probably go for it now but its Friday and ya | 23:00 |
| corvus | both schedulers are restarted and i have manually removed the keystore backup in /var/lib/zuul from both | 23:08 |
| clarkb | thanks! | 23:08 |
| timburke | clarkb, thanks for the heads-up on the docker hub job -- i'd assumed that needed rotation, too, given the description but it's good to have the confirmation. might not get to it until after the weekend, though (it's already saturday for the guy with the email for the account) | 23:25 |
| clarkb | timburke: ack just wanted to call it out since you were asking questions here earlier and I noticed it while looking at other jobs | 23:26 |
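
The ^D exchange between clarkb and corvus (17:04 to 17:08) can be checked without a terminal. What follows is a minimal sketch, assuming a POSIX shell with `printf`, `wc`, and `tr`. It compares the number of bytes a reader on stdin receives when the input ends without a newline (the equivalent of typing `foobar^D^D`) and when it ends with one (the equivalent of typing `foobar`, Enter, then `^D`). The value `foobar` is only the placeholder used in the chat, and the variable names are illustrative. It does not show what zuul-client does with a trailing newline, only what arrives on its stdin.

```shell
#!/bin/sh
# Placeholder secret taken from the chat; not a real credential.
secret='foobar'

# Like typing foobar^D^D: no trailing newline reaches the reader.
without_newline=$(printf '%s' "$secret" | wc -c | tr -d ' ')

# Like typing foobar<Enter>^D: the newline is part of the input.
with_newline=$(printf '%s\n' "$secret" | wc -c | tr -d ' ')

echo "without newline: $without_newline bytes"
echo "with newline: $with_newline bytes"
```

The one-byte difference is the newline. On a terminal in canonical mode the first `^D` on a non-empty line only flushes the pending characters to the reader, and the second one, arriving with nothing pending, is what the reader sees as end of file.
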