| opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/957995 | 02:43 |
|---|---|---|
| *** liuxie is now known as liushy | 05:47 | |
| *** ykarel_ is now known as ykarel | 07:00 | |
| *** dmellado2 is now known as dmellado | 07:18 | |
| *** iurygregory_ is now known as iurygregory | 13:10 | |
| clarkb | corvus: frickler for https://review.opendev.org/c/openstack/project-config/+/960144 do we want to push a new patchset setting max width to 90 or just go with the current code. I think I'm fine with the current change. I actually set my gerrit diff line width to 80 so the current version is slightly better for me. Also keep in mind this is a one time transition. I'm not sure the | 14:06 |
| clarkb | longer line width helps too much beyond making this initial diff smaller? | 14:06 |
| stephenfin | clarkb: Apologies if I asked/suggested this before, but how hard would it be to set up a pre-commit-mirror org on opendev and automatically mirror popular pre-commit plugin repos there? Would that help address your concerns around missing pip caches? | 14:34 |
| fungi | i'd prefer shorter lines, personally, but just because i use 80-column terminals everywhere | 14:35 |
| fungi | stephenfin: git mirrors, yes? | 14:36 |
| stephenfin | yes | 14:36 |
| fungi | they could be added as remote connection projects in zuul and then pushed to workspaces as required-projects | 14:36 |
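A minimal sketch of the git mirroring primitive such a pre-commit-mirror org would rely on; the upstream repo and the destination org/path are hypothetical, for illustration only:

```shell
# Hypothetical one-time mirror of a popular pre-commit plugin repo.
git clone --mirror https://github.com/pre-commit/pre-commit-hooks
cd pre-commit-hooks.git
# --mirror pushes all refs (branches and tags) and prunes refs deleted
# upstream; the destination URL is an assumed, not real, org path.
git push --mirror https://opendev.org/pre-commit-mirror/pre-commit-hooks
```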
| opendevreview | Merged openstack/project-config master: Line-wrap projects.yaml but strip trailing spaces https://review.opendev.org/c/openstack/project-config/+/960144 | 14:52 |
| *** dhill is now known as Guest26328 | 15:14 | |
| clarkb | stephenfin: I think I'm mostly concerned about it from an "it's bad design and tool choice" standpoint. If projects see failures from github access that they want to mitigate, then using zuul to add the repos as required projects would probably address that | 15:19 |
| fungi | out for a quick lunch and then going to work on more afs/kerberos server upgrades | 15:19 |
| clarkb | I'm going to quickly rerun some of those fio checks just to see if things remain consistent hours later | 15:21 |
| fungi | mirror.centos-stream volume move completed in 16 hours btw, mirror.debian-security has been in progress for a few hours now (since 12:12 utc) | 15:22 |
| fungi | bbiab | 15:22 |
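For context, a sketch of the kind of OpenAFS read/write volume move being described; the source and destination fileservers and the partition name are assumptions, not the actual layout:

```shell
# Move an RW volume between fileservers; large volumes can take hours.
vos move -id mirror.debian-security \
    -fromserver afs02.dfw.openstack.org -frompartition a \
    -toserver afs01.dfw.openstack.org -topartition a -localauth
# Check for in-progress volume transactions on the destination server.
vos status -server afs01.dfw.openstack.org -localauth
```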
| clarkb | mirror.dfw.rax.opendev.org randread: `read: IOPS=19.8k, BW=77.3MiB/s (81.0MB/s)(2318MiB/30003msec)` read: `read: IOPS=142k, BW=556MiB/s (583MB/s)(16.3GiB/30004msec)` | 15:23 |
| clarkb | lists: randread: `read: IOPS=761, BW=3048KiB/s (3121kB/s)(89.6MiB/30105msec)` read: `read: IOPS=20.5k, BW=80.3MiB/s (84.2MB/s)(2413MiB/30059msec)` | 15:24 |
| clarkb | it's interesting that lists' sequential reads are on par with mirrors' random reads | 15:24 |
| clarkb | but also these numbers appear to be consistent with yesterday's. | 15:25 |
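The exact fio flags behind those numbers aren't shown in the log; the following is a sketch of invocations that would produce output in that shape, with block size, file size, iodepth, and ioengine all guessed:

```shell
# 30-second time-based random-read vs. sequential-read comparison; the
# bs/size/iodepth/ioengine values are illustrative, not the flags used.
fio --name=randread --rw=randread --bs=4k --size=2g --iodepth=16 \
    --ioengine=libaio --direct=1 --runtime=30 --time_based
fio --name=read --rw=read --bs=4k --size=2g --iodepth=16 \
    --ioengine=libaio --direct=1 --runtime=30 --time_based
```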
| opendevreview | Elod Illes proposed openstack/project-config master: Make update_constraints.sh work with pyproject.toml https://review.opendev.org/c/openstack/project-config/+/960627 | 15:44 |
| opendevreview | Elod Illes proposed openstack/project-config master: Fix update_constraints.sh for 'python_version>=' cases https://review.opendev.org/c/openstack/project-config/+/960628 | 15:44 |
| clarkb | infra-root I think today is a reasonable day to land https://review.opendev.org/c/opendev/system-config/+/957277 to start the base container image location update process. Any objections? | 15:46 |
| clarkb | we're not going to get everything moved in one day or even a couple so I think we just need to start somewhere and keep pushing on it over a week or three | 15:47 |
| corvus | no objection | 15:51 |
| clarkb | ok I've approved it. It should be in the gate long enough for any last minute objections | 16:17 |
| clarkb | once it lands and things publish I'll send an announcement to service-announce about the move | 16:18 |
| clarkb | then for followup I believe these changes are relatively low impact and can be landed afterwards with some observation: https://review.opendev.org/c/opendev/system-config/+/958480/ https://review.opendev.org/c/opendev/system-config/+/958598/ https://review.opendev.org/c/opendev/system-config/+/958593/ https://review.opendev.org/c/opendev/system-config/+/958594/ | 16:19 |
| clarkb | https://review.opendev.org/c/opendev/grafyaml/+/958601 https://review.opendev.org/c/opendev/lodgeit/+/958602 https://review.opendev.org/c/opendev/gerritbot/+/958600 and https://review.opendev.org/c/opendev/statusbot/+/958603 | 16:20 |
| clarkb | the more difficult changes to make are gerrit and ircbot | 16:28 |
| clarkb | both are more noticeable outages | 16:28 |
| opendevreview | Merged opendev/system-config master: Move python base images back to quay.io https://review.opendev.org/c/opendev/system-config/+/957277 | 16:46 |
| clarkb | once promotion occurs I can get links to the new stuff and send out my announcement email | 16:47 |
| fungi | sounds good | 16:47 |
| clarkb | arg promotion has failed | 16:48 |
| clarkb | https://zuul.opendev.org/t/openstack/build/78293cc7053f432689474bfd2d740a6c | 16:48 |
| clarkb | everything was going so smoothly up to this point | 16:48 |
| fungi | "Parse build response" is not very verbose about failures, huh? | 16:49 |
| clarkb | I think I see the bug | 16:49 |
| clarkb | give me a few to confirm but I should be able to get a patch up that we can land quickly and fix this | 16:50 |
| clarkb | promote_container_image_job: system-config-promote-image-python-builder-3.11-bookworm <- I think that needs to be the upload job not the promote job name | 16:51 |
| clarkb | since the promote job is looking up the artifact from the upload job | 16:51 |
| clarkb | I just want to confirm this | 16:51 |
| fungi | working on upgrading kdc04 at the moment | 16:51 |
| fungi | https://review.opendev.org/959962 indicates noble should work | 16:52 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Fix base image quay.io promotion https://review.opendev.org/c/opendev/system-config/+/960644 | 16:58 |
| clarkb | infra-root ^ I think that should fix the issue with publication to quay. Sorry for that. I'm going to go and check the other changes now | 16:58 |
| clarkb | the first base image move change and the trixie image addition change are the only two with this problem | 17:01 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Build trixie python base container images https://review.opendev.org/c/opendev/system-config/+/958480 | 17:05 |
| clarkb | that should fix it for the trixie image change | 17:05 |
| fungi | having trouble untangling whatever edit you made on that revision from the rebase | 17:22 |
| fungi | but it does seem to be using the upload instead of publish jobs like the base change | 17:24 |
| *** dhill is now known as Guest26333 | 17:24 | |
| fungi | lgtm | 17:24 |
| clarkb | fungi: https://review.opendev.org/c/opendev/system-config/+/958480/2..3 shows it just fine | 17:27 |
| clarkb | fungi: if you want to remove your +A on 958480 I think that would be fine. I kinda want to wait to see things work with the existing images first | 17:28 |
| clarkb | (though at this point it should work so it is probably fine) | 17:29 |
| fungi | ah, gertty mixed in the diff from the rebase | 17:29 |
| fungi | i guess the gerrit webui has gotten better about filtering out files that were only rebased | 17:30 |
| clarkb | fungi: did the kdcs end up back in the emergency file? I guess either way the real concern is the ansible cache files needing to be cleared out | 17:44 |
| fungi | oh, good point, i forgot they'd been taken out | 17:45 |
| fungi | well, they're upgraded now, i guess i just need to clear the cache on bridge | 17:45 |
| clarkb | ya I think the change you merged might trigger jobs against them and/or the daily runs | 17:46 |
| fungi | i've removed /var/cache/ansible/facts/kdc0{3,4}.openstack.org again | 17:46 |
| clarkb | that is why I thought about it | 17:46 |
| fungi | kdcs are both upgraded, reachable and have the requisite cpu count (2 each, smaller flavor, but as many as they had before upgrading) | 17:46 |
| fungi | and i followed https://docs.opendev.org/opendev/system-config/latest/kerberos.html#no-service-outage-server-maintenance again so it should have been transparent | 17:47 |
| fungi | going to follow https://docs.opendev.org/opendev/system-config/latest/afs.html#afsdb0x-openstack-org for the afsdb upgrades now | 17:48 |
| clarkb | *the change you approved. Sorry it hasn't merged just yet | 17:51 |
| fungi | huh, afsdb03 is using netplan instead of ifupdown, must be a more recent addition/rebuild | 17:52 |
| clarkb | I think it was a newer addition when we realized that we should have 3 dbs instead of 2 | 17:53 |
| opendevreview | Merged opendev/system-config master: Fix base image quay.io promotion https://review.opendev.org/c/opendev/system-config/+/960644 | 18:09 |
| clarkb | ok lets see how promotions go this time | 18:09 |
| clarkb | we have our first success so this is looking good | 18:11 |
| clarkb | they have all succeeded | 18:12 |
| clarkb | I think we can let the trixie change remain in the gate and follow up with additional image promotions given this change didn't break anything | 18:12 |
| fungi | and there was much rejoicing | 18:13 |
| clarkb | anything else we want to check before I send an announcement about the move? | 18:13 |
| fungi | nothing comes to mind | 18:14 |
| fungi | at current pace i should have the rest of the afs servers upgraded to noble by the end of my day, except for afs02.dfw which will need to wait until the volume moves are done (probably early next week) | 18:14 |
| opendevreview | Merged opendev/system-config master: Use Noble for our Kerberos servers https://review.opendev.org/c/opendev/system-config/+/959962 | 18:15 |
| fungi | i expect this weekend the first pass on moves will be done and i'll restart any that got skipped due to conflicting write locks or temporary lack of space | 18:15 |
| clarkb | https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/HO6Z66QIMDIDY7CCVAREDOPSYZYNKIT3/ email should be in inboxes now | 18:18 |
| clarkb | fungi: did you want to review the easy list of followup changes to switch consuming images over to the new location? corvus already reviewed them and I think for most of them i'm comfortable +A'ing as I can monitor | 18:19 |
| fungi | indeed it is, just finished reading it | 18:19 |
| clarkb | but happy to have extra eyeballs if you want to take a look | 18:19 |
| clarkb | that list is at 16:19:58 UTC in scrollback | 18:20 |
| fungi | i'll take a quick pass through them while these afs kernel modules build | 18:20 |
| clarkb | I'm happy to hit +A since for most changes this will have at least a small impact on production services | 18:20 |
| fungi | looking at all of hashtag:opendev-on-quay (hopefully that covers them) | 18:21 |
| clarkb | fungi: yup that is a superset. I don't want to move gerrit or ircbot at this time | 18:21 |
| clarkb | both will need coordination as we will want to restart the services they host and that causes disruption | 18:21 |
| fungi | k | 18:22 |
| clarkb | but I think all of the others are relatively safe. A small blip to hound and lodgeit on codesearch and paste isn't a big deal for example | 18:22 |
| clarkb | the change to switch kdc testing to noble did not trigger infra-prod jobs against the kdcs. So it is daily runs that will do that first | 18:23 |
| fungi | what happened to make docker capable of building speculatively for single-arch? | 18:23 |
| clarkb | fungi: that was the change where we switched docker builds to do buildx builds just like multiarch but without the emulation | 18:24 |
| fungi | ah, and that made the fallback work for it with the buildset/intermediate registries? | 18:24 |
| clarkb | the custom buildx builder is capable of it (and was capable of it in multiarch builds, which led to confusion over why it didn't work elsewhere) so we simply made that the builder in all cases | 18:24 |
| clarkb | yes | 18:24 |
| fungi | got it | 18:24 |
| clarkb | that custom builder sets up /etc/hosts for registry access, it configures ssl certs in the builder for registry access, then creates a buildx toml config file that sets up mirrors for each proper registry to go through our buildset registry | 18:25 |
| clarkb | one confusing thing is that buildx documents its config file in a way that implies the default builder should honor it, but testing indicates this isn't the case. If you want to configure buildx with a custom config then you need a custom builder too | 18:26 |
| clarkb | but once we figured all of that out the solution was reasonably straightforward, just do the custom buildx builder for all builds | 18:26 |
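A condensed sketch of that approach, assuming a hypothetical buildset registry host and CA path (the real setup also edits /etc/hosts and installs certs, per the description above):

```shell
# Registry mirror config routing pulls through a buildset registry;
# host, port, and CA path are illustrative assumptions.
cat > /tmp/buildkitd.toml <<'EOF'
[registry."docker.io"]
  mirrors = ["buildset-registry.example:5000"]
[registry."buildset-registry.example:5000"]
  ca = ["/etc/buildkit/buildset-registry.crt"]
EOF
# The default builder ignores this file; a named builder created with an
# explicit --config is needed for the mirrors to take effect.
docker buildx create --name mirrored --config /tmp/buildkitd.toml --use
docker buildx build --load -t example/image:latest .
```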
| fungi | 958598 is going to trigger a deploy for all the gitea backends, right? | 18:29 |
| clarkb | fungi: it might, but that image isn't actually used by gitea anymore so any deployment should noop | 18:29 |
| clarkb | fungi: that gitea-init image is a relic of the k8s attempt that we've been keeping alive for some reason | 18:30 |
| fungi | oh! | 18:33 |
| clarkb | however, that reminds me there is a new gitea release | 18:34 |
| clarkb | let me work on a change for that while I'm thinking about it | 18:35 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Update gitea to 1.24.6 https://review.opendev.org/c/opendev/system-config/+/960675 | 18:38 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Stop mirroring python base images to quay.io/opendevmirror https://review.opendev.org/c/opendev/system-config/+/960676 | 18:45 |
| opendevreview | Merged opendev/system-config master: Build trixie python base container images https://review.opendev.org/c/opendev/system-config/+/958480 | 18:54 |
| opendevreview | Merged opendev/system-config master: Pull hound's base python image from quay https://review.opendev.org/c/opendev/system-config/+/958593 | 18:58 |
| clarkb | hound should deploy with its updated image once the hourly jobs complete | 19:10 |
| clarkb | I'll keep an eye on it | 19:10 |
| fungi | cool, i'm currently in the middle of the afsdb02 upgrade | 19:11 |
| clarkb | hound has restarted and is doing its startup routine | 19:16 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Mirror golang and node trixie container images https://review.opendev.org/c/opendev/system-config/+/960681 | 19:25 |
| opendevreview | Merged opendev/system-config master: Pull python base image for statsd metric reporters from quay.io https://review.opendev.org/c/opendev/system-config/+/958594 | 19:29 |
| opendevreview | Merged opendev/system-config master: Update jinjia-init and gitea-init to modern image build tooling https://review.opendev.org/c/opendev/system-config/+/958598 | 19:29 |
| clarkb | hound seems to work for me. I'll check on gitea-lb and zookeeper deployments for 958594 once that finishes deploying | 19:34 |
| fungi | with afsdb02 i tried fixing the ppa source ahead of the noble upgrade, but do-release-upgrade disables the ppa source anyway so it still requires a follow-up edit and reinstallation of the packages regardless | 19:38 |
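A sketch of that follow-up edit and reinstall, assuming the PPA source file name, a jammy-to-noble upgrade, and a typical OpenAFS client package set:

```shell
# do-release-upgrade comments third-party sources out; re-enable the PPA
# and retarget it at the new release (file and release names assumed).
sudo sed -i -e 's/^# *deb /deb /' -e 's/jammy/noble/g' \
    /etc/apt/sources.list.d/openafs-*.list
sudo apt-get update
sudo apt-get install --reinstall openafs-client openafs-krb5 openafs-modules-dkms
```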
| clarkb | the statsd containers appear to have updated on all four hosts. Grafana graphs appear to have new data at https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&from=now-5m&to=now&timezone=utc and https://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-5m&to=now&timezone=utc | 19:38 |
| clarkb | fungi: that change did trigger a gitea deploy but it should noop so not a big deal | 19:39 |
| fungi | cool | 19:39 |
| clarkb | I need to grab lunch momentarily but so far things seem happy | 19:39 |
| zigo | Keystone release notes are missing from website: https://docs.openstack.org/releasenotes/keystone/unreleased.html | 20:11 |
| zigo | This *always* happens on each release. Something should be done about it, IMO, because that's when I need the release notes the most. :) | 20:11 |
| zigo | It's ok, I can look in the diff and read the sources, but it's not very convenient! :) | 20:12 |
| Clark[m] | I'm not sure what opendev is expected to do about that. It is almost certainly a job configuration or sphinx build problem | 20:14 |
| Clark[m] | Nova doesn't have this problem so ya I don't think it's a systemic infrastructure issue | 20:16 |
| fungi | it comes up a lot with release notes publication for multiple releases on different branches racing each other, because they all share the same file subtree | 20:18 |
| zigo | Well, something's wrong in the way things are getting released, because on each release there are instances of this. So IMO, not sure what, but something should be done so it doesn't happen every 6 months. | 20:18 |
| fungi | this isn't 6 months from the coordinated openstack release btw, it's avoided for the coordinated release by avoiding tagging releases on multiple branches of the same project at the same time | 20:20 |
| fungi | i'm guessing the release team approved a point release and rc change for keystone at the same time | 20:21 |
| fungi | actually looks like https://review.opendev.org/960080 creates a branch and a tag at the same time, so maybe triggers two release notes builds... checking | 20:25 |
| fungi | clicking "view build history" at the top of https://zuul.opendev.org/t/openstack/build/1c8fdd06259f45a3bd0195e38b98760a spins forever for me | 20:29 |
| corvus | works better if you drop the project | 20:30 |
| corvus | https://zuul.opendev.org/t/openstack/builds?job_name=publish-openstack-releasenotes-python3&skip=0 | 20:30 |
| fungi | yeah, i guess i can hand-filter the entries for that project | 20:30 |
| fungi | looks like that tag pipeline build was the only one for keystone around that time, so probably not a publication race | 20:31 |
| fungi | taking it to the #openstack-releases channel because i suspect there's some additional step needed at rc1 tagging and stable branch creation time to create the release notes page for the upcoming release | 20:37 |
| zigo | Same thing for placement. | 20:38 |
| clarkb | fungi: note you may want to put the afs updates here :) | 21:32 |
| fungi | d'oh! | 21:33 |
| fungi | all three afsdb servers are upgraded to noble now and back in the cluster, so that just leaves afs02.dfw to upgrade once the rw volume moves are done, and then residual cleanup and removing the afs servers from the emergency list | 21:34 |
| clarkb | might want to remove one db server and one fileserver first and just ensure ansible doesn't do anything unexpected | 21:35 |
| clarkb | then remove the others after the daily runs. We don't have great coverage of ansible things for afs unfortunately | 21:35 |
| fungi | good idea | 21:47 |
| fungi | i can do that now actually | 21:47 |
| fungi | i've removed afs01.ord and afsdb01 from the emergency file | 21:48 |
| fungi | also i'll clear the fact caches for all the afs servers | 21:48 |
| fungi | i've removed /var/cache/ansible/facts/afs* from bridge now | 21:50 |