Clark[m] | fungi: this weekend ended up being a wash so didn't try to finish up nb04 cleanup. It does look like we have updated images though so that change should be landable now I think | 00:36 |
---|---|---|
opendevreview | Merged openstack/diskimage-builder master: Setup C.UTF-8 as a RPM install lang in yum-minimal https://review.opendev.org/c/openstack/diskimage-builder/+/930932 | 01:25 |
ravlew | Hello, would it be possible to hold clearing the CI/CD instance for kolla-ansible-rocky9 after job is done for my patch? https://review.opendev.org/c/openstack/kolla-ansible/+/904959 | 09:00 |
ravlew | I have tried to deploy it by hand on my own and I am not encountering the issue of unhealthy neutron_ovn_agent. Example here: https://zuul.opendev.org/t/openstack/build/0bc0d7811df34f80bc4e1e7c6c81336b | 09:00 |
opendevreview | Pierre Riteau proposed openstack/project-config master: Cache new cirros 0.6.3 images https://review.opendev.org/c/openstack/project-config/+/931631 | 09:55 |
mnasiadka | clarkb: Since 930932 got merged - how often do you rebuild the images? (just looking how to check if that has fixed anything) | 10:20 |
fungi | Clark[m]: yep, sorry, my workstation rebooted spontaneously over friday night and i lost track of the fact that i was watching that screen session. i agree the delete finished and nb04 has 325G free in /opt now so i'll proceed with the reboot | 12:45 |
fungi | looks like nb04 is on its way back up now | 12:49 |
fungi | losetup now only shows the three loop devices for snapd | 12:51 |
fungi | and the builder is currently creating a ubuntu-jammy-arm64 image | 12:52 |
fungi | #status log Rebooted nb04 to clear leaked loop devices after cleaning up /opt | 12:52 |
opendevstatus | fungi: finished logging | 12:52 |
cardoe | corvus: I'm back from PTO. So no swift SLO issue in rax-flex? | 13:04 |
fungi | ravlew: i've set an autohold and rechecked your change (noting that you're investigating a problem with ovn agent startup on rocky), once the job fails i'll need to know what public ssh key you want me to add to the held node | 13:12 |
fungi | mnasiadka: image rebuild frequency varies depending on how popular we observe certain images to be. see the "rebuild-age" values set/inherited by the diskimages defined here: https://opendev.org/openstack/project-config/src/branch/master/nodepool/nodepool.yaml#L159-L402 | 13:14 |
fungi | could be anywhere from daily to weekly | 13:15 |
fungi | mnasiadka: you can also see the current image build ages at http://nl01.opendev.org/dib-image-list | 13:17 |
fungi | nodepool normally keeps the two most recent successful image builds, so you would want to consider the newer of the two for whatever image(s) you're interested in | 13:18 |
opendevreview | Merged openstack/project-config master: Replace 2024.2/Dalmatian key with 2025.1/Epoxy https://review.opendev.org/c/openstack/project-config/+/927031 | 13:23 |
Clark[m] | mnasiadka: fungi: we consume dib releases in nodepool so need a new dib release first | 13:34 |
cardoe | JayF: clarkb: would like to be involved in the pre-commit convo when you guys have it. | 13:37 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Stop caching CirrOS 0.5.2 and 0.6.1 images https://review.opendev.org/c/openstack/project-config/+/931659 | 13:51 |
*** ykarel_ is now known as ykarel | 14:01 | |
corvus | cardoe: correct, issue appears to be related to sdk, sorry for false alarm | 14:03 |
Clark[m] | Still an open question on acls but we were told to try them and see what happens | 14:20 |
clarkb | I've rechecked https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 now that the centos arm image has been rebuilt. That should pass this time around | 15:01 |
clarkb | gmann: kopecmartin: separately I'd like to land https://review.opendev.org/c/openstack/project-config/+/931320 today to bump the opendev ansible version in zuul to version 9. The openstack release is over and prior testing (as well as other tenants) have shown this should generally work | 15:02 |
opendevreview | Merged openstack/project-config master: Cache new cirros 0.6.3 images https://review.opendev.org/c/openstack/project-config/+/931631 | 15:10 |
clarkb | python3.13 has been released | 15:15 |
clarkb | I wonder how long it will be for all the "your software breaks if the gil is disabled" bugs to roll in | 15:16 |
opendevreview | Takashi Kajinami proposed opendev/system-config master: Mirror puppetlabs packages for Ubuntu Noble https://review.opendev.org/c/opendev/system-config/+/931670 | 15:17 |
fungi | clarkb: not tagged yet. it's still on rc3 as of a week ago | 15:18 |
clarkb | fungi: ah yup I guess it releases today and some people have jumped the gun on calling it released | 15:19 |
fungi | there was some discussion about extending testing with an additional rc in order to roll back a new feature which wasn't quite there performance-wise yet | 15:19 |
clarkb | no claimed performance benefits this time around. Seems to be more of a stage setting release for the next round of effort around performance (GIL removal and JIT usage) | 15:19 |
fungi | https://discuss.python.org/t/incremental-gc-and-pushing-back-the-3-13-0-release/65285 | 15:20 |
fungi | the incremental gc turned out to be problematic for some things | 15:21 |
fungi | and *there's* the tag | 15:24 |
fungi | compiling now | 15:24 |
clarkb | paraphrasing: object count is a poor proxy for memory consumption. <- that does seem like something that would have drastically different performance behaviors depending on application | 15:24 |
fungi | agreed | 15:26 |
clarkb | 926970 is failing on x86 rpm builds now. https://zuul.opendev.org/t/openstack/build/c0ee5fa387294d428dc9e951471e3a8d it almost looks like the issue here is not the kernel being out of date on our image but that the version of openafs may not compile against whatever kernel is in centos 9 stream now? | 15:55 |
clarkb | a redeclaration of a function from abort() in linux vs the declaration in openafs? | 15:56 |
clarkb | ya that seems to be the issue `void abort(void);` vs `static_inline void abort(void);` | 15:57 |
* clarkb looks to see where we get our openafs from in that job to see if it can be updated | 15:58 | |
clarkb | https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/931681 has been pushed in an attempt to solve this problem | 16:04 |
opendevreview | Brian Haley proposed openstack/project-config master: Update the Neutron grafana dashboards https://review.opendev.org/c/openstack/project-config/+/931682 | 16:14 |
clarkb | updating the openafs version did fix the compile error on x86. I've rebased the linter update change onto the openafs version bump change as a result. That said arm builds are still failing and it looks like maybe we've got a stale kernel again. Maybe we raced upstream updates | 16:38 |
clarkb | looking at centos 9 stream packages the last kernel update was on the 27th of september. Maybe this was a stale ready node or maybe our images are built against a stale mirror? I'm digging a bit more | 16:44 |
clarkb | oh maybe we haven't uploaded the image yet but it built? | 16:44 |
clarkb | yes I think that is the issue, the uploads must be failing | 16:45 |
clarkb | RemoteDisconnected('Remote end closed connection without response') | 16:47 |
clarkb | this is the error we're getting when trying to upload images | 16:47 |
clarkb | that explains the issue anyway. | 16:49 |
clarkb | Ramereth[m]: ^ fyi there appears to be some issue connecting to upload images to glance? I don't see anything telling on our side like invalid cert or similar. Maybe it is more obvious on your end? | 16:52 |
Ramereth[m] | checking | 16:52 |
Ramereth[m] | clarkb: try now | 16:55 |
clarkb | Ramereth[m]: thanks I think it is proceeding now (previously it took about a second to get the error and now there is an upload in process for a couple minutes or so) | 16:57 |
Ramereth[m] | for whatever reason the glance-api was in a segfault condition and I had to restart the service | 16:57 |
clarkb | that would do it. Thank you for checking and fixing it so quickly | 16:57 |
Ramereth[m] | odd that checking the healthcheck endpoint didn't return an error.. | 16:57 |
opendevreview | Mohammed Naser proposed zuul/zuul-jobs master: Add a method to disable multiarch image https://review.opendev.org/c/zuul/zuul-jobs/+/931578 | 17:29 |
clarkb | and we just successfully uploaded an image | 17:33 |
fungi | yay! | 17:33 |
opendevreview | Merged openstack/project-config master: Update the Neutron grafana dashboards https://review.opendev.org/c/openstack/project-config/+/931682 | 17:45 |
fungi | speaking of python 3.13, if anyone spends a lot of time fiddling around in idle you should check out the improvements there (e.g. multi-line editing). also exceptions with additional guidance like "NameError: name 're' is not defined. Did you forget to import 're'?" | 18:21 |
fungi | oh, though those exception strings actually showed up in 3.12 | 18:22 |
clarkb | the debugging messages have gotten better since like 3.10 too I think | 18:29 |
clarkb | incrementally | 18:29 |
clarkb | anything new to add to the meeting agenda? I'll clean up the release related stuff. Seems like we're in the lull after openstack's release (a good thing!) | 18:34 |
fungi | yeah, the only outstanding changes i have are cirros image cleanup and mailman version update | 18:41 |
clarkb | oh mm3 update is worth adding /me makes a note | 18:42 |
clarkb | I suspect we can just go ahead with the cirros cleanup if no one else reviews it today (I already +2'd it) | 18:42 |
fungi | yep, thanks! | 18:45 |
fungi | #status log Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org reducing volume utilization from 91% to 75% | 18:53 |
opendevstatus | fungi: finished logging | 18:53 |
fungi | that's 19 days from the last prune, it started alerting yesterday at 18 days though | 18:53 |
fungi | that's about as long as the previous interval | 18:54 |
Clark[m] | Eating lunch and wondering if there are any servers we have backups for that could be removed at this point maybe? | 19:18 |
clarkb | fungi: ask01, ethercalc02, etherpad01, gitea01, lists, review-dev01, and review01 all have backup dirs. Of those I feel like maybe ask01, ethercalc02, etherpad01, review-dev01 are safe to clean up? | 19:32 |
clarkb | the others are also theoretically safe unless we need to go deep into the backups and lists/review are less ephemeral than the other stuff so seem more likely to matter for that? | 19:32 |
fungi | i'd be fine dropping all of them at this point, we took server snapshots before decommissioning as well | 19:43 |
clarkb | I think each directory is a completely independent backup system so in theory we can just remove the top level dir then also remove the login credentials for the corresponding server | 19:49 |
clarkb | I can't remember if it is a separate user per server or just a separate key per server but we'd clean that up as well as the backup target dir | 19:50 |
clarkb | maybe we pick the lowest risk server of the bunch (ask?) and work through it? I'll make a note for the meeting tomorrow to bring this up | 19:50 |
clarkb | another approach (the one we've used historically I think) is to attach a new target volume and let the old volume die on the vine | 19:51 |
fungi | "The backup server has a unique Unix user for each host to be backed up." https://docs.opendev.org/opendev/system-config/latest/sysadmin.html#backups | 19:55 |
clarkb | cool so the user + key material + the target dir could be cleaned up. Or as mentioned we can move the volume to the side and make a new volume | 19:55 |
clarkb | I'll mention that in the meeting agenda when I put it together | 19:56 |
fungi | and looking in /etc/passwd on a backup server confirms that too, e.g. the borg-gitea01 user entry | 19:56 |
clarkb | for timing on the ansible 9 default bump in opendev zuul tenants maybe we tell the openstack tc meeting it is happening tomorrow then merge it during our meeting or around that timing? | 20:13 |
corvus | no objection | 20:16 |
fungi | wfm | 20:16 |
clarkb | I've updated the meeting agenda to include the ansible 9 switch, mm3 upgrade, backup server pruning, and removed the openstack release topic | 20:48 |
opendevreview | Mohammed Naser proposed zuul/zuul-jobs master: Add a method to disable multiarch image https://review.opendev.org/c/zuul/zuul-jobs/+/931578 | 20:48 |
fungi | thanks! | 20:50 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: WIP: testing https://review.opendev.org/c/opendev/zuul-jobs/+/931347 | 21:42 |
opendevreview | James E. Blair proposed opendev/zuul-jobs master: Finish upload job https://review.opendev.org/c/opendev/zuul-jobs/+/931355 | 21:55 |
corvus | clarkb: fungi ^ okay that's reworked to use the swift cli instead of openstacksdk; testing indicates that works, so i think that's safe to re-review and approve | 21:56 |
clarkb | corvus: out of curiosity is there a reason you use python-swiftclient instead of openstackclient? I'm wondering if that should be recorded if there is a specific need for that | 22:06 |
corvus | clarkb: i was under the impression that's what provides the 'swift' cli program | 22:11 |
corvus | perhaps there's something like an "openstack object upload" command, but if so, i suspect it would have the same issues with slo that the python code path encountered | 22:12 |
corvus | i did not test it | 22:13 |
corvus | okay, i'm sure this would be a fascinating rabbit hole to dive into, but i'm not going to right now. but to summarize: | 22:19 |
corvus | python3-openstackclient installs 155 packages and does not provide the swift binary | 22:20 |
corvus | python3-swiftclient installs 119 packages and does | 22:20 |
fungi | yeah, i think openstackclient has transitioned to using openstacksdk for swift interactions (so does not depend on swiftclient any longer) but you found a regression that has resulted in needing swiftclient instead | 22:26 |
fungi | regression or lack of feature parity, not sure which (though seems more of the former than the latter) | 22:27 |
opendevreview | Mohammed Naser proposed zuul/zuul-jobs master: Stop using temporary registry https://review.opendev.org/c/zuul/zuul-jobs/+/931713 | 22:29 |
opendevreview | Mohammed Naser proposed zuul/zuul-jobs master: Stop using temporary registry https://review.opendev.org/c/zuul/zuul-jobs/+/931713 | 22:36 |
Clark[m] | fungi: ah ok that's something I wasn't sure of. In that case using swiftclient makes a lot of sense | 23:45 |
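
The image-rebuild discussion above (nodepool keeps the two most recent successful builds per image, and a fix only lands once a dib release containing it is consumed) can be sketched roughly as follows. This is a minimal illustration only: the record layout, image names, build ids, and timestamps are all hypothetical and do not reflect the actual dib-image-list output format.

```python
from datetime import datetime, timezone


def newest_builds(records):
    """Return the most recent build record for each image name."""
    newest = {}
    for rec in records:
        current = newest.get(rec["image"])
        if current is None or rec["built"] > current["built"]:
            newest[rec["image"]] = rec
    return newest


def built_after(records, image, cutoff):
    """True if the newest build of `image` was created after `cutoff`."""
    rec = newest_builds(records).get(image)
    return rec is not None and rec["built"] > cutoff


# Hypothetical data: image names, build ids, and timestamps are invented.
BUILDS = [
    {"image": "example-image-a", "build": "0001",
     "built": datetime(2024, 10, 1, 6, 0, tzinfo=timezone.utc)},
    {"image": "example-image-a", "build": "0002",
     "built": datetime(2024, 10, 7, 6, 0, tzinfo=timezone.utc)},
    {"image": "example-image-b", "build": "0001",
     "built": datetime(2024, 10, 2, 6, 0, tzinfo=timezone.utc)},
]

# Hypothetical cutoff, e.g. when a tool release carrying the fix was published.
CUTOFF = datetime(2024, 10, 5, 0, 0, tzinfo=timezone.utc)

if __name__ == "__main__":
    for name in sorted(newest_builds(BUILDS)):
        print(name, built_after(BUILDS, name, CUTOFF))
```

Only the newer of the two retained builds is compared against the cutoff, which matches the advice to "consider the newer of the two". Per the note about dib releases, the cutoff would presumably be the release date rather than the merge time of the change.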