Monday, 2024-10-07

Clark[m]fungi: this weekend ended up being a wash so didn't try to finish up nb04 cleanup. It does look like we have updated images though so that change should be landable now I think00:36
opendevreviewMerged openstack/diskimage-builder master: Setup C.UTF-8 as a RPM install lang in yum-minimal  https://review.opendev.org/c/openstack/diskimage-builder/+/93093201:25
ravlewHello, would it be possible to hold clearing the CI/CD instance for kolla-ansible-rocky9 after job is done for my patch? https://review.opendev.org/c/openstack/kolla-ansible/+/90495909:00
ravlewI have tried to deploy it by hand on my own and I am not encountering the issue of unhealthy neutron_ovn_agent. Example here: https://zuul.opendev.org/t/openstack/build/0bc0d7811df34f80bc4e1e7c6c81336b09:00
opendevreviewPierre Riteau proposed openstack/project-config master: Cache new cirros 0.6.3 images  https://review.opendev.org/c/openstack/project-config/+/93163109:55
mnasiadkaclarkb: Since 930932 got merged - how often do you rebuild the images? (just looking how to check if that has fixed anything)10:20
fungiClark[m]: yep, sorry, my workstation rebooted spontaneously over friday night and i lost track of the fact that i was watching that screen session. i agree the delete finished and nb04 has 325G free in /opt now so i'll proceed with the reboot12:45
fungilooks like nb04 is on its way back up now12:49
fungilosetup now only shows the three loop devices for snapd12:51
fungiand the builder is currently creating a ubuntu-jammy-arm64 image12:52
fungi#status log Rebooted nb04 to clear leaked loop devices after cleaning up /opt12:52
opendevstatusfungi: finished logging12:52
cardoecorvus: I'm back from PTO. So no swift SLO issue in rax-flex?13:04
fungiravlew: i've set an autohold and rechecked your change (noting that you're investigating a problem with ovn agent startup on rocky), once the job fails i'll need to know what public ssh key you want me to add to the held node13:12
fungimnasiadka: image rebuild frequency varies depending on how popular we observe certain images to be. see the "rebuild-age" values set/inherited by the diskimages defined here: https://opendev.org/openstack/project-config/src/branch/master/nodepool/nodepool.yaml#L159-L40213:14
fungicould be anywhere from daily to weekly13:15
fungimnasiadka: you can also see the current image build ages at http://nl01.opendev.org/dib-image-list13:17
funginodepool normally keeps the two most recent successful image builds, so you would want to consider the newer of the two for whatever image(s) you're interested in13:18
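As a rough illustration of fungi's point about considering the newer of the two retained builds, here is a minimal sketch. The record layout, image names, build IDs and timestamps are all hypothetical; they are not nodepool's schema or real data from the dib-image-list page.

```python
from datetime import datetime, timezone

# Hypothetical build records; field names and values are illustrative only
# and are not nodepool's actual schema or real build data.
BUILDS = [
    {"image": "example-image", "build": "0000000001", "state": "ready",
     "built_at": datetime(2024, 10, 1, 6, 0, tzinfo=timezone.utc)},
    {"image": "example-image", "build": "0000000002", "state": "ready",
     "built_at": datetime(2024, 10, 7, 6, 0, tzinfo=timezone.utc)},
    {"image": "other-image", "build": "0000000003", "state": "building",
     "built_at": datetime(2024, 10, 7, 12, 0, tzinfo=timezone.utc)},
]


def newest_ready_builds(builds):
    """Return {image name: newest build record in the 'ready' state}."""
    newest = {}
    for build in builds:
        if build["state"] != "ready":
            continue  # skip builds that have not finished successfully
        current = newest.get(build["image"])
        if current is None or build["built_at"] > current["built_at"]:
            newest[build["image"]] = build
    return newest


def built_after(build, cutoff):
    """True if the build finished after the given cutoff time."""
    return build["built_at"] > cutoff
```

For a question like mnasiadka's, the cutoff would be the time the fix actually became available to the builders, not just the time the change merged.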
opendevreviewMerged openstack/project-config master: Replace 2024.2/Dalmatian key with 2025.1/Epoxy  https://review.opendev.org/c/openstack/project-config/+/92703113:23
Clark[m]mnasiadka: fungi: we consume dib releases in nodepool so need a new dib release first13:34
cardoeJayF: clarkb: would like to be involved in the pre-commit convo when you guys have it.13:37
opendevreviewJeremy Stanley proposed openstack/project-config master: Stop caching CirrOS 0.5.2 and 0.6.1 images  https://review.opendev.org/c/openstack/project-config/+/93165913:51
*** ykarel_ is now known as ykarel14:01
corvuscardoe: correct, issue appears to be related to sdk, sorry for false alarm14:03
Clark[m]Still an open question on acls but we were told to try them and see what happens 14:20
clarkbI've rechecked https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 now that the centos arm image has been rebuilt. That should pass this time around15:01
clarkbgmann: kopecmartin: separately I'd like to land https://review.opendev.org/c/openstack/project-config/+/931320 today to bump the opendev ansible version in zuul to version 9. The openstack release is over and prior testing (as well as other tenants) have shown this should generally work15:02
opendevreviewMerged openstack/project-config master: Cache new cirros 0.6.3 images  https://review.opendev.org/c/openstack/project-config/+/93163115:10
clarkbpython3.13 has been released15:15
clarkbI wonder how long it will be for all the "your software breaks if the gil is disabled" bugs to roll in15:16
opendevreviewTakashi Kajinami proposed opendev/system-config master: Mirror puppetlabs packages for Ubuntu Noble  https://review.opendev.org/c/opendev/system-config/+/93167015:17
fungiclarkb: not tagged yet. it's still on rc3 as of a week ago15:18
clarkbfungi: ah yup I guess it releases today and some people have jumped the gun on calling it released15:19
fungithere was some discussion about extending testing with an additional rc in order to roll back a new feature which wasn't quite there performance-wise yet15:19
clarkbno claimed performance benefits this time around. Seems to be more of a stage setting release for the next round of effort around performance (GIL removal and JIT usage)15:19
fungihttps://discuss.python.org/t/incremental-gc-and-pushing-back-the-3-13-0-release/6528515:20
fungithe incremental gc turned out to be problematic for some things15:21
fungiand *there's* the tag15:24
fungicompiling now15:24
clarkbparaphrasing: object count is a poor proxy for memory consumption. <- that does seem like something that would have drastically different performance behaviors depending on application15:24
fungiagreed15:26
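A small sketch of the "object count is a poor proxy for memory consumption" point: two lists with the same number of objects but very different footprints. This only illustrates the general idea with `sys.getsizeof`; it is not how CPython's collector measures anything, and the sizes chosen are arbitrary.

```python
import sys


def total_size(objs):
    """Sum the shallow size in bytes of each object, as reported by sys.getsizeof."""
    return sum(sys.getsizeof(o) for o in objs)


# Same object count, very different memory use (sizes chosen arbitrarily).
small = [bytearray(16) for _ in range(100)]
large = [bytearray(100_000) for _ in range(100)]

print(len(small), total_size(small))
print(len(large), total_size(large))
```

A heuristic keyed only on the number of objects would treat both lists alike, which is why behavior could differ so much from one application to another.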
clarkb926970 is failing on x86 rpm builds now. https://zuul.opendev.org/t/openstack/build/c0ee5fa387294d428dc9e951471e3a8d it almost looks like the issue here is not the kernel being out of date on our image but that the version of openafs may not compile against whatever kernel is in centos 9 stream now?15:55
clarkba redeclaration of a function from abort() in linux vs the declaration in openafs?15:56
clarkbya that seems to be the issue `void abort(void);` vs `static_inline void abort(void);`15:57
* clarkb looks to see where we get our openafs from in that job to see if it can be updated15:58
clarkbhttps://review.opendev.org/c/openstack/openstack-zuul-jobs/+/931681 has been pushed in an attempt to solve this problem16:04
opendevreviewBrian Haley proposed openstack/project-config master: Update the Neutron grafana dashboards  https://review.opendev.org/c/openstack/project-config/+/93168216:14
clarkbupdating the openafs version did fix the compile error on x86. I've rebased the linter update change onto the openafs version bump change as a result. That said arm builds are still failing and it looks like maybe we've got a stale kernel again. Maybe we raced upstream updates16:38
clarkblooking at centos 9 stream packages the last kernel update was on the 27th of september. Maybe this was a stale ready node or maybe our images are built against a stale mirror? I'm digging a bit more16:44
clarkboh maybe we haven't uploaded the image yet but it built?16:44
clarkbyes I think that is the issue, the uploads must be failing16:45
clarkbRemoteDisconnected('Remote end closed connection without response')16:47
clarkbthis is the error we're getting when trying to upload images16:47
clarkbthat explains the issue anyway.16:49
clarkbRamereth[m]: ^ fyi there appears to be some issue connecting to upload images to glance? I don't see anything telling on our side like invalid cert or similar. Maybe it is more obvious on your end?16:52
Ramereth[m]checking16:52
Ramereth[m]clarkb: try now16:55
clarkbRamereth[m]: thanks I think it is proceeding now (previously it took about a second to get the error and now there is an upload in process for a couple minutes or so)16:57
Ramereth[m]for whatever reason the glance-api was in a segfault condition and I had to restart the service16:57
clarkbthat would do it. Thank you for checking and fixing it so quickly16:57
Ramereth[m]odd that checking the healthcheck endpoint didn't return an error..16:57
opendevreviewMohammed Naser proposed zuul/zuul-jobs master: Add a method to disable multiarch image  https://review.opendev.org/c/zuul/zuul-jobs/+/93157817:29
clarkband we just successfully uploaded an image17:33
fungiyay!17:33
opendevreviewMerged openstack/project-config master: Update the Neutron grafana dashboards  https://review.opendev.org/c/openstack/project-config/+/93168217:45
fungispeaking of python 3.13, if anyone spends a lot of time fiddling around in idle you should check out the improvements there (e.g. multi-line editing). also exceptions with additional guidance like "NameError: name 're' is not defined. Did you forget to import 're'?"18:21
fungioh, though those exception strings actually showed up in 3.1218:22
clarkbthe debugging messages have gotten better since like 3.10 too I think18:29
clarkbincrementally18:29
clarkbanything new to add to the meeting agenda? I'll clean up the release related stuff. Seems like we're in the lull after openstack's release (a good thing!)18:34
fungiyeah, the only outstanding changes i have are cirros image cleanup and mailman version update18:41
clarkboh mm3 update is worth adding /me makes a note18:42
clarkbI suspect we can just go ahead with the cirros cleanup if no one else reviews it today (I already +2'd it)18:42
fungiyep, thanks!18:45
fungi#status log Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org reducing volume utilization from 91% to 75%18:53
opendevstatusfungi: finished logging18:53
fungithat's 19 days from the last prune, it started alerting yesterday at 18 days though18:53
fungithat's about as long as the previous interval18:54
Clark[m]Eating lunch and wondering if there are any servers we have backups for that could be removed at this point maybe?19:18
clarkbfungi: ask01, ethercalc02, etherpad01, gitea01, lists, review-dev01, and review01 all have backup dirs. Of those I feel like maybe ask01, ethercalc02, etherpad01, review-dev01 are safe to clean up?19:32
clarkbthe others are also theoretically safe unless we need to go deep into the backups and lists/review are less ephemeral than the other stuff so seem more likely to matter for that?19:32
fungii'd be fine dropping all of them at this point, we took server snapshots before decommissioning as well19:43
clarkbI think each directory is a completely independent backup system so in theory we can just remove the top level dir then also remove the login credentials for the corresponding server19:49
clarkbI can't remember if it is a separate user per server or just a separate key per server but we'd clean that up as well as the backup target dir19:50
clarkbmaybe we pick the lowest risk server of the bunch (ask?) and work through it? I'll make a note for the meeting tomorrow to bring this up19:50
clarkbanother approach (the one we've used historically I think) is to attach a new target volume and let the old volume die on the vine19:51
fungi"The backup server has a unique Unix user for each host to be backed up." https://docs.opendev.org/opendev/system-config/latest/sysadmin.html#backups19:55
clarkbcool so the user + key material + the target dir could be cleaned up. Or as mentioned we can move the volume to the side and make a new volume19:55
clarkbI'll mention that in the meeting agenda when I put it together19:56
fungiand looking in /etc/passwd on a backup server confirms that too, e.g. the borg-gitea01 user entry19:56
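To make the per-host user check concrete, here is a minimal sketch that lists `borg-` users from passwd-format text and matches them against the retired servers clarkb named. Only `borg-gitea01` is confirmed in the discussion; the assumption that every host follows the same `borg-<host>` pattern, and all UIDs, home directories and shells in the sample, are hypothetical.

```python
# Retired servers mentioned in the discussion as still having backup dirs.
RETIRED = ["ask01", "ethercalc02", "etherpad01", "gitea01",
           "lists", "review-dev01", "review01"]

# Hypothetical passwd content; UIDs, home directories and shells are made up.
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
borg-gitea01:x:1001:1001::/home/borg-gitea01:/bin/bash
borg-ask01:x:1002:1002::/home/borg-ask01:/bin/bash
borg-example99:x:1003:1003::/home/borg-example99:/bin/bash
"""


def borg_users(passwd_text):
    """Return login names starting with 'borg-' from passwd-format text."""
    users = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        name = line.split(":", 1)[0]
        if name.startswith("borg-"):
            users.append(name)
    return users


def cleanup_candidates(passwd_text, retired):
    """Return borg users whose host suffix is in the retired list."""
    retired_set = set(retired)
    return [u for u in borg_users(passwd_text)
            if u[len("borg-"):] in retired_set]
```

On a real backup server the input would come from reading `/etc/passwd`, and the result would only be a starting list for manual review before removing any user, key material or target directory.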
clarkbfor timing on the ansible 9 default bump in opendev zuul tenants maybe we tell the openstack tc meeting it is happening tomorrow then merge it during our meeting or around that timing?20:13
corvusno objection20:16
fungiwfm20:16
clarkbI've updated the meeting agenda to include the ansible 9 switch, mm3 upgrade, backup server pruning, and removed the openstack release topic20:48
opendevreviewMohammed Naser proposed zuul/zuul-jobs master: Add a method to disable multiarch image  https://review.opendev.org/c/zuul/zuul-jobs/+/93157820:48
fungithanks!20:50
opendevreviewJames E. Blair proposed opendev/zuul-jobs master: WIP: testing  https://review.opendev.org/c/opendev/zuul-jobs/+/93134721:42
opendevreviewJames E. Blair proposed opendev/zuul-jobs master: Finish upload job  https://review.opendev.org/c/opendev/zuul-jobs/+/93135521:55
corvusclarkb: fungi ^ okay that's reworked to use the swift cli instead of openstacksdk; testing indicates that works, so i think that's safe to re-review and approve21:56
clarkbcorvus: out of curiosity is there a reason you use python-swiftclient instead of openstackclient? I'm wondering if that should be recorded if there is a specific need for that22:06
corvusclarkb: i was under the impression that's what provides the 'swift' cli program22:11
corvusperhaps there's something like an "openstack object upload" command, but if so, i suspect it would have the same issues with slo that the python code path encountered22:12
corvusi did not test it22:13
corvusokay, i'm sure this would be a fascinating rabbit hole to dive into, but i'm not going to right now.  but to summarize:22:19
corvuspython3-openstackclient installs 155 packages and does not provide the swift binary22:20
corvuspython3-swiftclient installs 119 packages and does22:20
fungiyeah, i think openstackclient has transitioned to using openstacksdk for swift interactions (so does not depend on swiftclient any longer) but you found a regression that has resulted in needing swiftclient instead22:26
fungiregression or lack of feature parity, not sure which (though seems more of the former than the latter)22:27
opendevreviewMohammed Naser proposed zuul/zuul-jobs master: Stop using temporary registry  https://review.opendev.org/c/zuul/zuul-jobs/+/93171322:29
opendevreviewMohammed Naser proposed zuul/zuul-jobs master: Stop using temporary registry  https://review.opendev.org/c/zuul/zuul-jobs/+/93171322:36
Clark[m]fungi: ah ok that's something I wasn't sure of. In that case using swiftclient makes a lot of sense23:45
