Tuesday, 2025-05-27

opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404509:42
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404509:45
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404510:30
opendevreviewJeremy Stanley proposed openstack/project-config master: Revert "Temporarily require Signed-Off-By in the sandbox"  https://review.opendev.org/c/openstack/project-config/+/95099713:40
opendevreviewJeremy Stanley proposed openstack/project-config master: Replace OpenStack's CLA enforcement with the DCO  https://review.opendev.org/c/openstack/project-config/+/95099813:43
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404514:15
clarkbmnasiadka: tonyb  re https://review.opendev.org/c/openstack/diskimage-builder/+/934045/61/diskimage_builder/elements/sysprep/bin/extract-image that is almost certainly a bug in centos 10 stream's images. In particular they are mislabeling / as /boot with the partition uuid. If we confirm that is the case (the changes and testing seem to confirm but I don't think anyone has manually14:16
clarkbchecked) my personal preference would be to not update extract-image and file a bug with centos. We can support centos 10 stream via the minimal element in the mean time14:16
mnasiadkait is the case14:17
mnasiadkathere are three partitions in the image, MBR, root (labelled as boot looking at TYPE) and EFI vfat partition14:18
clarkbthen ya I don't think we should be bending over backwards to fix upstream. Upstream should be fixed14:19
mnasiadkasee https://paste.openstack.org/show/bzMZfu3yBYH615g03DtM/ (NBD mounted centos stream 10 cloud image qcow2)14:19
clarkbparticularly since we already have an alternative (the minimal element)14:19
mnasiadkaOk, I'll wait for Zuul jobs to finish (just to check if what I did in extract-image works) - and then remove centos element testing from functests for stream 1014:21
mnasiadkaAnd file a bug in centos14:21
clarkbcool I think that is the ideal. If upstream insists that labeling / as /boot is not a bug then I guess we can reassess14:21
clarkbhowever, / has specific uuids so I think this is difficult to dismiss14:22
mnasiadkaI can split the fix out to a separate patch so we don't lose it but mark it as WIP14:23
clarkbdid anyone check the couple of reports of gerrit being inaccessible over the weekend? Probably just need to check server and container uptimes (though the container isn't supposed to auto restart so I would be really surprised if either of those stopped and started again)14:38
fungii saw them but haven't checked on the server14:41
fungimight be worth getting on cacti and seeing if there are any gaps in reporting14:41
clarkblooking at root email a number of backups all failed on may 25 at 22:00UTC as well I wonder if those overlap?14:46
clarkboh I got the timestamp wrong I think my mail client showed me local times. But the email content shows utc as may 26 0500UTC ish14:47
clarkbwhich does seem to overlap with the #openstack-infra message. I suspect this was a cloud level networking problem14:47
mnasiadkaclarkb: I had an issue yesterday in my morning, connection time outs - but then it fixed itself - it was 7:45 AM CEST14:58
clarkbmnasiadka: ya I think that lines up with this batch of emails for failed backups to a server in the same cloud region.15:00
clarkbSo I suspect the error was not in the service itself but in the network delivery for the server (and that cloud region)15:00
clarkbfirst email was 0505UTC and last was 0557UTC15:02
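
The overlap discussed above can be cross-checked with a short Python sketch. It assumes the 7:45 AM CEST report refers to 2025-05-26 (the day before this log) and models CEST as a fixed UTC+2 offset; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST modelled as a fixed UTC+2 offset.
CEST = timezone(timedelta(hours=2))

# Connection timeouts reported at 7:45 AM CEST (assumed date: 2025-05-26).
report_local = datetime(2025, 5, 26, 7, 45, tzinfo=CEST)
report_utc = report_local.astimezone(timezone.utc)

# First and last failed-backup emails, as quoted in the log (UTC).
first_failure = datetime(2025, 5, 26, 5, 5, tzinfo=timezone.utc)
last_failure = datetime(2025, 5, 26, 5, 57, tzinfo=timezone.utc)

# True when the user report falls inside the backup failure window.
overlaps = first_failure <= report_utc <= last_failure
print(report_utc.isoformat(), overlaps)
```

Under those assumptions the report converts to 05:45 UTC, which sits inside the 05:05 to 05:57 UTC window of failed backups.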
opendevreviewJeremy Stanley proposed openstack/project-config master: Add missing ACL inheritance for openstack/grian-ui  https://review.opendev.org/c/openstack/project-config/+/95101015:03
clarkbmade some quick edits to the meeting agenda. Going to send that out in the next 5-10 minutes since it is late due to yesterday being a holiday15:56
clarkblet me know if there is anything else to add15:56
clarkbagenda has been sent. Sorry about the delay16:08
clarkbI've checked review03 and docker ps -a and uptime output don't indicate anything unexpected16:21
opendevreviewJeremy Stanley proposed openstack/project-config master: Replace OpenStack's CLA enforcement with the DCO  https://review.opendev.org/c/openstack/project-config/+/95099816:21
opendevreviewJeremy Stanley proposed openstack/project-config master: Deduplicate OpenStack ACL content  https://review.opendev.org/c/openstack/project-config/+/95101416:21
clarkbI think the issue was networking within the cloud16:21
clarkbor to the cloud16:21
fungimore public git browsers crumbling under the load of llm training crawlers, i'd bet: https://lists.openinfra.org/archives/list/nordix@lists.openinfra.org/thread/H3VB7BE727MGRACNXTFQWTGU6FAIAXW2/16:51
clarkbmnasiadka: ok I've +2'd https://review.opendev.org/c/opendev/glean/+/941672 . As mentioned in my followup there, I think that we can proceed with the manual case and if it doesn't work as expected just let people using that case say something16:55
clarkbthe risk should be small since it is for an entirely new platform so we shouldn't have existing users out there16:55
clarkbmnasiadka: also looks like the rocky zuul-providers change had POST_FAILUREs due to swift upload problems. Do we know what happened there? cc corvus16:56
corvushrm, what was the last post_failure issue we fixed?17:07
clarkbI feel like we had issues around uploads on the refetch side not being able to uncompress/inflate things. But that was a while ago and I don't think that is this issue17:08
clarkbcorvus: but I wonder if maybe this is related to the image type change perhaps?17:08
clarkblike maybe the upload list doesn't correspond to the build list? (I haven't looked closely I just know I updated the job defs to get the image types from zuulconfig)17:09
corvuswell, https://review.opendev.org/949944 didn't merge because of... post_failures, so that shouldn't be at play :)17:09
clarkboh heh I missed that too17:09
clarkbok that discards that theory17:10
corvusit was around the time where we were talking about removing no_log17:10
corvusthen i took a guess at the problem and it turned out to be right and we moved on17:10
corvusi just forgot what that was, and want to double check we're not repeating ourselves :)17:11
clarkbI think that was debugged when I was in texas so I didn't follow it too closely17:11
clarkbmnasiadka: frickler  ^ do you remember?17:11
corvusyep that was the time17:11
corvushttps://meetings.opendev.org/irclogs/%23opendev/%23opendev.2025-05-07.log.html#opendev.2025-05-07.log.html#t2025-05-07T16:07:4117:12
corvusthe issue was "building on the wrong node"17:13
corvusclarkb: and it was you and i together that guessed it :)17:13
corvuslooks like those are all happening on noble, so it's not that issue.17:15
clarkbit is also affecting arm and x8617:16
corvushttps://zuul.opendev.org/t/opendev/buildsets suggests that uploads are generally working17:17
corvusoh but this wasn't today17:17
corvusthis failed on the 23rd17:17
corvushttps://zuul.opendev.org/t/opendev/buildset/42d557ed15b142db9b5c091d8547409617:17
corvusand the periodic builds all failed then too17:17
corvusi think that points to "cloud had a bad day"17:17
corvusand we should just recheck17:17
clarkbya or maybe some dependency made a bad release then followed up with some sort of fix17:18
corvus(it had a bad day, but it's feeling better now)17:18
corvusthat too17:18
corvusi did a recheck.  let's let that merge, then we can try the zuul image types change again17:18
corvuswhich failed the day before... so maybe it was 2 bad days?17:19
corvushttps://zuul.opendev.org/t/opendev/buildset/cc016a4db5324d4ba9fb4b297e3f9f14 says yes17:19
clarkbwfm17:19
clarkbcorvus: while looking at this I looked at the jobs which failed which seem to run image-upload-swift but in zuul-jobs we have upload-image-swift. Are they distinct? I'm suddenly quite confused over where that code is coming from17:20
clarkbah image-upload-swift is in zuul-providers17:20
corvusoh we should switch to zuul-jobs17:20
corvusit started life locally, but i think we can switch to the "upstreamed" version now :)17:20
clarkband is apparently fully distinct from what is in zuul-jobs17:20
corvusoriginally it was kind of "do whatever we need to do for opendev" but once i wrote the s3 version too, i got a good idea of what the general implementation should be17:21
clarkbcorvus: should I recheck or will you do that?17:24
corvusi thought i did17:24
corvushttps://review.opendev.org/949696 is priority right?17:24
clarkbhttps://review.opendev.org/c/opendev/zuul-providers/+/949944 I don't see one on that17:24
corvus(i didn't recheck the image formats change)17:25
clarkbgot it. Ya probably having the extra images is more important17:25
corvusyeah, let's omit the recheck until rocky lands (in case rocky doesn't land and formats does, i don't want to introduce a new variable)17:25
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Switch to zuul-jobs upload-image-swift  https://review.opendev.org/c/opendev/zuul-providers/+/95101817:30
corvusclarkb: ^ that should dogfood the zuul-jobs role17:31
clarkbcorvus: but we won't really exercise it until we approve it so it runs in the gate?17:31
clarkbor do we upload the image in check too?17:32
corvusgate only, but is self-testing17:32
corvusso after rocky merges, safe to approve and it works or doesn't.  ;)17:32
clarkback17:32
clarkbjust making sure I understand how to interpret the test results. I thought it was gate only17:32
clarkb+2 from me. Not sure if we want to wait for anyone else to review17:35
opendevreviewMerged opendev/zuul-providers master: Add Rocky 8/9 builds, labels and provider config  https://review.opendev.org/c/opendev/zuul-providers/+/94969619:03
clarkbcorvus: I made a note on https://review.opendev.org/c/opendev/zuul-providers/+/948989 about the role switch and needing to port no log support there and coordinating those efforts with the application/use of no_log removal20:04
clarkbcorvus: might be good to take a look and see if you have a preferred path for that and then we can update changes with depends-on as necessary20:04
corvusclarkb: honestly, i think until zuul-launcher is finished, we can consider that zuul-jobs role pretty flexible (it's basically just still us using it).  so i think we could do either order.20:08
corvus(and given that, i'd say let's proceed with the role switch then update no_log if/when we decide we have consensus; i think that keeps things moving the fastest)20:09
Clark[m]Wfm20:10
clarkbhttps://etherpad.opendev.org/p/cLZ-5ZAqdoph9Lle04zZ how does this look for announcing the hashtags change?21:01
clarkblooks like it is ~time to reprune the smaller backup server21:07
fungii did it pretty recently, i think maybe on the 9th when i was waiting to board a flight home or something21:11
fungiso not even 3 weeks ago?21:11
clarkbfungi: I think part of the issue there is we haven't retired the review02 backups21:12
clarkbso we're carrying two reviews worth21:12
clarkblet me see about a stack of changes to fix that21:12
opendevreviewClark Boylan proposed opendev/system-config master: Retire review02 backups on the smaller backup server  https://review.opendev.org/c/opendev/system-config/+/95104021:17
opendevreviewClark Boylan proposed opendev/system-config master: Purge paste01 and review02 backups on the smaller backup host  https://review.opendev.org/c/opendev/system-config/+/95104121:17
clarkbfungi: ^ if we land 951040 then prune we'll prune the backups for review02 except for the most recent one21:17
fungiawesome, thanks! looking now21:17
fungiapproved the first and +2 on the second21:19
clarkbthe second change will delete that most recent set of backups too. I'm not sure we're ready for that so split things up21:19
fungiyep, agreed, was giving others an opportunity to weigh in on that one21:19
clarkbfungi: any chance you can weigh in on https://etherpad.opendev.org/p/cLZ-5ZAqdoph9Lle04zZ before I send that email?21:20
clarkbyou'll be referring to it so want to make sure it has the info you think is necessary21:20
corvusclarkb: qq in the etherpad21:44
clarkbcorvus: good point I'll fix21:48
clarkbcorvus: does that look better?21:48
corvuswfm21:49
clarkbcorvus: https://zuul.opendev.org/t/opendev/build/9b02b0c279fa43539c9c1b9c062cad21 this is a new one for me. That ran on openmetal so not rax classic where we have to use the ephemeral drive21:56
clarkbcorvus: I'm wondering if we need to rm things that have already been processed to free up space as we go?21:56
clarkbhttps://zuul.opendev.org/t/opendev/build/9b02b0c279fa43539c9c1b9c062cad21/log/job-output.txt#9810 confirms that sure enough we're at the disk limit21:57
clarkbcorvus: like maybe after this compress image step: https://zuul.opendev.org/t/opendev/build/9b02b0c279fa43539c9c1b9c062cad21/log/job-output.txt#9697 we can delete the uncompressed image?21:58
clarkbI'll propose that change21:59
corvusyeah, we should be able to, i guess zstd doesn't22:00
clarkbcorvus: ya I tested locally and zstd seems to keep the original around22:00
clarkbthough let me read the manpage maybe there is an option to have it do this automatically22:00
corvusshould be i think22:01
clarkbSource files are preserved by default. It's possible to remove them automatically by using the --rm command.22:01
clarkbso that is an easy patch. I'll push to both roles22:01
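
A minimal Python sketch of the idea, not the actual role change: it builds a zstd invocation that passes `--rm` so the uncompressed source is deleted once compression succeeds. The helper names and the `image.raw` filename are hypothetical, and it assumes zstd's default behaviour of writing `<name>.zst` next to the input.

```python
import subprocess
from pathlib import Path


def zstd_command(image: Path, remove_source: bool = True) -> list[str]:
    """Build the zstd argument list; --rm removes the source after success."""
    cmd = ["zstd"]
    if remove_source:
        cmd.append("--rm")
    cmd.append(str(image))
    return cmd


def compress_image(image: Path) -> Path:
    """Compress an image in place and return the expected .zst path.

    Requires the zstd binary on PATH; raises CalledProcessError on failure.
    """
    subprocess.run(zstd_command(image), check=True)
    return image.with_name(image.name + ".zst")


# Hypothetical filename, only used to show the resulting command line.
print(zstd_command(Path("image.raw")))
```

Removing each uncompressed image right after it is compressed frees that space before the next compression step runs.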
opendevreviewClark Boylan proposed opendev/zuul-providers master: Have zstd remove the source file after compression  https://review.opendev.org/c/opendev/zuul-providers/+/95104422:03
corvusi think this is a good change -- but also, i think it means that after the build, we only have between 7-14 GB free, which is still worrying.22:03
fungioh, right sorry, was meaning to check the draft announcement22:04
clarkbcorvus: looking at zuul-jobs:roles/upload-image-swift I don't think we do any compression there22:04
clarkbcorvus: wondering if we need to do a larger sync from opendev/zuul-providers in that role22:04
corvusno, the compression happens in the post playbook22:05
corvus(so outside of the scope of the role)22:05
corvuswe could move that into the role though22:05
clarkboh I see22:05
fungimaybe worth noting that projects can still override the new default in their own acls if necessary?22:05
clarkbfor now this is probably ok?22:05
corvusi can't think of a reason not to, since it's now baked into zuul-launcher22:05
clarkbfungi: only if they use exclusive perms though. Not sure if we want to encourage that22:05
corvusclarkb: yes i think so22:06
corvusfungi: clarkb i don't want to encourage that :)22:06
clarkbcorvus: re disk space. Should we add a quick post image build df to the job?22:06
fungiclarkb: yeah, i anticipate getting asked. maybe i'll just answer when it comes up instead22:06
corvusclarkb: good idea22:06
opendevreviewClark Boylan proposed opendev/zuul-providers master: Have zstd remove the source file after compression  https://review.opendev.org/c/opendev/zuul-providers/+/95104422:08
opendevreviewClark Boylan proposed opendev/zuul-providers master: Collect df output after building disk images  https://review.opendev.org/c/opendev/zuul-providers/+/95104522:08
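
As a rough illustration of what a post-build disk check reports (the proposed change collects `df` output in the job; this is only a Python equivalent using the standard library, with a hypothetical helper name):

```python
import shutil

GIB = 1024 ** 3


def disk_report(path: str = "/") -> str:
    """Summarise total, used and free space for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized filesystem to avoid division by zero.
    pct_used = 100 * usage.used / usage.total if usage.total else 0.0
    return (
        f"{path}: total={usage.total / GIB:.1f}GiB "
        f"used={usage.used / GIB:.1f}GiB ({pct_used:.0f}%) "
        f"free={usage.free / GIB:.1f}GiB"
    )


print(disk_report("/"))
```

Running a check like this before and after the compression step would show how close the build gets to the disk limit.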
corvusi think we should tell people that if they need to be able to annotate changes in a way that requires restricted access, then review labels are appropriate for that, because access can be set on them individually and specifically, and that is reflected in the user interface, which will show users what labels they can and can't edit.22:08
clarkbcorvus: I point the df change under the zstd --rm change so that we can see how close we are before the zstd --rm update22:09
clarkbs/point/put/22:09
clarkbI think we've also developed this bad habit in openstack where we assume everyone is acting nefariously all the time and we need to restrict access to things like that...22:10
clarkbseems like we should be able to work with those problems socially and let people take advantage of the tooling rather than be overly concerned about preventing it in the first place22:11
corvusright, let's assume people are just trying to collaborate in good faith :)22:11
fungiyes, unfortunately in this case the main cluster of acls are for repos maintained by the primary leadership body of the project22:11
fungiit's the highlight of my week when i get to tell the openstack tc collectively that they're being paranoid and to stop worrying so much22:12
clarkbfungi: I can tell them if you'd prefer :)22:12
corvushashtags are super-useful for annotating cross-project changes, and users are running into a brick wall when they work differently in different projects.22:13
corvusi think users are finding it surprising that they work in some places, not others.  unlike labels, where there is an expectation that there are different access levels.22:13
fungiagreed, i hit it just yesterday with the cla->dco changes, where the one pushed by another user doesn't have the tracking hashtag i used set on it because they didn't know where in the ui to do it and i couldn't do it for them (without elevating my account privs anyway)22:14
clarkbfungi: are you happy with the draft as is then?22:15
fungiyes22:15
clarkbcool I'll get that sent out momentarily22:15
fungithanks! i'll try to sync up with the tc and kolla team tomorrow22:18
opendevreviewMerged opendev/system-config master: Retire review02 backups on the smaller backup server  https://review.opendev.org/c/opendev/system-config/+/95104022:19
clarkbfungi: ^ that seems to have enqueued the necessary job22:19
clarkbso basically we can check the retired flag in the review02 homedir on that backup server and once there we should be able to run pruning and have it do normal pruning and also prune everything but the last backup for review0222:20
clarkbfungi: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/SVXT6X4WRYE6XQPB5PTWFUKUATICLO56/22:21
clarkbI'm going to pop out in a few minutes for a bike ride. I'll wait to check that the review02 backups are marked retired properly first22:25
fungid'oh! it just dawned on me that we missed an opportunity to point out that git-review no longer auto-sets change topics on upload22:26
fungias of 2.5.022:26
clarkboh oops. Though until everyone upgrades that is probably less important in the immediate future22:27
clarkbthat's a long long term play22:27
clarkbI think I need to upgrade speaking of that22:28
fungiyeah, i figured it was more a bit of additional evidence of our transition away from relying on topics22:28
clarkblooks like the borg backup job succeeded and the backup host has a .retired file in /opt/backups/borg-review0222:31
clarkbso I think we're good to prune and take advantage of that savings now22:31
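
The marker check described above can be sketched in Python. The `.retired` filename and the `/opt/backups/borg-review02` location come from the log; the helper name is hypothetical, and the demo uses a temporary directory rather than the real backup host.

```python
import tempfile
from pathlib import Path


def is_retired(backup_home: Path) -> bool:
    """Return True when the backup home directory carries a .retired marker."""
    return (backup_home / ".retired").is_file()


# Demonstrate on a throwaway directory instead of /opt/backups/borg-review02.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    before = is_retired(home)
    (home / ".retired").touch()
    after = is_retired(home)

print(before, after)
```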
fungicool, i'll start that running unless you're already doing so22:32
fungilooks like you weren't, so i've got it running in a root screen session now22:34
clarkbya I wasn't sorry trying to get out before it is too late and take advantage of the warm afternoon sun (maybe too warm)22:35
clarkbcorvus: another idea I had is we can rm the dib cache dir after building the image since we don't reuse the node to build more than one image22:35
clarkbcorvus: but we can wait on doing that when we have a bit more info collected22:35
clarkb(that cache dir has the git repo content which may be hardlinked in some cases and the apt/yum/dnf/ packages along with some other stuff stashed in it)22:36
corvusclarkb: true, and that may help get us an extra 7g (the size of one compressed image) since we can do that before compressing the first image.  there's still that moment at the end of the build where we have 3 uncompressed images and the git cache though.23:04
corvus(but 7g is still a big percentage of an 80g disk, so that's substantial)23:05
