opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 09:42 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 09:45 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 10:30 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Revert "Temporarily require Signed-Off-By in the sandbox" https://review.opendev.org/c/openstack/project-config/+/950997 | 13:40 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Replace OpenStack's CLA enforcement with the DCO https://review.opendev.org/c/openstack/project-config/+/950998 | 13:43 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 14:15 |
clarkb | mnasiadka: tonyb re https://review.opendev.org/c/openstack/diskimage-builder/+/934045/61/diskimage_builder/elements/sysprep/bin/extract-image that is almost certainly a bug in centos 10 stream's images. In particular they are mislabeling / as /boot with the partition uuid. If we confirm that is the case (the changes and testing seem to confirm but I don't think anyone has manually | 14:16 |
clarkb | checked) my personal preference would be to not update extract-image and file a bug with centos. We can support centos 10 stream via the minimal element in the meantime | 14:16 |
mnasiadka | it is the case | 14:17 |
mnasiadka | there are three partitions in the image, MBR, root (labelled as boot looking at TYPE) and EFI vfat partition | 14:18 |
clarkb | then ya I don't think we should be bending over backwards to fix upstream. Upstream should be fixed | 14:19 |
mnasiadka | see https://paste.openstack.org/show/bzMZfu3yBYH615g03DtM/ (NBD mounted centos stream 10 cloud image qcow2) | 14:19 |
clarkb | particularly since we already have an alternative (the minimal element) | 14:19 |
mnasiadka | Ok, I'll wait for Zuul jobs to finish (just to check if what I did in extract-image works) - and then remove centos element testing from functests for stream 10 | 14:21 |
mnasiadka | And file a bug in centos | 14:21 |
clarkb | cool I think that is the ideal. If upstream insists that labeling / as /boot is not a bug then I guess we can reassess | 14:21 |
clarkb | however, / has specific uuids so I think this is difficult to dismiss | 14:22 |
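The "specific uuids" here are the well-known GPT partition type GUIDs from the systemd Discoverable Partitions Specification (the GUID values below come from that spec); a minimal sketch, not anything from extract-image itself, of the kind of lookup that makes a root filesystem tagged with the /boot type detectable:

```shell
#!/bin/sh
# Map a GPT partition type GUID (as printed by `lsblk -o PARTTYPE` or blkid)
# to its role, per the systemd Discoverable Partitions Specification.
classify() {
    case "$(printf '%s' "$1" | tr 'A-Z' 'a-z')" in
        4f68bce3-e8cd-4db1-96e7-fbcaf984b709) echo "root (x86-64)" ;;
        b921b045-1df0-41c3-af44-4c6f280d3fae) echo "root (aarch64)" ;;
        bc13c2ff-59e6-4262-a352-b275fd6f7172) echo "/boot (XBOOTLDR)" ;;
        c12a7328-f81f-11d2-ba4b-00a0c93ec93b) echo "EFI system partition" ;;
        *) echo "unknown" ;;
    esac
}

# A root partition carrying the XBOOTLDR GUID, as in the image under
# discussion, classifies as /boot rather than root:
classify "BC13C2FF-59E6-4262-A352-B275FD6F7172"   # prints "/boot (XBOOTLDR)"
```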
mnasiadka | I can split the fix out to a separate patch so we don't lose it but mark it as WIP | 14:23 |
clarkb | did anyone check the couple of reports of gerrit being inaccessible over the weekend? Probably just need to check server and container uptimes (though the container isn't supposed to auto restart so I would be really surprised if either of those stopped and started again) | 14:38 |
fungi | i saw them but haven't checked on the server | 14:41 |
fungi | might be worth getting on cacti and seeing if there are any gaps in reporting | 14:41 |
clarkb | looking at root email a number of backups all failed on may 25 at 22:00UTC as well I wonder if those overlap? | 14:46 |
clarkb | oh I got the timestamp wrong I think my mail client showed me local times. But the email content shows utc as may 26 0500UTC ish | 14:47 |
clarkb | which does seem to overlap with the #openstack-infra message. I suspect this was a cloud level networking problem | 14:47 |
mnasiadka | clarkb: I had an issue yesterday morning (my time), connection timeouts - but then it fixed itself - it was 7:45 AM CEST | 14:58 |
clarkb | mnasiadka: ya I think that lines up with this batch of emails for failed backups to a server in the same cloud region. | 15:00 |
clarkb | So I suspect the error was not in the service itself but in the network delivery for the server (and that cloud region) | 15:00 |
clarkb | first email was 0505UTC and last was 0557UTC | 15:02 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Add missing ACL inheritance for openstack/grian-ui https://review.opendev.org/c/openstack/project-config/+/951010 | 15:03 |
clarkb | made some quick edits to the meeting agenda. Going to send that out in the next 5-10 minutes since it is late due to yesterday being a holiday | 15:56 |
clarkb | let me know if there is anything else to add | 15:56 |
clarkb | agenda has been sent. Sorry about the delay | 16:08 |
clarkb | I've checked review03 and docker ps -a and uptime output don't indicate anything unexpected | 16:21 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Replace OpenStack's CLA enforcement with the DCO https://review.opendev.org/c/openstack/project-config/+/950998 | 16:21 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Deduplicate OpenStack ACL content https://review.opendev.org/c/openstack/project-config/+/951014 | 16:21 |
clarkb | I think the issue was networking within the cloud | 16:21 |
clarkb | or to the cloud | 16:21 |
fungi | more public git browsers crumbling under the load of llm training crawlers, i'd bet: https://lists.openinfra.org/archives/list/nordix@lists.openinfra.org/thread/H3VB7BE727MGRACNXTFQWTGU6FAIAXW2/ | 16:51 |
clarkb | mnasiadka: ok I've +2'd https://review.opendev.org/c/opendev/glean/+/941672 as mentioned in my followup there I think that we can proceed with the manual case and if it doesn't work as expected just let people using that case say something | 16:55 |
clarkb | the risk should be small since it is for an entirely new platform so we shouldn't have existing users out there | 16:55 |
clarkb | mnasiadka: also looks like the rocky zuul-providers change had POST_FAILUREs due to swift upload problems. Do we know what happened there? cc corvus | 16:56 |
corvus | hrm, what was the last post_failure issue we fixed? | 17:07 |
clarkb | I feel like we had issues around uploads on the refetch side not being able to uncompress/inflate things. But that was a while ago and I don't think that is this issue | 17:08 |
clarkb | corvus: but I wonder if maybe this is related to the image type change perhaps? | 17:08 |
clarkb | like maybe the upload list doesn't correspond to the build list? (I haven't looked closely I just know I updated the job defs to get the image types from zuul config) | 17:09 |
corvus | well, https://review.opendev.org/949944 didn't merge because of... post_failures, so that shouldn't be at play :) | 17:09 |
clarkb | oh heh I missed that too | 17:09 |
clarkb | ok that discards that theory | 17:10 |
corvus | it was around the time where we were talking about removing no_log | 17:10 |
corvus | then i took a guess at the problem and it turned out to be right and we moved on | 17:10 |
corvus | i just forgot what that was, and want to double check we're not repeating ourselves :) | 17:11 |
clarkb | I think that was debugged when I was in texas so I didn't follow it too closely | 17:11 |
clarkb | mnasiadka: frickler ^ do you remember? | 17:11 |
corvus | yep that was the time | 17:11 |
corvus | https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2025-05-07.log.html#opendev.2025-05-07.log.html#t2025-05-07T16:07:41 | 17:12 |
corvus | the issue was "building on the wrong node" | 17:13 |
corvus | clarkb: and it was you and i together that guessed it :) | 17:13 |
corvus | looks like those are all happening on noble, so it's not that issue. | 17:15 |
clarkb | it is also affecting arm and x86 | 17:16 |
corvus | https://zuul.opendev.org/t/opendev/buildsets suggests that uploads are generally working | 17:17 |
corvus | oh but this wasn't today | 17:17 |
corvus | this failed on the 23rd | 17:17 |
corvus | https://zuul.opendev.org/t/opendev/buildset/42d557ed15b142db9b5c091d85474096 | 17:17 |
corvus | and the periodic builds all failed then too | 17:17 |
corvus | i think that points to "cloud had a bad day" | 17:17 |
corvus | and we should just recheck | 17:17 |
clarkb | ya or maybe some dependency made a bad release then followed up with some sort of fix | 17:18 |
corvus | (it had a bad day, but it's feeling better now) | 17:18 |
corvus | that too | 17:18 |
corvus | i did a recheck. let's let that merge, then we can try the zuul image types change again | 17:18 |
corvus | which failed the day before... so maybe it was 2 bad days? | 17:19 |
corvus | https://zuul.opendev.org/t/opendev/buildset/cc016a4db5324d4ba9fb4b297e3f9f14 says yes | 17:19 |
clarkb | wfm | 17:19 |
clarkb | corvus: while looking at this I looked at the jobs which failed which seem to run image-upload-swift but in zuul-jobs we have upload-image-swift. Are they distinct? I'm suddenly quite confused over where that code is coming from | 17:20 |
clarkb | ah image-upload-swift is in zuul-providers | 17:20 |
corvus | oh we should switch to zuul-jobs | 17:20 |
corvus | it started life locally, but i think we can switch to the "upstreamed" version now :) | 17:20 |
clarkb | and is apparently fully distinct from what is in zuul-jobs | 17:20 |
corvus | originally it was kind of "do whatever we need to do for opendev" but once i wrote the s3 version too, i got a good idea of what the general implementation should be | 17:21 |
clarkb | corvus: should I recheck or will you do that? | 17:24 |
corvus | i thought i did | 17:24 |
corvus | https://review.opendev.org/949696 is priority right? | 17:24 |
clarkb | https://review.opendev.org/c/opendev/zuul-providers/+/949944 I don't see one on that | 17:24 |
corvus | (i didn't recheck the image formats change) | 17:25 |
clarkb | got it. Ya probably having the extra images is more important | 17:25 |
corvus | yeah, let's omit the recheck until rocky lands (in case rocky doesn't land and formats does, i don't want to introduce a new variable) | 17:25 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Switch to zuul-jobs upload-image-swift https://review.opendev.org/c/opendev/zuul-providers/+/951018 | 17:30 |
corvus | clarkb: ^ that should dogfood the zuul-jobs role | 17:31 |
clarkb | corvus: but we won't really exercise it until we approve it so it runs in the gate? | 17:31 |
clarkb | or do we upload the image in check too? | 17:32 |
corvus | gate only, but is self-testing | 17:32 |
corvus | so after rocky merges, safe to approve and it works or doesn't. ;) | 17:32 |
clarkb | ack | 17:32 |
clarkb | just making sure I understand how to interpret the test results. I thought it was gate only | 17:32 |
clarkb | +2 from me. Not sure if we want to wait for anyone else to review | 17:35 |
opendevreview | Merged opendev/zuul-providers master: Add Rocky 8/9 builds, labels and provider config https://review.opendev.org/c/opendev/zuul-providers/+/949696 | 19:03 |
clarkb | corvus: I made a note on https://review.opendev.org/c/opendev/zuul-providers/+/948989 about the role switch and needing to port no log support there and coordinating those efforts with the application/use of no_log removal | 20:04 |
clarkb | corvus: might be good to take a look and see if you have a preferred path for that and then we can update changes with depends-on as necessary | 20:04 |
corvus | clarkb: honestly, i think until zuul-launcher is finished, we can consider that zuul-jobs role pretty flexible (it's basically just still us using it). so i think we could do either order. | 20:08 |
corvus | (and given that, i'd say let's proceed with the role switch then update no_log if/when we decide we have consensus; i think that keeps things moving the fastest) | 20:09 |
Clark[m] | Wfm | 20:10 |
clarkb | https://etherpad.opendev.org/p/cLZ-5ZAqdoph9Lle04zZ how does this look for announcing the hashtags change? | 21:01 |
clarkb | looks like it is ~time to reprune the smaller backup server | 21:07 |
fungi | i did it pretty recently, i think maybe on the 9th when i was waiting to board a flight home or something | 21:11 |
fungi | so not even 3 weeks ago? | 21:11 |
clarkb | fungi: I think part of the issue there is we haven't retired the review02 backups | 21:12 |
clarkb | so we're carrying two reviews worth | 21:12 |
clarkb | let me see about a stack of changes to fix that | 21:12 |
opendevreview | Clark Boylan proposed opendev/system-config master: Retire review02 backups on the smaller backup server https://review.opendev.org/c/opendev/system-config/+/951040 | 21:17 |
opendevreview | Clark Boylan proposed opendev/system-config master: Purge paste01 and review02 backups on the smaller backup host https://review.opendev.org/c/opendev/system-config/+/951041 | 21:17 |
clarkb | fungi: ^ if we land 951040 then prune we'll prune the backups for review02 except for the most recent one | 21:17 |
fungi | awesome, thanks! looking now | 21:17 |
fungi | approved the first and +2 on the second | 21:19 |
clarkb | the second change will delete that most recent set of backups too. I'm not sure we're ready for that so I split things up | 21:19 |
fungi | yep, agreed, was giving others an opportunity to weigh in on that one | 21:19 |
clarkb | fungi: any chance you can weigh in on https://etherpad.opendev.org/p/cLZ-5ZAqdoph9Lle04zZ before I send that email? | 21:20 |
clarkb | you'll be referring to it so want to make sure it has the info you think is necessary | 21:20 |
corvus | clarkb: qq in the etherpad | 21:44 |
clarkb | corvus: good point I'll fix | 21:48 |
clarkb | corvus: does that look better? | 21:48 |
corvus | wfm | 21:49 |
clarkb | corvus: https://zuul.opendev.org/t/opendev/build/9b02b0c279fa43539c9c1b9c062cad21 this is a new one for me. That ran on openmetal so not rax classic where we have to use the ephemeral drive | 21:56 |
clarkb | corvus: I'm wondering if we need to rm things that have already been processed to free up space as we go? | 21:56 |
clarkb | https://zuul.opendev.org/t/opendev/build/9b02b0c279fa43539c9c1b9c062cad21/log/job-output.txt#9810 confirms that sure enough we're at the disk limit | 21:57 |
clarkb | corvus: like maybe after this compress image step: https://zuul.opendev.org/t/opendev/build/9b02b0c279fa43539c9c1b9c062cad21/log/job-output.txt#9697 we can delete the uncompressed image? | 21:58 |
clarkb | I'll propose that change | 21:59 |
corvus | yeah, we should be able to, i guess zstd doesn't | 22:00 |
clarkb | corvus: ya I tested locally and zstd seems to keep the original around | 22:00 |
clarkb | though let me read the manpage maybe there is an option to have it do this automatically | 22:00 |
corvus | should be i think | 22:01 |
clarkb | Source files are preserved by default. It's possible to remove them automatically by using the --rm command. | 22:01 |
clarkb | so that is an easy patch. I'll push to both roles | 22:01 |
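The manpage behavior quoted above is easy to verify locally; a minimal sketch (the filename is illustrative, and this assumes zstd is installed):

```shell
#!/bin/sh
# Demonstrate that zstd keeps the source by default and removes it with --rm.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand in for an uncompressed disk image
dd if=/dev/zero of="$workdir/image.raw" bs=1024 count=16 2>/dev/null

# Default behavior: the source file is preserved alongside the .zst output
zstd -q "$workdir/image.raw" -o "$workdir/image.raw.zst"
test -e "$workdir/image.raw"

# With --rm the uncompressed source is deleted after successful compression,
# which is what frees the disk space during the image build jobs
rm -f "$workdir/image.raw.zst"
zstd -q --rm "$workdir/image.raw"
test ! -e "$workdir/image.raw"
test -e "$workdir/image.raw.zst"
echo ok
```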
opendevreview | Clark Boylan proposed opendev/zuul-providers master: Have zstd remove the source file after compression https://review.opendev.org/c/opendev/zuul-providers/+/951044 | 22:03 |
corvus | i think this is a good change -- but also, i think it means that after the build, we only have between 7-14 GB free, which is still worrying. | 22:03 |
fungi | oh, right sorry, was meaning to check the draft announcement | 22:04 |
clarkb | corvus: looking at zuul-jobs:roles/upload-image-swift I don't think we do any compression there | 22:04 |
clarkb | corvus: wondering if we need to do a larger sync from opendev/zuul-providers in that role | 22:04 |
corvus | no, the compression happens in the post playbook | 22:05 |
corvus | (so outside of the scope of the role) | 22:05 |
corvus | we could move that into the role though | 22:05 |
clarkb | oh I see | 22:05 |
fungi | maybe worth noting that projects can still override the new default in their own acls if necessary? | 22:05 |
clarkb | for now this is probably ok? | 22:05 |
corvus | i can't think of a reason not to, since it's now baked into zuul-launcher | 22:05 |
clarkb | fungi: only if they use exclusive perms though. Not sure if we want to encourage that | 22:05 |
corvus | clarkb: yes i think so | 22:06 |
corvus | fungi: clarkb i don't want to encourage that :) | 22:06 |
clarkb | corvus: re disk space. Should we add a quick post image build df to the job? | 22:06 |
fungi | clarkb: yeah, i anticipate getting asked. maybe i'll just answer when it comes up instead | 22:06 |
corvus | clarkb: good idea | 22:06 |
opendevreview | Clark Boylan proposed opendev/zuul-providers master: Have zstd remove the source file after compression https://review.opendev.org/c/opendev/zuul-providers/+/951044 | 22:08 |
opendevreview | Clark Boylan proposed opendev/zuul-providers master: Collect df output after building disk images https://review.opendev.org/c/opendev/zuul-providers/+/951045 | 22:08 |
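The df collection presumably boils down to something like the following run after the build step (the exact task wiring in the job is not shown in this log, so treat this as a sketch):

```shell
# Report free space on the build filesystem after the image build completes.
# The build hosts have roughly an 80 GB disk, so watching the "Avail" column
# across builds shows how close each run comes to the limit.
df -h /

# Machine-friendly variant limited to the fields of interest (GNU coreutils)
df --output=target,size,used,avail /
```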
corvus | i think we should tell people that if they need to be able to annotate changes in a way that requires restricted access, then review labels are appropriate for that, because access can be set on them individually and specifically, and that is reflected in the user interface, which will show users what labels they can and can't edit. | 22:08 |
clarkb | corvus: I point the df change under the zstd --rm change so that we can see how close we are before the zstd --rm update | 22:09 |
clarkb | s/point/put/ | 22:09 |
clarkb | I think we've also developed this bad habit in openstack where we assume everyone is acting nefariously all the time and we need to restrict access to things like that... | 22:10 |
clarkb | seems like we should be able to work with those problems socially and let people take advantage of the tooling rather than be overly concerned about preventing it in the first place | 22:11 |
corvus | right, let's assume people are just trying to collaborate in good faith :) | 22:11 |
fungi | yes, unfortunately in this case the main cluster of acls are for repos maintained by the primary leadership body of the project | 22:11 |
fungi | it's the highlight of my week when i get to tell the openstack tc collectively that they're being paranoid and to stop worrying so much | 22:12 |
clarkb | fungi: I can tell them if you'd prefer :) | 22:12 |
corvus | hashtags are super-useful for annotating cross-project changes, and users are running into a brick wall when they work differently in different projects. | 22:13 |
corvus | i think users are finding it surprising that they work in some places, not others. unlike labels, where there is an expectation that there are different access levels. | 22:13 |
fungi | agreed, i hit it just yesterday with the cla->dco changes, where the one pushed by another user doesn't have the tracking hashtag i used set on it because they didn't know where in the ui to do it and i couldn't do it for them (without elevating my account privs anyway) | 22:14 |
clarkb | fungi: are you happy with the draft as is then? | 22:15 |
fungi | yes | 22:15 |
clarkb | cool I'll get that sent out momentarily | 22:15 |
fungi | thanks! i'll try to sync up with the tc and kolla team tomorrow | 22:18 |
opendevreview | Merged opendev/system-config master: Retire review02 backups on the smaller backup server https://review.opendev.org/c/opendev/system-config/+/951040 | 22:19 |
clarkb | fungi: ^ that seems to have enqueued the necessary job | 22:19 |
clarkb | so basically we can check the retired flag in the review02 homedir on that backup server and once there we should be able to run pruning and have it do normal pruning and also prune everything but the last backup for review02 | 22:20 |
clarkb | fungi: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/SVXT6X4WRYE6XQPB5PTWFUKUATICLO56/ | 22:21 |
clarkb | I'm going to pop out in a few minutes for a bike ride. I'll wait to check that the review02 backups are marked retired properly first | 22:25 |
fungi | d'oh! it just dawned on me that we missed an opportunity to point out that git-review no longer auto-sets change topics on upload | 22:26 |
fungi | as of 2.5.0 | 22:26 |
clarkb | oh oops. Though until everyone upgrades that is probably less important in the immediate future | 22:27 |
clarkb | thats a long long term play | 22:27 |
clarkb | I think I need to upgrade speaking of that | 22:28 |
fungi | yeah, i figured it was more a bit of additional evidence of our transition away from relying on topics | 22:28 |
clarkb | looks like the borg backup job succeeded and the backup host has a .retired file in /opt/backups/borg-review02 | 22:31 |
clarkb | so I think we're good to prune and take advantage of that savings now | 22:31 |
fungi | cool, i'll start that running unless you're already doing so | 22:32 |
fungi | looks like you weren't, so i've got it running in a root screen session now | 22:34 |
clarkb | ya I wasn't sorry trying to get out before it is too late and take advantage of the warm afternoon sun (maybe too warm) | 22:35 |
clarkb | corvus: another idea I had is we can rm the dib cache dir after building the image since we don't reuse the node to build more than one image | 22:35 |
clarkb | corvus: but we can wait on doing that when we have a bit more info collected | 22:35 |
clarkb | (that cache dir has the git repo content which may be hardlinked in some cases and the apt/yum/dnf/ packages along with some other stuff stashed in it) | 22:36 |
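A hypothetical cleanup step along those lines; `~/.cache/image-create` is diskimage-builder's default cache location when `DIB_IMAGE_CACHE` is unset, but the actual path used by these jobs may differ:

```shell
# Free the DIB cache (cached git repos, apt/yum/dnf packages) once the image
# is built, since these single-use build nodes never build a second image.
rm -rf "${DIB_IMAGE_CACHE:-$HOME/.cache/image-create}"

# Confirm the space actually came back
df -h /
```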
corvus | clarkb: true, and that may help get us an extra 7g (the size of one compressed image) since we can do that before compressing the first image. there's still that moment at the end of the build where we have 3 uncompressed images and the git cache though. | 23:04 |
corvus | (but 7g is still a big percentage of an 80g disk, so that's substantial) | 23:05 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!