Friday, 2025-07-11

08:53 *** ykarel_ is now known as ykarel
08:58 <frickler> oh, adding labels for trixie likely would also be helpful ... :-D
09:05 <opendevreview> Dr. Jens Harbott proposed opendev/zuul-providers master: Add labels for debian-trixie  https://review.opendev.org/c/opendev/zuul-providers/+/954701
09:18 <opendevreview> Dr. Jens Harbott proposed opendev/zuul-providers master: Add labels for debian-trixie  https://review.opendev.org/c/opendev/zuul-providers/+/954701
09:19 <frickler> corvus: ^^ the config error for PS1 was ... not too helpful, not sure if that could be improved?
09:21 <frickler> I'm going to self-approve the update now in order to get some testing done, feel free to revert/amend later. note I only added new-style labels for now, assuming that we've passed the testing phase. one thing to discuss might be whether we want to keep some default labels without RAM spec to keep nodeset definitions simpler
09:22 <opendevreview> Merged opendev/zuul-providers master: Add labels for debian-trixie  https://review.opendev.org/c/opendev/zuul-providers/+/954701
09:23 <opendevreview> Michal Nasiadka proposed opendev/system-config master: docker-mirror: Add Ubuntu 24.04 and Debian Bookworm/Trixie mirrors  https://review.opendev.org/c/opendev/system-config/+/954703
09:36 <frickler> it lives \o/ https://zuul.opendev.org/t/openstack/build/4feda999fae44f22bc54175a0da0a8f6 ... and it fails quite fast, holding a node now for checking (and testing autohold with niz ;)
10:51 *** clarkb is now known as Guest21734
11:22 <opendevreview> Dr. Jens Harbott proposed opendev/zuul-providers master: Fix unbound installation for trixie  https://review.opendev.org/c/opendev/zuul-providers/+/954716
11:23 <frickler> ^^ that's the fix for unbound, do we still want to keep the project-config/nodepool version in sync?
13:02 *** dmellado6 is now known as dmellado
13:18 <fungi> frickler: i'm inclined to say we don't care about fixing that for nodepool-built images, since we're not building trixie with nodepool and planning to turn off the nodepool services at any moment
13:31 <frickler> and still waiting for CI results :(
13:37 <fungi> frickler: no longer!
13:38 <fungi> all green
13:39 <fungi> looks like the bionic and focal builds worked, so i guess p.u.c just prunes data about earlier lts versions
13:44 <frickler> yes, I checked that when creating the patch, commented on the review. there is still a small chance that this change might break unbound for those, do we have a way to emergency-delete images with zuul-launcher? or would we have to wait for a revert to merge and get promoted in the worst case? I'm a bit reluctant to simply self-approve because of this
13:57 <corvus> frickler: i think we should only have the ram-suffixed labels, and not the ones without it (going forward).  one reason: zuul thinks they are different enough that a ready node for one can't be used for another.  most users will just use pre-defined nodesets, and their names can continue to be simple (ie, just "debian-trixie")
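A nodeset shaped the way corvus describes might look like the following sketch (the RAM-suffixed label name here is an assumption for illustration, not necessarily one of the actual opendev labels):

```yaml
# Sketch: the nodeset keeps the simple name while the
# label it maps to carries the RAM spec
- nodeset:
    name: debian-trixie
    nodes:
      - name: debian-trixie
        label: debian-trixie-8GB  # hypothetical label name
```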
13:58 <corvus> frickler: the api and the web ui can both be used to delete uploads or builds.  you must use the "opendev" tenant (the tenant where the images are built)
14:02 <corvus> i agree, the error messages need some work :)
14:10 <corvus> for the first time, the node graphs for last night's periodic jobs are the shape i've been looking for: https://imgur.com/a/F1KjyeR
14:12 <corvus> (when we're at quota, we want more requests in the "requested" state and fewer nodes in the "requested" state)
14:28 <corvus> clarkb: would you mind a re-review on https://review.opendev.org/931824 ? i switched the test fixture and made the validation optional.
14:45 <Guest21734> corvus: done. Though I seem to have been guestified. I'll work on fixing that next
14:48 <Guest21734> frickler: fungi: I went ahead and approved the unbound fixup too. I left a comment with why I believe this is safe
14:51 <fungi> thanks Guest21734!
14:51 <fungi> ;)
14:53 *** Guest21734 is now known as clarkb
14:58 <opendevreview> Jeremy Stanley proposed openstack/project-config master: Drop requirements branch override for translations  https://review.opendev.org/c/openstack/project-config/+/954747
14:59 <clarkb> fungi: do you have a quick moment to re-review https://review.opendev.org/c/opendev/system-config/+/954624 now with scrolling on the grafana pages so all graphs render?
14:59 <clarkb> and then should we go ahead and land the specs cleanup and fixup changes?
15:00 <clarkb> my main concern is less with the fixup and more with my application of cleanups, which may be biased. But I think it's easy to undo that sort of documentation change if we wish
15:03 <fungi> yeah
15:03 <fungi> all lgtm now
15:08 <opendevreview> Jeremy Stanley proposed openstack/project-config master: Drop requirements branch override for translations  https://review.opendev.org/c/openstack/project-config/+/954747
15:25 <clarkb> fungi: thinking about https://review.opendev.org/c/zuul/zuul-jobs/+/954280 more. Maybe the easiest, least impactful choice is to drop backports from our debian image builds?
15:26 <clarkb> that isn't my personal preference but I think going that route avoids any potential conflict with other people running debian images with the configure-mirrors role
15:30 <clarkb> that might require updates to dib though
15:30 <clarkb> which makes me wonder if anyone would be building images this way anyway
15:37 <opendevreview> James E. Blair proposed opendev/zuul-providers master: Add 32GB labels  https://review.opendev.org/c/opendev/zuul-providers/+/954749
15:40 <opendevreview> James E. Blair proposed opendev/zuul-providers master: Add 32GB labels and flavors  https://review.opendev.org/c/opendev/zuul-providers/+/954749
15:40 <opendevreview> James E. Blair proposed opendev/zuul-providers master: Add 32GB labels to vexxhost  https://review.opendev.org/c/opendev/zuul-providers/+/954752
15:40 <opendevreview> Merged opendev/system-config master: Scroll grafana pages to force all graphs to load  https://review.opendev.org/c/opendev/system-config/+/954624
15:46 <opendevreview> Merged opendev/zuul-providers master: Add 32GB labels and flavors  https://review.opendev.org/c/opendev/zuul-providers/+/954749
15:52 <fungi> clarkb: yeah, i'm not sure. as long as we don't need any backported packages during image building, then they'll still get enabled by default by that role at job runtime, so shouldn't result in regressions for anyone
15:52 <fungi> so i agree that's probably the least impactful, as it doesn't require changing anything in zuul-jobs
15:52 <clarkb> ya, but I'm 99% sure it requires changes to dib.
15:53 <clarkb> I don't think dib is actually using backports to install any packages, but it is configuring backports as a repo
15:53 <fungi> it's controlled by a variable we set
15:54 <fungi> if you look at the dib elements for ubuntu-minimal and debian-minimal there's a list of source suites that are passed in
15:54 <fungi> we could *probably* do it in our dib configuration without even altering the dib elements themselves
15:54 <clarkb> right, we could override the entire DIB_APT_SOURCES_CONF_DEFAULT
15:55 <fungi> but also changing the defaults in dib is a possibility
15:55 <clarkb> actually maybe not, that one doesn't seem to accept a different value. Where is it used?
15:55 <clarkb> DIB_APT_SOURCES_CONF, this is the var to override
15:55 <fungi> there were two vars, i think, and it's the other one you want
15:55 <fungi> ah, yeah, that
15:56 <clarkb> ya, the _DEFAULT is the default value for the one without _DEFAULT
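The override being discussed could be sketched like this, assuming dib's newline-separated `name:sources-line` format for DIB_APT_SOURCES_CONF; the exact suite lines here are illustrative, not copied from dib's defaults:

```shell
# Sketch: override DIB_APT_SOURCES_CONF so the generated sources.list
# omits backports entirely (suite names here are illustrative)
export DIB_RELEASE=trixie
export DIB_APT_SOURCES_CONF="default:deb http://deb.debian.org/debian ${DIB_RELEASE} main
updates:deb http://deb.debian.org/debian ${DIB_RELEASE}-updates main
security:deb http://security.debian.org/debian-security ${DIB_RELEASE}-security main"
```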
15:56 <clarkb> my concern with this approach is it seems less correct from a build-a-debian-image perspective
15:56 <clarkb> debian upstream cloud images include backports, dib image builds include backports. backports are only used if explicitly requested for a package
15:56 <clarkb> the accepted practice seems to be that you should configure backports
15:57 <fungi> well, about that. after going back over the discussions from 2015/2016 i think that got undone
15:57 <fungi> so my recollection is outdated
15:58 <fungi> there was a time when it was enabled because they needed newer versions of cloud-init in the images
15:58 <clarkb> in upstream cloud images you mean?
15:58 <fungi> yeah, but now i think the cloud team policy is to just keep updating cloud-init rather than trying to keep it stable in stable debian versions
15:59 <fungi> i don't use the stable cloud images myself, always testing/unstable, so i hadn't noticed they weren't adding backports in the stable cloud images
15:59 <clarkb> I see
15:59 <clarkb> given that, I kinda think that changing dib itself to drop backports to match is maybe better than us overriding that default var list
15:59 <fungi> so not including backports in our images these days would probably be more consistent with official debian cloud images
16:00 <opendevreview> James E. Blair proposed opendev/zuul-providers master: Add 32GB flavor and labels to vexxhost  https://review.opendev.org/c/opendev/zuul-providers/+/954752
16:01 <clarkb> I'll work on a change
16:01 <opendevreview> Merged opendev/zuul-providers master: Add 32GB flavor and labels to vexxhost  https://review.opendev.org/c/opendev/zuul-providers/+/954752
16:02 <corvus> frickler: hrm, it looks like we may have been intending to have folks switch to -ram suffixed nodesets too, so... maybe strike my comment from earlier about that.  but, i think we could consider keeping the non-ram-suffixed nodesets if we like that idea.  i don't think it would cause a problem.
16:04 <opendevreview> James E. Blair proposed opendev/zuul-providers master: Move ubuntu-bionic/focal nodeset definition  https://review.opendev.org/c/opendev/zuul-providers/+/954756
16:04 <corvus> i think the only remaining nodepool labels are now: 'ubuntu-bionic-arm64' and 'ubuntu-focal-arm64'
16:05 <corvus> we apparently did not see fit to add those images to zuul-launcher.  but there have been a couple of requests for them.
16:06 <corvus> publish-wheel-cache-ubuntu-focal-arm64 and publish-wheel-cache-ubuntu-bionic-arm64 requested them
16:06 <corvus> i think the node requests are coming from inside the house
16:07 <fungi> i think we could probably just ditch those jobs
16:07 <fungi> at least that would be my first preference
16:07 <fungi> i don't believe they're providing anything useful if there are no other focal or bionic jobs running to take advantage of what they're producing
16:08 <clarkb> ++ let's drop those jobs
16:09 <opendevreview> James E. Blair proposed openstack/project-config master: Remove bionic/focal arm64 wheel jobs  https://review.opendev.org/c/openstack/project-config/+/954758
16:12 <corvus> tonyb: can we delete this autohold? https://zuul.opendev.org/t/openstack/autohold/0000000208
16:14 <opendevreview> James E. Blair proposed opendev/zuul-providers master: Remove nodepool-labels file  https://review.opendev.org/c/opendev/zuul-providers/+/954759
16:15 <corvus> ^ when that merges, we can shut down nodepool.
16:17 <fungi> are the centos-9-stream config errors there expected?
16:18 <corvus> oops, deleted the wrong file, hah :)
16:19 <opendevreview> James E. Blair proposed opendev/zuul-providers master: Remove nodepool-nodesets file  https://review.opendev.org/c/opendev/zuul-providers/+/954759
16:25 <opendevreview> Merged openstack/project-config master: Remove bionic/focal arm64 wheel jobs  https://review.opendev.org/c/openstack/project-config/+/954758
16:25 <opendevreview> Clark Boylan proposed openstack/diskimage-builder master: Drop backports from debian-minimal by default  https://review.opendev.org/c/openstack/diskimage-builder/+/954760
16:25 <clarkb> fungi: ^ something like that maybe.
16:28 <corvus> remote:   https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/954761 Remove bionic- and focal- arm64 jobs [NEW]
16:28 <corvus> oh, that repo is not in good shape
16:30 <corvus> apparently there are jobs in that repo that refer to an "ubuntu-xenial" nodeset which is not defined
16:31 <clarkb> I wonder if it is sufficient to remove those two ubuntu-xenial jobs along with the jobs using the old arm64 nodesets
16:31 <corvus> yeah... i think what we did was we stopped loading nodesets from opendev/base-jobs, instead loading them from zuul-providers
16:31 <corvus> and there is a xenial defined in base-jobs, but not zuul-providers
16:32 <corvus> i think we decided not to copy it over because it was unused
16:32 <corvus> i think if we still believe that, then, yeah, let's try yanking those jobs and see if it really is unused
16:32 <clarkb> Hopefully all the stable branches with references to py35 jobs have been deleted and nothing will complain
16:34 <clarkb> I've gone ahead and self-approved https://review.opendev.org/c/opendev/infra-specs/+/954670 to make infra specs buildable again
16:34 <clarkb> that should only result in cosmetic changes
16:34 <clarkb> The followup makes some judgement calls in https://review.opendev.org/c/opendev/infra-specs/+/954662 and I'll probably do the same in an hour or two if there is no additional feedback on that
16:35 <clarkb> I'd like to get that sorted out before I start on the matrix-for-comms spec as I don't want to mix spec writing and spec management in the same stack
16:35 <clarkb> arg, looks like things refer to py35 still
16:36 <clarkb> but maybe we can do a quick update to project-config to remove all those and be good?
16:38 <corvus> what about py38-arm64?
16:39 <clarkb> https://codesearch.opendev.org/?q=py38-arm64&i=nope&literal=nope&files=&excludeFiles=&repos= this seems to indicate that swift may have a usage, but not through the jobs defined in openstack-zuul-jobs?
16:40 <clarkb> same nodeset, different job tree
16:40 <corvus> i'm starting to think this exceeds my current openstack expertise.... i think i may need to turn this work over to someone else.
16:41 <clarkb> fundamentally the problem here is that zuul doesn't really allow you to ignore tech debt in the ci job + pipeline configuration. Which is great if you're willing to tend to the garden, but openstack has struggled with that
16:41 <clarkb> thinking out loud here: I wonder if an escape hatch is to define the nodeset but make it empty?
16:42 <clarkb> then we don't need to build and boot and manage the images. The jobs are still configured with what they think is a valid nodeset, and then the jobs themselves will effectively become less efficient noops?
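The escape hatch clarkb floats could be as small as this sketch; a job referencing it stays configuration-valid but gets no nodes allocated, so it would run executor-only, "noop-ish" rather than a true noop:

```yaml
# Sketch: an empty nodeset keeps existing job references valid
# without requiring any image or label to exist
- nodeset:
    name: ubuntu-xenial
    nodes: []
```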
16:42 <opendevreview> Merged opendev/infra-specs master: Make infra specs buildable again  https://review.opendev.org/c/opendev/infra-specs/+/954670
16:45 <clarkb> https://docs.opendev.org/opendev/infra-specs/latest/ has updated due to ^
16:46 <corvus> clarkb: that's fine with me.  how about we define that nodeset only in the openstack tenant?
16:46 <clarkb> corvus: ya, I think that makes sense
16:48 <clarkb> corvus: taking that idea a step further, I wonder if something like a noop job builtin but for nodeset labels, along the lines of INVALID_LABEL, might be a way to force the jobs to fail at runtime but validate configuration?
16:48 <clarkb> that is probably a lot of extra logic to encode into zuul for something that is already solvable if addressed properly
16:48 <clarkb> (whereas noop solves a fundamental issue of trying to make a noop job use as few resources as possible)
16:49 <corvus> well, we can still request a label that doesn't exist, which will NODE_ERROR; that's what will happen today (and what i'm about to propose for openstack-zuul-jobs)
16:49 <corvus> but if we add labels to config validation like we discussed, then that won't work, and maybe we should think about what you just suggested
16:49 <clarkb> oh, I thought we would do cross-validation of the labels. But you're right, we couldn't do that with nodepool before and must not with zuul-launcher, at least not yet
16:49 <clarkb> ya
16:50 <corvus> yeah, we won't do that at least until the nodepool deprecation period ends... and it hasn't started yet :)
16:50 <clarkb> corvus: and then we can still remove the specific jobs you identified to build wheels, as we don't need them anymore
16:52 <clarkb> corvus: in that latest update you used the ubuntu-xenial name, wouldn't we fall back to nodepool for that?
16:53 <clarkb> based on my codesearch earlier, I'm hopeful that cleaning out the project-template definitions that use the removed jobs in openstack-zuul-jobs will make that mergeable
16:56 <corvus> oh, ha... you know what, we are missing a switch in zuul to turn off nodepool fallback :)
16:56 <fungi> sorry, stepped away for a few and just catching up, but i agree the best solution is one which forces node_error results on jobs and doesn't block us from removing deprecated configuration
16:57 <fungi> allowing projects to perform config cleanup (or not) on their schedule without impacting ours
16:57 <corvus> i still don't understand how to unwind https://review.opendev.org/954761
16:58 <clarkb> corvus: can we update the project-templates in the same change?
16:58 <corvus> sure!
16:59 <corvus> i'm just saying, openstack's job configuration is outside of my area of expertise
16:59 <clarkb> ya, I think the problem is it's outside of anyone's at this point
16:59 <clarkb> so doing the minimal we can get away with makes sense to me
16:59 <fungi> i think move stable/2024.1 to the branches list in openstack-tox-py39-arm64 instead
17:00 <fungi> gmaan: ^ ?
17:01 <corvus> i mean, strictly speaking, this is not blocking niz.  we can merge https://review.opendev.org/954759 and all it will do is introduce more errors into the openstack tenant, but for things that are presumably already broken or disused
17:02 <corvus> i'm going to bow out of this and leave it to others with more openstack expertise
17:03 <corvus> okay, one more thought: maybe the easiest thing is to add a dummy nodeset for xenial, and both arm labels?
17:04 <clarkb> none of these arm jobs were ever voting. Let me push up a change that drops them from the project-templates, and if that goes green then we can proceed with that. If not, we can use the dummy nodeset instead
17:04 <gmaan> fungi: but we do not test it on stable/2024.1, right? https://github.com/openstack/openstack-zuul-jobs/blob/master/zuul.d/project-templates.yaml#L1206
17:05 <gmaan> yeah, I am not sure anyone is working to make it voting or not; many times I have thought of removing those
17:06 <fungi> gmaan: thanks, yeah, mainly for now we just need a solution that lets us drop ubuntu-focal-arm64 nodes
17:06 <fungi> if removing that non-voting testing entirely makes sense, fine by me
17:08 <gmaan> yeah, I am not sure why we are keeping these non-voting on stable. keeping it on master makes sense if anyone comes up and fixes it, but if it went non-voting into stable we should remove it
17:08 <clarkb> gmaan: because no one is doing the gardening
17:09 <clarkb> I don't think there is any intention behind it, but keeping things pruned and tended to requires effort that no one is doing
17:09 <fungi> weeding the garden especially
17:10 <gmaan> let me propose the change and raise it on the ML; if no objection we can go that way next week or so
17:11 <clarkb> gmaan: I don't think we need to wait that long
17:11 <fungi> gmaan: in the near term, being able to at least drop openstack-tox-py38-arm64 somewhat immediately would help to not block our work on removing nodepool
17:11 <clarkb> like I think if this works we can land it right now
17:12 <gmaan> sure, that works fine for me. seeing no one interested in those for so many cycles, it is ok for me
17:12 <fungi> the one that needs focal nodes, specifically
17:13 <gmaan> whenever I prepare the new cycle template I ask myself why we have had these non-voting jobs for so long
17:13 <fungi> the only branch even running those is due to become unmaintained in <3 months
17:13 <clarkb> corvus: I restored your change and updated it with my proposal above. But zuul is still complaining about the project-template definition even though it is updated in the same change. Is this something zuul will force us to do in multiple steps?
17:13 <gmaan> fungi: yeah, that one, but I will just say not to continue non-voting things on stable gates
17:13 <clarkb> gmaan: I have asked the release team to make cleaning up this stuff part of the new branch creation process rather than the branch deletion process so that we get ahead of it, but I think the main issue is no one is really around to clean this stuff up
17:14 <gmaan> yeah
17:22 <clarkb> ok, I think the stack that begins at https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/954763 may resolve this for us cc corvus
17:22 <clarkb> seems I needed to start with the fundamental config issue first then work my way up
17:23 <clarkb> though it looks like the last change in the stack is still failing because of neutron
17:28 <corvus> fungi: do you mind reviewing https://review.opendev.org/954756 (you reviewed its child)?  it's pretty pro-forma
17:30 <opendevreview> Merged opendev/zuul-providers master: Move ubuntu-bionic/focal nodeset definition  https://review.opendev.org/c/opendev/zuul-providers/+/954756
17:30 <fungi> yep, approved, that was straightforward
17:31 <corvus> clarkb: actually, i think we can start doing label validation early -- we can tie it to the "i have no nodepool" tenant switch, which we need in order to remove nodepool anyway.
17:32 <clarkb> corvus: that would break the hack to make a valid xenial nodeset then?
17:32 <clarkb> corvus: digging into ubuntu-focal-arm64 more, it does seem to be used in a few more places: https://codesearch.opendev.org/?q=ubuntu-focal-arm64&i=nope&literal=nope&files=&excludeFiles=&repos=
17:33 <clarkb> however it's still fairly minimal
17:33 <corvus> re xenial: yep!
17:33 <clarkb> should we not do that then?
17:34 <clarkb> I feel like this is a nice hack to separate the image management side of things from the job configuration side of things, without forcing us to force-merge changes or become involved in many independent job configurations in various projects
17:34 <corvus> i mean, it's a valid configuration error.. openstack has 285 at this point.
17:34 <corvus> i don't think it's actually a problem
17:35 <clarkb> I guess we'd have to force-merge one change that adds the invalid nodeset label
17:35 <corvus> i believe the only thing we're doing here is volunteering to weed openstack's garden
17:35 <corvus> i don't think we need to force-merge anything today, or even if we had label validation in place
17:36 <corvus> the only time zuul is going to stop us is if it breaks the opendev tenant
17:36 <clarkb> right, I guess the force merge is in the openstack config
17:36 <corvus> i mean, it's possible we get annoyed by excessive non-blocking errors from other tenants...
17:37 <clarkb> corvus: to understand your next step, you want to remove these two nodesets, right: https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/nodesets.yaml#L44-L54 ?
17:37 <corvus> oh yes, if someone wants to merge anything to openstack-zuul-jobs then they would need to fix the network of errors involving that repo
17:38 <clarkb> I started looking at dummy nodeset options for those two arm nodesets and found we're still defining them, unlike the xenial nodeset, so want to make sure I understand what is going on there
17:38 <corvus> hrm, i guess so
17:38 <corvus> i didn't realize those were there too
17:39 <corvus> but yes, if we have decided that opendev doesn't provide those labels, then we should remove that too
17:39 <clarkb> we don't currently build those images right?
17:40 <clarkb> (just making sure I understand what prompted all of this)
17:40 <corvus> wait, those nodeset defs don't matter
17:40 <corvus> those are unused now, we can remove the whole file
17:40 <clarkb> corvus: they are used
17:40 <corvus> no, we exclude that file from config loading
17:41 <clarkb> hrm, when I updated openstack-zuul-jobs it complained about that, but maybe I'm getting my wires crossed and becoming confused
17:41 <opendevreview> Clark Boylan proposed opendev/system-config master: Drop bionic and focal arm64 testing  https://review.opendev.org/c/opendev/system-config/+/954765
17:41 <corvus> these are the real definition: https://review.opendev.org/954759  and yes, i want to remove them
17:42 <clarkb> aha, that is the piece of info I was missing, thanks
17:42 <corvus> so on that change (759) zuul is saying "we can merge this, but, btw, this will add some config errors to the openstack tenant" and we are being polite and trying to avoid adding those errors
17:43 <clarkb> corvus: so I think 954765 is something we need to do on our (opendev) end. Then https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/954763/ and https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/954761/ are incomplete cleanups on the openstack side that some of us may or may not help shuffle along
17:45 <corvus> to progress 761 you will need to make a change to neutron's stable branch, right?
17:45 <clarkb> corvus: ya, to all of the stable branches too
17:45 <clarkb> which is why I was thinking I might go ahead with the dummy nodeset again, but now I'm less sure of that
17:46 <corvus> are you thinking there might be legit testing using it?  should we add focal/bionic arm64?
17:48 <clarkb> corvus: based on what I've seen so far, I think swift and liberasurecode have valid arm64 focal based test jobs
17:48 <clarkb> corvus: everything else seems to be non-voting as part of the early days of dipping toes into the arm64 waters
17:49 <corvus> okay, so it seems like our assumption that "no one is using arm64 bionic/focal" may have been wrong, and we should either consider adding those images, or deciding that now is the cut-off time for that
17:49 <clarkb> which IMO was valid while those branches were tip, and now with stable branch policies is no longer valid
17:50 <clarkb> idea: the openstack release process should drop all these jobs that were maybe informative but not gating and providing stability assurances when branches become stable
17:51 <clarkb> corvus: I think for xenial we should be drawing the line in the sand at this point for sure
17:51 <clarkb> which we have already done
17:51 <fungi> yeah, i can see pushing for a policy that stable branches have no long-term nonvoting jobs
17:51 <clarkb> I think I'm leaning a bit towards being ok with building a focal arm64 image
17:52 <clarkb> I haven't found evidence of bionic arm64 jobs that make sense to me
17:52 <corvus> i'll start on a change to add a focal image build
17:53 <clarkb> and I'll update the other changes we've been pushing up to drop bionic arm64 but not focal arm64
17:55 <opendevreview> Clark Boylan proposed opendev/system-config master: Drop bionic arm64 testing  https://review.opendev.org/c/opendev/system-config/+/954765
18:02 <opendevreview> James E. Blair proposed opendev/zuul-providers master: Add ubuntu-focal-arm64 images and labels  https://review.opendev.org/c/opendev/zuul-providers/+/954768
18:02 <opendevreview> James E. Blair proposed opendev/zuul-providers master: Add ubuntu-focal-arm64 image builds  https://review.opendev.org/c/opendev/zuul-providers/+/954769
18:05 <opendevreview> Merged opendev/zuul-providers master: Add ubuntu-focal-arm64 images and labels  https://review.opendev.org/c/opendev/zuul-providers/+/954768
18:06 <fungi> corvus: did 954769 get mixed up with a change for debian-trixie?
18:07 <corvus> no, that was a reorg because trixie was out of order...
18:07 <fungi> i see you commented in the change too
18:07 <clarkb> I'm slowly backporting the cleanup in openstack/requirements for the wheel cache build job, then once I've got all of those pushed I'll update the openstack-zuul-jobs cleanup change
18:07 <corvus> okay, that change is syntactically correct....
18:08 <fungi> cool
18:08 <corvus> clarkb: fungi: due to the extreme load on the arm nodes, i think the best thing we could do would be to review and approve https://review.opendev.org/954769 now and send it straight to gate.  even with that, it's going to take a long time to merge.
18:09 <fungi> and yeah, i misread the diff, confused by the trixie definition getting relocated, which made it look like that was the new addition
18:09 <corvus> yeah, that diff did not end up great, and i should have mentioned it in the commit msg
18:09 <fungi> no worries, looks right to me
18:10 <corvus> but if you look at the resultant file, it's really easy to see the changes from each image to the next as we progress through bionic, focal, focal-arm64, jammy, ...
18:10 <fungi> i went ahead and approved to save testing time
18:10 <corvus> thanks, we have hours if anyone else wants to review
18:10 <clarkb> +2 from me
18:11 <clarkb> there is an unmaintained/2023.1 but no unmaintained/2023.2
18:11 <corvus> this might actually be a good real-world test of relative priority... it should jump the line and beat out the next openstack job
18:11 <clarkb> I know this makes sense to some but not to me
18:12 <fungi> clarkb: non-slurp branches don't transition to unmaintained, since upgrading between slurp branches is tested to work
18:12 <fungi> they just go eol immediately once stable maintenance ends
18:13 <fungi> that was one of the compromises made to keep the branch count down
18:13 <clarkb> fungi: I'll be honest, this feels like the opposite of keeping the branch count down. There are 10 branches that need this commit made to them
18:14 <clarkb> and about 60% of them are not trivial backports (they merge conflict in a mostly trivial way at least)
18:14 <fungi> pre-slurp unmaintained branches still need to get explicitly eol'd and that hasn't happened yet
18:14 <fungi> but breaking them is fine, let the unmaintainers sort out any resulting mess
18:15 <fungi> maybe elodilles wants to help with ^
18:15 <clarkb> I'm almost done at this point
corvus(relative priority did work!)18:15
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Remove nodepool-nodesets file  https://review.opendev.org/c/opendev/zuul-providers/+/95475918:18
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Move ubuntu-focal-arm64 nodeset  https://review.opendev.org/c/opendev/zuul-providers/+/95477818:18
corvusdo we want to add back any of the focal-arm64 stuff we removed from openstack/project-config?  like the wheel build jobs?18:19
clarkbcorvus: yes I'm working on that now18:19
corvusoh ok.18:19
clarkbI have to get my 10 depends on in a row first :)18:22
fungii don't think we need the wheel build jobs, to be honest18:23
clarkbfungi: ya I figure we can cleanup focal stuff a bit less urgently though18:24
clarkbI think what this should be a signal for to openstack is that xenial, bionic, and focal things should start to be pruned18:24
clarkbxenial and some cases of bionic are getting more upfront forced cleanups on the opendev side18:24
clarkbbut anything that isn't forcefully removed in that list should still be claened up18:25
fungiopenstack's stable constraints lists are frozen, and the last stable branch using focal (it shouldn't have been) reaches end of maintenance in a few months18:25
fungihttps://governance.openstack.org/tc/reference/runtimes/2024.1.html says it only required testing on jammy18:26
clarkbhttps://review.opendev.org/c/openstack/openstack-zuul-jobs/+/954761 has been updated with requirements depends on and I leave focal for now18:26
fungimaybe the python 3.8 jobs were being kept for other reasons and needed focal to supply an old enough interpreter, i really dunno18:27
clarkbI think that concludes my hacking openstack-zuul jobs. gmaan I think you can apply your cleanups on top now18:28
gmaank18:28
clarkbfungi: https://review.opendev.org/c/opendev/system-config/+/954765 should be a quick review and any reason to not approve https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/954763 now?18:30
fungiapproved18:31
gmaanclarkb: you want to rebase this on top of latest parent https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/95476218:33
clarkbgmaan: I can, I stopped updating that one since it seemed to (logically at least) conflict with the change you pushed18:33
clarkbgmaan: do you want me to rebase that change or do you want to take over and merge it into your change?18:34
gmaanclarkb: I can merge18:34
clarkback thanks18:34
fricklerdo we know something about the swift upload errors on https://review.opendev.org/c/opendev/zuul-providers/+/954716 ? are we maybe hitting quota limits or might these need longer timeouts?18:38
opendevreviewMerged opendev/system-config master: Drop bionic arm64 testing  https://review.opendev.org/c/opendev/system-config/+/95476518:39
clarkbfrickler: that sort of error isn't something I would expect client side timeouts to help with18:39
clarkbI read that as the client got an EOF in violation of the protocol. Possibly because the server side closed the connection?18:40
corvusit's worth noting that requests doesn't have a retry policy by default.  i don't think that openstacksdk has a setting for that.  but we might be able to create a retry object and configure the keystoneauth session to use that18:44
corvusthat feels a little hacky though?18:44
corvusbut we might need something like that; it may be asking too much to expect to push that much data over http without retrying on error18:46
corvus(i'm guessing that's a result of something like a load balancer shift or similar)18:46
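[editor's note] The retry-object idea above can be sketched as follows. Building a urllib3 `Retry` and mounting it with an `HTTPAdapter` is standard requests usage; actually handing this session to keystoneauth/openstacksdk is the unverified part, so treat this as a minimal sketch of the policy itself, not the wiring.

```python
# Sketch of the suggestion: a urllib3 Retry policy mounted on a
# requests session via HTTPAdapter. How to get keystoneauth to use
# this session is an assumption left open here.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=5,
    connect=3,
    read=3,
    backoff_factor=1.0,    # exponential backoff between attempts
    allowed_methods=None,  # retry PUTs too, not just idempotent verbs
)
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount("https://", adapter)
```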
clarkbya that makes sense to me18:47
clarkbok last call on https://review.opendev.org/c/opendev/infra-specs/+/954662 for anyone to object with my continued cleanup in that repo19:02
clarkbI'd like to start on the matrix spec after lunch so intend on approving that soon19:02
elodillesfungi: ACK, i'll keep an eye on those patches19:37
gmaanclarkb: this is ready, I pinged neutron core to review neutron backports https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/95478119:49
gmaanelodilles: ^^ as you are here, can you please review unmaintained branches backports https://review.opendev.org/q/Ieb1b116c4d0866bd8208f9b8b440c1e274c82b1c19:50
clarkbgmaan: thanks I approved it which means anyone should be able to recheck once the depends on are merged19:58
gmaan++19:58
opendevreviewMerged opendev/infra-specs master: Update existing specs to match the current reality  https://review.opendev.org/c/opendev/infra-specs/+/95466220:02
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Add backoff handling to swift upload  https://review.opendev.org/c/opendev/zuul-providers/+/95480420:10
corvusthat is not self-testing, but i think we could merge it and just revert if it fails.20:11
corvus(i don't think it's worth bothering to make that self-testing since we're trying to switch to the zuul-jobs version of that role anyway)20:12
Clark[m]corvus: I stepped away from the computer but that change and approach seems reasonable to me20:23
opendevreviewMerged opendev/zuul-providers master: Add backoff handling to swift upload  https://review.opendev.org/c/opendev/zuul-providers/+/95480421:07
clarkbcorvus: I've approved that change21:07
clarkboh wow it approved quickly21:07
corvus954769 the change to add the new arm build failed with a post-failure; i re-enqueued it so it'll be the test of whether that works21:19
corvus(954716 also failed, but started before that merged; it's a severed head now based on the old upload code)21:19
corvusclarkb: i made the change to add an option to turn off nodepool and validate labels; i put some thought into what that would look like based on our earlier conversation and included thoughts in the commit message.  https://review.opendev.org/95482521:21
corvus(we don't need to rush that; maybe we can land that next week)21:21
corvusi'm going to dequeue 716 and re-enqueue since it's failing and using a bunch of arm nodes21:23
corvushttps://zuul.opendev.org/t/opendev/build/a986072e8c344bba9ce5164820a9012d is our canary build for the upload retries21:25
clarkback I'll try to review the label validation change once I get this spec written and pushed21:27
opendevreviewClark Boylan proposed opendev/infra-specs master: Add spec to use Matrix for OpenDev comms  https://review.opendev.org/c/opendev/infra-specs/+/95482622:07
clarkbcorvus: left a couple of thoughts/questions but overall looks about how I would expect it to22:20
clarkbhttps://zuul.opendev.org/t/opendev/build/a30c8fcb64774249ab312d79374bd929 looks like we get a new upload error but I don't think the code itself is fundamentally flawed22:22
clarkbseems like this is similar to the old errors we were trying to work around but they are bubbling up differently now?22:23
clarkbthere are also some successful builds so the change itself isn't 100% fatal22:27
corvusclarkb: thanks, replied and updated.22:29
corvusclarkb: yeah, that's a fascinating new error.  i agree, i don't think we need to revert, but it clearly didn't solve all the probs22:32
corvusi'll try to figure out what "499: Client Error for url" even means22:32
clarkb+2 thanks22:32
corvushttps://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html i wonder if we need to do something with status codes there; it's all a bit muddled22:38
corvusand i wonder what issued the 499?  is that a load balancer or something?22:40
clarkbhttps://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html#urllib3.util.Retry.RETRY_AFTER_STATUS_CODES that list I guess?22:40
corvusyeah... i'm not entirely sure what the default behavior is!22:40
clarkb"client disconnected from the server before the server could send a response"22:40
clarkbalmost seems like maybe we have a client side timeout after all?22:41
clarkbthe original failures looked like server side timeouts to me but frickler thought changing timeouts might help and maybe that is the case22:41
corvusright, but for us to get a 499 code -- obviously we're not the client, unless that's some urllib internal thing where it makes up a 499 code for an internal timeout.22:41
clarkbya that's what I'm wondering. Or if the server issues 499 as part of the write out to handle client disconnect?22:41
clarkbdefinitely muddled22:41
corvus(like, if a server literally sent us a 499 response code, it's obvious that we didn't timeout -- but it might have if it is acting as a client for a backend)22:42
clarkboh! right22:42
clarkbthe proxy may be the one that disconnected then sends a 499 to us the real client22:42
corvusyeah.  i've found some stackoverflows that suggest haproxy may do that22:42
corvusbut maybe since it's a weird code urllib3 retries don't retry it by default22:44
Clark[m]Maybe add 499 to the default status retry list and then set that on the retry object? Sorry had to pop out for an errand so it's a bit difficult to get all the terms right22:57
Clark[m]Also james_denton and dan_with may be interested in looking at the proxy22:58
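[editor's note] The forcelist suggestion above can be sketched as a plain urllib3 configuration. Whether this actually helps depends on how the 499 surfaces (as a response urllib3 sees versus a wrapped exception), which is exactly the open question in the discussion below; the status codes besides 499 are illustrative.

```python
# Minimal demonstration of adding 499 to the retry status list and
# allowing non-idempotent methods, so segment-upload PUTs that come
# back with a 499 response become retryable.
from urllib3.util.retry import Retry

retry = Retry(
    total=5,
    status_forcelist=[499, 500, 502, 503],
    allowed_methods=None,  # also retry PUT, which segment uploads use
)
```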
corvusi'm still learning, but i think the status_forcelist is only used in a case that doesn't apply to us22:59
corvusi think the retry logic works like this: 1) if it is a connection error, then increment the connection retry counter and retry22:59
corvus2) if it is a read error, increment the read error counter and retry22:59
corvus3) if the server sent a "Retry-After: ..." header in the response, then consult the retry_after options including status_forcelist to decide whether to retry23:00
corvusiow, i think status_forcelist is only used in that last case, which makes it basically: "if the server sent retry-after, and we are configured to honor retry-after, and the status code is one of the codes in RETRY_AFTER_STATUS_CODES or in status_forcelist, then retry"23:01
Clark[m]Thinking out loud here but maybe we'd get more consistent regular server responses if we use smaller objects in multipart uploads. Not sure if that is also an option23:01
corvusso case 3 would only apply if the 499 code arrives with a retry-after header, and ... i doubt it?  but i don't know.23:01
Clark[m]Like if that is 5gb now maybe 1gb sharding would be more reliable23:02
corvuscase 1 is what we thought we were doing when we started this: dealing with connection errors.  we may have actually addressed that EOF that we got due to case 1, but then maybe on the retry we got a 499?  and the 499 is falling through because it doesn't match case 1 or case 3. (i'm still checking case 2)23:03
corvuscase 1: https://github.com/urllib3/urllib3/blob/main/src/urllib3/util/retry.py#L36523:04
corvuscase 2: https://github.com/urllib3/urllib3/blob/main/src/urllib3/util/retry.py#L37323:04
corvuscase 3: https://github.com/urllib3/urllib3/blob/main/src/urllib3/util/retry.py#L38723:04
corvusso yeah, i'm like 95% sure that if we get a 499, we would only retry it if it comes with a Retry-After header23:05
Clark[m]Got it23:05
corvusClark: i agree that we should consider whether our behavior is causing or contributing to this and if we should change it :)23:07
corvuswe are... trying to push data quickly.23:07
corvusit's unclear whether there's a correlation with multiple uploads from different hosts.  are we triggering some kind of account limit?  or is it coincidence and the proxies are just having a bad day right as we're adding more images?23:08
corvusClark: we currently upload 500MB chunks23:10
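[editor's note] The smaller-segment idea can be sketched against openstacksdk's cloud layer, which accepts a `segment_size` on `create_object`. The helper name and the specific 100 MB figure are illustrative, not something decided in the discussion.

```python
# Hedged sketch: shrink SLO segments from the current 500MB so each
# individual PUT is cheaper to retry if the proxy drops it.
SEGMENT_SIZE = 100 * 1024 * 1024  # 100 MB, down from 500 MB

def upload_image(conn, container, name, path):
    # conn is an openstack.connect() Connection; openstacksdk splits
    # the file into SLO segments of segment_size bytes.
    conn.create_object(
        container,
        name,
        filename=path,
        segment_size=SEGMENT_SIZE,
    )
```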
corvusif we want 499 to cause a retry, I think we can implement our own subclass of Retry and override "_is_connection_error" (though we'd be overriding a "private" method).  the nearest "public" method to override would be "increment", but that's pretty complex.  it'd probably be okay to override _is_connection_error.23:16
corvusjamesdenton: hi! we're encountering some unusual errors when uploading large objects to swift in flex.  the total object size is many gigabytes, and we're using SLO with 500MB chunks.23:20
corvusone example error is at 2025-07-11 22:03:28.180894 -- HttpException: 499: Client Error for url: https://swift.api.sjc3.rackspacecloud.com/v1/AUTH_ac0fed44dbe4539d83485bcefc4e2d4b/images-7b7d44d25aa9/cfe16fd7553c4921bfe241b237d4a2f8-rockylinux-8.vhd.zst/000006, Client Disconnect: The client was disconnected during request.23:20
corvusis that due to an error on the cloud side?  should we retry if that happens?  or are we inadvertently causing a problem due to the way we're uploading files and should we do something different?23:22
corvushere's a whole set of errors from the most recent buildset: https://paste.opendev.org/show/bnlkcRemnDVqN5xqn7vZ/23:27
corvusi'm not seeing much in the way of commonalities in times; not much clustering of the error times, nor of the segment numbers (so it happens at different points in the upload processes)23:28
clarkbya I wonder if we should consider the semaphore again but that's a very big stick. Also we really only need to rate limit the uploads, not the builds.23:33
clarkbNot sure I have any good ideas for doing that23:33
clarkbI guess staggering them out like nodepool did if we can express that through zuul23:33
corvuswe can also reduce our parallelism for individual uploads23:33
corvusi just don't want to guess.  if there's some limit, let's find out what it is.  otherwise, we might be 2xing our upload time just because someone is doing maintenance on a load balancer today23:34
opendevreviewMerged openstack/project-config master: Drop requirements branch override for translations  https://review.opendev.org/c/openstack/project-config/+/95474723:36
clarkbcorvus: ++23:49

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!