Friday, 2025-05-16

*** liuxie is now known as liushy02:23
*** ykarel__ is now known as ykarel04:06
opendevreviewMerged zuul/zuul-jobs master: mirror-workspace-git-repos: Allow deleting current branch  https://review.opendev.org/c/zuul/zuul-jobs/+/94603305:22
opendevreviewMerged zuul/zuul-jobs master: Add limit-log-files role  https://review.opendev.org/c/zuul/zuul-jobs/+/94579505:24
*** ralonsoh_ is now known as ralonsoh05:43
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404506:01
fricklerinfra-root: the latest entry in the exim paniclog on review03 is from 2025-05-07, I'm going to archive that file now in my home directory so we get rid of the daily warning mail06:44
opendevreviewMerged opendev/git-review master: Add CC similarly to reviewers  https://review.opendev.org/c/opendev/git-review/+/84921906:53
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404508:07
*** ykarel_ is now known as ykarel09:21
ykarelelodille1, tonyb can you check https://review.opendev.org/c/openstack/puppet-openstack-integration/+/94995709:26
*** elodille1 is now known as elodilles09:33
elodillesykarel: +2+W'd it09:34
ykarelthx elodilles 09:35
clarkbfungi: looking at gerrit.sh there is a while loop that modifies JAVA_HOME and calls dirname. I suspect that is where the script straces are getting into infinite dirname loops.14:50
clarkbfungi: something about the java21 vs java17 install on noble that causes the script to not be able to find the java home and exit cleanly?14:51
clarkbfungi: the other thing I noticed is that the golden site etc/gerrit.config explicitly sets JAVA_HOME already so the easiest fix here may be to read that value out of the golden site and write it into our newly templated config (there were a couple of other config items that we may need to port over too but I think those may be orthogonal to the "we don't even start up" issue)14:52
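A minimal sketch of the idea clarkb describes here: read the Java home setting from the golden site's etc/gerrit.config and carry it into a freshly templated config. This is not the code from change 950196; the helper names `read_git_config_value` and `render_test_config` are hypothetical, the parser handles only simple `[section]` / `key = value` lines, and it assumes the setting lives under `container.javaHome`.

```python
import re


def read_git_config_value(text, section, key):
    """Return the last value of section.key in git-config style text, or None."""
    current = None
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("["):
            # Only plain [section] headers are understood in this sketch;
            # subsections such as [remote "x"] are ignored.
            header = re.fullmatch(r"\[([A-Za-z0-9.-]+)\]", line)
            current = header.group(1).lower() if header else None
            continue
        if current == section.lower() and "=" in line:
            name, _, val = line.partition("=")
            if name.strip().lower() == key.lower():
                value = val.strip()
    return value


def render_test_config(golden_text, template):
    """Append the golden site's javaHome (if any) to a templated config."""
    java_home = read_git_config_value(golden_text, "container", "javaHome")
    if java_home is None:
        return template
    return template + "[container]\n\tjavaHome = " + java_home + "\n"
```

With a valid value written out up front, the startup script has no reason to go looking for Java on its own.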
opendevreviewClark Boylan proposed opendev/git-review master: Preserve gerrit javaHome from golden site to test sites  https://review.opendev.org/c/opendev/git-review/+/95019615:06
opendevreviewClark Boylan proposed opendev/git-review master: Preserve gerrit javaHome from golden site to test sites  https://review.opendev.org/c/opendev/git-review/+/95019615:12
clarkbLuca mentioned overnight that he always reindexes after a restart...15:15
clarkbSo ya my takeaway is that we should be doing that as it's what those "in the know" do/expect15:15
clarkbthat said my personal opinion is pretty strong that an admin should be able to gracefully stop and start gerrit and not need to mitigate against data corruption issues15:16
clarkbI'll see if I can push things towards accomplishing that goal15:16
clarkbfungi: ok I think 950196 is working for 3.10 under java 2115:19
clarkbfungi: gerrit 3.11 appears to be rejecting the pushes from git review and I'm guessing this is a similar issue to the refs/meta/config update problems we've seen elsewhere. We may need to configure acls15:19
fungioh, good find!15:20
clarkbmy suggestion would be to squash those three changes together and then drop gerrit 3.11 so that we have something mergeable. Then add gerrit 3.11 as a followup since it seems specific to that version of gerrit and not the test framework itself15:21
fungiyeah, i'll throw that together now15:22
clarkbreally the trick was finding the looping execve for dirname after ssh'ing to the host15:23
clarkbI think the only place gerrit.sh does a dirname in a loop is looking up java home so working back from there if java home is set to a valid value it doesn't have to discover it and then loop...15:23
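To illustrate the general failure mode discussed here (this is not a transcription of gerrit.sh): walking up a path with dirname reaches a fixed point at the filesystem root, so a discovery loop that never finds what it wants keeps calling dirname on the same value unless it checks for that case. The function name `walk_up_for_java_home` and the `has_java` predicate are hypothetical.

```python
import os
import posixpath


def _default_has_java(directory):
    # Assumption: a usable Java home contains bin/java.
    return os.path.isfile(posixpath.join(directory, "bin", "java"))


def walk_up_for_java_home(start, has_java=_default_has_java):
    """Walk parent directories from `start` until one looks like a Java home.

    Returns None once the root is reached instead of looping forever.
    """
    current = start
    while True:
        if has_java(current):
            return current
        parent = posixpath.dirname(current)
        if parent == current:
            # dirname("/") is "/", so without this guard the loop never ends.
            return None
        current = parent
```

Setting the Java home explicitly sidesteps the discovery path altogether, which matches the fix described above.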
clarkbthe logging of gerrit.sh output is maybe of minimal value but it seems to work and we may as well keep it15:25
fungiagreed, i'm adding it now15:25
opendevreviewJeremy Stanley proposed opendev/git-review master: Update higher Gerrit versions used in testing  https://review.opendev.org/c/opendev/git-review/+/94991615:33
opendevreviewJeremy Stanley proposed opendev/git-review master: Add a Gerrit 3.11 test job  https://review.opendev.org/c/opendev/git-review/+/95019915:33
fungi949916 is the 3.9->3.10 squash with the other patches, 950199 separates out the 3.11 job addition15:34
clarkb+2 on 949916 from me. I expect that to pass testing based on the previous testing runs.15:35
clarkbI abandoned the two leaf changes that got squashed15:36
fungii figure once we get that merged, and update release notes for anything major that's missing, we can tag a new git-review and then work on pyproject.toml migration for it (which should probably entail dropping support for some older python)15:36
clarkb++15:36
fungithat way we'll have a last good version for python 3.615:37
TheJuliafor custom nodeset definitions, is it possible to use the ubuntu-noble-16GB label as defined in zuul base config?15:41
fungiat the moment i only see a niz-ubuntu-noble-16GB label, no ubuntu-noble-16GB15:42
TheJuliaokay, I guess the way zuul works, it could be in config but not actually usable in intermediate config then?15:43
TheJuliahttps://opendev.org/opendev/zuul-providers/src/branch/master/zuul.d/nodesets.yaml#L134-L13915:43
fungihuh, i wonder why that didn't turn up in codesearch...15:44
TheJulia... yeah, I've run into that myself ;)15:44
corvusTheJulia: that's part of the nodepool-in-zuul that's being tested in some tenants but is not available to openstack yet15:44
fungioh, wait it did, my eyes just deceived me15:44
TheJuliacool, I'm not (THAT) crazy!15:44
corvusthese are the labels supported by nodepool in the openstack tenant: https://zuul.opendev.org/t/openstack/labels15:45
fungi(note that it was a nodeset named ubuntu-noble-16GB not a label)15:45
fungithe ubuntu-noble-16GB nodeset is using the niz-ubuntu-noble-16GB label15:46
TheJuliaokay, so then openstack project config overlays in between then15:46
TheJuliathat's a good detail15:46
TheJulia(and explains some things)15:46
fungiwell, at present only the zuul and opendev tenants have these, the openstack tenant does not yet but will soon15:47
fungiwe wanted to test them out at a smaller scale before rolling it out for broader use15:47
TheJuliai'm basically at the point of asking: do we lose the development cycle trying to build a bunch of tooling to drop tinycore and find a better low memory path, or just... deal with the impact and right size some of the higher memory required jobs which run with greater parallelism15:47
fungiyou could try the ubuntu-noble-32GB label?15:48
TheJuliayeah, going to give that a spin15:48
fungithat's available in the openstack tenant now15:48
TheJuliadidn't realize there was the translation mapping in the tenant level15:48
TheJulia(but it makes sense)15:48
TheJuliaThanks!15:48
clarkbalso multinode15:49
fungiwell, we generally don't have tenant-specific labels, this is special to the nodepool-in-zuul migration that's in progress15:49
clarkbyou can run your entire control plane in 8GB over here then have 8GB over there to run your workload15:49
fungionce we're done we should be back to having consistent labels across tenants again15:49
clarkbroughly equivalent to a single 16GB node as long as you don't need more than 8GB for the workload15:49
clarkbthe downside to the 32GB label is it is provided by a single cloud region and if that has an outage you will too15:50
clarkband its total capacity is fairly limited so you want to use it sparingly too15:51
clarkb(otherwise you'll dos yourself)15:51
TheJuliaYeah, except retooling on multinode and then shipping a bunch of data over tunnels is also going to be costly and increase failure risk as well.15:51
clarkbI mean it's worked well for about a decade...15:52
TheJuliaand even then, there is an increased cost to retool as well, it's just a little bit less on that path15:52
clarkbthe tooling should already be 99% there15:52
clarkbit's just devstack job config to select which services run where. You're right the tunnels have created problems in the past when mtus aren't respected. But neutron addressed that a long time ago and as far as I know it still works15:53
clarkb(basically you have to configure the mtu on intermediate l2 hops since they can't fragment without an l3 address)15:54
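As a rough illustration of why the MTU has to be set explicitly on tunnelled links, the arithmetic can be sketched as below. The log does not say which encapsulation the jobs use; the example assumes VXLAN over IPv4, whose per-packet overhead is 50 bytes (20 outer IPv4 + 8 UDP + 8 VXLAN + 14 inner Ethernet). The function name `tunnel_inner_mtu` is hypothetical.

```python
# Assumption: VXLAN over IPv4. Other encapsulations have different overheads.
VXLAN_IPV4_OVERHEAD = 20 + 8 + 8 + 14  # outer IPv4 + UDP + VXLAN + inner Ethernet


def tunnel_inner_mtu(underlay_mtu, overhead=VXLAN_IPV4_OVERHEAD):
    """Largest MTU usable inside the tunnel without fragmenting on the underlay."""
    inner = underlay_mtu - overhead
    if inner <= 0:
        raise ValueError("underlay MTU is smaller than the tunnel overhead")
    return inner


if __name__ == "__main__":
    print(tunnel_inner_mtu(1500))
```

Every L2 hop between the instance and the tunnel endpoint needs to agree on the smaller value, since those hops cannot fragment on their own.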
TheJuliaI hear you, but I functionally need to fit 5-6 2.5-3GB VMs (at the minimum), and then ship enough data over the wire between the nodes on tunnels to simulate deployments. We may still end up doing that, but I have two jobs specifically where it really just makes no sense to try and also fake everything out across even more nodes. It's all about right sizing.15:58
clarkb6 3GB VMs will not fit in a 16GB node either15:58
TheJuliano, but I can also tune the job down to run slower and longer15:59
clarkbright which you could do in the scenario I described as well15:59
TheJuliaagain, all about right sizing15:59
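The sizing back-and-forth above comes down to simple arithmetic, sketched here with the figures quoted in the log (5-6 VMs at 2.5-3GB each, against 16GB and 32GB nodes). The `fits` helper and its `reserved_gb` parameter are hypothetical; real jobs also need memory for the host OS and control plane services, which this sketch only models as a flat reservation.

```python
def total_vm_memory_gb(count, per_vm_gb):
    """Memory needed by `count` VMs of `per_vm_gb` each."""
    return count * per_vm_gb


def fits(node_gb, count, per_vm_gb, reserved_gb=0.0):
    """True if the VMs fit on the node after reserving memory for everything else."""
    return total_vm_memory_gb(count, per_vm_gb) + reserved_gb <= node_gb


if __name__ == "__main__":
    for count, per_vm in ((5, 2.5), (6, 3.0)):
        need = total_vm_memory_gb(count, per_vm)
        print(count, "VMs at", per_vm, "GB need", need, "GB;",
              "fits 16GB:", fits(16, count, per_vm),
              "fits 32GB:", fits(32, count, per_vm))
```

The low end of the range fits a 16GB node only if almost nothing else runs there, while the high end does not fit at all, which is the point clarkb makes.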
JayFI think some of the technical difficulty of adopting multinode, which hasn't even been said yet, is that our BMC emulators (vbmc/sushy-tools) are configured by ironic's devstack plugin, rather than having one of its own15:59
fungilooks like 949916 is passing and 950199 is just failing on the added job, as expected16:01
TheJuliaJayF: we already have two multinode jobs. We've had to disable them for entire cycles in the past because of packet loss issues and troubleshooting headaches... and to this day they stay non-voting because their reliability is nowhere near a single node due to sporadic networking fun16:01
clarkbTheJulia: JayF while I totally believe that those jobs may not be reliable I don't think that those problems are inherent to multinode. Nova runs a number of multinode jobs that as far as I know are reliable enough to continue gating for them (multinode grenade, live migration, and multiple cell jobs and probably more)16:03
clarkbfungi: for debugging gerrit 3.11 on 950199 I wonder if we need to grab the all projects meta config to see what the acls say there (and maybe compare to the 3.10 acls)16:04
clarkbfungi: looking at https://gerrit-review.googlesource.com/c/gerrit/+/421141/2/java/com/google/gerrit/server/schema/AllProjectsCreator.java it appears that registered users should still have the ability to push to refs/for/* but was removed for refs/heads/* and refs/meta/config16:09
clarkbfungi: and sure enough the error for the tests is remote: * 'admin' cannot perform 'push' with force=false on project 'test/test_project' for ref 'refs/heads/master'16:10
clarkbfungi: so we may just need to push that update as a change and submit it then continue with the test16:10
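A toy model of the behaviour described above, assuming (as clarkb reads the upstream change) that the default ACLs still grant push on refs/for/* but no longer on refs/heads/* or refs/meta/config. This is not how Gerrit evaluates permissions internally; the pattern list and the `can_push` helper are hypothetical and only mirror the outcome seen in the test error.

```python
from fnmatch import fnmatchcase

# Assumption: push patterns granted by default after the upstream change.
DEFAULT_PUSH_PATTERNS = ["refs/for/*"]


def can_push(ref, allowed_patterns=DEFAULT_PUSH_PATTERNS):
    """True if `ref` matches any pattern that grants push."""
    return any(fnmatchcase(ref, pattern) for pattern in allowed_patterns)


if __name__ == "__main__":
    for ref in ("refs/for/master", "refs/heads/master", "refs/meta/config"):
        print(ref, "->", "allowed" if can_push(ref) else "rejected")
```

Under this model a direct push to refs/heads/master is rejected while a push for review is accepted, which is why proposing the update as a change and submitting it gets the tests moving again.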
TheJuliaif we were to move all multinode jobs to "vmedia" based deploys, we could likely avoid most of the packet loss problems around pxe boot... might be a path to head towards and keep all PXE based stuff on local nodes. Just, still a lot of refactoring.16:11
TheJuliaJayF: ^16:11
JayFthat's a good idea16:11
clarkbfungi: ya looks like we're creating a .gitreview etc on those branches and we just want them in place. So I think our two options are either to update acls to allow that push or to push all of that as changes and submit them16:11
TheJuliaThat would mean on-single-node parallelism of 1, we would have to tune single nodes meaning we might need to add more jobs as well to maintain coverage.16:12
TheJuliaOr just let them run for 1-2 hours16:13
TheJuliaThat's really a project level question16:13
clarkbfungi: ya give me a sec I think I see a quick patch that may help16:15
clarkbfungi: just trying to figure out if I can safely write out the commit message hook that is now needed to propose changes without impacting the tests that check test hook writing16:24
opendevreviewClark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job  https://review.opendev.org/c/opendev/git-review/+/95019916:42
clarkbfungi: ^ something like that maybe. The test suite for git review actually has a lot of the support framework to make this easy. It just takes a bit to figure out what all the tools are in there since it's been so long16:42
opendevreviewClark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job  https://review.opendev.org/c/opendev/git-review/+/95019916:45
opendevreviewClark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job  https://review.opendev.org/c/opendev/git-review/+/95019916:59
clarkbfungi: ya ^ is still failing with those issues but on specific test cases and not every test case now. I think we're close17:57
opendevreviewClark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job  https://review.opendev.org/c/opendev/git-review/+/95019918:16
fungiaha18:17
fungiconvenient that the gerrit review command accepts a commit id in lieu of a change id or change number18:19
opendevreviewClark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job  https://review.opendev.org/c/opendev/git-review/+/95019918:27
clarkbfungi: total brainstorm think out loud moment: but it occurs to me that we could run this as third party CI against gerrit if they publish dev wars somewhere18:31
clarkbwe could also build our own dev wars too18:31
clarkbmostly thinking that would give them signal that hey people are doing things like pushing as administrator in the wild maybe we should consider if the default is fine18:31
clarkb(personally I think the default of not allowing force push is sufficient for auditing purposes... but I could see them making this decision anyway if we started failing git review against their updates)18:32
clarkbthen separately but sort of related I've updated the gerrit issue I filed yesterday with more details/suggestions on fixing this from mfick: https://issues.gerritcodereview.com/issues/41802570218:34
opendevreviewClark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job  https://review.opendev.org/c/opendev/git-review/+/95019918:41
clarkbfungi: it passes now18:56
clarkband I'm finding lunch18:56
clarkband since it isn't actively raining post lunch I'm going to go out for a bike ride20:13
clarkbI think both of those git review changes should be ready for review and/or landing then we can focus on the packaging aspects20:13
fungisounds like a great idea20:32
fungiand i approved both20:36
fungithanks for figuring that out!20:36
opendevreviewMerged opendev/git-review master: Update higher Gerrit versions used in testing  https://review.opendev.org/c/opendev/git-review/+/94991620:48
opendevreviewMerged opendev/git-review master: Add a Gerrit 3.11 test job  https://review.opendev.org/c/opendev/git-review/+/95019920:51
corvusclarkb: the trick will be triggering the 3pci on gerrit-review :)   might be easier to run it on gerrit's zuul.23:39
corvus(could always have that zuul import the jobs from opendev)23:39
Clark[m]Oh that's a good point. I think we do have their Gerrit setup as a connection in our zuul but not sure if we listen to the event stream too23:49
corvusthere is no event stream :(23:49
corvus(it is an authenticated connection, so we can report, there's just no publicly available trigger)23:49
Clark[m]Oh right they don't do ssh at all23:50
