*** liuxie is now known as liushy | 02:23 | |
*** ykarel__ is now known as ykarel | 04:06 | |
opendevreview | Merged zuul/zuul-jobs master: mirror-workspace-git-repos: Allow deleting current branch https://review.opendev.org/c/zuul/zuul-jobs/+/946033 | 05:22 |
opendevreview | Merged zuul/zuul-jobs master: Add limit-log-files role https://review.opendev.org/c/zuul/zuul-jobs/+/945795 | 05:24 |
*** ralonsoh_ is now known as ralonsoh | 05:43 | |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 06:01 |
frickler | infra-root: the latest entry in the exim paniclog on review03 is from 2025-05-07, I'm going to archive that file now in my homedirectory so we get rid of the daily warning mail | 06:44 |
opendevreview | Merged opendev/git-review master: Add CC similarly to reviewers https://review.opendev.org/c/opendev/git-review/+/849219 | 06:53 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 08:07 |
*** ykarel_ is now known as ykarel | 09:21 | |
ykarel | elodille1, tonyb can you check https://review.opendev.org/c/openstack/puppet-openstack-integration/+/949957 | 09:26 |
*** elodille1 is now known as elodilles | 09:33 | |
elodilles | ykarel: +2+W'd it | 09:34 |
ykarel | thx elodilles | 09:35 |
clarkb | fungi: looking at gerrit.sh there is a while loop that modifies JAVA_HOME and calls dirname. I suspect that is where the script straces are getting into infinite dirname loops. | 14:50 |
clarkb | fungi: something about the java21 vs java17 install on noble that causes the script to not be able to find the java home and exit cleanly? | 14:51 |
clarkb | fungi: the other thing I noticed is that the golden site etc/gerrit.config explicitly sets JAVA_HOME already, so the easiest fix here may be to read that value out of the golden site and write it into our newly templated config (there were a couple of other config items that we may need to port over too, but I think those may be orthogonal to the "we don't even start up" issue) | 14:52 |
opendevreview | Clark Boylan proposed opendev/git-review master: Preserve gerrit javaHome from golden site to test sites https://review.opendev.org/c/opendev/git-review/+/950196 | 15:06 |
opendevreview | Clark Boylan proposed opendev/git-review master: Preserve gerrit javaHome from golden site to test sites https://review.opendev.org/c/opendev/git-review/+/950196 | 15:12 |
clarkb | Luca mentioned overnight that he always reindexes after a restart... | 15:15 |
clarkb | So ya my takeaway is that we should be doing that as it's what those "in the know" do/expect | 15:15 |
clarkb | that said my personal opinion is pretty strong that an admin should be able to gracefully stop and start gerrit and not need to mitigate against data corruption issues | 15:16 |
clarkb | I'll see if I can push things towards accomplishing that goal | 15:16 |
clarkb | fungi: ok I think 950196 is working for 3.10 under java 21 | 15:19 |
clarkb | fungi: gerrit 3.11 appears to be rejecting the pushes from git review and I'm guessing this is a similar issue to the refs/meta/config update problems we've seen elsewhere. We may need to configure acls | 15:19 |
fungi | oh, good find! | 15:20 |
clarkb | my suggestion would be to squash those three changes together and then drop gerrit 3.11 so that we have something mergeable. Then add gerrit 3.11 as a followup since it seems specific to that version of gerrit and not the test framework itself | 15:21 |
fungi | yeah, i'll throw that together now | 15:22 |
clarkb | really the trick was finding the looping execve for dirname after ssh'ing to the host | 15:23 |
clarkb | I think the only place gerrit.sh does a dirname in a loop is looking up java home so working back from there if java home is set to a valid value it doesn't have to discover it and then loop... | 15:23 |
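A minimal sketch of the dirname-walk pattern being described (an assumed shape for illustration, not the actual gerrit.sh source): discovery climbs upward from the resolved path of a java binary looking for a directory that contains bin/java.

```shell
#!/bin/sh
# Hypothetical sketch of a JAVA_HOME discovery loop like the one
# described above: walk up from a java binary path with dirname until
# a directory containing bin/java is found.  If no level ever matches,
# the loop only stops at "/" -- and a variant without that guard would
# call dirname forever, which is the looping execve symptom seen in
# the strace.
find_java_home() {
    dir=$(readlink -f "$1")            # resolve symlinks, e.g. /etc/alternatives/java
    while [ "$dir" != "/" ]; do
        dir=$(dirname "$dir")
        if [ -x "$dir/bin/java" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

Presetting the javaHome value in etc/gerrit.config (as the golden site already does) means the init script never has to run this kind of discovery at all.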
clarkb | the logging of gerrit.sh output is maybe of minimal value but it seems to work and we may as well keep it | 15:25 |
fungi | agreed, i'm adding it now | 15:25 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Update higher Gerrit versions used in testing https://review.opendev.org/c/opendev/git-review/+/949916 | 15:33 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Add a Gerrit 3.11 test job https://review.opendev.org/c/opendev/git-review/+/950199 | 15:33 |
fungi | 949916 is the 3.9->3.10 squash with the other patches, 950199 separates out the 3.11 job addition | 15:34 |
clarkb | +2 on 949916 from me. I expect that to pass testing based on the previous testing runs. | 15:35 |
clarkb | I abandoned the two leaf changes that got squashed | 15:36 |
fungi | i figure once we get that merged, and update release notes for anything major that's missing, we can tag a new git-review and then work on pyproject.toml migration for it (which should probably entail dropping support for some older python) | 15:36 |
clarkb | ++ | 15:36 |
fungi | that way we'll have a last good version for python 3.6 | 15:37 |
TheJulia | for custom nodeset definitions, is it possible to use the ubuntu-noble-16GB label as defined in zuul base config? | 15:41 |
fungi | at the moment i only see a niz-ubuntu-noble-16GB label, no ubuntu-noble-16GB | 15:42 |
TheJulia | okay, I guess the way zuul works, it could be in config but not actually usable in intermediate config then? | 15:43 |
TheJulia | https://opendev.org/opendev/zuul-providers/src/branch/master/zuul.d/nodesets.yaml#L134-L139 | 15:43 |
fungi | huh, i wonder why that didn't turn up in codesearch... | 15:44 |
TheJulia | ... yeah, I've run into that myself ;) | 15:44 |
corvus | TheJulia: that's part of the nodepool-in-zuul that's being tested in some tenants but is not available to openstack yet | 15:44 |
fungi | oh, wait it did, my eyes just deceived me | 15:44 |
TheJulia | cool, I'm not (THAT) crazy! | 15:44 |
corvus | these are the labels supported by nodepool in the openstack tenant: https://zuul.opendev.org/t/openstack/labels | 15:45 |
fungi | (note that it was a nodeset named ubuntu-noble-16GB not a label) | 15:45 |
fungi | the ubuntu-noble-16GB nodeset is using the niz-ubuntu-noble-16GB label | 15:46 |
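The mapping fungi describes lives in the zuul-providers nodesets file linked above; the definition there has roughly this shape (a sketch built from the names in this discussion, not a verbatim copy):

```yaml
- nodeset:
    name: ubuntu-noble-16GB
    nodes:
      - name: ubuntu-noble-16GB
        label: niz-ubuntu-noble-16GB
```

So a job requesting the ubuntu-noble-16GB nodeset only works in tenants where the niz-ubuntu-noble-16GB label is actually available.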
TheJulia | okay, so then openstack project config overlays in between then | 15:46 |
TheJulia | that's a good detail | 15:46 |
TheJulia | (and explains some things) | 15:46 |
fungi | well, at present only the zuul and opendev tenants have these, the openstack tenant does not yet but will soon | 15:47 |
fungi | we wanted to test them out at a smaller scale before rolling it out for broader use | 15:47 |
TheJulia | i'm basically at a point: do we lose the development cycle to trying to build a bunch of tooling to drop tinycore and find a better low-memory path, or just... deal with the impact and right-size some of the higher-memory-required jobs which run with greater parallelism | 15:47 |
fungi | you could try the ubuntu-noble-32GB label? | 15:48 |
TheJulia | yeah, going to give that a spin | 15:48 |
fungi | that's available in the openstack tenant now | 15:48 |
TheJulia | didn't realize there was the translation mapping in the tenant level | 15:48 |
TheJulia | (but it makes sense) | 15:48 |
TheJulia | Thanks! | 15:48 |
clarkb | also multinode | 15:49 |
fungi | well, we generally don't have tenant-specific labels, this is special to the nodepool-in-zuul migration that's in progress | 15:49 |
clarkb | you can run your entire control plane in 8GB over here then have 8GB over there to run your workload | 15:49 |
fungi | once we're done we should be back to having consistent labels across tenants again | 15:49 |
clarkb | roughly equivalent to a single 16GB node as long as you don't need more than 8GB for the workload | 15:49 |
clarkb | the downside to the 32GB label is it is provided by a single cloud region and if that has an outage you will too | 15:50 |
clarkb | and its total capacity is fairly limited so you want to use it sparingly too | 15:51 |
clarkb | (otherwise you'll DoS yourself) | 15:51 |
TheJulia | Yeah, except retooling on multinode and then shipping a bunch of data over tunnels is also going to be costly and increase failure risk as well. | 15:51 |
clarkb | I mean its worked well for about a decade... | 15:52 |
TheJulia | and even then, there is an increased cost to retool as well, it's just a little bit less on that path | 15:52 |
clarkb | the tooling should already be 99% there | 15:52 |
clarkb | it's just devstack job config to select which services run where. You're right the tunnels have created problems in the past when MTUs aren't respected. But neutron addressed that a long time ago and as far as I know it still works | 15:53 |
clarkb | (basically you have to configure the mtu on intermediate l2 hops since they can't fragment without an l3 address) | 15:54 |
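A quick back-of-the-envelope version of that MTU point (standard header sizes, assumed to match what neutron calculates): every encapsulation header eats into the physical MTU, so tenant interfaces over a tunnel have to be configured smaller up front, since the intermediate l2 hops cannot fragment.

```shell
#!/bin/sh
# Illustrative arithmetic only (not Neutron code): the largest inner
# MTU a VXLAN tenant network can carry over a 1500-byte physical path.
# Overhead = outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14).
physical_mtu=1500
vxlan_overhead=$((20 + 8 + 8 + 14))
inner_mtu=$((physical_mtu - vxlan_overhead))
echo "$inner_mtu"   # prints 1450
```

1450 is the familiar default MTU neutron advertises for VXLAN tenant networks on a 1500-byte underlay.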
TheJulia | I hear you, but I functionally need to fit 5-6 2.5-3GB VMs (at the minimum), and then ship enough data over the wire between the nodes on tunnels to simulate deployments. We may still end up doing that, but I have two jobs specifically where it really just makes no sense to try and also fake everything out across even more nodes. It's all about right sizing. | 15:58 |
clarkb | 6 3GB VMs will not fit in a 16GB node either | 15:58 |
TheJulia | no, but I can also tune the job down to run slower and longer | 15:59 |
clarkb | right which you could do in the scenario I described as well | 15:59 |
TheJulia | again, all about right sizing | 15:59 |
JayF | I think some of the technical difficulty of adopting multinode, which hasn't even been said yet, is that our BMC emulators (vbmc/sushy-tools) are configured by ironic's devstack plugin, rather than having one of their own | 15:59 |
fungi | looks like 949916 is passing and 950199 is just failing on the added job, as expected | 16:01 |
TheJulia | JayF: we already have two multinode jobs. We've had to disable them for entire cycles in the past because of packet loss issues and troubleshooting headaches... and to this day they stay non-voting because their reliability is nowhere near a single node due to sporadic networking fun | 16:01 |
clarkb | TheJulia: JayF while I totally believe that those jobs may not be reliable I don't think that those problems are inherent to multinode. Nova runs a number of multinode jobs that as far as I know are reliable enough to continue gating for them (multinode grenade, live migration, and multiple cell jobs and probably more) | 16:03 |
clarkb | fungi: for debugging gerrit 3.11 on 950199 I wonder if we need to grab the all projects meta config to see what the acls say there (and maybe compare to the 3.10 acls) | 16:04 |
clarkb | fungi: looking at https://gerrit-review.googlesource.com/c/gerrit/+/421141/2/java/com/google/gerrit/server/schema/AllProjectsCreator.java it appears that registered users should still have the ability to push to refs/for/* but was removed for refs/heads/* and refs/meta/config | 16:09 |
clarkb | fungi: and sure enough the error for the tests is remote: * 'admin' cannot perform 'push' with force=false on project 'test/test_project' for ref 'refs/heads/master' | 16:10 |
clarkb | fungi: so we may just need to push that update as a change and submit it then continue with the test | 16:10 |
TheJulia | if we were to move all multinode jobs to "vmedia" based deploys, we could likely avoid most of the packet loss problems around pxe boot... might be a path to head towards and keep all PXE based stuff on local nodes. Just, still a lot of refactoring. | 16:11 |
TheJulia | JayF: ^ | 16:11 |
JayF | that's a good idea | 16:11 |
clarkb | fungi: ya looks like we're creating a .gitreview etc on those branches and we just want them in place. So I think our two options are either to update acls to allow that push or to push all of that as changes and submit them | 16:11 |
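The second option clarkb mentions might look roughly like this (a hypothetical sketch; the server address and port are assumptions, and the admin account needs submit rights): push the setup commit for review instead of directly to the branch, then approve and submit it over Gerrit's SSH API, which accepts a commit SHA as the change identifier.

```shell
# Instead of: git push origin HEAD:refs/heads/master   (denied in 3.11)
git push origin HEAD:refs/for/master
ssh -p 29418 admin@localhost gerrit review --code-review +2 --submit \
    "$(git rev-parse HEAD)"
```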
TheJulia | That would mean, on a single node, parallelism of 1; we would have to tune single nodes, meaning we might need to add more jobs as well to maintain coverage. | 16:12 |
TheJulia | Or just let them run for 1-2 hours | 16:13 |
TheJulia | That's really a project-level question | 16:13 |
clarkb | fungi: ya give me a sec I think I see a quick patch that may help | 16:15 |
clarkb | fungi: just trying to figure out if I can safely write out the commit message hook that is now needed to propose changes without impacting the tests that check test hook writing | 16:24 |
opendevreview | Clark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job https://review.opendev.org/c/opendev/git-review/+/950199 | 16:42 |
clarkb | fungi: ^ something like that maybe. The test suite for git review actually has a lot of the support framework to make this easy. It just takes a bit to figure out what all the tools are in there since it's been so long | 16:42 |
opendevreview | Clark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job https://review.opendev.org/c/opendev/git-review/+/950199 | 16:45 |
opendevreview | Clark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job https://review.opendev.org/c/opendev/git-review/+/950199 | 16:59 |
clarkb | fungi: ya ^ is still failing with those issues but on specific test cases and not every test case now. I think we're close | 17:57 |
opendevreview | Clark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job https://review.opendev.org/c/opendev/git-review/+/950199 | 18:16 |
fungi | aha | 18:17 |
fungi | convenient that the gerrit review command accepts a commit id in lieu of a change id or change number | 18:19 |
opendevreview | Clark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job https://review.opendev.org/c/opendev/git-review/+/950199 | 18:27 |
clarkb | fungi: total brainstorm think out loud moment: but it occurs to me that we could run this as third party CI against gerrit if they publish dev wars somewhere | 18:31 |
clarkb | we could also build our own dev wars too | 18:31 |
clarkb | mostly thinking that would give them a signal that, hey, people are doing things like pushing as administrator in the wild, maybe we should consider if the default is fine | 18:31 |
clarkb | (personally I think the default of not allowing force push is sufficient for auditing purposes... but I could see them making this decision anyway if we started failing git review against their updates) | 18:32 |
clarkb | then separately but sort of related I've updated the gerrit issue I filed yesterday with more details/suggestions on fixing this from mfick: https://issues.gerritcodereview.com/issues/418025702 | 18:34 |
opendevreview | Clark Boylan proposed opendev/git-review master: Add a Gerrit 3.11 test job https://review.opendev.org/c/opendev/git-review/+/950199 | 18:41 |
clarkb | fungi: it passes now | 18:56 |
clarkb | and I'm finding lunch | 18:56 |
clarkb | and since it isn't actively raining post lunch I'm going to go out for a bike ride | 20:13 |
clarkb | I think both of those git review changes should be ready for review and/or landing, then we can focus on the packaging aspects | 20:13 |
fungi | sounds like a great idea | 20:32 |
fungi | and i approved both | 20:36 |
fungi | thanks for figuring that out! | 20:36 |
opendevreview | Merged opendev/git-review master: Update higher Gerrit versions used in testing https://review.opendev.org/c/opendev/git-review/+/949916 | 20:48 |
opendevreview | Merged opendev/git-review master: Add a Gerrit 3.11 test job https://review.opendev.org/c/opendev/git-review/+/950199 | 20:51 |
corvus | clarkb: the trick will be triggering the 3pci on gerrit-review :) might be easier to run it on gerrit's zuul. | 23:39 |
corvus | (could always have that zuul import the jobs from opendev) | 23:39 |
Clark[m] | Oh that's a good point. I think we do have their Gerrit setup as a connection in our zuul but not sure if we listen to the event stream too | 23:49 |
corvus | there is no event stream :( | 23:49 |
corvus | (it is an authenticated connection, so we can report, there's just no publicly available trigger) | 23:49 |
Clark[m] | Oh right they don't do ssh at all | 23:50 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!