Thursday, 2025-01-30

opendevreviewJames E. Blair proposed opendev/system-config master: Mirror node 23 container image  https://review.opendev.org/c/opendev/system-config/+/94041900:11
opendevreviewMerged opendev/system-config master: Mirror node 23 container image  https://review.opendev.org/c/opendev/system-config/+/94041900:19
fricklerinfra-root: nb04 has /opted to be full again. maybe looking at transitioning arm builds to zuul would be a better plan than having to keep babysitting that server?05:54
*** ykarel_ is now known as ykarel06:01
*** elodilles_pto is now known as elodilles08:13
tobias-urdinafter the review.opendev.org upgrade to 3.10 it's blazingly fast, something was really improved; it's never been this fast in all the time i can remember, and that's almost ten years by now09:03
tobias-urdinnice work! :)09:04
fungitobias-urdin: that's great to hear! i wonder if it's really gerrit 3.10 performing better, or the fact that we reset the h2 databases for its change caches... maybe the caches being so massive was causing them to do the opposite of their intended purpose12:53
funginormally we preserved the caches, but h2 doesn't really shrink backing files when records are deleted (just flags them so the engine can skip them) and they'd grown so massive over the years that they were causing other problems at shutdown/startup for the gerrit service12:56
fungiso during the last upgrade we decided to remove them and let gerrit start up with cold caches12:58
fungiand now we're thinking we should just do that every time we restart it for upgrades or config changes to keep the caches from growing so large12:58
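A minimal sketch of that cold-cache restart, in Python: stop the service, delete the persistent H2 cache files under the site's cache directory, and start it again so the caches are rebuilt from scratch. The site path and service name are assumptions for illustration, not opendev's production layout.

    #!/usr/bin/env python3
    # Sketch only: drop Gerrit's persistent H2 cache files so they are
    # rebuilt cold on the next startup. Paths/names below are assumed.
    import pathlib
    import subprocess

    GERRIT_SITE = pathlib.Path("/var/gerrit")  # assumed Gerrit site directory
    SERVICE = "gerrit"                         # assumed service name

    subprocess.run(["systemctl", "stop", SERVICE], check=True)
    for cache_file in (GERRIT_SITE / "cache").glob("*.h2.db"):
        print(f"removing {cache_file}")
        cache_file.unlink()
    subprocess.run(["systemctl", "start", SERVICE], check=True)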
fricklerseems github is a bit sad, just in case people notice failures https://www.githubstatus.com/14:48
fungithanks for the heads up!14:52
fungiin other news, i've had a bit of a eureka moment wrt the pypi warnings we started getting a few months ago about non-normalized sdist filenames... newer setuptools fixes that, so my zuul-jobs change to switch from direct setup.py invocation to using pyproject-build will solve it (by pulling in newer setuptools automatically)14:53
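For context on what "non-normalized sdist filenames" means: PEP 625 wants the project name in the sdist filename normalized (lowercased, runs of ".", "-" and "_" collapsed, then "-" replaced with "_"), which newer setuptools does automatically. A rough sketch of that normalization, with a made-up project name:

    import re

    def pep625_sdist_basename(name: str, version: str) -> str:
        # PEP 503-style normalization of the name, then '-' -> '_',
        # roughly what newer setuptools emits for sdist filenames.
        normalized = re.sub(r"[-_.]+", "-", name).lower().replace("-", "_")
        return f"{normalized}-{version}.tar.gz"

    print(pep625_sdist_basename("My.Example-Project", "1.0"))  # my_example_project-1.0.tar.gz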
clarkbyay automatic fixes15:49
fricklerfungi: the job failures in https://review.opendev.org/c/zuul/zuul-jobs/+/940273 are not relevant for https://review.opendev.org/c/zuul/zuul-jobs/+/940273 , do I understand that correctly? are we ready to approve that stack, then?15:54
clarkbfrickler: I think you linked the same change twice. Which failures?15:55
fungifrickler: i assume you mean 940314?15:55
fricklerargl, sorry, I meant https://review.opendev.org/c/opendev/bindep/+/94025815:55
fungiif so, that was an experiment based off a suggestion clarkb made, it's not relevant15:55
clarkbfrickler: I think the failures in 940258 are due to pbr only listing a dependency on setuptools for python3.12 and newer and line 2 in the pyproject.toml dropped setuptools15:57
fungithe failures on the current iteration of 940258 will be fixed once https://review.opendev.org/940118 merges and pbr 6.1.1 is on pypi15:57
clarkbthe next pbr release (either final or beta) will list setuptools for all python versions. So ya those failures are not relevant15:57
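To illustrate the environment-marker behaviour being described, a small sketch using the packaging library; the marker string here is an assumption based on the discussion, not a quote from pbr's metadata:

    from packaging.requirements import Requirement

    # A dependency gated on python_version >= "3.12" simply isn't pulled in
    # on older interpreters, so builds there ended up without setuptools.
    req = Requirement('setuptools; python_version >= "3.12"')
    print(req.marker.evaluate({"python_version": "3.11"}))  # False
    print(req.marker.evaluate({"python_version": "3.12"}))  # True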
fungifrickler: patchset 5 of 940258 is probably a better result to look at15:59
fungithat was the one that added the depends-on to the zuul-jobs change, ps6 was testing what happens when removing setuptools from the build-system.requires in pyproject.toml16:00
fungi(i was testing a variety of different things over the life of that dnm change)16:01
clarkbOnce I've loaded my ssh keys I'm going to clean up the etherpad held node16:01
clarkbI don't think we need it anymore16:01
clarkboh I don't think we need the held grafana node anymore either now that we're going to proxy. I can clean that one up too16:02
clarkbI have cleaned up my etherpad and grafana autoholds16:11
opendevreviewMerged zuul/zuul-jobs master: Add ensure-pyproject-build role  https://review.opendev.org/c/zuul/zuul-jobs/+/94026717:04
opendevreviewMerged zuul/zuul-jobs master: build-python-release: pyproject-build by default  https://review.opendev.org/c/zuul/zuul-jobs/+/94027317:19
clarkbfungi: for bindep we're still waiting on the pbr release right?18:42
clarkbwhich I guess can proceed nowish now that package build tools have updated18:43
fungiclarkb: basically yes, i mean it'll work without pbr 6.1.1 but the simplified build-system.requires won't be viable until it exists19:12
clarkbcorvus: https://review.opendev.org/c/opendev/system-config/+/940403 this is a container image mirroring change that is related to opendev's zuul deployment if you have a moment19:14
corvus+319:20
clarkbthanks19:28
opendevreviewMerged opendev/system-config master: Mirror haproxy container image to opendevmirror on quay.io  https://review.opendev.org/c/opendev/system-config/+/94040319:29
fricklerkolla may be the victim of its self-generated load, but all the timeouts I checked on https://review.opendev.org/c/openstack/kolla-ansible/+/938819 were on rax-dfw19:52
fricklerI've also seen an unusual number of timeouts on requirements checks over the last couple of days and they also seemed to be concentrated on that cloud19:53
fricklernothing we can really act upon I guess, but still worth mentioning I'd think19:53
fungiwhat do requirements checks do that makes them more prone to timeouts?19:54
fricklernothing special, normal tempest/devstack jobs, so I do think there is some general slowness of nodes in that cloud. or maybe IO slowness?19:57
fungioh, you mean jobs run for the openstack/requirements project20:16
clarkbya I mean we've always theorized that we are our own worst noisy neighbors21:21
clarkbI think that the biggest thing we can do to mitigate that is try and improve the efficiency of our jobs particularly when it comes to avoiding heavy swap use. That seems to thrash everything21:22
clarkbalso kolla-ansible running 68 jobs per patchset is something else that might be optimized. For example there are LE specific jobs. Why not just run LE all the time and drop the specific jobs?21:35
clarkbthere are mariadb specific jobs (as opposed to mysql?) maybe just run mariadb all the time?21:36
clarkbthere are ipv6 specific jobs could probably just do ipv6 all the time21:36
clarkb(I don't actually know what divisions make sense to collapse across; I'm just trying to illustrate what it might look like)21:36
clarkbthere are different bifrost and ironic jobs too21:37
clarkband kolla-ansible isn't the only offender; we saw something similar with tacker the other day21:41
clarkbI wonder if this is something we should put on the tc meeting agenda21:44
clarkbfor example tacker runs tacker-ft-v2-df-userdata-basic-max and tacker-ft-v2-df-userdata-basic-min. The -max job runs a single test case that takes 1600 seconds with a total job runtime of 1 hour 18 minutes in this example. The -min job runs 4 test cases that take 350 seconds with a runtime of 52 minutes in this example. Both are 4 node jobs. If we ran the 350 seconds of test cases in21:47
clarkbthe -max job we could save ~45 minutes * 4 test nodes per patchset just by collapsing these two jobs together21:47
clarkbbut there are ~36 * 4 node jobs for each tacker change so that only takes us to 35 * 4 nodes. Still a measurable improvement but a lot more needs to be done21:48
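A back-of-the-envelope version of that estimate, using the numbers quoted above (the rounding to roughly 45 minutes is from the log):

    # Folding the -min job's ~350s of tests into the -max job removes the
    # separate 52-minute, 4-node -min job while adding only ~6 minutes to -max.
    min_job_runtime_min = 52          # wall clock of the -min job
    extra_tests_min = 350 / 60        # tests folded into the -max job
    nodes_per_job = 4

    saved_node_minutes = (min_job_runtime_min - extra_tests_min) * nodes_per_job
    print(round(saved_node_minutes))  # ~185 node-minutes, i.e. ~46 min * 4 nodes per patchset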
clarkbI know once upon a time openstack was super concerned about resource utilization and projects like zuul and starlingx cannibalizing available resources to openstack's detriment, but time and time again we see that it is openstack's own house creating the problems21:48
clarkbanyway I don't want to put a ban hammer on anyone or any job. I just want to ask that developers look at the tests they are running and ask "does this make sense?" "can we do this more efficiently?"21:49
fungiyay! the openafs package maintainer for debian finally uploaded a new enough version to sid to work with linux 6.12 and 6.1323:11
fungii'll finally be able to upgrade my kernel again23:11
