opendevreview | James E. Blair proposed opendev/system-config master: Mirror node 23 container image https://review.opendev.org/c/opendev/system-config/+/940419 | 00:11 |
opendevreview | Merged opendev/system-config master: Mirror node 23 container image https://review.opendev.org/c/opendev/system-config/+/940419 | 00:19 |
frickler | infra-root: nb04's /opt has filled up again. maybe looking at transitioning arm builds to zuul would be a better plan than having to keep babysitting that server? | 05:54 |
*** | ykarel_ is now known as ykarel | 06:01 |
*** | elodilles_pto is now known as elodilles | 08:13 |
tobias-urdin | after review.opendev.org upgrade to 3.10 it's blazingly fast, something was really improved, never been this fast to-date from what i can remember and that's almost ten years by now | 09:03 |
tobias-urdin | nice work! :) | 09:04 |
fungi | tobias-urdin: that's great to hear! i wonder if it's really gerrit 3.10 performing better, or the fact that we reset the h2 databases for its change caches... maybe the caches being so massive was causing them to do the opposite of their intended purpose | 12:53 |
fungi | normally we preserved the caches, but h2 doesn't really shrink backing files when records are deleted (just flags them so the engine can skip them) and they'd grown so massive over the years that they were causing other problems at shutdown/startup for the gerrit service | 12:56 |
fungi | so during the last upgrade we decided to remove them and let gerrit start up with cold caches | 12:58 |
fungi | and now we're thinking we should just do that every time we restart it for upgrades or config changes to keep the caches from growing so large | 12:58 |
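For context on the cold-cache approach fungi describes, here is a minimal sketch, assuming the usual Gerrit site layout where the persistent H2 caches live under the site's cache directory as *.h2.db files; the site path is a placeholder, and this is only meant to run while the Gerrit service is stopped:

```python
# Sketch only: drop Gerrit's persistent H2 cache files so the service starts
# with cold caches. Assumes the caches live under <site>/cache/*.h2.db; the
# site path below is a placeholder. Run this only while Gerrit is stopped.
from pathlib import Path

GERRIT_SITE = Path("/home/gerrit2/review_site")  # placeholder, adjust to the real site path

def drop_persistent_caches(site: Path) -> None:
    cache_dir = site / "cache"
    for db_file in sorted(cache_dir.glob("*.h2.db")):
        size_gb = db_file.stat().st_size / 1e9
        print(f"removing {db_file.name} ({size_gb:.2f} GB)")
        db_file.unlink()

if __name__ == "__main__":
    drop_persistent_caches(GERRIT_SITE)
```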
frickler | seems github is a bit sad, just in case people notice failures https://www.githubstatus.com/ | 14:48 |
fungi | thanks for the heads up! | 14:52 |
fungi | in other news, i've had a bit of a eureka moment wrt the pypi warnings we started getting a few months ago about non-normalized sdist filenames... newer setuptools fixes that, so my zuul-jobs change to switch from direct setup.py invocation to using pyproject-build will solve it (by pulling in newer setuptools automatically) | 14:53 |
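As an illustration of that switch, a hedged sketch (the project path is a placeholder and the filename check is a rough approximation of PEP 625 normalization, not an authoritative implementation) that builds an sdist with the `build` package instead of direct setup.py invocation and then reports whether the name part of the resulting filename looks normalized:

```python
# Sketch only: build an sdist via `python -m build` (what pyproject-build wraps)
# rather than calling setup.py directly, then check whether the sdist filename's
# name part is normalized (roughly PEP 625: lowercase, runs of ".", "-", "_"
# folded to "_"). The isolated build environment pulls in a current setuptools,
# which is what fixes the non-normalized filename warning from PyPI.
import re
import subprocess
import sys
from pathlib import Path

project_dir = Path(".")           # placeholder: path to the project checkout
dist_dir = project_dir / "dist"   # default output directory for `python -m build`

subprocess.run([sys.executable, "-m", "build", "--sdist", str(project_dir)], check=True)

for sdist in dist_dir.glob("*.tar.gz"):
    stem = sdist.name[: -len(".tar.gz")]
    name, _, version = stem.rpartition("-")
    normalized = re.sub(r"[-_.]+", "_", name).lower()
    status = "normalized" if name == normalized else "NOT normalized"
    print(f"{sdist.name}: name part {name!r} is {status}")
```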
clarkb | yay automatic fixes | 15:49 |
frickler | fungi: the job failures in https://review.opendev.org/c/zuul/zuul-jobs/+/940273 are not relevant for https://review.opendev.org/c/zuul/zuul-jobs/+/940273 , do I understand that correctly? are we ready to approve that stack, then? | 15:54 |
clarkb | frickler: I think you linked th same change twice. Which failures? | 15:55 |
fungi | frickler: i assume you mean 940314? | 15:55 |
frickler | argl, sorry, I meant https://review.opendev.org/c/opendev/bindep/+/940258 | 15:55 |
fungi | if so, that was an experiment based off a suggestion clarkb made, it's not relevant | 15:55 |
clarkb | frickler: I think the failures in 940258 are due to pbr only listing a dependency on setuptools for python3.12 and newer and line 2 in the pyproject.toml dropped setuptools | 15:57 |
fungi | the failures on the current iteration of 940258 will be fixed once https://review.opendev.org/940118 merges and pbr 6.1.1 is on pypi | 15:57 |
clarkb | the next pbr release (either final or beta) will list setuptools for all python versions. So ya those failures are not relevant | 15:57 |
fungi | frickler: patchset 5 of 940258 is probably a better result to look at | 15:59 |
fungi | that was the one that added the depends-on to the zuul-jobs change, ps6 was testing what happens when removing setuptools from the build-system.requires in pyproject.toml | 16:00 |
fungi | (i was testing a variety of different things over the life of that dnm change) | 16:01 |
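For readers following along, a hedged sketch of the two build-system.requires variants under discussion; the pyproject.toml contents here are illustrative rather than copied from bindep, and the simplified form only becomes viable once a pbr release that declares setuptools for all Python versions (the planned 6.1.1) is on PyPI:

```python
# Sketch only (Python 3.11+ for tomllib): two illustrative build-system variants.
# "current" lists setuptools explicitly; "simplified" relies on pbr itself to
# pull setuptools into the build environment, which the pending pbr release enables.
import tomllib

variants = {
    "current": """
[build-system]
requires = ["pbr", "setuptools"]
build-backend = "setuptools.build_meta"
""",
    "simplified (needs pbr 6.1.1)": """
[build-system]
requires = ["pbr>=6.1.1"]
build-backend = "setuptools.build_meta"
""",
}

for label, doc in variants.items():
    requires = tomllib.loads(doc)["build-system"]["requires"]
    print(f"{label}: requires = {requires}")
```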
clarkb | Once I've loaded my ssh keys I'm going to clean up the etherpad held node | 16:01 |
clarkb | I don't think we need it anymore | 16:01 |
clarkb | oh I don't think we need the held grafana node anymore either now that we're going to proxy. I can clean that one up too | 16:02 |
clarkb | I have cleaned up my etherpad and grafana autoholds | 16:11 |
opendevreview | Merged zuul/zuul-jobs master: Add ensure-pyproject-build role https://review.opendev.org/c/zuul/zuul-jobs/+/940267 | 17:04 |
opendevreview | Merged zuul/zuul-jobs master: build-python-release: pyproject-build by default https://review.opendev.org/c/zuul/zuul-jobs/+/940273 | 17:19 |
clarkb | fungi: for bindep we're still waiting on the pbr release right? | 18:42 |
clarkb | which I guess can proceed nowish now that package build tools have updated | 18:43 |
fungi | clarkb: basically yes, i mean it'll work without pbr 6.1.1 but the simplified build-system.requires won't be viable until it exists | 19:12 |
clarkb | corvus: https://review.opendev.org/c/opendev/system-config/+/940403 this is a container image mirroring change that is related to opendev's zuul deployment if you have a moment | 19:14 |
corvus | +3 | 19:20 |
clarkb | thanks | 19:28 |
opendevreview | Merged opendev/system-config master: Mirror haproxy container image to opendevmirror on quay.io https://review.opendev.org/c/opendev/system-config/+/940403 | 19:29 |
frickler | kolla may be the victim of its self-generated load, but all the timeouts I checked on https://review.opendev.org/c/openstack/kolla-ansible/+/938819 were on rax-dfw | 19:52 |
frickler | I've also seen an unusual number of timeouts on requirements checks over the last couple of days and they also seemed to be concentrated on that cloud | 19:53 |
frickler | nothing we can really act upon I guess, but still worth mentioning I'd think | 19:53 |
fungi | what do requirements checks do that makes them more prone to timeouts? | 19:54 |
frickler | nothing special, normal tempest/devstack jobs, so I do think some slowness of nodes in that cloud is happening. or maybe IO slowness? | 19:57 |
fungi | oh, you mean jobs run for the openstack/requirements project | 20:16 |
clarkb | ya I mean we've always theorized that we are our own worst noisy neighbors | 21:21 |
clarkb | I think that the biggest thing we can do to mitigate that is try and improve the efficiency of our jobs particularly when it comes to avoiding heavy swap use. That seems to thrash everything | 21:22 |
clarkb | also kolla-ansible running 68 jobs per patchset is something else that might be optimized. For example there are LE specific jobs. Why not just run LE all the time and drop the specific jobs? | 21:35 |
clarkb | there are mariadb specific jobs (as opposed to mysql?) maybe just run mariadb all the time? | 21:36 |
clarkb | there are ipv6 specific jobs could probably just do ipv6 all the time | 21:36 |
clarkb | (I don't actually know what divisions make sense to collapse across; I'm just trying to illustrate what it might look like) | 21:36 |
clarkb | there are different bifrost and ironic jobs too | 21:37 |
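A hedged illustration of what that collapsing suggestion might look like, rendered as a Zuul-style job entry with PyYAML; the job name and variable names below are hypothetical placeholders, not kolla-ansible's real job or variable names:

```python
# Sketch only: one base scenario job that always exercises the features which
# today each get a dedicated job (LE/TLS, mariadb, ipv6). All names and vars
# here are hypothetical; the point is only the shape of the consolidation.
import yaml

collapsed_job = [
    {
        "job": {
            "name": "kolla-ansible-scenario-base",  # hypothetical name
            "vars": {
                "use_letsencrypt": True,  # fold the LE-specific job into the base run
                "database": "mariadb",    # fold the mariadb-specific job in
                "ip_version": 6,          # fold the ipv6-specific job in
            },
        }
    }
]

print(yaml.safe_dump(collapsed_job, sort_keys=False))
```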
clarkb | and kolla-ansible isn't the only offender we saw similar with tacker the other day | 21:41 |
clarkb | I wonder if this is something we should put on the tc meeting agenda | 21:44 |
clarkb | for example tacker runs tacker-ft-v2-df-userdata-basic-max and tacker-ft-v2-df-userdata-basic-min. The -max job runs a single test case that takes 1600 seconds with a total job runtime of 1 hour 18 minutes in this example. The -min job runs 4 test cases that take 350 seconds with a runtime of 52 minutes in this example. Both are 4 node jobs. If we ran the 350 seconds of test cases in | 21:47 |
clarkb | the -max job we could save ~45 minutes * 4 test nodes per patchset just by collapsing these two jobs together | 21:47 |
clarkb | but there are ~36 4-node jobs for each tacker change, so that only takes us down to 35 of them. Still a measurable improvement but a lot more needs to be done | 21:48 |
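Restating that arithmetic as a quick back-of-the-envelope calculation; the job runtimes come from the example above, while the per-change total assumes roughly an hour per job, which is an assumption rather than a measured figure:

```python
# Sketch only: node-time saved per patchset if the -min test cases are folded
# into the -max job, per the tacker example above.
nodes_per_job = 4
min_job_runtime_min = 52          # quoted runtime of the -min job, in minutes
min_job_test_time_min = 350 / 60  # its actual test time, roughly 6 minutes

# Folding adds ~6 minutes to the -max job but drops the whole 52-minute -min job.
saved_node_minutes = (min_job_runtime_min - min_job_test_time_min) * nodes_per_job
print(f"~{saved_node_minutes:.0f} node-minutes saved per patchset")  # roughly 45 min * 4 nodes

# Against ~36 four-node jobs per change (assuming ~60 minutes each, an assumption),
# that is only a small slice of the overall budget.
total_node_minutes = 36 * nodes_per_job * 60
print(f"out of roughly {total_node_minutes} node-minutes per change")
```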
clarkb | I know once upon a time openstack was super concerned about resource utilization and projects like zuul and starlingx cannibalizing available resources to openstack's detriment but time and time again we see that it is openstack's own house creating the problems | 21:48 |
clarkb | anyway I don't want to put a ban hammer on anyone or any job. I just want to ask that developers look at the tests they are running and ask "does this make sense?" "can we do this more efficiently?" | 21:49 |
fungi | yay! the openafs package maintainer for debian finally uploaded a new enough version to sid to work with linux 6.12 and 6.13 | 23:11 |
fungi | i'll finally be able to upgrade my kernel again | 23:11 |