Thursday, 2026-02-19

@mnaser:matrix.orgIn order to be more efficient with donor infra.. does it make sense to set up reporting that we can look at from time to time?03:40
@mnaser:matrix.orgFrom the start of this month, for example, Kolla has spent 1590h, aka 66.25 days of compute now .. all on non-voting jobs that are constantly failing .. Cinder has 630 hours, Manila has 509 hrs, etc03:43
-@gerrit:opendev.org- chandan kumar proposed: [openstack/project-config] 977194: Add snu-csl/nvmevirt to available repos https://review.opendev.org/c/openstack/project-config/+/97719406:13
@fungicide:matrix.orgmnaser: Clark has put together ad hoc reports from time to time totalling the node-hours consumed by openstack sub-projects, though i think they were put together by scraping logs or doing low-level database queries, there wasn't a running tally kept up all the time11:56
@fungicide:matrix.orgthough i think it's been a few years, at the time the largest consumer (by far) was tripleo11:57
@fungicide:matrix.orgmnaser: depending on how you're querying that now, you'll want to make sure you multiply the build elapsed time by the number of nodes (and now the node types matter too since we no longer use just one node size, so maybe we need to work out scaling measurements by node ram as well)11:58
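A minimal sketch of the weighted tally fungi describes, operating on already-fetched build records. The field names ("duration", "nodes", "project") and the RAM table are assumptions for illustration, not the actual Zuul builds API schema:

```python
# Sketch: weight build elapsed time by node count and node RAM.
# Field names and RAM values below are hypothetical.

# RAM in GB per node label (illustrative values)
NODE_RAM_GB = {"ubuntu-noble": 8, "ubuntu-noble-32GB": 32}

def weighted_node_hours(builds, baseline_gb=8):
    """Sum elapsed hours per project, scaling each node by its RAM
    relative to a baseline 8GB flavor."""
    totals = {}
    for b in builds:
        hours = b["duration"] / 3600.0
        scale = sum(NODE_RAM_GB.get(n, baseline_gb) / baseline_gb
                    for n in b["nodes"])
        totals[b["project"]] = totals.get(b["project"], 0.0) + hours * scale
    return totals

builds = [
    {"project": "openstack/kolla", "duration": 7200,
     "nodes": ["ubuntu-noble", "ubuntu-noble"]},
    {"project": "openstack/cinder", "duration": 3600,
     "nodes": ["ubuntu-noble-32GB"]},
]
# kolla: 2h * 2 nodes = 4.0; cinder: 1h * (32/8) = 4.0
print(weighted_node_hours(builds))
```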
@fungicide:matrix.orgi'm going to use the zuul webui's "build image" button on https://zuul.opendev.org/t/opendev/image/ubuntu-noble to get an updated os-testr venv containing testtools 2.8.4 from about an hour ago12:59
@fungicide:matrix.orgspecifically for this commit in the new release: https://github.com/testing-cabal/testtools/commit/eca83db13:06
@fungicide:matrix.orgdoing ubuntu-jammy as well (apparently debian-trixie is fine, the problem that update addresses is on earlier python versions)13:34
@fungicide:matrix.orgthe last round of image builds for ubuntu-noble seems to have succeeded, but the uploads are all in "pending" state for the past 9+ hours... is there any way to tell from the webui why they didn't upload yet?13:40
@fungicide:matrix.orgactually, it looks like previous images also waited around 19-20 hours between build completion and uploading, i guess they just operate on independent timers?13:42
@harbott.osism.tech:regio.chator is this our "standard" backlog because uploads are so slow? can we maybe still revert to older images instead? likely not after the fresh rebuild14:05
@fungicide:matrix.orgoh that could indeed be delay due to volume14:15
@fungicide:matrix.orgbut yeah, the ubuntu-noble images at least finished uploading shortly before their replacements were built (perhaps even while they were building)14:16
@jim:acmegating.comthe launchers handle one image at a time, but upload to multiple endpoints simultaneously.14:20
zl01 is uploading debian-trixie since 2026-02-19T13:46:59
zl02 is uploading debian-trixie-arm64 since 2026-02-19T14:02:34
@jim:acmegating.comif the previous image would be sufficient, then perhaps we should delete the most current image and roll back?  that's our usual procedure, i think.14:23
@jim:acmegating.comlooks like we have images from 02-17 and 02-18; i assume the 2-18 ones are the problem and we can roll back to 2-17?14:24
@jim:acmegating.comi think the procedure would be to delete the image build artifact, since we want all the uploads to go away (and we don't want zuul to try uploading them again).  deleting the artifact should cause the launchers (after a short delay) to mark all the uploads for deletion.14:39
unfortunately, i think because of the backlog, it won't actually delete them for a while, and due to an oversight, will continue to use them. i can make a live patch to the launchers to fix that though. equivalent to: https://review.opendev.org/c/zuul/zuul/+/977326 Don't use un-ready image uploads [NEW]
@jim:acmegating.comi'll start working on the live patch; but i'm going to leave it to fungi or Jens Harbott to decide about deleting the image.14:39
@jim:acmegating.comfungi: that patch is in place, if you want to delete the image build artifact(s) then i think the rollback should work14:46
@mnaser:matrix.orgAh yes good point.. I scraped the API for jobs which were in the check pipeline .. non-voting .. failing and never passed in the same range14:51
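A sketch of the filter mnaser describes, written over already-fetched build records rather than the live API. The field names mirror what the Zuul builds API reports ("job_name", "pipeline", "voting", "result") but are treated as assumptions here:

```python
# Sketch: from a list of build records, find jobs that ran in the
# check pipeline as non-voting and never once passed in the window.
# Field names are assumed, not verified against the real API schema.

def never_passing_nonvoting(builds):
    seen, passed = set(), set()
    for b in builds:
        if b["pipeline"] != "check" or b["voting"]:
            continue
        seen.add(b["job_name"])
        if b["result"] == "SUCCESS":
            passed.add(b["job_name"])
    return sorted(seen - passed)

builds = [
    {"job_name": "kolla-ansible-foo", "pipeline": "check",
     "voting": False, "result": "FAILURE"},
    {"job_name": "kolla-ansible-foo", "pipeline": "check",
     "voting": False, "result": "FAILURE"},
    {"job_name": "cinder-bar", "pipeline": "check",
     "voting": False, "result": "SUCCESS"},
]
print(never_passing_nonvoting(builds))  # ['kolla-ansible-foo']
```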
@fungicide:matrix.orgcorvus: in this case i don't know if the previous image would be helpful, unless it's old enough to pre-date testtools 2.8.314:55
@jim:acmegating.comi don't remember this happening yesterday14:56
@jim:acmegating.com2.8.3 is Tue, 17 Feb 2026 15:17:00 GMT from https://pypi.org/rss/project/testtools/releases.xml14:57
@fungicide:matrix.org15:17 utc on 2026-02-17 is when testtools 2.8.3 packages were uploaded to pypi, so images built and uploaded after that carry the breakage14:57
@fungicide:matrix.orgbut if the upload is happening on nearly a day delay, then it wouldn't have started happening until yesterday14:58
@jim:acmegating.comhttps://zuul.opendev.org/t/opendev/build/0428a64a82ae403b9b3e4fd0966c51f3 Completed at 2026-02-17 03:56:1214:58
@fungicide:matrix.orgokay, so should work14:59
@jim:acmegating.comright -- today, the 19th we're using images built on the 18th with the problem.  yesterday, the 18th, we were using images built on the 17th without the problem.  thus i'm thinking that deleting the bad images from the 18th would leave us with good images from the 17th.14:59
@jim:acmegating.comfungi: for clarity, i'm under the impression you'll issue the image build artifact delete :)15:04
@fungicide:matrix.orgcorvus: i guess there are 3 builds for the different image formats? judging from https://zuul.opendev.org/t/opendev/image/ubuntu-noble15:06
@fungicide:matrix.orgjust confirming those are the 3 i'm deleting for ubuntu-noble15:07
@fungicide:matrix.orgd375a2cc24f44a9783842874c6d4bf2c, e34a5225598a4c8cbca7914a2482179d and 8341ce5f1f5045d9b3204db8226c43fe from 2026-02-19T04:08:5515:08
@fungicide:matrix.orgi've sufficiently convinced myself those are the correct images to delete, so doing that now15:09
@jim:acmegating.comaren't those the artifacts for the pending uploads?15:10
@fungicide:matrix.orgthose are the old pending uploads from 10 hours ago, yeah15:10
@fungicide:matrix.orgoh, i guess we also need to delete the three that are in use, more importantly15:10
@jim:acmegating.comto roll back, we need to delete the artifacts for the currently in-use images: f7a3c43cb4b34e689a3698e00f8460f3 e0f7d874b84c4f3aa028fe90717f75a0 04aec63688d3485cb6ff05c98cc605e315:10
@fungicide:matrix.orgyep, done now, so i told it to delete all 6 (the three most recent in use and the three older pending uploads)15:12
@fungicide:matrix.orgthe three most recent pending upload were built on-demand after the fixed testtools made it onto pypi, so i left those untouched15:12
@fungicide:matrix.orgnot sure how long it takes for those to transition to deleting or disappear15:13
@jim:acmegating.comme neither -- i'm hoping that the launcher switches the uploads to 'deleting' soon; but if it doesn't we'll have to do that ourselves.  let's give it a little bit.15:14
@fungicide:matrix.orgokay they updated to deleting state15:16
@fungicide:matrix.orgwell, the builds updated to deleting anyway15:16
@fungicide:matrix.orgthe uploads are still ready and pending15:17
@fungicide:matrix.orgor does deleting the image build artifacts not automatically cascade to deleting the uploads?15:18
@jim:acmegating.comi think it should, i'm checking on what could cause a delay there15:18
@fungicide:matrix.orgi guess i can manually select "delete upload" on each of them15:18
@fungicide:matrix.orgokay, i'll hold off15:19
@jim:acmegating.comfungi: okay, i think it's the same queue as the uploads.  i think i want to adjust my patch to take the artifact state into account15:23
@fungicide:matrix.orgso i should go ahead and ask it to delete the uploads individually as well?15:25
@jim:acmegating.comthat shouldn't be necessary15:26
@fungicide:matrix.orgoh, i guess the launcher won't boot nodes from those uploads now that the corresponding image builds are deleting?15:26
@jim:acmegating.comthat will be the case once i patch the launchers (but it isn't right now)15:27
@fungicide:matrix.orgokay, thanks15:28
@jim:acmegating.comall right, both launchers are patched, so nodes created after this point should not use those images15:30
@fungicide:matrix.orgawesome!15:31
@jim:acmegating.comnow we need to make a change to zuul.  there are two ways we could do this:15:32
1) make zuul behave the way that i just patched it: if you delete the artifact, then it won't use the uploads for that artifact. but the uploads still show in "ready" state.
2) change the image delete api call so that it also marks all the uploads for deletion when it marks the artifact for deletion
@jim:acmegating.comright now, i'm thinking i don't love the idea that the web ui says the artifact is deleting but the upload is ready, and as a human, we have to look at both of those to determine the state.  so i'm kind of leaning toward #2 as the long-term fix.15:32
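A generic sketch of what option #2 would mean, as plain Python rather than Zuul's actual data model (the class and field names are hypothetical): deleting the build artifact cascades the "deleting" state to every upload derived from it, so a human looking at either sees a consistent answer.

```python
# Generic sketch of cascading deletion (option #2); not Zuul code.

class Upload:
    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.state = "ready"

class ImageBuildArtifact:
    def __init__(self, uploads):
        self.state = "ready"
        self.uploads = uploads

    def delete(self):
        # Mark the artifact and all of its uploads in one call, so
        # the UI never shows a deleting artifact with ready uploads.
        self.state = "deleting"
        for u in self.uploads:
            u.state = "deleting"

art = ImageBuildArtifact([Upload("cloud-a"), Upload("cloud-b")])
art.delete()
print(art.state, [u.state for u in art.uploads])
# deleting ['deleting', 'deleting']
```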
@fungicide:matrix.orgonce upon a time we had talked about only preserving images on disk for one image format, as a space-saving measure. does that affect build deletion?15:33
@fungicide:matrix.orglike, would that require being able to boot from ready uploads for builds that were deleted, or is the backing file on disk independent from whether the "build" is deleted?15:34
@jim:acmegating.comnow that the actual storage location is in swift, we don't do that any more, so they're all available in the cloud as long as the corresponding artifact is there.  but we could delete only the qcow2 artifact, for example, and it won't affect the others.  they are independent.15:34
@fungicide:matrix.orgoh right, there is no longer a need to keep any copies locally15:35
@fungicide:matrix.orgso anyway, i agree #2 makes the most sense to me15:36
@jim:acmegating.comok, i'll work on a change to do that.15:37
@jim:acmegating.comfungi: https://review.opendev.org/977326 and https://review.opendev.org/977333 should get zuul to our desired behavior.15:47
@fungicide:matrix.orgyep, thanks again!16:08
@fungicide:matrix.orghttps://zuul.opendev.org/t/openstack/builds?job_name=openstack-tox-py310&skip=0 indicates that the image rollback seems to have worked16:13
@priteau:matrix.orgThanks fungi, I am rechecking our blazar-nova change, it seems that we were the last POST_FAILURE16:18
@priteau:matrix.orgsuccess 😄16:21
@fungicide:matrix.orgexcellent16:28
-@gerrit:opendev.org- Zuul merged on behalf of Dr. Jens Harbott: [openstack/diskimage-builder] 976345: Add tox-py313 job https://review.opendev.org/c/openstack/diskimage-builder/+/97634518:20
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [openstack/project-config] 977380: Replace 2026.1/Gazpacho key with 2026.2/Hibiscus https://review.opendev.org/c/openstack/project-config/+/97738022:33

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!