Saturday, 2025-11-22

<fungi> the important bits are still readable from it, should be fine [00:00]
<corvus> clarkb: your understanding of the utility of the glance checksums matches mine... [01:05]
<Clark[m]> It will be very interesting to see what they find on the cloud side [01:06]
<corvus> Clark: looking at scrollback from oct 6, i think we decided that there probably wasn't anything else we could do on the upload side and maybe this was a reason we should create validation jobs. [01:07]
<clarkb> oh right validation jobs [01:07]
<clarkb> so yes I think doing that is now an extra good idea. But also helping the cloud not corrupt data is also good [01:08]
<corvus> oho i think there is something else we can do [01:10]
<corvus> clarkb: that `checksum` field that you noted doesn't match -- there's an argument to create_image to check that [01:11]
<corvus> so i think that means we should at least be able to detect this on upload [01:11]
<Clark[m]> Interesting I wonder why that isn't the default.. fungi mentioned if the cloud converts the image that value may change? [01:12]
<Clark[m]> Maybe we just have to ensure we don't ever convert the image then we can use that? [01:12]
<Clark[m]> Anyway a problem for another day. I've got dinner to prep now [01:12]
<corvus> Clark: remote: https://review.opendev.org/c/zuul/zuul/+/968090 Openstack: validate image checksum on upload [NEW] [01:14]
<corvus> Clark: fungi ^ we can decide next week if that's something we want to experiment with (i think it would be a good idea; and after all, if something does go wrong, we've got backup images). [01:14]
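
The upload-side check discussed above comes down to comparing a locally computed digest with the `checksum` the image service reports. The following is a minimal sketch of that comparison only, assuming the reported value is a hex MD5 of the uploaded bytes; `md5_of_file` and `checksum_matches` are hypothetical helper names, not part of zuul or openstacksdk, and this is not the code in the linked change.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, reported_checksum):
    """Compare the local digest with the checksum reported after upload.

    Assumption: reported_checksum is a hex MD5 string; case is normalized.
    """
    return md5_of_file(path) == reported_checksum.lower()
```

As noted in the discussion, a comparison like this would only be meaningful if the cloud stores the image bytes unconverted.
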
<cardoe> corvus: Clark[m]: https://review.opendev.org/c/openstack/openstacksdk/+/966291 would that be something similar to what you want? [05:38]
<fungi> cardoe: related, but md5 is generally sufficient for our use case. the odds we'll encounter an identical collision after a random corruption event are statistically near-impossible [14:02]
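
The "near-impossible" estimate can be made concrete with some rough arithmetic. This sketch assumes a randomly corrupted image yields an MD5 digest that is effectively uniform over the 128-bit output space, which models accidental corruption only and says nothing about deliberate collisions; the event count of one billion is an arbitrary illustration.

```python
from fractions import Fraction

MD5_BITS = 128

# Chance that one random corruption happens to produce the same 128-bit digest.
p_single = Fraction(1, 2 ** MD5_BITS)


def p_any_undetected(events):
    """Union bound on the chance that at least one of `events` corruptions goes unnoticed."""
    return min(Fraction(1), events * p_single)


print(float(p_single))
print(float(p_any_undetected(10 ** 9)))
```
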
<fungi> on my last recheck of 966200 only opendev-build-diskimage-debian-trixie-arm64 failed because Task: Wait for sha256 timed out [18:02]
<fungi> TASK [upload-image-swift : Wait for sha256] [18:03]
<fungi> https://zuul.opendev.org/t/opendev/build/c1c85105c6f943a4a91e4f190380434f/log/job-output.txt#9255-9266 [18:03]
<fungi> at least i think that's where it all went sideways [18:05]
<clarkb> fungi: that looks like an ansible timeout not a zuul timeout [18:17]
<clarkb> so maybe we need to increase the timeout there? [18:17]
<clarkb> or see why getting the sha256 in that context was slow I guess [18:17]
<fungi> i think it's a proxy for when glance is done ingesting the upload [18:32]
<clarkb> fungi: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-image-swift/tasks/main.yaml#L57-L63 this is the task it is waiting for the result of https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-image-swift/tasks/main.yaml#L1-L8 to complete [18:39]
<clarkb> looks like we have a default timeout of 600 seconds and that async task is supposed to retry 600 / 10 times with a 10 second delay between attempts to implement that [18:46]
<clarkb> based on the log I think it only ran one attempt then decided some timeout had been exceeded [18:46]
<clarkb> oh we start calculating the timeout from when the original task starts running https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-image-swift/tasks/main.yaml#L6 [18:48]
<clarkb> so ya I think we increase upload_image_swift_hash_timeout to some larger value than 600 [18:48]
<fungi> so maybe it starts expired? [18:48]
<clarkb> fungi: I think >600 seconds elapsed from when the first task starts and when we asynchronously check it at the end [18:48]
<clarkb> the timeout isn't for the single task at the end, it's for the original task to complete; we just check it later for up to as many times as the timeout would allow [18:49]
<fungi> right, that's basically what i meant [18:49]
<clarkb> oh ya the second task starts expired yup [18:49]
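
The timing problem described above can be sketched as a small model: the timeout clock starts when the background hash task starts, so any time spent on other work before the wait task begins polling is subtracted from the budget. This is a simplified illustration of the reasoning in the conversation, not Ansible's actual implementation; the function names and the 700-second elapsed figure are made up for the example.

```python
def retry_budget(timeout_s, delay_s):
    """Number of polls the wait task is configured for (timeout / delay)."""
    return timeout_s // delay_s


def seconds_left(timeout_s, elapsed_s):
    """Time left on the background task's clock when polling begins.

    elapsed_s counts from the start of the original async task, not from
    the start of the wait task.
    """
    return max(0, timeout_s - elapsed_s)


def starts_expired(timeout_s, elapsed_s):
    """True if the deadline has already passed before the first poll."""
    return seconds_left(timeout_s, elapsed_s) == 0


# Hypothetical numbers: 700 seconds pass between starting the hash task
# and starting to wait for it.
print(retry_budget(600, 10))       # polls configured with the 600s default
print(starts_expired(600, 700))    # default timeout
print(starts_expired(1200, 700))   # a larger timeout
```

Under this model, raising the timeout helps only as long as it exceeds the time that passes before the wait task begins.
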
<fungi> i can push up a change unless you're already working on one [18:49]
<clarkb> I am not, sorry, don't have my ssh keys loaded [18:51]
<fungi> no worries, so we need to override the default upload_image_swift_hash_timeout in our job i guess [18:51]
<fungi> and yeah, https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-image-swift/defaults/main.yaml#L7 sets it to 600 seconds but we currently don't override that anywhere [18:52]
<clarkb> either that or make sha256 calculations faster [18:53]
* fungi looks around for his magic wand [18:54]
<fungi> override it is [18:54]
<opendevreview> Jeremy Stanley proposed opendev/zuul-providers master: Add trixie-arm64 https://review.opendev.org/c/opendev/zuul-providers/+/966200 [18:58]
<fungi> bumped from 10 minutes to 20 [18:58]
<fungi> Signed-off-by: it takes a village [19:00]
