| fungi | the important bits are still readable from it, should be fine | 00:00 |
|---|---|---|
| corvus | clarkb: your understanding of the utility of the glance checksums matches mine... | 01:05 |
| Clark[m] | It will be very interesting to see what they find on the cloud side | 01:06 |
| corvus | Clark: looking at scrollback from oct 6, i think we decided that there probably wasn't anything else we could do on the upload side and maybe this was a reason we should create validation jobs. | 01:07 |
| clarkb | oh right validation jobs | 01:07 |
| clarkb | so yes I think doing that is now an extra good idea. But also helping the cloud not corrupt data is also good | 01:08 |
| corvus | oho i think there is something else we can do | 01:10 |
| corvus | clarkb: that `checksum` field that you noted doesn't match -- there's an argument to create_image to check that | 01:11 |
| corvus | so i think that means we should at least be able to detect this on upload | 01:11 |
| Clark[m] | Interesting I wonder why that isn't the default..fungi mentioned if the cloud converts the image that value may change? | 01:12 |
| Clark[m] | Maybe we just have to ensure we don't ever convert the image then we can use that? | 01:12 |
| Clark[m] | Anyway a problem for another day. I've got dinner to prep now | 01:12 |
| corvus | Clark: remote: https://review.opendev.org/c/zuul/zuul/+/968090 Openstack: validate image checksum on upload [NEW] | 01:14 |
| corvus | Clark: fungi ^ we can decide next week if that's something we want to experiment with (i think it would be a good idea; and after all, if something does go wrong, we've got backup images). | 01:14 |
| cardoe | corvus: Clark[m]: https://review.opendev.org/c/openstack/openstacksdk/+/966291 would that be something similar to what you want? | 05:38 |
| fungi | cardoe: related, but md5 is generally sufficient for our use case. the odds that we'll encounter an identical collision after a random corruption event are vanishingly small | 14:02 |
| fungi | on my last recheck of 966200 only opendev-build-diskimage-debian-trixie-arm64 failed because Task: Wait for sha256 timed out | 18:02 |
| fungi | TASK [upload-image-swift : Wait for sha256] | 18:03 |
| fungi | https://zuul.opendev.org/t/opendev/build/c1c85105c6f943a4a91e4f190380434f/log/job-output.txt#9255-9266 | 18:03 |
| fungi | at least i think that's where it all went sideways | 18:05 |
| clarkb | fungi: that looks like an ansible timeout not a zuul timeout | 18:17 |
| clarkb | so maybe we need to increase the timeout there? | 18:17 |
| clarkb | or see why getting the sha256 in that context was slow I guess | 18:17 |
| fungi | i think it's a proxy for when glance is done ingesting the upload | 18:32 |
| clarkb | fungi: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-image-swift/tasks/main.yaml#L57-L63 is the task that timed out; it is waiting for the result of https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-image-swift/tasks/main.yaml#L1-L8 to complete | 18:39 |
| clarkb | looks like we have a default timeout of 600 seconds and that async task is supposed to retry 600 / 10 times with a 10 second delay between attempts to implement that | 18:46 |
| clarkb | based on the log I think it only ran one attempt then decided some timeout had been exceeded | 18:46 |
| clarkb | oh we start calculating the timeout from when the original task starts running https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-image-swift/tasks/main.yaml#L6 | 18:48 |
| clarkb | so ya I think we increase upload_image_swift_hash_timeout to some larger value than 600 | 18:48 |
| fungi | so maybe it starts expired? | 18:48 |
| clarkb | fungi: I think >600 seconds elapsed between when the first task starts and when we asynchronously check it at the end | 18:48 |
| clarkb | the timeout isn't for the single task at the end, it's for the original task to complete. we just check it later for up to as many times as the timeout would allow | 18:49 |
| fungi | right, that's basically what i meant | 18:49 |
| clarkb | oh ya the second task starts expired yup | 18:49 |
| fungi | i can push up a change unless you're already working on one | 18:49 |
| clarkb | I am not, sorry. don't have my ssh keys loaded | 18:51 |
| fungi | no worries, so we need to override the default upload_image_swift_hash_timeout in our job i guess | 18:51 |
| fungi | and yeah, https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-image-swift/defaults/main.yaml#L7 sets it to 600 seconds but we currently don't override that anywhere | 18:52 |
| clarkb | either that or make sha256 calculations faster | 18:53 |
| * fungi | looks around for his magic wand | 18:54 |
| fungi | override it is | 18:54 |
| opendevreview | Jeremy Stanley proposed opendev/zuul-providers master: Add trixie-arm64 https://review.opendev.org/c/opendev/zuul-providers/+/966200 | 18:58 |
| fungi | bumped from 10 minutes to 20 | 18:58 |
| fungi | Signed-off-by: it takes a village | 19:00 |
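
The timeout reasoning from 18:46 to 18:58 can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the log, not the role's actual implementation: the function names are hypothetical, and the 700 second elapsed time is an assumed figure (the log only says more than 600 seconds passed).

```python
def planned_retries(timeout_s: int, delay_s: int) -> int:
    """Number of status checks the polling task is allowed (timeout / delay)."""
    return timeout_s // delay_s


def remaining_budget(timeout_s: int, elapsed_before_check_s: int) -> int:
    """Seconds left for the background task when polling begins.

    The clock starts when the original async task starts, not when
    the later polling task begins.
    """
    return max(0, timeout_s - elapsed_before_check_s)


def starts_expired(timeout_s: int, elapsed_before_check_s: int) -> bool:
    """True when the budget is already used up before the first check."""
    return remaining_budget(timeout_s, elapsed_before_check_s) == 0


# Default from the log: 600 seconds, checked every 10 seconds.
default_timeout = 600
bumped_timeout = 1200  # "bumped from 10 minutes to 20"
elapsed = 700          # assumed value; the log only says ">600 seconds"

print(planned_retries(default_timeout, 10))      # 60 checks planned
print(starts_expired(default_timeout, elapsed))  # True: the check starts expired
print(starts_expired(bumped_timeout, elapsed))   # False: 500 seconds remain
```

With the default, the polling task has no budget left by the time it runs, which matches the single attempt seen in the job log. Doubling the timeout leaves room for the hash to finish under the same assumed elapsed time.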
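
The upload-side check discussed from 01:10 to 01:14 amounts to comparing a locally computed digest with the `checksum` value the image service reports. Below is a minimal standalone sketch using only Python's `hashlib`; it does not call any OpenStack API, the function names are hypothetical, and md5 is assumed as the digest based on fungi's 14:02 remark. As noted at 01:11 to 01:12, the comparison is only meaningful if the cloud does not convert the image after upload.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex md5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_is_intact(local_path: str, reported_checksum: str) -> bool:
    """Compare the local digest with the checksum reported after upload.

    reported_checksum stands in for the image's `checksum` field; how
    that value is fetched is outside the scope of this sketch.
    """
    return file_md5(local_path) == reported_checksum.strip().lower()
```

A mismatch here would flag a corrupted upload at upload time rather than when a job later boots the image.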