-@gerrit:opendev.org- Ian Wienand proposed: | 06:19 | |
- [zuul/zuul-jobs] 878614: registry-tag-remove: role to delete tags from registry https://review.opendev.org/c/zuul/zuul-jobs/+/878614 | ||
- [zuul/zuul-jobs] 878740: promote-container-image: use generic tag removal role https://review.opendev.org/c/zuul/zuul-jobs/+/878740 | ||
@iwienand:matrix.org | ^ corvus Clark i do see what we mean by uploading from the promote role, but i do think the current system has a fair bit of merit in the way it tries to keep the window between code-commit and tag-update as small as possible. i think that abstracting out the handling of the tags to a role is a practical solution ... ^ implements it. | 06:22 |
---|---|---|
@tobias-urdin:matrix.org | felixedel: hello 👋 thanks for looking through the changes! any more feedback on the dark theme change? | 09:20 |
@clarkb:matrix.org | corvus: before I review ianw's stack at https://review.opendev.org/c/zuul/zuul-jobs/+/878612/ to handle image promotion and tag deletion did you feel strongly about his proposal there? It diverges from what you had originally proposed | 17:19 |
@clarkb:matrix.org | I guess I'm happy to review it but don't want to dig in if you are already a hard no | 17:19 |
@jim:acmegating.com | Clark: well, it sounds like what i described as option 1b | 17:35 |
@jim:acmegating.com | so i agree, it's a solution we can live with. but it makes the whole thing non-generic. it means that if we move from quay.io to something else, we are right back to where we were, which is we need to implement a registry-specific way of deleting tags. | 17:38 |
@jim:acmegating.com | i was hoping we could find a way to avoid vendor-specific apis, and i think options 2, 2b, and 2c have reasonable trade-offs to do that. it's a question of whether we think the fast tag swap is more important than avoiding vendor lock-in. and it's worth noting that 2b and 2c are ways of potentially accomplishing both. | 17:42 |
@jim:acmegating.com | having said all that, at least separating out the cleanup part makes maintenance and future implementation easier | 17:43 |
@jim:acmegating.com | Clark: to directly answer the question, definitely not a hard no. worth considering. there's the trade-offs as i see it ^ | 17:46 |
@clarkb:matrix.org | I think I'm mostly trying to figure out where to best put my energy to move forward on the registry move. Ianw has written one path forward but as you mention it relies on vendor specific apis to maintain the old workflow we've got. | 18:02 |
@clarkb:matrix.org | I or someone else could start work on 2)/2b) but before that happens we should probably agree that we don't want to proceed with what ianw has written? | 18:02 |
@jim:acmegating.com | well, i started on 2, with the assumption that we could upgrade to 2b or 2c if we want | 18:03 |
@jim:acmegating.com | i'm happy to discard the work on 2 if we want to change course and go with 1. but i stopped where i did so we could discuss it and decide | 18:03 |
@clarkb:matrix.org | oh did I miss those changes? | 18:03 |
@jim:acmegating.com | so... let's not decide by saying that 1 is already written. | 18:03 |
@clarkb:matrix.org | I may have. I've been pulled in a bunch of different directions the last few days | 18:03 |
@jim:acmegating.com | corvus proposed: [zuul/zuul-jobs] 878538: WIP: Update promote-container-image to copy from intermediate registry | 18:04 |
@clarkb:matrix.org | oh right the docs outline change | 18:04 |
@jim:acmegating.com | (also, ianw's implementation of 1 is incomplete as well, which is fine; i don't think we should run either of these to completion until we, you know, actually decide what we want) | 18:05 |
@clarkb:matrix.org | ++ | 18:05 |
@clarkb:matrix.org | I think where I'm currently at is that I'd like to avoid the vendor specific APIs if we can and fallback to vendor specific tooling if necessary. Put another way I'd like to push 2 to completion and only fall back to 1 if 2 fails | 18:06 |
@clarkb:matrix.org | (and we don't have any reason to think 2 would fail at the moment) | 18:06 |
@jim:acmegating.com | i think the tricky part about that is that 2 will pull in the intermediate registry as a requirement, and once we go down that path, it may be hard to back that out | 18:07 |
@jim:acmegating.com | might be worth ciphering on whether we can make that optional, or the fallback more graceful or something | 18:08 |
@clarkb:matrix.org | one way to make it optional is to rebuild the image in the post merge pipeline | 18:09 |
@clarkb:matrix.org | (not a great answer, but an option I think) | 18:09 |
@jim:acmegating.com | so basically say, if you have an intermediate registry, use build, upload, and promote jobs. if you don't, then just use build and upload jobs. ? | 18:10 |
@clarkb:matrix.org | yes. Basically no promotion as you'd build and upload post merge and not have an artifact to manage | 18:10 |
@jim:acmegating.com | maybe we should name the role "container-image-intermediate-registry-promote" when implementing option 2 to leave room for "container-image-registry-promote" as option 1 (so if we end up implementing 2 and we later decide we want 1, we can end up with 4 roles all designed together) | 18:12 |
@clarkb:matrix.org | I like that. Gives flexibility too for end users if they have a strong need for the tag updates to happen quickly too | 18:14 |
@clarkb:matrix.org | they could maintain that half of the tooling for example while others used the generic jobs/roles | 18:14 |
@jim:acmegating.com | cool, let's see what ianw says later | 18:15 |
@iwienand:matrix.org | unhelpfully i don't have any strong opinion :) i feel like a tag cleanup role is probably not a bad thing to have in zuul-jobs independent of this anyway. of the things i looked at, docker, quay, gitlab, artifactory, digitalocean all had a simple endpoint to remove a tag. google-cloud i could only see being able to do it using the client libraries. github doesn't seem to map and you have to remove packages or something? | 20:58 |
@iwienand:matrix.org | > <@jim:acmegating.com> maybe we should name the role "container-image-intermediate-registry-promote" when implementing option 2 to leave room for "container-image-registry-promote" as option 1 (so if we end up implementing 2 and we later decide we want 1, we can end up with 4 roles all designed together) | 20:59 |
I think I was probably coming to this idea too. Have container-image-promote work as it does now, using the "generic" tag removal; but in docs note that you need to use a supported registry | ||
@iwienand:matrix.org | but also have an option to upload from the intermediate registry if you have that without re-tagging | 21:00 |
@iwienand:matrix.org | i tried to update the docs to explain a bit more about the code-merged -> image published latency in https://review.opendev.org/c/zuul/zuul-jobs/+/878612/1 | 21:01 |
@iwienand:matrix.org | my thought was that would have "this can run in one of 3 modes" with this model... | 21:02 |
@jim:acmegating.com | sounds like no strong opinions all around, and willingness to experiment. i'm glad we all like the idea of making room for both roles, and i'm happy to review/author as appropriate and merge both. we'll still need to decide which thing we want to try first with zuul (and also opendev). but that's also not urgent. | 21:03 |
@iwienand:matrix.org | i would say 878612 is ready for review -- the explanations i've put in there are my understanding of the way it works, so if that's wrong, i'm starting from the wrong place :) | 21:06 |
@iwienand:matrix.org | https://review.opendev.org/c/zuul/zuul-jobs/+/878494/1 is another one that is just docs but tries to pull apart the buildx things a bit. the whole buildx path is quite impressive when you dig a bit like that :) | 21:10 |
@iwienand:matrix.org | Clark: https://review.opendev.org/c/zuul/zuul-jobs/+/878487/2 is related; just trying to simplify the buildx path a bit. per the cl there i think the multi-stage push was actually fixed by your work a little while ago on atomic uploads to zuul-registry | 21:11 |
@jim:acmegating.com | ianw: i agree with your understanding in 878612 -- i just left some nit-level comments on it | 21:13 |
@jim:acmegating.com | well, almost-nit level...i mean, comments that don't radically change the understanding being communicated :) | 21:13 |
@clarkb:matrix.org | ok cool I'll take a look at reviewing the changes we've got at this point | 21:13 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 21:16 | |
- [zuul/nodepool] 878179: Use a persistent recursive watch for caches https://review.opendev.org/c/zuul/nodepool/+/878179 | ||
- [zuul/nodepool] 877431: Use image cache when launching nodes https://review.opendev.org/c/zuul/nodepool/+/877431 | ||
- [zuul/nodepool] 877432: Use node cache in node deleter https://review.opendev.org/c/zuul/nodepool/+/877432 | ||
- [zuul/nodepool] 877565: Log the reason we decline a request https://review.opendev.org/c/zuul/nodepool/+/877565 | ||
@clarkb:matrix.org | unrelated I'm running zuul's test suite locally to compare my python3.10 without x86_64-v3 support to python3.11 with x86_64-v3 support and am noticing a few tests are failing. Many of them seem related to file io somehow? makes me wonder if btrfs or my version of tmpfs is causing problems | 21:16 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 21:16 | |
- [zuul/nodepool] 878178: Vendor persistent recursive watch Kazoo support https://review.opendev.org/c/zuul/nodepool/+/878178 | ||
- [zuul/nodepool] 878179: Use a persistent recursive watch for caches https://review.opendev.org/c/zuul/nodepool/+/878179 | ||
- [zuul/nodepool] 877431: Use image cache when launching nodes https://review.opendev.org/c/zuul/nodepool/+/877431 | ||
- [zuul/nodepool] 877432: Use node cache in node deleter https://review.opendev.org/c/zuul/nodepool/+/877432 | ||
- [zuul/nodepool] 877565: Log the reason we decline a request https://review.opendev.org/c/zuul/nodepool/+/877565 | ||
@jim:acmegating.com | Clark: both versions failing? i have not seen that locally | 21:17 |
@clarkb:matrix.org | corvus: I haven't gotten to 3.11 yet. Failures under 3.10 though. I guess I should run under 3.11 and see if the same tests fail consistently | 21:18 |
@clarkb:matrix.org | /tmp is a tmpfs here and other things under / are btrfs though | 21:18 |
@jim:acmegating.com | mine are running under ext4 right now | 21:21 |
@jim:acmegating.com | ianw: again 99% agreement with content in 878494 | 21:27 |
-@gerrit:opendev.org- Ian Wienand proposed: | 21:30 | |
- [zuul/zuul-jobs] 878612: promote-image-container: do not delete tags https://review.opendev.org/c/zuul/zuul-jobs/+/878612 | ||
- [zuul/zuul-jobs] 878614: registry-tag-remove: role to delete tags from registry https://review.opendev.org/c/zuul/zuul-jobs/+/878614 | ||
- [zuul/zuul-jobs] 878740: promote-container-image: use generic tag removal role https://review.opendev.org/c/zuul/zuul-jobs/+/878740 | ||
@iwienand:matrix.org | thanks, will loop back after school run | 21:31 |
@clarkb:matrix.org | thinking about these test runs more I did some fairly big runs with the warnings cleanup and they were fine I think. Maybe I've got some local test setup problem | 21:33 |
@clarkb:matrix.org | I did just rebuild things | 21:33 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 878725: Check Gerrit submit requirements https://review.opendev.org/c/zuul/zuul/+/878725 | 21:34 | |
@jim:acmegating.com | Clark: i just did a full run on a tmpfs and saw no errors | 21:35 |
@clarkb:matrix.org | corvus: ack I've probably got some issue I need to sort out | 21:36 |
@clarkb:matrix.org | ianw: corvus at the end of https://review.opendev.org/c/zuul/zuul-jobs/+/878612/2/roles/build-container-image/common.rst we document that the temporary change tags will be cleaned up but then in the same change we remove that code (because it isn't reliable). Should we update the bit in the docs indicating the role doesn't currently do tag cleanup? | 21:45 |
@clarkb:matrix.org | and then I guess expand on that later as we implement the options available to us for working around that? | 21:45 |
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/zuul-jobs] 878487: build-container-image: directly push with buildx https://review.opendev.org/c/zuul/zuul-jobs/+/878487 | 21:50 | |
@clarkb:matrix.org | no failures yet under 3.11. I wonder if we've somehow broken 3.10 or as I mentioned I've broken 3.10 on my local setup | 21:51 |
@clarkb:matrix.org | Interestingly comparing specific test case runtimes for some of the longer running tests some are faster and some are slower with the supposedly faster python3.11 installation | 22:05 |
@clarkb:matrix.org | I probably need to ensure even less stuff is running on the test machine to get consistent comparisons between runs | 22:05 |
@clarkb:matrix.org | and python3.11 was successful. Runtime total was less than the failed 3.10 case too | 22:09 |
@iwienand:matrix.org | > <@clarkb:matrix.org> ianw: corvus at the end of https://review.opendev.org/c/zuul/zuul-jobs/+/878612/2/roles/build-container-image/common.rst we document that the temporary change tags will be cleaned up but then in the same change we remove that code (because it isn't reliable). Should we update the bit in the docs indicating the role doesn't currently do tag cleanup? | 22:14 |
yeah i removed the skopeo removal because it's actually bad in that it removes the underlying image, but i guess it's intended to be temporary | ||
@iwienand:matrix.org | the other thing i noticed in some of the output was "docker build" saying it was deprecated. so we should probably just make the buildx path the default | 22:15 |
@clarkb:matrix.org | ianw: I think one issue with that is docker pre 23 didn't include buildx by default (and even with 23 if you install from distro packages they don't pull in buildx by default...) | 22:19 |
@iwienand:matrix.org | ahh i guess we're always using the ce packages that have it | 22:21 |
@clarkb:matrix.org | ya they have it but you have to explicitly install it even though buildx is the default now | 22:23 |
@clarkb:matrix.org | I think they ended up not modifying their package metadata for the new release much compared to the old release so there are some weird inconsistencies compared to the release notes | 22:24 |
@clarkb:matrix.org | also the whole apparmor thing | 22:24 |
@clarkb:matrix.org | corvus: left some thoughts on the gerrit submit requirements change but +2 from me | 22:24 |
@clarkb:matrix.org | corvus: `"failed_modules": {"ansible.legacy.setup": {"failed": true, "module_stderr": "/bin/sh: line 1: /home/clark/src/zuul/zuul/.nox/tests-3-10/bin/python3.10: No such file or directory\\n", "module_stdout": "", "msg": "The module failed to execute correctly, you probably need to set the interpreter.` I think this is the problem I've got. That path does exist though | 22:44 |
@clarkb:matrix.org | oh except this is in bwrap so maybe that isn't mounted? | 22:45 |
@clarkb:matrix.org | yes I think that is the issue. It is running ansible out of the nox env and ansible dirs are getting bind mounted but not the python nox venv. I don't know why python3.11 would work though | 22:47 |
@clarkb:matrix.org | hrm python3.11 doesn't seem to add the equivalent 3.11 path to the ro bind paths | 22:56 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 878802: Report leaked resource metrics in statemachine driver https://review.opendev.org/c/zuul/nodepool/+/878802 | 23:09 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 878725: Check Gerrit submit requirements https://review.opendev.org/c/zuul/zuul/+/878725 | 23:20 | |
@jim:acmegating.com | Clark: replied and updated re submit reqs | 23:20 |
@jim:acmegating.com | Clark: re bwrap -- i think there's some special detection of the python venv install path to bind mount it into bwrap? | 23:21 |
@clarkb:matrix.org | corvus: maybe? I haven't found it in the logs at least. The more I dig into this the more I'm confused :/ The next thing I notice is that the job seems to run a playbook against hosts: all but it isn't clear what the inventory looks like. I think I need to hold the entire test env and not have it get cleaned up | 23:27 |
@clarkb:matrix.org | looks like there is an env var to do that | 23:29 |
@clarkb:matrix.org | corvus: under python3.11 playbook_0's inventory has python set to auto but python3.10 does not | 23:40 |
@clarkb:matrix.org | it also sets the connection to local under 3.11 but not 3.10. I think this is the underlying cause of the failure for me | 23:40 |
@clarkb:matrix.org | with 3.10 it is actually attempting to do ssh which it cannot do | 23:40 |
@clarkb:matrix.org | oh hrm the freeze playbook is failing which might be causing that | 23:48 |
@clarkb:matrix.org | so this is a number of failures deep now. elasticsearch reporting test has no ES index because the reporter doesn't run on RETRY_LIMIT? That happens because there is no connection when running playbook_0 and that happens because freeze playbook is failing on that module thing I posted earlier | 23:49 |
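
Editorial sketch of the job layout discussed around 18:09-18:12 above: with an intermediate registry, use build, upload, and promote jobs; without one, just build and upload post-merge. The function name `plan_image_jobs` and the phase strings are hypothetical illustrations only and are not zuul-jobs roles or job names.

```python
def plan_image_jobs(has_intermediate_registry: bool) -> list[str]:
    """Return the job phases from the discussion for a given deployment.

    Hypothetical helper for illustration; not part of zuul-jobs.
    """
    if has_intermediate_registry:
        # The image built for the change is kept as an artifact, so a
        # promote step can publish it after merge.
        return ["build", "upload", "promote"]
    # No intermediate registry: build and upload post-merge, so there is
    # no artifact to manage and no promote step.
    return ["build", "upload"]


if __name__ == "__main__":
    for has_registry in (True, False):
        print(has_registry, plan_image_jobs(has_registry))
```

This only mirrors the two cases named in the conversation; the faster tag-swap variant (option 1) would be a separate role, as proposed at 18:12.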