@jim:acmegating.com | the registry just restarted? | 00:01 |
---|---|---|
@clarkb:matrix.org | I haven't touched the hosts so wasn't me if it did | 00:03 |
@jim:acmegating.com | the storage appears to have persisted, so not a problem | 00:03 |
@iwienand:matrix.org | ahh sorry i just killed it, didn't know you were looking. will leave it | 00:04 |
@iwienand:matrix.org | the build will have come from ... | 00:04 |
@iwienand:matrix.org | https://zuul.opendev.org/t/openstack/build/69f9654bebfe435fb2792ba173f9dffc i would say | 00:05 |
@iwienand:matrix.org | Update upload metadata chunks: [{'size': 7670, 'md5': '54a131e3f12627a57edf174190aabebf'}] | 00:06 |
@iwienand:matrix.org | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_69f/805932/3/check/opendev-buildset-registry/69f9654/docker/buildset_registry.txt is the registry log | 00:06 |
@jim:acmegating.com | ianw: i'd probably run the buildset registry with a patch that output the received manifest to confirm what docker pushed | 00:15 |
@iwienand:matrix.org | corvus: will opendev-buildset-registry obey depends-on? | 00:17 |
@iwienand:matrix.org | i feel like no | 00:17 |
@jim:acmegating.com | ianw: i just mean restart it on the held node. you don't need to run it in docker, just clone the repo and run it by hand. | 00:18 |
@jim:acmegating.com | anyway, i'd bet a nickel that it's reporting a size of 7671 because that's what's in the manifest that's being uploaded. | 00:20 |
@jim:acmegating.com | ianw: hosts are all yours, i've logged out | 00:21 |
@iwienand:matrix.org | 2021-09-02 00:44:35,172 INFO registry.api: processing {'mediaType': 'application/vnd.docker.image.rootfs.diff.tar.gzip', 'digest': 'sha256:8ba24a495b7a3a8458f46a749b1072be2e88ec0476b88b61289c5b51341b8874'} | 00:46 |
2021-09-02 00:44:35,172 INFO registry.api: digest:sha256:8ba24a495b7a3a8458f46a749b1072be2e88ec0476b88b61289c5b51341b8874 blob_size:3121 | ||
@iwienand:matrix.org | -rw-r--r-- 1 root root 3121 Sep 2 00:43 data | 00:46 |
@jim:acmegating.com | ianw: does that mean i lose my nickel? | 00:47 |
@iwienand:matrix.org | we do not seem to get a size from docker push (if i'm doing this right). the blob_size seems to match what's on disk ... | 00:48 |
@jim:acmegating.com | ianw: that's not 716x? | 00:49 |
@jim:acmegating.com | ianw: what's the info for sha256:a65b430471b85457a204da7fcbaf12a5f18ea5c33cc9a31a2749aa43f6290fb1 ? | 00:50 |
@iwienand:matrix.org | i've rebuilt so i guess the numbers are slightly different | 00:51 |
@jim:acmegating.com | ianw: also, that looks like the first layer -- why is the sha different? | 00:51 |
@jim:acmegating.com | oh | 00:51 |
@jim:acmegating.com | why did you rebuild? | 00:51 |
@iwienand:matrix.org | i figured that was the way to docker push it? | 00:51 |
@jim:acmegating.com | you can just run docker push | 00:52 |
@jim:acmegating.com | anyway, what's the info for the third layer this time? | 00:52 |
@iwienand:matrix.org | well doh, i just tried to push the original and i think i've cleared it | 00:54 |
@iwienand:matrix.org | nice, now i'm getting 500 errors from teh registry | 00:55 |
@iwienand:matrix.org | ok, cleared /stroage and started again | 00:56 |
@iwienand:matrix.org | 021-09-02 00:56:12,069 INFO registry.api: processing {'mediaType': 'application/vnd.docker.image.rootfs.diff.tar.gzip', 'size': 7670, 'digest': 'sha256:3a7c8ad16a18e78ae62b0490ab345b44303e3718bc6af474e20facc335020748'} | 00:56 |
@iwienand:matrix.org | this time the push *did* specify a size | 00:56 |
@iwienand:matrix.org | now things are even *further* out | 01:01 |
@iwienand:matrix.org | the manifest says | 01:01 |
@iwienand:matrix.org | "size": 7670, | 01:02 |
"digest": "sha256:3a7c8ad16a18e78ae62b0490ab345b44303e3718bc6af474e20facc335020748" | ||
@iwienand:matrix.org | if i get that blob | 01:02 |
@iwienand:matrix.org | curl -H "Authorization: Bearer ${TOKEN}" 'https://zuul-jobs.buildset-registry:5000/v2/opendevorg/assets/blobs/sha256:3a7c8ad16a18e78ae62b0490ab345b44303e3718bc6af474e20facc335020748?ns=docker.io' --output test | 01:02 |
@iwienand:matrix.org | -rw-r--r-- 1 root root 7669 Sep 2 01:01 test | 01:02 |
@iwienand:matrix.org | oh no, 69 + 1 = 70. just confused. i think this replicates ... | 01:03 |
@jim:acmegating.com | so the push was mismatched -- either the size it sent us is wrong, or the data were truncated? | 01:04 |
@iwienand:matrix.org | it seems so | 01:05 |
@iwienand:matrix.org | sigh, the timeout just expired and the nodes went away | 01:06 |
@iwienand:matrix.org | https://etherpad.opendev.org/p/zrobo <- i'm going to put some debugging things here | 01:07 |
@iwienand:matrix.org | clarkb: if you a have a sec for https://review.opendev.org/c/opendev/base-jobs/+/806818 i might be able to make a more persistent debugging setup in infra | 01:09 |
@clarkb:matrix.org | ianw: approved, I guess be prepared to revert it if it breaks jobs when we don't want them to fail. That is in base jobs which is always a bit tricky to test | 01:12 |
@iwienand:matrix.org | yeah, thanks. i think, if i set that var and set holds for opendev-buildset-registry and the system-config image build job i can replicate it | 01:14 |
@jim:acmegating.com | ianw: did that patch just merge without a gerritbot announcement? | 01:28 |
@jim:acmegating.com | oh that was in base jobs, nm | 01:28 |
@iwienand:matrix.org | @corvus i i think cause it's opendev doesn't echo here | 01:29 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Delete election https://review.opendev.org/c/zuul/zuul/+/807005 | 01:35 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 02:52 | |
- [zuul/zuul] Remove nodeset from NodeRequest https://review.opendev.org/c/zuul/zuul/+/806063 | ||
- [zuul/zuul] Make node requests persistent https://review.opendev.org/c/zuul/zuul/+/806280 | ||
- [zuul/zuul] Add node request cache to zk nodepool interface https://review.opendev.org/c/zuul/zuul/+/806639 | ||
- [zuul/zuul] Wrap nodepool request completed events with election https://review.opendev.org/c/zuul/zuul/+/806653 | ||
- [zuul/zuul] Remove unecessary node request cancelation code https://review.opendev.org/c/zuul/zuul/+/806814 | ||
- [zuul/zuul] Refactor the checkNodeRequest method https://review.opendev.org/c/zuul/zuul/+/806816 | ||
- [zuul/zuul] Don't store node requests/nodesets on queue items https://review.opendev.org/c/zuul/zuul/+/806821 | ||
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Reduce leaked file descriptors in tests https://review.opendev.org/c/zuul/zuul/+/806995 | 02:53 | |
@jim:acmegating.com | Clark swest tobiash : ^ okay, i think i fixed the leak (with `del self.election`) and addressed/responded to all comments. | 02:57 |
-@gerrit:opendev.org- Jiri Podivin proposed: [zuul/zuul-jobs] DNM https://review.opendev.org/c/zuul/zuul-jobs/+/807031 | 06:46 | |
@iwienand:matrix.org | Clark: corvus see https://etherpad.opendev.org/p/zrobo ... i have a pretty good debugging setup i think, but haven't quite pinpointed what's wrong | 08:05 |
@iwienand:matrix.org | tl;dr ... there is a screen running on 23.253.159.167 ; one window you can "docker push" from and the other has mitmproxy running (docker is setup to http_proxy through this) | 08:06 |
@iwienand:matrix.org | the registry is 104.130.13.181. as zuul you can run ".tox/venv/bin/zuul-registry -d -c /home/zuul/buildset_registry/conf/registry.yaml serve" and then "docker push" to it from the other host | 08:07 |
@iwienand:matrix.org | m -rf /home/zuul/storage/_local/* and rinse and repeat ... | 08:08 |
@iwienand:matrix.org | we are one byte short on *all* the layers; info in the etherpad | 08:08 |
@iwienand:matrix.org | feel free to fiddle ... | 08:09 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] Remove hold request cache from nodepool interface https://review.opendev.org/c/zuul/zuul/+/806813 | 08:54 | |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: | 09:00 | |
- [zuul/zuul] Don't treat failed requirement jobs as ready https://review.opendev.org/c/zuul/zuul/+/806781 | ||
- [zuul/zuul] Remove returnNodeSet calls from scheduler https://review.opendev.org/c/zuul/zuul/+/806062 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: | 09:54 | |
- [zuul/zuul] Periodically maintain connection caches https://review.opendev.org/c/zuul/zuul/+/806756 | ||
- [zuul/zuul] Clean up dangling cache data nodes more often https://review.opendev.org/c/zuul/zuul/+/807102 | ||
-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul-jobs] Drop deprecated url param of upload_logs_XXX modules https://review.opendev.org/c/zuul/zuul-jobs/+/807118 | 11:18 | |
-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul-jobs] Drop deprecated url param of upload_logs_XXX modules https://review.opendev.org/c/zuul/zuul-jobs/+/807118 | 11:23 | |
@gtema:matrix.org | corvus, I tried to complete https://review.opendev.org/c/zuul/zuul-jobs/+/776677 with https://review.opendev.org/c/opendev/base-jobs/+/806941 and https://review.opendev.org/c/zuul/zuul-jobs/+/807118 as requested | 11:23 |
@gtema:matrix.org | please have a look whether there is something else still missing | 11:24 |
-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul] Add support for gitlab squash merge https://review.opendev.org/c/zuul/zuul/+/806231 | 11:31 | |
-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul-jobs] Drop deprecated url param of upload_logs_XXX modules https://review.opendev.org/c/zuul/zuul-jobs/+/807118 | 11:40 | |
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] Make sure dynamic change queues are marked as such https://review.opendev.org/c/zuul/zuul/+/807121 | 11:43 | |
@gtema:matrix.org | avass, I see you implemented initially upload-logs-s3 role. Do you still know what is this `_undocumented_test_worker_node_`? Potentially it started failing in https://review.opendev.org/c/zuul/zuul-jobs/+/776677 (at least this is the only difference to regular swift role | 12:37 |
@avass:vassast.org | gtema: I think it was added to make it easier to test the role | 12:39 |
@gtema:matrix.org | due to the need of boto? | 12:40 |
@avass:vassast.org | it's not possible to target localhost like that in an untrusted context. but it may also be a good idea to remove that variable completely | 12:41 |
@gtema:matrix.org | I wonder why it now fails | 12:42 |
@gtema:matrix.org | I see boto is installed on test node, while role now tries to execute delegated (what in reality should be same node) | 12:43 |
-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul-jobs] [DNM] Test dropping delegation in the upload_logs_s3 role https://review.opendev.org/c/zuul/zuul-jobs/+/807132 | 12:47 | |
-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul-jobs] [DNM] Test dropping delegation in the upload_logs_s3 role https://review.opendev.org/c/zuul/zuul-jobs/+/807132 | 13:05 | |
@gtema:matrix.org | oh, now I got, cause syntax in https://opendev.org/zuul/zuul-jobs/src/branch/master/test-playbooks/upload-logs-s3.yaml#L48 apparently does not work anymore since last security change | 13:15 |
-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul-jobs] [DNM] Test upload_logs_s3 role https://review.opendev.org/c/zuul/zuul-jobs/+/807132 | 13:22 | |
@jim:acmegating.com | gtema: thanks! see comments on https://review.opendev.org/776677 -- it looks like there already was a change for step 3 at https://review.opendev.org/c/opendev/base-jobs/+/777087 but i don't think it ever got linked to 776677 :/ -- your change for step 5 lgtm. | 14:22 |
-@gerrit:opendev.org- Jiri Podivin proposed: [zuul/zuul-jobs] DNM https://review.opendev.org/c/zuul/zuul-jobs/+/807031 | 14:23 | |
-@gerrit:opendev.org- Jiri Podivin proposed: [zuul/zuul-jobs] DNM https://review.opendev.org/c/zuul/zuul-jobs/+/807031 | 14:42 | |
@gtema:matrix.org | corvus: thks. Sadly the 776677 fails now, I assume since security release. And honestly I have now no clue what to do with it | 15:01 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 15:02 | |
- [zuul/zuul] Make node requests persistent https://review.opendev.org/c/zuul/zuul/+/806280 | ||
- [zuul/zuul] Add node request cache to zk nodepool interface https://review.opendev.org/c/zuul/zuul/+/806639 | ||
- [zuul/zuul] Wrap nodepool request completed events with election https://review.opendev.org/c/zuul/zuul/+/806653 | ||
- [zuul/zuul] Remove unecessary node request cancelation code https://review.opendev.org/c/zuul/zuul/+/806814 | ||
- [zuul/zuul] Refactor the checkNodeRequest method https://review.opendev.org/c/zuul/zuul/+/806816 | ||
- [zuul/zuul] Don't store node requests/nodesets on queue items https://review.opendev.org/c/zuul/zuul/+/806821 | ||
- [zuul/zuul] WIP: re-add test_zookeeper_disconnect2 https://review.opendev.org/c/zuul/zuul/+/807166 | ||
@clarkb:matrix.org | catching up on ianw's debugging of the zuul-registry do we know if the builder push is supplying the manifest to zuul-registry with the off by one sizes or is zuul-registry calculating them at off by one values? It isn't clear to me from the mitm log in the etherpad if we've checked what manifest data is supplied to the registry and whether or not the registry is updating the manifest | 15:05 |
@jim:acmegating.com | Clark: i haven't looked into that today | 15:08 |
@clarkb:matrix.org | ok, I've got a meeting and breakfast but I'll try to look into that in a bit | 15:09 |
@clarkb:matrix.org | I half wonder if the docker hub registry is always rewriting the size in the manifest and the builder is supplying the wrong value due to a bug | 15:10 |
-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul-jobs] [DNM] Test upload_logs_s3 role https://review.opendev.org/c/zuul/zuul-jobs/+/807132 | 15:12 | |
@jim:acmegating.com | that sure would be something | 15:30 |
@jpew:matrix.org | Is there a recommended way to validate that the configuration in a trusted project is correct before merging it? | 15:47 |
@jpew:matrix.org | (since it won't use the pending changes in the check or gate(?) pipelines) | 15:48 |
@clarkb:matrix.org | jpew: we have a test base job that allows us to land a changes without affecting any jobs, then we can reparent other jobs to it speculatively or temporarily to test things work well. The biggest downside to this approach is that you can often end up with copies of stuff | 15:51 |
@clarkb:matrix.org | Similar setup works for roles, modify a test role, use test role in test base job and so on | 15:51 |
@clarkb:matrix.org | pabelanger: has an approach where as little as possible is put in the trusted repo to minimize the amount of code that needs testing in this way | 15:52 |
@jim:acmegating.com | tobiash: i noticed https://review.opendev.org/788680 is going to conflict with the nodepool sos work -- can you take a look at that soon? you have a previous +2 on it, but i want to make sure you're okay with the minor change i requested afterwords. | 16:28 |
@jpew:matrix.org | Clark: So you have a project that is just a basic test job, then you submit a change to it with a Depends-On change for the pending change in the config repo? | 16:30 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 16:30 | |
- [zuul/zuul] Remove nodeset from NodeRequest https://review.opendev.org/c/zuul/zuul/+/806063 | ||
- [zuul/zuul] Make node requests persistent https://review.opendev.org/c/zuul/zuul/+/806280 | ||
- [zuul/zuul] Add node request cache to zk nodepool interface https://review.opendev.org/c/zuul/zuul/+/806639 | ||
- [zuul/zuul] Wrap nodepool request completed events with election https://review.opendev.org/c/zuul/zuul/+/806653 | ||
- [zuul/zuul] Remove unecessary node request cancelation code https://review.opendev.org/c/zuul/zuul/+/806814 | ||
- [zuul/zuul] Refactor the checkNodeRequest method https://review.opendev.org/c/zuul/zuul/+/806816 | ||
- [zuul/zuul] Don't store node requests/nodesets on queue items https://review.opendev.org/c/zuul/zuul/+/806821 | ||
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Benjamin Schanzel: [zuul/zuul] Add tenant name on NodeRequests for Nodepool https://review.opendev.org/c/zuul/zuul/+/788680 | 16:30 | |
@jim:acmegating.com | jpew: https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/jobs.yaml#L5-L23 explains the test process for base-jobs | 16:31 |
@clarkb:matrix.org | jpew: no the base test job is a copy of the base job for the tenant. But we can modify it without affecting everything. Then we merge changes to that base-test job and we can update zuul's unittests for example to parent to that and exercise the trusted stuff | 16:31 |
@jpew:matrix.org | Ah, OK | 16:32 |
@jpew:matrix.org | I'll look at that, thanks | 16:32 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] Reduce leaked file descriptors in tests https://review.opendev.org/c/zuul/zuul/+/806995 | 17:17 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Remove internal nodepool request cache https://review.opendev.org/c/zuul/zuul/+/807196 | 18:13 | |
-@gerrit:opendev.org- Clark Boylan proposed: | 18:43 | |
- [zuul/nodepool] Catch more potential errors in PoolWorker run loop https://review.opendev.org/c/zuul/nodepool/+/807199 | ||
- [zuul/nodepool] Deregister launchers when restarting them https://review.opendev.org/c/zuul/nodepool/+/807200 | ||
@clarkb:matrix.org | corvus: I think ^ those two things may be related to the leaked launchers in zk | 18:43 |
-@gerrit:opendev.org- Zuul merged on behalf of Benjamin Schanzel: [zuul/zuul] Add tenant name on NodeRequests for Nodepool https://review.opendev.org/c/zuul/zuul/+/788680 | 19:04 | |
@jim:acmegating.com | Clark: nice catch -- i left a q on one, but +2 on both | 20:34 |
@clarkb:matrix.org | thanks responded | 20:37 |
@jim:acmegating.com | kk | 20:40 |
@mordred:inaugust.com | corvus: "zuul-tox-py38 https://zuul.opendev.org/t/zuul/build/a31ec1e2fe9744b0976f3acea87c3451 : TIMED_OUT in 1h 31m 59s" <-- that's on the bottom patch in your SOS stack | 20:44 |
@mordred:inaugust.com | is that a thing that bears investigating? | 20:44 |
@jim:acmegating.com | mordred: i think we're getting to a point where we're adding enough additional zk traffic that we're noticably slowing down the tests. in particular, we put zero-node requests into zk now, and a lot of tests use those (previously, we just no-op'd it in the scheduler, but it'd be hard to do that with multi-scheduler without some zk traffic) | 21:22 |
@iwienand:matrix.org | @corvus that's what i was hoping to catch with the mitmproxy dump, see what the on-the-wire body length is as it leaves the docker push, before zuul-registry writes it out | 21:34 |
@iwienand:matrix.org | sorry, i mean @clarkb | 21:34 |
@iwienand:matrix.org | * @clarkb that's what i was hoping to catch with the mitmproxy dump, see what the on-the-wire body length is as it leaves the docker push, before zuul-registry writes it out | 21:34 |
@pearcetyler:matrix.org | Hello 👋 Is it possible to allow Zuul to run a gate pipeline even if GitHub's status checks have not passed? I'd like to make `gate` a required check in GitHub (To force devs to use the gate pipeline instead of manually merging), but when I make `gate` required, Zuul does not queue the PR because the required checks haven't passed | 21:36 |
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: | 21:37 | |
- [zuul/nodepool] Catch more potential errors in PoolWorker run loop https://review.opendev.org/c/zuul/nodepool/+/807199 | ||
- [zuul/nodepool] Deregister launchers when restarting them https://review.opendev.org/c/zuul/nodepool/+/807200 | ||
@jim:acmegating.com | Tyler Pearce: should work if you do this: https://zuul-ci.org/docs/zuul/reference/drivers/github.html#branch-protection-rules | 21:50 |
@clarkb:matrix.org | > <@iwienand:matrix.org> @clarkb that's what i was hoping to catch with the mitmproxy dump, see what the on-the-wire body length is as it leaves the docker push, before zuul-registry writes it out | 21:55 |
Unfortunately I think it was gzipped over the wire so the byte length there isn't telling us | ||
@pearcetyler:matrix.org | corvus: I looked at that, though it suggests making `check` required. I want to make `gate` required, but doing so prevents Zuul from running on that PR. Ultimately I want to block users from manually merging (force them to use Zuul for the project gating benefits) | 21:56 |
@mordred:inaugust.com | > <@jim:acmegating.com> mordred: i think we're getting to a point where we're adding enough additional zk traffic that we're noticably slowing down the tests. in particular, we put zero-node requests into zk now, and a lot of tests use those (previously, we just no-op'd it in the scheduler, but it'd be hard to do that with multi-scheduler without some zk traffic) | 21:57 |
Nod, makes sense. So the timeout itself isn't indicative of an specific issue, but there's a deeper thing we might need to ponder at some point as the time continues to grow | ||
@iwienand:matrix.org | Clark: i think mitmproxy dump is unzipped. i'm just trying to figure out how to get it show the body length in some reasonable way; it weirdly doesn't show that in mitmdump | 22:02 |
@jim:acmegating.com | Tyler Pearce: The intent with zuul is that if zuul knows that it will report some piece of information that will satisfy the code review system's requirements to merge a change, then it should enqueue the change regardless of that value. IOW, it should do what you want. If everything is set up correctly and that's not working, then there may be a bug in the github driver. | 22:08 |
@jim:acmegating.com | ianw: why not just have zuul-registry emit the info instead of using mitm? or do you want to try to eliminate the possibility of a framework error? | 22:09 |
@jim:acmegating.com | mordred: yeah, i'm not seeing any easy answers right now, other than increase timeout and/or run tests on larger nodes. fwiw, i've actually seen fairly modest test runtime growth when running locally; i think the clouds are exacerbating the issue. | 22:12 |
@iwienand:matrix.org | @corvus yes, and also if there's any chance of getting someone who knows docker a trace that says "here's a trace: these PATCH requests send X bytes and then it sends a manifest that has X+1 bytes" stands alone | 22:13 |
@jim:acmegating.com | ianw: btw the "@user" isn't doing anything special -- you can just call me "corvus" :) | 22:14 |
@pearcetyler:matrix.org | > <@jim:acmegating.com> Tyler Pearce: The intent with zuul is that if zuul knows that it will report some piece of information that will satisfy the code review system's requirements to merge a change, then it should enqueue the change regardless of that value. IOW, it should do what you want. If everything is set up correctly and that's not working, then there may be a bug in the github driver. | 22:15 |
Thanks. It's likely I set something up incorrectly 😅 I'll investigate | ||
@iwienand:matrix.org | corvus oh, in element it makes it pop up like a tag | 22:15 |
@jim:acmegating.com | ianw: are you using element to type that? | 22:15 |
@iwienand:matrix.org | yep | 22:15 |
@jim:acmegating.com | ianw: something isn't working then; it's coming across as plain text for me in element. it looks correct when other people mention me. | 22:16 |
@clarkb:matrix.org | it is coming through as a literal @user | 22:17 |
@clarkb:matrix.org | the client should translate @user to a real user thing though | 22:17 |
@jim:acmegating.com | Clark: the sending client | 22:17 |
@clarkb:matrix.org | but you have to tab complete onto it | 22:17 |
@clarkb:matrix.org | corvus: yes the sending client, you do @corvus then tab and that produces corvus | 22:17 |
@iwienand:matrix.org | Clark: corvus ok, it seems i need to press tab | 22:17 |
@iwienand:matrix.org | heh, yes ;) | 22:18 |
@jim:acmegating.com | ianw, Clark: yes, also you don't actually need @ | 22:18 |
@pearcetyler:matrix.org | corvus: After approving the PR I was testing, the `gate` pipeline works as expected. It seems Zuul may be checking for an approval, even if the branch protection rules do not require approval. In a real workflow we should always have approvals, it just caught me off guard for my local testing :) | 22:18 |
@clarkb:matrix.org | looks like without the @ you have to hit tab twice | 22:18 |
@clarkb:matrix.org | with the @ it is just once. Same amount of typing though. An interesting UX choice | 22:19 |
@jim:acmegating.com | Clark: weird, "cla^I" gets me this. i'm using the "experimental irc" mode. but i didn't think that changed anything regarding input, only line height | 22:19 |
@jim:acmegating.com | i'm also using a slightly different element than matrix.org | 22:20 |
@jpew:matrix.org | We have to allow untrusted projects to download from an authenticated artifactory server. I was trying to set it up so that the trusted context would generate a temporary artifactory token using the secret password which the untrusted context could use, then revoke it at the end of the build. However, I'm not sure the best way to "transfer" the generated token; I tried an ansible fact, but that didn't work (ansible facts don't transfer between playbooks?).... are there any suggestions or precendence for these types of things? | 22:21 |
@jim:acmegating.com | jpew: cacheable ansible facts do | 22:22 |
@jpew:matrix.org | Ah, so if I set the `cachable` flag in `set_fact` it should be available all the time? | 22:22 |
@jim:acmegating.com | jpew: yep https://docs.ansible.com/ansible/latest/collections/ansible/builtin/set_fact_module.html#parameter-cacheable | 22:23 |
@jpew:matrix.org | Ya, I saw that but I wasn't sure if zuul had caching enabled. Thanks! | 22:23 |
@jim:acmegating.com | good point, not sure if we mention that in docs :) | 22:23 |
@jim:acmegating.com | but we do, explicitly for that purpose | 22:23 |
@jim:acmegating.com | jpew: just a quick thought: you might be able to construct a job where a trusted playbook gets the artifacts without giving the token to an untrusted playbook. (but may not be necessary if you fully trust your "untrusted" users) | 22:26 |
@clarkb:matrix.org | ianw: ok looks like my nodepool changes for the leaked launchers have landed and I've fixed the leaked changes in the check queue for openstack. I'm finally to the point where I can look at the zuul-registry off by one thing if it will help. But I figure you've got more of that paged in so let me know if I can help | 22:47 |
@clarkb:matrix.org | I'm going to try and review hashtag:sos in the maentime | 22:47 |
@iwienand:matrix.org | thanks; yeah getting a dump that shows lengths is harder than you'd think | 22:49 |
@iwienand:matrix.org | one thing that has me thinking is that Chunked Transfer-Encoding ends with a 0\r\n ... that seems suspicious | 22:49 |
@clarkb:matrix.org | corvus: mordred re: test runtimes I think we divide the number of cpus in half before letting the job run. Its possible that may be too conservative? (or maybe not and using more CPUs will create more contention and break more often) | 22:53 |
@jim:acmegating.com | Clark: i think the cpu usage is actually pretty good -- but i wouldn't mind a dstat chart to double check that :) | 22:57 |
@jim:acmegating.com | by good i mean we're not leaving any/much on the table | 22:57 |
@iwienand:matrix.org | Clark: ok, got it; if you want me to bring you up to where i'm at I can | 23:00 |
@iwienand:matrix.org | docker push is definitely putting X-1 bytes on the wire, and reporting X bytes in the manifest | 23:02 |
@clarkb:matrix.org | ianw: nice! I theorized that might be happening earlier and suggested that maybe docker hub et al are just alays writing the manifest over themselves :) | 23:06 |
@clarkb:matrix.org | corvus: I'm struggling with your response to https://review.opendev.org/c/zuul/zuul/+/806653/6/zuul/nodepool.py I think the sendNodesProvisionedEvent() event is on a different queue than the node requests events that trigger _handleNodeRequestEvent ? | 23:06 |
@clarkb:matrix.org | corvus: there are different states being checked between the two as well. Fulfilled vs completed and deleted vs failed | 23:08 |
@clarkb:matrix.org | hrm actually ZooKeeperNodepool might be doing that translation for us? Does this mean the only risk here is you need some next event to trigger to process a prior one that hit the race? | 23:09 |
@clarkb:matrix.org | corvus: I think that case is worth considering if say it is a quiet period like a weekend you might have to push up another change to get the node request for an older change to finally process if it had previously hit that race and short circuited | 23:10 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Try harder to unlock failed build requests https://review.opendev.org/c/zuul/zuul/+/807221 | 23:12 | |
@jim:acmegating.com | Clark: ^ that's from the stuck build earlier. i'll context switch to what you just wrote now. :) | 23:13 |
@jim:acmegating.com | ianw: i'd be okay expanding the registry's dockerhub compat rewriting | 23:14 |
@iwienand:matrix.org | i'm thinking zuul-registry should start validating the size to make sure it doesn't ingest bad data | 23:18 |
@jim:acmegating.com | ianw: that's clearly not bug-compatible with dockerhub :) | 23:19 |
@jim:acmegating.com | Clark: whenever an election happens, we 1) enable newly received completion events (from zk) to put node provisioned events on the scheduler's result queue, then 2) go over all existing requests and put node provisioned events on the scheduler's result queue. i think that covers all cases. | 23:20 |
@jim:acmegating.com | there's an overlap between those two things, which means we can generate duplicate events, but we shouldn't miss anything. | 23:20 |
@jim:acmegating.com | (and specifically, #2 happens after #1) | 23:22 |
@clarkb:matrix.org | corvus: but we aren't garunteed to process those events with the lock held are we? | 23:22 |
@clarkb:matrix.org | thats the issue. What happens if we are in _handleNodeRequestEvent without the election being won | 23:22 |
@clarkb:matrix.org | * corvus: but we aren't garunteed to process those events with the election won are we? | 23:23 |
@clarkb:matrix.org | oh now I see it, beacuse we do the same work once the election is won you'd only wait for the election to be won | 23:23 |
@jim:acmegating.com | then the event we're processing will be handled by whoever either has already won the election or will win the election next | 23:23 |
@clarkb:matrix.org | not some other external event | 23:23 |
@jim:acmegating.com | yep, and we're continuously trying to win the election | 23:24 |
@clarkb:matrix.org | in my head I keep thinking these events will fire based on exteranl events from either gerrit or nodepool but in this specific case the election won handler is only an interaction with zk | 23:24 |
@jim:acmegating.com | correct | 23:24 |
@clarkb:matrix.org | corvus: in https://review.opendev.org/c/zuul/zuul/+/807221/1/zuul/executor/server.py line 3890 when you say exit there you mean process exit? Since there shouldn't be any code there that will raise? | 23:31 |
@jim:acmegating.com | Clark: correct | 23:32 |
@clarkb:matrix.org | that race should really only become a problem in a multi scheduler setup then? | 23:32 |
@jim:acmegating.com | and i did check _stopped in the retry so it doesn't hang forever, so either a crash or a normal exit could trigger that race condition (if it happens along with a zk outage) | 23:32 |
@jim:acmegating.com | Clark: no, that's the executor, so if an executor lost ZK connectivity and exited between those two sections, we get a perpetually stuck build | 23:33 |
@clarkb:matrix.org | ah. | 23:33 |
@clarkb:matrix.org | I don't see where _stopped is checked | 23:33 |
@jim:acmegating.com | sorry _running | 23:33 |
@jim:acmegating.com | in _retry | 23:34 |
@clarkb:matrix.org | oh right different word for similar thing | 23:34 |
@jim:acmegating.com | (note it's checked after the first try, so we'll always try those methods at least once; that's why we'd need to have zk errors to trigger it too) | 23:34 |
@clarkb:matrix.org | Thanks for looking into that problem. +2 from me. I'm going to need to pop out for more school stuff (I hate how last minute this all feels, surprise its school time do all the things!) but I do still want to review the hashtag:sos stack more properly | 23:39 |
@clarkb:matrix.org | don't wait on me though if people are reviewing and approving that is fine too | 23:39 |
@jim:acmegating.com | no rush :) | 23:39 |
@iwienand:matrix.org | i've filed https://github.com/docker/for-linux/issues/1296 which i have a feeling is a place issues go to die | 23:42 |
@iwienand:matrix.org | mitmweb view of the dump file is pretty cool | 23:43 |
@jim:acmegating.com | ianw: in paragraph 2, first sentence, you might want to clarify "it" is "docker" that's getting things wrong; because otherwise i'd assume the anticedent is "our registry" | 23:44 |
@jim:acmegating.com | ianw: otherwise looks pretty clear :) | 23:45 |
@jim:acmegating.com | ianw: oh wait, i think i misread that sentence | 23:46 |
@iwienand:matrix.org | i reworded it anyway to be a little clearer | 23:46 |
@jim:acmegating.com | ianw: i think it was correct as originally, so i guess my only feedback is that it was easy for me to read wrong :). it's definitely harder for me to mess up now. :) | 23:47 |
@iwienand:matrix.org | what are other registries that we could look at? gitlab has one maybe? | 23:50 |
@iwienand:matrix.org | it might be useful to see if they just ignore the size sent to them | 23:51 |
@jim:acmegating.com | ianw: the open source docker registry | 23:51 |
@jim:acmegating.com | jfrog | 23:51 |
@fungicide:matrix.org | back in my c programming days i would have guessed this was the result of counting the length of an array containing a null-terminated string ;) | 23:51 |
@jim:acmegating.com | ianw: quay | 23:52 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!