-@gerrit:opendev.org- Francisco Seruca Salgado proposed: [zuul/zuul-jobs] 955583: Trigger Test https://review.opendev.org/c/zuul/zuul-jobs/+/955583 | 11:09 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 955545: Require multinode requests served from same provider https://review.opendev.org/c/zuul/zuul/+/955545 | 16:03 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 16:17 | |
- [zuul/zuul] 955040: Add QuotaCache class https://review.opendev.org/c/zuul/zuul/+/955040 | ||
- [zuul/zuul] 955106: Plumb zk_client through to endpoints https://review.opendev.org/c/zuul/zuul/+/955106 | ||
- [zuul/zuul] 955107: Update drivers to use QuotaCache https://review.opendev.org/c/zuul/zuul/+/955107 | ||
- [zuul/zuul] 955325: Implement zuul-launcher connection filter https://review.opendev.org/c/zuul/zuul/+/955325 | ||
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 955617: Launcher: implement preferred provider https://review.opendev.org/c/zuul/zuul/+/955617 | 17:19 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 955619: Launcher: skip non-pending uploads https://review.opendev.org/c/zuul/zuul/+/955619 | 17:35 | |
@clarkb:matrix.org | corvus: looks like the new test cases in https://review.opendev.org/c/zuul/zuul/+/955545 may still be failing? | 21:50 |
---|---|---|
@clarkb:matrix.org | it almost looks like both providers end up failing the nodescans and then we're out of quota and eventually the job times out. Maybe we're using two counters for the myadvance method so they can each go to two? | 22:22 |
@clarkb:matrix.org | also it almost looks like deleting nodes in the test suite isn't returning their quota back to the available quota | 22:23 |
@clarkb:matrix.org | corvus: we're using the NodescanRequest.node.uuid to index into the counter dict for the number of failures. Do we actually need it to use the uuid of the NodeRequest? | 22:28 |
@clarkb:matrix.org | ya something like `[ns_request.node.request_id]` for the indexing? | 22:30 |
@clarkb:matrix.org | that way we fail node0a, node1a, and node0b with the `>0` check against the same request | 22:32 |
@clarkb:matrix.org | ok posted some comments on the change after digging through the test case logs | 22:43 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 955545: Require multinode requests served from same provider https://review.opendev.org/c/zuul/zuul/+/955545 | 22:59 | |
@jim:acmegating.com | Clark: i think it's the quota; the other thing should be fine. i left a reply | 22:59 |
@clarkb:matrix.org | corvus: looking at https://35824449a46d30e681e8-d5237dd035b9f3c7532839ca52b23c88.ssl.cf5.rackcdn.com/zuul/2a997f3e85f244aaaad3a829bab588a9/testr_results.html it looks like each AWSProviderNode has a different uuid value | 23:01 |
@jim:acmegating.com | yes, so after we've failed for 3 unique nodes, we're done | 23:02 |
@jim:acmegating.com | (the advance method can get called more than once for a node) | 23:02 |
@clarkb:matrix.org | ohhh we take the len of the keys `failed_nodes = len(failed_count_by_node.keys())` | 23:02 |
@clarkb:matrix.org | In my head we were looking at the value in each key | 23:03 |
@clarkb:matrix.org | re the quota issue is decrementing the values unimplemented in the test framework or are we just not waiting long enough for the caches to determine that we have quota again after deletions? | 23:05 |
@jim:acmegating.com | likely the second | 23:05 |
@jim:acmegating.com | and it's not important for this test, so i lifted the quotas | 23:05 |
@clarkb:matrix.org | ya makes sense | 23:06 |
@clarkb:matrix.org | for the second test failure I would've expected quota failures to cause it to fail the node request though | 23:07 |
@clarkb:matrix.org | in the log it looked like it was still trying to boot nodes 30 seconds after starting which is why I thought maybe we need to reduce the total number of attempts | 23:07 |
@clarkb:matrix.org | I guess quota handling has it retrying maybe and it will fail earlier now that it won't kick back quota errors? | 23:08 |
@jim:acmegating.com | yes that's how i read it | 23:15 |
@clarkb:matrix.org | semi related the zuul gate is really interesting right now | 23:15 |
@clarkb:matrix.org | the tip is ps4 of that change then after that is ps3 for some reason? | 23:16 |
@clarkb:matrix.org | I don't think this is a big issue as zuul has correctly identified that ps3 is unmergeable (says it has a merge conflict) just unexpected in the rendering of the state of things | 23:16 |
@jim:acmegating.com | i think 619 depends on ps3 | 23:16 |
@clarkb:matrix.org | oh I see | 23:17 |
@jim:acmegating.com | since 619 was already approved, it probably raced the update | 23:17 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 955619: Launcher: skip non-pending uploads https://review.opendev.org/c/zuul/zuul/+/955619 | 23:17 | |
@clarkb:matrix.org | putting the new patchset of the child in the gate didn't evict the ps3 parent. But also I think it should resolve itself | 23:19 |
@jim:acmegating.com | yeah, ps3 is not in the main queue | 23:22 |
@clarkb:matrix.org | looking at https://review.opendev.org/c/zuul/zuul/+/955617 that doesn't seem necessary to address the underlying issue right? Is the idea there that end users could potentially supply that info via nodeset configuration? | 23:31 |
@jim:acmegating.com | oh no. definitely not. | 23:34 |
@jim:acmegating.com | it's just bringing zl up to par with nodepool | 23:35 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 955617: Launcher: implement preferred provider https://review.opendev.org/c/zuul/zuul/+/955617 | 23:35 | |
@jim:acmegating.com | that's a feature that exists; it's just not very important in opendev right now, so no one has noticed it's missing | 23:36 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-jobs] 954886: Update s3 minio tests https://review.opendev.org/c/zuul/zuul-jobs/+/954886 | 23:36 | |
@clarkb:matrix.org | oh I see it's specific to paused builds | 23:36 |
@clarkb:matrix.org | and ya opendev I think is "resilient" to that as opendev can fetch container images from jobs in other clouds and very few other users within opendev use pause jobs | 23:37 |
@jim:acmegating.com | yep | 23:37 |
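
Note on the nodescan failure counting discussed at 23:01-23:03: the test stops failing nodescans once three unique nodes have failed, which is judged by the number of keys in the counter dict, not by the value stored under any one key. The following is only a minimal sketch of that distinction, not the code from change 955545. The names `failed_count_by_node` and `failed_nodes` come from the conversation; `record_failure` and the node labels used as stand-in uuids are hypothetical.

```python
from collections import defaultdict


def record_failure(failed_count_by_node, node_uuid):
    # Hypothetical helper: count one more failed nodescan for this node.
    # The advance method can be called more than once for the same node,
    # so a single key may accumulate a value greater than one.
    failed_count_by_node[node_uuid] += 1


failed_count_by_node = defaultdict(int)

# Assumed sequence for illustration: node0a is advanced twice,
# node1a and node0b once each.
for node_uuid in ["node0a", "node0a", "node1a", "node0b"]:
    record_failure(failed_count_by_node, node_uuid)

# What the test checks: how many distinct nodes have failed.
failed_nodes = len(failed_count_by_node.keys())

# What it does not check: how often any single node failed.
max_calls_for_one_node = max(failed_count_by_node.values())

print(failed_nodes, max_calls_for_one_node)
```

Because the check uses the key count, repeated calls for one node do not move the test closer to the three-node limit; only a new uuid does.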