-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 804956: web: Jobs: Use TreeView for job overview page https://review.opendev.org/c/zuul/zuul/+/804956 | 00:08 | |
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 804956: web: Jobs: Use TreeView for job overview page https://review.opendev.org/c/zuul/zuul/+/804956 | 00:24 | |
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 804956: web: Jobs: Use TreeView for job overview page https://review.opendev.org/c/zuul/zuul/+/804956 | 00:41 | |
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul] 860280: Clarify extra vars are not passed with -e https://review.opendev.org/c/zuul/zuul/+/860280 | 02:42 | |
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 804956: web: Jobs: Use TreeView for job overview page https://review.opendev.org/c/zuul/zuul/+/804956 | 03:38 | |
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul-jobs] 859943: Allow overriding of Bazel installer checksum https://review.opendev.org/c/zuul/zuul-jobs/+/859943 | 05:44 | |
-@gerrit:opendev.org- Benedikt Löffler proposed: [zuul/nodepool] 860470: Cleanup local builds without .d folder https://review.opendev.org/c/zuul/nodepool/+/860470 | 14:59 | |
-@gerrit:opendev.org- Benedikt Löffler proposed: [zuul/nodepool] 860470: Cleanup local builds without .d folder https://review.opendev.org/c/zuul/nodepool/+/860470 | 15:11 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 828614: Correct exit routine in web, merger https://review.opendev.org/c/zuul/zuul/+/828614 | 16:31 | |
@jim:acmegating.com | tobiash: fungi ^ that needed a rebase | 16:31 |
@jim:acmegating.com | Clark: your link from monday that brought up a queueitem and a merge now returns 2 queueitems: https://tracing.opendev.org/search?end=1664816893236000&limit=20&lookback=1h&maxDuration&minDuration&operation=Build&service=zuul&start=1664813293236000&tags=%7B%22zuul_event_id%22%3A%22163bc9a5f8744d57aa4bde6ee693746c%22%7D | 17:52 |
@jim:acmegating.com | Clark: so i suspect we were seeing an incomplete buildset at the time | 17:53 |
@jim:acmegating.com | * Clark: so i suspect we were seeing an incomplete queue item at the time | 17:53 |
@clarkb:matrix.org | But doesn't a queueitem initiate the trace? | 17:59 |
@jim:acmegating.com | Clark: yes, but the way it works is that it determines the trace id, and then it and all of the child traces report their spans with that trace id. since spans are reported to the collector at the end of the span, it's normal behavior for the deepest spans to report first and the highest level ones to report later (mind you, in most cases contemplated by otlp, i have to imagine they were expecting milliseconds between them, not hours or days). so anyway, jaeger won't be able to show us the queueitem until it's dequeued, but it may have other completed steps. | 18:05 |
@jim:acmegating.com | so i think we've learned a visual cue here: if the highest spans we see aren't our expected root level spans, then we know that operations are still in progress (or, possibly, data corruption) | 18:06 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 860487: Include tenant and pipeline in QueueItem span https://review.opendev.org/c/zuul/zuul/+/860487 | 18:07 | |
@jim:acmegating.com | Clark: ^ and i think that may be useful based on your feedback | 18:07 |
@clarkb:matrix.org | I see we're operating a LIFO ish system | 18:09 |
@jim:acmegating.com | yep | 18:09 |
@jim:acmegating.com | (and that's not our choice, that's determined by the OTEL protocol) | 18:10 |
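The ordering described above (the trace id is fixed when the root starts, but each span is reported only when it ends) can be illustrated with a small plain-Python simulation. This is a sketch, not the OpenTelemetry SDK or Zuul's code: the `span` helper, the `exported` list standing in for a collector, and the span names other than `QueueItem` and `Build` are hypothetical.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a collector: (trace_id, span_name) in the
# order spans are exported. Spans are exported only when they end.
exported = []

@contextmanager
def span(name, trace_id):
    # Every span shares the trace id chosen for the root.
    try:
        yield
    finally:
        # Report on exit, so inner spans are reported before outer ones.
        exported.append((trace_id, name))

def run_queue_item(trace_id):
    with span("QueueItem", trace_id):      # root span
        with span("Merge", trace_id):      # hypothetical child
            pass
        with span("BuildSet", trace_id):   # hypothetical child
            with span("Build", trace_id):  # deepest span
                pass

run_queue_item("163bc9a5")
print([name for _, name in exported])
```

The root `QueueItem` is reported last. If the item were still enqueued, only the inner spans would be visible, which is the "highest span isn't the expected root" cue mentioned above.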
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 860488: Don't trace merge jobs that we don't lock https://review.opendev.org/c/zuul/zuul/+/860488 | 18:20 | |
@jim:acmegating.com | swest: Clark tristanC ^ i consider that a discussion-starter. i'm in favor of that for reasons i explained in the commit message, but i think it's a good thing to check on our goals. | 18:21 |
@clarkb:matrix.org | oh yup that was something I wondered about for half a second when looking at tracing.opendev.org but didn't dig into it. I agree that change makes sense | 18:23 |
@jim:acmegating.com | yeah, took me a minute to figure out what was going on | 18:24 |
@westphahl:matrix.org | corvus: makes sense. adding the span for jobs where we can't lock the request was a mistake on my end | 18:26 |
@jim:acmegating.com | swest: cool, i thought that might be the case (i missed that in review too!), but technically it does give us more information, so i just wanted to double check before we reduced it :) | 18:27 |
@clarkb:matrix.org | I think if we want that info we could maybe do another nested span | 18:27 |
@clarkb:matrix.org | the outer one for all merge jobs with or without a lock and the inner for when the lock is held. Then it will be more clear in the visualization | 18:27 |
@clarkb:matrix.org | But I don't think that info is terribly useful | 18:27 |
@jim:acmegating.com | agree on both | 18:27 |
@jim:acmegating.com | we can keep that in our back pocket if we change our minds | 18:28 |
@fungicide:matrix.org | i'm perplexed by the failure on https://zuul.opendev.org/t/zuul/build/9395a118af824786ae2e91236ea8f04f | 19:36 |
@fungicide:matrix.org | seems ansible tries 30 times to get the status page and finally gives up, but by then the web container log claims the service had been up for roughly 6 minutes | 19:37 |
@fungicide:matrix.org | it's getting a connection reset by the socket, like it's not listening at all, or blocked, or connecting to the wrong socket or address maybe | 19:38 |
@fungicide:matrix.org | seems to fail the same way consistently for that change | 19:38 |
@jim:acmegating.com | fungi: hrm, since it deals with init code, it may be the change at issue. i'll take a look in a bit. | 19:41 |
@fungicide:matrix.org | yeah, that was my suspicion too for the same reason, but it's unclear to me what didn't actually initialize. the logs seem to indicate the service started | 19:45 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 828614: Correct exit routine in web, merger https://review.opendev.org/c/zuul/zuul/+/828614 | 20:25 | |
@jim:acmegating.com | fungi: i think there was a logic error in there ^ | 20:25 |
@fungicide:matrix.org | i've been up longer than usual today, so not super confident in my assessment of the difference there, but continuing an infinite loop when something is unset vs breaking out of an infinite loop when that thing is set seems logically the same to me at the moment | 21:29 |
@jim:acmegating.com | yeah. the best defense i have is that change and its antecedents have had to deal with a lot of changes around it (and maybe if/when we get it right we should probably not let it sit on the vine too long) | 21:38 |
@fungicide:matrix.org | oh! nevermind, i see it now | 21:39 |
@fungicide:matrix.org | if we only continue the loop, we'll never escape (unless the other conditional with the return happens to match) | 21:39 |
@fungicide:matrix.org | the continue was at the end of th eloop block, and therefore entirely redundant. the loop would continue regardless | 21:40 |
@fungicide:matrix.org | * the continue was at the end of the loop block, and therefore entirely redundant. the loop would continue regardless | 21:40 |
@fungicide:matrix.org | so that conditional was doing approximately nothing | 21:40 |
@fungicide:matrix.org | i'll take that as a sign i've been staring at the screen too long | 21:41 |
@jim:acmegating.com | yep, i think it was backwards | 21:51 |
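The logic error discussed above can be reduced to a toy example. A `continue` that is the last statement of a loop body has no effect, so "continue while the flag is unset" never actually exits once the flag is set; "break when the flag is set" does. This sketch is hypothetical and not the code from change 828614: it uses a `threading.Event` as the flag and caps the iterations only so the demo terminates.

```python
import threading

def run_backwards(stop_event, max_iterations):
    iterations = 0
    while iterations < max_iterations:  # cap stands in for `while True`
        iterations += 1
        if not stop_event.is_set():
            continue  # last statement in the body: redundant either way
    return iterations

def run_fixed(stop_event, max_iterations):
    iterations = 0
    while iterations < max_iterations:  # cap stands in for `while True`
        iterations += 1
        if stop_event.is_set():
            break  # actually leaves the loop once shutdown is requested
    return iterations

stop = threading.Event()
stop.set()
print(run_backwards(stop, 10), run_fixed(stop, 10))
```

With the event already set, the backwards version still runs all 10 iterations while the fixed version stops after the first.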
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 860506: Include skipped builds in database and web ui https://review.opendev.org/c/zuul/zuul/+/860506 | 22:56 | |