@y2kenny:matrix.org | Hi, is there a way to silence Zuul on branches where it should not do anything? For some reason Zuul keeps reporting "This change depends on a change that failed to merge", scoring Verified-1 and blocking a change on a branch that it was not configured for. (The branch is only configured with a submit job.) | 14:30 |
---|---|---|
@avass:vassast.org | Kenny Ho: you mean branches where zuul shouldn't run anything at all? :) | 14:32 |
@avass:vassast.org | If so I started on this to configure zuul to ignore certain branches completely: https://review.opendev.org/c/zuul/zuul/+/837559 | 14:32 |
@y2kenny:matrix.org | Albin Vass: that would also be useful in other instances (I have other situations where Zuul is commenting/scoring when there is no configuration.) For the current case, the branch is actually configured with some submit jobs and I don't know why Zuul is doing anything before submission. | 14:34 |
@y2kenny:matrix.org | the issue here is that I am getting complaints about Zuul getting in the way of other people's work and I can't explain the issue or stop it. | 14:35 |
@avass:vassast.org | Kenny Ho: There's this for silencing merge-failures: https://www.zuul-ci.org/docs/zuul/latest/config/pipeline.html#attr-pipeline.merge-conflict but I don't think it's possible to configure that for specific branches | 14:36 |
@y2kenny:matrix.org | Albin Vass: Oh that one should be useful for my other cases. But in this particular case it's kind of weird. I have only seen "This change depends on a change that failed to merge" on the gate pipeline (i.e. when Zuul tries to auto-submit things) | 14:38 |
@y2kenny:matrix.org | There is a gate pipeline configured for the repository but there is a branch filter associated with the jobs. So I have no clue why Zuul even tries anything there. | 14:39 |
-@gerrit:opendev.org- Joshua Watt proposed: [zuul/nodepool] 839226: Do not reset quota cache timestamp when invalid https://review.opendev.org/c/zuul/nodepool/+/839226 | 14:42 | |
@avass:vassast.org | Kenny Ho: I think zuul reports something like that from merge failures when it tries to load config from a branch | 14:42 |
@avass:vassast.org | but eh, maybe not "depends on a change that failed to merge" | 14:42 |
@y2kenny:matrix.org | I think that's a good hypothesis for my other issues as well. But along that line of thought, the project is already included in the tenant config with "include: []" | 14:44 |
@y2kenny:matrix.org | so I am not sure why Zuul would try to load any config there either | 14:45 |
@clarkb:matrix.org | Kenny Ho: you are trying to make the parent change be ignored by Zuul? but you get reports on the child change that zuul cannot merge the parent? | 14:45 |
@y2kenny:matrix.org | I am trying to make that branch ignored by Zuul entirely. Zuul shouldn't be doing anything with the parent commit or the child commit pre-submit. | 14:47 |
@y2kenny:matrix.org | but on the child commit it is reporting Verified-1 and "This change depends on a change that failed to merge". (The parent commit is a merge commit but I am not sure if that means anything.) | 14:48 |
@clarkb:matrix.org | you should remove the pipeline configs for the branch | 14:48 |
@clarkb:matrix.org | then zuul should ignore it | 14:48 |
@y2kenny:matrix.org | The only other time I have seen this is on branches with a gate pipeline, when Zuul tries to auto-submit a patch that is not properly rebased. | 14:49 |
@clarkb:matrix.org | include: [] applies at an entire repo level not per branch iirc. If you are wanting zuul to apply to some branches but not others you need to remove the configs for the specific branch | 14:49 |
@y2kenny:matrix.org | ok I should be more clear (but maybe this is where my confusion is.) The include is applied to the repo in question. The repo does not contain zuul configs. The zuul config for this repo is stored in a separate repo. (Does this nullify the include: []?) | 14:51 |
@clarkb:matrix.org | The include: [] talks about where to load configs from. Not what to apply them to | 14:52 |
@clarkb:matrix.org | It says "do not load any zuul.yaml configs from this repository" | 14:52 |
@clarkb:matrix.org | but if you define configuration for that repository in other repos it can be loaded from those repos instead | 14:52 |
@clarkb:matrix.org | you should remove that configuration if you do not want zuul to operate on the repo | 14:52 |
@y2kenny:matrix.org | um... I think I am confusing things by talking about multiple issues at once. I have repo A that is an open source project that has not adopted Zuul. It is included in the Zuul tenant with include: []. I have zuul config in repo B that defines jobs and triggers. In repo B, various pipelines and triggers are defined for some branches of repo A but for some reason, Zuul is acting on branches and events that are not defined by the zuul configs in repo B. | 15:06 |
@clarkb:matrix.org | Kenny Ho: Ok that helps. Zuul should only run jobs on the branches that you have specified to run the jobs on. However, in order to make that determination I believe it does a minimal amount of processing for all events on all branches. And that includes merge checking | 15:08 |
@clarkb:matrix.org | In this case it seems merging is failing for one reason or another and it is reporting that information? | 15:09 |
@y2kenny:matrix.org | this seems like a second-order effect of that. I have certainly seen Zuul attempting to merge things and reporting merge issues all over the place. But this seems to be happening to the child commit of the parent commit that can't merge. | 15:10 |
@y2kenny:matrix.org | I am not sure if it's the same kind of things or something unrelated | 15:11 |
@clarkb:matrix.org | I think it may be the same issue if the parent can't merge because merging a child implies also merging the parent | 15:11 |
@y2kenny:matrix.org | I have seen valid "This change depends on a change that failed to merge" message in other context, but it feels weird in this context | 15:11 |
@y2kenny:matrix.org | but it could be just a mis-routed exception pointing to a confusing error message | 15:11 |
@clarkb:matrix.org | But I'd need to look at logs to confirm that. You should be able to find the event logs for the child event and see what decisions were made | 15:11 |
@y2kenny:matrix.org | ok I will watch out for that | 15:12 |
@y2kenny:matrix.org | > <@clarkb:matrix.org> Kenny Ho: Ok that helps. Zuul should only run jobs on the branches that you have specified to run the jobs on. However, in order to make that determination I believe it does a minimal amount of processing for all events on all branches. And that includes merge checking | 17:26 |
Clark: when you said "minimal amount of processing for all events on all branches", what kind of processing is that? Can you point me to a particular section of the code base for me to read? | ||
@clarkb:matrix.org | Kenny Ho: I think https://opendev.org/zuul/zuul/src/branch/master/zuul/scheduler.py#L2261-L2287 is it. Notice the very end of that function is what checks if the matchers match but there is stuff done prior to that | 17:52 |
@y2kenny:matrix.org | On the issue of the scheduler getting overwhelmed, I have turned on the debug log but I am not sure there is much more info I can get out of it. Without debug, I see repeated logs of "Adding change <Change > to queue <ChangeQueue... > in pipeline" (lots and lots of them.) | 18:26 |
@ecsantos:matrix.org | Hello folks, just a quick question: how can I configure pipeline.success and pipeline.failure so that Zuul comments on changes but doesn't leave a label value (e.g. Verified: +1)? My team is configuring a third-party CI but we don't want to change the Verified label on changes | 18:26 |
@y2kenny:matrix.org | with debug log on, I see additional logs like "Checking for changes needed by <Change>" | 18:26 |
@y2kenny:matrix.org | Change < ...003> needs change <...002>: Needed change is already ahead in the queue | 18:27 |
@y2kenny:matrix.org | Other logs I noticed just now... "Running Tarjan's algorithm on current dependencies..." | 18:30 |
@y2kenny:matrix.org | all of these are on a commit series that will end up being noop | 18:30 |
@clarkb:matrix.org | ecsantos: the verified vote is distinct. Compare https://opendev.org/openstack/project-config/src/branch/master/zuul.d/pipelines.yaml#L45-L57 to https://opendev.org/openstack/project-config/src/branch/master/zuul.d/pipelines.yaml#L388-L391 Note that the success and failure messages may not be a thing anymore; the expectation is that you always get the normal message now? I could be wrong about that though | 18:32 |
@clarkb:matrix.org | Kenny Ho: you should get event ids on the debug logs. If you grep for the event id across the service logs you'll get a pretty good picture of what each individual event ends up running. From that you can work backwards to see how to shut off extra activity if possible | 18:33 |
@ecsantos:matrix.org | Clark: Interesting, gonna try the connection name with the empty dicts, that looks like it'd work | 18:34 |
@y2kenny:matrix.org | Clark: I am grepping for the event id but there's just so much data... almost feel like something recursive is running. i.e. let's say someone pushed a patchset of commit 1->2->3->4 (with commit 1 being the oldest)... I see repeated logs of "Checking for changes needed by change 1" and then for change 2, there's the same thing for change 2 but I will also have Change 1 needs Change 1 and "Needed change is already ahead in the queue" | 18:39 |
@y2kenny:matrix.org | (but instead of a patchset of 4 commits, it's a patchset of 30 commits) | 18:40 |
@clarkb:matrix.org | yes if you push a stack it will look that way as zuul handles those events as discrete items | 18:40 |
@clarkb:matrix.org | gerrit emits 4 events for patchset created when you push a stack. It isn't a single event | 18:40 |
@y2kenny:matrix.org | 4 events per pipeline? | 18:41 |
@clarkb:matrix.org | it's one event per change. And if you have multiple changes then multiple events. Then for each event each pipeline considers if it applies to itself | 18:42 |
@y2kenny:matrix.org | does that also multiply across scheduler if I have multiple scheduler? is the division of labour between scheduler before or after this? | 18:43 |
@y2kenny:matrix.org | between schedulers* | 18:44 |
@clarkb:matrix.org | the event can be handled by multiple schedulers. But only one scheduler will handle the event for a specific pipeline | 18:45 |
@clarkb:matrix.org | that means scheduler1 can process the event for pipeline check and scheduler2 can process the event for pipeline gate | 18:46 |
@y2kenny:matrix.org | um... ok... | 18:49 |
@y2kenny:matrix.org | what is the meaning of "locked pipeline"? That's another log item that I've noticed. | 18:50 |
@clarkb:matrix.org | that is what ensures multiple schedulers don't process a pipeline at the same time. So in my example scheduler 1 would lock the check pipeline and process it, preventing scheduler 2 from processing it, so scheduler 2 would look at the next pipeline (gate) and lock it then process it | 18:53 |
@y2kenny:matrix.org | ok so that's probably normal | 18:54 |
@y2kenny:matrix.org | does the scheduler contact gerrit directly or does it do so via the executor? | 18:55 |
@clarkb:matrix.org | the scheduler does it directly | 18:56 |
@y2kenny:matrix.org | ok, so if there's connection issue I should see it in the scheduler log | 18:56 |
@clarkb:matrix.org | yes | 18:58 |
@clarkb:matrix.org | For github the callbacks are sent to zuul web and web adds them to zookeeper then the scheduler processes them | 18:58 |
@clarkb:matrix.org | but for gerrit it listens to the event stream over ssh directly from the scheduler | 18:58 |
@y2kenny:matrix.org | Is there a way to inspect the "ChangeQueue" of a pipeline? I am wondering what is causing all the log message "Needed change is already ahead in the queue" | 19:25 |
@y2kenny:matrix.org | This is the kind of things that is filling up my log: | 19:31 |
https://paste.openstack.org/show/bVwB9QDF3tmYx9qUNjSA/ | ||
@y2kenny:matrix.org | I am guessing zuul is not getting what it needs from Gerrit in a timely fashion but I am wondering if there's something I can configure to get zuul to back off a bit | 19:32 |
@clarkb:matrix.org | > <@y2kenny:matrix.org> Is there a way to inspect the "ChangeQueue" of a pipeline? I am wondering what is causing all the log message "Needed change is already ahead in the queue" | 19:33 |
That is just zuul recording that it needs that change ahead and it is already ahead so it can continue on. Otherwise you'd get log messages about enqueuing the change ahead | ||
@clarkb:matrix.org | maybe noisy but nothing to be concerned about | 19:33 |
@y2kenny:matrix.org | oh ok. | 19:33 |
@y2kenny:matrix.org | I am out of ideas... this feels like zuul is basically ddos'ed by someone pushing a 30 commit patchset | 19:42 |
@y2kenny:matrix.org | I guess it's not entirely out of service... some of the new jobs are showing up about 30minutes later | 19:44 |
@y2kenny:matrix.org | but then the 30-commit patchset is still ahead of the rest of the jobs waiting to be resolved into noop | 19:45 |
@y2kenny:matrix.org | oh and now all of a sudden all of the backlog in one of the pipelines disappeared... | 19:48 |
@y2kenny:matrix.org | I am very confused.... | 19:48 |
@y2kenny:matrix.org | oh... here's another question. When the scheduler talks to Gerrit, is it a git operation or is it just an ssh or REST API query? | 19:56 |
@y2kenny:matrix.org | I get that streamevent is an ssh thing, but not too sure about the processing afterward | 20:00 |
@clarkb:matrix.org | it does all of the above | 20:01 |
@clarkb:matrix.org | depending on how you configure it. If you configure it with a rest token it will use that for some things that cannot be done over ssh. It will use ssh for event streams at least since those don't have an http analogue | 20:02 |
@clarkb:matrix.org | and finally the mergers will fetch from gerrit using git protocols over ssh (and maybe http if you configure it to use http instead) | 20:02 |
@y2kenny:matrix.org | Ah... I forgot about the merger | 20:03 |
@y2kenny:matrix.org | right | 20:03 |
-@gerrit:opendev.org- Joshua Watt proposed: [zuul/nodepool] 839226: Do not reset quota cache timestamp when invalid https://review.opendev.org/c/zuul/nodepool/+/839226 | 20:04 | |
@y2kenny:matrix.org | After 2 hrs, all the backlog seems to have resolved on its own but I don't think I am any closer to preventing this from happening again... my gut feel is that I have too many mergers and schedulers on an already heavily loaded Gerrit but I can't really tell for sure. If I want to configure Zuul to talk to a Gerrit replica and master at the same time, is that possible with Zuul? Do I just define multiple Gerrit connection? | 20:06 |
@y2kenny:matrix.org | connections* | 20:06 |
@y2kenny:matrix.org | I am not sure how different events are coordinated though. I think Jenkins can handle a replication-complete event but I don't recall coming across a similar thing in Zuul | 20:07 |
@clarkb:matrix.org | multiple gerrit connections represent multiple logical gerrit installations | 20:09 |
@clarkb:matrix.org | I think zuul may currently expect you to address that with load balancing that is transparent to zuul | 20:09 |
@clarkb:matrix.org | it is possible that expectation is flawed | 20:09 |
@y2kenny:matrix.org | ok.... | 20:09 |
@y2kenny:matrix.org | I think it's a reasonable expectation but unfortunately in my current context, the gerrit deployment is outside of my control | 20:10 |
@y2kenny:matrix.org | I also have a strong feeling that a lot of the issues I am seeing are due to the gerrit connection backing up but there's not much I can do about it | 20:11 |
@y2kenny:matrix.org | another question... is there any 'smartness' in merger usage? For example, let's say I have 5 mergers, one of the mergers just finished processing an event from the linux repo | 20:13 |
@y2kenny:matrix.org | is zuul smart enough to go back to that merger for next event from linux repo? | 20:14 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 20:16 | |
- [zuul/zuul] 804177: Add include- and exclude-branches tenant config options https://review.opendev.org/c/zuul/zuul/+/804177 | ||
- [zuul/zuul] 841336: Add always-dynamic-branches option https://review.opendev.org/c/zuul/zuul/+/841336 | ||
@y2kenny:matrix.org | alternatively, if there is a burst of 10 events, would the scheduler immediately try to spread the workload across all 5 mergers or would it try to process most of the events in the merger with the "warm cache"? | 20:17 |
@avass:vassast.org | corvus: nice :) | 20:18 |
@avass:vassast.org | corvus: small comment on that patch | 20:20 |
@jim:acmegating.com | Albin Vass: thanks | 20:21 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 20:22 | |
- [zuul/zuul] 804177: Add include- and exclude-branches tenant config options https://review.opendev.org/c/zuul/zuul/+/804177 | ||
- [zuul/zuul] 841336: Add always-dynamic-branches option https://review.opendev.org/c/zuul/zuul/+/841336 | ||
@y2kenny:matrix.org | Just to balance out my negativity a bit... here's a positive observation. I just had a patchset (single commit) that spawned 176 jobs and Zuul is handling it like a champ. (Jobs are completing as nodes are available, etc., etc.) | 22:02 |
@clarkb:matrix.org | wow I think our large job count is like 20 something | 23:40 |
@jim:acmegating.com | i've worked with folks with job counts in the hundreds. | 23:46 |
@jim:acmegating.com | * i've worked with folks with single-item job counts in the hundreds. | 23:47 |
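
Clark's description of pipeline locking (18:45 to 18:53) can be modeled with a small toy sketch. This is not Zuul's code: real schedulers are separate processes and cannot share an in-process lock, so the `threading.Lock` objects here only stand in for whatever lock Zuul actually uses. The function name `process_pipelines` and the list `handled` are made up for illustration; the pipeline names `check` and `gate` and the two schedulers come from Clark's example.

```python
import threading


def process_pipelines(scheduler, locks, handled):
    """Visit each pipeline once, skipping any that is already locked."""
    for name, lock in locks.items():
        # Non-blocking acquire: a pipeline held elsewhere is skipped, not waited on.
        if not lock.acquire(blocking=False):
            continue
        try:
            handled.append((scheduler, name))  # stand-in for real pipeline processing
        finally:
            lock.release()


locks = {"check": threading.Lock(), "gate": threading.Lock()}
handled = []

# Pretend scheduler1 is in the middle of processing "check" and holds its lock.
locks["check"].acquire()
process_pipelines("scheduler2", locks, handled)
locks["check"].release()

print(handled)  # [('scheduler2', 'gate')]
```

In this model scheduler2 finds `check` locked, moves on, and processes `gate`, which matches the "locked pipeline" log item being normal rather than a sign of trouble.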
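
The advice at 18:33, to grep for the event id across the service logs, could look roughly like this in Python. The sample log lines and the ids `abc123` and `def456` are invented for the example and are not real Zuul output; the only assumption about the log format is that the event id appears verbatim somewhere in each relevant line. The helper name `lines_for_event` is hypothetical.

```python
from collections import defaultdict


def lines_for_event(event_id, logs):
    """Group the log lines that mention event_id by service name.

    `logs` maps a service name to an iterable of log lines.
    """
    matches = defaultdict(list)
    for service, lines in logs.items():
        for line in lines:
            if event_id in line:
                matches[service].append(line.rstrip("\n"))
    return dict(matches)


# Hypothetical sample data; in practice each value would be an open log file.
sample_logs = {
    "scheduler": [
        "DEBUG [e: abc123] Checking for changes needed by <Change 1>\n",
        "DEBUG [e: def456] Adding change <Change 9> to queue\n",
    ],
    "merger": [
        "DEBUG [e: abc123] Merging change 1\n",
    ],
}

result = lines_for_event("abc123", sample_logs)
for service, lines in result.items():
    print(service, len(lines))
```

Grouping by service makes it easier to see which component spent time on a given event, which is the "work backwards" step Clark suggests.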