opendevreview | Ian Wienand proposed zuul/zuul master: web: JobVariant: use @patternfly/react-table https://review.opendev.org/c/zuul/zuul/+/804478 | 05:53 |
---|---|---|
opendevreview | Ian Wienand proposed zuul/zuul master: web: JobVariant: visual cleanups https://review.opendev.org/c/zuul/zuul/+/804479 | 05:53 |
opendevreview | Ian Wienand proposed zuul/zuul master: web: Nodeset: convert to a treeview https://review.opendev.org/c/zuul/zuul/+/804480 | 05:53 |
opendevreview | Ian Wienand proposed zuul/zuul master: web: Add release note for Job/JobVariant changes https://review.opendev.org/c/zuul/zuul/+/804481 | 05:53 |
opendevreview | Ian Wienand proposed zuul/zuul master: web: Use PF4 sizing/spacing css https://review.opendev.org/c/zuul/zuul/+/804599 | 05:53 |
opendevreview | Ian Wienand proposed zuul/zuul master: web: JobVariant: add icons https://review.opendev.org/c/zuul/zuul/+/804600 | 05:53 |
opendevreview | Ian Wienand proposed zuul/zuul master: web: JobVariant : convert to DescriptionList https://review.opendev.org/c/zuul/zuul/+/804601 | 05:53 |
opendevreview | Ian Wienand proposed zuul/zuul master: web: JobVariant : use wide layout for DescriptionList https://review.opendev.org/c/zuul/zuul/+/804602 | 05:53 |
*** jpena|off is now known as jpena | 07:42 | |
*** sshnaidm|pto is now known as sshnaidm | 10:30 | |
*** sshnaidm is now known as sshnaidm|pto | 10:31 | |
*** jpena is now known as jpena|lunch | 11:16 | |
*** dviroel|out is now known as dviroel|ruck | 11:26 | |
*** jpena|lunch is now known as jpena | 12:16 | |
corvus | ianw, clarkb: please see my comment on https://review.opendev.org/804481 | 14:03 |
opendevreview | Benjamin Schanzel proposed zuul/zuul master: Add tenant name on NodeRequests for Nodepool https://review.opendev.org/c/zuul/zuul/+/788680 | 14:35 |
corvus | zuul-maint: https://review.opendev.org/804532 fixes a regression; we might want to merge and release that | 14:57 |
fungi | thanks, that came up | 15:12 |
fungi | last week | 15:12 |
fungi | and i wasn't initially sure if it was intended and we'd forgotten to update the docs, or was inadvertent collateral damage from the url deprecation | 15:13 |
Clark[m] | I guess the build page will render an arbitrary result and message set by a job or is that going away at some point too? | 15:15 |
corvus | Clark: yes it should; but it won't know what color to use | 15:16 |
corvus | so honestly, i think it would be better to remove it, but intentionally -- this removal was an accident | 15:16 |
pabelanger[m] | good morning, would love to get some eyes / reviews on https://review.opendev.org/c/zuul/zuul/+/804305 to fix an issue with semaphores and pipeline precedence | 15:35 |
clarkb | pabelanger[m]: that was on my list for this morning. I'm having a bit of a slow start as my office is converted back to an office from a guest room | 15:36 |
pabelanger[m] | thanks! | 15:36 |
pabelanger[m] | I'm going to try and get to unpause jobs this week | 15:37 |
clarkb | I suspect that the forced wait for the hourly deploy pipeline in opendev may be "fixed" by pabelanger[m]'s semaphore change too | 15:43 |
clarkb | though I haven't double checked the priority on those queues yet | 15:43 |
clarkb | corvus: ^ including pabelanger's change in any restart (and release) for 804532 may be a good idea? | 15:53 |
clarkb | it's a small regression but I think the zk queues changed that behavior | 15:53 |
*** jpena is now known as jpena|off | 15:58 | |
*** marios is now known as marios|out | 16:01 | |
opendevreview | Merged zuul/zuul master: Restore job success/failure message https://review.opendev.org/c/zuul/zuul/+/804532 | 16:10 |
clarkb | corvus: re https://review.opendev.org/c/zuul/zuul/+/804601 I agree about the spacing there. Do you feel comfortable enough with the prior changes to land the stack up to that point? I don't feel so strongly about the lack of the explicit separation to avoid landing it if so | 16:14 |
clarkb | I think overall they are good improvements | 16:14 |
opendevreview | Paul Belanger proposed zuul/zuul master: Set default precedence for semaphore testing https://review.opendev.org/c/zuul/zuul/+/804785 | 16:54 |
*** timburke__ is now known as timburke | 17:16 | |
corvus | clarkb, pabelanger: it semaphore precedence sounds like a good change, but i don't think it's been asserted that it's a zk-related regression? | 17:24 |
corvus | s/it// | 17:24 |
clarkb | corvus: I think it is because before gearman would've ensured that ordering on the executor side | 17:24 |
clarkb | corvus: because the executors would process the jobs in that order regardless of how zuul put them on the gearman queue? (that was likely racy though) | 17:25 |
corvus | clarkb: no, semaphore acquisition happens before enqueuing | 17:25 |
clarkb | oh | 17:25 |
corvus | clarkb: zk executor queue was a like-for-like replacement, that was the design | 17:25 |
clarkb | I'm confused how the opendev deploy pipelines have changed though | 17:25 |
corvus | pabelanger's test case fails when applied to zuul 4.0.0 | 17:25 |
clarkb | maybe it was just racy and the zk io is different enough to change the race order in most situations for opendev | 17:26 |
corvus | i didn't realize something had changed in opendev | 17:27 |
fungi | i'm not convinced anything has | 17:27 |
clarkb | it definitely changed | 17:27 |
fungi | what behavior were we seeing "before"? | 17:28 |
clarkb | the deploy pipeline would get the semaphore from the hourly pipeline | 17:28 |
clarkb | now the deploy pipeline must always wait for the hourly pipeline to complete | 17:28 |
clarkb | I'm not sure that one behavior is necessarily more correct than the other but the behavior changed for sure | 17:28 |
clarkb | this is why it takes so much extra time for landed changes to apply because they are always waiting for the entirety of the hourly pipeline to complete | 17:28 |
fungi | i guess my memory is fuzzy, i don't recall that being consistent, nor am i sure i've watched it enough lately to feel it's consistent now | 17:29 |
corvus | that does sound like a race | 17:29 |
clarkb | then if you have multiple changes in the deploy pipeline what happens now is the first one waits for the hourly to finish and then starts. This often takes long enough that a new hourly is enqueued and it takes the semaphore back again before the next change in the deploy pipeline, further delaying those changes from applying | 17:29 |
clarkb | previously the deploy pipeline would only wait for the running hourly job to finish then it would start, then the next deploy change would run, then the hourly pipeline would start where it left off | 17:30 |
fungi | but yeah, seems like maybe we've traded one undefined behavior for another, either way being explicit with priorities there would be nice | 17:30 |
clarkb | this made deploying specific changes much more responsive | 17:30 |
corvus | clarkb: from a design pov, the opendev pipeline configuration is still flawed -- ideally we would not want deploy to interrupt hourly. the real issue is that hourly is slow :( | 17:31 |
corvus | it just "usually works and is okay" | 17:31 |
fungi | i do wonder now if we've designed the hourly jobs not to roll back the promoted deploy jobs | 17:32 |
clarkb | yup hourly being slow is definitely part of the problem and it seems that a big part of that is ansible is slow :/ | 17:32 |
corvus | anyway, precedence makes more sense than definition order for this, so i have no objection to the change :) | 17:32 |
clarkb | fungi: pretty sure we haven't because they actually run latest system-config | 17:33 |
clarkb | fungi: so what actually happens is that the hourly jobs are running the updates for the changes that are landing. Then an hour later the change jobs are running and nooping | 17:33 |
corvus | clarkb: if there's a behavior change, it may be related to event results in zk -- that could have caused a race before, but now is less likely to race. | 17:33 |
fungi | running latest at time of build start would be fine. running latest at enqueue time would be my concern | 17:33 |
corvus | (i just wouldn't characterize it as a regression, that's all) | 17:33 |
clarkb | fungi: oh good point, I'm not sure which is used | 17:34 |
clarkb | corvus: gotcha | 17:34 |
corvus | fungi: we now will run latest at enqueue time | 17:34 |
corvus | and that is a change | 17:34 |
corvus | (that's a new behavior introduced by the global repo state -- the idea is every build for a queue item should get the same state, and that state is frozen at enqueue time) | 17:34 |
corvus | another entirely sensible change that unfortunately doesn't do quite what we want in order to work around the opendev-specific deployment issues :/ | 17:35 |
fungi | yeah, so periodic item is enqueued and starts a build or several, partway through the buildset a promote job is enqueued with a newer state and grabs the semaphore, deploying newer system-config on systems, finishes its buildset and then the periodic resumes its builds rolling out an older system-config state (from its original enqueue time) rolling back some deployed software/configs | 17:35 |
corvus | yep. to be correct we should really hold a semaphore for the entire buildset -- which we could do with a parent job. that would get us back to the behavior which we have now (before merging pabelanger's change) and which rightly annoys us. | 17:37 |
fungi | if we're pushing the state to be used onto the deployment bastion, then we'll end up in that situation. if the periodics are just making sure the worktree is updated at build start then that would solve it | 17:37 |
corvus | we now have a choice: correct vs expedient. (unfortunately, "slow" still applies to both) | 17:37 |
corvus | fungi: good point, i don't recall which we do | 17:38 |
clarkb | the base job does a workspace sync from the executor to bridge | 17:39 |
clarkb | but we could override that after the fact | 17:39 |
fungi | it's a convenient generalization in this particular case to assert that *all* deploy jobs (whether timer or change-merged triggered) should use the latest branch state at build start | 17:39 |
corvus | clarkb: regarding 804601 -- i agree that 804600 is an improvement over current state and am okay with merging up to that point. | 17:40 |
clarkb | infra_prod_run_from_master: "{{ zuul.pipeline|default('') in ['periodic', 'opendev-prod-hourly'] }}" | 17:40 |
fungi | but that's down to nuances of how opendev is handling its deployment model really | 17:40 |
clarkb | I think we use ^ in system-config to work around this problem | 17:40 |
clarkb | fungi: I think that this will end up being a common problem for anyone using zuul for CD though and is worth considering from zuul's perspective how one might address it | 17:41 |
fungi | that also means it's safe to reenqueue a deploy queue item, i guess (to answer a question i had sometime recently) | 17:41 |
clarkb | fungi: yes opendev is running into it but it is a generic CD problem when you are trying to apply many changes concurrently | 17:41 |
clarkb | fungi: no because deploy isn't in 'periodic' or 'opendev-prod-hourly' | 17:41 |
clarkb | If we added deploy to that list and basically just said always run from master then we would be fine. But now you wouldn't have changes that maybe did cleanups like remove a cron then remove the config management for it applying properly | 17:42 |
fungi | oh, got it. so we should add deploy to that list maybe? | 17:42 |
fungi | that would also mean it would be safe to make deploy use the supercedent manager | 17:43 |
corvus | clarkb: i think you missed https://review.opendev.org/804478 | 17:43 |
clarkb | fungi: yup see the tradeoff I mentioned though | 17:43 |
clarkb | fungi: basically if you want the system to apply things in order you can't rely on that. But maybe losing that functionality is the least bad option | 17:43 |
clarkb | corvus: oh ya I was going to +2 it if you were happy to land the stack up to the child point. I'll +2 now | 17:44 |
fungi | oh, actually no we can't use supercedent in deploy if we also want to use file filters | 17:44 |
clarkb | corvus: I think you can +A up to that point where spacing gets weird | 17:44 |
clarkb | thinking out loud: What if we could have one pipeline supercede another? And basically make the hourly deploy eventually consistent? It would finish its current job(s) but then get evicted if a change showed up in deploy. Then deploy would run. This risks starvation but maybe is the least confusing setup | 17:46 |
corvus | clarkb, fungi: i think the fundamental issue is that we have an hourly pipeline that takes too long | 17:46 |
corvus | it should run daily and/or be fast | 17:47 |
clarkb | corvus: yes, but I'm not sure how we fix that without replacing ansible. (related note the job that runs puppet against everything which is admittedly a shrinking list runs very quickly) | 17:47 |
corvus | this is getting off-topic for #zuul | 17:47 |
clarkb | I do think that zuul providing tools to make this less painful is a good idea. I doubt opendev is unique here. However, I don't really know what zuul can do that would be effective yet | 17:48 |
*** dviroel|ruck is now known as dviroel|out | 17:49 | |
corvus | clarkb: sure, i mean, i think zuul provides those tools. how opendev uses them is more of an #opendev thing. :) | 17:49 |
corvus | clarkb: pipelines can supercede each other; gate does check. but identifying that a queue item in a deploy pipeline should supercede one in periodic would not be trivial -- they have very different identifiers. | 17:50 |
pabelanger[m] | clarkb: for your hourly job, using mitogen will likely help a ton. I had to use that for zuul.a.c deploy job. I cannot remember how fast it improved things however | 18:53 |
clarkb | pabelanger[m]: the problem is in forking the processes which mitogen doesn't help with aiui | 18:55 |
opendevreview | Merged zuul/zuul master: web: yarn update to latest @patternfly/react-core https://review.opendev.org/c/zuul/zuul/+/804473 | 18:59 |
opendevreview | Merged zuul/zuul master: web: use spread operator in JOB_FETCH_SUCCESS reducer https://review.opendev.org/c/zuul/zuul/+/804474 | 18:59 |
opendevreview | Merged zuul/zuul master: web: Job: remove height setting for job page https://review.opendev.org/c/zuul/zuul/+/804475 | 18:59 |
opendevreview | Merged zuul/zuul master: web: Job: use PF4 tabs https://review.opendev.org/c/zuul/zuul/+/804476 | 19:03 |
opendevreview | Merged zuul/zuul master: web: JobVariant: update to @patternfly/react-icons https://review.opendev.org/c/zuul/zuul/+/804477 | 19:04 |
opendevreview | Merged zuul/zuul master: web: JobVariant: use @patternfly/react-table https://review.opendev.org/c/zuul/zuul/+/804478 | 19:04 |
opendevreview | Merged zuul/zuul master: web: JobVariant: visual cleanups https://review.opendev.org/c/zuul/zuul/+/804479 | 19:04 |
corvus | pabelanger, clarkb, tobiash: i think there's another issue to check for on https://review.opendev.org/804305 | 19:10 |
opendevreview | Merged zuul/zuul master: web: Nodeset: convert to a treeview https://review.opendev.org/c/zuul/zuul/+/804480 | 19:11 |
opendevreview | Merged zuul/zuul master: web: Use PF4 sizing/spacing css https://review.opendev.org/c/zuul/zuul/+/804599 | 19:11 |
opendevreview | Merged zuul/zuul master: web: JobVariant: add icons https://review.opendev.org/c/zuul/zuul/+/804600 | 19:11 |
clarkb | corvus: that is an interesting one because it was only really handled by convention previously, but definitely something that we want to happen | 19:49 |
corvus | clarkb: yep -- the same is true for this change, which is why i think we need to make sure it's handled | 19:50 |
corvus | (to be clear, i don't know if it's an issue; i think it needs to be verified) | 19:53 |
TylerPearce[m] | Hi! I'm working on a branch called "zuul-test", but on the zuul web UI it's showing the branch as "master". Am I misunderstanding something here, or should the branch in Zuul's UI be "zuul-test"? | 20:35 |
clarkb | TylerPearce[m]: in which context is it showing the branch as master? I think we need a bit more info to properly understand | 20:36 |
fungi | TylerPearce[m]: what code review system is the change uploaded into, and what branch is it targeting? that's what zuul will show, it has nothing to do with (and won't even know about) your local topic branches | 20:36 |
* TylerPearce[m] uploaded an image: (208KiB) < https://matrix.org/_matrix/media/r0/download/matrix.org/XjycKSmDypwVgjhElJzWBjeT/image.png > | 20:39 | |
TylerPearce[m] | I'm using GitHub. I've configured Zuul to run a "check" pipeline when a PR is created, which seems to work as intended | 20:39 |
TylerPearce[m] | Then in Zuul's web UI it's showing the branch as "master" | 20:39 |
* TylerPearce[m] uploaded an image: (57KiB) < https://matrix.org/_matrix/media/r0/download/matrix.org/bZWJCxqyQyZRzOFuVBCzedFn/image.png > | 20:39 | |
TylerPearce[m] | Is this because the merge target is "master"? | 20:39 |
fungi | TylerPearce[m]: yes, zuul tests "the future" where your change has merged to its intended target branch | 20:40 |
TylerPearce[m] | Ahh okay, thank you for explaining! That makes sense | 20:40 |
fungi | this becomes much more apparent when gating, and you have multiple changes trying to merge at the same time, it needs clear test states incorporating those changes together | 20:41 |
fungi | so if it tested them all individually on separate branches then it could end up trying to merge different changes which break one another functionally | 20:41 |
TylerPearce[m] | Ah I see | 20:41 |
TylerPearce[m] | One more question if you don't mind: When inspecting a job on Zuul's web UI there's a tab called "Console". Where does this console read from (or rather, how can I write data to this console to be viewed in the UI?) | 20:44 |
fungi | the "console" view is a hierarchical rendering of the json returned by ansible | 20:44 |
fungi | each playbook as the top-level of the hierarchy, and then the individual tasks run within the scope of that playbook form the second level | 20:45 |
fungi | then within each task you get the stdout, stderr, exit code, and so on | 20:45 |
TylerPearce[m] | Ah okay, so from the quickstart guide I might expect to see the debug message "Hello world" from `testjob.yaml`, is that correct? | 20:46 |
fungi | it's basically more fine-grained and detailed than the "console log" which is essentially a timestamped and combined stream of all the stdout/stderr | 20:46 |
* TylerPearce[m] < https://matrix.org/_matrix/media/r0/download/matrix.org/HoNztJBmfaJUbmhhpaFMToRk/message.txt > | 20:47 | |
fungi | sounds right, i'd need to look back at what the tasks are in the demo jobs for the quickstart, but you should be able to find the corresponding task in the console breakdown, yes | 20:47 |
TylerPearce[m] | Hmm okay. I must have borked something on my end because none of my tasks have any console output, though Zuul seems to be running them | 20:49 |
fungi | as one of the final tasks zuul should publish the json for that to the logserver. maybe that's not happening, or the dashboard is having trouble reading the file from the logserver container | 20:51 |
TylerPearce[m] | Ah yep, I'm not using `playbooks/base/post-logs.yaml`. I imagine that's what should publish the logs | 20:52 |
fungi | normally the console would look something like this: https://zuul.opendev.org/t/zuul/build/514def4b7daa4650959030a8b28c9df5/console | 20:52 |
fungi | that fetches and renders the job-output.json file for the build from the log location for it | 20:53 |
fungi | (which you can also find conveniently linked in the logs section) | 20:53 |
fungi | so if there's no job-output.json, that would explain the empty console | 20:54 |
TylerPearce[m] | Ah nice, that looks much better than my blank view 😂 I didn't set up my configuration to utilize the zuul jobs repository, I'm guessing that's what I need to get `post-logs.yaml` to work | 20:54 |
fungi | and yes, post-logs.yaml is what should, during the post-run phase, copy those files to the logserver | 20:54 |
fungi | there are some standard roles for log collection and publication in zuul-jobs, which i expect post-logs.yaml relies on, yes | 20:55 |
TylerPearce[m] | Thank you very much! I'll add zuul-jobs into my config and give that a try | 20:56 |
fungi | you're welcome, sorry i couldn't be more help but i'm not at my desk at the moment | 20:57 |
TylerPearce[m] | That was all the help I need for now, I appreciate it :) | 20:57 |
TylerPearce[m] | Okay so I added zuul-jobs to my config and the job succeeded, but I'm getting an error when viewing the job | 21:11 |
TylerPearce[m] | ```Network Error (Unable to fetch URL, check your network connectivity, browser plugins, ad-blockers, or try to refresh this page) http://localhost:8000/6/6/c7667393ff325324b46027979fd28b879a529029/check/testjob/21a9ee0/zuul-manifest.json``` | 21:11 |
TylerPearce[m] | I think this is because I'm not viewing this on localhost, but instead it's on an AWS EC2 container. Where can I overwrite `localhost:8000` to point to my EC2 instance URL? | 21:11 |
clarkb | TylerPearce[m]: that is part of the base job iirc | 21:18 |
clarkb | I think there is a zuul return to set the log url | 21:18 |
TylerPearce[m] | Ah yep, I'm blind 🕶️ Thank you! | 21:30 |
TylerPearce[m] | Tweaked my security group settings and now it's all working. Thanks again to everyone who helped me :) | 21:51 |
fungi | TylerPearce[m]: thanks for trying it out! | 21:57 |
TylerPearce[m] | Does Zuul have a job variable for the current branch? (Not the target branch like https://zuul-ci.org/docs/zuul/reference/jobs.html#var-zuul.branch) | 23:26 |
clarkb | TylerPearce[m]: in zuul's world they are the same thing | 23:28 |
clarkb | TylerPearce[m]: when zuul prepares the workspace it will checkout the target branch with the changes applied to it. If you need to see your delta to the target branch in the workspace you can look at origin/target-branch as that is preserved with the original state | 23:29 |
TylerPearce[m] | I see. Is there any way to reference the branch where the changes came from? | 23:29 |
clarkb | If you need the actual branch name the PR was requested under I'm not sure. I don't use github with zuul much. You can check the inventory file that zuul records as that should have all the vars | 23:29 |
TylerPearce[m] | Got it, thank you! | 23:30 |
clarkb | it might record it as part of zuul.change or zuul.change_url? | 23:30 |
clarkb | I suspect that will just give you PR numbers though | 23:30 |
TylerPearce[m] | change is the pr number, I'll check change_url | 23:30 |
clarkb | TylerPearce[m]: I suspect that https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L615-L635 and the event model would need to be updated to record that info | 23:34 |
clarkb | as that seems to be where we make a recording of the relevant bits and the source branch isn't in there | 23:35 |
clarkb | another option may be to look it up with the change url | 23:35 |
clarkb | in the job I mean rather than having zuul record it | 23:35 |
TylerPearce[m] | Awesome info, thank you! I probably don't specifically need the branch name, I think I can accomplish the task via the PR number and the dev looking at the logs can infer the branch name or click the link :) | 23:36 |
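
The "look it up with the change url" option mentioned above could be sketched roughly as follows. This is a minimal illustration, not something Zuul provides: it assumes `zuul.change_url` holds a github.com pull request web URL of the form `https://github.com/<owner>/<repo>/pull/<number>`, and it reads the source branch from the `head.ref` field that GitHub's REST API returns for a pull request. The function names (`api_url_for`, `source_branch`, `fetch_source_branch`) are hypothetical, and the unauthenticated request only works for public repositories on github.com; a private repository or a GitHub Enterprise host would need a token and a different API base URL.

```python
import json
import re
import urllib.request

# Assumed shape of zuul.change_url for a GitHub PR:
#   https://github.com/<owner>/<repo>/pull/<number>
PR_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)/?$")


def api_url_for(change_url: str) -> str:
    """Translate a PR web URL into the GitHub REST API URL for that PR."""
    match = PR_URL.match(change_url)
    if match is None:
        raise ValueError(f"not a github.com pull request URL: {change_url}")
    owner, repo, number = match.groups()
    return f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}"


def source_branch(pr_payload: dict) -> str:
    """Return the branch the PR was opened from (head.ref in the API response)."""
    return pr_payload["head"]["ref"]


def fetch_source_branch(change_url: str) -> str:
    """Fetch the PR from the API and return its source branch (needs network access)."""
    request = urllib.request.Request(
        api_url_for(change_url),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return source_branch(json.load(response))
```

In a job, a task could call something like `fetch_source_branch` with the value of `zuul.change_url`; the target branch that Zuul already reports as `zuul.branch` corresponds to `base.ref` in the same API response.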