Monday, 2021-08-16

opendevreview	Ian Wienand proposed zuul/zuul master: web: JobVariant: use @patternfly/react-table https://review.opendev.org/c/zuul/zuul/+/804478	05:53
opendevreview	Ian Wienand proposed zuul/zuul master: web: JobVariant: visual cleanups https://review.opendev.org/c/zuul/zuul/+/804479	05:53
opendevreview	Ian Wienand proposed zuul/zuul master: web: Nodeset: convert to a treeview https://review.opendev.org/c/zuul/zuul/+/804480	05:53
opendevreview	Ian Wienand proposed zuul/zuul master: web: Add release note for Job/JobVariant changes https://review.opendev.org/c/zuul/zuul/+/804481	05:53
opendevreview	Ian Wienand proposed zuul/zuul master: web: Use PF4 sizing/spacing css https://review.opendev.org/c/zuul/zuul/+/804599	05:53
opendevreview	Ian Wienand proposed zuul/zuul master: web: JobVariant: add icons https://review.opendev.org/c/zuul/zuul/+/804600	05:53
opendevreview	Ian Wienand proposed zuul/zuul master: web: JobVariant : convert to DescriptionList https://review.opendev.org/c/zuul/zuul/+/804601	05:53
opendevreview	Ian Wienand proposed zuul/zuul master: web: JobVariant : use wide layout for DescriptionList https://review.opendev.org/c/zuul/zuul/+/804602	05:53
*** jpena\|off is now known as jpena		07:42
*** sshnaidm\|pto is now known as sshnaidm		10:30
*** sshnaidm is now known as sshnaidm\|pto		10:31
*** jpena is now known as jpena\|lunch		11:16
*** dviroel\|out is now known as dviroel\|ruck		11:26
*** jpena\|lunch is now known as jpena		12:16
corvus	ianw, clarkb: please see my comment on https://review.opendev.org/804481	14:03
opendevreview	Benjamin Schanzel proposed zuul/zuul master: Add tenant name on NodeRequests for Nodepool https://review.opendev.org/c/zuul/zuul/+/788680	14:35
corvus	zuul-maint: https://review.opendev.org/804532 fixes a regression; we might want to merge and release that	14:57
fungi	thanks, that came up	15:12
fungi	last week	15:12
fungi	and i wasn't initially sure if it was intended and we'd forgotten to update the docs, or was inadvertent collateral damage from the url deprecation	15:13
Clark[m]	I guess the build page will render an arbitrary result and message set by a job or is that going away at some point too?	15:15
corvus	Clark: yes it should; but it won't know what color to use	15:16
corvus	so honestly, i think it would be better to remove it, but intentionally -- this removal was an accident	15:16
pabelanger[m]	good morning, would love to get some eyes / reviews on https://review.opendev.org/c/zuul/zuul/+/804305 to fix an issue with semaphores and pipeline precedence	15:35
clarkb	pabelanger[m]: that was on my list for this morning. I'm having a bit of a slow start as my office is converted back to an office from a guest room	15:36
pabelanger[m]	thanks!	15:36
pabelanger[m]	I'm going to try and get to unpause jobs this week	15:37
clarkb	I suspect that the forced wait for the hourly deploy pipeline in opendev may be "fixed" by pabelanger[m]'s semaphore change too	15:43
clarkb	though I haven't double checked the priority on those queues yet	15:43
clarkb	corvus: ^ including pabelanger's change in any restart (and release) for 804532 may be a good idea?	15:53
clarkb	its a small regression but I think the zk queues changed that behavior	15:53
*** jpena is now known as jpena\|off		15:58
*** marios is now known as marios\|out		16:01
opendevreview	Merged zuul/zuul master: Restore job success/failure message https://review.opendev.org/c/zuul/zuul/+/804532	16:10
clarkb	corvus: re https://review.opendev.org/c/zuul/zuul/+/804601 I agree about the spacing there. Do you feel comfortable enough with the prior changes to land the stack up to taht point? I don't feel so strongly about the lack of the explicit separation to avoid landing it if so	16:14
clarkb	I think overall they are good improvements	16:14
opendevreview	Paul Belanger proposed zuul/zuul master: Set default precedence for semaphore testing https://review.opendev.org/c/zuul/zuul/+/804785	16:54
*** timburke__ is now known as timburke		17:16
corvus	clarkb, pabelanger: it semaphore precedence sounds like a good change, but i don't think it's been asserted that it's a zk-related regression?	17:24
corvus	s/it//	17:24
clarkb	corvus: I think it is because before gearman would've ensured that ordering on the executor side	17:24
clarkb	corvus: because the exuecotrs would process the jobs in that order regardless of how zuul put them on the gearman queue? (that was likely racey though)	17:25
corvus	clarkb: no, semaphore acquisition happens before enqueing	17:25
clarkb	oh	17:25
corvus	clarkb: zk executor queue was a like-for-like replacement, that was the design	17:25
clarkb	I'm confused how the opendev deploy pipelines have chagned though	17:25
corvus	pabelanger's test case fails when applied to zuul 4.0.0	17:25
clarkb	maybe it was just racey and the zk io is different enough to chagne the race order in most situation for opendev	17:26
corvus	i didn't realize something had changed in opendev	17:27
fungi	i'm not convinced anything has	17:27
clarkb	it definitely changed	17:27
fungi	what behavior were we seeing "before"?	17:28
clarkb	the deploy pipeline would get the semaphore from the hourly pipeline	17:28
clarkb	now the deploy pipeline must always wait for the hourly pipeline to complete	17:28
clarkb	I'm not sure that one behavior is necessarily more correct than the other but the behavior changed for sure	17:28
clarkb	this is why it takes so much extra time for landed chagnes to apply beacuse they are always waiting for the entirety of the houlry pipeline to complete	17:28
fungi	i guess my memory is fuzzy, i don't recall that being consistent, nor am i sure i've watched it enough lately to feel it's consistent now	17:29
corvus	that does sound like a race	17:29
clarkb	then if you have multiple changes in the deploy pipeline what happens now is the first one waits for the hourly to finish and then starts. This often takes long enough that a new hourly is enqueued and it takes the semaphore back again before the next chagne in the deploy pipeline further delaying those changes from applying	17:29
clarkb	previously the deploy pipeline would only wait for the running hourly job to finish then it would start, then the next deploy change would run, then the hourly pipeline would start where it left off	17:30
fungi	but yeah, seems like maybe we've traded one undefined behavior for another, either way being explicit with priorities there would be nice	17:30
clarkb	this made deploying specific cahgnes much more responsive	17:30
corvus	clarkb: from a design pov, the opendev pipeline configuration is still flawed -- ideally we would not want deploy to interrupt hourly. the real issue is that hourly is slow :(	17:31
corvus	it just "usually works and is okay"	17:31
fungi	i do wonder now if we've designed the hourly jobs not to roll back the promoted deploy jobs	17:32
clarkb	yup hourly being slow is definitely part of the problem and it seems that a big part of that is ansible is slow :/	17:32
corvus	anyway, precedence makes more sense than definition order for this, so i have no objection to the change :)	17:32
clarkb	fungi: pretty sure we havne't beacuse they actually run latest system-config	17:33
clarkb	fungi: so what actually happens is that the hourly jobs are running the updates for the changes that are landing. Then an hour later the chagne jobs are running and nooping	17:33
corvus	clarkb: if there's a behavior change, it may be related to event results in zk -- that could have caused a race before, but now is less likely to race.	17:33
fungi	running latest at time of build start would be fine. running latest at enqueue time would be my concern	17:33
corvus	(i just wouldn't characterize it as a regression, that's all)	17:33
clarkb	fungi: oh good point, I'm not sure which is used	17:34
clarkb	corvus: gotcha	17:34
corvus	fungi: we now will run latest at enqueue time	17:34
corvus	and that is a change	17:34
corvus	(that's a new behavior introduced by the global repo state -- the idea is every build for a queue item should get the same state, and that state is frozen at enqueue time)	17:34
corvus	another entirely sensible change that unfortunately doesn't do quite what we want in order to work around the opendev-specific deployment issues :/	17:35
fungi	yeah, so periodic item is enqueued and starts a build or several, partway through the buildset a promote job is enqueued with a newer state and grabs the semaphore, deploying newer system-config on systems, finishes its buildset and then the periodic resumes its builds rolling out an older system-config state (from its original enqueue time) rolling back some deployed software/configs	17:35
corvus	yep. to be correct we should really hold a semaphore for the entire buildset -- which we could do with a parent job. that would get us back to the behavior which we have now (before merging pabelanger's change) and which rightly annoys us.	17:37
fungi	if we're pushing the state to be used onto the deployment bastion, then we'll end up in that situation. if the periodics are just making sure the worktree is updated at build start then that would solve it	17:37
corvus	we now have a choice: correct vs expedient. (unfortunately, "slow" still applies to both)	17:37
corvus	fungi: good point, i don't recall which we do	17:38
clarkb	the base jobs does a workspace sync from the executor to bridge	17:39
clarkb	but we could override that after the fact	17:39
fungi	it's a convenient generalization in this particular case to assert that all deploy jobs (whether timer or change-merged triggered) should use the latest branch state at build start	17:39
corvus	clarkb: regarding 804601 -- i agree that 804600 is an improvement over current state and am okay with merging up to that point.	17:40
clarkb	infra_prod_run_from_master: "{{ zuul.pipeline\|default('') in ['periodic', 'opendev-prod-hourly'] }}"	17:40
fungi	but that's down to nuances of how opendev is handling its deployment model really	17:40
clarkb	I think we use ^ in system-config to work around this problem	17:40
clarkb	fungi: I think that this will end up being a common problem for anyone using zuul for CD though and is worth considering from zuul's perspective how one might address it	17:41
fungi	that also means it's safe to reenqueue a deploy queue item, i guess (to answer a question i had sometime recently)	17:41
clarkb	fungi: yes opendev is running into it but it is a generic CD problem when you are trying to apply many changes concurrently	17:41
clarkb	fungi: no because deploy isn't in 'periodic' or 'opendev-prod-hourly'	17:41
clarkb	If we added deploy to that list and basically just said always run from master then we would be fine. But now you wouldn't have changes that maybe did cleanups like remove a cron then remove teh config management for it applying rpoperly	17:42
fungi	oh, got it. so we should add deploy to that list maybe?	17:42
fungi	that would also mean it would be safe to make deploy use the supercedent manager	17:43
corvus	clarkb: i think you missed https://review.opendev.org/804478	17:43
clarkb	fungi: yup see the tradeoff I mentioned though	17:43
clarkb	fungi: basically if you want the system to apply things in order you can't rely on that. But maybe losing that functionality is the least bad option	17:43
clarkb	corvus: oh ya I was going to +2 it if you were happy to land the stack up to the child point. I'll +2 now	17:44
fungi	oh, actually no we can't use supercedent in deploy if we also want to use file filters	17:44
clarkb	corvus: I think you can +A up to that point where spacing gets weird	17:44
clarkb	thinking out loud: What if we could have one pipeline supercede another? And basically make the hourly deploy eventually consistent? It would finish its current job(s) but then get evicted if a change showed up in deploy. Then deploy would run. This risk starvation but maybe is the least confusing setup	17:46
corvus	clarkb, fungi: i think the fundamental issue is that we have an hourly pipeline that takes too long	17:46
corvus	it should run daily and/or be fast	17:47
clarkb	corvus: yes, but I'm not sure how we fix that without replacing ansible. (related note the job that runs puppet against everything which is admittedly a shrinking list runs very quickly)	17:47
corvus	this is getting off-topic for #zuul	17:47
clarkb	I do think that zuul providing tools to make this less painful is a good idea. I doubt opendev is unique here. However, I don't really know what zuul can do that would be effective yet	17:48
*** dviroel\|ruck is now known as dviroel\|out		17:49
corvus	clarkb: sure, i mean, i think zuul provides those tools. how opendev uses them is more of an #opendev thing. :)	17:49
corvus	clarkb: pipelines can superecede each other; gate does check. but identifying that a queue item in a deploy pipeline should supercede one in periodic would not be trivial -- they have very different identifiers.	17:50
pabelanger[m]	clarkb: for your hourly job, using mitogen will likely help a ton. I had to use that for zuul.a.c deploy job. I cannot remember how fast it improved things however	18:53
clarkb	pabelanger[m]: the problem is in forking the processes which mitogen doesn't help with aiui	18:55
opendevreview	Merged zuul/zuul master: web: yarn update to latest @patternfly/react-core https://review.opendev.org/c/zuul/zuul/+/804473	18:59
opendevreview	Merged zuul/zuul master: web: use spread operator in JOB_FETCH_SUCCESS reducer https://review.opendev.org/c/zuul/zuul/+/804474	18:59
opendevreview	Merged zuul/zuul master: web: Job: remove height setting for job page https://review.opendev.org/c/zuul/zuul/+/804475	18:59
opendevreview	Merged zuul/zuul master: web: Job: use PF4 tabs https://review.opendev.org/c/zuul/zuul/+/804476	19:03
opendevreview	Merged zuul/zuul master: web: JobVariant: update to @patternfly/react-icons https://review.opendev.org/c/zuul/zuul/+/804477	19:04
opendevreview	Merged zuul/zuul master: web: JobVariant: use @patternfly/react-table https://review.opendev.org/c/zuul/zuul/+/804478	19:04
opendevreview	Merged zuul/zuul master: web: JobVariant: visual cleanups https://review.opendev.org/c/zuul/zuul/+/804479	19:04
corvus	pabelanger, clarkb, tobiash: i think there's another issue to check for on https://review.opendev.org/804305	19:10
opendevreview	Merged zuul/zuul master: web: Nodeset: convert to a treeview https://review.opendev.org/c/zuul/zuul/+/804480	19:11
opendevreview	Merged zuul/zuul master: web: Use PF4 sizing/spacing css https://review.opendev.org/c/zuul/zuul/+/804599	19:11
opendevreview	Merged zuul/zuul master: web: JobVariant: add icons https://review.opendev.org/c/zuul/zuul/+/804600	19:11
clarkb	corvus: that is an interesting one beacuse it was only really handled by convention previously, but definitely something that we want to happen	19:49
corvus	clarkb: yep -- the same is true for this change, which is why i think we need to make sure it's handled	19:50
corvus	(to be clear, i don't know if it's an issue; i think it needs to be verified)	19:53
TylerPearce[m]	Hi! I'm working on a branch called "zuul-test", but on the zuul web UI it's showing the branch as "master". Am I misunderstanding something here, or should the branch in Zuul's UI be "zuul-test"?	20:35
clarkb	TylerPearce[m]: in which context is it showing the branch as master? I think we need a bit more info to properly understand	20:36
fungi	TylerPearce[m]: what code review system is the change uploaded into, and what branch is it targeting? that's what zuul will show, it has nothing to do with (and won't even know about) your local topic branches	20:36
* TylerPearce[m] uploaded an image: (208KiB) < https://matrix.org/_matrix/media/r0/download/matrix.org/XjycKSmDypwVgjhElJzWBjeT/image.png >		20:39
TylerPearce[m]	I'm using GitHub. I've configured Zuul to run a "check" pipeline when a PR is created, which seems to work as intended	20:39
TylerPearce[m]	Then in Zuul's web UI it's showing the branch as "master"	20:39
* TylerPearce[m] uploaded an image: (57KiB) < https://matrix.org/_matrix/media/r0/download/matrix.org/bZWJCxqyQyZRzOFuVBCzedFn/image.png >		20:39
TylerPearce[m]	Is this because the merge target is "master"?	20:39
fungi	TylerPearce[m]: yes, zuul tests "the future" where your change has merged to its intended target branch	20:40
TylerPearce[m]	Ahh okay, thank you for explaining! That makes sense	20:40
fungi	this becomes much more apparent when gating, and you have multiple changes trying to merge at the same time, it needs clear test states incorporating those changes together	20:41
fungi	so if it tested them all individually on separate branches then it could end up trying to merge different changes which break one another functionally	20:41
TylerPearce[m]	Ah I see	20:41
TylerPearce[m]	One more question if you don't mind: When inspecting a job on Zuul's web UI there's a tab called "Console". Where does this console read from (or rather, how can I write data to this console to be viewed in the UI?)	20:44
fungi	the "console" view is a hierarchical rendering of the json returned by ansible	20:44
fungi	each playbook as the top-level of the hierarchy, and then the individual tasks run within the scope of that playbook form the second level	20:45
fungi	then within each task you get the stdout, stderr, exit code, and so on	20:45
TylerPearce[m]	Ah okay, so from the quickstart guide I might expect to see the debug message "Hello world" from `testjob.yaml`, is that correct?	20:46
fungi	it's basically more fine-grained and detailed than the "console log" which is essentially a timestamped and combined stream of all the stdout/stderr	20:46
* TylerPearce[m] < https://matrix.org/_matrix/media/r0/download/matrix.org/HoNztJBmfaJUbmhhpaFMToRk/message.txt >		20:47
fungi	sounds right, i'd need to look back at what the tasks are in the demo jobs for the quickstart, but you should be able to find the corresponding task in the console breakdown, yes	20:47
TylerPearce[m]	Hmm okay. I must have borked something on my end because none of my tasks have any console output, though Zuul seems to be running them	20:49
fungi	as one of the final tasks zuul should publish the json for that to the logserver. maybe that's not happening, or the dashboard is having trouble reading the file from the logserver container	20:51
TylerPearce[m]	Ah yep, I'm not using `playbooks/base/post-logs.yaml`. I imagine that's what should publish the logs	20:52
fungi	does the normally the console would look something like this: https://zuul.opendev.org/t/zuul/build/514def4b7daa4650959030a8b28c9df5/console	20:52
fungi	that fetches and renders the job-output.json file for the build from the log location for it	20:53
fungi	(which you can also find conveniently linked in the logs section)	20:53
fungi	so if there's no job-output.json, that would explain the empty console	20:54
TylerPearce[m]	Ah nice, that looks much better than my blank view 😂 I didn't set up my configuration to utilize the zuul jobs repository, I'm guessing that's what I need to get `post-logs.yaml` to work	20:54
fungi	and yes, post-logs.yaml is what should, during the post-run phase, copy those files to the logserver	20:54
fungi	there are some standard roles for log collection and publication in zuul-jobs, which i expect post-logs.yaml relies on, yes	20:55
TylerPearce[m]	Thank you very much! I'll add zuul-jobs into my config and give that a try	20:56
fungi	you're welcome, sorry i couldn't be more help but i'm not at my desk at the moment	20:57
TylerPearce[m]	That was all the help I need for now, I appreciate it :)	20:57
TylerPearce[m]	Okay so I added zuul-jobs to my config and the job succeeded, but I'm getting an error when viewing the job	21:11
TylerPearce[m]	```Network Error (Unable to fetch URL, check your network connectivity, browser plugins, ad-blockers, or try to refresh this page) http://localhost:8000/6/6/c7667393ff325324b46027979fd28b879a529029/check/testjob/21a9ee0/zuul-manifest.json```	21:11
TylerPearce[m]	I think this is because I'm not viewing this on localhost, but instead it's on an AWS EC2 container. Where can I overwrite `localhost:8000` to point to my EC2 instance URL?	21:11
clarkb	TylerPearce[m]: that is part of the base job iirc	21:18
clarkb	I think there is a zuul return to set the log url	21:18
TylerPearce[m]	Ah yep, I'm blind 🕶️ Thank you!	21:30
TylerPearce[m]	Tweaked my security group settings and now it's all working. Thanks again to everyone who helped me :)	21:51
fungi	TylerPearce[m]: thanks for trying it out!	21:57
TylerPearce[m]	Does Zuul have a job variable for the current branch? (Not the target branch like https://zuul-ci.org/docs/zuul/reference/jobs.html#var-zuul.branch)	23:26
clarkb	TylerPearce[m]: in zuul's world they are the same thing	23:28
clarkb	TylerPearce[m]: when zuul prepares the workspace it will checkout the target branch with the changes applied to it. If you need to see your delta to the target branch in the workspace you can look at origin/target-branch as that is preserved with the original state	23:29
TylerPearce[m]	I see. Is there any way to reference the branch where the changes came from?	23:29
clarkb	If you need the actual branch name the PR was requested under I'm not sure. I don't use github with zuul much. YOu can check the inventory file that zuul records as taht should have all the vars	23:29
TylerPearce[m]	Got it, thank you!	23:30
clarkb	it might record it as part of zuul.change or zuul.change_url?	23:30
clarkb	I suspect that will just give you PR numbers though	23:30
TylerPearce[m]	change is the pr number, I'll check change_url	23:30
clarkb	TylerPearce[m]: I suspect that https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L615-L635 and the event model would need to be updated to record that info	23:34
clarkb	as that seems to be where we make a recording of the relevant bits and the source branch isn't in there	23:35
clarkb	another option may be to look it up with the change url	23:35
clarkb	in the job I mean rather than having zuul record it	23:35
TylerPearce[m]	Awesome info, thank you! I probably don't specifically need the branch name, I think I can accomplish the task via the PR number and the dev looking at the logs can infer the branch name or click the link :)	23:36

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!