*** threestrands_ has joined #zuul | 00:07 | |
*** threestrands_ has quit IRC | 00:07 | |
*** threestrands_ has joined #zuul | 00:07 | |
*** threestrands_ has quit IRC | 00:08 | |
*** threestrands_ has joined #zuul | 00:09 | |
*** threestrands_ has quit IRC | 00:09 | |
*** threestrands_ has joined #zuul | 00:09 | |
*** threestrands has quit IRC | 00:10 | |
*** threestrands_ has quit IRC | 00:10 | |
*** threestrands_ has joined #zuul | 00:10 | |
*** threestrands_ has quit IRC | 00:11 | |
*** threestrands_ has joined #zuul | 00:12 | |
*** threestrands_ has quit IRC | 00:13 | |
*** threestrands_ has joined #zuul | 00:13 | |
*** threestrands_ has quit IRC | 00:13 | |
*** threestrands_ has joined #zuul | 00:13 | |
*** threestrands_ has quit IRC | 00:14 | |
*** threestrands_ has joined #zuul | 00:15 | |
*** threestrands_ has quit IRC | 00:15 | |
*** threestrands_ has joined #zuul | 00:15 | |
*** threestrands_ has quit IRC | 00:16 | |
*** threestrands_ has joined #zuul | 00:16 | |
tristanC | corvus: i worry fixing the GET request on '/zuul/', or even '/' for multi-tenant, is going to be tricky. It seems like we need to change the routing strategy by calling a speculative 'api/info' endpoint to be able to check whether it's a white-label tenant or not | 00:53 |
tristanC | it's either that (an extra http call), or we document that multi-tenant deployment needs to redirect /zuul/ to /zuul/t/tenants.html | 00:54 |
tristanC | i would prefer the latter, so doing a 3.1.1 release with 579418 | 00:54 |
tristanC | i mean, i already switched back to master tracking in order to get the min_hdd_avail sensor and other fixes... so i don't mind waiting longer for a release, but either way that's not ideal | 00:57 |
tristanC | another strategy would be to drop the http path <-> component relationship and switch to what storyboard is doing, e.g.: redirect everything to a generic index.html, and manage navigation using the "#!" anchor | 00:58 |
tristanC | well there is another option, we could make the entrypoint redirect to t/tenants.html by default, and a white-label setup can be configured to redirect to status.html... | 01:14 |
tristanC | let me start a zuul-discuss thread | 01:14 |
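A minimal sketch of the speculative 'api/info' probe tristanC describes: a client asks the info endpoint whether the deployment is white-label and picks a landing page accordingly. The 'tenant' field and the exact response shape are assumptions for illustration, not guaranteed zuul-web behaviour.

```python
# Hypothetical client-side probe; the 'tenant' field in api/info is an
# assumption about how a white-label deployment might advertise itself.
import requests

def is_white_label(root_url):
    info = requests.get(root_url.rstrip('/') + '/api/info').json().get('info', {})
    return bool(info.get('tenant'))

def landing_page(root_url):
    # Multi-tenant deployments land on the tenant list; a white-label
    # deployment can go straight to its status page.
    return 'status.html' if is_white_label(root_url) else 't/tenants.html'
```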
*** hwoarang has quit IRC | 02:10 | |
*** bhavik1 has joined #zuul | 04:04 | |
*** threestrands_ has quit IRC | 04:25 | |
*** bhavik1 has quit IRC | 05:11 | |
*** hwoarang has joined #zuul | 05:14 | |
*** Rohaan has joined #zuul | 05:23 | |
*** hwoarang has quit IRC | 06:23 | |
*** hwoarang has joined #zuul | 06:23 | |
*** hwoarang has quit IRC | 06:23 | |
*** hwoarang has joined #zuul | 06:23 | |
tobiash | corvus: yesterday we had a case where zuul tried to execute a post playbook that was added by an unmerged PR to a trusted base job; other jobs then tried to run it and failed because they didn't find the playbook | 06:24 |
tobiash | I have no clue how this could happen and the logs don't seem to help me either | 06:24 |
*** nchakrab has joined #zuul | 06:24 | |
tobiash | could it be that we still have a subtle bug in the caching and the PR modified the master version of the base job? | 06:25 |
*** gtema has joined #zuul | 06:33 | |
*** hashar has joined #zuul | 06:39 | |
*** Rohaan has quit IRC | 06:48 | |
*** openstackgerrit has quit IRC | 06:49 | |
*** gtema has quit IRC | 06:59 | |
*** gtema has joined #zuul | 07:00 | |
*** openstackgerrit has joined #zuul | 07:28 | |
openstackgerrit | Merged openstack-infra/nodepool master: launcher: add pool quota debug and log information https://review.openstack.org/579048 | 07:28 |
*** jpena|off is now known as jpena | 07:52 | |
tobiash | corvus: according to the logs we have at least one case where a non-merged base job description got into the active layout and that even in a different tenant | 08:00 |
*** tobiash has quit IRC | 08:14 | |
*** tobiash has joined #zuul | 08:16 | |
*** electrofelix has joined #zuul | 08:26 | |
*** Rohaan has joined #zuul | 08:34 | |
tobiash | corvus: I can think of two ways this could happen: either a bug in the shared config caching, or maybe some side effect of concurrently running cat and execute jobs on the executor. | 09:01 |
tobiash | corvus: at least right before that happened I see a cat job for that repo in the scheduler log | 09:01 |
tristanC | on a similar topic, we also had a weird issue where a job was running twice on the same nodeset, perhaps a bug in the ansible retry code that leaked the first thread or something... | 09:48 |
tobiash | tristanC: was that github or gerrit? | 09:50 |
tobiash | Ah nodeset not pipeline? | 09:50 |
tristanC | it was with a gerrit change, but i don't think it was related | 09:51 |
tristanC | tobiash: yes, a single executor picked the job, and it failed with odd anomalies; in journald there was clearly a parallel execution of the same run playbook | 09:51 |
*** hwoarang has quit IRC | 09:53 | |
*** hwoarang has joined #zuul | 09:54 | |
*** hwoarang has quit IRC | 09:54 | |
*** hwoarang has joined #zuul | 09:54 | |
*** hwoarang has quit IRC | 09:56 | |
tristanC | the nodeset's message file was: https://softwarefactory-project.io/logs/87/12787/1/gate/sf-ci-functional-minimal/6330df5/logs/managesf.sfdomain.com/var/log/messages , each ansible log entry is repeated twice with a 5 second delay | 09:57 |
*** hwoarang has joined #zuul | 09:57 | |
*** hwoarang has quit IRC | 09:57 | |
*** hwoarang has joined #zuul | 09:57 | |
*** jpena is now known as jpena|lunch | 11:29 | |
*** elyezer has quit IRC | 12:07 | |
*** jpena|lunch is now known as jpena | 12:28 | |
*** rlandy has joined #zuul | 12:30 | |
*** elyezer has joined #zuul | 12:39 | |
*** Rohaan has quit IRC | 12:53 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix zuul startup with inexisting project template and gate https://review.openstack.org/579859 | 12:55 |
*** nchakrab_ has joined #zuul | 13:27 | |
*** nchakrab_ has quit IRC | 13:27 | |
*** nchakrab_ has joined #zuul | 13:28 | |
*** nchakrab has quit IRC | 13:31 | |
*** nchakrab has joined #zuul | 13:32 | |
*** nchakrab_ has quit IRC | 13:32 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix zuul startup with inexisting project template and gate https://review.openstack.org/579859 | 13:39 |
*** nchakrab has quit IRC | 13:45 | |
*** nchakrab has joined #zuul | 13:45 | |
*** elyezer has quit IRC | 13:47 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix zuul startup with inexisting project template and gate https://review.openstack.org/579859 | 13:57 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Tolerate missing project https://review.openstack.org/579872 | 13:57 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: DNM: test persistent-firewall job https://review.openstack.org/579874 | 14:00 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: DNM: test persistent-firewall job https://review.openstack.org/579874 | 14:01 |
*** jiapei has joined #zuul | 14:01 | |
corvus | tristanC: why is the /zuul/ url an issue with the angular change? | 14:14 |
mordred | corvus: it's related to the routing stuff | 14:17 |
*** ianychoi has quit IRC | 14:18 | |
corvus | mordred: ack | 14:19 |
mordred | tristanC: I actually have been planning on adding the GET /info call you mentioned and just haven't quite gotten to it yet | 14:19 |
*** ianychoi has joined #zuul | 14:19 | |
mordred | tristanC: because of where the routing table is declared, I've gotten myself stuck thinking about the 'right' way to accomplish that | 14:19 |
corvus | tobiash: there's a lock in the executor so it shouldn't run a cat job (or any merger job) at the same time it's cloning out of the cache for a job | 14:19 |
mordred | tristanC: because I *want* the call to be defined in the ZuulService class, but we don't have access to the injectable instance of that class (I don't think) in the right place to be able to call it to get the answer for that routing table | 14:21 |
mordred | but that's one of the main reasons for adding that /info call - so that the dashboard could make a call against it and not have to guess so many things based on the urls and whatnot | 14:22 |
mordred | corvus: oh - also - there's a thing in the /info call that either never worked properly or was broken by the cherrypy patches | 14:23 |
mordred | corvus: if you look at http://zuul.openstack.org/api/info - you'll see that capabilities.job_history is false - even though we have the sql driver enabled and thus should have job history support | 14:24 |
Shrews | /c 2 | 14:24 |
Shrews | doh | 14:24 |
mordred | corvus: I think the original _intent_ was that *something* in the driver would set the flag to true | 14:24 |
mordred | corvus: I don't know if that was working properly in the old version of the code - but it certainly isn't being set by anything now - and on a brief look I wasn't 100% sure the best way to fix | 14:25 |
corvus | mordred: yep, i broke it | 14:26 |
corvus | it was being set as a side effect of getting the sql handler | 14:27 |
mordred | corvus: cool! | 14:28 |
corvus | mordred: job_history was global, but should be tenant-specific | 14:28 |
mordred | oh - hrm. that's very interesting | 14:29 |
mordred | corvus: I suppose the most correct version would be based on whether or not the tenant has any pipelines that are configured to report to the sql driver | 14:30 |
mordred | corvus: although - just _having_ the sql driver enabled globally will make the sql queries in the rest api work - even though a given tenant might not have any data if it's not reporting to the db | 14:31 |
corvus | mordred: right. that's how the /builds api call works now. it always exists, but if you call it and there's no pipeline configured with a sql reporter, then it raises an exception (so it should return 500) | 14:32 |
corvus | (no pipeline configured with a sql reporter in that tenant) | 14:32 |
mordred | yah | 14:32 |
corvus | we could have it return [] in that case. or, we could only add job_history to the tenant info endpoint. | 14:34 |
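A rough sketch of the per-tenant capability idea being discussed, assuming job history counts as available when any pipeline in the tenant reports to a SQL reporter; names are illustrative stand-ins, not Zuul's actual objects.

```python
# Illustrative stand-ins for Zuul's tenant/pipeline structures.
def tenant_has_job_history(tenant):
    # A tenant has job history if at least one of its pipelines reports to SQL.
    return any(
        getattr(reporter, 'name', None) == 'sql'
        for pipeline in tenant.layout.pipelines.values()
        for reporter in pipeline.success_actions + pipeline.failure_actions
    )

def tenant_info(tenant):
    # A tenant-scoped info payload would let the dashboard decide whether to
    # show the builds link at all, instead of relying on a global flag.
    return {'info': {'capabilities': {'job_history': tenant_has_job_history(tenant)}}}
```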
mordred | I think the question from the dashboard pov would be whether or not we want to show the builds link in the navbar if the tenant doesn't have history | 14:35 |
mordred | because we can certainly have the builds page just have no data - or a 'this tenant doesn't have history' message | 14:36 |
corvus | yeah, for that, i think we need it to hit the tenant info api endpoint. does it do that if you're non-whitelabel? or does it only ever hit one info endpoint? | 14:36 |
mordred | right now it hits no info endpoints | 14:36 |
mordred | but we need to get it to - so we can define the right thing for it to do | 14:36 |
corvus | ack | 14:37 |
*** hashar is now known as hasharAway | 14:37 | |
corvus | tobiash: have you seen any config contamination issues within the same tenant? | 14:37 |
corvus | (i should say, "within a single tenant") | 14:38 |
tobiash | corvus: I don't think it's related to any tenant, all tenants were contaminated | 14:56 |
corvus | tobiash: sorry, my question was trying to get at whether a multi-tenant environment is required to trigger the bug | 14:57 |
*** nchakrab has quit IRC | 14:57 | |
corvus | i asked it poorly | 14:57 |
tobiash | corvus: I still have no clue how this happens but I don't think a multi-tenant env is necessary | 14:57 |
corvus | tobiash: if you have any logs you can share, maybe i can help narrow down hypotheses | 14:59 |
tobiash | I wasn't able to create a test case to reproduce so far and I couldn't spot an issue by looking at cache handling and executor locks | 14:59 |
tobiash | corvus: unfortunately I don't have the executor logs from that timeframe, but I saw a cat job for that repo before the error started to happen | 15:01 |
tobiash | it broke the base job and a manually triggered reconfiguration fixed it | 15:01 |
tobiash | one of the broken jobs contained this: | 15:02 |
tobiash | 2018-07-02 15:48:12,789 DEBUG zuul.layout: Variant <Job base-cilib branches: None source: codecraft/zuul-conf-global/zuul.d/jobs.yaml@master#123> matched <Change 0x7f1d52094c50 16,b64a68d2622968dc9947a2002aade1101aa41931> | 15:02 |
tobiash | where the source line indicates that it matched the version of the PR and not master | 15:03 |
tobiash | (the pr version of that repo) | 15:12 |
*** weshay|ruck is now known as weshay | 15:29 | |
*** sshnaidm is now known as sshnaidm|rover | 15:29 | |
corvus | tobiash: we got a report last week that the source contexts of our base jobs are wrong. eg: http://logs.openstack.org/59/579859/3/check/tox-py35/7108831/zuul-info/inventory.yaml | 15:34 |
corvus | tobiash: 'base' is not defined in zuul-jobs.yaml, it's in jobs.yaml | 15:35 |
corvus | tobiash: but so far, i haven't seen an indication that the jobs themselves are affected. this may be related, or it may be that you've seen two separate problems. | 15:36 |
corvus | oh, i guess it's job.start_mark, not the source context that gives us those line numbers | 15:42 |
corvus | er, no it's both. the line number comes from the start_mark, but the rest comes from the source_context | 15:46 |
tobiash | corvus: oh, that's interesting information | 15:56 |
tobiash | corvus: because at least in our case I'm pretty sure that the source context was correct (in terms of line number) but the config itself wasn't | 15:57 |
tobiash | corvus: that might indicate these issues are the same | 15:58 |
corvus | tobiash: it looks like the errors we're seeing are that the line number is correct (start_mark) but the filename is wrong (source_context) | 15:59 |
openstackgerrit | Logan V proposed openstack-infra/zuul-jobs master: bindep: Ensure virtualenv is present https://review.openstack.org/579906 | 16:00 |
tobiash | oh, ok, that seems different | 16:00 |
tobiash | corvus: it would be interesting to see if your source context changes after a full reconfiguration | 16:01 |
corvus | tobiash: in both cases, we're observing correct line numbers. then in my case, i'm observing an incorrect filename and you're observing incorrect content. right? | 16:02 |
corvus | i'm looking in our logs for instances of the *correct* filenames, and i see some, but far fewer than the incorrect ones. the one i'm looking at right now appears to be for a project-config change which encountered a config error but did not report it because it didn't think the source context matched: https://review.openstack.org/579690 | 16:02 |
tobiash | yes | 16:02 |
corvus | i'm concerned that our config-errors api endpoint is not returning data | 16:09 |
corvus | that is to say, the request has not returned after several minutes | 16:10 |
tobiash | the config-errors api doesn't return? | 16:13 |
tobiash | corvus: is that a new api that needs a scheduler restart? | 16:19 |
*** yolanda has joined #zuul | 16:24 | |
*** elyezer has joined #zuul | 17:11 | |
Shrews | corvus: ah ha. i see the issue from yesterday. the shade api is being used improperly. tl;dr we should not be sending it a dict; we should just call get_image_by_id() instead | 17:14 |
Shrews | corvus: within shade itself, if you send get_image() an *object* that has an 'id' attribute, it assumes that object is already the thing you want | 17:15 |
Shrews | corvus: we send it a dict() assuming it will do the lookup by id. it only does that if the thing you send it "looks like" a uuid | 17:17 |
Shrews | so i think my original fix may do what we intend. validating... | 17:17 |
Shrews | corvus: confirmed, but we must have 'use_direct_get: true' set in our clouds.yaml. | 17:25 |
Shrews | otherwise it does a list search as normal | 17:25 |
mordred | Shrews: yes - and we don't want use_direct_get: true in our clouds.yaml because batching | 17:29 |
Shrews | and we do not set that clouds option. might be better to just change nodepool to call get_image_by_id directly | 17:29 |
Shrews | mordred: so much batching | 17:30 |
mordred | Shrews: fwiw, I would expect get_image to work with a dict with an id key - I feel like I fixed that as a bug at some point recently | 17:30 |
Shrews | mordred: it does not | 17:30 |
Shrews | mordred: hasattr does not work on a plain dict | 17:30 |
mordred | like, get_image(dict(id='asdf')) should return the dict you pass in without making any remote calls | 17:30 |
mordred | so that's a bug | 17:30 |
Shrews | mordred: i don't think we want that though | 17:30 |
mordred | what are we looking for? I may be missing context | 17:31 |
Shrews | what good is that dict() to the return code? | 17:31 |
mordred | depends on what we're trying to do? | 17:31 |
Shrews | mordred: that code is an "optimization" saying "you've already sent me the thing you're looking for" | 17:31 |
mordred | yah | 17:31 |
mordred | that is true | 17:32 |
Shrews | if we send it a full Image() object, great | 17:32 |
Shrews | if we send it an empty dict(), we don't have the full object | 17:32 |
Shrews | so returning it back is useless | 17:32 |
mordred | well - it depends on where it is in the flow - but I agree with you that if what you are looking for is the full object and you have an id, then returning a dict with only the id is typically useless | 17:33 |
Shrews | mordred: i feel like that's only useful within shade itself. for users of shade, they need use_direct_get | 17:33 |
*** electrofelix has quit IRC | 17:34 | |
mordred | it's useful in other places - create_server('foo', image=dict(id='asdf')) is useful if you know you have an id and you don't want the create_server call to do a lookup for you to find the image id to pass to nova | 17:35 |
*** jpena is now known as jpena|off | 17:35 | |
mordred | but I'm not sure what use_direct_get has to do with it? use_direct_get just controls GET /images | local_filter vs GET /images/{id} ... but I'm totally jumping in half-way and am not sure the problem you're working on solving? | 17:36 |
* mordred is likely being unhelpful | 17:36 | |
Shrews | mordred: actually, for the nodepool use, it would work as you describe if we did just return the dict()... it only wants the image id | 17:37 |
mordred | Shrews: ah - https://review.openstack.org/#/c/579664 right? | 17:37 |
Shrews | mordred: yeah | 17:37 |
mordred | yah. the external property is returning something suitable for passing to create_server without an extra lookup | 17:38 |
mordred | well - it would be if the dict thing was working | 17:38 |
Shrews | yeah | 17:38 |
mordred | you could work around it in nodepool for the moment by returning a munch instead of a dict from that method | 17:38 |
mordred | but I do think we should fix the code in shade/sdk | 17:39 |
Shrews | mordred: _get_entity() would be the thing to fix in shade. but yeah, we can just pass it an object instead of dict in nodepool | 17:39 |
mordred | yup to both | 17:40 |
Shrews | no need to get munch involved as a dependency | 17:40 |
mordred | Shrews: something like this: | 17:40 |
mordred | if (hasattr(name_or_id, 'id') | 17:41 |
mordred | or (isinstance(name_or_id, dict) and 'id' in name_or_id)): | 17:41 |
mordred | right? | 17:41 |
Shrews | mordred: exactly | 17:41 |
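A self-contained sketch of the check mordred spells out above; this is a stand-in for the shade helper being discussed, not its actual implementation.

```python
def get_entity(name_or_id, lookup):
    """Return name_or_id unchanged if it already looks like a resolved resource.

    An object with an 'id' attribute, or (with the proposed fix) a dict
    containing an 'id' key, is treated as already resolved; anything else,
    such as a bare name or UUID string, is passed to the lookup callable.
    """
    if (hasattr(name_or_id, 'id')
            or (isinstance(name_or_id, dict) and 'id' in name_or_id)):
        return name_or_id
    return lookup(name_or_id)
```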
Shrews | i might just have nodepool call get_image_by_id() directly. this whole passing a dict() vs. str stuff is a wonky API that causes confusion | 17:43 |
mordred | Shrews: just do it at the top and cache the result? | 17:45 |
Shrews | oh don't even need that call, actually | 17:46 |
*** gtema has quit IRC | 17:48 | |
mordred | Shrews: well, you do if caching isn't turned on - if you pass that image-id into create_server's image argument without wrapping it in a dict, you lose the fact that it's an id and create_server will helpfully do a roundtrip it doesn't need | 17:49 |
Shrews | bah | 17:49 |
mordred | yah | 17:50 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Tolerate missing project https://review.openstack.org/579872 | 17:50 |
Shrews | mordred: i mean, i was going to change that too | 17:50 |
Shrews | basically steal your _is_uuid_like() code from shade | 17:51 |
mordred | Shrews: well ... you can't quite do that though | 17:51 |
mordred | Shrews: glance does not enforce that image ids are uuids iirc | 17:52 |
Shrews | mordred: so shade would be similarly broken | 17:52 |
mordred | I think your idea of doing a get_image_by_id call is a good one - as long as you do it at a $time when you can cache the results | 17:53 |
Shrews | mordred: that's not a thing we need to do though. we just need the id (which we already have) | 17:54 |
Shrews | we're just using shade as a way to say "yep, you have an id already" | 17:54 |
Shrews | we could do that ourselves | 17:54 |
mordred | right - but you don't need is_uuid_like for that | 17:54 |
Shrews | true | 17:55 |
mordred | we know, because the config setting is 'image-id' | 17:55 |
Shrews | we can get that based... yah | 17:55 |
mordred | Shrews: do you want to stab me yet? | 17:55 |
Shrews | mordred: nope, i got to that same place, just microseconds behind you | 17:56 |
mordred | Shrews: darn. I'll try harder next time | 17:56 |
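A hypothetical sketch of the nodepool-side approach being converged on: since the config option is image-id, the provider already knows it holds an ID and can wrap it in a dict so create_server skips the lookup round-trip. Names are illustrative, not nodepool's actual classes.

```python
class CloudImage:
    """Illustrative stand-in for a provider cloud-image config entry."""

    def __init__(self, image_id=None, image_name=None):
        self.image_id = image_id
        self.image_name = image_name

    @property
    def external(self):
        # When configured by ID, hand create_server a dict wrapping the ID so
        # it treats the image as already resolved; a bare string would trigger
        # a name/ID lookup round-trip.
        if self.image_id:
            return dict(id=self.image_id)
        return self.image_name
```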
corvus | tobiash: yep. we somehow restarted between the change adding support for loading with config errors and the change which exposed them through the api. which means we're blind. :( | 18:05 |
tobiash | when is your next scheduler restart planned? | 18:10 |
corvus | tobiash: i'll do it in about 2 hours since i think i need that in order to continue looking at this bug (or bugs) | 18:10 |
*** yolanda_ has joined #zuul | 18:18 | |
tobiash | corvus: what do you think about limiting the history of the repos on the mergers and executors? | 18:19 |
corvus | tobiash: we should remove the zuul refs and let git gc take care of it | 18:20 |
tobiash | one of my users wants to add a repo that has 2.5 million commits and takes 3gb and 25 minutes to clone (most of the time spent resolving deltas) | 18:20 |
corvus | oh, like shallow clones? | 18:20 |
tobiash | yes | 18:20 |
*** yolanda has quit IRC | 18:20 | |
tobiash | currently the merger has a hard coded limit of 5 minutes for clone operations | 18:21 |
tobiash | that repo cannot currently be handled with zuul | 18:21 |
tobiash | and tbh I don't want to raise that timeout to 30 minutes | 18:21 |
corvus | tobiash: it's part of the design that the repo that appears in the job is what you would get if you clone. without the full repo on the executor, that won't happen. | 18:21 |
corvus | tobiash: this will take some thought. i have to run now though. | 18:22 |
tobiash | ok | 18:22 |
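A minimal sketch of the shallow-clone option tobiash is floating, assuming GitPython on the merger side; it is illustrative only, and as corvus notes it conflicts with the design guarantee that the repo a job sees matches what a full clone would give you.

```python
import git

def clone_repo(url, dest, shallow_depth=None):
    # Hypothetical per-repo knob: limit history depth so a huge repo
    # (e.g. millions of commits) fits inside the merger's clone timeout,
    # at the cost of truncated history on the executor.
    kwargs = {}
    if shallow_depth:
        kwargs['depth'] = shallow_depth
    return git.Repo.clone_from(url, dest, **kwargs)
```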
*** yolanda__ has joined #zuul | 18:27 | |
*** yolanda_ has quit IRC | 18:30 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add test for referencing cloud image by ID https://review.openstack.org/579702 | 19:33 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Fix for referencing cloud image by ID https://review.openstack.org/579664 | 19:33 |
Shrews | corvus: mordred: verified 579702 using my personal vexxhost account so appears to fix the problem | 19:43 |
corvus | Shrews: that's the one that just adds the test? | 19:44 |
Shrews | corvus: oh, the other one | 19:44 |
mordred | Shrews: it fixed it with current shade? | 19:45 |
Shrews | mordred: yes | 19:45 |
Shrews | mordred: i can put up the shade/sdk fix next if you haven't already | 19:46 |
mordred | Shrews: I think I :q!-ed it ... so if you've got it, awesome :) | 19:46 |
corvus | Shrews: so the createServer part worked, but not getImage? | 19:47 |
Shrews | corvus: right | 19:48 |
corvus | okay i think that all makes sense to me now :) | 19:49 |
Shrews | corvus: the labelReady change fixes the problem reported in storyboard. the handler changes just copy the pattern | 19:49 |
Shrews | now, who can i send my vexxhost bill to?? ;) | 19:51 |
*** AJaeger has joined #zuul | 20:00 | |
*** elyezer has quit IRC | 20:02 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP add repl https://review.openstack.org/579962 | 20:08 |
mordred | Shrews: I always just bug mnaser ^^ | 20:08 |
mnaser | hi | 20:09 |
mnaser | everytime i see a highlight in #zuul i assume javascript | 20:09 |
mordred | "vexxhost: used to diagnose, test and fix all shade/openstacksdk issues" | 20:09 |
mordred | mnaser: hehe | 20:09 |
Shrews | mordred: mnaser: it's all good. probably only cost me a few cents | 20:12 |
*** yolanda_ has joined #zuul | 20:14 | |
mnaser | i always worry that y'all test against our stuff | 20:14 |
mnaser | and we have something broken somehow | 20:14 |
mnaser | haha | 20:14 |
Shrews | mnaser: we test against yours b/c it works the best | 20:15 |
mnaser | woo | 20:15 |
Shrews | i mean, who wants to find MORE bugs while fixing bugs? | 20:15 |
mnaser | i have 11 hours of planes tomorrow so maybe dashboard hacking | 20:16 |
*** yolanda has joined #zuul | 20:16 | |
*** yolanda__ has quit IRC | 20:18 | |
*** yolanda_ has quit IRC | 20:18 | |
*** yolanda_ has joined #zuul | 20:19 | |
*** yolanda has quit IRC | 20:22 | |
pabelanger | finger urls :) | 20:23 |
*** yolanda__ has joined #zuul | 20:24 | |
mnaser | oooh | 20:24 |
mnaser | that might be a fun one | 20:24 |
*** yolanda_ has quit IRC | 20:24 | |
corvus | tobiash: okay, restarted and confirmed our broken config is running; that explains some of the behavior i was seeing earlier, and confirms that there's a bug somewhere that caused us not to report on a breaking config error. i strongly suspect it's related to the source_context not being correct (since that would cause zuul to suppress a config error report) | 20:26 |
corvus | i will try to confirm that next, and continue to follow leads to try to find a root cause for that, also with the hope that there's a suggestion as to how that could run the wrong content | 20:27 |
corvus | since the restart, *only* the wrong path has shown up in the logs. never the correct path. i find that interesting and surprising. but also, it gives me hope. :) | 20:27 |
mordred | mnaser: also - check out the mailing list message from tristanC and my response to it ... there's a "fun" task related to setting up angular routing differently based on the results of api/info | 20:28 |
mordred | mnaser: I'm going to take a stab at it - but if you got bored and did it instead I wouldn't complain :) | 20:28 |
mordred | although also pabelanger's finger urls would be pretty awesome | 20:28 |
tobiash | corvus: that's really surprising. I thought it would be correct after the restart | 20:29 |
corvus | me too. but maybe it means it's more reproducible in our env | 20:30 |
tobiash | corvus: is that repl thing something you want to build into zuul permanently or is it just for debugging now | 20:31 |
corvus | tobiash: if we're going to merge it, i need to check with the author of some of that code about licensing. | 20:33 |
tobiash | that would be an interesting and powerful tool for debugging | 20:33 |
corvus | it is. it's also very dangerous and should be disabled by default | 20:33 |
corvus | (like, telnet in and ask zuul for decrypted secrets) | 20:33 |
tobiash | Agreed :) | 20:33 |
corvus | okay i went ahead and fired off an email about licensing | 20:38 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: Break Zuul's config https://review.openstack.org/579986 | 21:00 |
corvus | tobiash: well, i have found one problem, which is how we ended up breaking our config. it's because the load-with-broken-config system allows us to remove a job that's still in use in another project. | 21:11 |
corvus | fbo: ^ | 21:11 |
corvus | tobiash: unfortunately, that suggests that it's not related to the source_context being wrong. and even less to the issue you saw of running with the wrong content. | 21:12 |
corvus | fbo: all of openstack's projects are gated by zuul, but we still broke zuul's config by removing a job which was still in use in another repo. | 21:13 |
corvus | fbo: that's because if you remove a job, and it's not in use in the repo in which it's defined, then the only error messages are the ones from the other repos which use it. since they aren't the current repo of the change, they are suppressed, so we don't report them to the user. even though they really are caused by the current change. | 21:14 |
corvus | i wonder if we keyed all the errors by the source_context+start_mark (project+branch+file+line) and said if any of the new configuration error keys don't appear in the current configuration error keys, we report the error. | 21:19 |
corvus | that would probably uniquely identify errors enough to avoid most cases like this. | 21:19 |
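A rough sketch of that keying idea, assuming each accumulated error carries its source context and start mark; the structures are illustrative, not Zuul's actual error objects.

```python
def error_key(error):
    # Key an error by where it was raised, not by which repo the change lives in.
    ctx, mark = error.source_context, error.start_mark
    return (ctx.project, ctx.branch, ctx.path, mark.line)

def errors_to_report(current_errors, proposed_errors):
    # Report any error whose key does not already exist in the running
    # configuration, even if it comes from a repo other than the change's own.
    known = {error_key(e) for e in current_errors}
    return [e for e in proposed_errors if error_key(e) not in known]
```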
corvus | oh, it also looks like if zuul accumulates too many errors, it won't report on any more errors | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Split test_dynamic_conf_on_broken_config https://review.openstack.org/579996 | 22:20 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Report config errors when removing cross-repo jobs https://review.openstack.org/579997 | 22:20 |
corvus | fbo: ^ | 22:20 |
*** dtruong has quit IRC | 22:30 | |
gundalow | Hi all. Myself and mattclay have been thinking more about softwarefactory-zuul for ansible/ansible. We are considering creating gh/ansible/zuul-config, which will be very generic, like https://github.com/ansible-network/zuul-config. And rather than having gh/ansible/zuul-jobs, we are considering putting all of that in gh/ansible/ansible so everything is versioned together. In the past we've faced issues with other CI frameworks where certain things | 22:56 |
gundalow | have been versioned independently of others. | 22:56 |
clarkb | gundalow: probably one thing to keep in mind when figuring out if things should be split or not is the limitations placed on trusted config projects | 22:57 |
clarkb | gundalow: trusted config projects cannot be tested pre-merge, and if you end up needing to make gh/ansible/ansible trusted then you'd not have self-testing config changes for the subset that could be self-tested | 22:57 |
gundalow | interesting. https://softwarefactory-project.io/r/gitweb?p=config.git;a=blob;f=zuul/ansible_networking.yaml;h=fc7e43ceb305453f5ec7cb3b2f95c9cf5c4f8682;hb=HEAD#l7 contains the config I'm using to test gh/ansible-network, according to that only `ansible-network/zuul-config` is trusted (which defines very little), so I think that's OK | 22:58 |
gundalow | What may cause me to want to make gh/ansible/ansible trusted? | 22:59 |
clarkb | if you've got gh/ansible/zuul-config for that then possibly nothing | 23:00 |
gundalow | woot | 23:00 |
gundalow | Thanks. I'll create a PR and see how it goes | 23:00 |
clarkb | corvus: for identifying errors, the downside to the method used in your change above is that you can introduce a new error in a child change on the same line that would go unreported until the parent error was fixed? | 23:03 |
clarkb | corvus: could we address that by comparing the lines (either directly or via a hash) | 23:03 |
*** pwhalen has quit IRC | 23:06 | |
*** pwhalen has joined #zuul | 23:09 | |
*** pwhalen has joined #zuul | 23:09 | |
clarkb | oh it does hash the line if it is supplied, nevermind | 23:12 |
clarkb | corvus: I left some comments but I don't think they are -1 worthy | 23:58 |
clarkb | mostly thinking out loud about a couple things | 23:58 |