*** threestrands_ has joined #zuul | 00:07 | |
*** threestrands_ has quit IRC | 00:07 | |
*** threestrands_ has joined #zuul | 00:07 | |
*** threestrands_ has quit IRC | 00:08 | |
*** threestrands_ has joined #zuul | 00:09 | |
*** threestrands_ has quit IRC | 00:09 | |
*** threestrands_ has joined #zuul | 00:09 | |
*** threestrands has quit IRC | 00:10 | |
*** threestrands_ has quit IRC | 00:10 | |
*** threestrands_ has joined #zuul | 00:10 | |
*** threestrands_ has quit IRC | 00:11 | |
*** threestrands_ has joined #zuul | 00:12 | |
*** threestrands_ has quit IRC | 00:13 | |
*** threestrands_ has joined #zuul | 00:13 | |
*** threestrands_ has quit IRC | 00:13 | |
*** threestrands_ has joined #zuul | 00:13 | |
*** threestrands_ has quit IRC | 00:14 | |
*** threestrands_ has joined #zuul | 00:15 | |
*** threestrands_ has quit IRC | 00:15 | |
*** threestrands_ has joined #zuul | 00:15 | |
*** threestrands_ has quit IRC | 00:16 | |
*** threestrands_ has joined #zuul | 00:16 | |
tristanC | corvus: i worry fixing the GET request on '/zuul/', or even '/' for multi-tenant, is going to be tricky. It seems like we need to change the routing strategy by calling a speculative 'api/info' endpoint to be able to check whether it's a white-label tenant or not | 00:53 |
tristanC | it's either that (an extra http call), or we document that multi-tenant deployment needs to redirect /zuul/ to /zuul/t/tenants.html | 00:54 |
tristanC | i would prefer the latter, so doing a 3.1.1 release with 579418 | 00:54 |
tristanC | i mean, i already switched back to master tracking in order to get the min_hdd_avail sensor and other fixes... so i don't mind waiting longer for a release, but either way that's not ideal | 00:57 |
tristanC | another strategy would be to drop the http path <-> component relationship and switch to what storyboard is doing, e.g.: redirect everything to a generic index.html, and manage navigation using the "#!" anchor | 00:58 |
tristanC | well there is another option, we could make the entrypoint redirect to t/tenants.html by default, and a white-label setup can be configured to redirect to status.html... | 01:14 |
tristanC | let me start a zuul-discuss thread | 01:14 |
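A minimal sketch of the speculative 'api/info' probe tristanC describes: a client asks the info endpoint whether the deployment is white-label and picks a landing page accordingly. The 'tenant' field and the exact response shape are assumptions for illustration, not guaranteed zuul-web behaviour.

```python
# Hypothetical client-side probe; the 'tenant' field in api/info is an
# assumption about how a white-label deployment might advertise itself.
import requests

def is_white_label(root_url):
    info = requests.get(root_url.rstrip('/') + '/api/info').json().get('info', {})
    return bool(info.get('tenant'))

def landing_page(root_url):
    # Multi-tenant deployments land on the tenant list; a white-label
    # deployment can go straight to its status page.
    return 'status.html' if is_white_label(root_url) else 't/tenants.html'
```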
*** hwoarang has quit IRC | 02:10 | |
*** bhavik1 has joined #zuul | 04:04 | |
*** threestrands_ has quit IRC | 04:25 | |
*** bhavik1 has quit IRC | 05:11 | |
*** hwoarang has joined #zuul | 05:14 | |
*** Rohaan has joined #zuul | 05:23 | |
*** hwoarang has quit IRC | 06:23 | |
*** hwoarang has joined #zuul | 06:23 | |
*** hwoarang has quit IRC | 06:23 | |
*** hwoarang has joined #zuul | 06:23 | |
tobiash | corvus: yesterday we had a case where zuul tried to execute a post playbook that was added by an unmerged PR to a trusted base job; other jobs then tried to run it and failed because they didn't find the playbook | 06:24 |
tobiash | I have no clue how this could happen and the logs don't seem to help me either | 06:24 |
*** nchakrab has joined #zuul | 06:24 | |
tobiash | could it be that we still have a subtle bug in the caching and the PR modified the master version of the base job? | 06:25 |
*** gtema has joined #zuul | 06:33 | |
*** hashar has joined #zuul | 06:39 | |
*** Rohaan has quit IRC | 06:48 | |
*** openstackgerrit has quit IRC | 06:49 | |
*** gtema has quit IRC | 06:59 | |
*** gtema has joined #zuul | 07:00 | |
*** openstackgerrit has joined #zuul | 07:28 | |
openstackgerrit | Merged openstack-infra/nodepool master: launcher: add pool quota debug and log information https://review.openstack.org/579048 | 07:28 |
*** jpena|off is now known as jpena | 07:52 | |
tobiash | corvus: according to the logs we have at least one case where a non-merged base job description got into the active layout and that even in a different tenant | 08:00 |
*** tobiash has quit IRC | 08:14 | |
*** tobiash has joined #zuul | 08:16 | |
*** electrofelix has joined #zuul | 08:26 | |
*** Rohaan has joined #zuul | 08:34 | |
tobiash | corvus: I can think of two ways this could happen: either a bug in the shared config caching, or maybe some side effect of concurrently running cat and execute jobs on the executor. | 09:01 |
tobiash | corvus: at least right before that happened I see a cat job for that repo in the scheduler log | 09:01 |
tristanC | on a similar topic, we also had a weird issue where a job was running twice on the same nodeset, perhaps a bug in the ansible retry code that leaked the first thread or something... | 09:48 |
tobiash | tristanC: was that github or gerrit? | 09:50 |
tobiash | Ah nodeset not pipeline? | 09:50 |
tristanC | it was with a gerrit change, but i don't think it was related | 09:51 |
tristanC | tobiash: yes, a single executor picked the job, and it failed with odd anomalies; in journald there was clearly a parallel execution of the same run playbook | 09:51 |
*** hwoarang has quit IRC | 09:53 | |
*** hwoarang has joined #zuul | 09:54 | |
*** hwoarang has quit IRC | 09:54 | |
*** hwoarang has joined #zuul | 09:54 | |
*** hwoarang has quit IRC | 09:56 | |
tristanC | the nodeset's message file was: https://softwarefactory-project.io/logs/87/12787/1/gate/sf-ci-functional-minimal/6330df5/logs/managesf.sfdomain.com/var/log/messages , each ansible log entry is repeated twice with a 5 second delay | 09:57 |
*** hwoarang has joined #zuul | 09:57 | |
*** hwoarang has quit IRC | 09:57 | |
*** hwoarang has joined #zuul | 09:57 | |
*** jpena is now known as jpena|lunch | 11:29 | |
*** elyezer has quit IRC | 12:07 | |
*** jpena|lunch is now known as jpena | 12:28 | |
*** rlandy has joined #zuul | 12:30 | |
*** elyezer has joined #zuul | 12:39 | |
*** Rohaan has quit IRC | 12:53 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix zuul startup with inexisting project template and gate https://review.openstack.org/579859 | 12:55 |
*** nchakrab_ has joined #zuul | 13:27 | |
*** nchakrab_ has quit IRC | 13:27 | |
*** nchakrab_ has joined #zuul | 13:28 | |
*** nchakrab has quit IRC | 13:31 | |
*** nchakrab has joined #zuul | 13:32 | |
*** nchakrab_ has quit IRC | 13:32 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix zuul startup with inexisting project template and gate https://review.openstack.org/579859 | 13:39 |
*** nchakrab has quit IRC | 13:45 | |
*** nchakrab has joined #zuul | 13:45 | |
*** elyezer has quit IRC | 13:47 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix zuul startup with inexisting project template and gate https://review.openstack.org/579859 | 13:57 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Tolerate missing project https://review.openstack.org/579872 | 13:57 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: DNM: test persistent-firewall job https://review.openstack.org/579874 | 14:00 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: DNM: test persistent-firewall job https://review.openstack.org/579874 | 14:01 |
*** jiapei has joined #zuul | 14:01 | |
corvus | tristanC: why is the /zuul/ url an issue with the angular change? | 14:14 |
mordred | corvus: it's related to the routing stuff | 14:17 |
*** ianychoi has quit IRC | 14:18 | |
corvus | mordred: ack | 14:19 |
mordred | tristanC: I actually have been planning on adding the GET /info call you mentioned and just haven't quite gotten to it yet | 14:19 |
*** ianychoi has joined #zuul | 14:19 | |
mordred | tristanC: because of where the routing table is declared, I've gotten myself stuck thinking about the 'right' way to accomplish that | 14:19 |
corvus | tobiash: there's a lock in the executor so it shouldn't run a cat job (or any merger job) at the same time it's cloning out of the cache for a job | 14:19 |
mordred | tristanC: because I *want* the call to be defined in the ZuulService class, but we don't have access to the injectable instance of that class (I don't think) in the right place to be able to call it to get the answer for that routing table | 14:21 |
mordred | but that's one of the main reasons for adding that /info call - so that the dashboard could make a call against it and not have to guess so many things based on the urls and whatnot | 14:22 |
mordred | corvus: oh - also - there's a thing in the /info call that either never worked properly or was broken by the cherrypy patches | 14:23 |
mordred | corvus: if you look at http://zuul.openstack.org/api/info - you'll see that capabilities.job_history is false - even though we have the sql driver enabled and thus should have job history support | 14:24 |
Shrews | /c 2 | 14:24 |
Shrews | doh | 14:24 |
mordred | corvus: I think the original _intent_ was that *something* in the driver would set the flag to true | 14:24 |
mordred | corvus: I don't know if that was working properly in the old version of the code - but it certainly isn't being set by anything now - and on a brief look I wasn't 100% sure the best way to fix | 14:25 |
corvus | mordred: yep, i broke it | 14:26 |
corvus | it was being set as a side effect of getting the sql handler | 14:27 |
mordred | corvus: cool! | 14:28 |
corvus | mordred: job_history was global, but should be tenant-specific | 14:28 |
mordred | oh - hrm. that's very interesting | 14:29 |
mordred | corvus: I suppose the most correct version would be based on whether or not the tenant has any pipelines that are configured to report to the sql driver | 14:30 |
mordred | corvus: although - just _having_ the sql driver enabled globally will make the sql queries in the rest api work - even though a given tenant might not have any data if it's not reporting to the db | 14:31 |
corvus | mordred: right. that's how the /builds api call works now. it always exists, but if you call it and there's no pipeline configured with a sql reporter, then it raises an exception (so it should return 500) | 14:32 |
corvus | (no pipeline configured with a sql reporter in that tenant) | 14:32 |
mordred | yah | 14:32 |
corvus | we could have it return [] in that case. or, we could only add job_history to the tenant info endpoint. | 14:34 |
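A rough sketch of the per-tenant capability idea being discussed, assuming job history counts as available when any pipeline in the tenant reports to a SQL reporter; names are illustrative stand-ins, not Zuul's actual objects.

```python
# Illustrative stand-ins for Zuul's tenant/pipeline structures.
def tenant_has_job_history(tenant):
    # A tenant has job history if at least one of its pipelines reports to SQL.
    return any(
        getattr(reporter, 'name', None) == 'sql'
        for pipeline in tenant.layout.pipelines.values()
        for reporter in pipeline.success_actions + pipeline.failure_actions
    )

def tenant_info(tenant):
    # A tenant-scoped info payload would let the dashboard decide whether to
    # show the builds link at all, instead of relying on a global flag.
    return {'info': {'capabilities': {'job_history': tenant_has_job_history(tenant)}}}
```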
mordred | I think the question from the dashboard pov would be whether or not we want to show the builds link in the navbar if the tenant doesn't have history | 14:35 |
mordred | because we can certainly have the builds page just have no data - or a 'this tenant doesn't have history' message | 14:36 |
corvus | yeah, for that, i think we need it to hit the tenant info api endpoint. does it do that if you're non-whitelabel? or does it only ever hit one info endpoint? | 14:36 |
mordred | right now it hits no info endpoints | 14:36 |
mordred | but we need to get it to - so we can define the right thing for it to do | 14:36 |
corvus | ack | 14:37 |
*** hashar is now known as hasharAway | 14:37 | |
corvus | tobiash: have you seen any config contamination issues within the same tenant? | 14:37 |
corvus | (i should say, "within a single tenant") | 14:38 |
tobiash | corvus: I don't think it's related to any tenant, all tenants were contaminated | 14:56 |
corvus | tobiash: sorry, my question was trying to get at whether a multi-tenant environment is required to trigger the bug | 14:57 |
*** nchakrab has quit IRC | 14:57 | |
corvus | i asked it poorly | 14:57 |
tobiash | corvus: I still have no clue how this happens but I don't think a multi-tenant env is necessary | 14:57 |
corvus | tobiash: if you have any logs you can share, maybe i can help narrow down hypotheses | 14:59 |
tobiash | I wasn't able to create a test case to reproduce so far and I couldn't spot an issue by looking at cache handling and executor locks | 14:59 |
tobiash | corvus: unfortunately I don't have the executor logs from that timeframe, but I saw a cat job for that repo before the error started to happen | 15:01 |
tobiash | it broke the base job and a manually triggered reconfiguration fixed it | 15:01 |
tobiash | one of the broken jobs contained this: | 15:02 |
tobiash | 2018-07-02 15:48:12,789 DEBUG zuul.layout: Variant <Job base-cilib branches: None source: codecraft/zuul-conf-global/zuul.d/jobs.yaml@master#123> matched <Change 0x7f1d52094c50 16,b64a68d2622968dc9947a2002aade1101aa41931> | 15:02 |
tobiash | where the source line indicates that it matched the version of the PR and not master | 15:03 |
tobiash | (the pr version of that repo) | 15:12 |
*** weshay|ruck is now known as weshay | 15:29 | |
*** sshnaidm is now known as sshnaidm|rover | 15:29 | |
corvus | tobiash: we got a report last week that the source contexts of our base jobs are wrong. eg: http://logs.openstack.org/59/579859/3/check/tox-py35/7108831/zuul-info/inventory.yaml | 15:34 |
corvus | tobiash: 'base' is not defined in zuul-jobs.yaml, it's in jobs.yaml | 15:35 |
corvus | tobiash: but so far, i haven't seen an indication that the jobs themselves are affected. this may be related, or it may be that you've seen two separate problems. | 15:36 |
corvus | oh, i guess it's job.start_mark, not the source context that gives us those line numbers | 15:42 |
corvus | er, no it's both. the line number comes from the start_mark, but the rest comes from the source_context | 15:46 |
tobiash | corvus: oh, that's interesting information | 15:56 |
tobiash | corvus: because at least in our case I'm pretty sure that the source context was correct (in terms of line number) but the config itself wasn't | 15:57 |
tobiash | corvus: that might indicate these issues are the same | 15:58 |
corvus | tobiash: it looks like the errors we're seeing are that the line number is correct (start_mark) but the filename is wrong (source_context) | 15:59 |
openstackgerrit | Logan V proposed openstack-infra/zuul-jobs master: bindep: Ensure virtualenv is present https://review.openstack.org/579906 | 16:00 |
tobiash | oh, ok, that seems different | 16:00 |
tobiash | corvus: it would be interesting to see if your source context changes after a full reconfiguration | 16:01 |
corvus | tobiash: in both cases, we're observing correct line numbers. then in my case, i'm observing an incorrect filename and you're observing incorrect content. right? | 16:02 |
corvus | i'm looking in our logs for instances of the *correct* filenames, and i see some, but far fewer than the incorrect ones. the one i'm looking at right now appears to be for a project-config change which encountered a config error but did not report it because it didn't think the source context matched: https://review.openstack.org/579690 | 16:02 |
tobiash | yes | 16:02 |
corvus | i'm concerned that our config-errors api endpoint is not returning data | 16:09 |
corvus | that is to say, the request has not returned after several minutes | 16:10 |
tobiash | the config-errors api doesn't return? | 16:13 |
tobiash | corvus: is that a new api that needs a scheduler restart? | 16:19 |
*** yolanda has joined #zuul | 16:24 | |
*** elyezer has joined #zuul | 17:11 | |
Shrews | corvus: ah ha. i see the issue from yesterday. the shade api is being used improperly. tl;dr we should not be sending it a dict; we should just call get_image_by_id() instead | 17:14 |
Shrews | corvus: within shade itself, if you send get_image() an *object* that has an 'id' attribute, it assumes that object is already the thing you want | 17:15 |
Shrews | corvus: we send it a dict() assuming it will do the lookup by id. it only does that if the thing you send it "looks like" a uuid | 17:17 |
Shrews | so i think my original fix may do what we intend. validating... | 17:17 |
Shrews | corvus: confirmed, but we must have 'use_direct_get: true' set in our clouds.yaml. | 17:25 |
Shrews | otherwise it does a list search as normal | 17:25 |
mordred | Shrews: yes - and we don't want use_direct_get: true in our clouds.yaml because batching | 17:29 |
Shrews | and we do not set that clouds option. might be better to just change nodepool to call get_image_by_id directly | 17:29 |
Shrews | mordred: so much batching | 17:30 |
mordred | Shrews: fwiw, I would expect get_image to work with a dict with an id key - I feel like I fixed that as a bug at some point recently | 17:30 |
Shrews | mordred: it does not | 17:30 |
Shrews | mordred: hasattr does not work on a plain dict | 17:30 |
mordred | like, get_image(dict(id='asdf')) should return the dict you pass in without making any remote calls | 17:30 |
mordred | so that's a bug | 17:30 |
Shrews | mordred: i don't think we want that though | 17:30 |
mordred | what are we looking for? I may be missing context | 17:31 |
Shrews | what good is that dict() to the return code? | 17:31 |
mordred | depends on what we're trying to do? | 17:31 |
Shrews | mordred: that code is an "optimization" saying "you've already sent me the thing you're looking for" | 17:31 |
mordred | yah | 17:31 |
mordred | that is true | 17:32 |
Shrews | if we send it a full Image() object, great | 17:32 |
Shrews | if we send it an empty dict(), we don't have the full object | 17:32 |
Shrews | so returning it back is useless | 17:32 |
mordred | well - it depends on where it is in the flow - but I agree with you that if what you are looking for is the full object and you have an id, then returning a dict with only the id is typically useless | 17:33 |
Shrews | mordred: i feel like that's only useful within shade itself. for users of shade, they need use_direct_get | 17:33 |
*** electrofelix has quit IRC | 17:34 | |
mordred | it's useful in other places - create_server('foo', image=dict(id='asdf')) is useful if you know you have an id and you don't want the create_server call to do a lookup for you to find the image id to pass to nova | 17:35 |
*** jpena is now known as jpena|off | 17:35 | |
mordred | but I'm not sure what use_direct_get has to do with it? use_direct_get just controls GET /images | local_filter vs GET /images/{id} ... but I'm totally jumping in half-way and am not sure the problem you're working on solving? | 17:36 |
* mordred is likely being unhelpful | 17:36 | |
Shrews | mordred: actually, for the nodepool use, it would work as you describe if we did just return the dict()... it only wants the image id | 17:37 |
mordred | Shrews: ah - https://review.openstack.org/#/c/579664 right? | 17:37 |
Shrews | mordred: yeah | 17:37 |
mordred | yah. the external property is returning something suitable for passing to create_server without an extra lookup | 17:38 |
mordred | well - it would be if the dict thing was working | 17:38 |
Shrews | yeah | 17:38 |
mordred | you could work around it in nodepool for the moment by returning a munch instead of a dict from that method | 17:38 |
mordred | but I do think we should fix the code in shade/sdk | 17:39 |
Shrews | mordred: _get_entity() would be the thing to fix in shade. but yeah, we can just pass it an object instead of dict in nodepool | 17:39 |
mordred | yup to both | 17:40 |
Shrews | no need to get munch involved as a dependency | 17:40 |
mordred | Shrews: something like this: | 17:40 |
mordred | if (hasattr(name_or_id, 'id') | 17:41 |
mordred | or (isinstance(name_or_id, dict) and 'id' in name_or_id)): | 17:41 |
mordred | right? | 17:41 |
Shrews | mordred: exactly | 17:41 |
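A self-contained sketch of the check mordred spells out above; this is a stand-in for the shade helper being discussed, not its actual implementation.

```python
def get_entity(name_or_id, lookup):
    """Return name_or_id unchanged if it already looks like a resolved resource.

    An object with an 'id' attribute, or (with the proposed fix) a dict
    containing an 'id' key, is treated as already resolved; anything else,
    such as a bare name or UUID string, is passed to the lookup callable.
    """
    if (hasattr(name_or_id, 'id')
            or (isinstance(name_or_id, dict) and 'id' in name_or_id)):
        return name_or_id
    return lookup(name_or_id)
```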
Shrews | i might just have nodepool call get_image_by_id() directly. this whole passing a dict() vs. str stuff is a wonky API that causes confusion | 17:43 |
mordred | Shrews: just do it at the top and cache the result? | 17:45 |
Shrews | oh don't even need that call, actually | 17:46 |
*** gtema has quit IRC | 17:48 | |
mordred | Shrews: well, you do if caching isn't turned on - if you pass that image-id into create_server's image argument without wrapping it in a dict, you lose the fact that it's an id and create_server will helpfully do a roundtrip it doesn't need | 17:49 |
Shrews | bah | 17:49 |
mordred | yah | 17:50 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Tolerate missing project https://review.openstack.org/579872 | 17:50 |
Shrews | mordred: i mean, i was going to change that too | 17:50 |
Shrews | basically steal your _is_uuid_like() code from shade | 17:51 |
mordred | Shrews: well ... you can't quite do that though | 17:51 |
mordred | Shrews: glance does not enforce that image ids are uuids iirc | 17:52 |
Shrews | mordred: so shade would be similarly broken | 17:52 |
mordred | I think your idea of doing a get_image_by_id call is a good one - as long as you do it at a $time when you can cache the results | 17:53 |
Shrews | mordred: that's not a thing we need to do though. we just need the id (which we already have) | 17:54 |
Shrews | we're just using shade as a way to say "yep, you have an id already" | 17:54 |
Shrews | we could do that ourselves | 17:54 |
mordred | right - but you don't need is_uuid_like for that | 17:54 |
Shrews | true | 17:55 |
mordred | we know, because the config setting is 'image-id' | 17:55 |
Shrews | we can get that based... yah | 17:55 |
mordred | Shrews: do you want to stab me yet? | 17:55 |
Shrews | mordred: nope, i got to that same place, just microseconds behind you | 17:56 |
mordred | Shrews: darn. I'll try harder next time | 17:56 |
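A hypothetical sketch of the nodepool-side approach being converged on: since the config option is image-id, the provider already knows it holds an ID and can wrap it in a dict so create_server skips the lookup round-trip. Names are illustrative, not nodepool's actual classes.

```python
class CloudImage:
    """Illustrative stand-in for a provider cloud-image config entry."""

    def __init__(self, image_id=None, image_name=None):
        self.image_id = image_id
        self.image_name = image_name

    @property
    def external(self):
        # When configured by ID, hand create_server a dict wrapping the ID so
        # it treats the image as already resolved; a bare string would trigger
        # a name/ID lookup round-trip.
        if self.image_id:
            return dict(id=self.image_id)
        return self.image_name
```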
corvus | tobiash: yep. we somehow restarted between the change adding support for loading with config errors and the change which exposed them through the api. which means we're blind. :( | 18:05 |
tobiash | when is your next scheduler restart planned? | 18:10 |
corvus | tobiash: i'll do it in about 2 hours since i think i need that in order to continue looking at this bug (or bugs) | 18:10 |
*** yolanda_ has joined #zuul | 18:18 | |
tobiash | corvus: what do you think about limiting the history of the repos on the mergers and executors? | 18:19 |
corvus | tobiash: we should remove the zuul refs and let git gc take care of it | 18:20 |
tobiash | one of my users wants to add a repo that has 2.5 million commits and takes 3gb and 25 minutes to clone (most of the time spent resolving deltas) | 18:20 |
corvus | oh, like shallow clones? | 18:20 |
tobiash | yes | 18:20 |
*** yolanda has quit IRC | 18:20 | |
tobiash | currently the merger has a hard coded limit of 5 minutes for clone operations | 18:21 |
tobiash | that repo cannot currently be handled with zuul | 18:21 |
tobiash | and tbh I don't want to raise that timeout to 30 minutes | 18:21 |
corvus | tobiash: it's part of the design that the repo that appears in the job is what you would get if you clone. without the full repo on the executor, that won't happen. | 18:21 |
corvus | tobiash: this will take some thought. i have to run now though. | 18:22 |
tobiash | ok | 18:22 |
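A minimal sketch of the shallow-clone option tobiash is floating, assuming GitPython on the merger side; it is illustrative only, and as corvus notes it conflicts with the design guarantee that the repo a job sees matches what a full clone would give you.

```python
import git

def clone_repo(url, dest, shallow_depth=None):
    # Hypothetical per-repo knob: limit history depth so a huge repo
    # (e.g. millions of commits) fits inside the merger's clone timeout,
    # at the cost of truncated history on the executor.
    kwargs = {}
    if shallow_depth:
        kwargs['depth'] = shallow_depth
    return git.Repo.clone_from(url, dest, **kwargs)
```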
*** yolanda__ has joined #zuul | 18:27 | |
*** yolanda_ has quit IRC | 18:30 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add test for referencing cloud image by ID https://review.openstack.org/579702 | 19:33 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Fix for referencing cloud image by ID https://review.openstack.org/579664 | 19:33 |
Shrews | corvus: mordred: verified 579702 using my personal vexxhost account so appears to fix the problem | 19:43 |
corvus | Shrews: that's the one that just adds the test? | 19:44 |
Shrews | corvus: oh, the other one | 19:44 |
mordred | Shrews: it fixed it with current shade? | 19:45 |
Shrews | mordred: yes | 19:45 |
Shrews | mordred: i can put up the shade/sdk fix next if you haven't already | 19:46 |
mordred | Shrews: I think I :q!-ed it ... so if you've got it, awesome :) | 19:46 |
corvus | Shrews: so the createServer part worked, but not getImage? | 19:47 |
Shrews | corvus: right | 19:48 |
corvus | okay i think that all makes sense to me now :) | 19:49 |
Shrews | corvus: the labelReady change fixes the problem reported in storyboard. the handler changes just copy the pattern | 19:49 |
Shrews | now, who can i send my vexxhost bill to?? ;) | 19:51 |
*** AJaeger has joined #zuul | 20:00 | |
*** elyezer has quit IRC | 20:02 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP add repl https://review.openstack.org/579962 | 20:08 |
mordred | Shrews: I always just bug mnaser ^^ | 20:08 |
mnaser | hi | 20:09 |
mnaser | everytime i see a highlight in #zuul i assume javascript | 20:09 |
mordred | "vexxhost: used to diagnose, test and fix all shade/openstacksdk issues" | 20:09 |
mordred | mnaser: hehe | 20:09 |
Shrews | mordred: mnaser: it's all good. probably only cost me a few cents | 20:12 |
*** yolanda_ has joined #zuul | 20:14 | |
mnaser | i always worry that y'all test against our stuff | 20:14 |
mnaser | and we have something broken somehow | 20:14 |
mnaser | haha | 20:14 |
Shrews | mnaser: we test against yours b/c it works the best | 20:15 |
mnaser | woo | 20:15 |
Shrews | i mean, who wants to find MORE bugs while fixing bugs? | 20:15 |
mnaser | i have 11 hours of planes tomorrow so maybe dashboard hacking | 20:16 |
*** yolanda has joined #zuul | 20:16 | |
*** yolanda__ has quit IRC | 20:18 | |
*** yolanda_ has quit IRC | 20:18 | |
*** yolanda_ has joined #zuul | 20:19 | |
*** yolanda has quit IRC | 20:22 | |
pabelanger | finger urls :) | 20:23 |
*** yolanda__ has joined #zuul | 20:24 | |
mnaser | oooh | 20:24 |
mnaser | that might be a fun one | 20:24 |
*** yolanda_ has quit IRC | 20:24 | |
corvus | tobiash: okay, restarted and confirmed our broken config is running; that explains some of the behavior i was seeing earlier, and confirms that there's a bug somewhere that caused us not to report on a breaking config error. i strongly suspect it's related to the source_context not being correct (since that would cause zuul to suppress a config error report) | 20:26 |
corvus | i will try to confirm that next, and continue to follow leads to try to find a root cause for that, also with the hope that there's a suggestion as to how that could run the wrong content | 20:27 |
corvus | since the restart, *only* the wrong path has shown up in the logs. never the correct path. i find that interesting and surprising. but also, it gives me hope. :) | 20:27 |
mordred | mnaser: also - check out the mailing list message from tristanC and my response to it ... there's a "fun" task related to setting up angular routing differently based on the results of api/info | 20:28 |
mordred | mnaser: I'm going to take a stab at it - but if you got bored and did it instead I wouldn't complain :) | 20:28 |
mordred | although also pabelanger's finger urls would be pretty awesome | 20:28 |
tobiash | corvus: that's really surprising. I thought it would be correct after the restart | 20:29 |
corvus | me too. but maybe it means it's more reproducible in our env | 20:30 |
tobiash | corvus: is that repl thing something you want to build into zuul permanently or is it just for debugging now | 20:31 |
corvus | tobiash: if we're going to merge it, i need to check with the author of some of that code about licensing. | 20:33 |
tobiash | that would be an interesting and powerful tool for debugging | 20:33 |
corvus | it is. it's also very dangerous and should be disabled by default | 20:33 |
corvus | (like, telnet in and ask zuul for decrypted secrets) | 20:33 |
tobiash | Agreed :) | 20:33 |
corvus | okay i went ahead and fired off an email about licensing | 20:38 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: Break Zuul's config https://review.openstack.org/579986 | 21:00 |
corvus | tobiash: well, i have found one problem, which is how we ended up breaking our config. it's because the load-with-broken-config system allows us to remove a job that's still in use in another project. | 21:11 |
corvus | fbo: ^ | 21:11 |
corvus | tobiash: unfortunately, that suggests that it's not related to the source_context being wrong. and even less to the issue you saw of running with the wrong content. | 21:12 |
corvus | fbo: all of openstack's projects are gated by zuul, but we still broke zuul's config by removing a job which was still in use in another repo. | 21:13 |
corvus | fbo: that's because if you remove a job, and it's not in use in the repo in which it's defined, then the only error messages are the ones from the other repos which use it. since they aren't the current repo of the change, they are suppressed, so we don't report them to the user. even though they really are caused by the current change. | 21:14 |
corvus | i wonder if we keyed all the errors by the source_context+start_mark (project+branch+file+line) and said if any of the new configuration error keys don't appear in the current configuration error keys, we report the error. | 21:19 |
corvus | that would probably uniquely identify errors enough to avoid most cases like this. | 21:19 |
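A rough sketch of that keying idea, assuming each accumulated error carries its source context and start mark; the structures are illustrative, not Zuul's actual error objects.

```python
def error_key(error):
    # Key an error by where it was raised, not by which repo the change lives in.
    ctx, mark = error.source_context, error.start_mark
    return (ctx.project, ctx.branch, ctx.path, mark.line)

def errors_to_report(current_errors, proposed_errors):
    # Report any error whose key does not already exist in the running
    # configuration, even if it comes from a repo other than the change's own.
    known = {error_key(e) for e in current_errors}
    return [e for e in proposed_errors if error_key(e) not in known]
```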
corvus | oh, it also looks like if zuul accumulates too many errors, it won't report on any more errors | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Split test_dynamic_conf_on_broken_config https://review.openstack.org/579996 | 22:20 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Report config errors when removing cross-repo jobs https://review.openstack.org/579997 | 22:20 |
corvus | fbo: ^ | 22:20 |
*** dtruong has quit IRC | 22:30 | |
gundalow | Hi all. Myself and mattclay have been thinking more about softwarefactory-zuul for ansible/ansible. We are considering creating gh/ansible/zuul-config, which will be very generic, like https://github.com/ansible-network/zuul-config. And rather than having gh/ansible/zuul-jobs, we are considering putting all of that in gh/ansible/ansible so everything is versioned together. In the past we've faced issues with other CI frameworks where certain things | 22:56 |
gundalow | have been versioned independently of others. | 22:56 |
clarkb | gundalow: probably one thing to keep in mind when figuring out if things should be split or not is the limitations placed on trusted config projects | 22:57 |
clarkb | gundalow: trusted config projects cannot be tested pre-merge, and if you end up needing to make gh/ansible/ansible trusted then you'd not have self-testing config changes for the subset that could be self-tested | 22:57 |
gundalow | interesting. https://softwarefactory-project.io/r/gitweb?p=config.git;a=blob;f=zuul/ansible_networking.yaml;h=fc7e43ceb305453f5ec7cb3b2f95c9cf5c4f8682;hb=HEAD#l7 contains the config I'm using to test gh/ansible-network, according to that only `ansible-network/zuul-config` is trusted (which defines very little), so I think that's OK | 22:58 |
gundalow | What may cause me to want to make gh/ansible/ansible trusted? | 22:59 |
clarkb | if you've got gh/ansible/zuul-config for that then possibly nothing | 23:00 |
gundalow | woot | 23:00 |
gundalow | Thanks. I'll create a PR and see how it goes | 23:00 |
clarkb | corvus: for identifying errors, the downside to the method used in your change above is that you can introduce a new error in a child change on the same line that would go unreported until the parent error was fixed? | 23:03 |
clarkb | corvus: could we address that by comparing the lines (either directly or via a hash) | 23:03 |
*** pwhalen has quit IRC | 23:06 | |
*** pwhalen has joined #zuul | 23:09 | |
*** pwhalen has joined #zuul | 23:09 | |
clarkb | oh it does hash the line if it is supplied, nevermind | 23:12 |
clarkb | corvus: I left some comments but I don't think they are -1 worthy | 23:58 |
clarkb | mostly thinking out loud about a couple things | 23:58 |