*** jkilpatr has quit IRC | 00:47 | |
*** bhavik1 has joined #zuul | 05:27 | |
*** bhavik1 has quit IRC | 05:31 | |
*** bhavik1 has joined #zuul | 06:21 | |
*** bhavik1 has quit IRC | 06:24 | |
*** yolanda_ has joined #zuul | 07:01 | |
*** hashar has joined #zuul | 08:13 | |
*** bhavik1 has joined #zuul | 09:21 | |
*** bhavik1 has quit IRC | 10:14 | |
*** jkilpatr has joined #zuul | 11:19 | |
*** hashar has quit IRC | 11:42 | |
*** hashar has joined #zuul | 11:51 | |
*** dkranz has quit IRC | 12:32 | |
*** jkilpatr has quit IRC | 13:11 | |
*** jkilpatr has joined #zuul | 13:11 | |
*** jkilpatr has quit IRC | 13:17 | |
*** dkranz has joined #zuul | 13:26 | |
*** jkilpatr has joined #zuul | 13:30 | |
openstackgerrit | Andreas Scheuring proposed openstack-infra/nodepool master: Use POST operations for create resource https://review.openstack.org/480601 | 14:10 |
openstackgerrit | Merged openstack-infra/nodepool master: Remove duplicate python-jenkins code from nodepool https://review.openstack.org/259157 | 14:23 |
mordred | jeblair, clarkb: I'm reviewing AJaeger's mitaka cleanup patches and it made me think about a thing we'll need a good story for in zuul v3 ... | 14:37 |
jeblair | i do like a good story | 14:37 |
mordred | namely - we'll want, at some point, to be able to have either a job or a utility to verify what jobs in the zuul config reference a given image name | 14:37 |
jeblair | mordred: web api may be a good choice for that | 14:38 |
mordred | as once we have .zuul.yaml job configs - configs that reference, say "ubuntu-trusty" would become sad when infra stops building such a node - but it'll be hard for infra to be able to be proactive on upgrades since we won't know | 14:38 |
mordred | jeblair: agree | 14:38 |
mordred | jeblair: I figured we should just capture the use case somewhere so that we remember to implement it before we delete ubuntu-xenial :) | 14:39 |
mordred | jeblair: might also be neat for us to be able to add a "deprecated" flag to an image so that zuul could report use of a deprecated image when it runs jobs using one too | 14:39 |
jeblair | mordred: we can also see when nodepool provides nodes, so we can see what active jobs are using particular image types | 14:41 |
mordred | jeblair: yup | 14:42 |
mordred | jeblair: I think we've got good tools at our disposal | 14:42 |
clarkb | switching behavior to fail instead of queue indefinitely might also help. At least this way you get feedback immediately on any issues which can be corrected | 15:04 |
mordred | clarkb: ++ | 15:34 |
mordred | clarkb: also will be important even for normal cases to catch config errors | 15:35 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add configuration documentation https://review.openstack.org/463328 | 15:49 |
jlk | o/ | 15:58 |
* rcarrillocruz waves | 16:08 | |
rcarrillocruz | hey folks | 16:08 |
rcarrillocruz | playing with multiple networks per node in nodepool | 16:08 |
rcarrillocruz | however i get OpenStackCloudException: Error in creating instance (Inner Exception: Multiple possible networks found, use a Network ID to be more specific. (HTTP 409) (Request-ID: req-bd4ea6d2-9c97-487f-82bd-2b0e0540ff06)) | 16:09 |
rcarrillocruz | not sure if it's because nodepool doesn't know which network to associate floating IP on | 16:09 |
rcarrillocruz | looking at docs i don't see a param to say 'associate floating IP to this network IPs' or something like that | 16:09 |
rcarrillocruz | Shrews: mordred ^ | 16:09 |
jlk | rcarrillocruz: it's probably deeper than that | 16:10 |
jlk | rcarrillocruz: are you specifying which network to use for the private address? | 16:10 |
jlk | OpenStack doesn't really have a server side way to say "this is the default, pick it if somebody doesn't specify" | 16:10 |
clarkb | right the only time there is a default is when you have a single network | 16:11 |
clarkb | then neutron just uses that because there is no other option. If there is more than one network you have to specify which to use | 16:11 |
jlk | rcarrillocruz: I see that message usually when doing the initial boot, and you have more than one network available for the private address. | 16:12 |
rcarrillocruz | but where in the yaml | 16:12 |
rcarrillocruz | in the label? | 16:12 |
rcarrillocruz | i don't see that in the docs | 16:12 |
clarkb | rcarrillocruz: no, under the cloud provider | 16:13 |
clarkb | rcarrillocruz: there are examples in infra's nodepool.yaml | 16:13 |
clarkb | (osic does this iirc) | 16:13 |
clarkb | https://docs.openstack.org/infra/nodepool/configuration.html#provider is where it is documented | 16:15 |
dmsimard | Silly question | 16:17 |
dmsimard | At what point does an RFE become a spec from a story? When someone picks it up and it's ready to be discussed/worked on basically? | 16:18 |
rcarrillocruz | i don't see in prod nodepool providers with multiple networks | 16:18 |
rcarrillocruz | and yes | 16:18 |
rcarrillocruz | that's exactly what i have | 16:18 |
rcarrillocruz | in the provider | 16:18 |
rcarrillocruz | multiple networks defined there | 16:18 |
clarkb | rcarrillocruz: osic | 16:18 |
rcarrillocruz | but nodepool doesn't like it it seems | 16:18 |
jeblair | dmsimard: we usually only write specs if we want to make sure we have agreement on something before starting work on it | 16:18 |
jeblair | dmsimard: we tend to do it for larger or more complex efforts | 16:19 |
clarkb | rcarrillocruz: also tripleo iirc | 16:19 |
dmsimard | jeblair: makes sense, just wanted to make sure | 16:19 |
rcarrillocruz | clarkb: looking at nodepool.openstack.org osic clouds have only one network | 16:21 |
rcarrillocruz | - name: 'GATEWAY_NET_V6' | 16:21 |
clarkb | rcarrillocruz: the provider only has one network but the cloud has multiple. That is the way we select which of the multiple networks we want to use because neutron will not default to one of them for us | 16:21 |
clarkb | rcarrillocruz: that is also a list so you can provide more than one if you want more than one | 16:22 |
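To make that concrete, here is a minimal sketch of the provider-level networks setting clarkb is describing. The cloud and network names are hypothetical and the field names are from the nodepool configuration of that era, so double-check against the documentation linked above:

    # Hypothetical provider entry: pick one of the cloud's several networks
    # explicitly, since neutron will not choose a default for you.
    providers:
      - name: my-cloud
        cloud: my-cloud
        networks:
          - name: 'GATEWAY_NET_V6'   # list form, so more than one is allowed
        images:
          - name: ubuntu-xenial
            min-ram: 8192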
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Reorganize docs into user/admin guide https://review.openstack.org/475928 | 16:24 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use oslosphinx theme https://review.openstack.org/477585 | 16:27 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move tenant_config option to scheduler section https://review.openstack.org/477587 | 16:27 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_expiry to webapp section https://review.openstack.org/477586 | 16:27 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct sample zuul.conf https://review.openstack.org/477588 | 16:27 |
dmsimard | jlk, jeblair: I created a story from a discussion a while back since it was a topic of discussion again in RDO's implementation https://storyboard.openstack.org/#!/story/2001102 | 16:29 |
jlk | dmsimard: can you document the data flow? Where does the whole thing get started from? Seems like you still need an event to wake up the scheduler, which would then use the script as a pipeline filter? or do you just imagine the scheduler running the script every X seconds in a loop, or??? | 16:32 |
dmsimard | jlk: I'm not particularly familiar with zuul internals so it's not trivial for me to express it in proper terms :) | 16:33 |
jlk | ah. | 16:33 |
jlk | but generally? | 16:33 |
dmsimard | jlk: technically, using the term filter would probably be more appropriate than trigger | 16:34 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs https://review.openstack.org/477589 | 16:34 |
dmsimard | at least in our use case, it would be a filter | 16:34 |
jlk | Pipelines are event driven, because basically the system sits idle until some event kicks it into gear. Be it an event from gerrit, or github, or a timer. | 16:34 |
jlk | that event is processed to determine which project it relates to | 16:35 |
jlk | and then the scheduler looks at the event and matches it up to the triggers for pipelines, to complete an event + project + jobs + pipeline mashup | 16:35 |
jlk | when it's doing that, it looks at either event filters ( limitations on the triggers themselves ), or pipeline filters ( higher level restrictions on the pipeline ). | 16:37 |
dmsimard | Right, so perhaps what I'm asking for is something that doesn't exist yet -- the notion of a pipeline filter, and then fold the existing branch/files filtering at the job-level under a proper filter key where more things (such as the arbitrary script I'm speaking of) could be added | 16:37 |
jlk | for instance, a github event may be a comment event of "recheck", but the gate pipeline may have a pipeline filter (requirement) that there be enough votes. So the trigger matches, but the pipeline requirement filter blocks it. | 16:38 |
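Roughly, jlk's example maps onto a pipeline definition like the following sketch. The connection name is made up and the exact github driver keys are assumptions from memory, so treat this as an illustration of trigger-versus-requirement rather than copy-paste config:

    # Sketch only: the trigger matches a "recheck" comment, but the pipeline
    # requirement ("enough votes") can still keep the change out of the pipeline.
    - pipeline:
        name: gate
        manager: dependent
        require:
          my-github:                  # hypothetical connection name
            review:
              - type: approved        # stands in for "enough votes"
        trigger:
          my-github:
            - event: pull_request
              action: comment
              comment: (?i)^\s*recheck\s*$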
jlk | dmsimard: if I understand you, then you still want the initial wake up to be an existing trigger, like a gerrit change or comment | 16:39 |
dmsimard | yes. | 16:39 |
jlk | but then you want as a pipeline requirement the ability to run an arbitrary script to make a Just In Time decision on whether to enqueue or not | 16:39 |
dmsimard | Yes. Ultimately I think our need is for that mechanism to be at the job definition layer (second example in the story) | 16:40 |
dmsimard | But I feel that could slow things down rather quickly | 16:41 |
dmsimard | especially at scale | 16:41 |
jlk | yeah, since that definition can live _in_ the repo in question | 16:41 |
jlk | so it's a clone/merge, re-read configuration | 16:41 |
jlk | yeah eww, that kind of leaks pipeline config (which is supposed to only be from trusted repos) into untrusted repos | 16:42 |
dmsimard | jlk: to re-iterate the context, it's basically a way to do a "freeform" filter | 16:43 |
jlk | where would you picture this script running, with what context? | 16:43 |
dmsimard | jlk: in JJB terms, add a shell builder to decide whether to continue or not very early on in the process :p | 16:43 |
dmsimard | jlk: that last statement was not in reply to your question, hang on | 16:44 |
dmsimard | jlk: thinking outside the box, perhaps I'm looking at this from the wrong angle | 16:45 |
dmsimard | jlk: how about being able to dynamically trigger jobs from a job ? | 16:45 |
dmsimard | the filtering and triggering logic could live inside that job and that job would trigger jobs based on what it needs | 16:46 |
jlk | we have a job hierarchy thing | 16:48 |
jlk | You can have an "early" job on a pipeline, and a number of jobs that "depend" on that job | 16:48 |
jlk | so if your early job succeeds, the others go. If not, they do not | 16:48 |
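In v3 project configuration that hierarchy is expressed with job dependencies; a rough sketch with made-up job names, just to show the shape:

    # Hypothetical jobs: the "early" job gates the others in the same pipeline run.
    - project:
        check:
          jobs:
            - decide-what-to-run
            - integration-newton:
                dependencies:
                  - decide-what-to-run
            - integration-ocata:
                dependencies:
                  - decide-what-to-run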
mordred | dmsimard: can you give an example of what userscript.sh might do? | 16:48 |
jlk | let me see if I can pull up the docs (for v3) on this | 16:49 |
dmsimard | jlk: yeah, but (like discussed last time), this parent job might have any of a dozen jobs to trigger -- for one patch we might have three jobs to trigger, not 12, and for another patch maybe we have 5 | 16:49 |
dmsimard | mordred: curl weather.com and if weather is good trigger job A B C, if weather is bad trigger job C D E ? | 16:50 |
jlk | this is where roles/playbooks come in | 16:50 |
mordred | dmsimard: yah - also what jlk says about hierarchies may be helpful - but a few things are quite different in v3, so it's possible that the things you're wanting to accomplish are structued differently | 16:50 |
mordred | dmsimard: I mean - can you give me a non-invented example? | 16:50 |
mordred | dmsimard: (trying to wrap my head around the use case and I think a real example problem would help a lot here) | 16:51 |
rcarrillocruz | the issue was that i had networks section on provider | 16:51 |
rcarrillocruz | and cos i have a pool | 16:51 |
rcarrillocruz | it had to be put in that level | 16:51 |
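For the record, a sketch of the layout rcarrillocruz landed on: in the pool-based provider config, networks sit under the pool rather than on the provider itself. All names below are hypothetical and the label fields are only indicative:

    providers:
      - name: my-cloud
        cloud: my-cloud
        pools:
          - name: main
            max-servers: 10
            networks:
              - my-private-net        # selects which of the cloud's networks to use
            labels:
              - name: ubuntu-xenial
                diskimage: ubuntu-xenial
                min-ram: 8192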
mordred | dmsimard: it's also worth noting that jobs in v3 are in ansible and also support multiple hosts - so it's entirely possible that this is a "this logic should just be in an ansible playbook" case | 16:52 |
dmsimard | mordred: We have this thing in RDO called rdoinfo which is more or less a database that maps upstream git repositories to rpm package names (amongst other things). This project spans all OpenStack releases and the jobs we need to trigger depends on the release that is being modified. | 16:52 |
dmsimard | mordred: https://github.com/redhat-openstack/rdoinfo/blob/master/rdo.yml | 16:52 |
mordred | dmsimard: looking/reading | 16:52 |
mordred | dmsimard: and thanks | 16:52 |
dmsimard | mordred: We've currently hacked together something rather awesome but also pretty ugly where we have a job that knows what jobs to trigger and with what parameters and triggers them remotely (on a remote jenkins instance where generic parameterized jobs exist) and we then poll the jobs until they finish to get their status | 16:53 |
mordred | dmsimard: is that rdo.yml consumed by anything _other_ than routing how builds work? | 16:54 |
dmsimard | mordred: https://review.rdoproject.org/jenkins/job/weirdo-validate-buildsys-tags/98/consoleFull | 16:54 |
dmsimard | mordred: it's consumed by various RDO tooling to decide what is built and how it is built | 16:55 |
dmsimard | mordred: basically, when we, for example, bump upper-constraints https://review.rdoproject.org/r/#/c/7403/3/rdo.yml we'll run integration jobs | 16:56 |
dmsimard | which will build the packages that changed with that patch taken into account and then run integration jobs with those newly built packages | 16:56 |
dmsimard | that particular upper-constraints patch is sort of an easy case because it only touches one release, the problem comes when it spans different releases | 16:57 |
mordred | dmsimard: I mean - I ask because it seems like the problems you're solving with it are the same problems that zuulv3 is trying to solve - which is not to say that the answer for your problem is just already in v3 - but the problem space is SO similar that I think it's going to take us a non-simple amount of time to untangle which bit should grok and understand what | 16:57 |
mordred | dmsimard: which is my way of saying - I think I understand the problem space, at least in an initial way, and I think we'll need to understand a bit more deeply to be able to work with you on an appropriate solution | 16:58 |
dmsimard | mordred: sure -- I'm an end user after all :P I'm pretty sure the use case is useful but can be addressed in different ways. Happy to discuss it further. | 16:59 |
mordred | dmsimard: awesome. I'm certain the use case and problem domain are important - so I definitely want to make sure we find an answer | 17:00 |
*** hashar has quit IRC | 17:02 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname https://review.openstack.org/477592 | 17:38 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section https://review.openstack.org/477591 | 17:38 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile https://review.openstack.org/477590 | 17:38 |
jlk | So I think I need an alternative to curl for throwing json at my zuul | 17:39 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs https://review.openstack.org/477593 | 17:40 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation https://review.openstack.org/479020 | 17:42 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver https://review.openstack.org/477594 | 17:42 |
mordred | jlk: I tend to use python requests in the python repl for things like that | 17:45 |
jeblair | jlk, mordred: ^ that's the docs stack with most comments addressed; starting at 463328 | 17:45 |
mordred | jeblair: woot | 17:45 |
jlk | thanks! | 17:46 |
jeblair | i addressed comments without performing a rebase for easier diffing | 17:46 |
jeblair | i'm going to perform the rebase pass now, and then there are a couple things to change after the rebase (using the new config getter) | 17:46 |
mordred | ++ | 17:47 |
jeblair | then let's land it and be done :) | 17:47 |
jlk | I'll trade you for reviews on depends-on and fixing reports on push events :D | 17:47 |
mordred | jeblair: agree. I think it'll be much more useful for us to have docs and iterate on them as we find things | 17:47 |
jeblair | jlk: deal | 17:47 |
jlk | mordred: okay, doing it with requests in a python3 interpreter worked, something is just screwy with my curl binary it seems :( | 18:00 |
mordred | jlk: well- that's "good news" | 18:01 |
jlk | seeing if brew has a newer curl | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs https://review.openstack.org/477593 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname https://review.openstack.org/477592 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver https://review.openstack.org/477594 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use oslosphinx theme https://review.openstack.org/477585 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move tenant_config option to scheduler section https://review.openstack.org/477587 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add configuration documentation https://review.openstack.org/463328 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_expiry to webapp section https://review.openstack.org/477586 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs https://review.openstack.org/477589 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct sample zuul.conf https://review.openstack.org/477588 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section https://review.openstack.org/477591 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile https://review.openstack.org/477590 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation https://review.openstack.org/479020 | 18:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Reorganize docs into user/admin guide https://review.openstack.org/475928 | 18:01 |
jeblair | mordred, jlk: ^ that's the rebase -- the remaining suggestions about using get_config were actually the conflicts, so i had to fix those during the rebase. should mean the stack is ready now. | 18:02 |
jlk | oooh weird. It worked to get the event in, but then it hit an error somewhere? | 18:04 |
jlk | zuul-scheduler_1 | DEBUG:paste.httpserver.ThreadPool:Added task (0 tasks queued) | 18:05 |
jlk | then a 400 bad request | 18:05 |
jlk | or... or I did something weird? I don't know. | 18:05 |
mordred | jlk: it's going to make an outbound call to github to fill in some data, right? | 18:05 |
jlk | it did all of that | 18:05 |
mordred | jlk: is the data in your payload data that it can effectively make outbound calls for? | 18:05 |
mordred | oh. weird | 18:06 |
jlk | I'm going to repeat and see if I'm really seeing this or not | 18:06 |
mordred | kk | 18:06 |
mordred | jeblair: https://review.openstack.org/#/c/478265/ is ready btw | 18:06 |
mordred | jeblair: or, I say that- lemme go ahead and add a zuul patch depending on it real quick | 18:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Run the new fancy zuul-jobs versions of jobs https://review.openstack.org/480692 | 18:08 |
mordred | jeblair: let's see how badly that breaks :) | 18:08 |
mordred | oh. piddle | 18:08 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Run the new fancy zuul-jobs versions of jobs https://review.openstack.org/480692 | 18:08 |
jlk | okay I can't repeat the error when using requests. | 18:09 |
mordred | jlk: when you're curling - are you doing: -H "Content-Type: application/json; charset=UTF-8" ? | 18:12 |
jlk | curl -H "Content-Type: application/json" -H "X-Github-Event: issue_comment" -X POST -d @pull-comment.json http://localhost:8001/connection/github/payload | 18:12 |
jlk | which worked when the recipient was python2 based, but fails with python3 | 18:13 |
jlk | now I"m wondering if the saved .json file is just in a format that python3 won't like | 18:13 |
mordred | jlk: does adding "; charset=UTF-8" to your content type header have any effect? | 18:14 |
* mordred grasping at straws to explain | 18:14 | |
mordred | but I'm honestly curious to know what the problem actually is | 18:14 |
jlk | it hasn't, no. Because that doesn't change /how/ curl sends things, it just adjusts what it tells the other end | 18:14 |
jlk | I'm going to do something more fun, curl from a Fedora docker. | 18:15 |
mordred | jlk: awesome | 18:16 |
mordred | jlk: also, while you're grasping at straws: python3 that: http://paste.openstack.org/show/614493/ | 18:19 |
jlk | nod, I went through the dance of writing the file as utf8 via vim | 18:20 |
mordred | :) | 18:20 |
jeblair | hrm, looks like there's a problem in the docs stack about halfway down; looking | 18:21 |
jlk | oh d'oh. Gotta link my docker networks together. | 18:21 |
jlk | nope, that didn't do it. | 18:24 |
mordred | jlk: like - it still broke? | 18:27 |
jlk | yeah, I re-wrote the file using your snippet and used curl in Fedora docker to send the file up | 18:27 |
jlk | still get the immediate traceback | 18:27 |
jlk | ugh, and using --data-binary instead of just -d didn't work either | 18:30 |
mordred | jlk: that's super crazy | 18:31 |
mordred | jeblair: notaminusone - you added mention of allow-secrets in the secrets section - but it's not actually listed in the pipeline config section | 18:32 |
SpamapS | jlk: and your only log in zuul-scheduler is 400? | 18:37 |
jlk | SpamapS: no, it's a traceback out of webob | 18:37 |
jlk | zuul-scheduler_1 | TypeError: a bytes-like object is required, not 'str' | 18:37 |
jlk | looks like if I do the json manually in-line with curl it doesn't choke | 18:37 |
jlk | so it's something about how curl is reading it from the filesystem | 18:37 |
SpamapS | weird | 18:38 |
jlk | very | 18:40 |
jlk | Guess I'll just build myself a tiny python script to do this. | 18:43 |
jlk | Still no idea where the bug is, but my little utility is working fine. | 19:10 |
jlk | https://github.com/j2sol/z8s/blob/master/sendit.py | 19:11 |
dmsimard | In the context of optimizing resources (for higher concurrency rather than speed), how would you deal with more than one flavor properly inside a nodepool tenant ? | 19:25 |
dmsimard | Just set max-servers for each node type in a ratio that makes sense ? | 19:25 |
clarkb | dmsimard: currently max-servers is per provider so you could do that but would have to split up logical providers per resource | 19:26 |
dmsimard | I know upstream doesn't really deal with this since the flavor is uniform with 8vcpu/8gb ram etc | 19:26 |
clarkb | s/resource/flavor/ | 19:26 |
clarkb | we've done the logical provider max-servers mapping onto cloud resources in the past to limit the number of instances per network/router so it does work but not sure how usable it is on the flavor side | 19:27 |
dmsimard | clarkb: oh you're right max-servers is at the provider level, I was thinking of min-ready | 19:27 |
mordred | jeblair: I have +2'd the first three, but have left comments on them too | 19:27 |
dmsimard | clarkb: I guess what I'm looking for is a "max-servers" at the label layer or something. | 19:28 |
clarkb | dmsimard: ya you can do that with the logical provider hack | 19:29 |
clarkb | dmsimard: we did it with hpcloud to control how many instances ended up on each network/router | 19:29 |
dmsimard | clarkb: does an example of this exist somewhere ? | 19:30 |
clarkb | dmsimard: in the way way back history of our nodepool.yaml from when hpcloud was in it | 19:30 |
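The "logical provider" hack clarkb is referring to is simply listing the same cloud twice under different provider names, each with its own max-servers; a sketch with made-up names and numbers:

    # Same cloud, two logical providers, so each gets its own max-servers cap.
    providers:
      - name: mycloud-small
        cloud: mycloud
        max-servers: 80
        images:
          - name: small-node
            min-ram: 4096
      - name: mycloud-large
        cloud: mycloud
        max-servers: 20
        images:
          - name: large-node
            min-ram: 16384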
dmsimard | ok, I'll try and look when I have a chance. Thanks :) | 19:30 |
jeblair | dmsimard: remember all this is different in nodepool v3 because allocation is handled differently. things should work better with much less tuning; the only thing that might warrant tuning is min-ready. | 19:40 |
jeblair | mordred: thanks; i'll add another patch for allow-secrets | 19:41 |
jeblair | jlk: feel free to stick it in zuul/tools if it's generally useful | 19:41 |
clarkb | jeblair: max-servers is still problematic (tobiash ran into this) because you can boot 50 "large" instances which could put you over cpu or memory or disk etc quota then try to boot 1 "small" | 19:42 |
clarkb | instance and nodepool thinks it is ok because max servers is say 100 | 19:43 |
clarkb | this is a general problem in nodepool reducing multi dimensional quota to a single factor | 19:43 |
jeblair | clarkb: ah yes. we do need to rid ourselves of max-servers. hopefully we can figure out how to use openstack's quota api :) | 19:43 |
mordred | jeblair: we support it shade-side | 19:44 |
jeblair | (though we probably want to keep supporting max-servers but also add max-ram and max-cpu) | 19:44 |
mordred | jeblair: the biggest problem is that sometimes the clouds lie/don't report it :( | 19:44 |
jeblair | for folks who may have higher quota than they want to use/pay for | 19:44 |
clarkb | yes ^ | 19:44 |
mordred | jeblair: BUT - I think if we just allowed people to set max-ram / max-cpu / etc like you suggest | 19:44 |
clarkb | but also the simplification isn't all bad, openstack quotas are a mess and we mostly don't worry about them because we simplified | 19:45 |
mordred | we could get close enough - the info is provided in the flavors | 19:45 |
jeblair | mordred: do you think the quota api is useful enough for us to do both? ie, if none of max-{ram,cpu,servers} is unlimited in the nodepool config, can we automatically query openstack to find out what the max is? or is that dangerous enough we need to make it opt-in or opt-out? | 19:45 |
mordred | jeblair: I think we could totally query the quota api - and then provide override values for folks with non-sane quota apis | 19:46 |
mordred | jeblair: or for folks who want to use less than the entire quota on their project | 19:46 |
jeblair | mordred: that seems like a reasonable approach. yeah, i guess it works for both cases. | 19:46 |
mordred | but I also agree - max-servers for the simple case seems like a nice thing to keep | 19:47 |
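A sketch of the shade-side query mordred mentions, with configured max-* values acting as overrides for clouds that misreport quota. The normalized field names in the returned limits are assumptions and may differ between shade versions:

    import shade

    def effective_limits(cloud_name, overrides):
        """Return (max_servers, max_cores, max_ram_mb), preferring explicit
        nodepool config values over whatever the cloud's quota API reports."""
        cloud = shade.openstack_cloud(cloud=cloud_name)
        limits = cloud.get_compute_limits()
        # Key names here are assumed; some clouds lie about or omit these values.
        return (
            overrides.get('max-servers') or limits['max_total_instances'],
            overrides.get('max-cores') or limits['max_total_cores'],
            overrides.get('max-ram') or limits['max_total_ram_size'],
        )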
clarkb | granted any resource limitation tool with 20 axes is going to be a complicated mess :) | 19:48 |
clarkb | instances, cpu, ram, disk, ports, floating ips, networks, subnets, routers, volumes, volume size, volume aggregate, and I am sure I am forgetting a bunch | 19:48 |
clarkb | oh volumes per instance | 19:49 |
clarkb | clouds should just be free and infinite | 19:49 |
jeblair | heh, real clouds pretty much are. i think the metaphor just broke. | 19:49 |
*** hashar has joined #zuul | 19:49 | |
mordred | ++ | 19:50 |
mordred | I think maybe let's start with instances, cpu, ram and maybe disk - since those are all reported in flavors | 19:51 |
mordred | and we can consider fips and volumes later | 19:51 |
jeblair | "we talkin' cloud like seattle-cloud? or yuma-cloud? 'cause they ain't the same thing." | 19:52 |
mordred | heh | 19:54 |
mordred | jeblair: feature request related to "always run post jobs" | 19:58 |
jeblair | wow. in a very morissettian development -- the "cloud museum" is 10 miles from yuma -- the least cloudy city in america: http://cloudmuseum.dynamitedave.com/ | 19:58 |
mordred | jeblair: if a job hits retry_limit - it sure would be nice to get the logs from the last failure | 19:58 |
jeblair | mordred: i think that should happen | 19:59 |
jeblair | did we land that change yet? | 19:59 |
clarkb | wouldn't you get them under $uuid? | 19:59 |
clarkb | so it is just a matter of reporting where they can be found? | 19:59 |
jeblair | clarkb, mordred: yes, logs from all the attempts should be stored, then i would expect the final log url to be used in the report | 20:00 |
mordred | ah- that may be it - https://review.openstack.org/#/c/480692/ ran new jobs - but no url | 20:00 |
jeblair | huh, wonder why no url? | 20:01 |
jeblair | at any rate, i suspect *that's* the bug here | 20:01 |
mordred | http://logs.openstack.org/92/480692/2/check/e2e1435/job-output.txt | 20:02 |
mordred | yes - output was logged and uploaded | 20:02 |
jeblair | okay, i've got 2 things on my stack right now; i can look at that a bit later if someone doesn't beat me to it | 20:02 |
jlk | 'retry_limit' is our version of Nova's "no valid host found". | 20:07 |
jeblair | jlk: :( yes | 20:07 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs https://review.openstack.org/477589 | 20:12 |
* jlk lunches | 20:15 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs https://review.openstack.org/477593 | 20:15 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname https://review.openstack.org/477592 | 20:15 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation https://review.openstack.org/479020 | 20:15 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver https://review.openstack.org/477594 | 20:15 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section https://review.openstack.org/477591 | 20:15 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile https://review.openstack.org/477590 | 20:15 |
tobiash | hi, regarding quotas | 20:18 |
tobiash | what if we let nodepool just run into quota and pause allocation for a while if it gets an quota related error? | 20:19 |
clarkb | tobiash: I am not sure you can reliably detect quota errors? | 20:19 |
tobiash | my nodepoolv2 works quite nice when running into the quota | 20:19 |
clarkb | things will fail but who knows why? that is something that is testable though | 20:20 |
tobiash | my nodepoolv3 not so nice... | 20:20 |
tobiash | clarkb: at least the logs say something about quota in the exceptions | 20:20 |
tobiash | so I think it should be detectable (not sure how easy) | 20:20 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add docs on allow-secrets https://review.openstack.org/480726 | 20:21 |
tobiash | letting nodepool run into quota gracefully would also reduce the need for synchronization of several nodepools in the same tenant | 20:26 |
clarkb | tobiash: I think the intent is that that already happens because the next provider should attempt to fulfill the request. The missing bit is some sort of coordination to back off once you hit the limit so that you aren't constantly trying when you know you are likely to fail? | 20:27 |
jeblair | clarkb: the current algorithm assumes that it knows quota; so i think we either need to facilitate the provider "knowing quota correctly" (whether that's api calls and/or config), or change the algorithm to accommodate what tobiash is suggesting | 20:28 |
jeblair | of course with multiple nodepools in the same tenant, things get a little hairy -- you can, but probably don't want to, solve that with config (set each to 1/2 quota? yuck) | 20:29 |
tobiash | jeblair: I think running gracefully into quota could be easier and more robust (considering several nodepools running in the same tenant) | 20:29 |
jeblair | api calls can improve the situation, but only if you make them before each request, and i wasn't thinking of doing that (that would be a big hit for us) | 20:30 |
jeblair | so gracefully handling quota errors is probably the best solution to that particular case (and happens to help with some others as well) | 20:30 |
tobiash | nodepoolv2 had no problem running into quota (apart from stressing the openstack api, but that could be solved) | 20:30 |
tobiash | with nodepoolv3 and running into quota I get NODE_FAILURE errors as job results | 20:31 |
jeblair | tobiash: handling it gracefully gets fairly complicated though. what if nodepool-A is never able to create the nodes it needs because of nodepool-B? | 20:31 |
tobiash | jeblair: at some point in time it will | 20:32 |
jeblair | in other words, the algorithm handles starvation very well currently, but only if it's the only occupant in the tenant (or, at least, its max-servers has been set up so that it doesn't run into other occupants) | 20:33 |
tobiash | jeblair: but I think if it runs into a quota error it should behave as if it never had tried to fulfill the node request | 20:33 |
*** dkranz has quit IRC | 20:34 | |
tobiash | jeblair: could nodepool just unlock the node request in this case (without decline) and retry (or let another nodepool retry) the request? | 20:34 |
jeblair | tobiash: yeah, that's the minimal necessary modification to the algorithm to handle that. it's probably the best thing to do. but i do think we need to make sure we know the caveat that in the case where nodepool can not utilize its full expected quota, starvation can result. | 20:34 |
jeblair | tobiash: yep | 20:35 |
jeblair | tobiash: http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#proposed-change | 20:37 |
tobiash | jeblair: so what kind of starvation do you think of (assuming both nodepools are working for the same zuul)? | 20:37 |
jeblair | tobiash: actually, the best change might be to remain in step 4 until it can be satisfied (in other words, continue trying to create new servers for step 4 despite quota errors). | 20:38 |
jeblair | tobiash: (and also treating a quota failure in step 5 as transitioning that to a step 4 request). | 20:39 |
tobiash | jeblair: I think that could block the job until the quota errors are gone even if another provider in a different tenant could satisfy it? | 20:40 |
tobiash | jeblair: I think I didn't understand some other part of this regarding step 3 | 20:41 |
tobiash | jeblair: scenario: one nodepool, request > quota -> declined -> node allocation failed -> job fails with NODE_FAILURE | 20:42 |
tobiash | jeblair: is my assumption correct? | 20:42 |
jeblair | tobiash: correct. to clarify: "request > quota" doesn't mean "request > available_nodes" -- it truly means "my quota is set at 5 nodes and this is a request for 10 nodes, so i can never satisfy it under any conditions" | 20:43 |
tobiash | jeblair: ah, misunderstood that then | 20:44 |
jeblair | tobiash: the algorithm will actually work well, even with two nodepools, if it knows the correct value of "available nodes". it's just that it can't do so without extra api calls. but we can approximate that by treating a surprise out-of-quota error as if we were in step 4 and were just waiting on available nodes (regardless of whether we were in step 4 or step 5) | 20:46 |
tobiash | jeblair: ok, that would work I think | 20:46 |
jeblair | tobiash: and yes, as soon as you handle a request in step 4, you may not be running optimally, because another provider might be able to handle that request. *however*, that other provider *will* handle the next request. | 20:47 |
jeblair | tobiash: the occasional delay in fulfilling a single request when a provider hits its quota limit is the price we pay in exchange for not having to communicate between providers. | 20:47 |
tobiash | jeblair: that was the second thing I observed | 20:48 |
jeblair | tobiash: we can, in the future, extend the algorithm to communicate information between providers (we are using zookeeper after all). but we wanted to keep things simple for this first attempt. | 20:48 |
jeblair | tobiash: (and even if we do that, i doubt we would communicate between different nodepool instances sharing a tenant, or even nodepool sharing with something else, so we may still need to keep some of the step-4 behavior) | 20:49 |
tobiash | jeblair: scenario: provider1 almost at quota, provider2 with free capacity, provider1 grabs node request, blocking due to running into quota (max_servers) where provider2 would have been able to serve the request with already existing nodes | 20:49 |
jeblair | (basically, the change would be: step 3.5: if request > available nodes for this provider and request < available nodes for another provider, skip) | 20:50 |
jeblair | tobiash: correct | 20:50 |
tobiash | jeblair: that would do | 20:51 |
jeblair | tobiash: i'd follow your scenario up with: provider2 grabs next node request and continues, while provider1 stops handling further requests | 20:51 |
tobiash | jeblair: maybe it also makes sense to skip instead of block when running into quota (regardless of whether it's surprising or calculated) | 20:51 |
jeblair | tobiash: if we always skip and we don't have the extra information about other providers, then the request sits indefinitely. | 20:52 |
jeblair | tobiash: that's bad for large requests -- smaller ones will always starve them out | 20:52 |
jeblair | tobiash: (remember that at load, we're usually only 1 node away from quota, so we would only satisfy one-node requests) | 20:53 |
tobiash | jeblair: right | 20:54 |
tobiash | jeblair: third scenario: provider1 has 0 nodes ready, provider2 has 3 nodes ready, both not at quota | 20:56 |
tobiash | jeblair: what I observed was that often provider1 took the node request spawning a new node where provider2 would have allocated it directly | 20:56 |
tobiash | jeblair: -> job start penalty of a minute | 20:57 |
jeblair | whoopsie | 20:57 |
tobiash | jeblair: so a step 3.5 could be check for ready nodes in different providers and skip to give them a chance | 20:57 |
jeblair | tobiash: that can also probably be solved with something like the step3.5... exactly :) | 20:57 |
tobiash | jeblair: so the plan could be: add 3.5 with ready node check and block in 4 until nodes could be spawned by gracefully handling quota errors? | 21:00 |
tobiash | jeblair: (that would of course be two changes) | 21:01 |
jeblair | tobiash: yeah. i think the change to 4 should happen first, then 3.5 later (because it's introducing a new layer of complexity to the algorithm). | 21:02 |
tobiash | jeblair: agreed, 4 fixes an issue, 3.5 is optimization | 21:03 |
tobiash | jeblair: will try that next week | 21:03 |
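To summarize the plan in pseudocode: handle a surprise quota error in step 4 by waiting instead of declining, and later add a step 3.5 that skips requests another provider could satisfy from ready nodes. Everything below is illustrative only -- none of these names are real nodepool internals:

    def handle_request(provider, request, other_providers):
        # step 3.5 (later optimization): let a provider that already has
        # ready nodes take the request instead of spawning new ones here.
        if not provider.has_ready_nodes(request) and \
                any(p.has_ready_nodes(request) for p in other_providers):
            request.unlock()     # leave it for the other provider, don't decline
            return

        while True:
            try:
                provider.launch_nodes(request)   # steps 4/5
                request.fulfill()
                return
            except QuotaExceeded:
                # Don't decline (that ends in NODE_FAILURE); treat it like
                # waiting on available capacity and try again later.
                provider.wait_for_capacity()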
* tobiash is on a workshop marathon this week | 21:03 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix indentation error in docs https://review.openstack.org/480740 | 21:06 |
mordred | jeblair: ok - I've +2d most of the docs stack - there's one -1 in the middle cause it's code related - for other things there may be comments with the +2 | 21:10 |
mordred | jeblair: also, I think we could deal with the -1 as a followup if you prefer | 21:10 |
jeblair | mordred: ok. as i'm going through this, a lot of your suggestions are good but i'm not going to write all of them. some i will leave for you and others. :) | 21:13 |
jeblair | mordred: i don't want you to think i'm ignoring them, or don't like them. | 21:14 |
jeblair | mordred: just that i want to land this so that we can all take shared ownership of docs. :) | 21:14 |
mordred | jeblair: oh - yah - I mostly just wanted to write them down somewhere so we didn't completely lose them | 21:14 |
mordred | jeblair: I totally agree about landing this as it is | 21:14 |
jeblair | mordred: does "scalable component" -> "scale-out component" help address your concerns? (i'm also adding other words, just wondering if that's a better capsule description) | 21:17 |
mordred | jeblair: I think so? I think scalable component could describe an Oracle database on a very large piece of hardware | 21:18 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Reorganize docs into user/admin guide https://review.openstack.org/475928 | 21:36 |
jeblair | mordred, jlk: ^ updated patch #2 | 21:37 |
jeblair | mordred, jlk: i've replied to your comments on patches 1 and 2 now. | 21:37 |
mordred | ++ | 21:38 |
mordred | let's get these landed - SpamapS, feel like landing a couple of docs patches? | 21:39 |
SpamapS | mordred: I'll start reviewing them now. | 21:44 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs https://review.openstack.org/477593 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname https://review.openstack.org/477592 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix indentation error in docs https://review.openstack.org/480740 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver https://review.openstack.org/477594 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use oslosphinx theme https://review.openstack.org/477585 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move tenant_config option to scheduler section https://review.openstack.org/477587 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_expiry to webapp section https://review.openstack.org/477586 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs https://review.openstack.org/477589 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct sample zuul.conf https://review.openstack.org/477588 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section https://review.openstack.org/477591 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile https://review.openstack.org/477590 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add docs on allow-secrets https://review.openstack.org/480726 | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation https://review.openstack.org/479020 | 21:45 |
mordred | SpamapS: awesome. thanks! in general aiming for fixing stuff in followups where possible, fwiw | 21:45 |
jeblair | (cause otherwise ^ that happens and it's not fun) | 21:46 |
*** hashar has quit IRC | 21:50 | |
SpamapS | +3'd a lot so far | 21:51 |
SpamapS | So I may have gotten out of sync w/ that last rebase | 21:51 |
SpamapS | leading to a possible <shock> +3 without two +2's | 21:52 |
SpamapS | Unfortunately, I have to run to a 15:00 appt. bbiab | 21:52 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_url to webapp config section https://review.openstack.org/480759 | 21:53 |
jeblair | mordred: i aimed for the [scheduler] section but missed and ended up in [webapp]. | 21:53 |
jeblair | mordred: okay, i'm now waiting for v+1 and w+3s to roll in -- is there job stuff waiting for me to look at? | 21:57 |
mordred | jeblair: no - I added jobs to zuul but then they failed | 22:06 |
mordred | jeblair: the issue at hand is: | 22:06 |
mordred | 2017-07-05 18:17:01.821275 | ubuntu-xenial | + sudo service mysql start | 22:06 |
mordred | 2017-07-05 18:17:02.033407 | ubuntu-xenial | Failed to start mysql.service: Unit mysql.service not found. | 22:06 |
jeblair | something amiss in bindep land? | 22:06 |
mordred | jeblair: I have not yet dug in to why that's working for tox-py35 and not for zuul-tox-py35 | 22:07 |
mordred | jeblair: I'm guessing | 22:07 |
mordred | jeblair: oh - it's a sequencing issue | 22:08 |
jeblair | jlk: adam_g left a comment on ps5 of https://review.openstack.org/474401 does it still apply with the change to pr comments rather than commit messages? | 22:09 |
jeblair | mordred: cool; i'll dig into that post url thing now | 22:09 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Port in tox jobs from openstack-zuul-jobs https://review.openstack.org/478265 | 22:09 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add a test to verify push reports only set status https://review.openstack.org/476400 | 22:10 |
mordred | jeblair: cool - the new job was running extra test setup before bindep (whoops) | 22:10 |
jlk | jeblair: so what Adam is saying is that what I have in change 474401 is fine, it is not broken in the way that change 476286 addressed. | 22:10 |
jeblair | jlk: okay cool, sorry i got twisted around. :) | 22:11 |
jlk | I had to re-read it myself :D | 22:11 |
mordred | jeblair: the zuul-* jobs are passing on zuul except for the cover job | 22:17 |
mordred | jeblair: given how little I care about the cover job, I think I'm going to consider that "good for now" and treat the cover job as a followup fix | 22:18 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Improve debugging at start of job https://review.openstack.org/480761 | 22:19 |
jeblair | that's been driving me batty trying to read the logs ^ | 22:19 |
jeblair | mordred: cool. i mean, we should fix it because it's a useful test case, but i'm down with deferring. | 22:20 |
jeblair | mordred: though.... | 22:20 |
mordred | yah. I mean - there's a TON of iterative work that needs to be done on those jobs | 22:20 |
jeblair | mordred: if we land it now, we'll go back to having zuul -1 all our changes | 22:20 |
mordred | nah - it's a non-voting job anyway | 22:21 |
jeblair | okay | 22:21 |
jeblair | good, because i have noticed people have been ignoring v-1 changes, even though they are just false v-1s from zuul | 22:21 |
mordred | jeblair: oh - also, I don't know if it's a thing or not - but on the status page completed jobs are not updating their url to be the log location - they keep the finger location | 22:22 |
jeblair | mordred: hrm, that sounds like an oversight. | 22:23 |
mordred | jeblair: https://review.openstack.org/#/c/480692/ is the change to add zuul- jobs to the zuul repo https://review.openstack.org/#/c/478265/ are the jobs themselves | 22:23 |
jeblair | mordred: the reason the cover job is failing is due to a lack of playbook | 22:24 |
mordred | oh | 22:24 |
mordred | well that's a good reason to fail | 22:24 |
mordred | let's fix that | 22:25 |
jeblair | mordred: (that happened to be the one i picked to figure out why the log link didn't show up in the report -- so i think the answer there is that we don't have friendly exception handlers for that, as well as probably a few other "normal" exceptions like when a job has an included ansible module) | 22:25 |
mordred | jeblair: ah - nod | 22:26 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Port in tox jobs from openstack-zuul-jobs https://review.openstack.org/478265 | 22:27 |
mordred | k. that should fix the cover job - thanks for spotting that | 22:27 |
jeblair | mordred: do you think that was the case for the other failures too? since your recheck fixed the others... | 22:27 |
mordred | jeblair: oh- no- the others were that we were running extra-test-setup before bindep | 22:28 |
mordred | the recheck there is because I uploaded a new version to zuul-jobs that fixed the order | 22:28 |
jeblair | hrm, that should have reported then. i'll continue to dig. | 22:28 |
mordred | jeblair: that was a normal pre-playbook-exit-nonzero case - and it just actually did hit its retry limit | 22:29 |
jlk | mordred: SpamapS: Could use a review/workflow on https://review.openstack.org/474401 | 22:40 |
mordred | jeblair: ok - now https://review.openstack.org/#/c/480692/ is totally green - so https://review.openstack.org/#/c/478265/ is good to go now | 22:46 |
mordred | SpamapS: ^^ | 22:46 |
mordred | jeblair, SpamapS: don't gouge your eyes out - there are ugly copy-pasta shell scripts directly in playbooks - followup patches will start to refactor those small bits at a time | 22:47 |
jeblair | mordred: i can't navigate the zuul debug log with all the json crap in it. we've fixed that in code, but i need to restart zuulv3. | 22:48 |
jeblair | mordred: i'd like to do that and then debug the log url thing when it happens again. | 22:48 |
mordred | jeblair: ok | 22:49 |
mordred | jeblair: it's an easy thing to reproduce - you want me to make a DNM patch that will trigger it for you? | 22:49 |
jeblair | mordred: sure. i've restarted zuul scheduler now, so it's safe to push that up. | 22:50 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Implement Depends-On for github https://review.openstack.org/474401 | 22:50 |
jeblair | mordred: i'm going to restart the executor too, to eliminate any doubt about what we're running. :) | 22:51 |
mordred | jeblair: kk | 22:51 |
jeblair | done | 22:52 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: DNM Broken Pre Playbook https://review.openstack.org/480764 | 22:54 |
mordred | jeblair: ok - that was one of the no-log types we saw | 22:55 |
mordred | jeblair: (that's the "there is no pre-playbook" case) | 22:55 |
jeblair | mordred: oh, that case i understand | 22:56 |
jeblair | i just don't know what the other failures were | 22:56 |
mordred | k. patch for the other one coming too | 22:56 |
jeblair | mordred: your jobs changes have +2s from me | 22:57 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: DNM Broken Pre Playbook that fails pre-task https://review.openstack.org/480766 | 22:57 |
mordred | jeblair: woot! | 22:57 |
mordred | jeblair: that one is a pre-playbook that has a shell task that fails | 22:57 |
jeblair | ok cool | 22:57 |
mordred | jeblair: I also realized I can delete all of the other jobs in that one so we don't waste time waiting on them | 22:58 |
SpamapS | I put up a question on 480759 | 22:58 |
jeblair | SpamapS: oh, sorry, i guess that's an unrelated change. i wrote it then because i was performing a mental audit of "are single-entry config sections correct". i wanted to record why i thought it was okay to keep zookeeper.hosts | 23:01 |
jeblair | SpamapS: would you like me to spin it out into a new change, or update the commit message? | 23:01 |
SpamapS | jeblair: OH I wasn't sure which one was a white lie! ;-) | 23:01 |
SpamapS | no if that's the correct lie, I'm game | 23:01 |
jeblair | okay, i'll respond to your comment for posterity then. happy to update or spin out as needed though. | 23:02 |
SpamapS | So many birds in the air.. I think we'll see if we can hit a few with the scatter gun. ;-) | 23:03 |
jeblair | i'm not sure if that's a metaphor or not.... :) | 23:04 |
jeblair | mordred: aha! i think i get it. it's because of the way we implement retry_limit. we create the n+1 build, and if n+1 > retries, we fail that build. | 23:07 |
mordred | jeblair: ah! | 23:07 |
jeblair | mordred: that was convenient to implement, but maybe we need to do the slightly harder thing and actually have the N build return retry_limit on failure | 23:08 |
mordred | jeblair: yah - I think we might need to | 23:08 |
mordred | jeblair: while we're looking at that - are we hard-erroring out on non-retryable errors? | 23:09 |
mordred | jeblair: for instance - there is no need to retry if ansible-playbook returns a parse error | 23:09 |
jeblair | mordred: nope, we retry that. :( | 23:09 |
jeblair | i'll add that to the punch list. | 23:10 |
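Both punch-list items boil down to small changes in how a finished build is interpreted; a hypothetical sketch, not actual zuul code:

    def on_build_completed(build, job):
        # Don't retry failures that can never succeed (e.g. an ansible-playbook
        # parse error), and when the limit is hit, report the last real attempt
        # (keeping its log URL) instead of synthesizing an n+1 build.
        if build.result == 'SUCCESS' or not build.retryable:
            return build.result, build.log_url
        if build.attempt >= job.attempts:
            return 'RETRY_LIMIT', build.log_url
        schedule_retry(build)
        return None, None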
jeblair | https://storyboard.openstack.org/#!/story/2001104 | 23:25 |
jeblair | https://storyboard.openstack.org/#!/story/2001105 | 23:27 |
jeblair | https://storyboard.openstack.org/#!/story/2001106 | 23:29 |
jlk | hrm. A change (PR) that includes .zuul.yaml files, that should cause a reconfig, right? Like a change that adds another pipeline to a project? | 23:47 |
jlk | (should potentially trigger that pipeline) | 23:47 |
jlk | yeah it should if I read model and manager correctly | 23:51 |
jlk | oh haha, my fault. | 23:56 |