openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles. https://review.openstack.org/608610 | 00:40 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles. https://review.openstack.org/608610 | 01:16 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles. https://review.openstack.org/608610 | 01:50 |
*** bhavikdbavishi has joined #zuul | 03:14 | |
*** threestrands has joined #zuul | 05:21 | |
*** threestrands has quit IRC | 05:21 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Add spec for scale out scheduler https://review.openstack.org/621479 | 06:23 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Add spec for scale out scheduler https://review.openstack.org/621479 | 06:24 |
openstackgerrit | Surya Prakash (spsurya) proposed openstack-infra/zuul master: dict_object.keys() is not required for *in* operator https://review.openstack.org/621482 | 06:35 |
*** quiquell|off is now known as quiquell | 07:10 | |
*** pcaruana has joined #zuul | 07:25 | |
*** quiquell is now known as quiquell|brb | 07:40 | |
*** bjackman has joined #zuul | 07:56 | |
bjackman | Just realised I didn't get a reviewer auto-assigned to my Zuul change, should I request one? | 07:59 |
bjackman | I thought tristanC was auto-assigned to my last one but maybe I misunderstood | 07:59 |
tristanC | bjackman: reviewers are not auto-assigned on review.openstack.org | 08:00 |
bjackman | tristanC: OK thanks | 08:00 |
tristanC | bjackman: feel free to assign me, or ask for review here using the change url | 08:01 |
tristanC | ask here* | 08:01 |
bjackman | tristanC: Cheers, I have assigned you | 08:02 |
*** gtema has joined #zuul | 08:07 | |
tristanC | bjackman: thank you for fixing gerrit-2.16 support :) | 08:09 |
bjackman | tristanC: No worries :) | 08:13 |
*** quiquell|brb is now known as quiquell | 08:18 | |
*** jpena|off is now known as jpena | 08:28 | |
*** themroc has joined #zuul | 08:50 | |
*** dkehn has quit IRC | 09:29 | |
*** sshnaidm|off is now known as sshnaidm | 09:43 | |
bjackman | What does the promote operation do? | 10:13 |
*** electrofelix has joined #zuul | 10:18 | |
*** electrofelix has quit IRC | 10:22 | |
*** bhavikdbavishi has quit IRC | 10:23 | |
*** electrofelix has joined #zuul | 10:31 | |
*** bjackman has quit IRC | 10:35 | |
*** bjackman_ has joined #zuul | 10:35 | |
*** bjackman_ is now known as bjackman | 10:35 | |
*** panda|pto is now known as panda | 10:47 | |
*** gtema has quit IRC | 11:02 | |
*** sshnaidm has quit IRC | 11:16 | |
*** sshnaidm has joined #zuul | 11:16 | |
*** sshnaidm has quit IRC | 11:18 | |
*** rfolco has joined #zuul | 11:18 | |
*** sshnaidm has joined #zuul | 11:19 | |
*** quiquell is now known as quiquell|brb | 11:21 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles. https://review.openstack.org/608610 | 11:44 |
*** quiquell|brb is now known as quiquell | 11:45 | |
*** electrofelix has quit IRC | 11:58 | |
*** electrofelix has joined #zuul | 12:03 | |
*** njohnston has joined #zuul | 12:06 | |
*** njohnston has quit IRC | 12:10 | |
*** njohnston has joined #zuul | 12:10 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift resource provider https://review.openstack.org/570667 | 12:13 |
*** jpena is now known as jpena|lunch | 12:34 | |
*** rlandy has joined #zuul | 12:54 | |
*** bjackman has quit IRC | 12:55 | |
*** jpena|lunch is now known as jpena | 13:26 | |
*** dkehn has joined #zuul | 13:44 | |
*** quiquell is now known as quiquell|lunch | 14:01 | |
*** smyers has quit IRC | 14:19 | |
*** smyers has joined #zuul | 14:21 | |
*** gtema has joined #zuul | 14:34 | |
*** quiquell|lunch is now known as quiquell | 14:40 | |
quiquell | AJaeger_, pabelanger: I think openstack/pbrx is missing configparser in requirements.txt | 14:41 |
*** gtema has quit IRC | 14:45 | |
mordred | quiquell: yeah - I think you're right (at least for python2) - if you install it under python3 it should be fine ... but we should add that for sure | 14:51 |
mordred | quiquell: you want to send a patch or want me to? | 14:51 |
quiquell | mordred: do we have to support python2? | 14:52 |
quiquell | mordred: what version of configparser do we need to pin to ? | 14:52 |
mordred | quiquell: I don't think we need to pin it - but we should mark it with ;python_version=='2.7' so it only installs for 2.7 | 14:53 |
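A minimal sketch of the requirements.txt entry mordred is describing; the environment marker is the one he quotes, so the dependency only installs on Python 2.7:

    # requirements.txt
    configparser;python_version=='2.7'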
mordred | quiquell: and yes - for now - there is another utility in pbrx that wants to be used under 2.7 by some people - I think we should likely split that out and make pbrx into two separate tools | 14:54 |
mordred | because I don't actually find python2 support for build-images to be useful :) | 14:54 |
pabelanger | agree, was just going to ask why python2 for image builds at all. But, maybe because of centos-7 | 14:58 |
quiquell | mordred: So maybe it's better to split things, and not add the configparser dep | 14:58 |
mordred | quiquell: yeah. I'll work on that today ... I think the multiple command thing is less useful than it should be | 15:00 |
mordred | pabelanger: indeed ... although I think maybe for now, since pbrx is pretty new and not used in many places, not supporting centos-7 is maybe not the worst thing? | 15:02 |
mordred | especially with 8 right around the corner | 15:02 |
pabelanger | Yup, I'd agree. Would love to only support python3 on rhel-8 / centos-8 | 15:03 |
*** themroc has quit IRC | 15:32 | |
*** hughsaunders has joined #zuul | 15:34 | |
corvus | hughsaunders: you were asking about how nodepool requests are distributed in #openstack-infra... | 15:34 |
hughsaunders | corvus: hey | 15:34 |
hughsaunders | yep | 15:34 |
corvus | hughsaunders: your summary is pretty accurate -- there's almost no communication between the nodepool launchers | 15:35 |
corvus | (they know which other launchers are online, but otherwise, they know nothing about any of the others) | 15:35 |
corvus | lemme dig up a link | 15:35 |
hughsaunders | ok thanks | 15:35 |
corvus | hughsaunders: the algorithm is still basically what's described here (from the original zuulv3 spec): http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html | 15:36 |
corvus | hughsaunders: (search for "A simple algorithm") | 15:37 |
corvus | hughsaunders: the original design goals there were simplicity, reliability, and fault tolerance | 15:37 |
corvus | hughsaunders: i think we'd all love to improve the situation to make it smarter (so that quota and capacity across different providers is taken into account) | 15:38 |
corvus | hughsaunders: i think the trick is that any changes need to take into consideration the behavior of the existing algorithm and ensure that we don't create any new edge cases where something can get wedged, or lose fault tolerance. | 15:39 |
corvus | hughsaunders: or, we could create an entirely new algorithm | 15:39 |
corvus | hughsaunders: either will be acceptable -- the main thing is that the current algorithm doesn't really have a lot of facilities for changes like that, and any changes need to be made with great care | 15:40 |
hughsaunders | corvus: My initial thought was to cause each provider to pause for a few seconds if they can't satisfy a request from ready nodes, or immediately lock it if they can. This gives other providers time to lock the request if they have ready capacity. After the timeout, any waiting thread tries to lock the request. | 15:40 |
hughsaunders | Not pretty, but it is simple :/ | 15:41 |
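A rough Python sketch of the timeout idea hughsaunders describes above; the handler shape and names like have_ready_nodes/try_lock_request are illustrative, not real nodepool APIs:

    import time

    GRACE_SECONDS = 5  # the "few seconds" pause is an assumed value

    def maybe_handle_request(provider, request):
        # Providers that can satisfy the request from already-ready nodes
        # lock it immediately; all others pause first, giving a provider
        # with ready capacity the first chance to lock it.
        if not provider.have_ready_nodes(request):   # illustrative method
            time.sleep(GRACE_SECONDS)
        return provider.try_lock_request(request)    # illustrative method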
hughsaunders | Other thoughts included some kind of auction where providers bid for a request depending on the number of nodes from the request they can fulfil from ready capacity. That would be neat but makes the algorithm much more complex. | 15:41 |
*** themroc has joined #zuul | 15:44 | |
corvus | hughsaunders: i think the timeout approach would work, but may be somewhat fragile -- in other situations we're dealing with nodepool taking many minutes to process the full request list | 15:45 |
corvus | hughsaunders: so i think an approach where launchers can communicate more about themselves might be better... though also.... | 15:45 |
corvus | hughsaunders: i think clarkb was suggesting (for other reasons) that we should have a two-phase system where node requests from zuul are processed in one phase and then any still-needed nodes are put onto a queue individually, so that node creation is less-directly tied to node requests. we may be able to incorporate your case into that too, by making sure that the initial request processor can just return ready nodes regardless of the launcher which supplied them. | 15:48 |
tobiash | corvus: I hope it was ok that I fixed your fixes to zuul (I think they landed already) | 15:49 |
corvus | hughsaunders: the bid idea would also work -- it's actually not too dissimilar from the current algorithm; i think the downside there is that it would essentially run the algorithm twice for each request (every launcher reports on whether it can fulfill requests during the ready phase, and then again afterwards (assuming no ready nodes)) | 15:50 |
corvus | tobiash: yes, thanks! were there any significant errors i should look at, or pep8 stuff? | 15:51 |
tobiash | corvus: just minor logic thing | 15:52 |
hughsaunders | corvus: It could slow things down for the min-ready:0 case, but I would have thought most users of nodepool are interested in min-ready>0. | 15:52 |
tobiash | corvus: https://review.openstack.org/#/c/621319/3/zuul/manager/__init__.py that's what I fixed | 15:52 |
pabelanger | hughsaunders: for ansible-network nodepool, we've stopped using min-ready nodes, mostly for the behavior you are talking about. We default to min-ready:0, which ends up tacking a few more seconds onto each job run. This is also because we pay $$$ for each node online. | 15:53 |
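For reference, min-ready is a per-label setting in the nodepool config; a minimal sketch (the label name is illustrative):

    labels:
      - name: ubuntu-bionic
        min-ready: 0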
tobiash | the filtered list didn't necessarily contain the item | 15:53 |
corvus | hughsaunders: yes, though for example, openstack almost never benefits from min-ready, because we are almost never idle enough to be able to build min-ready nodes. so taking that as an example, a busy system would spend longer running the algorithm twice, so it's sort of opposite-scaling. | 15:53 |
tobiash | corvus: so it needed a search for it and return the index or the list length as a fallback | 15:53 |
hughsaunders | corvus: I also thought about removing this check to see what happens, but I guess it would screw up accounting.. https://github.com/openstack-infra/nodepool/blob/master/nodepool/driver/__init__.py#L427 | 15:54 |
*** lennyb_ has quit IRC | 15:55 | |
*** jhesketh has quit IRC | 15:55 | |
corvus | tobiash: in what case? | 15:55 |
tobiash | in case the item is not yet enqueued | 15:55 |
corvus | tobiash: ah, we do call that before enqueuing, don't we. what about dependent.py? | 15:56 |
*** jhesketh has joined #zuul | 15:57 | |
tobiash | the dependent seemed to work | 15:57 |
corvus | tobiash: shouldn't it have the same problem? | 15:57 |
corvus | tobiash: (getNodePriority is overridden in dependent.py) | 15:57 |
*** lennyb has joined #zuul | 15:57 | |
tobiash | it has its own implementation of getNodePriority, does it? | 15:57 |
*** quiquell is now known as quiquell|off | 15:58 | |
tobiash | corvus: hrm the dependent implementation should throw an error in this case too, so either I lied or that is not happening in gate or is not tested in gate | 16:00 |
corvus | tobiash: maybe the case was when the item wasn't live? | 16:01 |
corvus | (which never happens in dependent) | 16:02 |
tobiash | that could be | 16:02 |
corvus | tobiash: yeah, i don't think getNodePriority is called before enqueuing | 16:02 |
corvus | tobiash: and the tests only failed after the change to filter out non-live items | 16:03 |
tobiash | corvus: makes sense | 16:04 |
corvus | of course -- why is this being called at all for non-live items? | 16:04 |
tobiash | corvus: I think I'll re-analyze this later | 16:04 |
tobiash | that's an interesting question | 16:04 |
corvus | ah, i think it's called unnecessarily inside of processOneItem, but it won't be used | 16:05 |
tobiash | ah yes, it's called before we know that there is no node request | 16:06 |
corvus | i'm going to double check that was the error reported, and if so, i think i might adjust the fix a bit | 16:07 |
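A rough sketch of the kind of guard being discussed; this is not the actual 621626 change, and the names (item.live, getNodePriority) are taken from the conversation above:

    def node_priority_for(manager, item):
        # Only live items ever get node requests, so skip the relative
        # priority calculation for non-live items; the value would never
        # be used for them anyway.
        if not item.live:
            return None
        return manager.getNodePriority(item)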
tobiash | great | 16:12 |
*** themroc has quit IRC | 16:21 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Don't calculate priority of non-live items https://review.openstack.org/621626 | 16:35 |
corvus | tobiash: how about that? ^ | 16:35 |
tobiash | corvus: I had exactly the same in mind | 16:39 |
mordred | tobiash, corvus: I just added the second +2 to that - any reason I should not go ahead and +A? | 16:40 |
corvus | mordred: nope, if it passes gate it's good :) | 16:40 |
tobiash | ++ | 16:41 |
clarkb | corvus: are there other bug fixes on relative priority that need review? or are we good to make another go at it once 621626 is in? | 16:54 |
corvus | clarkb: afaik gtg, and we could do it before 626, it's just a slight inefficiency | 16:54 |
*** gtema has joined #zuul | 16:58 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Extract out common config parsing for ConfigPool https://review.openstack.org/621642 | 17:18 |
*** manjeets has joined #zuul | 17:19 | |
Shrews | corvus: tobiash: tristanC: proposed change to how we deal with driver configs ^^^ | 17:20 |
*** gtema has quit IRC | 17:31 | |
SpamapS | hit a weird problem today with our Zuul | 17:36 |
SpamapS | A developer proposed a PR from origin/foo -> origin/master, then realized they shouldn't do PRs from origin, deleted origin/foo, and re-proposed as them/foo -> origin/master. The shas were the same, and zuul seemed to get confused and marked the post job as RETRY_LIMIT because of a crash | 17:38 |
SpamapS | http://paste.openstack.org/show/736584/ <-- the crash | 17:38 |
tobiash | SpamapS: you mean a *branch* named 'origin/foo'? | 17:40 |
tobiash | SpamapS: we had something similar: https://review.openstack.org/#/c/619245 | 17:42 |
SpamapS | no, the branch was named 'foo' | 17:43 |
SpamapS | I'm just using remote names symbolically to mean "the main repo" | 17:43 |
SpamapS | Also I think we need better docs (or I need to be shown them) on how to use zuul-enqueue-ref because it never does *anything* for me. | 17:44 |
mordred | SpamapS: was the branch actually named "feat/charity" ? | 17:45 |
mordred | SpamapS: also - nice edge case your developer found :) | 17:46 |
clarkb | zuul would've treated origin/foo PR source as merged and load config from that directly? then the new branch on then/foo would be unmerged | 17:47 |
clarkb | I'm guessing its that behavior in particular that has confused zuul? | 17:47 |
mordred | clarkb: the traceback looks more like zuul merger got confused by the branch going away | 17:47 |
clarkb | mordred: ya but only because it treats those branches as special I think? | 17:48 |
pabelanger | Yah, we had an issue in ansible-network when we deleted an origin branch; zuul didn't respond well and we had to purge the mergers | 17:48 |
SpamapS | mordred: yes, it was named feat/charity | 17:48 |
SpamapS | I also am now working to turn on the "only look at protected branches" flag | 17:49 |
SpamapS | which I should have turned on long ago | 17:49 |
mordred | SpamapS: cool. just making sure I was grokking the traceback properly mapping it to your thing | 17:49 |
clarkb | on the gerrit side we handle this by taking action on the ref updated new sha1 0*40 event | 17:49 |
clarkb | iirc | 17:49 |
pabelanger | SpamapS: +1 | 17:49 |
clarkb | does github send a similar event we can take action on when branches are deleted? | 17:49 |
pabelanger | clarkb: yah, I believe so | 17:50 |
pabelanger | https://developer.github.com/v3/activity/events/types/#deleteevent | 17:50 |
clarkb | ya, reading the merger code, it specifically keys off of '0'*40 to know when to delete the upstream branch locally in _alterRepoState | 17:52 |
clarkb | so we either need to have github driver set newrev to '0'*40 on those events, or pass that event through more directly and take the same action | 17:52 |
clarkb | SpamapS: do you have access to the webhook event that fired for that delete? reading pabelanger's doc link it seems that 'ref' is set to the ref name not the sha1. So I wonder if the assumptions in the github driver around EMPTY_GIT_REF are never hit | 17:55 |
clarkb | yup, github connection sets event.newrev to body.get('after'). Reading the docs pabelanger linked there is no 'after' key in the body for the delete event. The rest of the code does seem to do the right thing if newrev is set to '0'*40 though | 17:58 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Extract out common config parsing for ConfigPool https://review.openstack.org/621642 | 17:59 |
clarkb | oh and we only handle that in the push event which is weird. I don't know enough about the api to know if that will fire on deletes. Maybe if you delete via the empty push mechanism? | 18:00 |
SpamapS | clarkb: yeah I'll peek | 18:00 |
clarkb | tobiash: jlk ^ | 18:00 |
clarkb | I think the fix here is to add a handler for the delete event. Then create a translated internal zuul event that sets newrev to '0'*40. Zuul should handle that in the merger properly | 18:02 |
tobiash | I thought we have that already | 18:03 |
clarkb | tobiash: _event_push handles it if it shows up as a push | 18:04 |
clarkb | but I don't see an _event_delete | 18:04 |
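A rough sketch of the handler clarkb is proposing (this is not the actual 621665 change; _event_base is a hypothetical helper, and the event attributes mirror the discussion above):

    def _event_delete(self, body):
        # Translate a GitHub branch-delete webhook into a push-style
        # internal event whose newrev is the all-zeros sha, so the merger
        # prunes the deleted branch the same way it does for Gerrit.
        if body.get('ref_type') != 'branch':
            return None
        event = self._event_base(body)              # hypothetical shared helper
        event.ref = 'refs/heads/%s' % body['ref']   # the delete payload carries the ref name
        event.oldrev = None
        event.newrev = '0' * 40                     # the "empty" sha
        return event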
SpamapS | Hrm, I don't see any hooks for the branch delete | 18:04 |
SpamapS | clarkb: apparently the default app setup doesn't have us subscribed to branch/tag delete, only create | 18:06 |
SpamapS | so that's one thing to consider changing | 18:06 |
openstackgerrit | Merged openstack-infra/zuul master: Don't calculate priority of non-live items https://review.openstack.org/621626 | 18:07 |
tobiash | SpamapS: I think you ran into a race | 18:10 |
tobiash | Need to switch to laptop to better explain | 18:10 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: Handle github delete events https://review.openstack.org/621665 | 18:10 |
clarkb | tobiash: do we not need something like ^? | 18:11 |
SpamapS | tobiash: agreed | 18:11 |
clarkb | I guess if deletevent and pushevent always happen together then pushevent is sufficient | 18:11 |
SpamapS | I think there's another race that tobiash may be getting at | 18:13 |
SpamapS | the merge probably shouldn't checkout branch names | 18:13 |
SpamapS | it should probably checkout with a ref | 18:14 |
tobiash | clarkb: it's not about delete events, they don't have anything to do with mergers | 18:14 |
clarkb | tobiash: they do. _alterRepoState() in the merger uses those events to clean up branches that were deleted upstream of the merger | 18:15 |
clarkb | what should happen is that github sends the delete event when the branch is deleted. This causes _alterRepoState to remove that branch from the local repo | 18:16 |
clarkb | then later when the new branch with same sha1 is pushed the merger should fetch it and be happy because of the earlier reconciliation | 18:16 |
tobiash | let me check | 18:16 |
*** jpena is now known as jpena|off | 18:17 | |
tobiash | clarkb: I don't see a connection of _alterRepoState to any event | 18:19 |
clarkb | tobiash: it is called by getRepoState | 18:19 |
tobiash | clarkb: I still don't get the relation to an event | 18:20 |
clarkb | hrm ya, it's called on the tail end of merging things | 18:21 |
clarkb | which actually seems right | 18:21 |
clarkb | seems like what happens is if we have a change to merge we merge the change. If it's a repo update type event then we run getRepoState | 18:21 |
clarkb | this is in scheduleMerge in the pipeline manager | 18:21 |
tobiash | SpamapS, clarkb: so the race I have in mind works like this: have branch x, push branch y, merger merges and adds x and y into repostate, delete branch x, job now starts, takes the repo state, tries to restore x and y, and fails while restoring x | 18:22 |
tobiash | SpamapS, clarkb: we had these things with tags, and I've had a hacky patch in our zuul since august to mitigate that but haven't had time to push up a real fix for this | 18:23 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Fix broken setRefs whith missing objects https://review.openstack.org/621667 | 18:24 |
tobiash | clarkb, SpamapS: this is my hacky fix for this that we have in our zuul ^ | 18:25 |
tobiash | this is actually my only hacky fix that is not upstream yet ;) | 18:25 |
*** electrofelix has quit IRC | 18:31 | |
SpamapS | tobiash: yeah that seems like what happened | 18:37 |
tobiash | SpamapS: we had the same but with tags (someone deleted a tag), and from that point some mergers had that tag and some executors didn't, and everything was broken because tags are always in the repo state | 18:39 |
SpamapS | Tags are just refs of a different type so it makes sense. | 18:39 |
clarkb | tobiash: reading through the scheduler and pipeline manager I think my idea is mostly correct. githubconnection.getChange should create a Ref() change object for that event. That will then get processed by the queue processing done on the pipelines. This will then trigger the mergers to clean up the deleted branch via alterRepoState | 18:40 |
clarkb | some of the details may need fixing to get stuff mapped properly, but that seems to be how the gerrit side handles this | 18:41 |
tobiash | clarkb: the repostate is metadata that is attached to a queue item, which is carried through to the executor which will reproduce the same thing as the first merger | 18:43 |
clarkb | oh I see repo_state there isn't mapped onto disk | 18:44 |
clarkb | so we probably need to reconcile in multiple places | 18:44 |
tobiash | clarkb: when getting a merge command a merger updates its remote refs, performs the merge and creates a clean repostate data structure from that. That is carried through to all jobs of that change so the mergers that create the workspaces can reproduce the merges deterministically | 18:44 |
tobiash | this repostate contains all branches of the repo; in github, branches get created and deleted frequently | 18:46 |
tobiash | that's the reason you don't typically see this in gerrit but in github as it's common to work on short lived feature branches and delete them after the pr merges | 18:46 |
clarkb | right, is that repostate valid after a branch delete? | 18:46 |
clarkb | I don't see where we would currently clean it up because it relies on a push of sha1 0*40 to do so | 18:46 |
tobiash | we don't need to clean it up | 18:47 |
tobiash | the first merge comes without a repo state and takes the current remote refs at that time | 18:47 |
tobiash | that is perfectly fine | 18:47 |
clarkb | I see so it will start fresh and see the current valid list | 18:47 |
tobiash | the only problem is the race between creating that repo state and using it when in the middle there were upstream deletions | 18:48 |
tobiash | my hacky fix was to just ignore the now non-existing stuff | 18:48 |
clarkb | would my fix, which will update repo_state after such a deletion also fix it? | 18:49 |
clarkb | or is there still a race between that reconciliation and using it later | 18:49 |
tobiash | that wouldn't change anything | 18:49 |
clarkb | looking at _mergeItem we reset the repo (which cleans up stale refs on disk) then we restoreRepoState; in that case wouldn't we either remove the stale ref and be fine, or the stale ref would still be on disk and things would work until the next reset? | 18:52 |
tobiash | the only thing I can think of is really to ignore non-existing stuff from the repostate. In almost all cases this will be something unused and if not, a job trying to use a non-existing ref would fail anyway | 18:52 |
clarkb | if the order was reversed I would understand the race you describe. But we set an on disk state then introspect that so it should be consistent? | 18:52 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool master: Make launcher debug slightly less chatty https://review.openstack.org/621675 | 18:53 |
clarkb | even save repo state ignores the remote state and uses the on disk state | 18:53 |
tobiash | clarkb: but the merger directly before did an update | 18:54 |
clarkb | tobiash: shouldn't that be ok? the mergers are processing repos serially. We reset the repo as the first step which will either say the ref is still valid or clean it up. After that is when we set repostate and it should be consistent with that reset? Is the problem we reset on subsequent merger actions but don't update the repostate? | 18:55 |
clarkb | reset (no delete). save repo state (no delete). then later reset again (delete). Don't resave repostate, just use the preexisting one? | 18:56 |
tobiash | clarkb: yes, that's correct, but we use the repo state on a different merger/executor, potentially hours later, in order to re-do exactly the same merge | 18:56 |
clarkb | got it | 18:57 |
tobiash | that hour is the time you're usually waiting for a node ;) | 18:57 |
tobiash | and during that anything can happen to remote branches (which is common in the github workflow) | 18:57 |
clarkb | can we reprune the repo state after subsequent resets? | 18:58 |
tobiash | so when re-doing the merge, it also performs a repo reset, restores the repo state, merges the change | 18:58 |
clarkb | I guess those subsequent resets happen external to that event so lining things up may be painful | 18:58 |
tobiash | yes, I think the only efficient way is to just ignore things that are missing. If the ref that we are trying to merge is missing during repo state restore the merge will fail anyway | 19:00 |
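A rough sketch of that "ignore missing refs" idea; this is not tobiash's actual 621667 patch, and object_exists/set_ref are hypothetical helpers:

    def restore_repo_state(repo, repo_state):
        # Apply a previously recorded repo state, skipping refs whose
        # objects have since disappeared upstream (e.g. a deleted branch
        # or tag). If the ref the merge actually needs is among the
        # missing ones, the merge itself will fail later anyway.
        for ref, sha in repo_state.items():
            if not object_exists(repo, sha):   # hypothetical helper
                continue
            set_ref(repo, ref, sha)            # hypothetical helper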
tobiash | that process is completely disconnected from any event from the driver; those are just used to modify the job configs, not to modify inflight changes | 19:01 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Set pool for error'ed instances https://review.openstack.org/621681 | 19:32 |
*** AJaeger_ is now known as AJaeger | 19:43 | |
*** manjeets_ has joined #zuul | 20:17 | |
*** manjeets has quit IRC | 20:18 | |
SpamapS | I'm building a slack notifier role. Does anyone know if there is a standard way, in a job, to get the URL to the Zuul status/builds/etc. API? | 20:19 |
clarkb | SpamapS: infra's zuul jobs set it in a zuul_return iirc | 20:20 |
clarkb | then that is used across the builds | 20:20 |
clarkb | https://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/set-zuul-log-path-fact/tasks/main.yaml consumed by https://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/emit-job-header/tasks/main.yaml#n20 then set as a return later iirc | 20:21 |
clarkb | and I think zuul_log_url is a site var? | 20:22 |
SpamapS | That's for logs | 20:23 |
SpamapS | I want the API | 20:23 |
SpamapS | I think a site var is fine | 20:23 |
SpamapS | I'll just document that in the role. | 20:23 |
SpamapS | "Set this site var" | 20:23 |
SpamapS | maybe we can make a convention for it | 20:23 |
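A sketch of what that convention could look like in a role; the variable name zuul_api_url and the task are made up for illustration:

    # Site variable the role documents and expects (illustrative name):
    #   zuul_api_url: https://zuul.example.org/api
    - name: Build a link to this change's builds
      set_fact:
        builds_url: "{{ zuul_api_url }}/builds?change={{ zuul.change }}"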
clarkb | ya, it's for logs, but it's a similar idea: "here is where the other part of the system you care about is located" | 20:24 |
*** manjeets_ is now known as manjeets | 21:20 | |
corvus | SpamapS: status_url is set in the web configuration: https://zuul-ci.org/docs/zuul/admin/components.html#attr-web.status_url | 21:29 |
*** techraving has joined #zuul | 21:40 | |
SpamapS | corvus: is there any way to get at that in a post job? | 21:49 |
corvus | SpamapS: oh, you said role. i thought you were writing a driver. | 21:52 |
corvus | SpamapS: i don't think so. but i think we should expose it. i think we should expose both the change-specific status page and the buildset page in zuul vars. | 21:54 |
corvus | i'm not sure how close to completion those are. | 21:55 |
SpamapS | mmmm yes that would be amazing. :) | 22:13 |
SpamapS | Well in this case, I want to build the various URLs for a slack message. | 22:13 |
SpamapS | And actually, I think I may want to chase that "cleanup job" idea soon. | 22:13 |
SpamapS | So the slack message can be for the buildset and not all the jobs. | 22:13 |
SpamapS | (I guess I can make one job that depends on all the others.. but feels like we can just model that in python a lot easier than yaml. :) | 22:14 |
SpamapS | I also just found a race in the EC2 driver that has been leaking nodes and fixed it so yay for that. | 22:15 |
* SpamapS still can't quite get the unit tests to pass reliably | 22:15 | |
*** pcaruana has quit IRC | 22:18 | |
SpamapS | So... FYI, for anyone doing monorepos (maybe I should email list this) ... zuul with monorepo works fine. However, you will want to be pretty aggressive about using files matchers. So.. many... jobs.. | 22:37 |
corvus | SpamapS: speaking of file matchers: i've been thinking that it would be useful for zuul to detect when a change modifies a given job and automatically have that job "match" | 22:44 |
SpamapS | corvus: ++ | 22:45 |
SpamapS | I have been splitting jobs in to groups in .zuul.d for that exact purpose. | 22:45 |
corvus | SpamapS: this came from openstack-infra where we are starting to use file matchers aggressively for changes to our ansible. occasionally we make changes to those jobs, and right now, if you touch .zuul.yaml, *all the jobs run*. this would let us omit .zuul.yaml from files matchers, and still have the (relevant) jobs run | 22:45 |
corvus | SpamapS: oh yeah, that's smart, why didn't i think of that. :) | 22:45 |
SpamapS | and then adding files: '^.zuul.d/group-jobs.yaml$' | 22:46 |
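A sketch of that layout, assuming illustrative job and path names:

    # .zuul.d/group-jobs.yaml
    - job:
        name: group-unit-tests
        files:
          - '^src/group/.*'
          - '^.zuul.d/group-jobs.yaml$'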
corvus | SpamapS: so yeah, sounds like we've both seen the same problem and that would help | 22:46 |
SpamapS | Definitely | 22:46 |
SpamapS | And it shouldn't be hard to do I think. | 22:46 |
corvus | nope, i'm pretty sure all the info is in memory | 22:46 |
corvus | just need to connect some dots | 22:47 |
*** irclogbot_3 has quit IRC | 22:47 | |
SpamapS | Yeah, I was thinking about something similar when writing a sandbox usage of a zuul-roles job too. | 22:48 |
SpamapS | (more for the case of testing jobs in real repos automatically, but similarly I was diving around the model seeing that most of the info is already there in the config and triggers) | 22:49 |
SpamapS | while we're spraying ideas around.. I think I want to write a role that lets me find common sha's with another branch and, consequently, dig out the log url for jobs that ran against the sha, so I can grab docker containers from that sha and not rebuild them (so I can have a promotion workflow). I feel like there was some other stuff going on with artifacts though. | 22:51 |
clarkb | SpamapS: I think people are using zuul_return to pass that info along? Though I guess that is within the context of a single pipeline | 22:52 |
clarkb | not between pipelines | 22:52 |
corvus | SpamapS: yeah, the artifact url storage work i'm doing stores artifact urls along with a build record. so my idea is to then, in a later pipeline, query the api to find the build and artifact urls, and use them. | 22:53 |
corvus | SpamapS: you should be able to query by "patchset" (in the github case) and get a build | 22:54 |
corvus | SpamapS: so once the new stuff lands, you may have what you need | 22:54 |
SpamapS | corvus: yeah that's perfect. | 22:55 |
corvus | SpamapS: (in the gerrit case, i'm imagining a "promote" pipeline which runs on change-merged events, and uses the change-merged event info to query the api to find the builds+artifacts) | 22:55 |
corvus | the same should work for github too (in the pr-closed case) | 22:55 |
corvus | SpamapS: looks like it's ready to merge: https://review.openstack.org/620427 | 22:56 |
corvus | i mean, ready for review. don't mean to be presumptuous. :) | 22:56 |
SpamapS | Yeah what I have is merge commits in my promotion branch (called 'prod') that refer to commits that were already built and tested in master. In those cases the containers are already pushed and tested, I just have to change a single field in a k8s object to make them "live". So I want the post job in the 'prod' branch to just find the exact image URL from that previous run and make it so. | 22:57 |
SpamapS | We do the --no-ff so we have metadata about when things were merged to 'prod' | 22:57 |
SpamapS | oh sweet I'll review post haste | 22:57 |
SpamapS | Right now, btw, I can totally fake this. I tag the containers with the newrev in post, so I can just look at the merge commit's children, and regenerate the tag. But I am not sure that will work for everything. | 22:59 |
corvus | SpamapS: you should be able to test out the query part of this now -- though http://zuul.openstack.org/api/builds?change=620427&patchset=1 isn't working for me and i'm not sure why (http://zuul.openstack.org/api/builds?change=620427 does work) | 23:01 |
corvus | haha | 23:01 |
corvus | because jobs never reported on ps1 | 23:01 |
corvus | http://zuul.openstack.org/api/builds?change=620427&patchset=2 works fine | 23:01 |
corvus | and so does http://zuul.openstack.org/api/builds?patchset=2 (though, obviously it's not useful in our case) | 23:02 |
* SpamapS peeks | 23:05 | |
corvus | patchset doesn't show up as an option in the web ui, but it is supported in the api | 23:06 |
SpamapS | https://zuul.gdmny.co/builds?change=11,patchset=cd60082ebb6b84a723a9a2fb2ea951b195c6add8 | 23:06 |
SpamapS | yeah that works | 23:06 |
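A rough Python sketch of that query, using the endpoint shape from the URLs above; the response field names (job_name, result, log_url) are assumptions about the builds API output:

    import requests

    def builds_for(zuul_api, change, patchset):
        # Query the Zuul builds API for one change/patchset.
        resp = requests.get('%s/builds' % zuul_api,
                            params={'change': change, 'patchset': patchset})
        resp.raise_for_status()
        return resp.json()

    for build in builds_for('http://zuul.openstack.org/api', 620427, 2):
        print(build.get('job_name'), build.get('result'), build.get('log_url'))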
SpamapS | Can I query a commit sha tho? | 23:08 |
*** irclogbot_3 has joined #zuul | 23:10 | |
SpamapS | Seems like nodepool+ec2 is ignoring my max-servers | 23:11 |
SpamapS | I think GitHub suffers from a deficiency here because there's no "pull request merge" event, there's a push, and you have to go dig out the PR backwards from the commit. :-P | 23:14 |
corvus | SpamapS: there's a closed event | 23:28 |
corvus | i suspect that happens on merge but am not positive, nor sure what additional info distinguishes that from hitting the close button, but i think it's worth looking into | 23:29 |
*** dkehn has quit IRC | 23:53 | |
*** dkehn has joined #zuul | 23:58 |