Friday, 2017-05-26

openstackgerritClark Boylan proposed openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Revert "Use devstack's zookeeper support"
openstackgerritPaul Belanger proposed openstack-infra/nodepool feature/zuulv3: Update keyscan for python3 compat
*** yolanda has joined #zuul05:32
*** Cibo_ has joined #zuul06:15
tobiashjeblair: how does zuulv3 choose the branch of the config repos?06:27
tobiashI have the following use case and trying to find out if that's already possible or not.06:28
tobiashLet's say I have several deployments for different projects. We will have an internal release process of the whole ci system where different deployments could be at different releases.06:30
tobiashI'd like to be able to refer the different deployments to a common config repo with common roles but stick them to individual branches/tags per deployment.06:31
tobiash(independent of the branching model of the project repos itself)06:33
*** Cibo_ has quit IRC06:40
*** jamielennox is now known as jamielennox|away06:48
*** Cibo_ has joined #zuul06:52
*** bhavik1 has joined #zuul06:53
*** bhavik1 has quit IRC06:59
*** Cibo_ has quit IRC07:30
*** openstackgerrit has quit IRC07:48
*** jkilpatr has quit IRC10:30
*** Cibo_ has joined #zuul10:34
*** jkilpatr has joined #zuul10:52
*** openstackgerrit has joined #zuul11:08
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Test shared pipeline across multiple gerrits
*** Cibo_ has quit IRC11:14
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Decode ssh output of gerrit connection
*** rcarrillocruz has quit IRC12:33
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix socket.error exception usage
Shrewspabelanger: i noticed this py3.5 failure ^^^ in your job failure on 468253 12:45
pabelangerShrews: thanks I am looking at iterate error now too12:50
pabelangertrying to see if12:50
pabelangerfor k in dict.keys() vs for k in dict vs for k in list(dict.keys()) has differences12:51
pabelangerI think we could just do: for label in self._submittedRequests12:51
Shrewspabelanger: does that work in 2.7?12:56
Shrewsnah, surely not12:58
Shrewsheh, it does12:59
Shrewsoh, the other error there is us doing bad things with a dict. we loop through it and modify it within the loop. yikes13:05
Shrewsimma bad programmer13:06
pabelangerShrews: Ya, not sure which one we'd want to go with13:10
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix removeCompletedRequests for dict iteration
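The dict-iteration bug discussed above can be sketched as follows (a minimal example illustrating the py2/py3 difference, not the actual nodepool patch):

```python
# Minimal sketch of the py3 pitfall discussed above: removing entries
# from a dict while iterating over it raises RuntimeError in Python 3
# (py2's dict.keys() returned a list copy, which hid the bug).
requests = {'label-a': 1, 'label-b': 2, 'label-c': 3}
try:
    for label in requests:
        del requests[label]            # mutates the dict mid-iteration
except RuntimeError as err:
    print('py3 raises:', err)

# The safe pattern (as in the fix above): snapshot the keys first.
requests = {'label-a': 1, 'label-b': 2, 'label-c': 3}
for label in list(requests.keys()):
    del requests[label]
print('after cleanup:', requests)      # {}
```

`for k in list(d.keys())` works identically on 2.7 and 3.x, which is why it is the usual compatibility choice.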
Shrewslet's see if that fixes it. pabelanger, you wanna rebase 468253 on top of that one and see what we get?13:13
Shrewsmaybe that's not necessary though13:14
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use https to clone
openstackgerritPaul Belanger proposed openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack
openstackgerritPaul Belanger proposed openstack-infra/nodepool feature/zuulv3: Update keyscan for python3 compat
Shrewspabelanger: i meant rebase on 468409, but that's alright. let's see what the experimental job comes back with13:18
pabelangerwe need 468127 currently13:19
Shrewsi thought maybe your patches were exposing the problem, but looks like maybe not13:19
pabelangerso, if you rebase 468409 on 468127 it should expose the failure13:19
pabelangeror 468253 13:20
pabelangerthat is tip13:20
Shrewsah, thx13:20
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix removeCompletedRequests for dict iteration
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap
SpamapSjeblair: ^ Amusing amend to the bwrap there, informed by our close_fds find yesterday. Since py2.7 lacks the pass_fds parameter, if we add close_fds=True to our ansible runs (not a bad idea) we may become py2.7 incompatible :-P13:41
*** isaacb has joined #zuul14:02
SpamapS <--- can somebody explain this?14:06
SpamapSlooks like maybe something went terribly wrong14:06
jeblairtobiash: the config repos are assumed to be branchless -- zuul only reads from master there (mostly because considering more than one branch of a config repo becomes very difficult to reason about).  i suppose we could have a zuul configuration setting to indicate which single branch (master or something else) of config repos to read, but i would want to be very careful with that that we don't make it too confusing.14:24
jeblairSpamapS: do you mean the fact that there is a lot of test output going to the console there?14:25
tobiashjeblair: something like ?14:27
jeblairtobiash: yep14:29
tobiashjeblair: ok, then I'll try this next week14:29
*** dkranz has joined #zuul14:30
jeblairtobiash: note that a lot of the self-testing jobs and instant re-configuration in zuulv3 is geared toward something more like a continuous deployment model.  so pinning to a tag, etc, will inhibit some of that.14:30
tobiashjeblair: I'm aware of that so I'll try to minimize the pinned config parts14:31
tobiashjeblair: I'm just thinking of the really basics of the deployment like log collection or similar which would need pinning if I share them between deployments and want to add something which requires a zuul update14:32
jeblairtobiash: that makes sense.  we're also considering site-local variables, so that even base jobs could have, say, log collection destinations in a config file rather than a repo.14:34
tobiashgreat, I think a carefully chosen (minimal) mixture of both of these two concepts would fit well in our deployment/release model14:37
Shrewspabelanger: looks like my patches eliminated the exceptions, but now looks like the nodes is never booted. stuck in the waitfornode loop14:41
Shrewss/nodes is/node is/14:41
pabelangerShrews: ya, I think base64.encodestring() is the issue now14:44
pabelangernot python3 compat14:44
pabelangerreading docs, it expects a bytes-like object14:45
pabelangerand we pass string14:45
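The bytes-vs-str issue pabelanger describes can be shown in a couple of lines (a sketch with a shortened, hypothetical key, not the nodepool fix itself):

```python
import base64

# Sketch of the py3 issue above: py2's base64.encodestring() accepted
# str, but py3's base64 functions require a bytes-like object, so a
# text server key must be encoded to bytes first.
key = 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbqq2U...'  # shortened example

encoded = base64.b64encode(key.encode('utf-8'))   # bytes in, bytes out
decoded = base64.b64decode(encoded).decode('utf-8')
print(decoded == key)  # True
```

(base64.encodestring itself was deprecated in py3 in favor of encodebytes; either way the input must be bytes.)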
tobiashWas there a change in the executor config? Looks like it doesn't know the configured gerrit connections anymore.14:51
Shrewspabelanger: hrm, yeah. also, would be nice if we could set min-ready:0 for labels we don't build images for14:54
jeblairpabelanger: there's probably some base64 changes in zuul you can reference14:57
jeblairtobiash: yeah, i think we may have merged a change that should *only* load the gerrit connections in the executor (ie, not sql reporters).  I453754052b75de3d4fac266dc73921c7ce9c075f   obviously that shouldn't cause the behavior you're seeing, but maybe there was an issue with it?14:59
jeblairtobiash: that's the only recent thing i can think of14:59
tobiashjeblair: yepp, reverting this fixes my issue14:59
jeblairi guess we're not running it yet :|15:01
jeblairi wonder if we need isinstance rather than issubclass15:01
tobiashjeblair: so with this change only timer and zuul get configured in the merger15:01
tobiashI'll try this out15:01
jeblairthat sounds like the opposite of the desired behavior15:02
jeblairoh yeah, 'driver' in that function is an object instance, not a class object.  so i think the fix will be to switch to isinstance15:03
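The isinstance/issubclass distinction jeblair points out can be sketched as follows (class names are illustrative, not Zuul's real driver classes):

```python
# Sketch of the fix described above: the registry iterates driver
# *instances*, and issubclass() raises TypeError when handed an
# instance, so the check has to be isinstance().
class SourceDriver:
    pass

class GerritDriver(SourceDriver):
    pass

driver = GerritDriver()                  # an object instance, not a class

print(isinstance(driver, SourceDriver))  # True
try:
    issubclass(driver, SourceDriver)
except TypeError:
    print('issubclass needs a class, not an instance')
```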
jeblairi would have expected this to be caught by tests... :|15:04
tobiashchanging to isinstance -> timer, zuul, gerrit connections15:05
tobiashI'll try to craft a test case15:05
jeblairoh, i wonder if the tests aren't passing source_only correctly.  maybe we should not make that a default argument.15:06
jeblairsince it defaults to false, that means the executor in the tests would load everything.15:06
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not request nodes for which we have no images
Shrewspabelanger: ^^^ i *think* that will work15:11
tobiashjeblair: do the tests have separate connection registries by scheduler and executor?15:14
jeblairtobiash: oh, you're right, they don't.15:15
jeblairtobiash: i'm not sure i want to change that, so maybe we do just need a separate unit test of this.15:15
tobiashjeblair: I think I have a test case (+fix)15:33
tobiashwill upload shortly15:33
jeblairtobiash: cool, thanks :)15:33
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Fix missing source connections on merger
tobiashjeblair: ^^^ that's my try15:42
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Fix base64 encoding of server key
Shrewspabelanger: maybe?? ^^^15:42
pabelangerShrews: ya, that should work15:45
pabelangerjust tested locally15:45
pabelanger[u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbqq2UG8bkIcd30BsNt0w9HizdwyB5MMZfBQi8phXaHuKieUEZ1NnXfohIKWBCMC1wYm/XH0rsJOGuolNlDKc12bQ/ibvnNmAKT7H95QHa6QVOwMioeQQPVZcGvs0g6iJOM6bfLn6OJpDQeUMJz9aGJ6JuFeR8/kQsT8s/hXLOxLI58Ac1m5heDZ/8u+GKYW5eB9rXWWKDcYoeWOgs3hsw9rjP2ILW2QjNYAKnv9RPir/CCpa6A2Ydf9Kq5Mr4rdWz1ZP4J/zvpgyIsCrc9pmGPUZ0WkcivxPIk04al71pLKz2ZXX9mtIbmY6sq2L3DBE+vKBpJfj9fi4Zz/4/yqZJ']15:45
pabelangerlets see if zookeeper likes it now15:46
jeblairtobiash: lgtm, thanks!15:48
jeblairanyone else want to review mordred's log output cleanup patches?  467310 467603 15:49
* SpamapS will take a look15:52
SpamapSjeblair: would that make the link I pasted easier to read?15:52
SpamapSbecause I honestly have no idea what went wrong there15:52
jeblairSpamapS: yes15:52
SpamapS"Could not determine repository location of /home/zuul/src/" ?15:52
jeblairSpamapS: however, i think mordred identified a further problem in that we might not be getting data back from subsequent command invocations?  Shrews, were you looking into that?15:53
Shrewsno, i think monty was15:54
jeblairSpamapS: (it's not clear to me whether we would make things "worse" by landing mordreds fix in that we wouldn't see any output after the first command; though "worse" in this case is definitely subjective)15:54
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Use https to clone
jeblairSpamapS: inline question on 468173; otherwise lgtm!16:00
SpamapSjeblair: I think the net result is "better"16:01
jeblairShrews: how so?  we don't need to test that we can generate an ssh key, right?  that seems like it would unecessarily slow down the tests...16:03
jeblairSpamapS: oh! that was a response to the console log thing.  :)16:04
jeblairha, sorry16:04
jeblairShrews: and sorry, i s-tab-failed there16:04
jeblairwe're all kinds of mixed up in irc16:05
SpamapSjeblair: indeed, we should add a list of tags to any multiplexed irc conversation. tags=['console']16:06
jeblairyes, and require unique first letters... we may need to use unicode16:06
mordredjeblair, Shrews: my thinking was that seeing the issue would be clearer if we land the patches - although I agree that it 'reduces' our ability to see test output at that point16:08
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Strip unneeded cmd, changed, stdout and stderr from results
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Display command/shell results in a banner
jeblairmordred: wfm16:09
Shrewsmordred: yeah. i'll review those too16:10
mordredk. let's see what new issues I created16:10
jeblairi love seeing the vintage of zuul unit tests when they say things like "stable/havana"16:12
Shrewspabelanger: eh?
pabelangerShrews: need to be rebased on 468127 16:14
pabelangerwe should see about landing that16:14
Shrewsgah. i rebased the earlier patches in gerrit, not locally16:14
mordredjeblair: since those are the action plugins we should pick up those changes without a restart, right?16:16
jeblairclarkb, pabelanger: i'm okay with 468127, but just for my own edification, why do we need to drop -e there?16:17
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix base64 encoding of server key
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not request nodes for which we have no images
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix removeCompletedRequests for dict iteration
jeblairmordred: i think those are the things we copy into a staging area on start, so we will need a restart16:18
jeblair(we'll also need a manual git pull and pip install since the server isn't puppeted)16:19
pabelangerjeblair: it appears pip -e for python3 writes more data to our git repo, we think maybe some additional setup files.  The issue is, setup_develop above is run under devstack, and sudo. So our choice was either use sudo or drop -e.16:20
pabelangerhappy to do a 3rd option if there is one16:21
pabelanger is the issue16:21
mordredjeblair: ah - ok. cool. I can work on doing that16:22
mordredjeblair: it should just be the executor that needs a restart yah?16:25
jeblairmordred: yep16:27
mordredjeblair, pabelanger: I'm tailing the debug log of the executor right now ... and I'm seeing issues float by about no-route-to-git.o.o16:36
*** isaacb has quit IRC16:42
jeblairmordred: there's an ipv6 and an ipv4 check; i think it's okay if the ipv6 check fails if ipv4 works16:47
mordredah - ok16:50
* SpamapS rebases16:52
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage
pabelangerya, we just started doing traceroute / traceroute6 on zuulv3-dev16:53
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Fix socket.error exception usage
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack
mordredjeblair, Shrews: WELP - those patches just flat didn't work at all ... investigating17:13
mordredjeblair: is it possible we only do the code staging copy on schedule restart?17:14
* mordred goes to look17:14
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Update keyscan for python3 compat
mordredjeblair: OH FOR THE LOVE OF ...17:16
jeblairmordred: did you do a "pip install -U ." ?17:16
mordredjeblair: nope17:16
mordredjeblair: I just did the git pull :)17:16
jeblairthat'll do it17:16
mordredI'm going to restart the executor again17:17
mordredmaybe THIS time I'll be able to make something break because I write bad code17:17
mordredrather than making it no-op because I don't know how to install software17:17
jeblairmordred: a universe of possibilities awaits!17:19
Shrewsoh, what a universe17:20
clarkbShrews: I've approved but left a comment inline if thats something you want to clean up17:20
SpamapSmordred: HeroOps!17:20
SpamapSwhich is also "Her Oops" which I'm thinking is why we haven't seen much uptake on that moniker. ;)17:21
Shrewsclarkb: ah, yep. will clean it up after helping pabelanger fix the py3.5 things17:21
jlkuh, without the comma17:22
Shrewscapitalization is everything, SpaMaps17:22
jeblairSpamapS: [paste] hrm, do you think that's related to/racing with ssh-agent and close_fds?17:23
jeblairSpamapS: (or racing with git ops?)17:23
SpamapSGlittering marble tile and tranquil aroma therapy with local cannabis incense burning awaits you once you brave this dirt track outside Fontana, CA... THIS, is Spa Maps!17:23
SpamapSjeblair: I do.. I think we may need to start passing close_fds=True on some things. But that also might hasten our py3 transition (or we have to double-fork to run bwrap with open fds)17:25
jeblairSpamapS: oh, that's py3 too, so everything should be getting close_fds by default17:26
jeblairSpamapS: do you set close_fds=False for the bwrap run?17:26
SpamapSjeblair: I don't force it no, since I'm letting the caller pass any subprocess.Popen args in.17:26
clarkbShrews: I've approved those nodepool cleanups. Doesn't look like py35 job is happy yet, but I figure those changes can only help17:27
SpamapSjeblair: I recently added some code that is py3 only to it though, where if you do pass close_fds=True, we have to use pass_fds=[passwd_r, group_r] so that our getent stuff makes it in.17:27
jeblairSpamapS: close_fds defaults to true in py3 though17:27
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Fix base64 encoding of server key
SpamapSjeblair: right... let me show you the patch..17:28
Shrewsclarkb: thx. waiting for the log to see what else is wrong17:28
SpamapShttps://server/job/tox-py35/0/ <-- handy link17:29
jeblairSpamapS: Shrews has a fix for that in progress :)17:30
SpamapSso the other option btw, is to double-fork17:30
SpamapSwe could use the zuul-bwrap thing I wrote to do it actually17:30
jeblairSpamapS: i'm really confused now.  it seems like your code should handle the py35 case... :?17:31
SpamapSif we get a close_fds=True in py27, we could run zuul-bwrap, and zuul-bwrap would run bwrap with close_fds=False17:31
Shrewsi have a fix for what?17:31
jeblairShrews: log streaming17:31
SpamapSjeblair: It does handle py35. It does not handle py27 if we start passing close_fds=True anywhere else. These are facts, but I think we got off the context path...17:32
jeblair(specifically, a fix for the lack thereof)17:32
jeblairSpamapS: i thought you showed me a bad fd error with py317:32
SpamapSjeblair: in the py35 run that failed, we can presume close_fds=True is the default, and so there may be something else causing our bad file descriptor errors. :-/17:32
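The close_fds/pass_fds behavior being debugged here can be demonstrated directly (a py3-only sketch, since pass_fds is exactly the parameter py2.7's subprocess lacks):

```python
import os
import subprocess
import sys

# With close_fds=True (the py3 default) an inherited fd is closed in
# the child unless it is listed in pass_fds.
r, w = os.pipe()

# Without pass_fds the child sees a bad file descriptor:
bad = subprocess.run(
    [sys.executable, '-c', 'import os; os.fstat(%d)' % r],
    close_fds=True, stderr=subprocess.DEVNULL)
print('without pass_fds, exit:', bad.returncode)   # non-zero (EBADF)

# With pass_fds the fd survives into the child:
good = subprocess.run(
    [sys.executable, '-c', 'import os; os.fstat(%d)' % r],
    close_fds=True, pass_fds=(r,))
print('with pass_fds, exit:', good.returncode)     # 0
```

Note that passing any pass_fds forces close_fds=True, and subprocess makes the listed fds inheritable in the child.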
*** rcarrillocruz has joined #zuul17:32
SpamapSfor once, though, this reliably fails when run just as that one test17:34
SpamapSso I can actually employ like, debuggers and stuff17:34
SpamapSThe quality of my problems is increasing.17:34
jlkIt's a good problem, Bort17:39
jlkIs there a description of what the various gerrit statuses mean? Context is using them as a pipeline requirement.17:42
pabelangerhave we changed anything on zuulv3-dev.o.o recently^17:46
pabelangerya, looks like it was restarted17:46
mordredpabelanger: yup17:48
jeblairmordred, pabelanger: we should land
jeblairsorry i lost track of that17:49
mordredjeblair: oh - why yes we should!17:49
jeblairjlk: is all i see17:50
jeblairjlk: it just enumerates them17:50
jeblairjlk: but i can describe them more if you have questions17:51
jlkI'm trying to determine if I need to map any of that over to github17:51
jlk"status" is a completely different thing over there17:51
jlkso I was hoping to understand the deeper meaning on the gerrit side17:52
jeblairjlk: (the trickiest one is probably submitted, which is gerrit speak for "someone or something (ie zuul) has told gerrit to merge the change and that action has been added to the queue")17:52
SpamapSoh man this is getting ugly17:52
SpamapSit's definitely still a race17:53
jeblairjlk: most closely maps to pull request 'state': open/closed17:53
jlkyeah, and we've already got an "open" boolean17:53
jeblairthough... how is it that pull requests have only one closed state?  shouldn't there be closed-and-merged and closed-unmerged?17:53
SpamapSseems like maybe there's a bunch of thread-unsafe stuff17:53
jlkon sec17:54
jlkthere are multiple data points on github side17:55
jeblairSpamapS: there should never be more than one thread dealing with a single jobdir17:55
jlkthere is "state" which can be open or closed17:55
SpamapSjeblair: the problem is pipes and fds17:55
jeblairjlk: ah, and 'merged' boolean17:55
SpamapSjeblair: there's one global list of open fds for the executor.. and something is closing other threads' fds17:55
jlkjeblair: yeah, merged_at17:56
jlkor an API call to see if the thing is merged.17:56
jlkGET /repos/:owner/:repo/pulls/:number/merge  204 merged, 404 not.17:56
jeblairjlk: i see merged_at as well as merged17:56
jlkah right17:57
jlkand for even more confusion, there is 'merge_commit_sha' which changes its meaning and value depending on the state of the PR and of the repo configuration17:57
mordredI see merged: False for unmerged things17:57
mordredor something17:58
SpamapSOSError: [Errno 9] Bad file descriptor: 'role_0'17:58
jeblairjlk: at any rate... i wouldn't recommend mapping things, per se.  but rather just exposing (some of) what github provides.  so let 'status' be different between github and gerrit.  if we decide we want a merged attribute, we can add it.17:59
SpamapSthat's weird17:59
jeblairSpamapS: yes that is a funny looking file descriptor18:00
jlkoh right I wanted those to be different18:00
jlkI am exposing the things in github native ways. I just didn't know what all you could expect from "status" on gerrit.18:01
jeblairjlk: got it, cool.  so basically gerrit status is a combo of github 'state' and 'merged'.18:02
jeblairstate really being the important one to have18:02
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix missing source connections on merger
jeblair(which is covered by 'open' as you point out)18:02
jlkOn github side I already added a boolean for open, so you can stop processing if a thing is closed.18:02
jlkDoes it make sense to expose 'merged' as another boolean?18:02
jlkI can't think of one immediately, so I'm inclined to not add it.18:03
jeblairjlk: same here18:03
jeblairadd it when it proves useful, i'd say18:03
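The mapping jeblair sketches verbally ("gerrit status is a combo of github 'state' and 'merged'") could be written out like this. This is an informal illustration, not anything either driver actually implements; the status names follow the Gerrit docs of the era:

```python
# Informal sketch: Gerrit's single change status roughly combines
# GitHub's 'state' and 'merged' fields.
def github_view(gerrit_status):
    """Return (state, merged) in GitHub PR terms for a Gerrit status."""
    return {
        'NEW': ('open', False),
        'SUBMITTED': ('open', False),   # queued for merge, not yet merged
        'MERGED': ('closed', True),
        'ABANDONED': ('closed', False),
    }[gerrit_status]

print(github_view('MERGED'))     # ('closed', True)
print(github_view('ABANDONED'))  # ('closed', False)
```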
SpamapS__del__ strikes again18:04
* SpamapS sees the bug18:04
* SpamapS just found 5 ssh-agents hanging around18:07
SpamapSbeginning to doubt running them in the background18:07
SpamapSthey're all from aborted test runs in the last 2 hours :-P18:07
jeblairSpamapS: yeah... though they didn't seem to die after aborted test runs with -D either, iirc...18:10
* jeblair assumes something something testr18:11
SpamapSjeblair: for desktop it's not a big difference. But I'm concerned about the server.18:11
SpamapSno it's not testr, it's actually systemd/upstart holding them open18:11
SpamapSbecause all the terminals are in the same process group now18:11
SpamapSso your child processes get orphaned to the desktop session manager18:12
SpamapSon a server though, if zuul-executor dies, the ssh-agents that it hasn't killed would get killed _if_ it's running as a daemon or systemd service since systemd will make a process group for service units by default18:13
jeblairSpamapS: in production, it should only be an issue in the case of an unclean shutdown....18:13
SpamapSagreed, a 'splosion level crash18:13
jeblairSpamapS: and since we already make a pid group for the ansible run, an unclean shutdown is already messy there18:13
SpamapSwhat's the thinking behind that btw?18:14
jeblair(since all the ansible processes would continue in the same way as ssh-agent, right?)18:14
jeblairSpamapS: we have to kill ansible sometimes, and we want it and all its subprocesses to die18:14
SpamapSthe ansible processes would have their stdout/err connected to pipes though, no?18:14
SpamapSso they'd get EPIPE or EBADF on operations18:15
jlkjeblair: if zuul gets an event for a change, and queues up that event for processing, and time passes, when it's ready to process that event when it looks at the change to run filters, does it get new data about the change at that point?  In gerrit it's _updateChange (with some caching), in github it's getChange.18:15
jlkthe getChange happens at schedule time, which could be some time after event time, right?18:15
jeblairSpamapS: maybe?  we can try killing it and see what happens :)18:15
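The "pid group for the ansible run" jeblair mentions works roughly like this (a POSIX-only sketch, not Zuul's actual invocation):

```python
import os
import signal
import subprocess
import time

# Start the child in its own session so the whole process tree can be
# signalled at once when ansible has to be killed.
proc = subprocess.Popen(
    ['sh', '-c', 'sleep 60 & exec sleep 60'],
    preexec_fn=os.setsid)            # child becomes its own group leader

time.sleep(0.2)                       # let the shell fork its children
os.killpg(proc.pid, signal.SIGTERM)   # signals every process in the group
proc.wait()
print('group exit status:', proc.returncode)   # -15 (killed by SIGTERM)
```

This is also why an unclean executor shutdown is "messy": anything started this way survives the parent unless something signals the group.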
jeblairjlk: part of the reason for the caching is so that any time there's new information about a change, we update the singleton object in zuul to reflect the new information18:16
jeblairjlk: so in gerrit's case, it will use the most recent data18:17
jeblairjlk: iirc, github doesn't implement that yet, but i think it should18:17
jlkyeah I think we talked through the caching once before, and I put a pin in implementing a cache for github.18:17
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Remove unnecessary list()
jlkmostly I wanted to be sure I had the model right, that "fresh" data is grabbed at schedule time18:17
jeblairjlk: yep.  and it'll go stale until we implement the caching/update thing.  then it should get asynchronously updated.18:18
jlkit'll "go stale" between the time scheduler decides it needs to be tested and it gets tested?18:19
jeblairjlk: yes, or even in the time between when it was enqueued and when it's evaluated18:20
jlkwhat would cause a fresh update at evaluation time?18:20
jlk(in the gerrit case)18:21
jeblairjlk: in either case, any further update to the change/pr.18:22
jlkooooh wait, I see, because a new event would come in, which would cause an update to the in-memory cached object18:23
jeblairjlk: exactly18:23
jlk<--- LIGHT BULB18:23
jeblairjlk: that might just be a comment-added.  it might also be something substantial like someone closed the pr.18:23
jlkyup, makes total sense now. I was looking at it from the wrong angle.18:23
jeblairin any case, that event should (as a side effect) update the data on the change/pr in memory18:24
jeblairand if the scheduler hasn't gotten around to evaluating the original event, once it does, it'll use the new data (so it may well not enqueue a newly updated change in check if someone manages to close it quickly)18:25
jeblairor, if it's already running, the next time the pipeline manager passes over that change, it will see that it no longer meets the "open: true" pipeline requirement that it has, and kick it out.18:25
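The caching model walked through above (one in-memory change object per change, refreshed as a side effect of every event) can be sketched minimally. This is not Zuul's real code, just the shape of the idea:

```python
# One cached Change object per change, updated in place whenever a new
# event arrives, so later pipeline evaluations see fresh data.
class Change:
    def __init__(self, number):
        self.number = number
        self.open = True

class ChangeCache:
    def __init__(self):
        self._changes = {}

    def getChange(self, number):
        # Return the singleton object, creating it on first sight.
        return self._changes.setdefault(number, Change(number))

    def onEvent(self, number, open_state):
        # Side effect of any event: refresh the cached object in place.
        self.getChange(number).open = open_state

cache = ChangeCache()
change = cache.getChange(42)          # enqueued while still open
cache.onEvent(42, open_state=False)   # someone closes the PR
print(change.open)                    # False: same object, fresh data
```

Because every holder shares the same object, the pipeline manager's next pass sees `open: False` and can kick the change out, exactly as described above.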
SpamapSOo, I wonder if we should check for leaked FDs in our test shutdown checks.18:28
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap
* SpamapS stops closing other peoples' fds18:30
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Cleanup failed upload records
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Fetch server console log if ssh connection fails
* jlk afk for some MTB riding18:52
jeblairSpamapS: aha!  i see what you mean about __del__ there.  is it worth thinking about accomplishing that with a context manager?18:58
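The context-manager suggestion amounts to this (JobDir here is a hypothetical stand-in, not Zuul's real class):

```python
import os
import shutil
import tempfile

# __del__ runs at an unpredictable time (or never), while a context
# manager guarantees cleanup at block exit, even on exceptions.
class JobDir:
    def __init__(self):
        self.root = tempfile.mkdtemp()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        shutil.rmtree(self.root)      # deterministic cleanup
        return False                  # never swallow exceptions

with JobDir() as job:
    print('exists inside:', os.path.exists(job.root))   # True
print('exists after:', os.path.exists(job.root))        # False
```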
jeblairmordred: did you re-restart zuulv3 yet after the connection fix landed?18:58
mordredjeblair: nope. I will now do that18:58
mordredSuccessfully installed zuul-2.5.2.dev83418:59
mordredthat's a lot of dev patches :)18:59
jeblairmordred: if only it would tell us how many patches less than 3.0 we are :)18:59
mordredjeblair: I did service zuul-executor stop and it did not stop19:00
mordredjeblair: how much do we care about that right now?19:00
jeblairmordred: i don't know what "service zuul-executor stop" does :|19:00
mordredme either19:01
jeblairmordred: "zuul-executor stop" will, however, dtrt (and that is, ultimately, what service zuul-executor stop should do)19:01
jeblairor maybe it should graceful19:01
jeblairi dunno19:01
mordrednope. still running19:01
jeblairmordred: if it did graceful, then that's expected19:01
jeblairit will stop eventually19:01
mordredah. ok19:01
mordredthe last 4 lines from the log are:19:02
mordred2017-05-26 18:59:41,381 DEBUG zuul.ExecutorServer: Stopping19:02
jeblair(also, it will be really nice to have rotated logs)19:02
mordred2017-05-26 18:59:41,381 DEBUG zuul.CommandSocket: Accepted socket connection <socket._socketobject object at 0x7efdb0a2f7c0>19:02
mordred2017-05-26 18:59:41,381 DEBUG zuul.CommandSocket: Received _stop from socket19:02
mordred2017-05-26 18:59:41,382 DEBUG zuul.AnsibleJob: Abort: no process is running19:02
jeblairso i guess it ran "stop" and not graceful19:02
jeblairwhich means this is unexpected19:03
jeblairmordred: i'd sigkill at this point19:03
mordredjeblair: ok. just didn't want to lose any valuable debugging if we thought it was valuable19:03
jeblair(maybe some side effect of the bug we're fixing?)19:03
jeblairmordred: nah19:04
mordredjeblair: I'm voting side-effect19:04
mordredjeblair: ok - killed and restarted19:04
jeblairmordred, Shrews, pabelanger, jlk, jesusaur, SpamapS, clarkb, fungi, anyone else: er i just realized monday is a us holiday; should we cancel the zuul meeting?19:21
* jeblair is bad at meetings19:21
mordredjeblair: ++19:22
mordredjeblair: that'll give me another week to accomplish my tasks so that tristanC doesn't dislike me19:22
clarkbI'm likely to be afk19:23
fungijeblair: i will gladly attend a zuul meeting monday, but will be just as happy not to need to remember to take a break from holidaying to do so19:24
* jeblair is also bad at mondays19:25
jlkI'm AFK19:28
jlkfor Monday19:28
pabelangerjeblair: wfm19:29
* jesusaur will likely be around monday, but not working19:34
SpamapS+1 cancel19:44
*** jkilpatr has quit IRC19:49
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap
mordredShrews, jeblair: I think the hard-restart we did of the zuul executor may have broke something19:57
mordredShrews, jeblair: is not returning and the executor debug log hasn't really done anything in a while19:59
jeblairmordred: works20:03
* Shrews certainly has no idea about executor things20:04
mordredShrews: well - I was pinging you originally because I had a stuck status page that made it look like everything was queued - so I thought maybe zk/nodepool blah20:05
mordredbut then I reloaded the status page20:05
mordredand got blank20:05
jeblairi think that's the problem20:07
mordredjeblair: oh - that sure does seem to have wrapped wrong20:08
jeblairyeah, i'm assuming line 34 should be on line 3320:08
*** jkilpatr has joined #zuul20:08
jeblairmordred: i'm knee deep in something i don't want to stash; you want to push up a fix?20:08
mordredjeblair: so - that said- the executor debug log is still very not doing anything20:08
mordredjeblair: yup. I'm on it20:08
jeblairmordred: agreed; lemme look at the status json and see if it should be20:09
jeblairyeah, it definitely thinks it should be running jobs20:09
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix bad text wrap in status page
mordredoh! it just did something20:10
mordredjeblair: so - the last time it sat there doing nothing, the last line in the log was:20:10
mordred2017-05-26 20:02:01,121 DEBUG zuul.Repo: CreateZuulRef feature/zuulv3/Zdfed0f5092ff42cd8f510ce65be7771d at 7b7520b55df36b62b6800273c16e1200c7c76c03 on <git.Repo "/var/lib/zuul/executor-git/">20:10
mordredit is once again seeming to not do anything - and the last line in the log is now:20:11
mordred2017-05-26 20:10:09,099 DEBUG zuul.Repo: CreateZuulRef feature/zuulv3/Zed63cd22bbdb43b7a2b67a337215a2b2 at 9c702bcb666e17d57f05372eafd68b39a6e8a280 on <git.Repo "/var/lib/zuul/executor-git/">20:11
jeblairyeah, so the executor's merger is doing stuff, but it's apparently not running jobs20:11
jeblairnothing queued20:12
mordredjeblair: 14b315d was the commit that was running before - there aren't a ton of things related to interaction in there, but there is some of the work finishing the merger/executor split in the new running executor20:15
jeblairmordred: maybe we should restart the scheduler in case something changed with the interaction20:16
mordredjeblair: perhaps we're on opposite sides of a shift?20:16
* mordred tries that20:16
mordredjeblair: yah. much happier now20:17
mordredjeblair: so - let's say it was interface mismatch20:18
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add a space and a capital letter to success banner
mordredSpamapS, jeblair: ^^ based on looking at the output now, there is a minor edit to the format string20:41
pabelangerclarkb: also means, if we create the same setup in openstackzuul project, this should just work20:41
clarkbpabelanger: for ipv6? I think we want to have consistent connectivity and routes20:41
clarkbbut ya20:41
mordredSpamapS, jeblair: I also think we need a filtered on_task_start method too - printing _raw_params is bad20:42
pabelangermordred: ++20:42
pabelangeri noticed that20:43
mordredand honestly I think we need to sort out how to not double-timestamp this20:43
pabelangerclarkb: oops wrong room20:43
SpamapSmordred: can't we just ARA this stuff and be done? ;)20:48
mordredSpamapS: :)21:00
mordredpabelanger: I feel like a few weeks ago one of us figured out what we thought was wrong with the "stream logs from host only works for first task" issue21:01
mordredpabelanger: you don't happen to remember do you?21:01
pabelangermordred: was it something to do with unicode / encoding / decoding?21:02
pabelangercannot remember21:02
jeblairmordred: i agree it would be nice not to double timestamp; i'm really not sure how to accomplish it (i don't even mean technically -- i'm not sure what the desired outcome would actually be)21:07
jeblairmordred: the inner timestamp is critical for debugging (it's the *actual* timestamp); the outer one can be discarded... *except* that means that the log no longer has a consistent timestamp series or format.21:09
jeblairi'm inclined to do that (discard the outer timestamp, if that's possible).  i just worry about whether that makes life difficult for log parsers like osla and grok21:10
mordredwell - I'm thinking we could change the logger format we emit in the callback plugin (which is what controls the outer one)21:10
jeblairooh yeah21:10
mordredto match the format the inner one would have21:11
jeblairif we change that to match the command module, and then drop it from content streamed from the command module, we might be set21:11
mordredthen when we emit the messages from the remote logger, don't append timestamps21:11
mordredI think that's doable21:11
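The approach sketched in the lines above (make the callback plugin's outer format match the inner zuul log lines, then stop stamping the streamed content) boils down to setting a matching Formatter. A minimal stdlib sketch, where the logger name is hypothetical and this is not Zuul's actual callback code:

```python
import io
import logging

# Sketch: give the callback plugin's handler the same shape as the inner
# zuul log lines (e.g. "2017-05-26 20:02:01,121 DEBUG zuul.Repo: ...") so
# lines streamed from the remote side would not need a second timestamp.
# The logger name 'zuul.AnsibleCallback' is illustrative only.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter('%(asctime)s %(levelname)s %(name)s: %(message)s'))

log = logging.getLogger('zuul.AnsibleCallback')
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.info('TASK [Run the tests]')

print(stream.getvalue().strip())
# e.g. "2017-05-26 20:41:00,123 INFO zuul.AnsibleCallback: TASK [Run the tests]"
```

The default `%(asctime)s` already produces the `YYYY-MM-DD HH:MM:SS,mmm` stamp seen in the inner log lines, so only the rest of the format string needs to agree.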
jeblair(also p=871 u=zuul are *really* unnecessary for us)21:11
mordredthey should go away21:12
mordredand a few other cleanups21:12
mordredpabelanger: I don't either :(21:12
jeblair(it's only interesting when the pid changes, but we can easily throw in a "starting new playbook" line)21:12
mordredjeblair: indeed21:13
mordredjeblair: oh wow - wanna see a fun hack?21:14
mordredjeblair: logger = logging.getLogger("p=%s u=%s | " % (mypid, user))21:14
mordredjeblair: they name the logger itself "p=%s u=%s | " % (mypid, user) - then set the format to '%(asctime)s %(name)s %(message)s'21:15
jeblairoh, wow21:22
jeblairmaybe they couldn't think of another way of adding in extra data like that...?21:24
clarkbmordred: its too near the weekend for eye bleeding21:25
mordredjeblair: I think yes - I think it's because in the callback plugin itself it actually doesn't know that information perhaps?21:26
mordred(I honestly have no clue)21:26
jeblairthere are so many ways of manipulating logger data if you use the python logging library correctly...21:26
clarkbos.getpid() should work anywhere?21:26
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Release executor jobs on test exit
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add LoggerAdapter for executor jobs
jeblairthere's one ^ :)21:27
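For reference, the logger-name hack mordred quoted next to a LoggerAdapter approach, in sketch form (the adapter below is illustrative only, not the actual patch; names and fields are made up):

```python
import io
import logging
import os

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(asctime)s %(name)s %(message)s'))

# The Ansible hack: smuggle pid/user into the logger *name* so the stock
# '%(name)s' placeholder prints them.
hack = logging.getLogger('p=%s u=%s |' % (os.getpid(), 'zuul'))
hack.addHandler(handler)
hack.setLevel(logging.DEBUG)
hack.info('task start')

# The cleaner route: a LoggerAdapter that injects the extra data itself,
# leaving the logger name meaningful. (Hypothetical field names.)
class JobLogAdapter(logging.LoggerAdapter):
    def process(self, msg, kwargs):
        return '[build: %s] %s' % (self.extra['build'], msg), kwargs

base = logging.getLogger('zuul.AnsibleJob')
base.addHandler(handler)
base.setLevel(logging.DEBUG)
job_log = JobLogAdapter(base, {'build': 'abc123'})
job_log.info('task start')

print(stream.getvalue())
```

Both lines come out with the same format string; the adapter just keeps the per-job data out of the logger's identity.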
jeblairmordred: anyway, we can probably get the logger and reset the formatter?21:28
mordredjeblair: yah21:28
mordredI'll poke at that on Monday21:28
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Actually check out the branch in the jobdir
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Enable test_multi_branch cloner/executor test
jeblair2 cloner tests enabled! ha ha ha21:32
SpamapSisn't that a dead thing?21:33
SpamapSis zuulv3-dev not running jobs now?21:34
SpamapSnot seeing it post results to my most recent patches21:34
jeblairSpamapS: [cloner] yes, but what it *does* is now handled by the executor.  the cloner tests are really detailed and important, so i'm enabling them applied to the executor with minimal auditable changes; i'll rename them at the end.21:36
jeblairSpamapS: er, there were issues earlier; i think they were resolved, but i haven't followed up with very recent stuff21:36
jeblairSpamapS: anything uploading since 20:17 utc should be okay21:38
jeblairer uploaded since21:38
jeblairolder than that, not so much21:38
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add a space and a capital letter to success banner
SpamapSjeblair: ah ok I fell into the "problems" window ;)21:46
* SpamapS will recheck21:46
SpamapSjeblair: also [cloner] ACK21:47
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Verify executor branch checkout
* SpamapS hits reviewer exhaustion22:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Enable test_multi_branch cloner/executor test
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Verify executor branch checkout
pabelangerjeblair: have we committed to no longer having zuul-merger running for zuulv3?22:48
SpamapSpabelanger: I believe the thought is that it might be useful to have it doing just merges to offload work from executors.22:48
SpamapStho if the result of that work doesn't make it into the executor and must be repeated, I'm not sure there's value in doing that.22:49
jeblairSpamapS: unfortunately, since jobs for a single change can run on multiple executors, we will either need to repeat it or fetch it from the mergers (which requires they serve git repos (though only privately, not publicly))22:50
jeblairat any rate, i've implemented "repeat it"; if we later decide we'd rather have them fetch, that's fine -- it'll mostly just be a matter of deleting code :)22:51
SpamapSI could see just having two ways of configuring it as we learn to scale.22:52
SpamapSfor big ones like infra... you have the mergers serving to the executors22:52
SpamapSfor the average one, just have executors merging.22:52
jeblairwhile that's related to the question of whether standalone mergers are worthwhile, it's not the only consideration -- either way, there's significant work that can be offloaded from the executors by having standalone mergers, so i want to keep them around until we see how they perform.22:52
SpamapSand I'd rather focus on the average one first, and then when we break it, we have data to guide our optimization22:53
jeblairwell, we kind of need to run the infra one :)22:53
SpamapSwhat about another way of thinking.. what if mergers could push their merge results to executors, sort of like how gerrit push triggers work22:54
jeblairat any rate, both are supported in the code right now, and i think if pabelanger is asking "what should the puppet modules do?" i think the answer is they should continue to support both for now.22:54
SpamapSjeblair: what I mean by that is that we haven't run an infra with this scheme, so we don't know that it will break at that scale.22:54
pabelangerjeblair: ya, puppet / ansible. I have the code to support both still, was mostly curious if still valid.  Thanks for the feedback22:55
SpamapSentirely possible the executors will remain blissfully busy and humming doing merges sometimes and running jobs others.22:55
jeblairSpamapS: yeah, though that's the same problem with the direction of transfer reversed.  it's actually a little trickier since an executor knows where to pull from, but a merger doesn't know where to push.22:55
SpamapSjeblair: I was thinking merger would push to all. :-P22:55
SpamapSor.. something crazy like.. to AFS22:55
SpamapSwould that even work22:56
* SpamapS has no idea22:56
jeblairSpamapS: about scale: yes, that much i agree with; we don't need to push people to run mergers yet; we just need to keep them around in case infra-scale needs them.22:56
jeblairSpamapS: (mostly ^ i wanted to clarify that i think we do need to keep that architecture working so we have it ready if infra-scale needs it)22:56
jeblairSpamapS: pull would have the advantage of only doing work on the machines that need it; i think the access/network concerns are about the same either way22:57
* SpamapS really would love to be in the position of actually watching it scale or fail to scale. ;)22:59
jeblair(i'm just really happy that we're out of the business of saying "here's how you set up apache to serve git repos"; probably if we did pull, we could just have intra-zuul ssh access or something to make it simpler)22:59
jeblair(it has also occurred to me that we could push git blobs over gearman, but i'm trying not to think about that)22:59
jeblairthe "repeat" approach is dumb, but simple.23:00
SpamapSis there such a thing as a git blob?23:01
SpamapSI wondered if maybe there was , like a git pack thing, that can be easily created.23:01
jeblairSpamapS: well, the stuff that git upload-pack ships around23:01
pabelangerya, ssh would be nice over apache23:02
SpamapSI just don't know how those are created23:02
SpamapSI guess you know the refs you made with the merge23:02
jeblairit could get complicated since it depends on what objects are on both sides of the transaction23:02
jeblairprobably you'd just assume the executor had all the objects required by the current branch tips, and then send all the ones required by the change not in that set.  then the executor could start by updating from origin, then apply the objects from gearman.23:04
jeblairugh.  i'm failing to not think about this.  :)23:04
jeblairSpamapS: you've now gotten me convinced this is worth spending a few minutes poking at.23:05
SpamapSit actually doesn't seem so haphazard23:06
jeblairso next thing is: get a list of objects we need to send (so we can pass that to git-pack-objects)23:07
jeblairwould that just be a list of commits, or is it more complex than that?23:08
jeblair(like, would we also have to identify what tree objects are required?)23:09
SpamapSjeblair: if you know that the other side has everything except those commits, then it's just those commits.23:09
SpamapSBut if there are things in those objects' tree that are not present, you need those too to unpack.23:09
jeblairSpamapS: so basically: for each new commit object, get the associated tree object.  pack the set of commit and tree objects.  send it over the wire and unpack.  ?23:17
SpamapSjeblair: indeed23:20
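That pipeline (enumerate the objects the other side lacks, pack them, ship the pack, unpack, update refs) can be sketched end to end with stock git commands driven from Python. The temp repos and the `refs/zuul/Zexample` ref name below are made up for illustration; this is not Zuul's merger code:

```python
import subprocess
import tempfile

def git(repo, *args, **kw):
    # helper: run git inside a repo, return its stdout as bytes
    return subprocess.run(('git', '-C', repo) + args,
                          check=True, stdout=subprocess.PIPE, **kw).stdout

src = tempfile.mkdtemp()  # plays the merger's repo
git(src, 'init', '-q')
git(src, 'config', 'user.email', 'zuul@example.com')
git(src, 'config', 'user.name', 'zuul')
git(src, 'commit', '-q', '--allow-empty', '-m', 'base')
old = git(src, 'rev-parse', 'HEAD').decode().strip()

dst = tempfile.mkdtemp()  # plays the executor's repo, already has 'base'
subprocess.run(('git', 'clone', '-q', src, dst), check=True)

git(src, 'commit', '-q', '--allow-empty', '-m', 'merge result')
new = git(src, 'rev-parse', 'HEAD').decode().strip()

# 1. list the commit/tree/blob objects reachable from new but not old
objs = git(src, 'rev-list', '--objects', '%s..%s' % (old, new))
# 2. pack exactly those objects; these bytes are what would cross the wire
pack = git(src, 'pack-objects', '--stdout', input=objs)
# 3. on the executor side, unpack into the object store...
subprocess.run(('git', '-C', dst, 'unpack-objects', '-q'),
               input=pack, check=True)
# 4. ...then point a ref at the new commit (ref name is hypothetical)
git(dst, 'update-ref', 'refs/zuul/Zexample', new)

print(git(dst, 'cat-file', '-t', new).decode().strip())  # -> commit
```

`git rev-list --objects old..new` already excludes everything reachable from `old`, which matches the assumption that the executor has all objects behind the current branch tips.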
* SpamapS is playing localy23:21
SpamapSlocally even23:21
SpamapSjeblair: so I don't know how to get the updates to refs23:21
SpamapSI commit to master, pack up that commit hash, and move it to the other repo with unpack-objects ... and then I can checkout that hash, but master is still at the old hash23:22
jeblairSpamapS: that's cool -- i already have code to update all the refs, including branch heads, to point at the right commits23:22
SpamapSso yeah seems doable23:23
SpamapSand the pack files are pretty efficiently sized.. already gzipped23:23
SpamapSso even a large chain of merges would probably keep under the 1MB limit23:23
jeblairokay... i'll putz around with doing this with gitpython, and maybe try it out next week...23:24
jeblairSpamapS: 1MB limit?23:24
jeblairSpamapS: per packet?23:24
SpamapStechnically the limit is 4GB but realistically pushing bigger than 1MB through gearman is pretty inefficient.23:25
SpamapSgearmand will flat out refuse > 1MB jobs23:25
SpamapSdunno if geard does the same23:25
jeblairnope, didn't know about a limit :)23:26
SpamapSbut .. memory usage would obviously get pretty hairy23:26
SpamapSit's just an implementation specific limit23:26
SpamapSthe protocol allows a 32-bit unsigned int for packet length.. that's the only theoretical limit.23:26
jeblairSpamapS: but a job can send multiple work_data packets, so we could stream reasonable chunks at a time.23:26
SpamapSthat's a good point, those are handled one at a time for that reason23:26
SpamapSthe original usage of that was resizing images on livejournal23:27
jeblair(we might even want to go ahead and do that just to avoid starvation)23:27
SpamapSyeah I could see just pushing a merge job with a flag that says "get some git objects from work data first"23:28
SpamapSerr no23:28
SpamapSclients don't send work data (doh)23:28
jeblairoh good point23:28
SpamapSfeeling more and more like configuring apache to fetch from mergers is simpler ;)23:29
jeblairso if we do merger -> scheduler -> executor, we can only chunk from merger -> scheduler, not from scheduler-> executor....23:29
jeblairif we have the scheduler ask the merger for data directly (via gearman), that could stream in chunks, and, incidentally, avoid the scheduler seeing a lot of data it doesn't care about.23:30
SpamapSif we had a coalescing gearmand we could go in reverse23:31
jeblairscheduler -> [merger:merger], returns {merger: merger-name; objects: ...}.  then executor -> [merger-name:fetch-objects, objects:...]  would be able to stream chunks from the merger to the executor directly (at the cost of requiring the merger to stay online for the duration of the buildset)23:32
SpamapSthis feels like a very roundabout way to get around mergers simply serving their completed merges, doesn't it?23:33
SpamapSlike, not needing test nodes to contact mergers is a big win. Not needing executors to do the same seems less important.23:33
jeblairSpamapS: yes, fetch over http or ssh would be much simpler, but complicates the operational deployment.23:33
jeblairSpamapS: agreed, internal access isn't as significant as external access.23:34
jeblairSpamapS: how does coalescing help here?23:34
SpamapSyou know I think it could be done w/o coalesce in this instance23:36
SpamapSbut basically flip the roles23:37
SpamapShave the executors submit a job to a queue name in response to receiving a merge request, and then basically let the mergers serve it like cache requests.23:37
SpamapS"do I have these objects already", no, do the merge.. send packfile.23:38
SpamapSthe coalesce would just be good for avoiding repeating merges23:39
SpamapSsince if you had several executors request the same objects they could coalesce on a hash of that request and only one merger would do the merge23:39
jeblairSpamapS: oh, yeah, i think this is only a win if we only perform the merge once23:40
SpamapSbut if you want the merges done beforehand, stuffing the result in somewhere addressable and fetching it from there seems like the win.23:40
jeblairwhich is why i was thinking one option is to have the executors ask the merger which performed it for the objects23:40
SpamapSI guess you could have a gearman queue per merger and attach the merger name that did the initial merge to the job that the executors get?23:40
jeblairyeah that ^23:41
SpamapSThat would achieve 1-merge-per-change and would avoid new network pathways23:41
jeblairstill sends lots of data through gearman, but it's the shortest path version of that.23:41
SpamapSand you wouldn't need coalescing to make it efficient since even if multiple executors submitted they'd just get served rapidly in succession23:42
jeblair(both sides can be efficient about how they handle the data since it doesn't require intermediate storage)23:42
SpamapS(though coalescing _would_ make it more efficient when you get to lots and lots of executors)23:42
jeblairif we go with that, we really should look into whether we can switch to C gearmand though.  i'm not worried about gear clients bottlenecking here, but this may exceed what we can do with geard.23:43
SpamapSI don't think gearman handling lots of data is much of a problem.. it's stateless so the only waste is network bandwidth and a little buffer churn23:43
SpamapSAnd yeah, C gearmand gets you multi-threaded performance.23:43
jeblairi feel fairly confident that if there are any remaining problems with gearmand for zuul, we can fix them.  :)23:44
* SpamapS eyes rustygear longingly, but shuns it again23:45
SpamapSso basically we just need 'git pack over gearman' ;-)23:45
SpamapSI will point out that whenever users on the ML ask me for solutions like this, I recommend that they just serve that data over http ;)23:46
jeblairSpamapS: here's what i have so far (up to the point where we would git-pack):
SpamapSjeblair: seems legit :)23:58