22:06:10 <jeblair> #startmeeting zuul
22:06:10 <dmsimard> ok /me scratches it off
22:06:10 <openstack> Meeting started Mon Nov 27 22:06:10 2017 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:06:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:06:13 <openstack> The meeting name has been set to 'zuul'
22:06:56 <jeblair> dmsimard: it's also a grey area :) not completely out of scope, but perhaps not intrinsic to zuul itself
22:07:10 <jeblair> dmsimard: maybe let's stick it at the end
22:07:14 <jeblair> which it already is
22:07:24 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/Zuul
22:07:33 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/zuul/2017/zuul.2017-11-20-22.07.html
22:07:35 <dmsimard> jeblair: the meeting participants are largely the same, let's talk about it tomorrow :p
22:07:48 <jeblair> #topic Roadmap
22:08:07 <jeblair> based on a discussion right before summit, i've moved our roadmap into storyboard
22:08:16 <jeblair> #link roadmap in storyboard https://storyboard.openstack.org/#!/board/53
22:08:26 <jeblair> i *just* finished typing everything in
22:08:35 <fungi> still has that new board smell
22:08:40 <jeblair> that's one story for each line in the roadmap email i sent
22:09:08 <jeblair> in most cases, i just put in a bare-bones description of what the item is
22:09:23 <pabelanger> great
22:09:24 <Shrews> that must have been fun data entry
22:09:26 <jeblair> if you have a particular interest in an item, or know what it means, please do update the descriptions with more explanation
22:09:27 <fungi> perhaps a meta-item, but at what point do we merge back to master?
22:09:51 <jeblair> fungi: good question, let's talk about that next
22:09:57 <fungi> k
22:10:15 <fungi> wondered if it was missing from the roadmap, or at a different level than the roadmap
22:10:17 <jeblair> i *almost* finished assigning folks to the tasks i know about
22:10:43 <jeblair> please assign yourself if you're working on something with no assignee
22:10:58 <jeblair> that's a fully automatic board
22:11:37 <jeblair> the entries are all stories, and the rows are generated by queries based on tags
22:11:47 <jeblair> so a story tagged with zuulv3.0 will appear in the first column
22:11:53 <jeblair> zuul v3.1 in the second
22:11:58 <jeblair> er "zuulv3.1"
22:12:03 <jeblair> and "zuulv3.x" in the third
22:12:35 <jeblair> i had one or two more categories in the email, i'm not sure if it's useful to break it down further or not...
22:12:52 <clarkb> o/
22:12:59 <dmsimard> fwiw I made progress on the json/finger:// issue last week and I have a huge draft email I was going to send out but I figured I'd sleep on it and take another look this week, I'll add a comment on the story.
22:13:39 <jeblair> i'd love to have "3.0 in progress" and "3.0 todo", but that would require the kind of script we were using for the initial zuulv3 board, since storyboard can't do that query natively.
22:13:54 <jeblair> if folks think that's really important, i can set that up. it's not too hard.
22:14:24 <jeblair> dmsimard: what's the json/finger:// issue?
22:14:46 <dmsimard> The zuul_json truncated problem that leads up to a finger:// URL
22:14:53 <jeblair> dmsimard: ah cool
22:15:31 <dmsimard> I got rabbit-holed into trying to understand how we could avoid finger:// URLs altogether but it's a bit complicated :)
22:16:06 <jeblair> dmsimard: you're talking about the issue where zuul_json is broken, yeah?
22:16:15 <jeblair> dmsimard: https://storyboard.openstack.org/#!/story/2001329 ?
22:16:21 <dmsimard> yeah, due to a callback bug with some tasks
22:16:29 <dmsimard> yes
22:16:36 <dmsimard> I'm writing a comment there to describe the issue a bit right now.
22:17:13 <jeblair> okay. my preference would be to fix that bug and not focus on the way it manifests as an error.
22:17:29 * dmsimard nods
22:18:12 <jeblair> we should probably come up with a process for adding new things
22:18:30 <dmsimard> I have access to a zuul v3 instance now (thanks tristanC and team) it's easy to reproduce so it's just a matter of having time to spend on it
22:19:37 <jeblair> i don't have a suggestion for a process for adding new items to the roadmap at the moment, so for now, maybe let's just discuss things in #zuul
22:19:44 <jeblair> we'll come up with something later
22:20:08 <jeblair> i see the 'release 3.0' lane as a burn-down list for us to do the 3.0 release
22:20:37 <jeblair> so if there's something there you can help with, please do -- either by assigning yourself and hacking on it, or reviewing changes related to it
22:21:03 <jeblair> anything else about the roadmap?
22:21:59 <jeblair> please let me know if it's useful, or not useful.
22:22:06 <jeblair> #topic merging into master
22:22:39 <jeblair> i think last week the topic of merging into master came up, and there was some support
22:22:47 <fungi> asap? immediately prior to tagging 3.0.0? somewhere in between?
22:23:37 <clarkb> I think it would be good to get done as soon as resonably possible. THe bulk of the 3.0 work is done and the master branches have largely been dormant. This will make it clear release is imminent and that future dev focus is on the newer stuff
22:23:48 <jeblair> i'm inclined to merge asap -- at least in my mind, the criteria for merging into master was when infra was running v3, and shortly before the release.
22:23:58 <clarkb> it will force us to sort out some of the deployment tooling transition stuff too which is good
22:24:01 <fungi> and do we preemptively branch a stable/2.x from current master, or just tag it 2.6.1 and then branch when we find we need to append there?
22:24:01 <dmsimard> Is a merge even possible ? Or is that a kind of git push --force kind of thing ?
22:24:12 <clarkb> fungi: I think we just tag and if we need to branch later
22:24:19 <jeblair> dmsimard: 'git merge --strategy ours' or however you spell that
22:24:31 <jeblair> clarkb: ++
22:24:31 <pabelanger> for nodepool, merging back to master will make things a little easier upgrading our nodepool-builders. So +1 for that, but we can use feature/zuulv3 and do the dance with patches
22:24:37 <fungi> dmsimard: a "merge" is always possible, if we choose the right strategy ("ours" will basically be an overwrite)
22:25:05 <jeblair> pabelanger: yeah, i think this conversation applies to both repos
22:25:05 <dmsimard> TIL
22:25:27 <fungi> right, i assumed we were talking about merging in zuul and nodepool repos more or less synchronously
22:25:29 <dmsimard> will come in handy because I'll need to merge back feature/1.0 into master for ARA as well, thanks for that :D
22:25:38 <SpamapS> +1 for merge to master soon. Do we have any I'm progress fixes for 2.x?
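[Note on the "ours" merge mentioned above. A minimal sketch, not the procedure the team actually ran: it only lists the git commands one plausible version of the overwrite would use. The branch names default to feature/zuulv3 and master as named in the discussion; the function name `ours_merge_plan` and the exact sequence are assumptions. The key point is the direction: `--strategy=ours` keeps the tree of the branch that is currently checked out, so the merge has to be recorded from the feature branch, after which master can fast-forward to it.]

```python
from typing import List


def ours_merge_plan(feature: str = "feature/zuulv3",
                    target: str = "master") -> List[List[str]]:
    """Return git commands (as argv lists) that would make `target` carry
    `feature`'s content while keeping both histories.

    Hypothetical helper for illustration; it only builds the command list
    and does not run anything.
    """
    return [
        # Start on the branch whose content should survive.
        ["git", "checkout", feature],
        # Record a merge with `target` but keep the current tree unchanged.
        ["git", "merge", "--strategy=ours", "--no-edit", target],
        # `target` is now an ancestor of `feature`, so this fast-forwards.
        ["git", "checkout", target],
        ["git", "merge", "--ff-only", feature],
    ]
```

[Each entry could be passed to `subprocess.run(cmd, check=True)` inside a clone. On a Gerrit-hosted repo the resulting merge commit would presumably still need to be pushed for review rather than pushed directly.]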
22:25:46 <SpamapS> In
22:26:01 <jeblair> SpamapS: not that i'm aware of -- all the urgent gerrit upgrade stuff has merged into master and made it into a release
22:26:11 <clarkb> and the jenkins + nodepool stuff merged iirc
22:26:29 <fungi> #link https://review.openstack.org/#/q/project:openstack-infra/zuul+is:open
22:26:36 <fungi> #undo
22:26:44 <fungi> #link https://review.openstack.org/#/q/project:openstack-infra/zuul+is:open+branch:master
22:27:10 <jeblair> probably the next step is to start an infra mailing list thread about it, to make sure we coordinate with folks deploying from master
22:27:15 <jeblair> i'm happy to start that
22:27:25 <fungi> we have _lots_ of open zuul repo changes for master, looks like
22:27:54 <SpamapS> jeblair: indeed. A long runway for them to speak up seems like a good idea.
22:27:57 <jeblair> i think after that we need to sort out the puppet-openstackci deployment bits. i'd love it if someone else volunteered to drive that.
22:28:34 <jeblair> i don't think it's that hard (we discussed driving everything from a zuulv3 switch argument), but some care will need to be taken.
22:28:59 <clarkb> and probably want to have at least one third party ci group/individual following along for early feedback
22:29:15 <jeblair> i'll ask for a volunteer for that on the mailing list
22:29:16 <SpamapS> For the open changes.. perhaps a script that automatically WIP's them all with a friendly note? I mean, they'll likely merge conflict anyway, so might be the nice thing to do to let authors know why their change is stagnating and unlikely to move forward.
22:30:02 <fungi> i can take care of mass-wip'ing, that's trivial via gertty (no need for a script)
22:30:22 <fungi> just process-mark all changes matching the above query and then review them all in one shot
22:30:48 <fungi> s/i/i or anyone/
22:30:51 <SpamapS> \o/ gertty
22:31:14 <clarkb> ttx forwarded someone to us that was interested in getting involved in the more sysadminy bits during openstack day france. Possible they would be interested in working through some of the puppet for this
22:31:23 <fungi> well, any zuul-core reviewer can at any rate
22:31:30 <clarkb> That's kind of jumping into the deep end but probably in a good way :)
22:31:50 <jeblair> #action jeblair start email thread about merging into master
22:32:15 <clarkb> I'll try to get them on the mailing list and lurking in irc and see what they are interested in poking at
22:32:31 <jeblair> fungi: yeah, it may be worth taking a pass through and triaging some of those into "handled by zuulv3" "irrelevant in zuulv3" "please re-propose to zuulv3"
22:32:34 <fungi> while i think it's great if someone new wants to handle the puppet module updates, that may be an extraordinary amount of pressure since it's a task which is blocking some time-sensitive roadmap actions
22:33:04 <pabelanger> I can likely start work on puppet-openstackci, if somebody else doesn't do so before me
22:33:10 <clarkb> fungi: ya that's true, but I also think it will likely largely be mechanical puppet api updates which if familiar with puppet should be straightforward
22:33:31 <clarkb> (if not familiar with puppet then likely not a good first task)
22:33:43 <fungi> if it turns out to be a fairly trivial amount of work there, then i agree that would be a great way to get to know the rest of the team ;)
22:34:45 <jeblair> since we're already in the openstack / zuul grey area...
22:34:51 <jeblair> #topic Update ARA version on executors
22:35:34 <clarkb> is ara something we should just ensure latest on the executors?
22:35:53 <jeblair> #link https://review.openstack.org/516740
22:35:58 <jeblair> apparently this is tricky in puppet
22:36:24 <dmsimard> I meant to ask the TC if ara was something we could put in upper constraints, however it wouldn't really make sense in the context of zuul
22:36:29 <clarkb> the trick is to just call pip directly
22:36:42 <clarkb> with an onlyif clause that checks the version of ara against what is available
22:37:29 <clarkb> I can help with that if necessary
22:37:31 <dmsimard> I have to step away momentarily :( but I wanted to unblock that review somehow because there are some fixes in there we want (such as the firefox permalink issue)
22:37:35 <dmsimard> brb
22:37:36 <jeblair> it's also possible to solve this by declaring ara an extra dependency of zuul, but i'd prefer to avoid that and actually generalize the callback mechanism anyway.
22:37:55 <jeblair> so if there's a way to fix 516740, that's what i'd prefer
22:38:10 <jeblair> clarkb: can you take that over then?
22:38:10 <dmsimard> jeblair: I've already cut down the amount of deps in 1.0 (at least 3 dependencies gone) and started splitting the components
22:38:10 <clarkb> but basically exec resource that calls pip with all the options necessary, then onlyif parameter (This is likely the trickiest bit to get that check right)
22:38:12 <dmsimard> so it's WIP
22:38:15 <dmsimard> brb
22:38:25 <clarkb> jeblair: ya I can work on a new patchset to do the thing I describe
22:38:31 <jeblair> clarkb: thanks!
22:38:43 <jeblair> #action clarkb fix https://review.openstack.org/516740 to call pip directly
22:39:08 <jeblair> #topic Open Discussion
22:39:21 <jeblair> anyone have anything else?
22:39:28 <clarkb> is there any outstanding cleanup work?
22:39:38 <clarkb> deleting servers, removing dead puppet code, etc?
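[Note on the onlyif check clarkb describes in the ARA topic. A minimal sketch of the comparison such a guard would have to make, assuming plain numeric version strings; the function names and the example versions are hypothetical and nothing here comes from review 516740. Puppet's exec `onlyif` runs the command only when the guard exits 0, so the guard should exit 0 exactly when an upgrade is needed.]

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.14.3' into (0, 14, 3). Numeric releases only (assumption)."""
    return tuple(int(part) for part in text.strip().split("."))


def upgrade_needed(installed: Optional[str], available: str) -> bool:
    """True when nothing is installed or the installed version is older."""
    if installed is None:
        return True
    have, want = parse_version(installed), parse_version(available)
    # Pad with zeros so that '0.14' and '0.14.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have < want


def guard_exit_code(installed: Optional[str], available: str) -> int:
    """Exit status for an onlyif-style guard: 0 means 'run the pip exec'."""
    return 0 if upgrade_needed(installed, available) else 1
```

[Versions with pre-release suffixes would need a real version parser; this sketch deliberately leaves that out.]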
22:39:54 <fungi> i almost have an etherpad with a list of the v2 (master) changes proposed for zuul and nodepool
22:40:19 <fungi> #link https://etherpad.openstack.org/p/zuulv2-outstanding-change-triage
22:40:25 <fungi> that's where it'll appear in a few minutes
22:40:35 <jeblair> clarkb: i think there are still some disabled tests in zuul
22:41:00 <jeblair> clarkb: i can't think of examples of the 2 things you cited
22:41:10 <pabelanger> I'd like to get a few eyes on https://review.openstack.org/521324/ Add support for shared ansible_host in inventory now that dmsimard has added a few notes
22:41:25 <pabelanger> clarkb: we likely can delete the release slaves, I can look at that tomorrow
22:41:53 <Shrews> just an FYI for folks, i'm fairly certain that the cause of the finger daemon dying on the executor VMs is b/c of OOM issues. zuul-executor is chosen for killing, but it's actually the child of that process (the finger daemon) being killed
22:42:09 <Shrews> this was discussed in #zuul, but mentioning here for wider exposure
22:42:14 <pabelanger> We also restarted zuulv3 late last week, to address memory issues. It was just about to hit 15GB of RAM
22:42:35 <jeblair> Shrews: now that i think about that more -- i wonder why zuul is using so much memory... i mean, ansible is doing all the work
22:43:08 <jeblair> Shrews: you're fairly certain it's the main process that's using lots of ram?
22:43:50 <jeblair> Shrews: oh, i wonder if there's some ansible output buffering the executor is doing that's using lots of ram
22:44:30 <jeblair> pabelanger: how long had it run, do you know?
22:44:32 <Shrews> jeblair: yes. kern.log outputs the mem usage of things before killing
22:45:11 <jeblair> Shrews: know how much it was using in absolute terms?
22:45:18 <fungi> okay, that etherpad linked earlier is now complete: 133 open zuul changes and 65 open nodepool changes on master at the moment
22:45:31 <pabelanger> jeblair: about a week I think, that was the last time I had restarted it
22:45:44 <fungi> also worth noting, we still have a couple open changes on nodepool's feature/gearman-zk-shim branch. presumably those can go away?
22:45:53 <jeblair> pabelanger: hrm. we should be able to do better than that.
22:45:54 <pabelanger> jeblair: but, I can get a more exact number
22:46:10 <Shrews> jeblair: http://paste.openstack.org/raw/627522/
22:46:15 <Shrews> that was from ze4
22:46:19 <jeblair> fungi: yeah, i think so. we can probably abandon them and delete the branch. let's double check with mordred.
22:46:25 <pabelanger> jeblair: yah, memory does seem to spike when new patchsets with .zuul.yaml are added
22:46:37 <jeblair> pabelanger: did we have a bunch all at once?
22:46:43 <jeblair> maybe we should graph that as a metric
22:46:43 <pabelanger> this time, it was chef repos that were coming online
22:46:49 <pabelanger> jeblair: yah, about 6 patches
22:46:50 <fungi> jeblair: pabelanger: was the zuul scheduler memory consumption maybe due to someone posting a new flurry of dynamic reconfiguration triggering changes?
22:47:01 <fungi> er, i guess you already went there
22:47:14 <jeblair> Shrews: know the units for that by any chance?
22:47:24 <mordred> jeblair, fungi: we can definitely delete anything on feature/gearman-zk-shim
22:47:29 <mordred> fungi: I can abandon those right now
22:47:43 <jeblair> mordred: want to go ahead and delete the branch?
22:47:45 <fungi> thanks mordred!
22:48:01 <jeblair> anyone object to deleting the feature/gearman-zk-shim branch?
22:48:08 <Shrews> jeblair: i do not
22:48:14 <pabelanger> ++
22:48:18 <fungi> far from it, i wholeheartedly endorse deletion of that branch now
22:48:32 <mordred> jeblair: yah - can do
22:48:44 <fungi> having more than one feature branch open for a repo seems like a recipe for problems, to me
22:48:56 <jeblair> Shrews: i think it may be 4k pages
22:49:12 <mordred> fungi, jeblair
22:49:15 <mordred> fungi, jeblair: done
22:49:41 <fungi> now we just have some ~200 pre-v3 changes to zuul and nodepool to figure out what to do with ;)
22:50:05 <jeblair> Shrews: so that's 4.6 GB of ram. it seems like a lot.
22:50:10 <clarkb> fungi: we probably want to sort them by features vs bugfixes
22:50:30 <clarkb> fungi: bugfixes consider merging based on severity, features point to v3 things?
22:50:35 <fungi> agreed, some means of grouping/tagging would make sense next
22:50:37 <jeblair> Shrews: may be a leak in the executor?
22:50:52 <Shrews> jeblair: "Nov 23 00:26:22 ze04 kernel: [8494719.679223] git invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0"
22:51:00 <Shrews> jeblair: does that mean a git operation is triggering it?
22:51:44 <fungi> i think it means git asked for some memory, the kernel realized it had none, and then excitement ensued
22:52:02 <fungi> or hilarity, depending on your mood
22:52:12 <jeblair> yeah, so not really git's fault. just in the wrong place at the wrong time
22:52:53 <Shrews> just wondering if the git operations are using more memory than we expect. at any rate, probably a #zuul discussion
22:53:34 <fungi> probability is the number of git operations mean that an allocation request from a git process is statistically likely to make the next request
22:53:55 <fungi> er, be the next request, whatever
22:54:42 <fungi> your highest-churn processes are the ones most likely to trigger an oom action, but the process using the most memory could easily be something else
22:54:55 <Shrews> *nod*
22:55:27 <fungi> the sacrificed children mentioned in the log are the most likely culprits
22:56:22 <fungi> [Thu Nov 23 00:26:21 2017] Killed process 20080 (zuul-executor) total-vm:850552kB, anon-rss:8900kB, file-rss:6048kB
22:56:29 <Shrews> fungi: in this case, the child was using much less memory than the parent
22:56:56 <Shrews> but the child was chosen, thus no more finger process
22:57:24 * fungi avoids making a lotr reference involving us being down to only 9 fingers
22:58:54 <jeblair> we're about at time
22:58:58 <jeblair> so thanks everyone!
22:59:04 <jeblair> #endmeeting
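[Note on the OOM numbers discussed above. A minimal sketch for reading the two kinds of figures that came up: the "Killed process" line fungi quoted, which reports kB, and the per-process figures from the paste, which jeblair guessed are counted in 4k pages. The page count behind the 4.6 GB estimate is not in this log, so the conversion helper is shown without it; the 4096-byte page size is that guess, not a confirmed value, and the function names are made up for illustration.]

```python
import re
from typing import Dict, Union

# Matches the kernel's kill line in the format quoted in the meeting.
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_kill_line(line: str) -> Dict[str, Union[int, str]]:
    """Extract pid, process name and memory figures (kB) from a kill line."""
    match = KILL_RE.search(line)
    if match is None:
        raise ValueError("not an OOM kill line: %r" % line)
    return {
        "pid": int(match["pid"]),
        "name": match["name"],
        "total_vm_kb": int(match["total_vm"]),
        # Resident memory is the sum of anonymous and file-backed pages.
        "rss_kb": int(match["anon_rss"]) + int(match["file_rss"]),
    }


def pages_to_gib(pages: int, page_size: int = 4096) -> float:
    """Convert a page count to GiB, assuming `page_size` bytes per page."""
    return pages * page_size / 2**30
```

[Applied to the line fungi pasted, this gives a resident size of 14948 kB for the killed process, which matches Shrews's point that the child chosen by the OOM killer was small compared with its parent.]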