22:02:45 <jeblair> #startmeeting zuul
22:02:46 <openstack> Meeting started Mon Aug 21 22:02:45 2017 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:02:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:02:49 <openstack> The meeting name has been set to 'zuul'
22:02:51 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/Zuul
22:03:00 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/zuul/2017/zuul.2017-08-14-22.03.html
22:03:26 <jeblair> the agenda remains: "What needs to happen before PTG" (as i expect it to until the ptg)
22:03:43 <jeblair> #link pre-ptg etherpad https://etherpad.openstack.org/p/zuulv3-pre-ptg
22:04:02 <jeblair> so let's go through those which we identified last time
22:04:23 <jeblair> actually, i'm going to do easy things first
22:04:33 <jeblair> #topic startup time
22:04:52 <jeblair> we ran this test, at least as best we can until we have more jobs defined
22:05:11 <jeblair> we got a baseline for how long it will take zuul to checkout all branches of all projects
22:05:30 <jeblair> if we size zuulv3 at least as large as zuulv2, things look good. it should be able to start in less than 2.5 minutes
22:05:53 <jlk> switching to phone, family dragging me to icecream
22:06:02 <mordred> that's excellent time for an initial start - especially considering it doesn't need to initial start much
22:06:08 <jeblair> i think that's quite manageable for a system that large which is, after all, not actually intended to restart very often
22:06:10 <jeblair> mordred: ya that :)
22:06:36 <jeblair> (ftr: that's ~1600 repos, with 8 mergers and 8 executors)
22:06:49 <Shrews> is a progress indicator emitted during that?
22:06:52 <clarkb> and that was with pre primed repos right?
22:07:09 <clarkb> (that may be worthy of deployment documentation if expected to be used)
22:07:19 <jeblair> Shrews: no
22:07:32 <jlk> Not bad at all.
22:07:42 <jeblair> clarkb: yes -- that's a good point. if we want to minimize this for our very first restart, we should prime the git repos on our new hosts
22:07:53 <jlk> There are logs in the file...
22:08:09 <mordred> jeblair: was 2.5 minutes with or without pre-primed git repos?
22:08:19 <jeblair> clarkb: i don't think i'd mention this in general documentation though. most other zuul v3 instances will be able to grow to this size
22:08:24 <jeblair> mordred: with pre-primed
22:08:27 <mordred> ok. cool
22:08:33 <jeblair> primed in this case means they were already cloned onto the host
22:09:22 <mordred> and yah, I don't think it's super useful for normal docs - our migration is a weird special case of starting a new v3 at massive size :)
22:09:25 <jeblair> clarkb: by which i mean, it's really only the case where a zuul v3 instance springs from nowhere with 1600 repos that it's worth considering
22:09:49 <jeblair> i'm comfortable scratching this from the pending list; any other concerns?
22:10:02 <mordred> the results have exceeded my expectations
22:10:08 <jlk> That's like replacing an existing Zuul site with a new host and new file systems.
22:10:18 <jlk> So, like a major failure.
22:10:40 <mordred> yah. and in that case, pre-cloning the repos isn't likely to buy you much in terms of downtime response
22:10:54 <jeblair> jlk: yeah, and if you're a large site, you'd have to lose all your (ideally many) executors+mergers to achieve this level of failure
22:10:58 <mordred> in fact, it's likely to make it slower, since you'll have to clone the repos which means scripting that real quick
22:11:08 <jeblair> mordred: indeed
22:11:24 <jlk> So a very unlikely scenario.
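[editor's note] "Priming" above just means the repos are already cloned on the host before zuul starts. A minimal sketch of scripting that, assuming a flat cache directory and a single git base URL. The function name `plan_clones`, the cache path, and the example URL are all hypothetical; the log does not say where zuul's mergers and executors expect their repos, so this only plans the clone commands rather than claiming the real layout.

```python
from pathlib import Path


def plan_clones(projects, cache_root, base_url):
    """Return one `git clone` command per project that is not yet on disk.

    projects   -- iterable of project names such as "openstack/nova"
    cache_root -- hypothetical directory holding the pre-cloned repos
    base_url   -- hypothetical git server prefix to clone from
    """
    commands = []
    for project in projects:
        target = Path(cache_root) / project
        if (target / ".git").exists():
            # Already primed; nothing to do for this repo.
            continue
        commands.append(["git", "clone", f"{base_url}/{project}", str(target)])
    return commands
```

Each returned list could be handed to `subprocess.run(cmd, check=True)`; planning and running are kept apart so the plan can be inspected first.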
22:11:43 <jeblair> #topic fix for "A worker was found in a dead state" bug
22:11:53 <jeblair> we didn't actually talk about this last time
22:12:18 <jeblair> but i put it on the list because it was killing a significant portion of our ansible-playbook runs
22:12:34 <jeblair> good news: we tracked it down to a python segfault which has been fixed in current versions of python
22:12:54 <jeblair> mordred created a ppa with the backported bugfix (which we're running now i believe?)
22:13:07 <jeblair> and SpamapS started the ubuntu SRU process for it
22:13:32 <mordred> https://launchpad.net/~openstack-ci-core/+archive/ubuntu/python-bpo-27945-backport <- ppa exists with python3.5 package built
22:13:42 <mordred> so the package there is good to go
22:13:53 <jeblair> mordred: ah, do we still need to add that to our puppet?
22:14:23 <SpamapS> I did
22:14:26 <mordred> no, we have done that already
22:14:28 <jeblair> cool
22:14:30 <jlk> Neat.
22:14:31 <SpamapS> and sorry I missed the start AGAIN... weird day. :-P
22:14:36 <mordred> https://review.openstack.org/495399 was merged
22:14:43 <jeblair> SpamapS: sundial messed up?
22:14:45 <mordred> jlk, SpamapS: y'all likely want to add that :)
22:14:46 <SpamapS> exactly
22:15:29 <jeblair> SpamapS: let us know what happens with the sru process, please :)
22:15:33 <SpamapS> I would expect that upload to be released into xenial-proposed within a week.
22:15:53 <SpamapS> and once it's there, we should report back that the binaries work, and it will spend another few days in proposed before they release it in updates.
22:15:59 <SpamapS> I suggest subscribing to the bug.
22:16:27 <jlk> Okay
22:16:32 <jeblair> SpamapS: have the bug link handy?
22:16:49 <jeblair> oh i do
22:16:54 <SpamapS> https://bugs.launchpad.net/ubuntu/+source/python3.6/+bug/1711724
22:16:55 <openstack> Launchpad bug 1711724 in python3.5 (Ubuntu Xenial) "Segfaults with dict" [High,In progress] - Assigned to Clint Byrum (clint-fewbar)
22:17:06 <jeblair> #link https://bugs.launchpad.net/ubuntu/+source/python3.6/+bug/1711724
22:17:16 <jeblair> SpamapS: thanks!
22:17:31 <jeblair> #topic tarball/publish jobs
22:17:46 <jeblair> #link https://etherpad.openstack.org/p/mVSVwG4xos
22:18:02 <jeblair> pabelanger and mordred have been working on this
22:18:19 <mordred> pabelanger has some next patches up - I've got some locally I wrote on the plane that follow up to his patches that I'll get pushed up after the meeting
22:18:43 <jeblair> #link https://review.openstack.org/494672
22:18:45 <pabelanger> yes, python-branch-tarball should be ready for testing
22:18:50 <pabelanger> also added upload-twine role
22:19:03 <jeblair> #link https://review.openstack.org/495972
22:19:08 <jeblair> i think those are those two patches ^
22:19:17 <pabelanger> yes
22:19:36 <pabelanger> https://review.openstack.org/495973/ uses the new twine role
22:19:43 <jeblair> pabelanger: would you mind keeping the etherpad updated and adding links to those patches in there?
22:19:52 <jeblair> i just added those, i mean in the future
22:19:53 <pabelanger> jeblair: sure
22:20:06 <jlk> Also ping me if you need more work. I have an empty plate.
22:20:34 <dmsimard> jeblair: pong (sorry got sidetracked)
22:20:51 <pabelanger> release-openstack-tarball still needs to be pushed up, but holding off until we get branch-tarball working
22:20:52 * dmsimard reads backlog
22:21:31 <jeblair> pabelanger: you mean release-openstack-python?
22:21:42 <pabelanger> Ah, yes
22:22:32 <pabelanger> actually, I'll push up that review shortly
22:22:44 <jeblair> pabelanger: thank you. that will let us work on things in parallel
22:22:53 <pabelanger> agree
22:23:06 <jeblair> pabelanger, mordred: do you think we can wrap up these jobs in the next 2 days or so?
22:23:23 <pabelanger> yes, I hope we can finish them for tomorrow
22:23:56 <mordred> jeblair: yes. I agree with pabelanger on tomorrow
22:24:01 <jeblair> okay, thanks!
22:24:03 <mordred> they're very close
22:24:12 <pabelanger> if we land 494672 today, that would be helpful too
22:25:40 <jeblair> please continue to ping me as soon as any patches directly related to these efforts are ready for review
22:25:53 <jeblair> shall we move on to devstack now?
22:26:05 <pabelanger> yes
22:26:15 <jeblair> #topic devstack jobs
22:26:18 <jeblair> #link https://etherpad.openstack.org/p/AIFz4wRKQm
22:26:28 <jeblair> there's the brainstorming etherpad for this one
22:26:50 <jeblair> jlk: i suspect that there may be more opportunity for you to jump in here, as compared to the tarball jobs
22:26:58 <jlk> Okay
22:27:46 <mordred> jeblair: I did some (mostly useless) noodling in an off moment over the weekend - I do not think we'll wind up being able to use anything I poked at over the weekend directly...
22:28:10 <jeblair> once mordred and pabelanger finish up with the tarball jobs, i expect their focus will shift to this
22:28:11 <pabelanger> yes
22:28:24 <mordred> jeblair: but one thing that jumped out that we're missing from the list is a role or something to get from our new repo structure to something devstack can consume
22:28:26 <jeblair> and clarkb has volunteered to review some of this as well :)
22:28:39 <jlk> I'll read up this evening / tomorrow.
22:28:44 <mordred> because the PROJECTS list and repos on disk is drastically different
22:28:58 <pabelanger> I think mordred added devstack-gate to zuulv3 last week?
22:29:06 <mordred> yes. d-g is in v3 currently
22:29:40 <pabelanger> great
22:30:19 <jeblair> mordred: good point. i added an item to the etherpad todo list
22:30:24 <mordred> jeblair: my hunch was that it might be good for you to at least eyeball that and ponder it
22:31:00 <jeblair> mordred: i expect 'required-projects' to be a replacement for the $PROJECTS variable
22:31:06 <mordred> I do too
22:31:10 <pabelanger> +1
22:31:15 <jeblair> clarkb's work to reduce the use of that to minimum will be helpful here
22:31:55 <jeblair> so yeah, i'll plan on thinking about that next
22:32:22 <pabelanger> one question on d-g, for legacy hooks, that is basically just going to be a shell task right?
22:32:26 <jeblair> in the mean time, i started on the localrc ansible module we discussed. that's a nice out-of-the way thing i could start while other folks were finishing the tarball jobs
22:32:33 <mordred> jeblair: ++
22:32:51 <jeblair> the guts of that are done, i just need to wrap it up in module boilerplate. should have that up tomorrow.
22:33:25 <jeblair> pabelanger: the etherpad says: make "legacy" playbooks that run hook scripts with vars collected by part one
22:33:32 <mordred> yah
22:33:34 <jeblair> (part one is "process env vars")
22:33:53 <mordred> and in non-legacy jobs people can just use pre-tasks as needed
22:34:09 <jeblair> pabelanger: so yeah, i think so -- a playbook with a shell task with the current hook content
22:34:21 <jeblair> automatically generated by the migration script
22:34:29 <mordred> this: https://review.openstack.org/#/c/495930/ does not work - but is the first attempt at dealing with part one "process env vars" btw
22:34:49 <mordred> it does not work and probably should be deleted - but that was the inspiration
22:34:49 <pabelanger> Right, so I guess we'll have to do some magic for bash variables like WORKSPACE too
22:35:19 <mordred> well - the idea in my brainhole is that we make a thing that produces all of the legacy envvars that things normally run with
22:35:31 <pabelanger> okay, cool
22:35:37 <mordred> then when we take a hook script, we run it in a script that first sources those vars, then runs the hook script
22:35:51 <pabelanger> perfect
22:36:03 <jeblair> we'll want legacy vars for a bunch of types of jobs, so that's probably a role to generate the file, then source when appropriate?
22:36:19 <mordred> yah
22:37:15 <jlk> A role that wraps a script, script passed as role var at role call?
22:37:29 <pabelanger> not that I am looking for an answer, but have we given any thought on how long we'd support a legacy hook for?
22:37:38 <mordred> jlk: yah- something like that
22:38:22 <mordred> pabelanger: preferably not a super long time :)
22:38:36 <jeblair> pabelanger: i think once we get the automagic conversion done, we (openstack) set a release+1 goal for projects to migrate their own jobs (and help projects as we're able)
22:38:47 <mordred> jeblair: ++
22:38:49 <pabelanger> okay, cool
22:38:50 <jlk> Less time than keystone v2...
22:39:14 <jeblair> anything else devstack related?
22:39:19 <mordred> also - fwiw, MOST of the hook scripts are one line scripts
22:39:24 <mordred> that just call a script in the repo
22:39:31 <clarkb> which then calls our d-g script
22:39:34 <pabelanger> mordred: tripleo-ci is the usecase I am thinking off
22:39:35 <clarkb> so we also have to keep that around
22:39:37 <pabelanger> of*
22:39:43 <SpamapS> Just a hook, script, and a jump?
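[editor's note] The legacy-hook idea discussed above (generate a file of legacy env vars, then run each hook from a wrapper that sources that file first) can be sketched as follows. This is only an illustration under stated assumptions, not the role or playbook the team went on to write: `render_env_file`, `run_hook`, `demo`, and the file names are hypothetical, and WORKSPACE is used only because it is the variable named in the discussion.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def render_env_file(legacy_vars):
    """Render a mapping as a shell file of `export NAME=value` lines."""
    lines = [
        f"export {name}={shlex.quote(str(value))}"
        for name, value in sorted(legacy_vars.items())
    ]
    return "\n".join(lines) + "\n"


def run_hook(env_file, hook_script):
    """Source env_file, then run hook_script with those vars exported."""
    command = (
        f". {shlex.quote(str(env_file))} && sh {shlex.quote(str(hook_script))}"
    )
    result = subprocess.run(
        ["sh", "-c", command], capture_output=True, text=True, check=True
    )
    return result.stdout


def demo():
    """Write a hypothetical env file and one-line hook, then run the hook."""
    with tempfile.TemporaryDirectory() as tmp:
        env_file = Path(tmp) / "legacy-env.sh"
        hook = Path(tmp) / "pre_test_hook.sh"
        env_file.write_text(render_env_file({"WORKSPACE": "/tmp/ws"}))
        hook.write_text('echo "workspace is $WORKSPACE"\n')
        return run_hook(env_file, hook)
```

Keeping the env file separate from the wrapper matches the suggestion that one role generates the file and each job type sources it when appropriate.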
22:39:49 <mordred> the ones that are more than one line are mostly 2 lines, with the first line being a CD
22:39:52 <mordred> cd
22:39:54 <mordred> gah
22:40:07 <jeblair> CD \
22:40:17 <pabelanger> jeblair: nothing more here
22:40:35 <jeblair> #topic migration script
22:40:44 <mordred> so thankfully there's not a ton of actual complex logic in the job definitions themselves - and we can probably take any of the very few actual special cases and nudge them to fit into the pattern of the other hook scripts
22:41:04 <jeblair> this is basically pending completion of the publish and devstack jobs
22:41:23 <jeblair> mostly because we need to know what to migrate to
22:41:48 <jeblair> i'm hoping we can get those jobs wrapped up very soon and have at least a couple of weeks to work on the migration script and deal with the output
22:41:56 <mordred> ++
22:42:11 <pabelanger> ++
22:42:39 <mordred> luckily the migration script is a thing that just reads local yaml files - so it's actually easy to iterate on locally
22:42:56 <mordred> it may hurt one's brain - but other than that it's just normal hacking
22:43:39 <jeblair> #topic migration docs
22:43:52 <jeblair> Shrews has started to pick up work on this
22:43:59 <jlk> YAML2YAML The YAMLing
22:44:31 <jeblair> updating the infra-manual migration page based on his experiences trying to use zuulv3 with almost no relevant documentation :)
22:44:31 <Shrews> https://review.openstack.org/495971
22:44:44 <jlk> Yay
22:45:23 <jeblair> Shrews: should i continue to flesh out some of the todo items i left in there? or did you want to take some of them over?
22:45:47 <Shrews> that link is just for a 0-day, "omg, how do i do something" guide
22:46:39 <Shrews> jeblair: i'm afraid of assuming where you were heading with some of those todos
22:46:39 <jeblair> (eg: one of the important things i think we need to communicate is how the variant binding works. there's some tribal knowledge about metajobs and skip-if that we need to tell people how to do with variants)
22:46:45 * jlk has to go afk. Will review logs.
22:47:20 <jeblair> Shrews: okay, i'll continue to poke at them as able, and maybe try to trick you into doing some if i can articulate it adequately :)
22:47:24 <Shrews> jeblair: does what i put up cover your "actually work" todos?
22:47:40 <Shrews> or were you planning to go deeper?
22:48:18 <SpamapS> There should definitely be a "zuulv3 for the jjb programmer"
22:48:34 <Shrews> i guess it doesn't really cover inheritance or roles very well
22:50:39 <pabelanger> has anybody looked at project-template from python-jobs?
22:51:12 <pabelanger> possible that is in mordred migration script, but I haven't looked
22:53:01 <jeblair> Shrews: i'm not sure; there's some good stuff in there, but it almost looks like it's more aimed at the user who hasn't used zuulv2 in openstack. i think we'll be able to use a lot of that in the project drivers guide of the infra-manual to replace the current jjb/zuulv2 stuff that's there.
22:54:47 <jeblair> i'll leave some suggestions as to how we might change things for the migration audience
22:57:09 <jeblair> anything else?
22:58:24 <jeblair> i forgot to start the meeting with our countdown clock, so i'll end it with: we have 15 workdays until the scheduled ptg cutover
22:58:27 <jeblair> thanks everyone!
22:58:30 <jeblair> #endmeetig
22:58:31 <jeblair> #endmeeting