22:01:48 <jeblair> #startmeeting zuul
22:01:49 <openstack> Meeting started Mon Nov 21 22:01:48 2016 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:01:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:01:52 <openstack> The meeting name has been set to 'zuul'
22:01:57 <Shrews> yo
22:02:10 <jeblair> hi all. i know we have some apologies for absences today...
22:02:19 <phschwartz> Hi
22:02:26 <jeblair> so this might be brief
22:02:33 <jeblair> i know that would be disappointing
22:02:37 <jeblair> but we will cope
22:02:43 <Shrews> say it ain't so
22:02:46 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/Zuul
22:03:04 <jeblair> there is our shiny new agenda!
22:03:19 <jeblair> #topic Actions from last meeting
22:03:25 <jeblair> jeblair set up meeting agenda wiki page
22:03:26 <jeblair> done!
22:03:34 <jeblair> #action jeblair work with Shuo_ to document roadmap location / process
22:03:38 <jeblair> not so much with that one yet
22:03:59 <jeblair> #topic Status updates
22:04:10 <jeblair> let's talk about nodepool/zookeeper
22:04:15 * zaro lurks for the latest news
22:04:17 <jeblair> i think we're really really close there
22:04:27 <pabelanger> Ya, I'm happy how well it is going too
22:04:53 <jeblair> pabelanger, Shrews: are we about there on the test-reenablement front?
22:05:39 <pabelanger> I think we are in good shape.
I believe all our existing tests that were disabled have been re-enabled (up for review or merged)
22:05:42 <pabelanger> I
22:05:45 <Shrews> 4 in test_commands, 3 in test_nodepool, 1 in test_webapp
22:05:54 <Shrews> according to a quick "git grep"
22:06:05 <Shrews> some of those probably have reviews up
22:06:20 <jeblair> yeah, i know there are at least a few
22:06:28 <jeblair> so we're probably down to the last 1-3 or something like that
22:06:54 <jeblair> we've also been polishing up the procedures for enabling/disabling images/providers
22:07:13 <jeblair> and we identified last week an operational need for a 'pause' attribute to temporarily suspend image builds
22:07:28 <jeblair> i think that will end up being a 10 line change, i can write that after the meeting
22:07:57 <jeblair> we also recently discussed rolling this out into production
22:08:20 <jeblair> and that we would like to spin up a new builder host and run this on it in parallel while we continue using the current system
22:08:58 <jeblair> that way we can watch image builds and uploads, make sure it works well, and see if we can get a quorum of images in zk
22:09:07 <jeblair> (if we can't, we could always port the data over)
22:09:35 <jeblair> i think we can probably make that happen by the end of wednesday? what do folks think?
22:10:23 <Shrews> i'm out on Wednesday
22:11:34 <pabelanger> internet is poor where I am :(
22:11:43 <jeblair> Shrews: maybe we can make it happen before then :) but also, maybe by then we'll just be down to puppet changes
22:12:20 <pabelanger> Ya, I've started on puppet-nodepool changes. I can pick that up again this week
22:13:08 <jeblair> pabelanger: let's see how far we can get
22:13:20 <pabelanger> ack
22:13:37 <Shrews> how many ZK nodes in the cluster are you planning to use?
22:13:37 <rcarrillocruz> i'll be around, so i can check things during EMEA times if needed
22:13:45 <jeblair> Shrews: just one for starters
22:13:53 <jeblair> rcarrillocruz: cool!
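The 'pause' attribute mentioned at 22:07:13 could behave roughly as sketched below. This is only an illustration of the idea, not the change jeblair wrote: the config shape, the image names, and the `images_to_build` helper are all assumptions made up for the example; only the attribute name 'pause' comes from the discussion.

```python
from typing import Dict, List


def images_to_build(diskimages: List[Dict]) -> List[str]:
    """Return the names of images a builder should build.

    Hypothetical helper: an image whose config sets 'pause' to a true
    value is skipped; an image without the key is treated as not paused.
    """
    return [img["name"] for img in diskimages if not img.get("pause", False)]


# Illustrative configuration only; the names and layout are assumptions.
example_config = [
    {"name": "ubuntu-xenial"},
    {"name": "centos-7", "pause": True},
    {"name": "fedora-25", "pause": False},
]

buildable = images_to_build(example_config)
```

Defaulting a missing key to "not paused" means existing configurations would keep building without any edits, which seems to fit the "temporarily suspend" intent.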
22:14:27 <jhesketh> ditto APAC time for me
22:14:49 <Shrews> jhesketh: who needs redundancy? :-P
22:15:00 <Shrews> err, jeblair
22:15:09 <Shrews> darn you, tab completion
22:15:53 <jeblair> Shrews: not us -- not on a system which is, today, entirely contained on a single host. :)
22:15:58 <fungi> wednesday will mostly be entertaining in-laws for me, so not much help sorry :/
22:16:25 <Shrews> jeblair: maybe we should do some disaster testing... like, killing the single ZK node to see what breaks and what cleanup looks like
22:16:26 <phschwartz> About the same here but for most of the morning est I can lend a hand.
22:16:38 <jeblair> Shrews: we can expand that perhaps after we start running a builder, and maybe before we start running separate node workers.
22:17:08 <phschwartz> If this will be turned on at all in production I would advise against 1 zk node
22:17:09 <pabelanger> I'll be on a train again Wednesday, but should be able to help too
22:17:16 <jeblair> Shrews: yeah, that's a good idea. i feel certain we're going to want some "--force" commands to forcibly alter the ZK tree.
22:17:40 <jeblair> Shrews: and of course, developing some tests for that once we figure out how would be nice as well. :)
22:17:51 <Shrews> yup
22:18:38 <jeblair> phschwartz: well, we want the entire system to scale down to being able to run on one node (zuul and zookeeper) for the very small case. so if zk can't run with just a single node, we have made a very bad mistake.
22:18:54 <mordred> ohai
22:19:09 <jeblair> i haven't encountered anything that suggests that though
22:19:19 <phschwartz> You can run it with a single zk node but there is no data safety guarantee which is bad.
22:20:01 <phschwartz> You can lose the running data set with the last journal set to replay from which can be missing stuff.
22:20:49 <phschwartz> It is not a show stopper for dev systems and such. But I would say yes for prod if we don't want a chance of losing stuff.
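One way to picture "what cleanup looks like" after a single ZK node loses recent writes is a comparison between what the store still records and what the cloud provider actually reports. The sketch below is a plain set comparison with made-up identifiers; `find_drift` is a hypothetical helper, not a nodepool or ZooKeeper API, and it assumes both sides can be listed as sets of IDs.

```python
from typing import Set, Tuple


def find_drift(recorded: Set[str], actual: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare recorded state against what really exists.

    Returns (leaked, missing):
      leaked  - resources that exist at the provider but have no record,
                e.g. because the write that recorded them was lost
      missing - records that point at resources which no longer exist
    """
    leaked = actual - recorded
    missing = recorded - actual
    return leaked, missing


# Illustrative IDs only: the store lost its record of "img-3",
# and still holds a record for "img-0" which was already deleted.
recorded_images = {"img-0", "img-1", "img-2"}
provider_images = {"img-1", "img-2", "img-3"}

leaked, missing = find_drift(recorded_images, provider_images)
```

A disaster test along the lines Shrews suggests could assert that both sets end up empty once the cleanup routines have run.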
22:21:24 <jeblair> well, if we want to support one-node operation, we may need to be aware of that limitation and work around it
22:21:46 <phschwartz> Ack
22:22:06 <Shrews> yep, thus my suggestion for disaster scenario testing
22:22:28 <mordred> what does "lose the running data set with the last journal set to replay from which can be missing stuff" mean?
22:22:40 <mordred> is that a thing that's a case that happens if the machine restarts hard?
22:22:47 <mordred> or just a thing that can happen while it's running?
22:22:50 <Shrews> mordred: it means we all go drinking when that happens
22:22:55 <mordred> neat!
22:22:56 <jeblair> (for instance, it sounds like we might end up leaking an image or node -- fortunately, openstack has taught us to be pretty skeptical of that sort of thing, so we often have routines that deal with stuff like that)
22:23:06 <clarkb> mordred: hard restart
22:23:13 <clarkb> its no different than today with gearman aiui
22:23:21 <pabelanger> right
22:23:48 <mordred> ack
22:24:18 <phschwartz> The only difference is the journal can be damn large and take a long time to replay to get back up to date
22:24:29 <phschwartz> But not a showstopper by any means.
22:24:48 <jeblair> cool, i'm in favor of building systems that are pretty resilient to that kind of thing. i think nodepool is currently to a significant degree. we'll run into some edge cases with zk which are new to us and we'll have to fix them up. but dealing with surprising data is something we can do.
22:27:37 <jeblair> i think that covers it for nodepool for now
22:27:48 <jeblair> any status updates re zuul?
22:28:12 <pabelanger> nothing, for me.
I've shifted to nodepool this past week
22:28:42 <jeblair> (and yeah, i think the push on nodepool right now is a good idea)
22:28:59 <rcarrillocruz> i have a handful of devstack-gate ansible roles in review for some time
22:29:02 <jhesketh> jeblair: it's not v3 related, but I'd like reviews on the sql reporter: https://review.openstack.org/#/c/223333/
22:29:17 <mordred> I keep thinking that's landed
22:29:36 <mordred> rcarrillocruz: oh yeah? you got a topic?
22:29:40 <rcarrillocruz> also, awaiting for the secrets spec to land ( i +1, jhesketh too), so i can move forward implementation
22:29:44 <rcarrillocruz> mordred: zuulv3
22:29:45 <jeblair> jhesketh: oh yes, that's a good idea. i'm happy to do so (i think it's grandfathered in to pass the v2 feature freeze)
22:30:05 <jhesketh> I'm also working on merging master into feature/zuulv3 which I assume we still want to do
22:30:13 <jeblair> jhesketh: ++
22:30:31 <jeblair> rcarrillocruz: i'm not seeing your changes under topic zuulv3
22:30:52 <jhesketh> I poked around merging nodepool master into the feature branch too which looks like it might pick up some useful changes... Is this a thing we want to try and do or should I go through and cherrypick anything that might be applicable given how close it is
22:31:06 <mordred> jeblair: I do actually: https://review.openstack.org/#/q/status:open+project:openstack-infra/devstack-gate+branch:master+topic:zuulv3
22:31:16 <rcarrillocruz> jeblair: https://review.openstack.org/#/q/topic:zuulv3 i do, maybe cos those changes are openstack-infra/devstack-gate ?
22:31:19 <mordred> rcarrillocruz: sorry - I had missed those - they're on my list now
22:31:26 <rcarrillocruz> no worries, thanks
22:31:41 <jeblair> rcarrillocruz: i'm very excited that's happening! :)
22:31:49 <rcarrillocruz> ++
22:32:11 <jeblair> rcarrillocruz: oh, i see my problem. sorry. :)
22:32:44 <jeblair> jhesketh: if you can give that a shot, i think it would be useful.
22:32:58 <rcarrillocruz> we need cross-project topics in gerrit naow :P
22:33:03 <jeblair> jhesketh: i hope the builder in master hasn't changed much
22:33:04 <jhesketh> jeblair: we'll need to set the acl's to allow merge commits, so I'll do that shortly
22:33:22 <jeblair> jhesketh: cool.
22:33:24 <jhesketh> err, there's some big merge conflicts that I need to figure out if they are useful
22:34:03 <jeblair> jhesketh: ok, let me know if you need help sorting through that.
22:34:09 <jhesketh> thanks :-)
22:34:23 <jhesketh> so I realise I'm jumping back to nodepool again, but after we test/prove the new system, how soon do we plan on cutting the feature branch over to master?
22:34:34 <jeblair> and, ftr, i think the plan is that once we are ready to use this builder in production, we will merge the zuulv3 branch of nodepool into master
22:34:34 <jhesketh> can we do that before zuulv3 is ready for example? (ie asap)
22:34:39 <jeblair> jhesketh: yep
22:34:52 <jeblair> puppet-openstackci has a pin on it, so it should be safe
22:35:05 <jhesketh> heh, cool, thanks :-)
22:35:26 <jeblair> and, it is definitely something people can start running if they want
22:35:33 <jhesketh> jeblair: you want to merge the feature branch into master? or do you want to switch the branch to become master
22:35:34 <jeblair> ie, it's still compatible with zuulv2
22:36:08 <clarkb> jhesketh: merge otherwise its non fast forward
22:36:22 <jeblair> jhesketh: i think merge into master (but prefer v3 in conflict resolution -- which, hopefully, there won't be any after you finish your work)
22:36:25 <jhesketh> okay, we'll have to resolve the diffs then
22:36:31 <jhesketh> yep
22:39:00 <jeblair> one last status update: SpamapS found an issue with zuulv3 job configuration in his work on tests. i think i have a solution for that which i will write a patch for soon
22:39:37 <jeblair> moving on -- i'm going to skip progress summary since SpamapS isn't around...
22:39:44 <jeblair> #topic Announcements of areas of interest
22:40:07 <jeblair> anyone have anything new they'd like to start working on?
22:40:37 <rcarrillocruz> i have, but don't want to put more on my plate till i have something decent for the zuul secrets stuff
22:40:50 <jeblair> rcarrillocruz: ++ that's a big one
22:41:07 <jeblair> and actually, with that...
22:41:08 <rcarrillocruz> in particular, nodepool working with other node providers, which can be another big one :P
22:41:46 <jeblair> rcarrillocruz: yes, that will be important, but it's also not in our critical path to get the initial zuulv3 version out the door
22:41:54 <rcarrillocruz> ++ agreed
22:42:30 <jeblair> rcarrillocruz: but that will build on the next stage of nodepool work, so good to be involved in that and make sure we're setting things up so we can accomplish that later
22:42:51 <jeblair> (the use zk for node builders phase)
22:42:54 <rcarrillocruz> i'll def. keep an eye on that
22:43:23 <fungi> jeblair: are you planning to propose the outstanding zuul spec updates at the infra meeting tomorrow, or giving them another week to firm up first?
22:43:25 <jeblair> i think next week i should be able to clean up the spec update for that, and maybe we can review it
22:43:40 <jeblair> #topic Secrets spec update
22:43:42 <fungi> aha, excellent
22:43:52 <jeblair> #link https://review.openstack.org/386281
22:43:56 <jeblair> i think that is ready now
22:44:13 <jeblair> fungi: so i'd like to put it on tomorrow's agenda
22:44:26 <fungi> sounds great
22:44:45 <jeblair> mordred, fungi: i would especially like your reviews on it before it merges
22:45:00 <fungi> already have the latest diff pulled up, thanks for the reminder
22:45:08 <jeblair> but you've both been involved in it enough now that i think we can safely open it up for council vote
22:45:22 <fungi> i agree
22:45:37 <jeblair> (and of course, i only didn't mention rcarrillocruz there since he already voted on it :)
22:45:46 <rcarrillocruz> ++
22:46:31 <jeblair> but anyone else who would like to go over it with a fine tooth comb, that would be great. there is much opportunity for us to shoot ourselves in the foot with this. :)
22:47:07 <jeblair> i'll put it on the infra meeting agenda
22:47:49 <jeblair> fungi: and yeah, the other 2 outstanding ones i'm not quite ready for, but hope to clean up after this one
22:48:13 <jeblair> #topic Open Discussion
22:48:23 <jeblair> ...so not that short after all...
22:49:18 <Shrews> Many thanks to pabelanger for jumping in on the nodepool test enablement and new tests. That pushed us along quite awesomely.
22:49:30 <rcarrillocruz> \o/
22:49:32 <Shrews> and found cool bugs
22:49:57 <pabelanger> Shrews: no problem, thanks for the awesome zk.py, for making it easy to use :)
22:49:58 <clarkb> for turning the builder on if we preserve the existing log publishing that should help people debug any potential issues
22:50:04 <clarkb> also exct upload failures
22:50:12 <jeblair> yes, i'd like to add that it's been really fun working with Shrews, pabelanger, and clarkb on pushing hard on getting the nodepool builder ready for production :)
22:50:42 <clarkb> ok now I afk again
22:51:32 <jeblair> shall we end this with clarkb's mic drop?
22:51:37 <Shrews> word
22:51:43 <jeblair> #endmeeting