19:03:09 #startmeeting infra
19:03:10 Meeting started Tue Jul 11 19:03:09 2017 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:13 The meeting name has been set to 'infra'
19:03:22 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:28 o/
19:03:28 #topic Announcements
19:03:42 #info Many thanks to jlk (Jesse Keating) and tobiash (Tobias Henkel) for agreeing to take on core reviewer duties for zuul/nodepool and related repos!
19:03:57 #info Also belated thanks to larainema (Dong Ma) for a willingness to help with jenkins-job-builder core reviewer responsibilities!
19:03:57 \o/
19:04:02 \o/ congrats!
19:04:11 \o/
19:04:12 yay people
19:04:25 we should probably get jlk to join us in here ...
19:04:25 great
19:04:25 moar peoples
19:04:36 thank you :)
19:04:49 tobiash: don't thank me yet!
19:05:06 * mordred hands tobiash a lovely box of pies he found over in the corner
19:05:08 thank you for the hard work and willingness to take on additional responsibility
19:05:26 * Shrews needs additional reminders for this meeting. o/ all
19:05:42 Shrews: there's a meeting now
19:05:44 * fungi scrounges around for some other announcements here
19:05:49 * jeblair is helpful
19:05:49 announcements too
19:05:53 jeblair: how timely!
:)
19:05:59 #info Don't forget to register for the PTG if you're planning to attend
19:06:06 #link https://www.openstack.org/ptg/ PTG September 11-15 in Denver, CO, USA
19:06:22 #info If you have something you want to present at the Summit, submit your abstract by Friday
19:06:30 #link http://lists.openstack.org/pipermail/openstack-dev/2017-July/119494.html July 14 CFP Deadline - Sydney Summit
19:06:45 Thanks, still need to submit something
19:06:45 as always, feel free to hit me up with announcements you want included in future meetings
19:07:00 #topic Actions from last meeting
19:07:11 #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-06-27-19.03.html Minutes from last meeting
19:07:42 ianw abandon pholio spec and shut down pholio.openstack.org server
19:07:45 ianw mentioned he's on vacation this week, but he seems to have shut down the server
19:07:51 need the spec abandoned still, so just readding that for when he returns
19:07:58 #action ianw abandon pholio spec
19:08:08 fungi get details on current server models, presence of rails and switchport counts for infra-cloud
19:08:18 i've put in a request to kevin coker, data center manager at hpe and am still waiting on some details
19:08:28 he confirmed at least that the mounting rails wouldn't fit the new racks we moved into and they didn't have any appropriate replacements on hand
19:08:37 also that due to lack of available storage space the unused mounting rails and original network switches were discarded
19:08:56 i was able to find a fairly recent spreadsheet with the server models and some other details noted, which i've imported into a paste:
19:08:58 Ya, I read up on that thread last night.
Consider me available to help drive infracloud things if needed
19:09:07 #link http://paste.openstack.org/show/613968/ infra-cloud server hardware details
19:09:28 kevin also assures me he should have the switch port/media type counts for me this week, _along_with_ photos of the back panels for everything so we can see it for ourselves
19:10:06 thanks pabelanger! i'll make sure to lean on you for some of this once we have more concrete action items
19:10:23 readding action item for the part i'm still waiting for...
19:10:27 #action fungi get switchport counts for infra-cloud
19:10:41 also can't really do the ml thread justice until i have that, so...
19:10:43 #action fungi start a ml thread about the infra-cloud rails and switching situation
19:10:49 goes back on the to do pile
19:11:08 #topic Spec proposal: Ansible Puppet Apply is implemented (fungi)
19:11:14 #link https://review.openstack.org/478610 "Ansible Puppet Apply is implemented" specs change
19:11:24 probably not contentious, as per the commit message we already basically agreed to this cleanup during our june 6 meeting
19:11:34 i just forgot to include it in the batch with the others
19:11:42 #info Council voting is open on the proposed "Ansible Puppet Apply is implemented" specs change until 19:00 UTC Thursday, July 13
19:12:10 #topic Spec proposal: Gerrit ContactStore Removal (fungi)
19:12:17 #link https://review.openstack.org/479058 "Gerrit ContactStore Removal" specification
19:12:27 this is something we've talked about for ages, but didn't have a good solution for vetting foundation membership of voters in technical elections until last month
19:12:37 and turns out it was an unexpected prerequisite for our upcoming gerrit upgrade too
19:12:48 i'm already digging into the first work item
19:12:59 the foundation needs it for sane handling of contributor ptg registration discounts anyway
19:13:08 which they'll want me to send next week once extra-atcs are nailed down by the tc
19:13:19 spec's been up
for a couple weeks now and looks like it's had a few reviewers already
19:13:26 hopefully straightforward, but any questions about this?
19:13:49 * mordred is sure fungi will solve all the things
19:14:02 not really much to solve at this point
19:14:08 it is necessary for gerrit upgrade because gerrit 2.13 removed the contact store feature entirely
19:14:24 because no one was using it
19:14:28 i think 2.12 removed it actually, from what i could tell in their git history
19:14:32 ah
19:14:40 sounds great, though i have to admit the way you worded the alternatives section really makes me want to propose a solution, but i'll try to restrain myself. ;)
19:14:54 so basically we're running the last gerrit release to support the feature
19:14:57 * jeblair likes a challenge
19:15:16 jeblair: yeah, i phoned the alternatives in. happy to entertain better phrasing or more options there ;)
19:15:39 fungi: no i think it's great. it's just "this is impossible" is like catnip for me
19:16:15 sure. i mean, i thought of several alternatives but none of them covered all the drawbacks
19:16:33 i will look it over more carefully after meeting
19:16:40 thanks!
19:16:59 any objections to putting it up for council vote between now and this time thursday?
19:17:19 sounds good
19:17:39 #info Council voting is open on the proposed "Gerrit ContactStore Removal" specification until 19:00 UTC Thursday, July 13
19:17:50 also, a follow-on topic...
19:17:57 #topic Spec proposal: Make Gerrit contactstore removal a priority (fungi)
19:18:03 #link https://review.openstack.org/479887 "Make Gerrit contactstore removal a priority" specs change
19:18:08 as noted in the aforementioned spec itself, this is a hard dependency for the priority effort around our gerrit 2.13 upgrade
19:18:14 and so is probably also a transitive priority
19:18:19 anyone disagree?
19:18:31 sounds reasonable
19:18:59 ++
19:19:17 ya I think it has to be because transitive property of getting work done
19:19:20 #info Council voting is open on the proposed "Make Gerrit contactstore removal a priority" specs change until 19:00 UTC Thursday, July 13
19:19:43 #topic Priority Effort Zuul v3: OpenStack rollout discussion (mordred, jeblair)
19:19:49 #link http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html "Zuul v3" specification
19:19:54 exciting news?
19:20:43 * jeblair looks at mordred to see if he should talk
19:20:49 * mordred thinks about talking
19:20:58 * mordred goes for it
19:21:11 * jeblair relaxes somewhat in his chair
19:21:18 SO --- guess what? we're finally at "make plans to roll v3 out into production" time!!!
19:21:20 woot!
19:21:33 i should have popped some popcorn
19:21:39 i love laying plans the best.
19:21:43 that doesn't mean we're done - but it does mean that enough unknown-unknowns are done that the finish line is reasonably in sight
19:21:58 jeblair: if you lay them will they still roll?
19:21:58 w00t
19:22:04 great!
19:22:08 I made that etherpad with a proposal on a plan
19:22:14 if you don't want to read it - the summary is:
19:22:23 a) do some more stuff b) roll live at the denver PTG
19:22:40 should i link the etherpad?
19:22:47 with an asterisk in that there is still at least one thing we want to succeed at first before we commit to denver
19:22:51 o - yah
19:23:00 I stuck it on the wiki -- oh, no I didn't
19:23:01 #link https://etherpad.openstack.org/p/zuulv3-migration Zuul v3 migration plan
19:23:02 I opened the wiki
19:23:09 's okay
19:23:28 I *almost* linked it on the wiki
19:23:36 a for effort
19:23:53 there's a bit of a phased approach - adding a few more things over the next few weeks to exercise increasingly large amounts of our job content
19:24:18 while we work on making sure that we've got nice new shiny versions of the bulk of the jobs
19:24:46 there's no way we pre-translate all the jobs into shiny new versions - but that's ok - we're currently running auto-translated versions, so we KNOW we can generate ugly versions
19:24:52 * Shrews senses a hectic PTG with much alcoholz
19:25:00 Shrews: ++
19:25:14 before we do the merge over, what about encouraging teams to retire dead repos? That means less jobs to move over and special case - and especially old jobs...
19:25:37 we should be careful not to impact other work going on at the ptg too
19:25:47 yeah, any cleanup people want to work on between now and then reduces the scope of work somewhat
19:25:56 clarkb: yeah, that's a special concern
19:25:58 everyone else there is going to be writing and pushing lots of code and if they all of a sudden can't get CI or merge things that may be sadness
19:26:00 AJaeger: well - it shouldn't matter too much - we should have a script to auto-migrate jobs that are in zuul now - so if dead repos have some ugly jobs, it's not the WORST thing - but yeah, people cleaning their bedroom isn't a terrible idea
19:26:26 clarkb: yup. totally. I do not think we should go for it if we do not have a high degree of confidence
19:26:34 clarkb: i think the thinking is that there's actually a lot of planning going on.
and while there's some coding, it's not central to why people are there, and things are actually a little quieter than normal.
19:26:47 the plan laid out there includes a lot of testing including scale testing
19:27:04 jeblair: I'm not sure that's what we saw in atlanta at least. Maybe to a degree but not quite like we had with the old summit format where no coding happened
19:27:12 (the rooms I sat in on involved tons of coding work)
19:27:19 yah - and also - our worst-case scenario is that we have 'ugly' copies of the current jobs
19:27:44 so i believe if we stick to the plan, either we'll have strong confidence that we can avoid causing undue disruption to the ptg _or_ well postpone the cut-over
19:28:10 s/well/we'll/
19:28:15 fungi: yeah, that sounds reasonable
19:28:18 so actually executing them really shouldn't be a change in behavior- I do not think we should auto-translate any jobs to newly structured jobs if we're iffy on the success - an auto-translated devstack-gate job that's a copy of the current script is better than an attempt at a nicer thing that doesn't work
19:28:25 jeblair: ++
19:28:27 gah
19:28:31 fungi: ++
19:28:55 ++ all around
19:29:01 mordred: maybe we should move up the shade conversion to our dev system so we can shake out any MAJOR kinks ahead of time? would there be anything preventing us from doing that?
19:29:07 that said- I do expect a lot of questions about how to make changes - which is why I think the PTG will be a good venue, since folks can find us and get face time
19:29:17 (being the 1st victim, it might be helpful to do that)
19:29:33 Shrews: yes - that should be on the list reasonably early-ish
19:29:44 Shrews: since shade exercises devstack-gate pretty extensively
19:30:08 yeah, looks like shade and devstack-gate are the next repos to add which aren't job definition repos, per "the plan"
19:30:13 mordred: does this mean we are ready to start adding more projects to zuulv3.o.o today?
If said projects want to start experimenting?
19:30:18 so are we comfortable with 'light' impact to operations at the ptg? like, if some projects have some trouble, or we have a few unexpected zuul restarts for bugfixes at the ptg, is that okay?
19:30:23 pabelanger: no - I don't think so
19:30:39 pabelanger: I think we still want to keep it constrained for now so that we know what we're debugging if we need to debug
19:30:45 mordred: ack
19:31:30 jeblair: ya I think light impact is ok. If expectation is nothing will work for 3 days that wouldn't be ok
19:31:39 clarkb: ++
19:31:47 but "we might restart your jobs a few times tomorrow" or "some jobs might get lost during lunch" is doable
19:31:49 fungi, what about having a session early at the PTG "Zuul v3 - the new way of running jobs" ?
19:31:59 jeblair: i'd be comfortable with disruption which causes delays testing changes by an hour here or there, but having testing dead in the water for long stretches risks people coming to the infra room, lart-in-hand
19:32:00 ok. we can keep that in mind when we're making go/nogo decisions, and in our communications with the dev community so they know what to expect.
19:32:09 AJaeger: I think that's actually a great idea
19:32:17 my worry is suddenly throwing a whole lot of work to the system that's only seen relatively light work
19:32:20 AJaeger: "so we just made a big change, here are some things you might want to know"
19:32:32 mordred: exactly...
19:32:38 AJaeger: Ya, I was considering doing a talk about that at the summit, PTG might actually be better
19:32:43 AJaeger: you're predicting an upcoming meeting topic ;)
19:32:50 Shrews: yah - I think we definitely want to ramp up the load considerably before throwing the switch
19:32:57 pabelanger: both :)
19:33:09 mordred, Shrews: we can silently shadow production to generate load
19:33:12 jeblair: ++
19:33:23 jeblair: yeah, good idea
19:33:33 ya, increasing load is something I'd love to see too
19:33:34 Shrews: scale testing is included in "the plan" too
19:33:47 well, maybe not completely
19:33:47 it is - but I think that's a different kind we should note
19:33:50 we have "Test config reading/restarting on all 2k repos"
19:33:55 yeah, the "load configuration from 2k git repos" is something i'm particularly interested in. :)
19:33:58 so yes, maybe running lots of jobs isn't in there yet
19:33:58 yah
19:34:06 let's add that ...
19:34:06 that's something we've never done before, though running lots of jobs is.
19:34:17 nodepool, give me 2k nodes for my job :D
19:34:43 (so if we run into problems with running lots of jobs, we have some experience with that. if we run into a problem with config loading/parsing, we're going to need to be more creative)
19:35:01 we could plan a soft outage of zuul v2 some friday where we coopt our aggregate quota for zuul v3 load testing i guess
19:35:10 k. added to etherpad
19:35:13 fungi: oh that's a good idea
19:35:54 i would want to see a very clear test plan leading into that sort of window though, so we don't waste the relatively little bit of time we set aside for it
19:36:08 (nb: we should be saying load configuration from 6k branches, not 2k repos)
19:36:29 like, at least knowing which scenarios we want to trst and having the sequence and hopefully commands lined up in advance
19:36:45 s/trst/test/
19:36:46 jeblair: good point
19:36:56 fungi: ++.
it'll probably have to come after we have some of the bulk config generation done.
19:37:02 agreed
19:37:10 yup.
19:37:11 it'll likely be pretty close up against the ptg
19:37:24 we might come up with some clever ways to test things between now and then too
19:37:28 depending on how fast the rest of this goes
19:37:38 realistically, yeah.
19:38:06 one thing i just realized we don't have on here is docs
19:38:13 important!
19:38:28 of course, we've just landed some major docs changes so zuulv3 actually *has* docs
19:38:32 we know we want to improve them
19:38:46 but also, we should start talking about what kind of docs we need to write for the openstack dev audience
19:38:57 like "so you jjb your openstack job now -- here's what you need to know"
19:39:06 ++
19:39:11 things like what we would talk about at the ptg, but we'll want written as well
19:39:21 #agreed We're shooting for production migration to Zuul v3 immediately prior to (or possibly during) the Queens PTG in Denver, week of September 11
19:39:29 ^ yeah?
19:39:35 ++
19:39:37 we probably also need some sort of message to prevent the chef/puppet/salt/!ansible crowd from revolting
19:39:38 ++
19:39:52 as far as dev docs go (since there are strong feelings in that space)
19:39:55 clarkb: yeah, we should start working a communications plan into this as well
19:40:06 clarkb: indeed. in fact - it's probably worth adding to the list getting some jobs done for a few non-ansible repos
19:40:11 let's hold ptg-specific discussions for the separate meeting topic i've set aside
19:40:14 mordred: ++
19:40:17 like maybe the infra puppet repos - so we can have a thing to point to
19:40:27 "look, this is what driving puppet looks like"
19:41:12 we use puppet for a lot of our infrastructure, so adding that makes sense anyway
19:41:15 clarkb: do we still have active salt and chef teams? I thought we were down to puppet, ansible and juju at this point?
19:41:27 mordred: I don't think we have salt but we do have chef
19:41:51 and now also have things like helm and friends
19:41:54 mordred: i'm taking some quick notes at the top of the etherpad for things to add to it. i have docs, communication plan, puppet jobs
19:42:05 jeblair: cool
19:42:11 focusing examples on technologies our community infrastructure is actively relying on makes sense, and shouldn't be seen as "playing favorites" hopefully
19:42:40 I mean- their jobs have been run with ansible for the last year - so hopefully it's not 100% shocking
19:42:43 but yes
19:42:51 mordred: that's a great way to introduce it
19:43:10 the intent is certainly not to remove anyone's ability to do non-ansible things - in fact, we would hope it'll empower them to do more stuff
19:43:25 I think a lot of the anti ansible sentiment out there has to do explicitly with the language used. It's pretty terrible (wee yaml as code). So having examples that don't force you to write a ton of ansibly ansible may be good
19:43:42 similar thing with puppet if you were to try and run jobs with puppet. The DSL makes a lot of people mad
19:43:46 clarkb: yup.
agree - and should be easy to do
19:44:03 luckily "just do this giant block of shell script" totally works
19:44:07 ya
19:44:22 communicating that ^ is an option is likely what we want to do here
19:44:29 while also saying if you like ansible go to town
19:44:52 exactly
19:45:02 there's also an important audience split
19:45:04 and we'll have a large library of friendly base jobs to build on
19:45:20 examples of shell scripts in yaml blocks should look pretty familiar to people who have written jjb configs anyway
19:45:22 which is that for many of the base jobs people will build on top of - we as infra/zuul people will be writing ansible things
19:45:36 but that doesn't mean that people writing job content for their project need to
19:45:57 it can be easy to miss that distinction - and important that we make it clearly for folks
19:46:52 anything else we need to iron out for this during the meeting, or should i try to get to some more of the agenda in the next 13 minutes?
19:47:07 good from my end!
19:47:13 ++
19:47:35 thanks! and this is awesome, in case i haven't said so enough already
19:47:52 #topic Priority Effort Gerrit 2.13 Upgrade: Status update (clarkb)
19:47:55 #link http://specs.openstack.org/openstack-infra/infra-specs/specs/gerrit-2.13.html "Gerrit 2.13 Upgrade" specification
19:48:31 I just upgraded review-dev to 2.13.9.4 this morning. This has the latest gerrit 2.13 release from last week with our mysql fix and the fix pointed to us by wikimedia foundation for account lookups against the cache.
19:48:50 I expect that unless we find other bugs this is the version we will deploy to production because gerrit isn't supposed to release any more 2.13 releases
19:49:06 also puppet/ansible is enabled again for review-dev (also mentioned in the infra status log)
19:49:25 topic:gerrit-upgrade has a change up to fix one upgrade sequence thing I ran into today. Reviews on that would be good.
But next step is largely going to be testing
19:49:45 please go use review-dev for common tasks that you have. gertty, ui code reviews, pushing code, api queries, whatever
19:49:54 btw, are folks okay with tobiash's fix for the case sensitive label thing in zuul?
19:50:04 separately I hope to get up a noopy zuul instance to ensure that things are working there. ^ is related
19:50:18 https://review.openstack.org/469946 and child
19:50:45 jeblair: i have it on the gerrit testing etherpad
19:50:51 cool
19:50:57 dunno if anybody's tried to confirm it on review-dev yet
19:50:57 will 2.13 upgrade be before or after zuulv3 rollout?
19:51:04 i don't have a better alternative. but if anyone has any bright ideas, now would be a good time.
19:51:08 is it worth also pointing the v3 at review-dev too?
19:51:13 tobiash: hard to say
19:51:15 I've already fixed a small number of issues that I have found with comment links. Gerrit changed urls just for fun
19:51:40 clarkb: oh goodie
19:51:40 mordred: tobiash ya we may want to test both a v2.5 and a v3 zuul just to have options
19:51:49 if before, i reckon we will need to backport?
19:52:06 the actual fix should be easy to backport. the tests, less so :(
19:52:13 indeed
19:52:50 I could try if I have spare time next week
19:53:18 thanks!
19:53:19 tobiash: that would be great
19:53:30 so ya go out and use it real quick to make sure your workflows work
19:53:31 tobiash: a workaround is making sure the events are listed in lower-case in the zuul config, right?
19:53:41 and I will attempt to get zuul tested against it
19:53:48 mordred: nope that doesn't work
19:53:52 ah. AWESOME
19:53:53 and feel free to point a test zuul at review-dev and trigger stuff there
19:53:59 that was my first try to solve this with 2.13
19:55:07 zuul (gate) then sends the lowercased label together with submit to gerrit resulting in a +2d but not submitted change :(
19:56:18 oh.
that truly is fantastic
19:56:26 anything else we want to cover on the gerrit upgrade status?
19:57:46 #topic PTG planning (fungi)
19:57:49 #link https://etherpad.openstack.org/p/infra-ptg-queens Infra planning pad for Queens PTG in Denver
19:57:54 just kicking off the async brainstorming process here...
19:57:59 add your ideas on the pad
19:58:05 i suppose starting with the zuul v3 stuff we just discussed a few minutes ago ;)
19:58:10 does the ptg have a facility for having some kind of session of general interest?
19:58:26 like, can we fit something into the venue/schedule requirements?
19:58:28 yes, the format this time has evolved a bit
19:58:37 there will be reservable discussion room(s)
19:58:49 i've slotted all five days for us, for starters
19:59:07 and the first two days are intended more for general cross-project/community outreach
19:59:18 like we can do "office hours" inservice type things
19:59:33 cool, so we can have a big zuulv3 session the first day, then have consultations as needed
19:59:38 but as ttx mentions there are also separate rooms we can schedule
19:59:41 jeblair: ++
19:59:46 and maybe even a repeat for folks that miss the first big one
19:59:57 and the ptgbot also comes into play where we can announce what we're doing in a more discoverable way
20:00:09 plus we still have the ethercalc for scheduling things further in advance
20:00:13 anyway, we're out of time
20:00:17 and there's a tc meeting
20:00:18 #thanks ptgbot :)
20:00:21 thanks everyone!
20:00:26 #endmeeting