19:03:39 <fungi> #startmeeting infra
19:03:40 <openstack> Meeting started Tue Mar 14 19:03:39 2017 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:43 <openstack> The meeting name has been set to 'infra'
19:03:45 <fungi> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:04:12 <fungi> #topic Announcements
19:04:23 <fungi> i don't have any this week
19:04:35 <fungi> as always, feel free to hit me up with announcements you want included in future meetings
19:04:56 <fungi> #topic Actions from last meeting
19:05:06 <fungi> #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-03-07-19.01.htm
19:05:30 <fungi> pabelanger to send ML post to get more feedback on our current run-tox playbooks / role
19:05:38 <fungi> i want to say i read it cover to cover
19:06:02 <jeblair> fungi: then say it!
19:06:34 <fungi> #link http://lists.openstack.org/pipermail/openstack-infra/2017-March/005230.html Feedback requested for tox job definition
19:06:39 <fungi> looks done to me
19:06:40 <jhesketh> o/
19:07:06 <fungi> pabelanger: per yesterday's zuul meeting, you're no longer blocked on that, right?
19:07:56 <pabelanger> fungi: right, we are merged
19:08:23 <fungi> #link http://eavesdrop.openstack.org/meetings/zuul/2017/zuul.2017-03-13-22.02.log.html#l-120 Status updates: Zuul sample jobs
19:08:25 <fungi> cool
19:08:57 <fungi> #topic Specs approval: PROPOSED Zuul v3: remove references to swift (jeblair)
19:09:26 <fungi> #link https://review.openstack.org/443984 Zuul v3: remove references to swift
19:09:30 <jeblair> oh thanks
19:09:35 <jeblair> i was looking for that link under my sandwich
19:10:02 <jeblair> this is pretty short. we talked about it at the ptg, and a little in irc since then
19:10:31 <jeblair> the gist is that we think sending logs to swift is a thing people will still probably want to do, but we don't need built-in support in zuul for it
19:10:37 <fungi> any feel for whether there are downstreams relying on that feature?
19:10:48 <jeblair> so we can drop that from the spec, and later add it back to what we're calling the 'standard library'
19:10:51 <fungi> ahh, right, can just be in a role/playbook
19:11:07 <fungi> seems entirely non-contentious to me
19:11:29 <fungi> anyone object to opening council voting on this until 19:00 utc thursday?
19:11:59 <jeblair> i think based on informal conversations, it's ready for vote
19:12:27 <fungi> #info Council voting is open on "Zuul v3: remove references to swift" until 19:00 UTC Thursday, March 16
19:12:41 <fungi> looks like it also depends on some silly whitespace fixes
19:12:53 <jeblair> they are absolutely critical.
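
A minimal sketch of what the "standard library" log-upload playbook mentioned above (19:10:31 through 19:10:51) could look like, using Ansible's os_object module; the cloud name, container, build identifier, and log path are illustrative assumptions, not the eventual role:

    # Hedged sketch only: upload collected job logs to a swift container.
    # All names here are assumptions for the example.
    - hosts: localhost
      tasks:
        - name: Upload collected log files to an existing swift container
          os_object:
            cloud: logs                    # clouds.yaml entry holding swift credentials
            state: present
            container: zuul-logs           # assumed to already exist
            name: "example-build/{{ item | basename }}"
            filename: "{{ item }}"
          with_fileglob:
            - /var/lib/zuul/logs/*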
19:13:11 <fungi> which are now approved
19:13:57 <fungi> #topic Specs approval: PROPOSED Zuul v3: update job trees to graphs (jeblair)
19:14:19 <fungi> #link https://review.openstack.org/443985 Zuul v3: update job trees to graphs
19:14:38 <jeblair> this also has a follow-up clarification which i don't think needs its own vote, but it may be helpful to look at it since it makes the example more clear
19:14:58 <jeblair> #link https://review.openstack.org/445022 Zuulv3: clarify job dependencies example
19:15:16 <jeblair> this one is substantial -- it changes the job and project-pipeline definition syntax
19:15:37 <jeblair> it also has an implementation based on a patch which was proposed to the master branch some time ago
19:15:40 <jeblair> so you can see it in action
19:15:59 <fungi> while a small part of me feels like this is scope creep, it was apparently stated that this would be a part of zuul v3 and as it's basically already written, i don't see any reason to object
19:16:33 <jeblair> yeah, we probably should have put at least a placeholder in the spec earlier to say "we're going to do this but we don't know what the syntax will look like yet". sorry about that.
19:17:00 <jeblair> before v3.0 is definitely the time to do this though, since it's a major configuration syntax change. we won't have any users left if we put them through two of those.
19:17:24 <fungi> right, that's the best argument in favor for me. much harder to do after the big bang syntax change
19:17:46 <fungi> anyone object to opening council voting on this until 19:00 utc thursday?
19:18:05 <jeblair> (the main thing to note in the spec update is that we lose the ability to structure jobs in a yaml tree. so you don't *visually* see the dependencies in the config file. but you are able to express them fairly easily, and of course it allows more topologies than before.)
19:18:55 <clarkb> no objection here. I think i have already voted positively on this one
19:19:11 <fungi> #info Council voting is open on "Zuul v3: update job trees to graphs" until 19:00 UTC Thursday, March 16
19:19:27 <fungi> seemed pretty consensual in the zuul subteam meeting yesterday anyway
19:19:54 <fungi> #topic Priority Efforts
19:20:15 <fungi> i don't see any specific updates/blockers for these called out in the agenda
19:20:55 <fungi> i know that the task tracker and zuul v3 work are proceeding, and zaro was inquiring about some of the testing prerequisites for the gerrit upgrade just before the meeting
19:21:17 <fungi> at some point we likely need to revisit the ansible puppet apply spec and see whether whatever's remaining on it is still a priority
19:22:50 <fungi> #topic Zuulv3 sample jobs (pabelanger)
19:22:58 <fungi> is this a holdover from last week?
19:23:08 <fungi> looks like the thing which spawned the action item
19:23:08 <pabelanger> it is, sorry for not deleting it from the wiki
19:23:25 <fungi> no problem, sorry for not cleaning up the agenda (or paying close enough attention to last week's meeting minutes)
19:23:41 <fungi> #topic Plan sprint for getting off precise. Should happen within the next month or so. (clarkb)
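
For reference, a hedged illustration of the job-graph change jeblair describes at 19:18:05: dependencies are named explicitly in the project-pipeline instead of being implied by YAML nesting, which makes diamond-shaped graphs like the one below expressible. The job and project names are invented, and the exact keyword is whatever the spec under review settles on:

    # Illustrative only; follows the proposed spec update, names invented.
    - project:
        name: example/project
        gate:
          jobs:
            - build-artifacts
            - unit-tests:
                dependencies:
                  - build-artifacts
            - integration-tests:
                dependencies:
                  - build-artifacts
            - publish:
                dependencies:        # waits on both branches of the diamond
                  - unit-tests
                  - integration-tests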
19:23:44 <clarkb> ohai
19:23:50 <fungi> this looks fresh, or at least worth rediscussing
19:24:05 <clarkb> one of the things we said pre ptg was that we should have a sprint to finish up the precise cleanup (and possibly work on xenialification)
19:24:16 <clarkb> precise EOL is a month away or so, so we should work on that
19:24:46 <pabelanger> yes, we talked a little about this last week in open discussions
19:24:54 <pabelanger> thanks for bringing it up again
19:25:11 <clarkb> ah ok, I missed last week (I guess i should've checked logs)
19:25:20 <clarkb> was a time frame selected? I think we likely need at least several days
19:25:41 <pabelanger> no, we didn't select a date
19:26:06 <clarkb> I'm basically wide open between now and summit (no planned travel)
19:26:27 <fungi> i won't be able to pitch in a week from tomorrow (wednesday the 22nd) as i have a prior obligation which will occupy most of my day
19:26:56 <fungi> aside from that i don't have any more travel until april 10th
19:26:56 <pabelanger> I should be good anytime between now and summit
19:27:06 <pabelanger> no travel on the books
19:27:13 <clarkb> maybe we can target next week and just deal with fungi being out :)
19:27:21 <clarkb> I think sooner we start working on this the better
19:27:22 <fungi> almost a month with no travel for me--quite thrilled
19:27:32 <zara_the_lemur__> nice
19:27:54 <jeblair> do we have a list handy?
19:27:55 <fungi> well, i'm only out one day. i'll happily work on it the rest of the week if that's the plan
19:28:34 <ianw> i'm around (in au time) and happy to help on this too
19:28:41 <clarkb> I don't have a list handy, but shouldn't be hard to get one from ansible/puppet?
19:29:03 <jeblair> might be helpful to estimate how many days
19:29:10 <clarkb> good point
19:29:12 * fungi remembers the days when he could just pull that up in puppetboard
19:29:22 <clarkb> fungi: me too :(
19:29:29 <fungi> speaking of servers still running on precise...
19:29:37 <fungi> (for some definitions of "running" in this case)
19:30:08 <clarkb> https://etherpad.openstack.org/p/newton-infra-distro-upgrade-plans found our old etherpad
19:30:36 <fungi> perfect. if anybody's deployed anything at all on precise since then, they're clearly not paying attention
19:30:57 <fungi> wiki can come off that list
19:31:06 <clarkb> lists, static, planet, puppetdb, wiki, zuul. And pretty sure some of those are done
19:31:36 <fungi> zuul is now trusty, just checked
19:31:39 <clarkb> static is done too
19:32:18 <fungi> so that brings us to lists, planet and puppetdb (puppetboard)
19:32:22 <clarkb> yup
19:32:28 <clarkb> and I checked those all are precise
19:32:33 <jeblair> is puppetdb still useful?
19:32:40 <fungi> it's so very, very broken
19:32:42 <clarkb> planet wants an upgrade to xenial iirc
19:32:47 <fungi> yeah
19:32:48 <clarkb> jeblair: I think if the service could be made to work yes
19:32:49 <pabelanger> ci-backup according to site.pp
19:32:55 <pabelanger> jeblair: not to me
19:33:11 <fungi> pabelanger: oh, i think there is an upgraded one for that which hasn't been switched into production yet?
19:33:23 <clarkb> jeblair: the use case we need from something like it is reporting to humans that are not roots so they can see when their changes happen
19:33:32 <pabelanger> fungi: not sure, site.pp just lists precise
19:33:41 <jeblair> clarkb: but only if they look quickly, before the next 10 runs
19:34:03 <clarkb> jeblair: yes. Not sure puppetboard is the best way to solve that problem. But it's the problem we'd like to have solved
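
One hedged way to produce the list clarkb asks about at 19:28:41, pulling it from ansible facts rather than puppetboard; the inventory scope is an assumption:

    # Sketch: report which inventory hosts are still on Ubuntu 12.04 (precise).
    - hosts: all
      gather_facts: true
      tasks:
        - name: Flag hosts still running precise
          debug:
            msg: "{{ inventory_hostname }} is still on precise"
          when: ansible_distribution_release == 'precise'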
19:34:07 <fungi> well, in this case i say it's "not useful" because it's completely and thoroughly offline for a year or more now
19:34:33 <jeblair> i vote we drop it. i'd love the use case to be handled, but we should stop pretending it is. :|
19:34:37 <fungi> perpetual "internal server error"
19:34:59 <fungi> yeah, i won't object to someone redeploying it and getting it working, we can keep the config management around for it
19:35:17 <fungi> but it's not like deleting the current precise host it's un-running on will be a regression
19:35:47 <clarkb> right
19:35:51 <clarkb> so mostly lists and planet then
19:36:28 <fungi> yup. lists is tricksy because we'll need to defer inbound messages on a secondary mx (ideally) while we test out the replacement with copied (migrated?) data
19:36:39 <pabelanger> interwebs not great for me currently
19:37:11 <fungi> planet should be a relative non-event since there's no persistent data. just need to get it working (on xenial i guess?) and then switch dns
19:37:52 <clarkb> fungi: jeblair (email noob here) couldn't we set up a second MX record for the new host with higher priority which would cause senders to fall back on the existing host until the new host existed?
19:38:02 <clarkb> is the problem in syncing the data in mailman?
19:38:06 <jeblair> fungi: well, we don't really need a secondary mx if the downtime is small (say < 30m).
19:38:17 <fungi> i'm not sure the term "sprint" is really applicable here, but we do need one or (preferably) more people carefully planning the lists switch and then pre-testing the replacement
19:38:42 <jeblair> sending servers don't usually generate bounces for connection errors until they persist a while
19:39:36 <fungi> right, holding the inbound queue on a secondary mx would really only be necessary if we can't make the switch (including dns propagation) happen in a reasonably short timeframe
19:39:54 <jeblair> but if we did want no interruption in mx service, we could probably just have the existing one queue and then deliver to the new host.
19:40:17 <jeblair> downside for that is it's an extra bit of mta configuration that needs its own testing
19:40:21 <fungi> true, and that's only a few lines of exim config
19:40:36 <fungi> but right, we'd probably want to test it somewhere else beforehand
19:40:59 <clarkb> I think we can take a few minutes downtime for queuing
19:41:03 <jeblair> the worst part of this is that we will lose our ip and its reputation
19:41:07 <fungi> so maybe we should just take volunteers to work on the planet replacement and to work on the lists replacement and maintenance plan?
19:41:09 <clarkb> and worst worst case some small number of people might need to resend their emails?
19:42:05 <clarkb> jeblair: we could try an upgrade in place...
19:42:18 <jeblair> yeah. we can practice with snapshots.
19:42:54 <fungi> it worked reasonably well for the wiki server
19:43:23 <fungi> and while that would preserve our ip address, it's fairly out of character for us to upgrade that way
19:43:33 <clarkb> having had ubuntu lts installs upgraded in place over years it ends up being pretty messy
19:43:37 <fungi> but worth considering nonetheless
19:43:37 <clarkb> but functional
19:44:29 <clarkb> jeblair: how important is that reputation? do you expect we will end up being rejected by a bunch of MXes if we change IPs?
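
The "few lines of exim config" fungi mentions at 19:40:21 would look roughly like the router below on the old lists server, placed ahead of its normal routers so mail for the domain is relayed onward (and queued locally whenever the target is down); the replacement hostname is an assumption:

    # Hedged sketch only: forward everything for the list domain to the new host.
    lists_cutover:
      driver = manualroute
      domains = lists.openstack.org
      route_list = * new-lists.openstack.org
      transport = remote_smtp
      no_more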
19:45:04 <fungi> worth noting, i strongly suspect a redeploy from scratch will end up renotifying all the listadmins in the lists class manifest about their lists being created (along with the default listadmin password)
19:45:10 <ianw> probably depends on who had that IP before :)
19:45:15 <jeblair> clarkb: it would not surprise me if it took a few days or a week to settle out.
19:45:17 <fungi> yeah, quite the gamble
19:45:49 <jeblair> rackspace's vetting usually means we're not getting spammer ips, but still.
19:46:15 <fungi> #link https://www.senderbase.org/lookup/?search_string=lists.openstack.org
19:46:23 <ianw> i can take bringing up a xenial planet if you like; i maintained a planet in a previous life
19:46:23 <clarkb> so probably worth testing the in-place upgrade with a snapshot then?
19:46:41 <fungi> email rep. is "good" for both ipv4 and ipv6 addresses
19:47:15 <fungi> ianw: awesome, pleia2 indicated it should be viable on xenial, we just weren't quite ready to run xenial servers yet at that time
19:47:29 <jeblair> yeah, i'm thinking in-place may be worth trying for this.
19:47:36 <fungi> no objection here
19:47:55 <pabelanger> worth a try
19:47:57 <fungi> #action ianw try booting a Xenial-based replacement for planet.openstack.org
19:48:10 <ianw> it's pretty much the same mailman version right?
19:48:28 <fungi> #agreed We'll attempt an in-place upgrade of lists.openstack.org from Precise to Trusty, practicing on instance snapshots beforehand
19:48:56 <clarkb> ianw: yes precise -> trusty mailman is basically the same
19:48:57 <fungi> #link http://packages.ubuntu.com/mailman
19:49:09 <clarkb> 2.1.14 -> 2.1.16
19:49:22 <ianw> yep, that's good, not also shoehorning a v3 upgrade on top :)
19:49:43 <fungi> last news i read, mmv3 is still in a questionable state
19:50:10 <ianw> yeah, that was what i thought too, with all the configuration being quite different
19:50:12 <clarkb> is that something people will be able to work on soon?
19:50:25 <clarkb> any volunteers to work on mailman? I can help there but am definitely not an email expert
19:50:32 <fungi> if we're already incurring the pain on precise->trusty, do we want to follow that with a trusty->xenial in short order? anybody remember which release got the dmarc workaround patches?
19:51:14 <jeblair> clarkb: i will volunteer to help
19:51:23 <clarkb> fungi: I think we may want to do a single in-place upgrade, evaluate how that went. Then decide from there if we want to do a second to xenial
19:51:25 <fungi> #link https://wiki.list.org/DEV/DMARC says 2.1.26 added from_is_list
19:51:26 <jeblair> better yet, i *do* volunteer to help
19:51:44 <fungi> #undo
19:51:45 <openstack> Removing item from minutes: #link https://wiki.list.org/DEV/DMARC
19:51:50 <fungi> #link https://wiki.list.org/DEV/DMARC says 2.1.16 added from_is_list
19:52:02 <clarkb> fungi: oh 2.1.16 is what trusty has so maybe thats less urgent?
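
A hedged outline of the practice run behind the 19:48:28 #agreed item; the server, image, flavor, and key names are invented, and the commands assume OpenStackClient credentials for the hosting cloud:

    # Snapshot the production server, boot a throwaway copy, rehearse the upgrade.
    openstack server image create --name lists-precise-practice lists.openstack.org
    openstack server create --image lists-precise-practice --flavor performance1-4 \
        --key-name infra-root lists-upgrade-test
    # then, on the test instance:
    sudo do-release-upgrade    # precise (12.04) -> trusty (14.04)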
19:52:15 <fungi> so trusty should get us dmarc-related options without needing to consider xenial quite yet
19:52:19 <fungi> yep
19:52:42 <fungi> 2.1.18 added some more stuff though per that wiki
19:52:47 <clarkb> (I think we should also work to xenialify in the near future too, but don't want to move the goalposts too far ahead since precise is eol real soon now)
19:52:58 <fungi> xenial would get us 2.1.20
19:53:04 <jeblair> to be honest, i'm not sure we should enable that option
19:53:28 <fungi> i too am fine with sticking to our guns on ignoring dmarc
19:53:49 <fungi> some listservs i've got subscriptions on have decided to unsubscribe and block subscribers from affected domains instead
19:53:59 <fungi> well, block posts from
19:54:11 <jeblair> that is appropriate and consistent with what those domains have expressed as their policy via dmarc.
19:54:47 <jeblair> i'm intrigued by the 2.1.18 dmarc behaviors. i'll have to think about that a bit.
19:55:29 <fungi> yeah, half the blame is on dmarc-asserting domains for not telling their users they shouldn't post to mailing lists, and the other half on dmarc-validating mtas for ignoring the incompleteness which is dmarc
19:56:37 <jeblair> i'm going to spend the rest of the day trying to shoehorn "systemdmarc" into a sentence somehow.
19:56:50 <fungi> so anyway, i can volunteer to help on the lists.o.o stuff too (i feel fairly confident with its cli utilities, pickle file manipulation tools and archive filesystem layout), though i worry i don't have time to drive the effort
19:57:16 <fungi> jeblair: i'm sure lennart already has a plan for that one
19:57:16 <clarkb> do we want to pick a specific time next week to work on this further or let volunteers poke at it as they are able?
19:58:04 <clarkb> I guess jeblair and ianw can grab fungi and me for help as needed and go from there?
19:58:17 <fungi> or is there any other infra-root who wants to gain a deeper understanding of mailman?
19:58:39 <fungi> there's nothing like the stress of an upgrade to hammer home key concepts ;)
19:58:43 * Shrews hides in the darkest corner
19:58:49 <jeblair> fungi, clarkb: maybe let's poke at it as able, but plan on spending thurs-fri next week making real headway if we don't before then?
19:58:59 <clarkb> jeblair: ++
19:59:02 <fungi> jeblair: i'm happy to commit to that
19:59:09 <ianw> are we defaulting to rax as a provider for these hosts?
19:59:15 <ianw> well clearly the in-place upgrade stays
19:59:20 <clarkb> ianw: for lists yes due to IP
19:59:24 <jeblair> (hopefully that gives us some time to asynchronously make some snapshots, etc). maybe keep an etherpad with a work log.
19:59:24 <fungi> that's a good question we don't have time to answer in the next 30 seconds
19:59:27 <clarkb> planet could potentially be hosted by vexxhost?
19:59:37 <fungi> i'm fine with that idea
19:59:52 <jeblair> no objections to planet vexxhost
20:00:04 <fungi> obviously the plan for lists.o.o is rackspace for now since we're doing in-place
20:00:10 <fungi> and we're out of time
20:00:13 <fungi> thanks everyone!
20:00:17 <fungi> #endmeeting
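
For the record, the DMARC knobs discussed from 19:50:32 onward are per-list settings in Mailman 2; a hedged way to inspect them after the upgrade, with the list name and install path assumed:

    # from_is_list appeared in 2.1.16 (trusty); dmarc_moderation_action and
    # related options in 2.1.18. Dump one list's current values to stdout:
    /usr/lib/mailman/bin/config_list -o - openstack-dev | grep -Ei 'from_is_list|dmarc'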