#openstack-meeting log

19:03:39 <fungi> #startmeeting infra
19:03:40 <openstack> Meeting started Tue Mar 14 19:03:39 2017 UTC and is due to finish in 60 minutes.  The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:43 <openstack> The meeting name has been set to 'infra'
19:03:45 <fungi> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:04:12 <fungi> #topic Announcements
19:04:23 <fungi> i don't have any this week
19:04:35 <fungi> as always, feel free to hit me up with announcements you want included in future meetings
19:04:56 <fungi> #topic Actions from last meeting
19:05:06 <fungi> #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-03-07-19.01.html
19:05:30 <fungi> pabelanger to send ML post to get more feedback on our current run-tox playbooks / role
19:05:38 <fungi> i want to say i read it cover to cover
19:06:02 <jeblair> fungi: then say it!
19:06:34 <fungi> #link http://lists.openstack.org/pipermail/openstack-infra/2017-March/005230.html Feedback requested for tox job definition
19:06:39 <fungi> looks done to me
19:06:40 <jhesketh> o/
19:07:06 <fungi> pabelanger: per yesterday's zuul meeting, you're no longer blocked on that, right?
19:07:56 <pabelanger> fungi: right, we are merged
19:08:23 <fungi> #link http://eavesdrop.openstack.org/meetings/zuul/2017/zuul.2017-03-13-22.02.log.html#l-120 Status updates: Zuul sample jobs
19:08:25 <fungi> cool
19:08:57 <fungi> #topic Specs approval: PROPOSED Zuul v3: remove references to swift (jeblair)
19:09:26 <fungi> #link https://review.openstack.org/443984 Zuul v3: remove references to swift
19:09:30 <jeblair> oh thanks
19:09:35 <jeblair> i was looking for that link under my sandwich
19:10:02 <jeblair> this is pretty short.  we talked about it at the ptg, and a little in irc since then
19:10:31 <jeblair> the gist is that we think sending logs to swift is a thing people will still probably want to do, but we don't need built-in support in zuul for it
19:10:37 <fungi> any feel for whether there are downstreams relying on that feature?
19:10:48 <jeblair> so we can drop that from the spec, and later add it back to what we're calling the 'standard library'
19:10:51 <fungi> ahh, right, can just be in a role/playbook
19:11:07 <fungi> seems entirely non-contentious to me
19:11:29 <fungi> anyone object to opening council voting on this until 19:00 utc thursday?
19:11:59 <jeblair> i think based on informal conversations, it's ready for vote
19:12:27 <fungi> #info Council voting is open on "Zuul v3: remove references to swift" until 19:00 UTC Thursday, March 16
19:12:41 <fungi> looks like it also depends on some silly whitespace fixes
19:12:53 <jeblair> they are absolutely critical.
19:13:11 <fungi> which are now approved
19:13:57 <fungi> #topic Specs approval: PROPOSED Zuul v3: update job trees to graphs (jeblair)
19:14:19 <fungi> #link https://review.openstack.org/443985 Zuul v3: update job trees to graphs
19:14:38 <jeblair> this also has a followup clarification which i don't think needs its own vote, but it may be helpful to look at it since it makes the example more clear
19:14:58 <jeblair> #link https://review.openstack.org/445022 Zuulv3: clarify job dependencies example
19:15:16 <jeblair> this one is substantial -- it changes the job and project-pipeline definition syntax
19:15:37 <jeblair> it also has an implementation based on a patch which was proposed to the master branch some time ago
19:15:40 <jeblair> so you can see it in action
19:15:59 <fungi> while a small part of me feels like this is scope creep, it was apparently stated that this would be a part of zuul v3 and as it's basically already written, i don't see any reason to object
19:16:33 <jeblair> yeah, we probably should have put at least a placeholder in the spec earlier to say "we're going to do this but we don't know what the syntax will look like yet".  sorry about that.
19:17:00 <jeblair> before v3.0 is definitely the time to do this though, since it's a major configuration syntax change.  we won't have any users left if we put them through two of those.
19:17:24 <fungi> right, that's the best argument in favor for me. much harder to do after the big bang syntax change
19:17:46 <fungi> anyone object to opening council voting on this until 19:00 utc thursday?
19:18:05 <jeblair> (the main thing to note in the spec update is that we lose the ability to structure jobs in a yaml tree.  so you don't *visually* see the dependencies in the config file.  but you are able to express them fairly easily, and of course it allows more topologies than before.)
19:18:55 <clarkb> no objection here. I think i have already voted positively on this one
19:19:11 <fungi> #info Council voting is open on "Zuul v3: update job trees to graphs" until 19:00 UTC Thursday, March 16
19:19:27 <fungi> seemed pretty consensual in the zuul subteam meeting yesterday anyway
19:19:54 <fungi> #topic Priority Efforts
19:20:15 <fungi> i don't see any specific updates/blockers for these called out in the agenda
19:20:55 <fungi> i know that the task tracker and zuul v3 work are proceeding, and zaro was inquiring about some of the testing prerequisites for the gerrit upgrade just before the meeting
19:21:17 <fungi> at some point we likely need to revisit the ansible puppet apply spec and see whether whatever's remaining on it is still a priority
19:22:50 <fungi> #topic Zuulv3 sample jobs (pabelanger)
19:22:58 <fungi> is this a holdover from last week?
19:23:08 <fungi> looks like the thing which spawned the action item
19:23:08 <pabelanger> it is, sorry for not deleting it from the wiki
19:23:25 <fungi> no problem, sorry for not cleaning up the agenda (or paying close enough attention to last week's meeting minutes)
19:23:41 <fungi> #topic Plan sprint for getting off precise. Should happen within the next month or so. (clarkb)
19:23:44 <clarkb> ohai
19:23:50 <fungi> this looks fresh, or at least worth rediscussing
19:24:05 <clarkb> one of the things we said pre ptg was that we should have a sprint to finish up the precise cleanup (and possibly work on xenialification)
19:24:16 <clarkb> precise EOL is a month away or so so we should work on that
19:24:46 <pabelanger> yes, we talked a little about this last week in open discussions
19:24:54 <pabelanger> thanks for bringing it up again
19:25:11 <clarkb> ah ok, I missed last week (I guess i should've checked logs)
19:25:20 <clarkb> was a time frame selected? I think we likely need at least several days
19:25:41 <pabelanger> no, we didn't select a date
19:26:06 <clarkb> I'm basically wide open between now and summit (no planned travel)
19:26:27 <fungi> i won't be able to pitch in a week from tomorrow (wednesday the 22nd) as i have a prior obligation which will occupy most of my day
19:26:56 <fungi> aside from that i don't have any more travel until april 10th
19:26:56 <pabelanger> I should be good anytime between now and summit
19:27:06 <pabelanger> no travel on the books
19:27:13 <clarkb> maybe we can target next week and just deal with fungi being out :)
19:27:21 <clarkb> I think sooner we start working on this the better
19:27:22 <fungi> almost a month with no travel for me--quite thrilled
19:27:32 <zara_the_lemur__> nice
19:27:54 <jeblair> do we have a list handy?
19:27:55 <fungi> well, i'm only out one day. i'll happily work on it the rest of the week if that's the plan
19:28:34 <ianw> i'm around (in au time) and happy to help on this too
19:28:41 <clarkb> I don't have a list handy, but shouldn't be hard to get one from ansible/puppet?
19:29:03 <jeblair> might be helpful to estimate how many days
19:29:10 <clarkb> good point
19:29:12 * fungi remembers the days when he could just pull that up in puppetboard
19:29:22 <clarkb> fungi: me too :(
19:29:29 <fungi> speaking of servers still running on precise...
19:29:37 <fungi> (for some definitions of "running" in this case)
19:30:08 <clarkb> https://etherpad.openstack.org/p/newton-infra-distro-upgrade-plans found our old etherpad
19:30:36 <fungi> perfect. if anybody's deployed anything at all on precise since then, they're clearly not paying attention
19:30:57 <fungi> wiki can come off that list
19:31:06 <clarkb> lists, static, planet, puppetdb, wiki, zuul. And pretty sure some of those are done
19:31:36 <fungi> zuul is now trusty, just checked
19:31:39 <clarkb> static is done too
19:32:18 <fungi> so that brings us to lists, planet and puppetdb (puppetboard)
19:32:22 <clarkb> yup
19:32:28 <clarkb> and I checked those all are precise
19:32:33 <jeblair> is puppetdb still useful?
19:32:40 <fungi> it's so very, very broken
19:32:42 <clarkb> planet wants an upgrade to xenial iirc
19:32:47 <fungi> yeah
19:32:48 <clarkb> jeblair: I think if the service could be made to work yes
19:32:49 <pabelanger> ci-backup according to site.pp
19:32:55 <pabelanger> jeblair: not to me
19:33:11 <fungi> pabelanger: oh, i think there is an upgraded one for that which hasn't been switched into production yet?
19:33:23 <clarkb> jeblair: the use case we need from something like it is reporting to humans that are not roots so they can see when their changes happen
19:33:32 <pabelanger> fungi: not sure, site.pp just lists precise
19:33:41 <jeblair> clarkb: but only if they look quickly, before the next 10 runs
19:34:03 <clarkb> jeblair: yes. Not sure puppetboard is the best way to solve that problem. But it's the problem we'd like to have solved
19:34:07 <fungi> well, in this case i say it's "not useful" because it's completely and thoroughly offline for a year or more now
19:34:33 <jeblair> i vote we drop it.  i'd love the use case to be handled, but we should stop pretending it is.  :|
19:34:37 <fungi> perpetual "internal server error"
19:34:59 <fungi> yeah, i won't object to someone redeploying it and getting it working, we can keep the config management around for it
19:35:17 <fungi> but it's not like deleting the current precise host it's un-running on will be a regression
19:35:47 <clarkb> right
19:35:51 <clarkb> so mostly lists and planet then
19:36:28 <fungi> yup. lists is tricksy because we'll need to defer inbound messages on a secondary mx (ideally) while we test out the replacement with copied (migrated?) data
19:36:39 <pabelanger> interwebs not great for me currently
19:37:11 <fungi> planet should be a relative non-event since there's no persistent data. just need to get it working (on xenial i guess?) and then switch dns
19:37:52 <clarkb> fungi: jeblair (email noob here) couldn't we set up a second MX record for new host with higher priority which would cause senders to fall back on existing host until new host existed?
19:38:02 <clarkb> is problem in syncing the data in mailman?
19:38:06 <jeblair> fungi: well, we don't really need a secondary mx if the downtime is small (say < 30m).
19:38:17 <fungi> i'm not sure the term "sprint" is really applicable here, but we do need one or (preferably) more people carefully planning the lists switch and then pre-testing the replacement
19:38:42 <jeblair> sending servers don't usually generate bounces for connection errors until they persist a while
19:39:36 <fungi> right, holding the inbound queue on a secondary mx would really only be necessary if we can't make the switch (including dns propagation) happen in a reasonably short timeframe
19:39:54 <jeblair> but if we did want no interruption in mx service, we could probably just have the existing one queue and then deliver to the new host.
19:40:17 <jeblair> downside for that is it's an extra bit of mta configuration that needs its own testing
19:40:21 <fungi> true, and that's only a few lines of exim config
19:40:36 <fungi> but right, we'd probably want to test it somewhere else beforehand
19:40:59 <clarkb> I think we can take a few minutes downtime for queuing
19:41:03 <jeblair> the worst part of this is that we will lose our ip and its reputation
19:41:07 <fungi> so maybe we should just take volunteers to work on the planet replacement and to work on the lists replacement and maintenance plan?
19:41:09 <clarkb> and worst worst case some small number of people might need to resend their emails?
19:42:05 <clarkb> jeblair: we could try an upgrade in place...
19:42:18 <jeblair> yeah.  we can practice with snapshots.
19:42:54 <fungi> it worked reasonably well for the wiki server
19:43:23 <fungi> and while that would preserve our ip address, it's fairly out of character for us to upgrade that way
19:43:33 <clarkb> having had ubuntu lts' installs upgraded in place over years it ends up being pretty messy
19:43:37 <fungi> but worth considering nonetheless
19:43:37 <clarkb> but functional
19:44:29 <clarkb> jeblair: how important is that reputation? do you expect we will end up being rejected by a bunch of MXes if we change IPs?
19:45:04 <fungi> worth noting, i strongly suspect a redeploy from scratch will end up renotifying all the listadmins in the lists class manifest about their lists being created (along with the default listadmin password)
19:45:10 <ianw> probably depends on who had that IP before :)
19:45:15 <jeblair> clarkb: it would not surprise me if it took a few days or a week to settle out.
19:45:17 <fungi> yeah, quite the gamble
19:45:49 <jeblair> rackspace's vetting usually means we're not getting spammer ips, but still.
19:46:15 <fungi> #link https://www.senderbase.org/lookup/?search_string=lists.openstack.org
19:46:23 <ianw> i can take bringing up a xenial planet if you like;  i maintained a planet in a previous life
19:46:23 <clarkb> so probably worth testing the inplace upgrade with a snapshot then?
19:46:41 <fungi> email rep. is "good" for both ipv4 and ipv6 addresses
19:47:15 <fungi> ianw: awesome, pleia2 indicated it should be viable on xenial, we just weren't quite ready to run xenial servers yet at that time
19:47:29 <jeblair> yeah, i'm thinking in-place may be worth trying for this.
19:47:36 <fungi> no objection here
19:47:55 <pabelanger> worth a try
19:47:57 <fungi> #action ianw try booting a Xenial-based replacement for planet.openstack.org
19:48:10 <ianw> it's pretty much the same mailman version right?
19:48:28 <fungi> #agreed We'll attempt an in-place upgrade of lists.openstack.org from Precise to Trusty, practicing on instance snapshots beforehand
19:48:56 <clarkb> ianw: yes precise -> trusty mailman is basically the same
19:48:57 <fungi> #link http://packages.ubuntu.com/mailman
19:49:09 <clarkb> 2.1.14 -> 2.1.16
19:49:22 <ianw> yep, that's good, not also shoehorning a v3 upgrade on top :)
19:49:43 <fungi> last news i read, mmv3 is still in a questionable state
19:50:10 <ianw> yeah, that was what i thought too, with all the configuration being quite different
19:50:12 <clarkb> is that something people will be able to work on soon?
19:50:25 <clarkb> any volunteers to work on mailman? I can help there but am definitely not an email expert
19:50:32 <fungi> if we're already incurring the pain on precise->trusty, do we want to follow that with a trusty->xenial in short order? anybody remember which release got the dmarc workaround patches?
19:51:14 <jeblair> clarkb: i will volunteer to help
19:51:23 <clarkb> fungi: I think we may want to do a single in place upgrade, evaluate how that went. Then decide from there if we want to do a second to xenial
19:51:25 <fungi> #link https://wiki.list.org/DEV/DMARC says 2.1.26 added from_is_list
19:51:26 <jeblair> better yet, i *do* volunteer to help
19:51:44 <fungi> #undo
19:51:45 <openstack> Removing item from minutes: #link https://wiki.list.org/DEV/DMARC
19:51:50 <fungi> #link https://wiki.list.org/DEV/DMARC says 2.1.16 added from_is_list
19:52:02 <clarkb> fungi: oh 2.1.16 is what trusty has so maybe thats less urgent?
19:52:15 <fungi> so trusty should get us dmarc-related options without needing to consider xenial quite yet
19:52:19 <fungi> yep
19:52:42 <fungi> 2.1.18 added some more stuff though per that wiki
19:52:47 <clarkb> (I think we should also work to xenialify in the near future too, but don't want to make the goalposts too far ahead since precise is eol real soon now)
19:52:58 <fungi> xenial would get us 2.1.20
19:53:04 <jeblair> to be honest, i'm not sure we should enable that option
19:53:28 <fungi> i too am fine with sticking to our guns on ignoring dmarc
19:53:49 <fungi> some listservs i've got subscriptions on have decided to unsubscribe and block subscribers from affected domains instead
19:53:59 <fungi> well, block posts from
19:54:11 <jeblair> that is appropriate and consistent with what those domains have expressed as their policy via dmarc.
19:54:47 <jeblair> i'm intrigued by the 2.1.18 dmarc behaviors.  i'll have to think about that a bit.
19:55:29 <fungi> yea, half the blame is on dmarc-asserting domains for not telling their users they shouldn't post to mailing lists, and the other half on dmarc-validating mtas on ignoring the incompleteness which is dmarc
19:56:37 <jeblair> i'm going to spend the rest of the day trying to shoehorn "systemdmarc" into a sentence somehow.
19:56:50 <fungi> so anyway, i can volunteer to help on the lists.o.o stuff too (i feel fairly confident with its cli utilities, pickle file manipulating tool and archive filesystem layout), though i worry i don't have time to drive the effort
19:57:16 <fungi> jeblair: i'm sure lennart already has a plan for that one
19:57:16 <clarkb> do we want to pick a specific time next week to work on this further or let volunteers poke at it as they are able?
19:58:04 <clarkb> I guess jeblair and ianw can grab fungi and me for help as needed and go from there?
19:58:17 <fungi> or is there any other infra-root who wants to gain a deeper understanding for mailman?
19:58:39 <fungi> there's nothing like the stress of an upgrade to hammer home key concepts ;)
19:58:43 * Shrews hides in the darkest corner
19:58:49 <jeblair> fungi, clarkb: maybe let's poke at it as able, but plan on spending thurs-fri next week making real headway if we don't before then?
19:58:59 <clarkb> jeblair: ++
19:59:02 <fungi> jeblair: i'm happy to commit to that
19:59:09 <ianw> are we defaulting to rax as a provider for these hosts?
19:59:15 <ianw> well clearly the inplace upgrade stays
19:59:20 <clarkb> ianw: for lists yes due to IP
19:59:24 <jeblair> (hopefully that gives us some time to asynchronously make some snapshots, etc).  maybe keep an etherpad with a work log.
19:59:24 <fungi> that's a good question we don't have time to answer in the next 30 seconds
19:59:27 <clarkb> planet could potentially be hosted by vexxhost?
19:59:37 <fungi> i'm fine with that idea
19:59:52 <jeblair> no objections to planet vexxhost
20:00:04 <fungi> obviously the plan for lists.o.o is rackspace for now since we're doing in-place
20:00:10 <fungi> and we're out of time
20:00:13 <fungi> thanks everyone!
20:00:17 <fungi> #endmeeting