19:03:48 <fungi> #startmeeting infra
19:03:49 <openstack> Meeting started Tue Jun 20 19:03:48 2017 UTC and is due to finish in 60 minutes.  The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:52 <openstack> The meeting name has been set to 'infra'
19:03:55 <fungi> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:04:01 <fungi> #topic Announcements
19:04:04 <fungi> #info Don't forget to register for the PTG if you're planning to attend!
19:04:09 <fungi> #link https://www.openstack.org/ptg/ PTG September 11-15 in Denver, CO, USA
19:04:13 <fungi> as always, feel free to hit me up with announcements you want included in future meetings
19:04:19 <fungi> #topic Actions from last meeting
19:04:26 <fungi> #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-06-13-19.02.html Minutes from last meeting
19:04:38 <fungi> clarkb finish writing up an upgrade doc for gerrit 2.11 to 2.13
19:04:41 <fungi> #link https://etherpad.openstack.org/p/gerrit-2.13.-upgrade-steps upgrade doc for gerrit 2.11 to 2.13
19:04:44 <fungi> still available to work on testing that on review-dev in a couple hours?
19:04:49 <fungi> i'll be around to help however you need
19:05:00 <mordred> o/
19:05:02 <clarkb> yup
19:05:10 <fungi> though i may still be dialled into the board meeting and listening at the same time, depending on how long that ends up running
19:05:16 <clarkb> if others are able to give that a look over in the next few hours that would be nice
19:05:44 <fungi> we can also talk through it during open discussion if we want
19:05:47 <fungi> fungi start an infra ml thread about puppet 4, beaker jobs and the future of infra configuration management
19:05:50 <fungi> #link http://lists.openstack.org/pipermail/openstack-infra/2017-June/005454.html Puppet 4, beaker jobs and the future of our config management
19:05:56 <fungi> sorry that took a couple weeks, but everyone interested please follow up there
19:06:11 <fungi> #action ianw abandon pholio spec and shut down pholio.openstack.org server
19:06:14 <fungi> (carrying that over so we don't forget)
19:06:29 <ianw> not yet, sorry ... gate issues have had my attention
19:06:36 <fungi> perfectly fine!
19:06:43 <fungi> it's not a hurry, just cleanup
19:06:50 <fungi> #topic Specs approval: PROPOSED PTG Bot (fungi)
19:06:53 <fungi> #link https://review.openstack.org/473582 "PTG Bot" spec proposal
19:06:59 <fungi> #info Council voting is open for the "PTG Bot" spec proposal until 19:00 UTC on Thursday, June 22.
19:07:02 <fungi> just a reminder, i gave that one the extra week since i only proposed it just before the meeting last week
19:07:21 <fungi> i also have some initial changes proposed under that review topic i'll un-wip after it gets approved
19:07:33 <fungi> #topic Specs approval: PROPOSED Provide a translation check site for translators (eumel8, ianychoi, fungi)
19:07:37 <fungi> #link https://review.openstack.org/440825 "Provide a translation check site for translators" spec proposal
19:07:44 <fungi> i added this following conversation with ianychoi during the open discussion period in last week's meeting
19:08:21 <fungi> it seems to be ready enough for a vote; it's mainly just a change of direction on an already approved spec which ended up being untenable
19:08:40 <fungi> any objections to putting it up for council vote until thursday?
19:09:23 <AJaeger> no objection by me
19:10:19 <fungi> #info Council voting is open for the "Provide a translation check site for translators" spec proposal until 19:00 UTC on Thursday, June 22.
19:10:31 <fungi> #topic Priority Efforts
19:11:01 <fungi> clarkb: any interest in talking more about the gerrit upgrade plan during the meeting?
19:11:08 <clarkb> sure
19:11:21 <fungi> #topic Priority Efforts: Gerrit 2.13 Upgrade
19:11:43 <fungi> worth noting, this is a version skip. always fun
19:11:43 <clarkb> I've got the rough plan sketched out in that etherpad for upgrading review-dev to 2.13.7.ourlocalbuild
19:12:05 <clarkb> yes because it is a version skip we cannot do online reindexing after upgrade, we have to do a full offline reindex before starting the service
19:12:11 <fungi> and per an earlier meeting, we're choosing to roll forward with 2.13.x instead of 2.14.x for now
19:12:38 <clarkb> so fungi and I will walk through that on review-dev in order to watch the reindex process
19:12:59 <fungi> timeframe for offline reindexing in 2.10 and online in 2.11 suggests we should budget at least 4 hours of downtime
19:13:01 <clarkb> once that is done we will want to test services and scripts against 2.13. Particularly zuul as there may be new event types that we don't handle or otherwise want to handle better
19:13:31 <fungi> (4 hours of downtime for the production review.o.o reindex i mean)
19:13:40 <clarkb> ya don't expect review-dev to take that long
19:13:41 <fungi> review-dev will likely be far faster
19:13:43 <mordred> fungi: I know we discussed not-14 before - but I think I thought that was partially because going to 2.14 was going to be significantly more expensive ...
19:13:44 <fungi> right
19:13:48 <clarkb> as reindexing is on a thread per repo basis
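[A minimal sketch of the offline reindex step described above, assuming the stock Gerrit reindex program and a generic review_site layout; the paths, service name, and thread count are placeholders rather than the actual review-dev setup.]

    # stop gerrit, reindex offline against the site, then start it back up
    sudo service gerrit stop
    # --threads parallelizes the reindex across repositories (one thread per repo)
    sudo -u gerrit2 java -jar /home/gerrit2/review_site/bin/gerrit.war reindex \
        -d /home/gerrit2/review_site --threads 4
    sudo service gerrit start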
19:14:07 <clarkb> mordred: it would require java 8 which requires xenial (or at least not trusty)
19:14:08 <mordred> fungi: if we're going to have to do an offline reindex in this case, is it worth reconsidering that?
19:14:16 <fungi> mordred: yes, some much more significant changes in 2.14 which we didn't want to complicate the current progress with
19:14:17 <mordred> clarkb: nod
19:14:22 <mordred> kk. just making sure
19:14:37 <clarkb> mordred: if we then go from 2.13. to 2.14 we should be able to do online reindex as part of that upgrade
19:14:51 <fungi> and also because we already missed the boat on the 2.12 upgrade by deciding to refocus on 2.13
19:14:56 <clarkb> mordred: I think it is a good idea to separate the distro upgrade from the 2.14 upgrade as a result
19:14:57 <mordred> clarkb: ah - ok . so this should be the last offline reindex we need to eat
19:15:01 <mordred> clarkb: ++
19:15:02 <clarkb> mordred: hopefully
19:15:04 <mordred> definitely agree
19:15:30 <fungi> so given how long it takes us to prepare for gerrit upgrades compared to their frequency of major releases, if we keep revising our plan to be whatever the latest major release is we may never upgrade
19:16:07 <fungi> and since we already have a lot of progress and 2.13 acceptance testing behind us, i'd rather not lose that momentum
19:16:39 <clarkb> I also think that 2.13 has had a chance to mature (7 point releases) whereas 2.14 not so much yet
19:16:51 <fungi> also there are a number of useful things we can do with 2.13 (and could have done with 2.12) that make the upgrade worthwhile even if it means we immediately begin planning for the next upgrade
19:17:12 <mordred> ++
19:17:40 <clarkb> so ya hopefully after today we can start poking at testing with zuul against review-dev and our hook scripts and the election roll generation and all that
19:17:41 <fungi> like, enabling individual teams or the stable team to take care of the eol process, or simplifying the release automation
19:17:56 <clarkb> then maybe in a week or two we can schedule an upgrade in production
19:18:06 <clarkb> (trouble is we are getting to the fun part of the release cycle)
19:19:04 <fungi> i don't mind if we get the upgrade details worked out and then have to put it on ice until a lull in release activity, even if that means between the ptg and summit
19:19:39 <clarkb> yup especially because momentum on process is here now
19:19:44 <mordred> yah
19:19:49 <clarkb> just something to be aware of as we get closer to being ready to upgrade production
19:20:08 <fungi> #link https://releases.openstack.org/pike/schedule.html Pike Release Schedule
19:20:47 <fungi> _if_ we can swing it this cycle, it'll probably be in the next ~3 weeks
19:21:32 <clarkb> which is theoretically doable if we can get people testing it out on review-dev and making whatever changes we need to address bugs
19:22:34 <fungi> given what we don't yet know and may uncover while running through this, i'm hesitant to commit to being able to upgrade before we get into the library final release window and things start picking up
19:23:00 <fungi> so while it would be nice if it works out, i'm not going to get my hopes up
19:24:05 <fungi> basically we'd need the details ironed out in the next 2 weeks and then a maintenance announcement with a week of advance warning since the outage (for the entire ci system) will be pretty lengthy
19:24:37 <clarkb> ya
19:24:44 <clarkb> I think better to not kill ourselves with that effort
19:24:52 <clarkb> and instead be thorough
19:24:56 <fungi> anyway, let's see what we figure out this week and i'll make sure we touch on the updates status in next week's meeting too at which point we may have a better idea as to how feasible it is
19:25:11 <fungi> er, updated status
19:26:48 <clarkb> sounds good
19:27:01 <fungi> having read through the etherpad, i'm wondering whether we need to disable puppet for gerrit?
19:27:13 <fungi> (and disable puppet globally when we do the real maintenance)
19:27:33 <clarkb> fungi: my concern there is that puppet could update the war under us while we are in progress
19:27:44 <clarkb> so basically we want to deactivate puppet there until we get the change merged to reflect the right war
19:28:02 <fungi> no, that's what i meant, but i missed that you already have it as the first step there
19:28:19 <clarkb> I don't think I have the step of merge change to reflect war though
19:28:58 <fungi> good point, and that probably has to be done after disabling puppet but certainly before stopping any of the other services
19:31:00 <clarkb> well before starting puppet again at least
19:31:23 <fungi> oh, sure, can be done at either end
19:32:00 <fungi> after makes the most sense i guess, since it's less to revert if we need to roll back
19:33:41 <fungi> our gerrit fork's tags are lagging behind too...
19:34:10 <fungi> #link https://gerrit.googlesource.com/gerrit/+/v2.13.8 Gerrit 2.13.8 stable point release from April 26, 2017
19:34:28 <fungi> might make sense to confirm that still builds for us when we get time
19:34:53 <clarkb> oh hrm do we want to push the upgrade a day and get ^ built
19:35:03 <clarkb> I thought I double checked for tags and 2.13.7 was latest
19:35:33 <fungi> i'm fine doing it that way too. i have even more time tomorrow to help (fewer meetings)
19:36:15 <clarkb> I'm trying to get a changelog to see what 2.13.8 adds
19:36:23 <fungi> i think we need to rebase all our tags onto that if we do
19:36:32 <fungi> er, s/tags/backports/
19:36:47 <clarkb> https://www.gerritcodereview.com/releases/2.13.md#2.13.8
19:36:54 <clarkb> yes we'll need to rebase and merge the ~4 changes
19:37:24 <clarkb> 2.13.8 includes jgit and performance fixes
19:37:31 <clarkb> my hunch is we probably do want those?
19:37:42 <fungi> there are a few additional patches in the stable branch on top of that tag too
19:38:06 <fungi> #link https://git.openstack.org/cgit/openstack-infra/gerrit/log/?h=upstream/stable-2.13 our fork of Gerrit stable-2.13
19:38:33 <fungi> including some which look like bug fixes
19:38:50 <fungi> most recent is 2 days old
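[A hedged sketch of rebuilding the fork on a newer upstream point, should we go that route; the branch name, tag fetch, and commit placeholders below are illustrative rather than the actual openstack-infra/gerrit workflow.]

    # fetch the newer upstream tag, start a branch from it, and replay our backports
    git fetch https://gerrit.googlesource.com/gerrit --tags
    git checkout -b local-2.13.8 v2.13.8
    git cherry-pick <backport-sha1> <backport-sha1> ...
    # then rebuild the war and confirm it still builds before swapping it in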
19:39:52 <clarkb> one thing is that if we have to delay for prod upgrade we may end up doing a point release upgrade on review-dev anyways just to get latest?
19:40:36 <fungi> right, i would be cool doing one last minor update on review-dev before the production maintenance just to vet the latest stable state
19:40:44 <clarkb> perhaps we should just bake that into our thinking for this upgrade process. Go to 2.13.7 now, start testing stuff like zuul against it. Then upgrade to 2.13.8/9/whatever closer to the production upgrade, then do 2.13.8/9/whatever in production
19:40:51 <fungi> sure, sgtm
19:41:01 <clarkb> I actually like that as I think the jump to 2.13 and testing that is the biggest concern right now
19:41:10 <clarkb> then it will be easy to add on a different point release in the future.
19:41:15 <fungi> upstream/stable-2.13 branch tip maybe
19:41:20 <clarkb> ya
19:41:40 <fungi> since they seem to apply fixes there far more often than they tag
19:42:00 <clarkb> in that case lets stick with the original plan for review-dev for now
19:42:04 <clarkb> that gets us moving on the testing front
19:42:20 <clarkb> then we can incorporate a minor bump down the line when things are more firmed up for production
19:42:43 <fungi> #agreed proceed with testing gerrit-v2.13.7.4.988b40f.war today, update to upstream/stable-2.13 branch tip and briefly re-test shortly before production upgrade maintenance
19:43:16 <fungi> #topic Open discussion
19:43:35 <clarkb> DNS is really hurting us in osic.
19:43:36 <fungi> we've still got about 15 minutes before the tc and board meetings start if anyone has anything else to bring up
19:43:49 <clarkb> (message:"failed: Temporary failure in name resolution." OR message:"Temporary failure resolving" OR message:"wget: unable to resolve" OR  message:"Could not resolve host") AND NOT message:"Could not resolve host: fake" AND tags:"console" is my logstash query
19:43:50 <fungi> and yeah, sad dns
19:44:23 <clarkb> I think the problem is that unbound's list of forwarders doesn't have priority/order, so we are using both ipv4 and ipv6 resolvers in osic
19:44:50 <clarkb> to hit ipv4 resolvers we have to go through PAT/NAT which I think is likely the cause of our troubles
19:44:55 <fungi> did you consider my suggestion to have a udev rule rejigger unbound as soon as you get a v6 default route?
19:45:07 <clarkb> I haven't yet
19:45:18 * fungi has no idea how terrible that might be
19:45:33 <clarkb> unbound-control does allow you to configure those things on the fly
19:45:47 <clarkb> so we could have it remove resolvers from the existing list based on ipv6 coming up
19:45:52 <ianw> oh, there is a period of no dns if you start all v6?
19:46:21 <fungi> given the async nature of boot in general and v6 autoconfig in particular, i think triggering reconfiguration off kernel events is about the fastest solution you're going to get there
19:46:23 <clarkb> ianw: more background on that is that my systemd unit file hack doesn't work because network-online comes up before we have working ipv6, since ipv4 is already up
19:46:51 <clarkb> ianw: so my check of "do we have ipv6" isn't working right
19:47:02 <ianw> ahh, ok ... that makes sense, i guess :/
19:47:21 <clarkb> and we can't stop ipv4ing because github and gems and other things
19:47:51 <clarkb> another option is to set it in a nodepool ready script based on whether or not ipv6 is present at that point (it should be because nodepool will prefer ipv6)
19:48:09 <ianw> ahh ... hence the discussions with mordred i'm guessing
19:48:11 <clarkb> fungi: I think I have a preference for ^ because it is simple and straightforward
19:48:20 <clarkb> fungi: though udev is likely workable as well
19:48:42 <mordred> yah - I think doing it in a nodepool ready script is a great idea
19:48:50 <fungi> udev addresses your concern of "how do you do this without a ready script and without zuul"
19:49:40 <mordred> most of the other discussion is more for "how do we do this action in v3" - I think we've got great information at ready-script/pre-playbook time to do this well
19:51:58 * ianw has no strong opinions, but is glad for such tenacity from clarkb investigating it!
19:52:22 <clarkb> I'll push up an update to do it in the ready script as that is quick and easy
19:52:30 <clarkb> well needs new images I guess
19:52:34 <clarkb> but otherwise is quick and easy :)
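[A hedged sketch of what such a ready script step might look like, assuming unbound's remote control interface is enabled on the images; the resolver addresses are placeholders, not the actual osic forwarders.]

    # if the node came up with a global ipv6 address, point unbound at
    # ipv6-only forwarders so lookups skip the NATed ipv4 path
    if ip -6 addr show scope global | grep -q inet6; then
        unbound-control forward 2001:4860:4860::8888 2001:4860:4860::8844
    fi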
19:54:20 <fungi> okay, i'm ending the meeting 5 minutes early to give anyone who wants it time to dial into the board of directors conference call and/or grab popcorn before the tc meeting in here on queens goals refinement
19:54:52 <mordred> fungi: I'm landing ... which is going to make my participation in the TC meeting a bit difficult
19:54:57 <AJaeger> 5 mins is not enough to get popcorn ;/
19:55:24 <fungi> thanks everyone!
19:55:26 <fungi> #endmeeting