19:01:40 <clarkb> #startmeeting infra
19:01:40 <openstack> Meeting started Tue Nov 21 19:01:40 2017 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:43 <openstack> The meeting name has been set to 'infra'
19:01:51 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:54 <pabelanger> o/
19:02:27 <clarkb> There was a bit of stuff left over from last meeting that I took the liberty of clearing off the agenda. Feel free to reraise items as we go through if I shouldn't have removed something
19:02:38 <clarkb> #topic Announcements
19:02:54 <frickler> o/
19:03:03 <clarkb> This week is a major holiday in the US so expect those of us living there to be AFK starting thursday
19:03:36 <clarkb> I will be picking up a turkey this afternoon so the fun starts early too
19:03:52 <mguiney> (/me lurk)
19:04:05 <clarkb> #topic Actions from last meeting
19:04:12 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-10-31-19.01.txt Minutes from last meeting
19:04:22 <clarkb> er that's the wrong link
19:04:25 <clarkb> #undo
19:04:25 <openstack> Removing item from minutes: #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-10-31-19.01.txt
19:04:53 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-11-14-19.00.txt
19:05:22 <clarkb> The only action there is for ianw to confirm backups are working properly. I saw a report that this had been confirmed yesterday so \o/
19:05:45 <clarkb> fungi: you've also had an action to document secrets backup policy. Has that change been pushed/merged?
19:05:50 <fungi> strangely, it looks like my pending action item got omitted
19:05:52 <fungi> but yeah
19:05:55 <fungi> #link https://review.openstack.org/520181 Add a section on secrets to the migration guide
19:06:12 <fungi> needs reviewers
19:06:28 <jeblair> oh cool thanks!
19:06:33 <clarkb> thanks
19:06:37 <fungi> there was nowhere good to add the note about it, so i added a section to give us somewhere
19:08:06 <jeblair> that looks great; when it merges, we should send a note to the dev list pointing people at it, for the benefit of those who have already read the doc
19:08:39 <fungi> dunno if that was too much or not enough detail about the feature, but felt it needed some context at least
19:08:39 <clarkb> There aren't any specs that need review/approval that I've seen
19:08:39 <clarkb> so going to skip specs approval
19:08:39 <clarkb> #topic Priority Efforts
19:08:39 <clarkb> did we lose the bot?
19:08:56 <clarkb> there it goes maybe I'm lagging
19:09:01 <jeblair> clarkb: you seem laggy to me
19:09:10 <clarkb> fun!
19:09:16 * frickler is seeing lagging, too
19:09:21 <clarkb> #topic Zuul v3
19:09:35 <fungi> ahh, yeah, i had an entry for this
19:09:43 <clarkb> Fungi asks if it is time to remove jenkins from the Gerrit CI group. ++ from me
19:09:58 <fungi> we've gotten a few people showing up in irc asking us to delete stale -1 verify votes from "jenkins"
19:10:06 <AJaeger> let's do it
19:10:07 <fungi> because people aren't reviewing their changes
19:10:26 <pabelanger> seems sane
19:10:36 <fungi> and we did at one point before the rollout indicate that we would at some point remove jenkins from the group granting it verify voting permission
19:11:13 <fungi> but didn't want to do it immediately because we were relying on zuul v3 trusting the verify votes left by v2 so people wouldn't need to recheck everything
19:11:16 <clarkb> right so the concern is that removing the account from the group that is allowed to -1/-2 verify will effectively remove those votes from the UI
19:11:38 <fungi> yup
19:11:52 <fungi> figured now was a good time to revisit and see if we think we're just down to a long-tail of relatively inactive changes where doing a recheck isn't too onerous
19:11:52 <mordred> well - the votes would only go away if we deleted the user, right?
19:12:00 <mordred> rather than deleting the user from the group
19:12:02 <jeblair> mordred: right, but they won't be visible
19:12:15 <fungi> mordred: gerrit "hides" the votes on active changes if you lose permission for them
19:12:23 <mordred> wow. what a GREAT idea
19:12:30 <jeblair> this of course will hide all the verified votes ever
19:12:34 <fungi> so you can still see them in the db, but won't in the webui or api
19:13:04 <fungi> jeblair: it only seems to do it for open changes. merged and abandoned were unaffected in the past by it
19:13:06 <clarkb> ya so anything with a jenkins +1 will need to get zuul +1'd before it can gate
19:13:12 <jeblair> fungi: oh huh
19:13:18 <jeblair> that's less crazy
19:13:26 <clarkb> I think it's been long enough that we can make that jump now
19:13:32 <jeblair> i agree, i'm in favor
19:13:51 <fungi> right. if i add myself to project bootstrappers, +2 verify and submit a change, then remove myself from that group, the verify +2 lingers on the merged change
19:13:53 <mordred> yah. I think our stale check would be in effect by now anyway
19:13:54 <AJaeger> let's send an email to the mailing list about this
19:14:14 <AJaeger> (as a FYI - we've just done this)
19:14:15 <fungi> i'll write up an announcement
19:14:17 <jeblair> mordred: do we still have that?
19:14:20 <jeblair> stale check
19:14:26 <mordred> jeblair: oh - do we not?
19:14:32 <clarkb> I think we got rid of it
19:14:35 <fungi> we do not do stale result checking
19:14:38 <AJaeger> mordred: ages ago...
19:14:42 <mordred> ah. well, silly me
19:14:54 <jeblair> yeah seems to be gone
19:14:58 <fungi> right, it's been a couple of years i think? my sense of time is pretty terrible any more though
19:14:59 <mordred> in any case, I think it's long enough that requiring a new +1 from zuul shouldn't be a super large burden
19:16:18 <jeblair> if i can slip in a late addition to the topic...
19:16:27 <jeblair> pabelanger proposed removing infra-check: https://review.openstack.org/521880
19:16:33 <clarkb> I think we have plenty of time
19:16:33 <jeblair> i think it's time for that as well
19:16:45 <clarkb> jeblair: ++
19:16:50 <fungi> wfm
19:17:02 <mordred> ++
19:17:10 <pabelanger> Yay
19:17:24 <fungi> i did not approve in case others were still reviewing
19:17:38 <pabelanger> and zuul-env from DIB: https://review.openstack.org/514483/
19:18:38 <jeblair> aiui the shim should handle the removal of that transparently, but we should still be prepared to keep an eye out and roll back images if something about that fails
19:18:39 <clarkb> for ^ I think we just need to approve it when we are ready to field questions from anyone that may have used the zuul env in weird ways (let's hope there are zero cases of that :) )
19:18:46 <clarkb> jeblair: yup that was my understanding as well
19:19:12 <pabelanger> cool
19:19:37 <mordred> clarkb: wait - you're saying that people may have used a thing in ways we didn't intend or foresee? I find that hard to imagine ...
19:19:56 <clarkb> :)
19:20:08 <fungi> i suppose we could codesearch for /usr/zuul-env or whatever the path is
19:20:16 <AJaeger> mordred: I expect you didn't find any such cases while working on releasenotes and sphinx ;)
19:20:37 <clarkb> do we want to have a more explicit list of transition cleanup changes somewhere that we can work through?
19:20:49 <fungi> codesearch says manila is doing that
19:20:51 <clarkb> I know we had an etherpad for the VM instances maybe tack something onto there
19:21:05 <fungi> and oslo.messaging
19:21:06 * AJaeger approves the infra-check removal in a minute or two
19:21:14 <fungi> and tooz...
19:21:18 <fungi> there's a ton
19:21:34 <clarkb> another thing was moving the jenkins user stuff into its own element that third party CIs could use but we wouldn't
19:21:35 <fungi> ahh, because they copied legacy playbooks in-repo
19:21:44 <clarkb> pabelanger: ^ do you know if the jenkins stuff has a change up yet (or maybe is done?)
19:21:53 <mordred> AJaeger: I'm certainly not STILL finding them
19:21:59 <pabelanger> clarkb: yah, I have a change up, but need to rework it still into own element
19:22:02 <pabelanger> I can finish that today
19:22:37 <fungi> so anyway, according to codesearch there's going to be a ton of cross-project work involved in removing use of zuul-env from copies of legacy jobs
19:22:56 <fungi> likely also copied to stable branches codesearch isn't indexing
19:23:01 <clarkb> fungi: probably worth an email to the dev list with a link to the change where we want to delete it
19:23:10 <fungi> yeah
19:23:15 <clarkb> similar to the run: foo -> run: foo.yaml email
19:23:42 <pabelanger> https://review.openstack.org/521937/ could use a final +3 too, removes static wheel-builder slaves
19:23:58 <pabelanger> once merged, i can delete servers and push on other static slave nodes
19:24:17 <mordred> wait - won't removing zuul-env from the images break people using /usr/zuul-env/bin/zuul-cloner? that's where we put zuul-cloner in the legacy base job
19:24:31 <pabelanger> mordred: right
19:24:37 <clarkb> its only if they use any other content of the venv
19:24:41 <mordred> so like the manila jobs that use it shouldn't be broken, they're in legacy jobs
19:24:44 <mordred> yah.
19:24:53 <pabelanger> it will break once we remove zuul-cloner from base job, but that is another topic for another day
19:25:02 <pabelanger> into base-legacy
19:25:29 <mordred> it's already in base-legacy, no?
19:25:41 <pabelanger> yes, but we haven't removed it from base yet
19:25:49 <jeblair> it's in both?
19:25:52 <mordred> this: /usr.zuul-env(?!.bin.zuul-cloner)/ should be a regex for finding zuul-env uses other than zuul-cloner yeah?
19:25:59 <pabelanger> jeblair: yes
19:26:06 <jeblair> on purpose?
19:26:40 <pabelanger> see https://review.openstack.org/513506/ for history
19:26:58 <clarkb> mordred: that looks right
19:27:10 <pabelanger> I think once we remove zuul-env in DIB, we can circle back to 513506
19:27:45 <mordred> clarkb: ok. according to that there are no uses of zuul-env that are not zuul-cloner
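[The codesearch query above can be reproduced locally against a checked-out repo with PCRE-capable GNU grep. The sketch below uses invented file names and contents purely for illustration; the point is the negative lookahead `(?!...)`, which skips the zuul-cloner shim path while matching any other use of the venv.]

```shell
# Demo of the lookahead regex from the discussion, on invented sample
# files (assumes GNU grep built with PCRE support, i.e. the -P flag).
d=$(mktemp -d)
# A legacy job invoking only the shim path -- should NOT match:
printf '%s\n' 'cmd: /usr/zuul-env/bin/zuul-cloner git://git.openstack.org openstack/manila' > "$d/ok.yaml"
# A job reaching into the venv for something else -- SHOULD match:
printf '%s\n' 'cmd: /usr/zuul-env/bin/python tools/run.py' > "$d/bad.yaml"
# -r recurses, -l lists matching files; the lookahead excludes the shim path
grep -rlP '/usr/zuul-env(?!/bin/zuul-cloner)' "$d"
```

Only bad.yaml is listed, matching the conclusion above: jobs touching the venv beyond the zuul-cloner shim are the ones at risk when the venv is removed from images.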
19:27:58 <jeblair> okay.
19:28:11 <jeblair> i agree the way to proceed is to remove it from images, then 513506
19:28:16 <fungi> oh, got it, so if they use zuul-cloner that's fine since we'll still put something executable at /usr/zuul-env/bin/zuul-cloner for the foreseeable future?
19:28:23 <jeblair> if 506 breaks people, it will be easy to revert
19:28:40 <clarkb> fungi: yes
19:28:48 <clarkb> fungi: there just won't be a virtualenv there around it
19:29:13 <clarkb> and to double check zuul-cloner shim doesn't rely on the virtualenv python or any libs there right?
19:29:13 <mordred> I agree with jeblair - I think 506 is good to land- there really shouldn't be jobs using zuul-cloner and not parented on legacy-base and if there are we need to find them
19:29:16 <clarkb> it runs from system python?
19:29:19 <mordred> clarkb: that's right
19:29:23 <jeblair> oh, i'm assuming we put the shim in base because just in case someone ended up (even accidentally as dmsimard says) using the cloner, we wanted them using the shim and not v2
19:29:24 <fungi> that's less scary then. i thought we were talking about _also_ removing the zuul-cloner shim
19:29:38 <mordred> clarkb: it's not installed in the venv - it's just copied there
19:29:44 <jeblair> to summarize:
19:29:55 <jeblair> 1) remove zuul-env from images now.   if that doesn't blow up:
19:30:14 <pabelanger> mordred: ++
19:30:15 <jeblair> 2) remove zuul-cloner shim from base immediately afterwards.
19:30:35 <jeblair> 3) if that blows up, fix those jobs, or temporarily revert 506 to add zuul-cloner shim back.  repeat as necessary.
19:30:46 <mordred> jeblair: ++
19:30:53 <jeblair> 4) in the distant future, remove legacy-base (which will continue to install the shim as long as it exists)
19:30:55 <jeblair> [eol]
19:31:04 <fungi> yeah, that seems sane
19:31:07 <mordred> I have verified that fetch-zuul-cloner will create the directory if it's not there
19:31:07 <clarkb> sounds like a plan
19:31:13 <mordred> so it should still work on images without the venv
19:31:34 <jeblair> lemme info this
19:31:42 <jeblair> #info plan for removing zuul-cloner shim:
19:31:48 <jeblair> #info 1) remove zuul-env from images now.   if that doesn't blow up:
19:31:52 <jeblair> #info 2) remove zuul-cloner shim from base immediately afterwards.
19:31:56 <jeblair> #info 3) if that blows up, fix those jobs, or temporarily revert 506 to add zuul-cloner shim back.  repeat as necessary.
19:32:00 <jeblair> #info 4) in the distant future, remove legacy-base (which will continue to install the shim as long as it exists)
19:32:01 <jeblair> k
19:32:18 <AJaeger> step 1 is https://review.openstack.org/#/c/514483
19:32:29 <pabelanger> wfm
19:32:33 <AJaeger> step 2 is https://review.openstack.org/#/c/513506
19:32:50 <AJaeger> jeblair: plan is fine. Who wants to +A 514483?
19:32:54 <clarkb> #info step 1 is https://review.openstack.org/#/c/514483
19:33:11 <clarkb> #info step 2 is https://review.openstack.org/#/c/513506/
19:33:16 <pabelanger> I can watch step1 now
19:34:06 <clarkb> ok, anything else related to zuulv3?
19:34:18 <jeblair> nak
19:34:27 <pabelanger> it is awesome
19:34:30 <pabelanger> :)
19:34:37 <clarkb> #topic General Topics
19:35:01 <clarkb> This is where I cleared out a whole bunch of stuff from the agenda that appeared stale, so speak up if I did so and shouldn't have
19:35:08 <mordred> pabelanger: I +A'd 514483
19:35:19 <pabelanger> mordred: ack
19:35:26 <clarkb> Worth mentioning again that ianw reported that the new backup server is functioning and has the old backup server's volumes attached to it
19:35:38 <clarkb> thank you ianw for getting that sorted out
19:36:04 <fungi> thanks a ton ian! that's been on our backlog longer than i want to think about
19:36:32 <clarkb> we should make a note to detach and delete the old volumes after an appropriate amount of time. Maybe in the new year?
19:37:07 <clarkb> that gives us just over a month or so of keeping old backups around
19:37:18 <fungi> yeah, that seems long enough to me
19:37:36 <pabelanger> agree
19:38:02 <fungi> we've also never figured out how to rotate backups so we don't eat disk space indefinitely. i wonder if that's a good model (switch volumes, then eventually remove old volumes)
19:38:37 <clarkb> fungi: so rather than adding up to one 3TB filesystem just swap out an old 1TB fs for a new 1TB fs?
19:39:04 <mordred> wfm
19:39:23 <clarkb> (should check actual usage before committing to a specific size but I like the idea of rotating rather than appending)
19:39:26 <fungi> if memory serves, bup doesn't have a way to age out data, so we do incur a bunch of overhead re-priming the new volume under that model
19:39:43 <clarkb> fungi: correct
19:40:01 <fungi> as we have to transfer a full copy of everything rather than just differential/incremental changes
19:40:53 <fungi> i think it comes down to a question of how much we're backing up, and how much retention we want
19:41:04 <clarkb> ianw may also have thoughts having just done it
19:41:11 <fungi> and how much disk we can allocate, i guess
19:41:39 <fungi> though this might also be a good thing to try moving to vexxhost?
19:41:58 <clarkb> ya ceph may make this substantially easier
19:42:00 <clarkb> (not sure)
19:42:28 <fungi> if nothing else, it's been suggested that getting available block devices of substantial size is much easier for us there than in rax
19:43:16 <fungi> like, could get a 25tb block device rather than having to stitch together a slew of 1tb volumes with lvm or raid0
19:43:19 <clarkb> we may also consider a different tool like borg, which has support for append-only and non-append-only backups. Not sure if you can switch between them in a way that makes sense given the reasons we have append only backups in the first place
19:44:07 <clarkb> but that's likely significantly more work
19:44:12 <mordred> yah
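[The re-priming overhead fungi describes can be illustrated with GNU tar's listed-incremental mode, used here only as a stand-in for bup's behavior: incremental tools transfer deltas against state recorded on the current volume, so a fresh volume starts with a full copy. All paths below are invented for the demo.]

```shell
# Why rotating to a fresh backup volume forces a full re-prime.
src=$(mktemp -d); vol1=$(mktemp -d); vol2=$(mktemp -d)
dd if=/dev/zero of="$src/data" bs=1024 count=100 2>/dev/null
# Prime volume 1: the first save transfers everything
tar --listed-incremental="$vol1/state" -cf "$vol1/full.tar" -C "$src" .
# A second save against the same volume's state is tiny (deltas only)
tar --listed-incremental="$vol1/state" -cf "$vol1/incr.tar" -C "$src" .
# A fresh volume has no state file -> full transfer all over again
tar --listed-incremental="$vol2/state" -cf "$vol2/full.tar" -C "$src" .
ls -l "$vol1" "$vol2"
```

The incremental archive on vol1 stays small while the first archive on vol2 is as large as the original prime, which is the cost of the switch-volumes rotation model being discussed.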
19:45:25 <clarkb> The other general item I wanted to bring up quickly was we are mostly keeping up with the logstash job queue at this point. it's been steady around 130-150k jobs for a couple days now and the worker processes aren't crashing \o/
19:45:55 <pabelanger> nice
19:46:11 <clarkb> I'd like to not add any significant load to that system (new files to index) until after the holiday as I'd like to see if it catches up and drives to zero with the expected drop in job activity during the holiday
19:46:19 <clarkb> if it does that then I think we can slowly add things back in and see how we do
19:46:41 <AJaeger> clarkb: https://review.openstack.org/520171 adds one file
19:46:51 <AJaeger> clarkb: do you want to WIP? ^
19:46:57 <clarkb> AJaeger: ya I can WIP it
19:48:04 <fungi> sounds like a fine plan
19:48:39 <clarkb> #topic Open Discussion
19:49:21 <pabelanger> I had 2 things, first, what do people think about a virtual sprint before or after jan 1 for control plane upgrades (xenial)
19:49:59 * clarkb pulls up a calendar
19:50:07 <pabelanger> 2nd, could have been in zuulv3 topic, I'd like to upgrade / migrate nodepool-builders to feature/zuulv3 branch to pick up python3 and the new nodepool syntax, we've been trying to add new images and it is confusing to contributors
19:50:25 <mordred> pabelanger: ++ to 2nd thing for sure
19:51:14 <fungi> i should be free of visiting family around that week, if all goes well
19:51:16 <pabelanger> for #1, https://releases.openstack.org/queens/schedule.html, r-11 or r-10 or r-4 look clear on release side of things
19:51:59 <clarkb> r-10 and r-8 look good to me
19:52:26 <pabelanger> yah, I could do either myself
19:52:27 <mordred> r-10 and r-11 look good to me
19:52:43 <clarkb> I think a matrix of availability etherpad/ethercalc thing may work best since everyone is going to have different holiday plans/travel/etc
19:52:50 <mordred> I cannot do R-8
19:52:52 <clarkb> but ya I like the idea of focusing on that when things get quiet around here
19:53:14 <fungi> yeah, r-10 and r-8 are currently clear on my calendar too
19:53:20 <pabelanger> okay, I'll compose and send that out ML today. See when people are free and what they want to work on
19:53:24 <jeblair> we should totally do it on jan 1 when we're all still sloshed
19:53:27 <clarkb> sounds good
19:53:29 <clarkb> jeblair: hahahaha
19:53:35 <clarkb> "who needs this server? NOT US"
19:53:39 <pabelanger> I'd pay to see jeblair sloshed
19:53:41 <fungi> jeblair: i can always make a point of getting sloshed regardless of the week
19:54:37 <clarkb> pabelanger: for the second thing, does it make sense to just merge v3 into master on nodepool at this point?
19:55:14 <pabelanger> clarkb: maybe jeblair or mordred can answer that
19:55:18 <clarkb> then the install will update automagically to the new version? we just have to update the config in sync right?
19:55:29 <jeblair> tbf, we could probably merge v3 to master on zuul too
19:55:43 <pabelanger> clarkb: we'd need to update puppet for python3 support, but yeah
19:55:59 <clarkb> I'm thinking that may be the best approach as it solves the underlying issue of having the two branches
19:56:11 <clarkb> it's more work but gets us into a better state I think; then we just go back to dev on master
19:56:17 <pabelanger> yah, if we want to have that discussion, sure
19:56:37 * mordred is in favor of merging back to master
19:56:40 <mordred> on both
19:56:45 <jeblair> it might be best to go ahead and make a plan to tell third-parties to freeze anything they need, then start updating the puppet modules and merge the branches in
19:57:11 <mordred> we've made recentish tags on both that 3rd parties can use, yeah?
19:57:26 <jeblair> yeah, i think we need a plan for puppet modules though
19:57:44 <clarkb> we have v3 flags on the puppet modules today
19:57:48 <clarkb> we could invert it?
19:57:52 <jeblair> maybe someone could work through all the steps for that
19:57:59 <clarkb> so that v3 is default and if you are deploying v2 then set the flag?
19:58:25 <jeblair> clarkb: yeah, v2 + v2 release tag could be inputs to the puppet module to get a v2 system
19:58:28 <jeblair> and defaults could all be v3
19:58:57 <dmsimard> totally missed out on the entire meeting, I had a topic but I guess we're out of time ? :/
19:59:14 <fungi> you have half a minute ;)
19:59:31 <clarkb> also I'm sure we'll mostly be around in -infra after the meeting
19:59:35 <dmsimard> Don't want to overlap, I'll bring it up in #openstack-infra yeah
19:59:49 <jeblair> though i will be stuffing my face with a fried chicken sandwich
19:59:58 <clarkb> now i want one
20:00:02 <clarkb> thanks everyone!
20:00:05 <clarkb> #endmeeting