19:00:09 <clarkb> #startmeeting infra
19:00:10 <openstack> Meeting started Tue Sep 19 19:00:09 2017 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:13 <openstack> The meeting name has been set to 'infra'
19:00:24 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:00:33 <clarkb> #topic Announcements
19:00:50 <clarkb> really quickly going to remind everyone to sign the queens release key. I haven't done it yet and feel very guilty over this
19:01:14 <ianw> o/
19:01:20 <clarkb> Also if you haven't noticed we just upgraded gerrit and while it was a bumpy start I think we are starting to get a handle on things? more on that later
19:01:32 <clarkb> #topic Actions from last meeting
19:01:49 <clarkb> We had a couple but I think we agreed to put them on the back burner for now. Particularly the infracloud one (also more on that later :) )
19:01:52 <pabelanger> o/
19:02:08 <clarkb> #topic Specs approval
19:02:29 <clarkb> I haven't really had a chance to look at these since the last meeting. I don't think there is anything outstanding right now. Please do ping me if there is something urgent
19:02:38 <mordred> o/
19:02:56 <clarkb> and with that we can move on to the portion of the meeting that everyone is waiting for
19:02:58 <clarkb> #topic Priority Efforts
19:03:05 <clarkb> #topic Zuul v3
19:03:22 <clarkb> Lots of things related to Zuulv3 happened last week at the PTG
19:03:36 <clarkb> lots of non-zuulv3 things happened too which we'll get to after the priority specs
19:03:44 <fungi> lots
19:04:00 <jeblair> my memory is that we worked on jobs + migration script a lot
19:04:03 <AJaeger> ;)
19:04:04 <clarkb> I know the zuul meeting was cancelled yesterday in favor of getting gerrit upgraded so now is the chance to catch up on all that goodness
19:04:15 <AJaeger> lots of open changes in the queue to review...
19:04:18 <jeblair> mordred is continuing to work on migration script
19:04:28 <jeblair> mordred: i think it emits actual job output now?
19:04:44 <jeblair> mordred: do you think we can, say, run some nova check jobs tomorrow or something?
19:05:24 <jeblair> andreaf wrote a basic native v3 tempest job that builds on the basic native v3 devstack job
19:05:30 <jeblair> i haven't reviewed that yet, but hope to soon
19:05:44 <jeblair> that's in devstack-gate repo (for now; we'll move it to tempest soon)
19:05:49 <clarkb> is that the first job addition from outside of zuul development?
19:06:26 <jeblair> clarkb: i don't want to figure out who's "inside" or "outside" enough to answer that question :)
19:06:48 <jeblair> but having andreaf self-bootstrap into writing a v3 job is awesome :)
19:06:54 <clarkb> ++
19:06:59 * fungi cheers
19:07:14 <AJaeger> Yeah!
19:07:29 <jeblair> tristanC showed us his in-progress dashboard work which will be important once we finish the cutover
19:07:38 <jeblair> (hopefully we're not far away from having a job history dashboard)
19:07:58 <jeblair> we have some ideas of how it and openstack-health can be complementary
19:08:22 <jeblair> tobias has patches in progress to nodepool to improve quota support
19:08:35 <jeblair> that will allow us to use smaller flavors for, say, pep8 jobs
19:08:35 <mordred> jeblair: yes - there is a giant stack of patches I'll bug people about tomorrow when this migration is done
19:08:41 <jeblair> (and therefore increase our apparent capacity)
19:08:49 <jeblair> mordred: awesome!
19:09:12 <clarkb> for us to make use of that we may need some quota changes (as we have instance quotas that line up with ram use in most cases)
19:09:26 <jeblair> clarkb: yeah, but it'll be a good problem to have :)
19:09:30 <clarkb> indeed
19:09:41 <dmsimard> I have a series of patches that should be mostly ready to land as far as multi node in v3 is concerned
19:09:58 <dmsimard> https://review.openstack.org/#/q/topic:zuulv3-multinode
19:10:17 <jeblair> we also did a roadmap exercise at the end of the week.  we were exhausted, so it's only in rough etherpad form at this point.  some time this week or next, i'll write it up for further discussion, so we can all get on the same page for what we do between the openstack cutover and v3 release, and what comes after.
19:10:18 <dmsimard> zuul-jobs has the role bits while the jobs/integration testing is in openstack-zuul-jobs
19:10:27 <jeblair> dmsimard: great!
19:10:59 <dmsimard> pabelanger and I discussed that it's worth thinking about when and where we'll want to trigger the base-integration and multinode-integration jobs (filtering on files?)
19:11:30 <jeblair> i need to track down problems with logstash job submission.  that's the only transition blocker (aside from migration script) that i'm aware of
19:11:42 <mordred> stack of patches related to migration script from me ends with: https://review.openstack.org/#/c/504968
19:11:57 <clarkb> #link https://review.openstack.org/#/c/504968 Zuulv3 job migration scripting
19:12:09 <clarkb> #link https://review.openstack.org/#/q/topic:zuulv3-multinode Native zuulv3 multinode jobs
19:12:33 <dmsimard> Once the zuulv3-multinode stack has landed, I'll try and see how hard it would be to bring up a native devstack multinode
19:12:47 <jeblair> #link https://etherpad.openstack.org/p/zuulv3-roadmap rough draft post-cutover roadmap
19:12:56 <mordred> hrm - I'm going to re-topic migration patches to zuulv3-migration
19:13:31 <fungi> rather than just zuulv3?
19:13:38 <clarkb> #link https://review.openstack.org/#/q/topic:zuulv3-migration Also for migration scripting
19:13:58 <fungi> oh, right, they're on the feature/zuulv3 branch already
19:14:06 <fungi> so show up in queries that way
19:14:34 <dmsimard> it's probably trivial to filter on topic:^zuulv3.* or something regardless
19:14:52 <dmsimard> because those patches are not necessarily on the zuul repo with feature/v3 branch
19:15:00 <dmsimard> i.e, project config, zuul jobs, etc
19:15:14 <fungi> i guess i should update my query
19:15:28 <jeblair> if folks could use 'zuulv3' as the topic for any patch not on a feature/zuulv3 branch, that would be great
19:15:48 <jeblair> on feature/zuulv3, a topic *other than* zuulv3 is helpful.  :)
19:16:03 <clarkb> #action everyone use 'zuulv3' as the topic for any patch not on a feature/zuulv3 branch; on feature/zuulv3, a topic *other than* zuulv3 is helpful.
19:16:40 <dmsimard> hmm, at first glance it doesn't seem like it's possible to put a regex for the topic field search :/
19:16:40 <jeblair> (so mordred's zuulv3-migration here is great)
19:17:04 <mordred> jeblair: WELL - except that zuulv3-migration has a bunch of patches on project-config too - it's a tricky little devil
19:17:37 <dmsimard> oh, gerrit has an 'intopic' search parameter so https://review.openstack.org/#/q/intopic:zuulv3 works
19:17:53 <jeblair> mordred: those are the most important to have 'zuulv3' as the topic
19:18:19 <clarkb> #link https://review.openstack.org/#/q/intopic:zuulv3 for everything zuulv3 related
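(For reference, the two Gerrit search operators discussed here behave differently — a sketch based on Gerrit's documented query syntax; the topics are the real ones from this discussion, but exact matching behavior may vary by Gerrit version:)

```text
topic:zuulv3                 # matches the exact topic "zuulv3" only
intopic:zuulv3               # free-text match: also finds zuulv3-migration, zuulv3-multinode
status:open intopic:zuulv3   # operators combine with AND
```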
19:18:42 <clarkb> jeblair: mordred would probably be good to have a summary of the current plan going forward as well
19:18:48 <clarkb> (once through general status updates)
19:19:31 <jeblair> mordred: did you send out that email?
19:19:40 <clarkb> he did
19:19:42 <dmsimard> he did :)
19:20:28 <jeblair> that probably covers the current plan afaik
19:20:35 <mordred> k. I've updated the zuulv3-migration topic on project-config patches to just be zuulv3
19:20:36 <clarkb> ok let me dig up a link for it
19:21:05 <mordred> well - that plan is "planning on doing a rollout on Monday" - we probably want to have a slightly more detailed infra-team plan
19:21:10 <clarkb> #link http://lists.openstack.org/pipermail/openstack-dev/2017-September/122241.html Zuulv3 rollout plans email
19:23:19 <clarkb> anything else we want to go over on Zuulv3 related items?
19:23:39 <mordred> well - lemme say one more quick thing about migration script
19:24:07 <mordred> once the patches from me in zuul, project-config, openstack-zuul-jobs are landed ...
19:24:14 <mordred> to my knowledge the migration script is good to go
19:24:37 <dmsimard> mordred: I'd like to test drive some sanity checks against what I feel is most vulnerable -- deployment projects (openstack-ansible, puppet-openstack, kolla, tripleo, etc)
19:24:40 <mordred> so it's important to start combing through the generated content for bugs
19:25:07 <dmsimard> Can I go ahead and "whitelist" those projects in main.yaml and submit what would be their "migrated" jobs as zuul.yaml in their repos ?
19:25:24 <Shrews> i second mordred's suggestion. there WILL be bugs. some of that script is... complicated
19:25:36 <jeblair> dmsimard: let's not submit any auto-generated content to project repos
19:25:47 <mordred> dmsimard: I'd prefer we didn't, as we won't have any way of knowing which things we've submitted to project repos
19:25:53 <mordred> gah. jeblair said that with less words
19:26:05 <jeblair> dmsimard: let's put that in project-config and/or openstack-zuul-jobs
19:26:07 <dmsimard> jeblair, mordred: I'm not planning on merging anything, it will be -W
19:26:26 <dmsimard> if we want test drives to be in project-config and openstack-zuul-jobs, that requires merging and reverting things
19:27:05 <jeblair> dmsimard: i'm fine with that.
19:27:18 <dmsimard> ok, sure.
19:27:27 <jeblair> i'd rather do that than have people get the mistaken idea that we might want to merge the auto-generated stuff in their repos
19:27:36 <Shrews> mordred: perhaps we should outline what to expect for migration script output? like expanded templates, job name changes, etc. might help others review
19:27:39 <jeblair> this is a PR issue, not a technical one.  :)
19:27:43 <mordred> Shrews: ++
19:27:56 <clarkb> Shrews: I think that would be helpful
19:28:07 <dmsimard> mordred: did that zuul-sphinx zuul.d fix land so we could at least put the tests in a separate file ?
19:28:27 <mordred> also - it's worth pointing out that several of the migration script 'fixes' I've made over the weekend were actually fixes or changes to old jobs or just making a new v3 version
19:28:27 <jeblair> mordred: do you think we need to add anything to the infra-manual migration doc?
19:28:40 <AJaeger> dmsimard: https://review.openstack.org/#/c/504797/ did not merge yet
19:28:53 <mordred> oh - right ... jeblair ^^
19:29:02 <dmsimard> AJaeger: thanks, just added zuulv3 topic
19:29:02 <mordred> jeblair: zuul-sphinx doesn't support zuul.d yet
19:29:20 <jeblair> ++
19:30:03 <mordred> so I think a doc (and/or update to infra-manual) about expected outcome from migration script - as well as a few notes about where these things expect to go ...
19:30:29 <mordred> like, migrated project-pipeline stuff goes into project-config zuul.yaml but jobs and project-templates go into openstack-zuul-jobs
19:30:36 <mordred> and playbooks
19:31:08 <jeblair> that sounds helpful
19:31:22 <jeblair> ("i heard you migrated stuff, where the cloud is it?")
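(The split mordred describes can be sketched as follows; the project and job names are hypothetical and the exact file layout is an assumption, but the shape matches Zuul v3 configuration syntax:)

```yaml
# project-config/zuul.d/projects.yaml (hypothetical excerpt):
# the migrated project-pipeline entry lands in project-config
- project:
    name: openstack/example
    check:
      jobs:
        - legacy-example-dsvm

# openstack-zuul-jobs/zuul.d/jobs.yaml (hypothetical excerpt):
# the job definition and its playbook land in openstack-zuul-jobs
- job:
    name: legacy-example-dsvm
    parent: legacy-base
    run: playbooks/legacy/legacy-example-dsvm/run.yaml
```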
19:31:34 <mordred> but - just spot-checking things randomly often finds things with the eyeballs
19:31:41 <mordred> for instance, I now notice this:
19:31:43 <mordred> http://logs.openstack.org/79/505379/1/check/zuul-migrate/06587d2/playbooks/legacy/manila-tempest-minimal-dsvm-lvm-centos-7/run.yaml
19:31:51 <mordred> echo "Detailed logs: http://logs.openstack.org/$LOG_PATH/"
19:32:01 <mordred> should likely do something about that :)
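(One way to do the kind of spot-check mordred mentions — grepping migrated legacy playbooks for hard-coded v2 log URLs — might look like this; the directory layout below is fabricated purely for illustration:)

```shell
# Set up a stand-in for a migrated legacy playbook tree (illustration only)
mkdir -p /tmp/legacy-demo/manila-tempest-minimal
printf '%s\n' 'echo "Detailed logs: http://logs.openstack.org/$LOG_PATH/"' \
  > /tmp/legacy-demo/manila-tempest-minimal/run.yaml

# List migrated playbooks that still hard-code the old $LOG_PATH log URL
grep -rl 'logs.openstack.org/\$LOG_PATH' /tmp/legacy-demo
```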
19:32:36 <jeblair> who wants to write the infra-manual changes?
19:33:04 <mordred> I can take that on
19:33:10 <mordred> unless someone else wants to :)
19:34:04 <clarkb> let's say mordred tag you are it and if anyone else finds time or wants to help they can ping mordred?
19:34:14 <Shrews> mordred: i'll email you a section on "matchers" that you can include
19:34:31 <clarkb> #action mordred update infra-manual documentation on what to do with migrated zuulv3 jobs
19:34:36 <clarkb> That look right?
19:36:33 <clarkb> ready to move on to the other priority spec relevant to yesterday and today?
19:36:41 <jeblair> +
19:36:46 <fungi> indeed!
19:37:05 <clarkb> #topic Gerrit 2.13 Upgrade
19:37:16 <clarkb> This mostly happened yesterday
19:37:34 <clarkb> amazingly we fit it into the allocated outage time even though database migrations took 5 hours
19:37:42 <clarkb> Thank you everyone for helping make that happen
19:37:46 <fungi> that was nuts
19:38:09 <clarkb> Unfortunately we've turned over some new and exciting behavior around stream events and gerrit email and memory usage and so on that we are trying to address
19:38:19 <jeblair> and special thanks to zaro for doing work a year ago to cause us to finish within our window :)
19:38:23 <clarkb> ++
19:38:38 <clarkb> Current issues are being tracked at https://etherpad.openstack.org/p/gerrit-2.13-issues
19:38:42 <clarkb> #link https://etherpad.openstack.org/p/gerrit-2.13-issues
19:38:58 <clarkb> For the most part I think we have a handle on the problems and a good chunk of fixes or attempts at fixes
19:39:00 <fungi> absolutely. i don't know for sure that the reindex ordering optimization sped it up, but i'm happy to believe that was it
19:39:56 <clarkb> My biggest concerns right now are the memory use and email slowness. I think the memory use is actually likely to be ok based on utilization today. jeblair's change to bump max memory seemed to get us to a happy place and gerrit isn't leaking memory all over the floor (it decreases even)
19:40:30 <fungi> though the weirdness around edits via api/webui are troubling
19:40:36 <clarkb> That all said does anyone think we need to be considering a rollback? For me I think that was on the table before the memory stuff got sorted but am feeling much more comfortable with moving forward and fixing things
19:40:46 <clarkb> now
19:41:01 <fungi> i'm still good with rolling forward at this point
19:41:08 <jeblair> yeah, the problems yesterday may have been caused by some anomalous event (there was a 500mbps outbound spike when it got bad).  or we may have needed more jvm ram.  it's not clear.
19:41:46 <jeblair> yeah, if we've managed to get this far without a memory catastrophe, i think forward is the way to go
19:42:09 <ianw> agree ... it's not out of bounds from what we saw with previous release that went mad occasionally too
19:42:53 <clarkb> ok good, I didn't want to put a bunch of effort into fixing problems if we were feeling like a revert is warranted. Glad to know no one is ready to go down that route yet :)
19:43:18 <clarkb> After the meeting we will be applying fixes for a bunch of the problems on that list, hopefully we see improvements.
19:43:30 <clarkb> Once again thank you everyone for helping
19:44:20 <clarkb> Probably the other big item related to this is that we are not merging new project creation changes
19:44:39 <clarkb> want to get everything working and happy before doing that as we have had some puppet related issues around project creation and nodepool image builds
19:44:44 <mordred> ++
19:45:44 <clarkb> anything else I've missed here? I guess see everyone after the meeting to do the great service restart of Tuesday September 19
19:46:33 <clarkb> #topic PTG recap
19:46:39 <clarkb> #topic Infracloud
19:47:02 <clarkb> We talked about things that weren't zuulv3 at the PTG too. I think one of the more important items was the future of infracloud
19:47:45 <clarkb> I've summarized the plans around that at http://lists.openstack.org/pipermail/openstack-infra/2017-September/005585.html if you have an interest in infracloud and haven't read that email please do. Feedback is very welcome
19:48:24 <clarkb> TL;DR is that it probably isn't viable to move existing hardware due to cost and logistical problems (we need rails..). As a result we likely don't want to put effort into upgrading the current cloud regions and instead let them die on the vine
19:48:44 <clarkb> #topic PTG recap
19:49:41 <clarkb> Other items that came up were the future of puppet testing. I think we've got a plan where we can replace beaker jobs with something a bit more test suite agnostic and even config management tool independent. This fits into zuulv3 and potentially allows for individuals to write ansible replacements of puppet things and have them be tested in similar manners
19:50:01 <clarkb> We also talked about replacing puppetboard with ara. I think dmsimard said there will be a spec up for that
19:50:03 <jeblair> ooh that would be nice
19:50:07 <jeblair> that would be nice too
19:50:15 <jeblair> (my oohs and aahs are lagging)
19:50:18 <clarkb> :)
19:50:32 <clarkb> The other big item on my list was talking about monitoring of control plane services
19:50:57 <clarkb> There was general agreement this would be ok as long as any alerting was purely opt in by roots and if someone or group would be responsible for tuning things
19:51:15 <clarkb> I expect that that will be an interesting spec with probably lots of alternatives weighing the pros and cons of a variety of monitoring tools out there
19:51:21 <clarkb> dmsimard: ^ you still up to writing those?
19:52:01 <ianw> clarkb: will you be issuing pagers :)
19:52:08 <clarkb> (I know everyone is currently busy with zuul and gerrit things so no rush, but expect tosee that in infra-specs at some point if interested or maybe you can even help write the specs)
19:52:18 <clarkb> ianw: please no :) and in fact that was basically what jeblair said. No pager duty
19:52:34 <clarkb> we can collect the info and use it but we aren't oncall and shouldn't have our sleep interrupted
19:53:20 <clarkb> Ok I think I managed to get through the entire agenda I had. Sorry if it felt rushed.
19:53:25 <clarkb> #topic Open Discussion
19:53:39 <clarkb> anything else for the last ~5 minutes or so of the allotted time?
19:53:42 <jlk> monitoring and metrics can help inform future development and such, but shouldn't be waking volunteers up :D
19:53:45 <fungi> good work on the agenda!
19:54:09 * AJaeger put one item on the agenda:
19:54:22 <AJaeger> There's a discussion about <= mitaka EOL branches at http://lists.openstack.org/pipermail/openstack-dev/2017-August/121432.html
19:54:27 <clarkb> oh the mitaka eol. Sorry that didn't make it to my local text file
19:54:35 <AJaeger> and mordred has a change up at https://review.openstack.org/#/c/504964
19:54:57 <AJaeger> Do we want to remove all the regexes etc like in 504964 - or first retire all <= mitaka branches?
19:55:02 <fungi> there was also something on the agenda about dropping cached git clones (or not) but i think that was from the meeting prior to the ptg?
19:55:06 <AJaeger> or are those not related?
19:55:24 <clarkb> fungi: ya I think that is leftovers from last meeting when infra cohosted zuul
19:55:40 <clarkb> AJaeger: I think we can likely do them independently especially if plan is to remove those branches anyways
19:55:53 <clarkb> AJaeger: but we can sync up with tonyb later today to make sure that works for him
19:56:37 <AJaeger> clarkb: yeah, syncing with tonyb sounds best next step...
19:57:01 <ianw> AJaeger / tonyb: i can also help with branch removal in our tz if needs be ... i figured it out :)
19:57:01 <clarkb> AJaeger: I know your timezones don't overlap much. I can try pinging him once not swamped with gerrit related items
19:57:06 <clarkb> oh ++ to ianw
19:57:24 <fungi> on a related note, we're slowly getting feedback on projects okay with us deleting their date-based releases from pypi
19:57:44 <fungi> i did a batch of them right before the ptg where the release team was able to make the call
19:57:48 <AJaeger> ianw: will you take care of it and discuss with tonyb ? Would be best IMHO
19:58:01 <fungi> but now the ones which aren't under release management are slowly trickling in
19:58:06 <mordred> AJaeger, ianw: we could also just remove those right before the migration
19:58:20 <AJaeger> clarkb: want to give ianw an #action?
19:58:20 <mordred> the main thing is that they cause a bunch of projects to not actually use the project-template in v3
19:58:29 <clarkb> as long as ianw is ok with it
19:58:39 <AJaeger> mordred: I don't care about timing, just that it gets done if needed
19:58:43 <ianw> yes, ok
19:58:45 <clarkb> #action ianw Work with tonyb to coordinate old stable branch removal from projects
19:58:47 <mordred> and in v3 we can, should we want, add a branch exclusion to the project-template itself
19:58:54 <mordred> AJaeger: cool
19:59:13 <clarkb> and we are out of time
19:59:16 <clarkb> Thanks everyone!
19:59:18 <clarkb> #endmeeting