19:01:09 <clarkb> #startmeeting infra
19:01:10 <openstack> Meeting started Tue Sep 18 19:01:09 2018 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:14 <openstack> The meeting name has been set to 'infra'
19:01:18 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:32 <clarkb> #topic Announcements
19:01:34 <Shrews> ohai
19:01:51 <diablo_rojo> o/
19:02:06 <clarkb> Mostly just a reminder that last week was the PTG and this week it seems like a good chunk of the team is either on vacation or something resembling vacation so that they can deal with a hurricane
19:02:39 <clarkb> #topic Actions from last meeting
19:02:50 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-09-04-19.01.txt minutes from last meeting
19:03:17 <clarkb> There weren't any concrete actions listed. It was prep for PTG for many of us I think
19:03:38 <clarkb> Maybe I should do a high level recap of the PTG then we can dig in through the rest of our agenda
19:03:45 <clarkb> #topic High level PTG recap
19:04:02 <ianw> ++
19:04:08 <clarkb> The first two days of the PTG the infra team was in the "Ask Me Anything" helproom
19:04:48 <clarkb> This ended up being fairly productive. We ended up helping OSA with various zuul related job questions as well as pointing them in the direction of how to reproduce those builds locally (I think we may see them push changes to make that possible in the near future)
19:05:06 <clarkb> Swift came by a couple times as they were working on multinode functional testing and doc build updates
19:05:41 <notmyname> clarkb: and we're just waiting on zuul to land it! thanks for your help (https://review.openstack.org/#/c/601686/)
19:05:43 <tdasilva> thanks for the help :)
19:06:01 <clarkb> Tripleo/RDO are interested in how we might do OVB (OpenStack Virtual Baremetal) testing. Which requires working PXE boot, which requires layer 2 broadcast for dhcp in the current design
19:07:00 <clarkb> one suggestion was to test the pxe provisioning independent of the workload those nodes run so that OVB can boot lightweight instances using our existing network overlays and qemu then test the workload on nodepool provisioned nodes so that nested virt isn't a problem
19:07:19 <clarkb> Sahara and others were interested in third party CI as well as zuul v3
19:07:33 <clarkb> ianw: ^ I tried pushing people to your spec to see if we could get a volunteer to write out what that looks like in practice
19:07:46 <clarkb> ianw: the other popular idea was to just use software factory
19:07:59 <clarkb> which apparently works very well out of the box at small scale (but needs tuning as things grow)
19:08:27 <clarkb> I also helped StarlingX sketch out their docs publishing which intersected with AFS ( you may have seen changes from dtroyer and myself for that which I think are not yet merged )
19:08:27 <ianw> i've actually updated that spec, so was interested if there was talk on it
19:08:53 <clarkb> ianw: the sense I got was that for most users they would be fine just using software factory
19:09:00 <clarkb> which is maybe a reasonable thing to point people at
19:09:20 <clarkb> Jan from I forget South African networking company in particular was a fan of that
19:09:29 <clarkb> (I'm really bad with names sorry)
19:09:39 <Shrews> yeah, i think SF is pretty solid and close to what we run
19:09:57 <Shrews> a few extras
19:10:05 <clarkb> Another recurrent topic in the helproom was working around nested virt
19:10:57 <clarkb> I think where we have gotten with that is if we can continue to push on our cloud providers to have better communications channels we can explore the possibility of nested virt enabled flavors. Then jobs are best effort if they run on that. Basically you'd have to talk to the cloud not infra if they break in weird ways. I think that should happen once we've largely moved to bionic though
19:11:04 <clarkb> new kernels seem to lead to new nested virt trouble
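Since nested virt hinges on host kernel state, a quick way to see whether a given host actually has it enabled is to read the kvm module parameter. A minimal sketch (the helper names here are invented for illustration, not existing infra tooling):

```python
def nested_virt_enabled(param_value):
    """Interpret the contents of /sys/module/kvm_*/parameters/nested.

    Older kernels report '0'/'1'; newer ones report 'N'/'Y'.
    """
    return param_value.strip() in ("1", "Y", "y")


def host_has_nested_virt(module="kvm_intel"):
    """Read the live module parameter; False if the module isn't loaded."""
    path = "/sys/module/%s/parameters/nested" % module
    try:
        with open(path) as f:
            return nested_virt_enabled(f.read())
    except OSError:
        return False
```

On an AMD host you'd pass `module="kvm_amd"`; either way this only tells you about the host you're on, which is why flavor-level guarantees still need the cloud provider's cooperation.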
19:11:23 <clarkb> That was Monday and Tuesday
19:11:49 <clarkb> Wednesday and Thursday were infra dedicated time. The first day was largely spent getting Zuul to trigger CD updates of Zuul via bridge.o.o
19:12:21 <clarkb> Turns out this is more complicated than we originally anticipated. Current status is we have a job that will add bridge.o.o to its inventory then fail to ssh to bridge.o.o because we only allow ssh in as root from bridge and puppetmaster
19:12:33 <clarkb> (we can talk about this more in the priority topic section of the meeting)
19:13:03 * fungi is sorta here for a few minutes, but not completely and may disappear again shortly
19:13:05 <clarkb> To get to that point we addressed quite a few of the smaller issues we've run into with ansible. Including upgrading ansible to 2.7.0rcX on bridge.o.o which seemed to speed up execution of ansible
19:13:17 <clarkb> Overall it was quite useful in getting ansible things running more reliably
19:13:58 <clarkb> Thursday was the last day of Infra dedicated time and we used it to talk about the other items on our etherpad. I and others tried to take notes about the conversation there. /me picks out some highlights
19:14:21 <clarkb> The openstack transition from xenial to bionic will ideally be managed by the openstack project with infra and zuul teams assisting
19:14:40 <clarkb> fungi: will be sending out mailing list merge plan details as soon as the hurricane allows
19:14:47 <fungi> yup
19:15:01 <fungi> the list creation change has been up for review since friday, btw
19:15:11 <clarkb> The lounge has come up again as a thing we should try to run to make irc more friendly to new users
19:15:23 <clarkb> ttx volunteered to make that a 20% time project of his
19:15:49 <fungi> #link https://review.openstack.org/602781 Create the OpenStack discussion mailing list
19:15:55 <clarkb> mostly need to figure out authentication for our users there
19:16:39 <clarkb> And finally there was a conversation in which kmalloc volunteered to start building a proof of concept for authentication/identity aggregation so that we can point gerrit and wiki and storyboard etc at a single identity provider that proxies for say launchpad, openstackid, github, or whatever else people want
19:16:39 <fungi> ideally 602781 will be merged and the resulting list locked down before i announce its existence
19:16:55 <clarkb> fungi: would it be best for us to +2 then you can approve and lock down when ready?
19:17:01 <fungi> sure, wfm
19:17:18 <clarkb> I think that is it for a high level recap
19:17:33 <clarkb> Before we talk the config management update priority effort anything I missed people want more info on?
19:17:41 <clarkb> or topics that I skimmed too much on in general?
19:18:28 <Shrews> any experimenting/discussing the dockerhub zuul/nodepool containers?
19:18:57 <clarkb> Shrews: we did discuss some of the pain points that SpamapS ran into with the pbrx generated images in the helproom
19:19:29 <clarkb> Shrews: I think we had general consensus that we should improve zuul's default behavior around daemonizing to be more container friendly as well as improving the documentation around what each image is/does
19:19:46 <clarkb> but otherwise SpamapS said the images were working for him they just had to have some minor tweaks done
19:19:56 <Shrews> cool
19:20:17 <clarkb> zuul-base vs zuul image is confusing. Also you don't want to daemonize by default in docker containers
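The container-friendly default mentioned above could look something like this sketch: stay in the foreground when the process looks like a container entrypoint. Note that `ZUUL_FOREGROUND` is a made-up knob for illustration, not a real zuul option.

```python
import os


def should_daemonize(pid=None, env=None):
    """Decide whether to daemonize, preferring foreground in containers.

    Heuristic sketch: PID 1 means we're the container's init process,
    and an explicit environment request also forces foreground.
    ZUUL_FOREGROUND is a hypothetical knob, not an existing option.
    """
    pid = os.getpid() if pid is None else pid
    env = dict(os.environ) if env is None else env
    if pid == 1 or env.get("ZUUL_FOREGROUND"):
        return False
    return True
```

Keeping the process in the foreground also means logs go to stdout/stderr, where `docker logs` expects them.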
19:20:52 <clarkb> #topic Update Config Management
19:21:07 <clarkb> Don't forget to review 'topic:puppet-4 and topic:update-cfg-mgmt' changes in gerrit
19:21:46 <clarkb> This is what most of the time was dedicated to at the PTG. cmurphy has got quite a few services puppeting under the puppet 4 parser (and improved the inventory listing to make it less merge conflicty)
19:22:26 <clarkb> Then mordred corvus jhesketh myself et al worked on setting up Zuul to trigger ansible playbook runs on bridge.o.o so that we can run CD from zuul
19:23:07 <clarkb> There are two big things we need to address to make ^ possible. The first is we only allow ssh as root into the bridge from the bridge and puppetmaster. We need a zuul user on bridge or we need to allow the zuul executors to ssh into bridge.o.o as root
19:23:22 <clarkb> I'm actually leaning towards adding a zuul user myself since we already manage arbitrary users that can ssh in and sudo
19:23:42 <clarkb> The other item is our support for nil nodeset jobs is pretty basic in our base jobs
19:24:06 <clarkb> we don't get all the same logging for example nor do we clean out the master ssh key from the ssh agent like we do on jobs with a nodeset
19:25:00 <clarkb> Another item to be aware of is mordred is writing a new inventory plugin for us that will more quickly filter and add nodes to groups than the current constructed plugin can do (the constructed plugin accounts for like 2 minutes of every ansible-playbook run right now, it isn't fast)
19:25:19 <clarkb> mordred: ^ you aren't around today to talk about that are you
19:25:37 <clarkb> there were some unexpected runtime behaviors around removing nodes from groups that I don't think have been solved yet
19:26:15 <Shrews> 2 minutes? eek
19:26:34 <clarkb> Shrews: ya it's creating and evaluating a jinja template for all our groups X all our hosts
19:26:40 <clarkb> something like 10k jinja template evaluations
19:26:53 <fungi> matrix expansion at its finest
19:27:12 <Shrews> cartesian inventory ftw
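The cost shape behind those numbers is easy to see in miniature: a constructed-style inventory evaluates one expression per (group, host) pair. A toy model, with fnmatch standing in for jinja and the hostnames invented:

```python
from fnmatch import fnmatch


def build_groups(hosts, group_patterns):
    """Naive constructed-style grouping.

    Every (group, host) pair is evaluated exactly once, so cost grows
    as len(groups) * len(hosts) -- the same shape as the ~10k jinja
    evaluations mentioned above.  Returns (groups, evaluation_count).
    """
    groups = {name: [] for name in group_patterns}
    evaluations = 0
    for name, pattern in group_patterns.items():
        for host in hosts:
            evaluations += 1  # one pattern/template evaluation per pair
            if fnmatch(host, pattern):
                groups[name].append(host)
    return groups, evaluations


hosts = ["ze%02d.openstack.org" % i for i in range(1, 11)]
hosts.append("bridge.openstack.org")
groups, n = build_groups(hosts, {"zuul-executor": "ze*.openstack.org",
                                 "bastion": "bridge.*"})
# 2 groups x 11 hosts -> 22 evaluations
```

A plugin that matches on a precomputed key instead of re-evaluating a template per pair can drop this to roughly linear in the host count, which is presumably the point of the replacement mordred is writing.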
19:27:42 <clarkb> to summarize this topic we are in a much more reliable state for running ansible frequently in a loop than we were last week. But still work to be done to get zuul talking to bridge.o.o to trigger ansible runs there
19:28:01 <clarkb> Also the puppet-4 futureparser work continues and if you can help review those changes I'm more than happy to approve them when I can monitor them :)
19:28:19 <clarkb> #topic General Topics
19:28:33 <clarkb> Lets keep moving because we have half an hour left and plenty of other items related to PTG to talk about :)
19:28:42 <clarkb> OpenDev
19:29:28 <clarkb> it had been about 3 weeks without seeing any strong opposition to the OpenDev name (the only thing I saw was concern about confusion with the conference and fungi made a strong argument for why that shouldn't be a problem and that seemed to settle it on the etherpad)
19:29:56 <clarkb> I went ahead and decided it was important for us to keep making progress on this and did the no news is good news thing and sent email  to the list saying we should go with it
19:30:28 <clarkb> one thing corvus brought up at the PTG is that we should enlist the help of the OpenStack Foundation to draft clear communications about what this means for existing and potential new projects
19:30:52 <clarkb> one concern in particular being that if we aren't careful we could send the message that this is just a rename of the openstack infra team and they exist only to serve openstack the project
19:31:12 <clarkb> I've got it on my todo to talk to the foundation about drafting this
19:31:51 <clarkb> From a technical perspective we will spin up ns1.opendev.org ns2.opendev.org and adns1.opendev.org. Host opendev.org DNS there and then migrate zuul-ci.org DNS there then delete ns*.openstack.org
19:32:01 <clarkb> that is sort of a step 0 to starting to host things with that name branding.
19:32:07 <clarkb> corvus: has volunteered to spin those up
19:32:32 <fungi> just want to chime in with my support for self-hosting dns as the day-zero prerequisite
19:33:06 <clarkb> The last related item on my list is that we should consider booting new servers eg etherpad01.openstack.org replacement as etherpad01.opendev.org in nova so that we don't have to replace everything once we do trusty upgrades
19:33:31 <clarkb> that will likely be a case by case basis (and in that particular example it would continue to run etherpad.openstack.org since dns won't necessarily be up yet and comms won't have been sent out yet)
19:33:39 <clarkb> just a nova api bookkeeping item.
19:34:16 <clarkb> Questions, concerns, thoughts on OpenDev before we move on to the next item?
19:34:31 <mnaser> (yay for opendev)
19:35:23 <Shrews> are we going to use resources donated for openstack for opendev?
19:35:29 <Shrews> any concerns around that?
19:35:32 * fungi is looking forward to typing two fewer keys for our urls ;)
19:36:04 <clarkb> Shrews: the clouds we've spoken to so far (mostly vexxhost and potential arm resources) seem to think that being a little more generic is easier to sell
19:36:10 <clarkb> Shrews: but we haven't talked to all of them yet.
19:36:26 <clarkb> Shrews: what I like to point out is that tripleo + nova + neutron are ~80% of our resource utilization
19:36:31 <clarkb> and I don't expect that will change
19:36:44 <fungi> some of the sentiment was that resources are being donated to primarily serve the communities operated under the osf, and that won't really change
19:36:58 <Shrews> makes sense
19:37:16 <clarkb> Shrews: and we've always hosted related projects its just that many are unwilling to do so with "openstack" featured so prominently with branding and naming
19:37:21 <clarkb> it's definitely something to be aware of though
19:37:48 <Shrews> k
19:38:27 <clarkb> Speaking of ARM we have a new linaro cloud region
19:38:32 <clarkb> ianw: ^ thank you for setting that up
19:39:04 <ianw> i think fungi has leads on more arm resources?
19:39:20 <clarkb> in parallel to that Gary Perkins is spinning up another arm cloud https://review.openstack.org/#/c/602436/
19:39:22 <ianw> there's certainly a review out for adding credentials
19:39:38 <clarkb> ya I think fungi got the secret portions of the credentials ^ fungi any idea if we are good to move forward on that at this point?
19:39:39 <fungi> yeah, if someone has time to pick up the creds are in a file in my homedir on bridge.o.o at the moment
19:40:08 <clarkb> ah, we need to add them to hiera but after that should be good to go?
19:40:13 <ianw> fungi: ahh, cool, i can take a look.  i know we got confirmation it works with the same images nb03.o.o is producing, which is good
19:40:15 <fungi> ~fungi/temp_clouds.yaml
19:40:30 <fungi> i was trying to get it working well enough with osc to reset the password
19:40:35 <fungi> but didn't get quite that far yet
19:40:47 <fungi> was running into some sort of error i think with my syntax there
19:41:40 <clarkb> ianw: ^ let me know if I can help too
19:41:41 <fungi> anyway, feel free to take that over, i was mostly acting as an in-person secrets receptacle
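For whoever picks up those creds: a clouds.yaml entry mostly just needs the auth block filled in before osc/nodepool can talk to the new region. A sketch of the minimal shape, with every value a placeholder rather than the real linaro credentials:

```python
def validate_cloud(entry):
    """Check one clouds.yaml-style entry (as a dict) for the keys
    openstacksdk minimally expects.  Returns a list of missing keys."""
    missing = []
    auth = entry.get("auth", {})
    for key in ("auth_url", "username", "password", "project_name"):
        if key not in auth:
            missing.append("auth.%s" % key)
    if "region_name" not in entry:
        missing.append("region_name")
    return missing


# Placeholder values only -- the real entry lives in hiera once added.
example = {
    "auth": {
        "auth_url": "https://example-arm-cloud.invalid:5000/v3",
        "username": "nodepool",
        "password": "REDACTED",
        "project_name": "ci",
    },
    "region_name": "RegionOne",
}
```

Depending on the keystone deployment, `user_domain_name`/`project_domain_name` may also be required; the password-reset errors fungi hit are often exactly that kind of missing-domain syntax problem.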
19:42:36 <clarkb> prometheanfire has been working to get gentoo images running in nodepool
19:42:54 <clarkb> I think those are close but may need some small tweaks still. Mostly a heads up that this is happening and probably still slightly broken
19:43:04 <ianw> i guess we should promote that to voting jobs in dib
19:43:24 <clarkb> most of the issues so far have been around unbound config, gentoo is different than other distros
19:43:34 <clarkb> I think we are past that particular set of problems now though
19:43:47 <ianw> my one concern is that nobody but prometheanfire has really ever maintained those in dib
19:44:04 <clarkb> ya, I'm not really a gentooian myself having never run it
19:44:12 <clarkb> but we do have other gentoo people in the larger community (like ttx)
19:44:20 <Shrews> what's the need for gentoo?
19:44:32 <Shrews> just more os coverage?
19:44:46 <clarkb> Shrews: OSA is interested in supporting it, and the consequence of that will be that gentoo doesn't need their own tooling for openstack support
19:45:02 <Shrews> gotcha
19:45:05 <fungi> certainly a good option for some kinds of bleeding-edge versions of dependencies and platform testing
19:46:13 <Shrews> or for folks who really like waiting on compiles
19:46:58 <persia> There was some talk of precompiling most things for the test nodes
19:47:01 <clarkb> Heads up that OVH is doing openstack upgrades in BHS1 tonight. This has a couple of consequences for us. First is that we'll disable nodepool in BHS1 this afternoon (I'll approve that change later today). The other is my IRC bouncer runs on ovh openstack in BHS1 so uh I may not be around in the morning while I recover :)
19:47:12 <fungi> if i ever get around to it i'll try to get debian unstable images building too (which is remarkably stable, contrary to its name; i've run it for nearly two decades)
19:47:13 <clarkb> On October 10th they will do the same in GRA1
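Disabling a region ahead of a provider maintenance boils down to zeroing max-servers for that provider. In dict form (keys modeled on nodepool's YAML config; the provider names and quota numbers here are invented):

```python
def disable_region(providers, name):
    """Set max-servers to 0 for every pool of one provider in a
    nodepool-style config -- a sketch of what the 'disable BHS1'
    change does, operating on plain dicts instead of the real YAML."""
    for provider in providers:
        if provider["name"] == name:
            for pool in provider.get("pools", []):
                pool["max-servers"] = 0
            return True
    return False


providers = [
    {"name": "ovh-bhs1", "pools": [{"name": "main", "max-servers": 159}]},
    {"name": "ovh-gra1", "pools": [{"name": "main", "max-servers": 79}]},
]
disable_region(providers, "ovh-bhs1")
```

Zeroing the quota (rather than deleting the provider) keeps the config diff small and trivially revertable once the maintenance is over.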
19:47:57 <clarkb> And the last thing on my list of agenda items is there is an openstack baord meeting in 13 minutes
19:48:03 <clarkb> #link http://lists.openstack.org/pipermail/foundation/2018-September/002620.html OpenStack Board Meeting after this meeting
19:48:16 <mordred> clarkb: ohai
19:48:22 <clarkb> This meeting will cover the plans for focus areas and new projects like zuul and starlingx and airship and kata
19:48:23 <mordred> clarkb: sorry my brain still isn't quite working
19:48:25 <fungi> a wild board member appears!
19:48:46 <clarkb> I know that particular topic is of interest to many around here so feel free to join
19:48:49 <Shrews> mordred: that implies that it once *did* work
19:49:04 <clarkb> I'll dial in then work on this conference talk slide deck probably :)
19:49:31 <clarkb> also lunch
19:49:36 <fungi> i have to miss the board call this time, but have confidence that others will fill me in
19:49:39 <clarkb> #topic Open Discussion
19:49:47 <ianw> fungi: my laptop sleep state would like to disagree with you on debian unstable stability ;)
19:49:56 <clarkb> Anything else? feel free to dig into more PTG related stuff too
19:50:00 <clarkb> ianw: ha
19:50:07 <fungi> ianw: fair, let's try to avoid putting our servers to sleep ;)
19:50:16 <clarkb> fungi: that is how live migration works fwiw
19:50:56 <fungi> something tells me we don't care if a live migration of one of our test nodes fails
19:52:46 <clarkb> mordred: when your brain does start working you may want to sync up on the current zuul cd to bridge issues
19:52:59 <clarkb> mordred: they are just tricky enough that I think we should consider them a bit before choosing a path forward
19:54:08 <ianw> if people want to look at the 3rd party ci spec, i think it's about done
19:54:10 <ianw> #link https://review.openstack.org/#/c/563849/
19:54:17 <mordred> clarkb: ++
19:54:47 <ianw> however, i'm also willing to promote softwarefactory to a more prominent "this is a solution", if that's what people want
19:55:14 <clarkb> ianw: I think it's what people are doing, unsure if it's what they all want
19:55:20 <clarkb> but those doing it with software factory do seem happy with it
19:55:27 <kmalloc> o/
19:56:21 * kmalloc reads backscroll and confirms what folks said re: volunteering
19:57:45 <clarkb> Seems like we are winding down now. Thank you everyone. If any of this doesn't make sense or is crazy or needs further clarification please do reach out to me or start a mailing list thread and we can discuss it further
19:58:05 <clarkb> (I'm happy to start a thread if you aren't comfortable with it either)
19:58:32 <clarkb> with that I'll give you all a minute to dial into the board meeting if you choose or have breakfast or dinner etc :)
19:58:35 <clarkb> #endmeeting