19:01:09 <clarkb> #startmeeting infra
19:01:10 <openstack> Meeting started Tue Sep 18 19:01:09 2018 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:14 <openstack> The meeting name has been set to 'infra'
19:01:18 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:32 <clarkb> #topic Announcements
19:01:34 <Shrews> ohai
19:01:51 <diablo_rojo> o/
19:02:06 <clarkb> Mostly just a reminder that last week was the PTG and this week it seems like a good chunk of the team is either on vacation or something resembling vacation so that they can deal with a hurricane
19:02:39 <clarkb> #topic Actions from last meeting
19:02:50 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-09-04-19.01.txt minutes from last meeting
19:03:17 <clarkb> There weren't any concrete actions listed. It was prep for PTG for many of us I think
19:03:38 <clarkb> Maybe I should do a high level recap of the PTG then we can dig in through the rest of our agenda
19:03:45 <clarkb> #topic High level PTG recap
19:04:02 <ianw> ++
19:04:08 <clarkb> The first two days of the PTG the infra team was in the "Ask Me Anything" helproom
19:04:48 <clarkb> This ended up being fairly productive. We ended up helping OSA with various zuul related job questions as well as pointing them in the direction of how to reproduce those builds locally (I think we may see them push changes to make that possible in the near future)
19:05:06 <clarkb> Swift came by a couple times as they were working on multinode functional testing and doc build updates
19:05:41 <notmyname> clarkb: and we're just waiting on zuul to land it!
thanks for your help (https://review.openstack.org/#/c/601686/)
19:05:43 <tdasilva> thanks for the help :)
19:06:01 <clarkb> Tripleo/RDO are interested in how we might do OVB (OpenStack Virtual Baremetal) testing. Which requires working PXE boot which requires layer 2 broadcast for dhcp in the current design
19:07:00 <clarkb> one suggestion was to test the pxe provisioning independent of the workload those nodes run so that OVB can boot lightweight instances using our existing network overlays and qemu then test the workload on nodepool provisioned nodes so that nested virt isn't a problem
19:07:19 <clarkb> Sahara and others were interested in third party CI as well as zuul v3
19:07:33 <clarkb> ianw: ^ I tried pushing people to your spec to see if we could get a volunteer to write out what that looks like in practice
19:07:46 <clarkb> ianw: the other popular idea was to just use software factory
19:07:59 <clarkb> which apparently works very well out of the box at small scale (but needs tuning as things grow)
19:08:27 <clarkb> I also helped StarlingX sketch out their docs publishing which intersected with AFS (you may have seen changes from dtroyer and myself for that which I think are not yet merged)
19:08:27 <ianw> i've actually updated that spec, so was interested if there was talk on it
19:08:53 <clarkb> ianw: the sense I got was that for most users they would be fine just using software factory
19:09:00 <clarkb> which is maybe a reasonable thing to point people at
19:09:20 <clarkb> Jan from (I forget which) South African networking company in particular was a fan of that
19:09:29 <clarkb> (I'm really bad with names sorry)
19:09:39 <Shrews> yeah, i think SF is pretty solid and close to what we run
19:09:57 <Shrews> a few extras
19:10:05 <clarkb> Another recurrent topic in the helproom was working around nested virt
19:10:57 <clarkb> I think where we have gotten with that is if we can continue to push on our cloud providers to have better communications channels we can
explore the possibility of nested virt enabled flavors. Then jobs are best effort if they run on that. Basically you'd have to talk to the cloud not infra if they break in weird ways. I think that should happen once we've largely moved to bionic though
19:11:04 <clarkb> new kernels seem to lead to new nested virt trouble
19:11:23 <clarkb> That was Monday and Tuesday
19:11:49 <clarkb> Wednesday and Thursday were infra dedicated time. The first day was largely spent getting Zuul to trigger CD updates of Zuul via bridge.o.o
19:12:21 <clarkb> Turns out this is more complicated than we originally anticipated. Current status is we have a job that will add bridge.o.o to its inventory then fail to ssh to bridge.o.o because we only allow ssh in as root from bridge and puppetmaster
19:12:33 <clarkb> (we can talk about this more in the priority topic section of the meeting)
19:13:03 * fungi is sorta here for a few minutes, but not completely and may disappear again shortly
19:13:05 <clarkb> To get to that point we addressed quite a few of the smaller issues we've run into with ansible. Including upgrading ansible to 2.7.0rcX on bridge.o.o which seemed to speed up execution of ansible
19:13:17 <clarkb> Overall it was quite useful in getting ansible things running more reliably
19:13:58 <clarkb> Thursday was the last day of Infra dedicated time and we used it to talk about the other items on our etherpad. I and others tried to take notes about the conversation there.
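[Editor's sketch] The job pattern clarkb describes (a job dynamically adds bridge.o.o to its running inventory, then tries to ssh in) can be sketched roughly as the Ansible playbook below. This is purely illustrative: the `zuul` user, the `bridge` group name, and the playbook path are assumptions, not the actual system-config layout.

```yaml
# Illustrative only: mimics the described job, where bridge.o.o is
# added to the in-run inventory and a later play connects to it.
- hosts: localhost
  tasks:
    - name: Add bridge to this run's inventory
      add_host:
        name: bridge.openstack.org
        groups: bridge
        ansible_user: zuul   # assumed dedicated user; currently only root from bridge/puppetmaster can ssh in

- hosts: bridge
  tasks:
    - name: Trigger the CD playbook on bridge (path is hypothetical)
      command: ansible-playbook /opt/system-config/playbooks/site.yaml
```

With the current ssh restriction the second play is where the run fails, which is why the options on the table are either a dedicated zuul user on bridge or allowing the executors to ssh in as root.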
/me picks out some highlights
19:14:21 <clarkb> The openstack transition from xenial to bionic will ideally be managed by the openstack project with infra and zuul teams assisting
19:14:40 <clarkb> fungi: will be sending out mailing list merge plan details as soon as the hurricane allows
19:14:47 <fungi> yup
19:15:01 <fungi> the list creation change has been up for review since friday, btw
19:15:11 <clarkb> The lounge has come up again as a thing we should try to run to make irc more friendly to new users
19:15:23 <clarkb> ttx volunteered to make that a 20% time project of his
19:15:49 <fungi> #link https://review.openstack.org/602781 Create the OpenStack discussion mailing list
19:15:55 <clarkb> mostly need to figure out authentication for our users there
19:16:39 <clarkb> And finally there was a conversation in which kmalloc volunteered to start building a proof of concept for authentication/identity aggregation so that we can point gerrit and wiki and storyboard etc at a single identity provider that proxies for say launchpad, openstackid, github, or whatever else people want
19:16:39 <fungi> ideally 602781 will be merged and the resulting list locked down before i announce its existence
19:16:55 <clarkb> fungi: would it be best for us to +2 then you can approve and lock down when ready?
19:17:01 <fungi> sure, wfm
19:17:18 <clarkb> I think that is it for a high level recap
19:17:33 <clarkb> Before we talk about the config management update priority effort, anything I missed people want more info on?
19:17:41 <clarkb> or topics that I skimmed too much on in general?
19:18:28 <Shrews> any experimenting/discussing the dockerhub zuul/nodepool containers?
19:18:57 <clarkb> Shrews: we did discuss some of the pain points that SpamapS ran into with the pbrx generated images in the helproom
19:19:29 <clarkb> Shrews: I think we had general consensus that we should improve zuul's default behavior around daemonizing to be more container friendly as well as improving the documentation around what each image is/does
19:19:46 <clarkb> but otherwise SpamapS said the images were working for him, they just had to have some minor tweaks done
19:19:56 <Shrews> cool
19:20:17 <clarkb> zuul-base vs zuul image is confusing. Also you don't want to daemonize by default in docker containers
19:20:52 <clarkb> #topic Update Config Management
19:21:07 <clarkb> Don't forget to review 'topic:puppet-4 and topic:update-cfg-mgmt' changes in gerrit
19:21:46 <clarkb> This is what most of the time was dedicated to at the PTG. cmurphy has got quite a few services puppeting under puppet 4 parser (and improved the inventory listing to make it less merge conflicty)
19:22:26 <clarkb> Then mordred corvus jhesketh myself et al worked on setting up Zuul to trigger ansible playbook runs on bridge.o.o so that we can run CD from zuul
19:23:07 <clarkb> There are two big things we need to address to make ^ possible. The first is we only allow ssh as root into the bridge from the bridge and puppetmaster.
We need a zuul user on bridge or we need to allow the zuul executors to ssh into bridge.o.o as root
19:23:22 <clarkb> I'm actually leaning towards adding a zuul user myself since we already manage arbitrary users that can ssh in and sudo
19:23:42 <clarkb> The other item is our support for nil nodeset jobs is pretty basic in our base jobs
19:24:06 <clarkb> we don't get all the same logging for example nor do we clean out the master ssh key from the ssh agent like we do on jobs with a nodeset
19:25:00 <clarkb> Another item to be aware of is mordred is writing a new inventory plugin for us that will more quickly filter and add nodes to groups than the current constructed plugin can do (the constructed plugin accounts for like 2 minutes of every ansible-playbook run right now, it isn't fast)
19:25:19 <clarkb> mordred: ^ you aren't around today to talk about that are you
19:25:37 <clarkb> there were some unexpected runtime behaviors around removing nodes from groups that I don't think have been solved yet
19:26:15 <Shrews> 2 minutes? eek
19:26:34 <clarkb> Shrews: ya it's creating and evaluating a jinja template for all our groups X all our hosts
19:26:40 <clarkb> something like 10k jinja template evaluations
19:26:53 <fungi> matrix expansion at its finest
19:27:12 <Shrews> cartesian inventory ftw
19:27:42 <clarkb> to summarize this topic we are in a much more reliable state for running ansible frequently in a loop than we were last week.
But still work to be done to get zuul talking to bridge.o.o to trigger ansible runs there
19:28:01 <clarkb> Also the puppet-4 futureparser work continues and if you can help review those changes I'm more than happy to approve them when I can monitor them :)
19:28:19 <clarkb> #topic General Topics
19:28:33 <clarkb> Let's keep moving because we have half an hour left and plenty of other items related to PTG to talk about :)
19:28:42 <clarkb> OpenDev
19:29:28 <clarkb> it had been about 3 weeks without seeing any strong opposition to the OpenDev name (the only thing I saw was concern about confusion with the conference and fungi made a strong argument for why that shouldn't be a problem and that seemed to settle it on the etherpad)
19:29:56 <clarkb> I went ahead and decided it was important for us to keep making progress on this and did the no news is good news thing and sent email to the list saying we should go with it
19:30:28 <clarkb> one thing corvus brought up at the PTG is that we should enlist the help of the OpenStack Foundation to draft clear communications about what this means for existing and potential new projects
19:30:52 <clarkb> one concern in particular being that if we aren't careful we could send the message that this is just a rename of the openstack infra team and they exist only to serve openstack the project
19:31:12 <clarkb> I've got it on my todo to talk to the foundation about drafting this
19:31:51 <clarkb> From a technical perspective we will spin up ns1.opendev.org ns2.opendev.org and adns1.opendev.org. Host opendev.org DNS there and then migrate zuul-ci.org DNS there then delete ns*.openstack.org
19:32:01 <clarkb> that is sort of a step 0 to starting to host things with that name branding.
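[Editor's sketch] At the zone level, the "step 0" delegation described above might look roughly like the fragment below. The nameserver names come from the discussion (ns1/ns2.opendev.org serving the zone, adns1 as a hidden primary they transfer from); the addresses are illustrative placeholders from the TEST-NET-3 documentation range, not real deployment values.

```
; Illustrative delegation for opendev.org (not the actual records).
; ns1/ns2.opendev.org serve the zone publicly; adns1.opendev.org is
; assumed to be the hidden primary they pull zone transfers from.
opendev.org.        IN NS   ns1.opendev.org.
opendev.org.        IN NS   ns2.opendev.org.
; Glue records are required because the nameservers live inside
; the zone they are authoritative for.
ns1.opendev.org.    IN A    203.0.113.10
ns2.opendev.org.    IN A    203.0.113.11
```

Once the delegation is in place at the registrar, `dig NS opendev.org` should return the new self-hosted nameservers, after which zuul-ci.org can be migrated the same way and ns*.openstack.org retired.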
19:32:07 <clarkb> corvus: has volunteered to spin those up
19:32:32 <fungi> just want to chime in with my support for self-hosting dns as the day-zero prerequisite
19:33:06 <clarkb> The last related item on my list is that we should consider booting new servers eg etherpad01.openstack.org replacement as etherpad01.opendev.org in nova so that we don't have to replace everything once we do trusty upgrades
19:33:31 <clarkb> that will likely be a case by case basis (and in that particular example it would continue to run etherpad.openstack.org since dns won't necessarily be up yet and comms won't have been sent out yet)
19:33:39 <clarkb> just a nova api bookkeeping item.
19:34:16 <clarkb> Questions, concerns, thoughts on OpenDev before we move on to the next item?
19:34:31 <mnaser> (yay for opendev)
19:35:23 <Shrews> are we going to use resources donated for openstack for opendev?
19:35:29 <Shrews> any concerns around that?
19:35:32 * fungi is looking forward to typing two fewer keys for our urls ;)
19:36:04 <clarkb> Shrews: the clouds we've spoken to so far (mostly vexxhost and potential arm resources) seem to think that being a little more generic is easier to sell
19:36:10 <clarkb> Shrews: but we haven't talked to all of them yet.
19:36:26 <clarkb> Shrews: what I like to point out is that tripleo + nova + neutron are ~80% of our resource utilization
19:36:31 <clarkb> and I don't expect that will change
19:36:44 <fungi> some of the sentiment was that resources are being donated to primarily serve the communities operated under the osf, and that won't really change
19:36:58 <Shrews> makes sense
19:37:16 <clarkb> Shrews: and we've always hosted related projects, it's just that many are unwilling to do so with "openstack" featured so prominently with branding and naming
19:37:21 <clarkb> it's definitely something to be aware of though
19:37:48 <Shrews> k
19:38:27 <clarkb> Speaking of ARM we have a new linaro cloud region
19:38:32 <clarkb> ianw: ^ thank you for setting that up
19:39:04 <ianw> i think fungi has leads on more arm resources?
19:39:20 <clarkb> in parallel to that Gary Perkins is spinning up another arm cloud https://review.openstack.org/#/c/602436/
19:39:22 <ianw> there's certainly a review out for adding credentials
19:39:38 <clarkb> ya I think fungi got the secret portions of the credentials ^ fungi any idea if we are good to move forward on that at this point?
19:39:39 <fungi> yeah, if someone has time to pick up the creds are in a file in my homedir on bridge.o.o at the moment
19:40:08 <clarkb> ah, we need to add them to hiera but after that should be good to go?
19:40:13 <ianw> fungi: ahh, cool, i can take a look.
i know we got confirmation it works with the same images nb03.o.o is producing, which is good
19:40:15 <fungi> ~fungi/temp_clouds.yaml
19:40:30 <fungi> i was trying to get it working well enough with osc to reset the password
19:40:35 <fungi> but didn't get quite that far yet
19:40:47 <fungi> was running into some sort of error i think with my syntax there
19:41:40 <clarkb> ianw: ^ let me know if I can help too
19:41:41 <fungi> anyway, feel free to take that over, i was mostly acting as an in-person secrets receptacle
19:42:36 <clarkb> prometheanfire has been working to get gentoo images running in nodepool
19:42:54 <clarkb> I think those are close but may need some small tweaks still. Mostly a heads up that this is happening and probably still slightly broken
19:43:04 <ianw> i guess we should promote that to voting jobs in dib
19:43:24 <clarkb> most of the issues so far have been around unbound config, gentoo is different than other distros
19:43:34 <clarkb> I think we are past that particular set of problems now though
19:43:47 <ianw> my one concern is that nobody but prometheanfire has really ever maintained those in dib
19:44:04 <clarkb> ya, I'm not really a gentooian myself having never run it
19:44:12 <clarkb> but we do have other gentoo people in the larger community (like ttx)
19:44:20 <Shrews> what's the need for gentoo?
19:44:32 <Shrews> just more os coverage?
19:44:46 <clarkb> Shrews: OSA is interested in supporting it, and the consequence of that will be that gentoo doesn't need its own tooling for openstack support
19:45:02 <Shrews> gotcha
19:45:05 <fungi> certainly a good option for some kinds of bleeding-edge versions of dependencies and platform testing
19:46:13 <Shrews> or for folks who really like waiting on compiles
19:46:58 <persia> There was some talk of precompiling most things for the test nodes
19:47:01 <clarkb> Heads up that OVH is doing openstack upgrades in BHS1 tonight. This has a couple of consequences for us.
First is that we'll disable nodepool in BHS1 this afternoon (I'll approve that change later today). The other is my IRC bouncer runs on ovh openstack in BHS1 so uh I may not be around in the morning while I recover :)
19:47:12 <fungi> if i ever get around to it i'll try to get debian unstable images building too (which is remarkably stable, contrary to its name; i've run it for nearly two decades)
19:47:13 <clarkb> On October 10th they will do the same in GRA1
19:47:57 <clarkb> And the last thing on my list of agenda items is there is an openstack board meeting in 13 minutes
19:48:03 <clarkb> #link http://lists.openstack.org/pipermail/foundation/2018-September/002620.html OpenStack Board Meeting after this meeting
19:48:16 <mordred> clarkb: ohai
19:48:22 <clarkb> This meeting will cover the plans for focus areas and new projects like zuul and starlingx and airship and kata
19:48:23 <mordred> clarkb: sorry my brain still isn't quite working
19:48:25 <fungi> a wild board member appears!
19:48:46 <clarkb> I know that particular topic is of interest to many around here so feel free to join
19:48:49 <Shrews> mordred: that implies that it once *did* work
19:49:04 <clarkb> I'll dial in then work on this conference talk slide deck probably :)
19:49:31 <clarkb> also lunch
19:49:36 <fungi> i have to miss the board call this time, but have confidence that others will fill me in
19:49:39 <clarkb> #topic Open Discussion
19:49:47 <ianw> fungi: my laptop sleep state would like to disagree with you on debian unstable stability ;)
19:49:56 <clarkb> Anything else?
feel free to dig into more PTG related stuff too
19:50:00 <clarkb> ianw: ha
19:50:07 <fungi> ianw: fair, let's try to avoid putting our servers to sleep ;)
19:50:16 <clarkb> fungi: that is how live migration works fwiw
19:50:56 <fungi> something tells me we don't care if a live migration of one of our test nodes fails
19:52:46 <clarkb> mordred: when your brain does start working you may want to sync up on the current zuul cd to bridge issues
19:52:59 <clarkb> mordred: they are just tricky enough that I think we should consider them a bit before choosing a path forward
19:54:08 <ianw> if people want to look at the 3rd party ci spec, i think it's about done
19:54:10 <ianw> #link https://review.openstack.org/#/c/563849/
19:54:17 <mordred> clarkb: ++
19:54:47 <ianw> however, i'm also willing to promote softwarefactory to a more prominent "this is a solution", if that's what people want
19:55:14 <clarkb> ianw: I think it's what people are doing, unsure if it's what they all want
19:55:20 <clarkb> but those doing it with software factory do seem happy with it
19:55:27 <kmalloc> o/
19:56:21 * kmalloc reads backscroll and confirms what folks said re: volunteering
19:57:45 <clarkb> Seems like we are winding down now. Thank you everyone. If any of this doesn't make sense or is crazy or needs further clarification please do reach out to me or start a mailing list thread and we can discuss it further
19:58:05 <clarkb> (I'm happy to start a thread if you aren't comfortable with it either)
19:58:32 <clarkb> with that I'll give you all a minute to dial into the board meeting if you choose or have breakfast or dinner etc :)
19:58:35 <clarkb> #endmeeting