19:01:10 <clarkb> #startmeeting infra
19:01:11 <openstack> Meeting started Tue Oct 23 19:01:10 2018 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:15 <openstack> The meeting name has been set to 'infra'
19:01:20 <ianw> o.
19:01:21 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:23 <ianw> o/
19:01:30 <clarkb> #topic Announcements
19:01:50 <clarkb> Just another friendly reminder that the summit and forum fast approach and you may wish to double check the schedule
19:01:54 <clarkb> #link https://www.openstack.org/summit/berlin-2018/summit-schedule/#track=262 Berlin Forum schedule up for feedback
19:02:00 <fungi> so fast
19:02:20 <clarkb> I think it is fairly solid at this point so mostly just a reminder to go look at the schedule if you haven't already. Unsure if changes can be made easily now
19:02:38 <clarkb> fungi: There is rumor that food awaits me after the meeting. Good motivation :)
19:03:01 <fungi> oh, i was simply commenting on the fast approach of the summit and forum
19:03:06 <clarkb> oh that :)
19:03:07 <fungi> but yes, fast food too
19:03:13 <clarkb> indeed it is like 3 weeks away
19:03:49 <clarkb> I'll be visiting the dentist tomorrow midday, but don't expect any drilling or drugs to happen so should be around. Otherwise I have not seen any announcements
19:04:03 <clarkb> #topic Actions from last meeting
19:04:11 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-10-16-19.01.txt minutes from last meeting
19:04:32 <clarkb> No explicit actions called out but we did some work on things we talked about (which we'll get to later in the agenda)
19:05:30 <clarkb> #topic Specs approval
19:05:57 <clarkb> Any specs we should be aware of? I think we are mostly heads down on implementing work that is already captured by specs or general maintenance
19:06:09 <clarkb> I guess the storyboard spec is still outstanding. And I keep meaning to read it
19:06:23 <fungi> you and a bunch of us
19:06:27 <fungi> well, at least me anyway
19:06:52 <clarkb> #link https://review.openstack.org/#/c/607377/ Storyboard attachments spec. Not ready, but input would be helpful
19:07:13 <clarkb> frickler has review comments. diablo_rojo not sure if you've seen those
19:08:10 <clarkb> #topic Priority Efforts
19:08:16 <clarkb> #topic Storyboard
19:08:23 <clarkb> That makes a good transition to the storyboard topic
19:08:51 <diablo_rojo> clarkb, I saw them, was waiting for some more before I did an update
19:09:04 <diablo_rojo> just poked SotK for comments
19:09:06 <clarkb> diablo_rojo: ok, I'll try to take a look at it today after the nodepool builder work
19:09:14 <diablo_rojo> clarkb, that would be suuuper helpful
19:09:49 <diablo_rojo> fungi, too if you have some free time in between the house stuff.
19:09:52 <clarkb> mordred: not sure if you are here, but I think you started looking at using pbrx with storyboard to work on server upgrades?
19:09:54 <fungi> yep!
19:10:15 * clarkb will try to be a stand-in for mordred
19:10:40 <clarkb> mordred volunteered to work on the storyboard trusty server upgrades and seems to have taken the approach of using storyboard(-dev) as an early container deployment system
19:10:59 <diablo_rojo> Seems like a good approach to me
19:11:01 <clarkb> The upside to this is as a python + js application it is actually fairly similar to zuul which means pbrx should work for it
19:11:05 <fungi> it's a fairly self-contained service so might be a good test case
19:11:30 <clarkb> I want to say the early work has been in updating bindep.txt type files to have pbrx produce working container images for storyboard
19:11:52 <clarkb> if you are interested in this aspect of the config mgmt updates this is probably a good place to follow along and/or participate
19:11:57 <fungi> the api server is straightforward and the webclient is just a wad of files that need to be splatted somewhere a webserver can find
19:12:03 <clarkb> ianw: ^ you may have thoughts in particular since you've looked at similar with graphite
19:12:40 <fungi> though there's also the need to have rabbitmq running, and also the worker process
19:12:48 <fungi> so i guess not completely straightforward
19:13:06 <clarkb> fungi: it should be a good real world application though as it isn't trivial
19:13:17 <clarkb> while still tying into the tooling we've already built
19:13:19 <ianw> clarkb: yeah, i need to get back to the actual "get docker on control plane server" bit, some prelim reviews out there that didn't quite work
19:14:13 <clarkb> diablo_rojo: fungi any other storyboard topics to bring up before we move on to the config mgmt updates?
19:14:28 <fungi> nah
19:14:28 <diablo_rojo> I don't think so.
19:14:34 <fungi> thanks!
19:14:41 <diablo_rojo> Thank you :)
19:14:43 <clarkb> #topic Config Mgmt Update
19:15:14 <clarkb> As mentioned in the context of storyboard the work of updating our config management processes continues to happen
19:16:05 <fungi> i guess there are a couple base classes of work going on there? 1. containerizing things, and 2. automatic deployment/replacement of containers?
19:16:07 <clarkb> #link https://review.openstack.org/#/c/604925/ add zuul user to bridge.o.o for CD activities could use a second review
19:16:39 <corvus> also https://review.openstack.org/609556
19:16:45 <clarkb> fungi: ya part of the spec is the idea that we'll build images regularly to pick up updates and to avoid being stuck on insecure software unexpectedly
19:16:50 <corvus> i guess i should have used the topic on that
19:16:59 <clarkb> fungi: to make that useful we also need to update the running deployments with the new images
19:17:19 <clarkb> #link https://review.openstack.org/#/c/609556/ Install ansible 2.7.0 release on bridge.openstack.org
19:18:03 <clarkb> mostly I think the work here needs a few reviews. The topic:puppet-4 stuff is largely blocked on reviews as well
19:18:24 <clarkb> if we can find time to take a look at topic:puppet-4 and topic:update-cfg-mgmt we should be able to make some pretty big progress over the short term
19:18:42 <dmsimard> FWIW I sent patches to see what it would look like to enable ara on bridge.o.o, would love feedback https://review.openstack.org/#/q/topic:ara-on-bridge
19:19:11 <clarkb> #link https://review.openstack.org/#/q/topic:ara-on-bridge changes to run ara on the bridge server to visualize ansible runs there
19:19:37 <corvus> should we change the topic to update-cfg-mgmt?
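The bindep.txt updates mentioned in the storyboard discussion above are what pbrx consumes when building images. A hedged sketch of what such entries look like, where the package names are illustrative rather than storyboard's actual dependencies, and the use of the "compile" profile to mark build-only packages is an assumption about how the image builds are split:

    # bindep.txt: one distro package per line, with optional profiles.
    # Packages tagged "compile" are only needed while building wheels
    # and can be left out of the final runtime image.
    gcc [compile]
    libmysqlclient-dev [platform:dpkg compile]
    mysql-client [platform:dpkg]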
19:19:49 <corvus> i think ara is in scope for that
19:19:52 <clarkb> corvus: dmsimard ++ I think this is an aid for that spec and could use the topic
19:19:58 <dmsimard> sure
19:20:35 <dmsimard> updated
19:21:44 <clarkb> Other than "please review as you can" any other items for this topic?
19:22:06 <corvus> re ara --
19:22:19 <corvus> do we need a mysql db for that or are we going to stick with sqlite for now?
19:22:35 <dmsimard> sqlite works well in the context of executors, each job is running on its own database
19:22:41 <dmsimard> the problem with sqlite is write concurrency
19:22:41 <fungi> sounded like there was a concurrent write issue with sqlite?
19:22:46 <fungi> yeah, that
19:23:03 <corvus> so we should have an infra-root set up a mysql db before we land those ara changes?
19:23:13 * mordred waves
19:23:28 <fungi> do we want trove or just orchestrate a mysql service onto the bridge server?
19:23:29 <clarkb> or add that to the changes and run one locally?
19:23:45 <dmsimard> corvus: the legwork is done in hostvars on bridge.o.o, pabelanger helped
19:24:12 <clarkb> dmsimard: meaning there is already a mysql db running somewhere?
19:24:24 <dmsimard> yes, a trove instance
19:25:40 <corvus> dmsimard: did he provide the connection info so we can update that change to use it?
19:25:41 <dmsimard> We can definitely start with sqlite, I'm just not sure how things might behave with concurrency
19:25:55 <corvus> let's not start with sqlite, let's use the trove which apparently exists :)
19:26:05 <clarkb> and it would be helpful if we remember to #status log changes like that that happen outside of config mgmt
19:26:16 <dmsimard> clarkb: my default
19:26:19 <dmsimard> fault*
19:26:44 <clarkb> corvus: ++
19:26:52 <dmsimard> corvus: the connection information is secret though, so it needs to live on bridge.o.o ?
19:27:20 <dmsimard> I guess we could add a node to the job, install mysql on it and use that for the testinfra tests
19:27:23 <corvus> dmsimard: only the password i think; paul should have added that to private hiera and given you the key he put it under so the change can reference that key
19:27:46 <corvus> i think sqlite is ok for testinfra
19:27:55 <fungi> if it's a trove instance, we usually keep the hostname secret too
19:28:05 <corvus> ok 2 things then :)
19:28:14 <fungi> since there is no connection security there
19:28:16 <dmsimard> corvus: I'm the one who committed the changes, they're in hostvars for bridge.o.o as well as ara.o.o
19:28:30 <ianw> is there some potential for keys/passwords to make it into this? do we care?
19:28:43 <fungi> any machine for any tenant in that same region can open sockets to the trove mysql service for it, after all
19:28:56 <dmsimard> pabelanger helped me through the process of doing that, he didn't do it for me, sorry for the confusion
19:28:59 <clarkb> ianw: that was going to be my next question. Do we need to use no_log type attributes on tasks to avoid data exposure?
19:29:00 <corvus> dmsimard: oh i thought you said paul did it. ok then :)
19:29:21 <clarkb> maybe to start we should run it locally only and we have to ssh port forward into bridge to see the web server?
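A sketch of the localhost-only access clarkb suggests here, with the caveat that the port is an assumption (ara 0.x's standalone web server conventionally ran on 9191, but the deployment on bridge could pick anything):

    # From a workstation, tunnel to the web UI bound to loopback on bridge:
    ssh -L 9191:localhost:9191 bridge.openstack.org
    # ...then browse to http://localhost:9191/ locally.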
19:29:27 <clarkb> to get a feel for what data is exposed
19:29:33 <corvus> clarkb: wfm
19:29:34 <dmsimard> clarkb: yes
19:29:58 <dmsimard> ianw: and yes, data can be exposed
19:30:02 <fungi> sounds fine
19:30:12 <fungi> (not publishing the ara reports i mean)
19:30:25 <clarkb> ok let's start with a localhost-only webserver to see what we are exposing, remediate it and learn from there, then maybe make it public
19:30:47 <corvus> and go ahead and have bridge report to mysql
19:30:47 <clarkb> (goal should be making it open in some capacity though)
19:30:52 <clarkb> corvus: yup
19:31:07 <ianw> oh, i wasn't really thinking publishing the website, i was more back at "data at rest not on bridge.o.o" tbh
19:31:17 <fungi> yeah, eventually it would be nice to get back what we lost with puppetboard being abandoned
19:31:20 <dmsimard> clarkb: the original spec was to replace puppetboard
19:31:24 <corvus> dmsimard: thanks for this, it's going to help a lot :)
19:32:03 <clarkb> ianw: ah, the actual db contents themselves
19:32:20 <fungi> that's also a great point
19:32:38 <dmsimard> are the trove instances on the public internet?
19:32:43 <clarkb> dmsimard: no
19:32:47 <fungi> possible we're compromising the security of bridge.o.o by putting the database somewhere not locally on that server
19:32:52 <clarkb> they are on the rax private network
19:32:56 <fungi> dmsimard: they might as well be though
19:33:10 <clarkb> which is "private"
19:33:12 <fungi> they're on the shared "private" network which all rackspace tenants can access
19:33:49 <dmsimard> if this is a concern, putting mysql on bridge.o.o is another option which would also happen to reduce latency and improve performance (i.e., no roundtrip to put stuff in a remote database far away)
19:33:49 <fungi> so you need access to a rackspace account/vm and to know (or guess) the database address and credentials
19:34:02 <fungi> or a pre-authentication zero-day vulnerability in mysql
19:34:14 <dmsimard> but then we might as well set up the web application there as well (instead of a dedicated ara.o.o server)
19:34:18 <ianw> yeah, also possibly the communications aren't encrypted?
19:34:19 <corvus> bridge is not sized correctly to even run ansible; there's no way we can add mysql there without a rebuild.
19:35:15 <fungi> ianw: no possible about it. the database connections aren't encrypted, so someone with the ability to subvert that network could also sniff them maybe? i can't remember whether they're sent in the clear or as a challenge/response
19:35:47 <clarkb> another option would be to run a mysql off host ourselves
19:35:53 <clarkb> then add encryption and such
19:35:57 <fungi> but also possible to inject into an established socket with some effort i guess
19:36:03 <clarkb> then we don't need to rebuild bridge.o.o for this
19:36:34 <dmsimard> Need to drop momentarily to pick up kids
19:36:38 <corvus> meh, we need to rebuild bridge anyway, so if we want to stick with on-host, i wouldn't count that as a strike against it
19:36:39 <clarkb> also possible trove has grown the feature to allow us to use tls/ssl for connections
19:36:44 <clarkb> corvus: fair enough
19:36:52 <fungi> maybe we use trove initially with the understanding that when (not if) we resize bridge (because it's already not appropriately sized for ansible anyway) we add enough capacity to also put mysql on it?
19:36:58 <fungi> or also what corvus said
19:37:03 <clarkb> why don't we pick this back up in #openstack-infra when dmsimard can rejoin? and we can continue with the meeting agenda?
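For reference, switching ara (the 0.x series in use at the time) from its default local sqlite file to MySQL as discussed above is a single connection-string setting; the hostname and credentials below are placeholders for the values kept in private hostvars, and the mysql+pymysql driver string assumes the PyMySQL package is installed:

    # ansible.cfg on bridge.o.o
    [ara]
    database = mysql+pymysql://ara:SECRET@trove-instance.example.net/ara

    # or the equivalent environment variable:
    export ARA_DATABASE="mysql+pymysql://ara:SECRET@trove-instance.example.net/ara"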
19:37:06 <ianw> it may also be that we consider it low risk enough that what we are leaking in the first place doesn't matter
19:37:13 <ianw> or what fungi said :) ++
19:37:34 <clarkb> I do think being careful around this is worthwhile while we work to understand what we are exposing
19:37:36 <corvus> yeah, we're not *supposed* to be putting private stuff in this db, the question is what happens if we accidentally do
19:37:54 <fungi> we're all just repeating one another at this point anyway, so yeah we can pick it back up in #-infra later ;)
19:38:13 <corvus> maybe we should pick it back up in #-infra later
19:38:19 <fungi> heh
19:38:21 <clarkb> #topic General topics
19:38:33 <clarkb> I'll take that as a cue
19:39:12 <clarkb> OpenDev is a thing and we now have some concrete early task items planned out. Step 0 is getting dns servers running for opendev.org (thank you corvus for getting this going)
19:39:35 <clarkb> before we can start using opendev.org everywhere we need to communicate what we mean by opendev and what the goals we have are
19:40:06 <clarkb> #link https://etherpad.openstack.org/p/infra-to-opendev-messaging is the document we started working with yesterday between the infra team and the foundation staff to try and capture some of that
19:40:43 <clarkb> I will be working with the foundation staff to write up the short message with a Q&A and use that to send a community-wide email type thing. There is also a plan to talk about this at the summit, at least at the board meeting
19:41:10 <clarkb> My hunch is once we get through the summit we'll be able to start concretely using opendev and maybe etherpad becomes etherpad.opendev.org the week after summit
19:41:35 <clarkb> Still a lot of details to sort out, I expect we'll be reviewing documents in the near future to make sure they make sense from our perspective
19:41:58 <clarkb> and for anyone wondering the current working one-liner is: "Community hosted tools and infrastructure for developing and maintaining open source software."
19:42:17 <clarkb> questions, concerns about ^?
19:43:38 <fungi> i like the specificity of "free/libre open source software" but as a first go it's not bad
19:44:09 <fungi> "community hosted and operated" might also be a good expansion
19:44:16 <clarkb> Feel free to reach out to me directly as well if this venue doesn't work for you
19:44:24 <clarkb> I'm reachable via email and irc
19:44:32 <fungi> something to make it more obvious we're running these things as a community, not just hosting them for communities
19:44:34 <clarkb> fungi: you should add those thoughts to the etherpad :)
19:44:36 <fungi> yup
19:44:46 <corvus> ++
19:45:38 <clarkb> Next item is the trusty upgrade sprint. Last week ended up being a weird week for this as various individuals had less availability than initially anticipated
19:46:08 <clarkb> That said I did manage to upgrade logstash.o.o, etherpad-dev.o.o, and etherpad.o.o and am working with Shrews this week to delete nodepool.o.o and use our newer zookeeper cluster instead
19:46:17 <clarkb> Thank you to all of you that helped me with random reviews to make that possible
19:46:45 <clarkb> I'd like to keep pushing on this as a persistent thing because trusty EOL is about 6 months away
19:46:59 <clarkb> if we are able to do a handful of servers a week we should be done relatively quickly
19:47:26 <clarkb> #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup list of trusty servers that need to be upgraded. Please put your name next to those you can help with
19:48:17 <clarkb> That brings up the last thing I wanted to talk about which is the zookeeper cluster move
19:48:42 <clarkb> I got the zookeeper cluster up and running as a cluster yesterday. Shrews and I will be moving the nodepool builders to the cluster after this meeting
19:48:58 <clarkb> Those will then run for a day and a half or so to populate images on the new db
19:49:27 <clarkb> Then on thursday I'd like us to update the zookeeper config on the nodepool launchers and zuul scheduler to use the new cluster
19:49:47 <clarkb> We will likely implement this as a full cluster shutdown (may as well get everything running latest code, and maybe zuul can make a release based on that)
19:50:24 <clarkb> hrm apparently thursday is the stein-1 milestone day though :/
19:50:26 <fungi> sounds great, thanks for working on that
19:50:37 <clarkb> we might have to wait for the milestone stuff to happen first?
19:50:43 <clarkb> I'll talk to the release team after this meeting
19:50:53 <fungi> milestone 1 is usually a non-event, especially now that openstack is no longer tagging milestones
19:51:03 <clarkb> oh in that case ya should be fine. But I will double check
19:51:29 <fungi> though they may force releases for cycle-with-intermediary projects which haven't released prior to the milestone week
19:51:30 <clarkb> Once all that is done we'll want to go back through and clean up any images and nodepool nodes that are not automatically cleaned up
19:52:10 <clarkb> I'm volunteering myself to drive and do a lot of that because I do think this is an important switch and early cycle is the time to do it
19:52:20 <clarkb> you are now all warned and if you want to follow along or help let me know :)
19:52:25 <clarkb> #topic Open Discussion
19:52:34 <clarkb> And now ~7 minutes for anything else
19:52:45 <Shrews> we need to get them cleaned up since the image numbers will reset (and thus avoid name collisions), but we should have plenty of leeway on doing that
19:53:51 <clarkb> that's a good point, but ya ~1 year of nodepool with zk should give us a big runway for that
19:54:11 <Shrews> smallest sequence number i saw was 300 some, but still plenty
19:54:24 <corvus> we will definitely. almost certainly. probably. get it all cleaned up in a year.
19:54:44 <fungi> i like your optimism!
19:56:44 <clarkb> if anyone is wondering zk has a weird way of determining which ip address to listen on
19:57:03 <clarkb> you configure a list of all the cluster members and for the entry that belongs to the current host it binds to the address in that entry
19:57:23 <clarkb> so if you use hostnames that resolve to localhost because /etc/hosts then you only listen on localhost and can't talk to your cluster
19:58:15 <clarkb> looks/sounds like we are basically done here. Thank you everyone. Find us on the openstack-infra mailing list or in #openstack-infra for further discussion
19:58:23 <clarkb> I'll go ahead and end the meeting now
19:58:25 <clarkb> #endmeeting
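To illustrate the zookeeper binding quirk clarkb describes near the end of the meeting: every member lists the full quorum in zoo.cfg, and each server binds its quorum/election listener to whatever address its own server.N entry resolves to locally. A minimal sketch with hypothetical hostnames:

    # zoo.cfg, identical on all three members (hostnames are made up)
    server.1=zk01.example.org:2888:3888
    server.2=zk02.example.org:2888:3888
    server.3=zk03.example.org:2888:3888
    # The myid file under dataDir holds this host's number (1, 2, or 3).
    # If /etc/hosts maps zk01.example.org to 127.0.0.1 on zk01, then the
    # quorum and election ports 2888/3888 bind to loopback only and the
    # other members can never connect. zookeeper's escape hatch for this
    # is quorumListenOnAllIPs=true.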