19:01:22 <clarkb> #startmeeting infra
19:01:23 <openstack> Meeting started Tue Jun 11 19:01:22 2019 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:26 <openstack> The meeting name has been set to 'infra'
19:01:45 <clarkb> #link http://lists.openstack.org/pipermail/openstack-infra/2019-June/006399.html
19:02:01 <clarkb> That is the agenda I sent out yesterday. We'll have a couple of extra items compared to that which showed up late
19:02:11 <clarkb> #topic Announcements
19:02:34 <clarkb> I'm going to be out next week so we will need a volunteer to run the meeting next week
19:02:52 <clarkb> Also we started enforcing SPF failures on our mailman listserver
19:03:18 <clarkb> I think fungi has decided that qq.com does have valid servers in their policy so we should only be dropping the invalid sources now
19:03:28 <fungi> yup
19:03:39 <fungi> so far it's really knocked the spam volume waaaaay down
19:04:45 <clarkb> yay, here is hoping we can stop blackholing the list admin addrs
19:05:11 <clarkb> #topic Actions from last meeting
19:05:20 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-06-04-19.01.txt minutes from last meeting
19:05:32 <clarkb> corvus was going to do something with the storyboard slow log data
19:06:01 <clarkb> corvus: ^ did that end up happening?
19:06:02 <corvus> i ended up sticking the files in afs for the storyboard team to work with
19:06:24 <clarkb> awesome
19:06:29 <corvus> i ran a simple report and mordred gave suggestions on using a tool to do more investigation if warranted
19:06:50 <clarkb> It sounded like SotK knew where to look in storyboard given what the slow log was telling us
19:06:53 <corvus> i guess we should shut off the slow query log now though
19:06:58 <corvus> before it fills the disk
19:07:26 <fungi> or at least truncate the existing one and gather new data now that a few changes landed over the weekend to simplify a few queries
19:07:40 <corvus> oh cool, given that, yeah
19:08:11 <clarkb> sounds like a plan, thanks
19:08:53 <clarkb> #topic Priority Efforts
19:09:07 <clarkb> #topic Update Config Management
19:09:35 <clarkb> ianw has some ansiblification work we'll talk about a bit later. Other than that any progress on other portions of this effort?
19:11:13 <clarkb> sounds like no? Given that lets move on to the next topic
19:11:26 <clarkb> #topic OpenDev
19:12:22 <clarkb> This is where I've spent a good chunk of my time recently in a roundabout way. I've wanted to rebuild gitea06, which we wanted to build on our control plane images built by nodepool. This led me to digging into why nodepool wasn't reliably building images and then why it wasn't reliably uploading images
19:12:56 <fungi> and now it is!
19:13:00 <clarkb> I ended up clearing out /opt/dib_tmp on the nodepool builders to free up disk space so that images could build again. In that process I rebooted the builders to clear out stale mounts as well, and that restarted nodepool which updated openstacksdk
19:13:28 <clarkb> new openstack sdk couldn't upload to rax which was fixed by downgrading sdk and then mordred figured it out and fixed it properly in sdk so next release should be good
19:13:43 <clarkb> oh also discovered we had leaked a bunch of images in the process which I've worked to clean out of the clouds
19:13:56 <clarkb> and now ya I think nodepool is a lot happier and we should be ready to boot a gitea06 :)
19:14:42 <fungi> after which time we get to find out what's involved in transplanting a gitea database?
19:14:50 <clarkb> fungi: yup
19:15:00 * fungi hopes it's just mysqldump and source
19:15:04 <clarkb> fungi: also possibly the git content as well so that we don't have to rely on slow replication for that
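A transplant along the lines fungi is hoping for might look like the sketch below. The database name, paths, and hostname are assumptions, not anything confirmed in the meeting:

```shell
# On the old gitea server: dump the database (name "gitea" assumed)
mysqldump --single-transaction --routines gitea > gitea.sql

# Copy the dump plus the bare git repositories to the replacement,
# so the new server does not depend on slow gerrit replication
rsync -a gitea.sql /var/lib/gitea/ gitea06.opendev.org:/var/lib/

# On the new server: restore into a fresh database
mysql gitea < gitea.sql
```

This is not runnable as-is; it is only meant to show the "mysqldump and source" shape of the migration.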
19:15:07 <mordred> clarkb: the patches landed, so I submitted a release request
19:15:27 <clarkb> mordred: cool we should probably plan to restart nodepool services once the new release is installed on the nodepool nodes
19:15:44 <mordred> ++
19:16:03 <mordred> https://review.opendev.org/#/c/664585/ for anyone who is interested
19:16:16 <fungi> hopefully there aren't more regressions hiding in the recent sdk release we haven't found yet
19:16:57 <clarkb> nodepool was updated to exclude the broken releases as well
19:17:10 <corvus> as i understand it, this one was hard to find since it's particularly challenging to deploy a "rackspace" in testing.
19:17:29 <clarkb> ya
19:17:47 <clarkb> and it ended up being a really obtuse thing too
19:17:53 <mordred> yah
19:17:57 <clarkb> (type was wrong in a json document and server only responded with a 400)
19:19:04 <clarkb> Any other opendev related updates?
19:21:09 <clarkb> #topic Storyboard
19:21:24 <clarkb> We talked about db slow log data collection and recent improvements to that already
19:21:32 <clarkb> Are there any other updates to bring up re storyboard?
19:21:55 <fungi> new feature merged over the weekend to allow auto-assigning of groups to security stories
19:22:28 <fungi> er, not assigning
19:22:38 <fungi> so projects can have groups mapped to them, and if those groups are flagged as security type then they get automatically subscribed to any story marked "security"
19:23:05 <fungi> there is also a distinction now between security stories and just normal private stories
19:23:14 <clarkb> That means people will be able to see them but won't get email notifications yet right?
19:23:27 <SotK> s/subscribed/added to the ACL for/ for clarity
19:23:35 <fungi> er, that's right
19:23:50 <fungi> if the security checkbox is checked on creation, the story automatically starts out private and can't be made public until after creation
19:24:03 <fungi> which is a nice extra bit of safety
19:24:58 <clarkb> and we deploy from master too right? So these db fixes and new features should show up as they merge?
19:25:10 <fungi> yep. we have yet to exercise those changed features on the storyboard.openstack.org deployment but they should be available now
19:25:33 <clarkb> excellent
19:26:08 <clarkb> Anything else?
19:26:43 <fungi> some of the changes related to that series cleaned up and refactored the queries used for private story matching
19:26:59 <fungi> so hopefully at least some performance improvement there
19:27:34 <fungi> there was also at least one unrelated change which merged that should have improved one of the db queries, i think, but now i forget which it was
19:28:26 <clarkb> Ok lets move on
19:28:29 <fungi> oh, and i approved the project-config change today where karbor is moving to sb
19:28:29 <clarkb> #topic General Topics
19:28:31 <fungi> that's all i had
19:28:58 <clarkb> fungi: mordred: any updates or things we can help with wiki and status upgrades?
19:29:05 <clarkb> #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:30:23 <fungi> looks like puppet may not be updating on wiki-dev, i need to dig into why
19:31:02 <fungi> no mention of puppet or ansible in its syslog
19:31:36 <fungi> oh, right
19:31:46 <fungi> i keep looking at the original server not the replacement
19:32:14 <mordred> clarkb: nope - I'm just failing at working hours at the moment - nothing about it will be difficult best I can tell right now
19:32:49 <fungi> i think i'm now into trying to reconcile some missing upstream git repos for various mw extensions
19:32:58 <mordred> (something something selling a house is more work than moving out of an apartment something something)
19:33:28 <clarkb> mordred: I recommend you don't skip out on closing day (the people we bought our house from tried to)
19:33:52 <clarkb> alright next up is the https opendev mirror situation
19:33:53 <mordred> hah. yeah - nope, I'm ready to sign those papers and get that cash money
19:34:03 <clarkb> #link https://review.opendev.org/#/c/663854/ and parents to deploy more opendev mirrors
19:34:06 <mordred> (they walk in to the room with the entire purchase price in a box of $20s right?)
19:34:34 <corvus> it can take a while if you have to go to a bunch of atms
19:34:42 <clarkb> we are still observing the afs failures (cache related?) against the dfw opendev mirror
19:34:57 <clarkb> I think ianw intended on deploying a small number of extras to see if we can reproduce the behavior on them
19:35:02 <clarkb> the change above and its parents should get us there
19:35:13 <ianw> yeah, that's annoying; the hosts for that are up and ready
19:35:24 <clarkb> This is mostly just a call for reviews and notification that the afs stuff is still happening
19:35:40 <clarkb> ianw: oh cool, are they in dns and everything?
19:36:08 <fungi> i approved all of the stack except the one that puts them into production
19:36:09 <ianw> clarkb: umm, have to check the reviews but it's all out there
19:36:23 <clarkb> k
19:36:37 <clarkb> Next up is github replication
19:36:39 <ianw> i guess i'll have to get onto building openafs from upstream and we can roll that out to a host and see if it's more stable ...
19:36:50 <clarkb> ianw: ++
19:36:59 <clarkb> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005007.html
19:37:03 <corvus> mostly i wanted to find out if our attempt to get out of the business of dealing with github was working?  it seems that many folks are interested in replicating to github, and they seem to be asking for our help in doing it.  are people happy with this status quo?
19:37:49 <clarkb> I think the biggest problem we've run into so far is people that want a repo org transfer but don't want us to have temporary admin on the remote side
19:38:06 <mordred> I found a workaround for that which works
19:38:14 <clarkb> When we do get our user added as admin the process seems to go pretty smoothly
19:38:14 <mordred> that maybe we should update our docs to do?
19:38:28 <fungi> well, and that people who want repo org transfers need our help to do it (unless we add volunteers to the gh admins group who want to take that over)
19:38:42 <clarkb> fungi: which I think we intend on doing with openstack/ ?
19:38:48 <fungi> yeah
19:38:55 <mordred> which is - a) transfer repo to openstackadmin b) transfer repo to $otherhuman c) stop caring
19:39:02 <mordred> $otherhuman can then transfer to their org
19:39:09 <fungi> clarkb: i think that's the only one which matters in our case anyway
19:39:21 <clarkb> mordred: and all of the redirects still work ya?
19:39:25 <mordred> yes
19:39:44 <clarkb> cool. Given that maybe we should enlist some openstack volunteers, and write that process down then get out of the business entirely :)
19:39:50 <mordred> ++
19:39:50 <fungi> likely we need to ask the openstack tc to drum up some volunteers to take over management of the openstack org on gh now
19:40:08 <corvus> in absolute terms, our interaction with github has increased dramatically, but it sounds like folks think this is a fairly time-limited thing still, and we should ramp down to zero soonish (like, this year)?
19:40:25 <corvus> or what clarkb just said :)
19:40:26 <clarkb> corvus: ya and for example with the repo renames we just did we didn't have to touch github at all which was great
19:40:33 <fungi> i would like to ramp down to 0 much sooner than the end of this year
19:40:39 <fungi> this month would be nice
19:41:15 <corvus> okay, that mostly involves working with the tc to set that up. yeah?
19:41:16 <mordred> and the opendevzuul app has moved to opendevorg - so there's no conflict there between that and a tc-delegated team managing the openstack gh org
19:41:16 <clarkb> Considering jroll's preexisting involvement maybe we should hand over volunteer enlistment to jroll?
19:41:21 <fungi> also we ought to figure out a timeline for getting gh out of our gerrit replication config
19:41:36 <mordred> we should probably make an opendevadmin account and make the opendevorg org owned by opendevadmin
19:41:42 <mordred> so that there is no remaining overlap
19:41:44 <clarkb> corvus: ya I think jroll is our contact on the tc to do that
19:41:49 <corvus> mordred: ++
19:42:00 <corvus> okay, how about i take on the 'work with jroll' task
19:42:06 <clarkb> corvus: wfm
19:42:24 <clarkb> #action corvus work with jroll to get openstack github org self managed
19:42:42 <corvus> i'll suggest that he enlist some volunteers to take over operation of the openstackadmin account, continuing the usage policy ianw & co established
19:42:54 <fungi> sounds great to me
19:42:54 <mordred> I can create an opendevadmin user - but I am not set up to do the 2fa setup any time in the next 2 weeks - so it might be an easier task for someone else - or else we can wait until I'm less snowed under and I'll volunteer to do the whole thing
19:43:26 <clarkb> whoever sets up that account remember to make it an owner of the org(s)
19:43:58 <clarkb> mordred: my guess is 2 weeks is probably a fine timeline for that
19:43:59 <corvus> k.  i'll make sure to mention we need to retain access until the opendevadmin thing is set up :)
19:44:23 <mordred> k. I'll take the task then
19:44:37 <clarkb> #action mordred set up github opendevadmin account for the opendevorg org and zuul app
19:44:51 <mordred> that said - the people taking over don't have shared shell access to a vm with a shared secrets file that they can use to managed shared access to an admin account
19:45:17 <clarkb> mordred: could encrypt it with all of their gpg keys or something like that
19:45:29 <corvus> yeah, the discussion may involve more work than just a simple handoff
19:45:37 <mordred> yeah - just saying - the openstackadmin design as it is now works for the infra team ... but yeah ^^ that
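clarkb's suggestion of encrypting the shared secret to each volunteer's GPG key could look like the sketch below. The filename and recipient addresses are hypothetical:

```shell
# Encrypt the shared-account credentials to every volunteer's public
# key at once; any single recipient can decrypt with their own
# private key, so no shared shell access or shared secrets file is needed
gpg --encrypt \
    --recipient volunteer1@example.org \
    --recipient volunteer2@example.org \
    --output github-admin-creds.txt.gpg \
    github-admin-creds.txt
```

Rotating the credential would mean re-encrypting to the current set of volunteer keys, which keeps the handoff self-service.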
19:45:47 <clarkb> This is our 15 minute time check. I think we need to keep moving to get through the agenda
19:45:53 <corvus> i'm good
19:45:54 <mordred> and for the record - I'm fine with however the new owners want to do things
19:45:59 <clarkb> Next up we have fungi's topic for requesting spamhaus pbl exceptions when we boot new servers that send email
19:46:23 <fungi> pretty simple reminder
19:46:51 <fungi> rackspace has blanketed basically all of their ipv4 address space with spamhaus policy blocklist entries
19:47:04 <corvus> we could put a print() in launch.py
19:47:07 <clarkb> Maybe we should add a note for doing that to launch node, ya
19:47:22 <clarkb> corvus: I think that reminder would help me remember :)
19:47:27 <fungi> so when we boot a machine there which will be sending e-mail to a variety of people, need to query the pbl for its v4 address
19:47:31 <fungi> https://www.spamhaus.org/query/ip/104.239.149.165
19:47:46 <clarkb> I can write that change really quick after lunch if others think it will be useful
19:47:54 <ianw> ++
19:47:58 <fungi> and then there is a dance you have to perform involving getting a one-time code sent to a responsible e-mail address
19:48:07 <fungi> and feed that back into the site
19:48:29 <fungi> i usually use infra-root@o.o for that
19:48:43 <clarkb> k I can add that info to the print output
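The print() corvus suggested could be as small as a hypothetical helper in launch.py; the function name and message wording here are assumptions:

```python
def pbl_reminder(ipv4_addr):
    """Return a post-launch reminder to request a Spamhaus PBL
    exception for any new server that will send e-mail."""
    return (
        "If this server sends e-mail, check its Spamhaus PBL status:\n"
        "  https://www.spamhaus.org/query/ip/%s\n"
        "Request an exception if listed; use infra-root@openstack.org "
        "to receive the one-time confirmation code." % ipv4_addr
    )

# Printed at the end of a launch so the operator sees it every time
print(pbl_reminder("104.239.149.165"))
```

Emitting this unconditionally is cheap, and matches clarkb's plan to bake the infra-root@o.o detail into the print output.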
19:49:17 <fungi> probably any address would work, but if it winds up getting readded for some reason i don't know if switching e-mail addresses may cause them to decide to not grant a new exception
19:49:46 <clarkb> well clouds do reuse IPs so we may run into that just in our normal operation of replacing servers
19:49:54 <clarkb> but we can figure that out if we get into that situation
19:50:12 <fungi> (spam reporting back to spamhaus for messages from an address in the exception list can cause the exception to get removed)
19:50:43 <clarkb> Next up I wanted to make a quick reminder that I'm going to replace certs that expire soon this week. I'll probably start the renewal process on thursday
19:50:57 <clarkb> and I'll be getting 2 year certs for those
19:51:01 <fungi> i ran into it in the wake of the ask.o.o server replacement, folks were unable to create new accounts because their e-mail providers were discarding/rejecting the account confirmation e-mails from it
19:51:15 <clarkb> which will hopefully give us plenty of time to switch over to LE over time
19:51:22 <clarkb> fungi: ah
19:51:56 <clarkb> And last item we have is ianw's backup server and client config with ansible
19:51:58 <clarkb> #link https://review.opendev.org/#/c/662657/
19:52:22 <ianw> i just wanted to float a few things past the peanut gallery
19:52:33 <clarkb> Looks like ianw plans to boot a new backup server running bionic and has questions: 1) would we like to keep not running config mgmt there once deployed and 2) do we want 2 servers
19:52:51 <ianw> i plan to bring up a new server once the reviews are in on that ... rax ord still preferred home?
19:53:17 <clarkb> we might consider vexxhost if mnaser is comfortable with that to give us geographic and provider redundancy
19:53:53 <clarkb> but I think we do want at least geographic redundancy so ord seems like a good spot
19:54:12 <ianw> yeah, the ansible is intended to automate key and user management, so adding a server should just be an entry in the inventory
19:54:30 <ianw> i don't think we want to deal with stuff at rest in two places ... but we could
19:54:34 <corvus> the main reason not to run config management regularly is extra protection in case of an error or compromise.  granted, i come from a *very* conservative sysadmin background when it comes to such things.  :)
19:55:24 <fungi> i second that preference though
19:56:10 <clarkb> being conservative with our backups seems reasonable
19:56:14 <fungi> also keeping backups in a different provider from where the bulk of our control plane currently resides seems like a prudent choice in case that provider has a sudden and unexpected change of heart
19:56:43 <corvus> agreed.  if we have to pick one, i'd pick vexxhost.  then rax-ord as a second site.
19:56:46 <ianw> ok, we can keep the disabled policy
19:56:47 <fungi> hoping that would never happen, but you have to have contingency plans
19:57:07 <corvus> our original backup hosts were rax-ord and hpcloud
19:57:16 <ianw> ok, i can reach out to mnaser and figure out where we might do it
19:57:17 <corvus> so, erm, that plan has already served its purpose once.
19:57:32 <fungi> except the hpcloud backup server kept getting deleted
19:57:41 <fungi> i mean before the last time it got deleted of course
19:57:48 <corvus> yeah.  none of this is theoretical.  :(
19:58:02 <clarkb> We are just about out of time
19:58:07 <clarkb> #topic Open Discussion
19:58:21 <clarkb> Anything else really quickly that we want to call out before we have lunch/breakfast/dinner
19:58:47 <corvus> clarkb: wow, someone's hungry!  :)
19:58:52 <fungi> #link https://review.opendev.org/664675 Replace the old Stein cycle signing key with Train
19:58:54 <clarkb> Very
19:59:15 <fungi> would be nice to approve that today so i don't have to rewrite the date in the change i'm going to push to release docs
19:59:15 <corvus> fungi: at some point we should figure out what that process means in the opendev world
19:59:21 <fungi> i concur
19:59:29 <fungi> but this one was slightly past due
19:59:33 <corvus> yeah, now is not the time for that :)
19:59:52 <fungi> sounds good for next week's agenda
19:59:55 <fungi> i'll add it
20:00:05 <clarkb> and we are at time
20:00:08 <clarkb> Thank you everyone!
20:00:11 <clarkb> #endmeeting