#openstack-meeting log

19:01:12 <clarkb> #startmeeting infra
19:01:13 <openstack> Meeting started Tue Jan 29 19:01:12 2019 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:17 <openstack> The meeting name has been set to 'infra'
19:01:28 <clarkb> I'll give it a couple more minutes for people to trickle in before starting
19:01:35 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:04:15 <gary_perkins> o/ Hi!
19:04:34 <clarkb> hello :) so far just us. I may just run through the agenda to get it recorded and we can see where that takes us
19:04:40 <clarkb> #topic Announcements
19:04:56 <clarkb> I am going to be on a plane during next week's meeting (February 5th)
19:05:14 <clarkb> we'll need a volunteer to chair the meeting. fungi should be back too so may be willing to do that if others are not
19:05:59 <clarkb> Other than that fungi is afk this week and I get on my first leg of the trip early tomorrow morning so it may start to be quiet around here in the near future
19:06:06 <clarkb> then back to normal sometime later next week
19:06:25 <corvus> oops what'd i miss?
19:06:38 <cmurphy> hello
19:06:51 <clarkb> corvus: just me pointing out we'll need a volunteer meeting chair for next week
19:07:03 <anteaya> clarkb: enjoy safe travels
19:07:05 <clarkb> and that starting tomorrow it will likely get really quiet as fungi is on vacation and I'm headed to fosdem
19:07:13 <clarkb> returning to normal some time next week
19:07:14 <corvus> i volunteer fungi
19:07:21 <anteaya> I second
19:07:32 <corvus> mordred is also gone for the rest of this week
19:07:42 <corvus> with so many absences, i will probably hide under my desk
19:07:56 <corvus> maybe change my nick or something
19:07:57 <anteaya> nah, now's the time to party
19:08:16 <corvus> i'm pretty sure i saw the letters 'pto' near ianw's name recently
19:08:24 <clarkb> ya
19:08:50 <clarkb> #topic Actions from last meeting
19:08:55 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-01-15-19.01.txt minutes from last meeting
19:09:15 <clarkb> I don't think it was officially recorded as a new action last week but I think we are still waiting on docs update for dns management from corvus?
19:09:18 <corvus> #link dns docs update https://review.openstack.org/633569
19:09:24 <clarkb> oh woot
19:09:24 <corvus> wait concluded!
19:09:41 <clarkb> I have added that to my queue as I should be able to review that before I put clothes in a bag
19:10:43 <clarkb> considering the general lack of quorum I'll skip over specs approvals this week (I think the only outstanding one is the anomaly detection in logs and I haven't had a chance to look at it recently)
19:10:49 <clarkb> #topic Priority Efforts
19:11:01 <clarkb> #topic Storyboard
19:11:16 <clarkb> fungi got a new xenial storyboard-dev server up and running against a local database
19:11:25 <clarkb> (at least I believe it switched the db)
19:11:37 <clarkb> keep an eye out for changes to flip over production to a similar setup
19:12:18 <clarkb> SotK has also started to stub out the attachment storage code if you are interested in what that will look like
19:12:36 <clarkb> #topic Update Config Management
19:12:42 <cmurphy> o/
19:13:10 <clarkb> corvus: cmurphy: last week I didn't get to pay much attention to changes on this front and this week I'm trying to catch up on opendev things. Anything we should be looking at?
19:13:47 <cmurphy> clarkb: i have two patches still to pull the upgrade trigger https://review.openstack.org/629667 https://review.openstack.org/630391
19:14:24 <clarkb> cmurphy: ok maybe when I am in your timezone we can get through that first one
19:14:40 <cmurphy> clarkb: okay
19:14:46 <cmurphy> i'll be in brussels too
19:14:57 <corvus> main thing on the container side is the new jobs/system for building images.  that's described in a post to openstack-discuss, and should be all in place now in system-config.
19:14:57 <clarkb> corvus: care to review https://review.openstack.org/#/c/629667/ and cmurphy and I can try to get that rolling from brussels?
19:15:37 <clarkb> corvus: and that is building images for gerrit and zuul now?
19:15:51 <corvus> so basically we have a nice way to build either first-party or third-party images
19:16:03 <corvus> yes, gerrit being an example of a third-party image, zuul first-party
19:16:18 <corvus> i'm continuing to sketch out running a service from that image with this: https://review.openstack.org/630406
19:17:00 <clarkb> #link https://review.openstack.org/#/c/630406/ WIP change to run service from our built docker images
19:18:11 <corvus> i think that approach works (as the previous patchset illustrates), so i'm sure if anyone got around to doing that for a simpler service, they could land such a change sooner :)
19:18:23 <clarkb> nice
19:19:25 <clarkb> #info https://review.openstack.org/#/c/630406/ illustrates method that could be applied to simpler service to land docker based service changes sooner
19:19:42 <clarkb> #topic OpenDev
19:20:30 <clarkb> On the opendev front ttx has been working to get kata its own tenant in zuul. I've been helping with some of the foundation work to make that possible including base jobs repo content and a top level opendev zuul website
19:20:53 <clarkb> After the meeting I intend to go get a cert sorted out for zuul.opendev.org so that we can get that deployed
19:21:04 <clarkb> I expect that zuul itself will quickly follow into becoming its own tenant
19:21:32 <anteaya> zuul or kata?
19:21:42 <clarkb> anteaya: well both. Zuul shortly after kata I expect
19:21:49 <anteaya> okay thanks
19:22:45 <clarkb> Additionally I've been asked to sort out https://etherpad.opendev.org hosting. I think the way we want to approach this is to use the existing setup but redirect etherpad.openstack.org to etherpad.opendev.org. Then all existing pad urls will work but we'll start pushing people to the new thing. The one gotcha here is we have to set up ssl for both openstack.org and opendev.org for that redirect to work
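A minimal sketch of the redirect clarkb describes, assuming the only goal is to swap the hostname while leaving the pad path, query string, and fragment untouched so that existing pad URLs keep working. The function name `redirect_target` is hypothetical, and the real redirect would presumably live in the web server configuration rather than in Python; this only illustrates the URL mapping.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames taken from the discussion above.
OLD_HOST = "etherpad.openstack.org"
NEW_HOST = "etherpad.opendev.org"


def redirect_target(url):
    """Return the URL to redirect to, or None if no redirect applies.

    Hypothetical helper: only the hostname changes, so every existing
    pad path stays valid under the new name.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return None
    # Assumption: the redirect always lands on https at the new host.
    return urlunsplit(("https", NEW_HOST, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(redirect_target("https://etherpad.openstack.org/p/2019-denver-ptg-infra-planning"))
```

Because a browser completes the TLS handshake with the old name before it ever sees the redirect, the old hostname still needs a valid certificate, which is the "ssl for both" gotcha mentioned above.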
19:22:58 <clarkb> if anyone is interested in sorting the config management for that please let me know, I can help with that as I am able
19:23:28 <corvus> is that higher priority than opendev gerrit?
19:23:29 <clarkb> Based on the work above I figured after fosdem I would put together a list of TODOs for work like that so we can track it better and give people a sense of when things will be available under the "neutral" name
19:23:36 <clarkb> corvus: I don't think so
19:23:36 <corvus> it feels like two other services have jumped the queue.
19:23:40 <anteaya> can you expand on the 'I've been asked' part?
19:23:53 <anteaya> mysterious direction makes me nervous
19:24:22 <clarkb> corvus: I've begun to see the gerrit and zuul stuff as more of a unit as they are coupled together quite a bit.
19:24:36 <clarkb> anteaya: the rust vmm folks looking to use etherpads
19:24:58 <anteaya> okay thanks, knowing where requests come from helps me
19:25:07 <clarkb> corvus: I do think that gerrit and zuul are the higher priority as they represent the significant road blocks. But I do also think we can likely work these in parallel if people are wanting to poke at etherpad
19:25:43 <corvus> clarkb: i think zuul and gerrit are separable, which is why i wrote http://specs.openstack.org/openstack-infra/infra-specs/specs/opendev-gerrit.html about gerrit only.
19:26:00 <corvus> i think it's a huge amount of work and will require attention from the whole team at once
19:26:07 <clarkb> corvus: that's fair.
19:26:18 <corvus> i don't think we are at a place where we can suggest people go and pick off little things
19:27:08 <corvus> or, at least, i don't want to be the one person working alone on the hardest problem
19:27:28 <clarkb> corvus: I mention it because I know the gerrit stuff can be overwhelming to a lot of people, but there may still be interest in helping somewhere. If we'd rather hold the line on getting the big things out of the way as a single collective first I'm good with that too
19:28:18 <anteaya> do we have a place for identifying prioritization requests from certain groups?
19:28:38 <corvus> if someone brand new came out of the woodwork and showed up with an etherpad change, great.  i'm just a little worried about splitting the attention of the folks who have the context to do the big stuff.
19:28:45 <clarkb> anteaya: no, I think that would potentially be a secondary function of me trying to write down all the things that need to happen then we can prioritize from there
19:29:25 <anteaya> okay thank you
19:29:29 <clarkb> corvus: this is probably a good time to talk about productionizing gitea then. Are you happy with the current version of things running there?
19:29:47 <corvus> i wanted to ask the rest of the team if they were :)
19:29:48 <clarkb> I guess the docker image building piece was a prereq to ^ but we've got that sorted out now
19:30:00 <corvus> yeah, we could go into production with gitea now
19:30:15 <corvus> mordred says there is a new version of rook which doesn't require the flexvolume driver or something
19:30:29 <corvus> the upshot of which is that we might be able to reconsider running on magnum
19:30:55 <clarkb> my time with magnum for the nodepool k8s cluster has me leaning against wanting to do that
19:31:18 <corvus> so that's a big thing i'd like to get feedback on -- would we rather run our own k8s with the ansible modules as we are doing in the POC, or try again with magnum
19:31:20 <clarkb> it runs on top of deprecated distro release and doesn't have enough disk space to upgrade k8s on top of itself
19:31:49 <clarkb> it is slightly more work to deploy our own cluster, but that way we get up to date distro hosts as well as better control over disk usage
19:32:33 <corvus> okay, so if we lean toward those ansible modules -- it's like the first bit of ansible that we're running from an upstream source.... are we okay running it out of the github repo, or do we want to make our own fork?
19:33:05 <clarkb> corvus: maybe pin to a sha1 from the github source? similar to how we've pulled puppet modules from github
19:34:03 <clarkb> the other two questions I had were did we end up running an ha k8s cluster (if not do we want to test that first?) and how comfortable are we with ceph disk consumption in the current setup?
19:34:49 <corvus> okay.  so i think what we probably want to do is: re-build the poc with new rook, verify it all works, then update the ansible changes to pin a sha, then i think we can land them.  i can send an email with that to the infra list since this meeting is somewhat sparsely attended.
19:35:02 <corvus> clarkb: the k8s cluster is not ha, but i don't think it's necessary
19:35:44 <clarkb> corvus: that plan sounds good to me
19:36:03 <clarkb> corvus: then maybe aim for a switch ~february 11?
19:36:11 <corvus> (similar to openstack, if the control plane is absent, you just can't change things, but they keep running.  the individual services in the cluster will be ha)
19:36:26 <corvus> s/will be/are, in our current config
19:37:08 <corvus> and yeah, i think the ceph disk usage is tolerable.  i don't have the numbers handy, but we have everything replicated and had plenty of headroom, with the poc being only half our expected prod size
19:37:55 <corvus> usage:   78 GiB used, 240 GiB / 318 GiB avail
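A rough back-of-the-envelope check of the headroom claim, using the figures corvus pasted (78 GiB used of 318 GiB) and his statement that the POC is about half the expected production size. The linear scaling of usage with data size is an assumption made here for illustration, not something stated in the meeting.

```python
def used_fraction(used_gib, total_gib):
    """Fraction of the cluster's raw capacity currently in use."""
    return used_gib / total_gib


# Figures from the ceph status line above.
USED_GIB = 78
TOTAL_GIB = 318

poc = used_fraction(USED_GIB, TOTAL_GIB)
# Assumption: production holds twice the POC data and usage scales linearly.
projected_prod = used_fraction(2 * USED_GIB, TOTAL_GIB)

print(f"POC usage: {poc:.1%}")                        # POC usage: 24.5%
print(f"Projected prod usage: {projected_prod:.1%}")  # Projected prod usage: 49.1%
```

Under that assumption the same cluster would sit at roughly half capacity in production, which is consistent with the "tolerable" assessment.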
19:37:57 <clarkb> I guess the switch can be done mostly transparently, we update dns and bam
19:38:25 <clarkb> the gitweb links in gerrit config likely need to be updated, that is straightforward
19:38:58 <clarkb> corvus: that is without the refs/changes refs right?
19:39:14 <clarkb> which is likely to be the biggest user facing change in all of this.
19:39:22 <corvus> i think we should view gitea being in production as a not-very-public step.  i don't think we should switch anything else out yet, there's still quite a bit of planning to be done with redirects, etc.
19:39:53 <corvus> so basically, bringing the new system online in parallel with the old, then next we flesh out the changes to actually start switching things to it.
19:40:05 <clarkb> got it. That makes sense
19:40:17 <corvus> that will probably involve things like deciding on dates, making announcements, etc.
19:40:30 <corvus> and yes, that omits refs/changes
19:40:36 <clarkb> corvus: from our perspective I guess it's more about putting a line in the sand about not rebuilding from scratch for each thing, but rolling forward?
19:40:45 <corvus> yep
19:41:01 <clarkb> sgtm
19:41:05 <corvus> and i think we can go ahead and serve that from the real hostname, since that isn't otherwise in use
19:41:13 <clarkb> ++
19:42:05 <corvus> i think that's it for the gitea-related stuff i had
19:42:39 <clarkb> I think that will also make it a bit easier for other roots to poke at it since it will be a "stable" set of names and tooling we can refer to
19:43:06 <clarkb> corvus: were there any other gerrit related items worth talking about?
19:43:12 <corvus> yes and will make the process of creating the redirects easier if we have something to point to
19:43:16 <clarkb> (I think the bulk of the work so far has been in gitea)
19:43:47 <corvus> nope, that's it
19:43:54 <clarkb> #topic General topics
19:43:55 <corvus> still a bunch of stuff to do on http://specs.openstack.org/openstack-infra/infra-specs/specs/opendev-gerrit.html#work-items :)
19:44:29 <clarkb> corvus: I can ping qa team about git:// urls
19:44:37 <clarkb> may help to be in europe for that too :)
19:44:44 <clarkb> Moving on
19:44:48 <corvus> that would be great, thx
19:45:06 <clarkb> #link https://etherpad.openstack.org/2019-denver-ptg-infra-planning Infra/OpenDev PTG Planning document
19:45:26 <clarkb> I created that link and have just formalized it as part of the meeting agenda but it doesn't have content yet.
19:45:53 <clarkb> As we go through this opendev work if we find things that we want to be in person for or have to deprioritize for whatever reason maybe throw it on that etherpad as a potential work item
19:46:08 <clarkb> then as we get closer to the PTG we can reduce the list to what makes sense (if necessary)
19:46:18 <clarkb> Mostly wanted a place for ideas to go early, but no rush
19:46:37 <clarkb> Next is intermediate container image artifacts
19:46:39 <clarkb> #link http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-January/000718.html Docker registry or log server
19:47:03 <corvus> that was a message to zuul-discuss, but there's kind of a question for openstack-infra/opendev buried in there...
19:47:23 <corvus> i'd like to implement that plan (i've nearly completed the required change to zuul itself)
19:47:49 <corvus> and i'd like to use it in our system.  that either means writing some ansible roles to export/import docker images from the logserver
19:47:58 <corvus> or running a docker registry
19:48:38 <clarkb> fwiw I've long pushed against the docker registry because while it seemed simple in practice no one got around to figuring out how it would look from a distributed standpoint. One way around that is to centralize things (which our log server would be doing anyway)
19:49:10 <clarkb> One thing we'll need to keep in mind is disk usage, though I expect that to be small if we aren't responsible for the base image layers too
19:49:13 <corvus> it would be a central registry just used by jobs to stash intermediate artifacts, but it would be publicly accessible for reads.  the only tags for images on the server would be weird, like "pipeline_check_change_1234", so not very useful to the general public.
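A small sketch of what such intermediate tags could look like. The only tag given in the discussion is "pipeline_check_change_1234"; the general pattern, the function name `intermediate_tag`, and its parameters are assumptions extrapolated from that single example, not the scheme Zuul actually implements.

```python
def intermediate_tag(pipeline, change):
    """Build a tag for an intermediate image produced by a CI job.

    Hypothetical scheme inferred from the one example in the log:
    pipeline "check" and change 1234 give "pipeline_check_change_1234".
    """
    return f"pipeline_{pipeline}_change_{change}"


if __name__ == "__main__":
    print(intermediate_tag("check", 1234))  # pipeline_check_change_1234
```

Tags of this shape identify a specific pipeline run for a specific change, which is why they would be of little use to anyone looking for a released image.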
19:49:38 <corvus> i think with the registry option, we probably would end up with base layers on there
19:50:33 <clarkb> the docker image registry also officially wants to run out of a container so that may be a good simple first service to run that way
19:51:17 <clarkb> corvus: we could also proxy cache the central registry for reads to make use of the region local caches
19:52:07 <corvus> (with the logserver, i think we can edit them out, but not so much with the registry)
19:52:47 <clarkb> corvus: and we are not planning to use this as the registry of record right? that will remain dockerhub or quay etc?
19:52:55 <clarkb> (it's purely an implementation detail of CI jobs)
19:53:09 <corvus> correct
19:53:15 <corvus> final published images go to dockerhub
19:53:58 <clarkb> in that case having a registry fronted by the proxy caches seems like it could be nice because then the tooling mimics the "production" type image publishing
19:54:12 <clarkb> we won't have to shim in the export/import from fileserver bits
19:54:22 <corvus> yep, that's why i lean toward this way
19:54:38 <corvus> and yes, i think with the approach sketched out in the change above, actually running the registry will be simple now
19:55:01 <corvus> i think we can consider all the data ephemeral too, so we don't need a lot of planning around robustness
19:55:41 <corvus> the main goal here (and why i'm heading out on this tangent) is to get things to the point where we can start doing depends-on changes to image builds
19:55:45 <clarkb> I'd be happy to go down that path and if we find problems with it we have half implemented fallback in the form of the log server
19:56:14 <corvus> several of the things we've been doing lately we've had to fall back on the pattern of actually landing changes before proceeding with the next step
19:56:22 <corvus> everything is so slow that way :(
19:56:49 <clarkb> and just overcommunicate that it shouldn't be used as a source of production deployments
19:57:25 <corvus> i'm sure *someone* is going to do it, but if we name the registry "insecure-temporary-registry.opendev.org" at least they'll be the ones who look bad, and not us.
19:57:32 <clarkb> :)
19:57:50 <corvus> (also, i expect us to aggressively prune images from it)
19:57:58 <anteaya> ha ha ha
19:58:01 <clarkb> But ya keeping the tooling similar to real workloads is useful there so ++ to that plan
19:58:22 <corvus> and being able to 'docker run' something that was built in check could be a powerful dev/debugging tool.
19:58:46 <corvus> okay, i'll proceed with the run-a-registry plan
19:58:55 <clarkb> #topic Open Discussion
19:59:03 <clarkb> and now ~1 minute for anything else that slipped through
19:59:32 <anteaya> have a productive fosdem clarkb and cmurphy and anyone else attending
19:59:58 <corvus> pour one out for the rest of us :)
20:00:03 <corvus> or, better yet, just drink another one
20:00:05 <clarkb> #endmeeting