19:01:03 <clarkb> #startmeeting infra
19:01:04 <openstack> Meeting started Tue May 12 19:01:03 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:07 <openstack> The meeting name has been set to 'infra'
19:01:12 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-May/000023.html Our Agenda
19:01:18 <zbr> o/
19:01:19 <clarkb> #topic Announcements
19:01:57 <clarkb> This one didn't make it onto the agenda because I had simply forgotten it. But OpenStack is looking to do its big Ussuri release tomorrow morning UTC time
19:02:04 <clarkb> it is a good time to be slushy right now
19:02:08 <ianw> o/
19:02:37 <fungi> now i want a lime slushy
19:02:50 <clarkb> Based on IRC discussions earlier they plan to start the process around 10:00 UTC tomorrow
19:03:06 <clarkb> #topic Actions from last meeting
19:03:10 <fungi> i'll try to be around for the great button-pushing
19:03:13 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-05-05-19.01.txt minutes from last meeting
19:03:26 <clarkb> I took an action to prep a PTG etherpad of ideas
19:03:28 <clarkb> #link https://etherpad.opendev.org/p/opendev-virtual-ptg-june-2020 PTG Ideas
19:03:37 <clarkb> I did that and sent email to the service-discuss list about it
19:04:18 <clarkb> #topic Priority Efforts
19:04:28 <clarkb> #topic Update Config Management
19:04:45 <clarkb> I think the gerritbot work is still a todo.
19:05:37 <clarkb> I may have time to help with that as things settle after the openstack release
19:06:00 <clarkb> On the Zuul for CD front I think we continue to learn new things
19:06:31 <clarkb> First up is that some of our system-config-run jobs don't do a great job of verifying that the installation ansible/docker/etc perform is functional
19:07:22 <clarkb> in particular the system-config-run-zuul job wasn't checking that zuul was actually running. corvus has been looking into that and has found that one of our major issues there is uid conflicts between container users and host users
19:07:46 <clarkb> corvus: is your plan around that still to create a "containers" user rather than differently named users with the same uid across all the services?
19:08:08 <clarkb> calling it out because it's a tricky problem and eyeballs on the situation are a good thing
19:10:10 <ianw> on nodepool don't we ensure a nodepool user with uid 10001 or something?
19:10:18 <clarkb> I'm not sure I can fully describe it myself, but basically zuul, nodepool, zookeeper and so on set container users. We then try to map the same uid to the user that owns config files and stuff outside the container that we bind mount into the container
19:10:30 <clarkb> ianw: yup, but that same uid is used for many other services
19:10:37 <mordred> yeah - I think that's the plan
19:10:51 <clarkb> so the issue becomes we can't have zuul, nodepool, and zookeeper users all with the same uid on the host side to own all those bind mountable things
19:11:15 <mordred> make a user on each of the hosts that is not called zuul but called something else, like "containers" or something, and use that user for zuul and nodepool
19:11:18 <clarkb> there is a separate issue where the zuul user on test nodes is already precreated so that zuul can ssh in, and that uid doesn't match the uid the zuul containers use
19:11:23 <mordred> exactly
19:11:34 <mordred> so making a general service user should fix all of these
19:11:34 <clarkb> and having a separate "containers" user addresses both issues
19:11:37 <mordred> ++
19:12:56 <corvus> sorry late
19:12:59 <clarkb> we also split services by host currently anyway so we don't immediately create a larger attack surface by doing that
19:13:15 <ianw> i guess each container only has mapped in it what it should be seeing, but it does mean there's no host-level separation of configs, secrets etc
19:13:16 <clarkb> but if there are other concerns now is probably a good time to bring them up otherwise I expect we'll start trying to roll that out soon
19:13:19 <corvus> #link run zuul, nodepool, zk as 'container' user: https://review.opendev.org/726958
19:13:26 <corvus> that's not ready yet
19:13:34 <corvus> but yes, that is the current plan
19:13:36 <clarkb> ianw: correct, but we split the hosts up anyway so we should still have that separation
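(To make the plan concrete, at runtime it amounts to something like the sketch below, here expressed with the docker Python SDK rather than the ansible/docker-compose tooling system-config actually uses; the uid, image names, and paths are illustrative assumptions.)

```python
# A sketch of the shared "containers" user plan: every service container
# runs as one host-side uid that owns all the bind-mounted config dirs.
# Image names, paths, and the 10001 uid are illustrative assumptions.
import docker

CONTAINERS_UID = 10001  # the single host "containers" user

client = docker.from_env()
for name, image, conf_dir in [
    ("zuul-scheduler", "zuul/zuul-scheduler", "/etc/zuul"),
    ("nodepool-launcher", "zuul/nodepool-launcher", "/etc/nodepool"),
]:
    client.containers.run(
        image,
        name=name,
        user=f"{CONTAINERS_UID}:{CONTAINERS_UID}",  # overrides any user baked into the image
        volumes={conf_dir: {"bind": conf_dir, "mode": "ro"}},
        detach=True,
    )
```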
19:15:26 <ianw> isn't there like a remapping option?  so all the containers would standardise on 10001 say, and then when you start them you map that to a specific user on the host?
19:15:42 <clarkb> ianw: I think only if you remap root onto those uids
19:15:58 <clarkb> at least that seemed to be what the conclusion was the other day
19:16:05 <corvus> i think it remaps a range
19:16:20 <corvus> so if our uid in the image were, say, something less than 10001, that might be a more attractive option
19:16:29 * diablo_rojo sneaks in late
19:16:44 <clarkb> corvus: does the range have to start at 0 ?
19:16:49 <corvus> though, i suppose ballooning to 30003 reserved uids is still only worrisome if we colocate...
19:16:52 <corvus> clarkb: i think so
19:16:55 <clarkb> got it
19:17:15 <corvus> i'm fuzzy on this, but my understanding is that 0..N inside the container maps to X..X+N on the host
19:17:27 <ianw> yeah, i'm only reading https://docs.docker.com/engine/security/userns-remap/ for the first time
19:17:31 <fungi> yes, i read up on it and that's what i got too
19:18:15 <fungi> you basically carve up uid/gid ranges on the host context and then the base is added to the uids and gids in the container when mapping to the host context
19:18:57 <fungi> so it's less id remapping, and more id partitioning
19:19:14 <corvus> since our zuul user on the host is 1000, we can't map 10001 in the container to 1000 on the host.  but we could map zuul in the container to $something on the host at, say, uid 20001, and nodepool in the container to $somethingelse on the host at uid 30001....
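(For reference, the userns-remap arithmetic corvus describes is just a fixed offset; a sketch follows, where the 100000 base is a common Docker default from /etc/subuid, not a value anyone set here.)

```python
# Sketch of docker userns-remap: uid N inside the container shows up as
# BASE + N on the host, where BASE is the start of the subordinate uid
# range in /etc/subuid. 100000 is a common default, used illustratively.
SUBUID_BASE = 100000

def host_uid(container_uid: int) -> int:
    return SUBUID_BASE + container_uid

print(host_uid(0))      # container root      -> 100000 on the host
print(host_uid(10001))  # the image's zuul id -> 110001, never host uid 1000
```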
19:19:42 <clarkb> personally I like the simplicity of the container user plan
19:20:02 <clarkb> I think we are somewhat resistant to the downsides of that plan due to existing design, and not needing an extra layer of abstraction is nice
19:20:15 <corvus> my inclination is to not worry too much about that, just do 'container: 10001' on the host for all 3, and if we hit a colocation issue before we get around to doing everything in k8s, deal with it then
19:20:53 <fungi> is it a concern if the zuul-web and zuul-scheduler containers on the same host use the same uid/gid for their processes?
19:21:10 <corvus> no worse than now
19:21:20 <clarkb> ya they already do that
19:21:42 <fungi> right, i'm aware that's what's happening now, just making sure that's acknowledged as something we're cool with
19:22:08 <clarkb> I think so
19:22:13 <fungi> i believe it matches how we ran pre-containers anyway
19:22:31 <clarkb> yes
19:22:46 <corvus> sorry, i forgot time works like that.  i meant pre-containers when i said 'now'
19:23:00 <fungi> this might become more of a concern for servers like eavesdrop, where we run several bots under different users currently?
19:23:00 <clarkb> alright we probably don't need to solve this problem during the meeting. But I did want to call it out so people can raise concerns or propose other ideas
19:23:06 <corvus> apparently that's more correctly called "the before now time"
19:23:13 <fungi> the long ago
19:23:39 <clarkb> I expect we'll have plenty of time at the end of the meeting too if we want to swing back around to this
19:23:53 <fungi> sure
19:23:55 <clarkb> but lets keep going through the agenda to be sure we cover other topics too (and we don't have much so really should have time later)
19:24:08 <clarkb> The other Zuul CD thing I wanted to call out is that the system-config zuul job reorg landed
19:24:32 <clarkb> this means you may need to rebase any changes to those jobs you had outstanding. Sorry about that, but I think it does help make system-config's jobs a lot easier to read and reason about now
19:24:52 <clarkb> we tried to land it when there were fewer outstanding changes that would conflict, but some may have gotten caught by it
19:25:52 <clarkb> #topic OpenDev
19:26:24 <clarkb> Things have been so busy with all the CD type changes lately that I haven't gotten around to reaching out about advisory council membership
19:26:40 <clarkb> I'm hoping once the openstack release is done then I'll be able to do some of that
19:26:52 <clarkb> That was all I had on the opendev front. Anything else to add?
19:27:39 <clarkb> fungi: the auth/identity service spec is still not up right?
19:27:45 <clarkb> I want to make sure we don't miss that once it is up
19:28:12 <fungi> nope, i've been distracted by other stuff and haven't had a chance to distill the pile of prose i pulled from those various e-mail threads
19:28:28 <fungi> it's a bit of a jumble right now
19:28:44 <fungi> and a lot of it is probably outdated
19:28:56 <clarkb> thanks, just want to make sure I hadn't missed it
19:29:00 <fungi> i could push it up as is, but it's not likely very reviewable yet
19:29:20 <fungi> just a patchwork of random comments right now
19:30:04 <clarkb> #topic General Topics
19:30:19 <clarkb> The virtual ptg is fast approaching
19:30:31 <clarkb> #link https://virtualptgjune2020.eventbrite.com Register if you plan to attend.
19:30:38 <clarkb> #link https://etherpad.opendev.org/p/opendev-virtual-ptg-june-2020 PTG Ideas
19:31:11 <clarkb> The two things to do at this point are register if you plan to attend and submit ideas to the etherpad
19:31:32 <clarkb> I think it would also be great if people can indicate subjects that they are interested in so that we can try and schedule them during timezone appropriate slots?
19:31:37 <corvus> is anyone interested in using meetpad?
19:31:42 <fungi> sure!
19:31:44 <clarkb> corvus: I am
19:32:01 <corvus> ok cool -- anything we need to do before then?
19:32:23 <clarkb> corvus: I think diablo_rojo plans to give it a good testing on friday this week
19:32:27 <fungi> i need to dig out a microphone
19:32:37 <clarkb> I planned to dial into that and participate as part of the data collecting
19:32:40 <corvus> diablo_rojo, diablo_rojo_phon: count me in, i like to talk to people
19:32:44 <fungi> and yeah, we'll probably get some user feedback from the openstack ussuri celebration
19:32:46 <corvus> sometimes
19:32:47 <corvus> a little
19:33:04 <fungi> i make an exception for people i like ;)
19:33:38 <clarkb> I think all of the known issues have been addressed, including the http -> https redirect and the valid etherpad name characters update
19:33:47 <diablo_rojo> corvus, can't wait!
19:33:51 <clarkb> I tested that -'s are now valid yesterday
19:33:55 <clarkb> (they are)
19:34:05 <diablo_rojo> And to see the cake I am planning to bake?
19:34:21 <corvus> ooooh
19:35:15 <diablo_rojo> Will be my first foray into sculpting cake.
19:35:15 <clarkb> and based on how that goes I think we can schedule some other test times as necessary
19:36:00 <clarkb> also it looks like the schedule is up and we got our requested hours
19:36:02 <fungi> i have trouble bringing myself to sculpt cake because i think about all that trimmed cake going to waste (or rather, which i'll be compelled to eat separately while sculpting)
19:36:25 <diablo_rojo> fungi, it won't go to waste, have no fear
19:36:25 <clarkb> #link http://ptg.openstack.org/ptg.html The schedule is up
19:37:17 <clarkb> That takes us to fungi's standing wiki item. Anything new there fungi?
19:37:25 <fungi> nope!
19:37:42 <clarkb> #topic Open Discussion
19:38:00 <clarkb> As expected we've got additional time today. Feel free to resume the uid and containers discussion
19:39:52 <clarkb> fungi: the point about eavesdrop is a good one and that might be a reasonable example case to consider since you are correct we'll end up running different bots there with different levels of access to services
19:40:14 <ianw> i've got a couple of low priority but outstanding things for nodepool
19:40:50 <clarkb> fungi: perhaps in that case we can simply set users to non overlapping uids since they are not generally reconsumable images (eg they're opendev specific images)
19:41:13 <clarkb> fungi: in the case of zuul and nodepool and zk we consume them from external (though in the case of zuul and nodepool: friendly) sources
19:41:20 <ianw> https://review.opendev.org/721509 , https://review.opendev.org/724452 , https://review.opendev.org/723782 , https://review.opendev.org/726032 ; fairly random things but that have popped up during recent work
19:41:34 <corvus> clarkb: we could also not build the user into the image
19:41:44 <mordred> well - I think the main issue is that zuul and nodepool and friends expect to have read-write access to dirs on the host
19:42:13 <corvus> mordred: i don't think that's an expectation that comes from the zuul image; that's just how we're using them
19:42:21 <mordred> corvus: indeed
19:42:28 <mordred> corvus: also, I'm not sure building the user into the image is getting us much, tbh
19:42:34 <clarkb> mordred: in the case of eavesdrop services we don't want shared read access for credentials though
19:42:40 <clarkb> mordred: or at least ideally we would avoid that
19:43:04 <clarkb> and ya if we can simply avoid setting a user in the image and set that at runtime I think that solves the eavesdrop problem well
19:43:08 <mordred> this is, incidentally, why we originally wrote the container spec to say that we should use podman ...
19:43:12 <corvus> mordred: yes, maybe for opendev images, we should avoid creating users, try that out for a bit, then revisit whether it's actually necessary to have a zuul user in the images at all
19:43:20 <mordred> so that we could just run the containers as non-root in the first place and run them as a specific user
19:43:30 <mordred> of course, we ran into compose issues with podman sadly
19:43:35 <mordred> corvus: ++
19:44:45 <mordred> corvus: my hunch is that if we stop putting users in the images, then we can do run --uid=$SOME_UID - whatever matches on the host - and things should be fine
19:45:07 <mordred> other than fingergw and priv dropping inside of the container
19:45:45 <fungi> an option there is to not do priv dropping, and proxy the well-known port to one in the untrusted range
19:46:06 <fungi> systemd can "just do that" or we could use a tcpproxy or something
19:46:34 <fungi> or grant the necessary cap to the container so it can bind to privileged ports
19:46:37 <clarkb> or configure the uid to drop to
19:46:47 <clarkb> which is basically what we do today except the uid is fixed, right?
19:47:48 <fungi> clarkb: problem there is the container needs to start as root so it can open the low-numbered listening socket
19:48:12 <corvus> we may still be able to run that container as root and let it drop privs without having an /etc/passwd entry
19:48:25 <fungi> also true
19:48:54 <fungi> it doesn't create files or anything, so picking an arbitrary nonzero uid/gid for that is likely fine
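(A sketch of what that priv drop looks like: open the privileged socket as root, then setuid to an arbitrary numeric id with no /etc/passwd entry required. The uid and the use of Python here are illustrative, not how fingergw is actually implemented.)

```python
# Sketch: start as root, bind the privileged finger port, then drop to
# an arbitrary numeric uid/gid. setuid() takes a raw id, so no
# /etc/passwd entry is needed. 10001 is an illustrative choice.
import os
import socket

DROP_UID = 10001
DROP_GID = 10001

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", 79))   # port < 1024 requires root or CAP_NET_BIND_SERVICE
sock.listen()

os.setgroups([])      # shed supplementary groups while still root
os.setgid(DROP_GID)   # gid first; after setuid() we can no longer change it
os.setuid(DROP_UID)
# from here on, connections are served as the unprivileged id
```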
19:51:15 <fungi> it's a questionable practice, but if it's made configurable as part of the container invocation that's probably good enough to solve any real-world collisions
19:51:37 <mordred> fungi: docker can just do the port mapping actually if we chose to go that direction
19:51:42 <mordred> it's not even a thing we have to engineer
19:52:01 <fungi> oh, nifty
19:52:17 <fungi> regardless, that falls into the class of "don't need to bother with priv dropping"
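(The docker port mapping mordred mentions really is a one-liner at container start; a sketch with the docker Python SDK, where the 7979 in-container port is a hypothetical choice.)

```python
# Sketch: publish privileged host port 79 to an unprivileged port inside
# the container, so the service never needs root. The in-container port
# 7979 is an assumption for illustration.
import docker

client = docker.from_env()
client.containers.run(
    "zuul/zuul-fingergw",
    ports={"7979/tcp": 79},  # container port 7979 -> host port 79
    detach=True,
)
```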
19:52:41 <mordred> yeah. there are definitely options - or we can do the corvus thing, run the container as root and drop to an arbitrary uid in the container
19:52:45 <corvus> well, that's if we don't do host networking
19:52:51 <corvus> which means firewalls
19:54:16 <fungi> if we do host networking and systemd is starting the container then we could presumably use it to open the listening socket and hand it to the fingergw as a local fd (though that might need a couple additional lines in the fingergw implementation to support not expecting a real network socket)
19:55:24 <fungi> assuming the process we're starting can properly inherit file descriptors from its parent
19:55:59 <fungi> i don't know how many of these traditional unix assumptions containers tend to break
19:58:01 <clarkb> fungi: I have no idea if you can inherit file descriptors like that. Presumably yes, but it is a good question
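(For the plain-process case it does work: systemd's socket activation sets LISTEN_FDS and passes sockets starting at fd 3 per the sd_listen_fds protocol. A sketch of the consuming side follows; whether a container runtime preserves that fd across the runtime boundary is the open question clarkb raises.)

```python
# Sketch of consuming a listening socket inherited from systemd: the
# sd_listen_fds protocol sets LISTEN_FDS and passes sockets beginning
# at fd 3 (SD_LISTEN_FDS_START). The 7979 fallback port is illustrative.
import os
import socket

SD_LISTEN_FDS_START = 3

if int(os.environ.get("LISTEN_FDS", "0")) >= 1:
    # adopt the already-bound, already-listening socket from systemd
    sock = socket.socket(fileno=SD_LISTEN_FDS_START)
else:
    # no inherited socket; bind an unprivileged port ourselves
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("", 7979))
    sock.listen()
```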
19:58:18 <clarkb> that takes us to the end of our hour block
19:58:24 <clarkb> thank you everyone! we'll see you here next week
19:58:26 <fungi> thanks clarkb!!!
19:58:27 <clarkb> #endmeeting