19:01:04 <clarkb> #startmeeting infra
19:01:04 <opendevmeet> Meeting started Tue Oct 11 19:01:04 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:04 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:04 <opendevmeet> The meeting name has been set to 'infra'
19:01:06 <frickler> \o
19:01:21 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-October/000364.html Our Agenda
19:01:24 <clarkb> #topic Announcements
19:01:52 <clarkb> The PTG is happening next week. Be aware of that as we make changes (best to avoid updating meetpad and etherpad for example)
19:02:16 <clarkb> Also encourage people that have problems to reach out to us directly. In the past we've gotten reports of trouble via a game of telephone and that has been difficult
19:03:19 <clarkb> Also, this morning I spilled a glass of water on my keyboard so I've been in disarray all day. Thankfully it was the desktop keyboard and not the laptop and I've got a spare Model M that has been plugged in
19:03:33 <clarkb> but still it's really weird to type on a new keyboard after having that one for a decade
19:03:50 <fungi> the zuul default ansible version changed to 6 across all of our tenants as of late last week, and we'll be dropping ansible 5 support "real soon" so fixing related problems is time-sensitive
19:04:31 <fungi> probably warrants another announcement if we can nail down a timeline for the ansible 5 removal change
19:04:52 <clarkb> I did at least warn of the ansible 5 removal in the email announcing the switch to 6 by default
19:05:00 <fungi> yep
19:06:40 <clarkb> #topic Topics
19:06:49 <clarkb> #topic Bastion Host Updates
19:07:20 <clarkb> The changes to stop writing console log files on bridge landed yesterday. Looks like there was a small issue getting the flag name correct. ianw do we have an idea yet if that is working as expected?
19:08:07 <fungi> and we still need a similar one for static.o.o right? or is that already up?
19:08:30 <ianw> i just checked and i think so. there hasn't been a log file written since Oct 11 06:09 (UTC) which is about when all the periodic jobs cleared out
19:09:26 <ianw> static has been done the same way, so it looks good too
19:09:39 <clarkb> awesome. That's one less thing to worry about now :)
19:10:24 <ianw> heh yes, thank you for reviews. i think it was good to reach a bit more of a generic solution
19:10:59 <ianw> the "tunnel console things via a socket and the ssh connection" changes are another option that is still on my todo list, and seems like a great thing to look into as well
19:11:11 <ianw> one day ... :)
19:11:21 <clarkb> ya though I think we don't want to expose that on bridge or static due to how the protocol works
19:11:30 <clarkb> since it could be used to read other files?
19:11:38 <clarkb> I think what we've done not exposing it is correct for us
19:11:42 <fungi> it's intended for reading additional files
19:11:57 <fungi> designed with that in mind anyway
19:12:33 <clarkb> The other stack of changes in flight here has to do with ansible in a venv
19:12:38 <clarkb> #link https://review.opendev.org/q/topic:bridge-ansible-venv
19:12:43 <corvus> er... the current protocol shouldn't be able to read other files?
19:13:13 <clarkb> corvus: ya I don't think it does, but that was always part of the intention iirc. And I don't want to have to think about undoing any opening of things if that changes
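[editor's note: a hedged sketch of the per-host switch discussed at 19:07:20. Assuming the flag in question is Zuul's zuul_console_disabled host variable (an assumption here; the log says the flag name was still being corrected), a job could set it for the bastion through Zuul's host-vars job attribute. The job name is illustrative:]

    # Hypothetical job snippet; setting zuul_console_disabled truthy
    # for a host asks Zuul's console machinery not to write console
    # log files on that host.
    - job:
        name: infra-prod-playbook
        host-vars:
          bridge.openstack.org:
            zuul_console_disabled: true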
19:13:38 <fungi> i know it was at least a theoretical future use case
19:13:47 <corvus> clarkb: yes, it was originally designed for that, but we haven't implemented it because we haven't figured out how to do it safely
19:13:58 <clarkb> gotcha
19:14:14 <corvus> so if that's the concern -- just future-proofing cool. but if there was a thought that we had a current vulnerability... then i would like to explore that more. :)
19:14:41 <clarkb> oh no I don't think we have a current vulnerability. We're set up to avoid it should the zuul behavior change to what was (I thought anyway) the intended behavior
19:15:22 <corvus> (and there's really 2 protocols here -- there's the websocket/finger protocol of user -> zuul, and the internal protocol of executor -> node; the former is the one we designed to allow reading other files in the future, and the second is what we just changed)
19:16:04 <corvus> (though support for the former probably would need changes like the latter)
19:16:58 <corvus> okay, so i think we're all on the same page that currently there is not the ability to read arbitrary files, but that we like the status quo of explicitly disabling log streaming on bridge because, among other things, that future-proofs us against eventually adding that feature. ya?
19:17:06 <clarkb> ++
19:17:40 <corvus> cool, thx and sorry for the diversion. just wanted to make sure we didn't open something we didn't intend to. :)
19:18:04 <clarkb> ianw: for ansible in a venv did you manage to sort out using the first member of a singleton group as the hosts specification?
19:18:28 <ianw> (specifically i was talking about https://review.opendev.org/c/zuul/zuul/+/542469 but let's not go into that further now :)
19:19:17 <ianw> clarkb: thanks ... one step back i just approved the change reviewed by yourself and fungi to move the production ansible into a venv on the current bridge. so i'll watch that go in today. that's the "venv" bit of it really
19:20:01 <fungi> and that preps us for being able to use newer ansible, right?
19:20:03 <clarkb> ya and in theory that should just switch over due to symlinking the venv install over to ansible
19:20:10 <clarkb> (that was my thought during review anyway)
19:20:29 <ianw> yep, *in theory* it's a noop :)
19:20:39 <clarkb> fungi: sort of, we need to upgrade the python installation too (which is where the replacement node comes in and why the other group work is related)
19:20:41 <ianw> the bits on top now are about upgrading to jammy, and abstracting the way we address the bastion host so we can switch the host more easily -- in this case to probably bridge01.opendev.org
19:21:11 <ianw> anyway, i did establish that as a playbook matcher "groupname[0]" does seem to work to address the first member of a group
19:22:08 <corvus> like `- hosts: bridgegroup[0]` means this is a play that runs on the first host in the bridge group?
19:22:37 <corvus> (er, in the group named "bridgegroup"; i was trying to be clear and may have failed :)
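[editor's note: a minimal sketch of the host pattern corvus describes. Ansible subscript patterns select group members by index, so a play can target only the first member; the group and task here are illustrative, not the actual system-config playbooks:]

    # Runs only on the first member of "bridgegroup", even if the
    # group briefly contains both an old and a new bastion host.
    - hosts: bridgegroup[0]
      tasks:
        - name: Show which host acts as the bastion
          ansible.builtin.debug:
            msg: "bastion is {{ inventory_hostname }}"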
19:22:37 <fungi> and group member ordering is guaranteed deterministic (uses the order in which the members are added i guess) right?
19:22:38 <clarkb> ya, the idea being we can control what the bridge is in a single place (the bridgegroup group) but then only ever have a single entry in that group
19:22:50 <ianw> yep -- https://review.opendev.org/c/opendev/system-config/+/858476
19:22:52 <clarkb> fungi: the idea is that it would be a singleton group
19:23:03 <clarkb> but to enforce that we would take the first entry everywhere
19:23:09 <fungi> i see
19:23:18 <corvus> why not just let it run on the whole group of 1?
19:23:49 <clarkb> the reason I was concerned with that is it makes the ansible really confusing when you need to address a specific node
19:23:59 <clarkb> like when grabbing the CA files
19:24:27 <clarkb> the ansible you express becomes "create a different CA on every member of the bridge group, but only distribute the CA files for the first group member"
19:24:43 <clarkb> if others prefer that I'm ok with that too, but I found it a bit confusing to read when I reviewed it
19:25:06 <ianw> corvus: one problem i haven't dealt with yet is playbooks/bootstrap-bridge.yaml. that runs both under zuul, where the inventory is set up via the job, and in infra-prod, where the inventory is set up by opendev/base-jobs
19:25:55 <corvus> i'm not sure whether or not i would have the same confusion, but i certainly see your point, and the solution seems good. now that i know the reasoning, i can be on board with that.
19:26:02 <ianw> so basically both have to agree on the name/group. this is a bit annoying for clarkb's note of trying to use a different group name for the initial setup bastion host, and the production version
19:26:20 <ianw> sorry, that wasn't intended for corvus: ... :)
19:26:56 <clarkb> oh hrm if using distinct groups for the top level ansible and nested ansible in CI is problematic I think we can just not do that
19:27:06 <corvus> oh whew cause that's a hard question and i was struggling with that. glad i'm off the hook. :)
19:27:08 <clarkb> it was an idea I had when trying to sort out why the job needed to redefine the group
19:27:47 <ianw> yeah, it is mostly explained in the comment at https://review.opendev.org/c/opendev/system-config/+/858476/9/zuul.d/system-config-run.yaml
19:28:32 <ianw> anyway -- i will keep at it and see what we can come up with; i don't think we need a solution now
19:28:42 <corvus> intuitively, having the group name be the same makes sense to me... so if that's a workable/livable option i would be in favor of that.
19:29:35 <ianw> i think that's where i'm coming back to as well ...
19:30:13 <corvus> and maybe keep a version of that comment explaining that we're using that as a group for the zuul playbook
19:30:24 <fungi> sounds good to me
19:30:27 <clarkb> wfm
19:30:51 <ianw> yes i will definitely do my usual probably-too-verbose commenting on all this :)
19:32:22 <ianw> anyway, I think it's quite likely by this time next meeting we'll have a fully updated bridge, and an easier path when we want to rotate it out next time as well
19:32:39 <clarkb> sounds good. Thank you for working through all the little details of this
19:32:47 <clarkb> #topic Upgrading Bionic Servers
19:33:21 <clarkb> The expected fix for removing the ubuntu user has landed. Now just need to try booting a jammy control plane server again. I'm hoping to give that a go sometime this week.
19:33:32 <clarkb> Sounds like ianw may also give it a go
19:33:43 <clarkb> But other than that I didn't have any new updates here
19:33:58 <fungi> we'll want it before we boot the new listserv at the very least
19:35:35 <clarkb> yup I was thinking I'd find something easy to replace as a guinea pig like a mirror maybe
19:35:44 <clarkb> but probably not until the end of this week
19:36:05 <clarkb> Let's keep moving as the last topic on the agenda is one that deserves discussion before we run out of time
19:36:07 <clarkb> #topic Mailman 3
19:36:41 <clarkb> fungi has edited the extra long strings on the production mailman2 site and has begun the process of copying data for reattempting the mm3 migration on a newly held test node with our forked images
19:37:03 <fungi> new held node for this is 149.202.168.204, built from your container image fork
19:37:16 <fungi> will hopefully kick off a new scripted import on it within the next hour or so
19:37:27 <fungi> depending on how much longer the rsync runs
19:37:28 <clarkb> corvus: we noticed that a child change of https://review.opendev.org/c/opendev/system-config/+/860157 doesn't find the images that change builds. And were wondering if we got the bits wrong for telling zuul about the image
19:37:58 <clarkb> corvus: maybe if you get some time you can take a look at how the new image build jobs and system-config-run-mailman3 job are hooked up with the buildset registry and provides/requires and dependencies
19:38:15 <clarkb> we've worked around it by forcing the node hold change to rebuild the images itself
19:38:48 <clarkb> fungi: anything else you need from the rest of us? I expect it is largely just a wait for test results though
19:39:05 <fungi> we've knocked out about all the remaining todo items, so we're probably ready to talk scheduling for lists.opendev.org and lists.zuul-ci.org production migrations
19:39:34 <corvus> clarkb: let's continue that in #opendev
19:39:37 <fungi> i did want to check a few more urls for possible easy/convenient redirects (things like list description pages which people tend to link in various places)
19:40:16 <fungi> stuff not covered by keeping a copy of the pipermail archives hosted from the new server
19:40:18 <clarkb> corvus: yup don't need to solve that here
19:40:41 <clarkb> fungi: good idea, the existing redirects are probably not much help though as they redirect to content on disk but you probably want url redirects to mm3 urls for those
19:41:40 <fungi> right. i think the list description pages are probably the only thing we really care about redirects for
19:42:04 <fungi> the list indexes for the sites are just served from the root url of each vhost anyway
19:42:22 <fungi> and i'm not too worried about redirecting old admin and moderator interface urls
19:42:43 <clarkb> makes sense
19:42:51 <clarkb> anything else on mm3 before we continue?
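[editor's note: a hedged sketch of the provides/requires wiring mentioned at 19:37:58. The artifact name and build job name are illustrative, not the actual system-config definitions; the general Zuul pattern is that the image build job declares what it provides, and the consuming job both requires that artifact and depends on the build job so the two share a buildset registry:]

    - job:
        name: system-config-build-image-mailman3   # hypothetical name
        provides: mailman3-container-image

    - job:
        name: system-config-run-mailman3
        requires: mailman3-container-image
        dependencies:
          - name: system-config-build-image-mailman3
            soft: true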
19:43:21 <fungi> we should probably also confirm whether we want local logins for users or whether there's a desire to hold this for keycloak integration in order to avoid local credentials in mailman
19:44:06 <fungi> i'm assuming we'd rather get the mm3 migration done and then look at keycloak integration after the fact, but just want to be sure everyone's on the same page there
19:44:13 <clarkb> you can subscribe to lists without creating a user (I did this with upstream mm3)
19:44:21 <fungi> correct
19:44:30 <clarkb> we might even encourage users to do that if they never want to use the web ui for responding to things
19:44:44 <clarkb> but ya I wasn't too worried about a future switch over
19:44:49 <ianw> just off the top of my head, it feels like if we allow local logins and then move to a more generic keycloak, we then have the problem of having to merge the local users too?
19:44:53 <fungi> list admins/moderators will need accounts though, and if someone wants to adjust their subscription preferences they'll need a login
19:45:24 <clarkb> ianw: yes we'd likely need to do that. The good thing is we should have email on both sides to align them at least
19:45:50 <fungi> ianw: we'll have that either way. subscribers technically all have accounts, they just don't necessarily have login info for them unless they go through the password reset
19:46:35 <ianw> ahh ok
19:46:43 <frickler> is the login per list or per site or per installation? for mm2 it was per list iiuc
19:46:50 <clarkb> frickler: it's per installation
19:47:03 <fungi> frickler: right, for mm3 it's system-wide
19:47:18 <fungi> so not just all lists on a given site, but all mailman sites on that server
19:47:54 <fungi> convenient for folks who interact with a lot of lists, especially across multiple domains on the sam ehost
19:48:00 <fungi> same host
19:49:09 <frickler> so if this is needed to set e.g. digest mode, I think we cannot delay it into the future
19:49:09 <fungi> anyway, i didn't have anything else. we can mull that over, i expect we'll start doing migration scheduling after the ptg
19:49:21 <fungi> frickler: correct
19:50:08 <fungi> basically the options are 1. wait to migrate lists to mm3 until we have keycloak in production the way we want, or 2. migrate to mm3 and then integrate keycloak later and make sure accounts can be linked/merged as needed
19:50:18 <clarkb> right, I think some users will still need to create accounts, but a good chunk of them shouldn't need to which helps simplify things if we want to try and keep them simple like that
19:50:28 <clarkb> I'm fine with 2
19:50:37 <frickler> ack
19:50:47 <fungi> well, to reiterate, the accounts are precreated, whether the users have login info for them or not
19:51:02 <clarkb> fungi: for all users?
19:51:16 <clarkb> I guess the migration doesn't stick to not creating an account if it doesn't need to
19:51:52 <fungi> if they're referenced in a config (admin, mod, existing subscription) then the import process creates their accounts. if they subscribe later an account is created the first time they do so
19:51:57 <clarkb> anyway I think it's fine to migrate them later since in this case we should have the info needed to make associations
19:52:38 <clarkb> also the mailing list is the sort of thing that can probably safely not have single sign on forever
19:53:07 <clarkb> we are running out of time and I do want to get to the last item on the agenda
19:53:13 <clarkb> we can return to this in #opendev if necessary
19:53:23 <fungi> please do
19:53:24 <clarkb> #topic Updating OpenDev's base job nodeset to Jammy
19:53:51 <clarkb> It has been pointed out that OpenDev's base job nodeset is still Focal. Jammy has been out for about half a year now and has a .1 release. It should be stable enough for our jobs
19:54:05 <clarkb> But that opens questions about how we want to communicate and schedule the switch
19:54:16 <frickler> yes, I came across that while looking to upgrade devstack jobs
19:54:42 <clarkb> I was thinking that we should avoid changing it before the PTG since that will just add a distraction during PTG week. But maybe we can do it the week after ish? Basically do a 2 week notice to service-announce and then swap?
19:54:47 <fungi> openstack is actively switching from focal to jammy for testing now that their zed release is done
19:54:55 <frickler> I think we'd want to run some tests with base-test before discussing details of scheduling?
19:55:30 <clarkb> frickler: in the past we've done that (when the infra team managed this all for openstack) and the problem with that is it sets the expectation that we are responsible for making it work for every job
19:55:49 <clarkb> It was the xenial switch or maybe trusty switch that made me never want to do that again.
19:56:10 <clarkb> I think people should test what they are interested in and be explicit where they know they need to be (say for specific versions of python).
19:56:36 <frickler> still we'd need to change base-test in order to allow for that?
19:56:45 <frickler> #link https://review.opendev.org/c/opendev/base-jobs/+/860686 would be the change for that
19:56:48 <clarkb> frickler: no, any job can select the jammy nodeset
19:56:48 <fungi> anything inheriting from our default nodeset which breaks when we change it has the option of overriding the nodeset it uses to the earlier value anyway
19:57:19 <fungi> just as it can be adjusted to use the new value before our planned transition date
19:57:28 <frickler> hmm, true that
19:57:37 <clarkb> I think updating base-test is a good idea to keep it in sync with base. But I don't think that is the method for testing this. base-test is for testing the roles in base
19:57:47 <clarkb> we know they work on jammy because projects like zuul already use jammy
19:57:51 <clarkb> so we don
19:58:00 <clarkb> er we don't need to test that base functionality
19:58:30 <corvus> i agree i don't think this needs a base-test cycle since we know that the change won't break all jobs (because we can and have made the change explicitly elsewhere, and zuul performs syntax validation on the change)
19:58:42 <fungi> in my mind, the main questions are when do we plan to switch it/how much advance notice do we want to provide users
19:58:51 <clarkb> fungi: ++
19:58:59 <clarkb> I think we should wait for after the PTG at the very least
19:59:18 <fungi> wait for after the ptg to announce it, or for actually changing it?
19:59:32 <clarkb> actually changing it. Ideally we should announce whatever we decide on real soon now
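[editor's note: a minimal sketch of the override fungi describes at 19:56:48. A job that inherits the default nodeset can pin one by name instead, either to stay on focal after the switch or to move to jammy early; the job name is hypothetical, while "ubuntu-focal" follows opendev's existing named-nodeset convention:]

    - job:
        name: my-project-unit-tests
        # pin the previous default explicitly instead of inheriting
        # whatever the base job's default nodeset currently is
        nodeset: ubuntu-focal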
19:59:46 <frickler> 2 week notice should be fine then. announce now
19:59:50 <fungi> sounds good to me
19:59:54 <ianw> ++
20:00:15 <clarkb> cool. I can work on a draft for service-announce after lunch today
20:00:22 <clarkb> (I'm happy to send that as I think most others get moderated)
20:00:28 <fungi> however, we should be mindful of the zuul dropping ansible 5 situation as well, and whether we want those to coincide, or be announced together, or not compete
20:01:16 <clarkb> dropping ansible 5 has already been announced but without a hard date. I think it was a week or so from today that zuul had planned to drop ansible 5
20:01:38 <frickler> agree, having a couple of days between them will help in distinguishing failure causes
20:02:00 <clarkb> we will need to manually restart zuul to pick up that change quicker than our weekly restarts. But that is easy to do
20:02:11 <clarkb> (also I don't think anything is using ansible 5 so should be an easy switch)
20:02:29 <clarkb> I'll work on a draft email for all that in a bit
20:02:34 <fungi> thanks!
20:02:37 <clarkb> and we are at time
20:02:39 <ianw> for mine i think it probably gets confusing to combine them as a single change, as they're not really related as such, so agree with doing separately
20:02:52 <clarkb> thanks everyone
20:02:59 <corvus> thanks clarkb
20:03:02 <clarkb> feel free to continue discussion over in #opendev
20:03:05 <clarkb> #endmeeting