19:01:01 #startmeeting infra
19:01:01 Meeting started Tue Nov 15 19:01:01 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:01 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:01 The meeting name has been set to 'infra'
19:01:07 o/
19:01:07 #link https://lists.opendev.org/pipermail/service-discuss/2022-November/000379.html Our Agenda
19:01:13 #topic Announcements
19:01:23 I had no announcements
19:01:28 #topic Topics
19:01:28 there are a couple
19:01:30 #undo
19:01:30 Removing item from minutes: #topic Topics
19:01:38 go for it
19:01:53 nominations for the openinfra foundation board of directors are open
19:02:06 and the cfp for the openinfra summit in vancouver is now open as well
19:02:43 #link https://lists.openinfra.dev/pipermail/foundation/2022-November/003104.html 2023 Open Infrastructure Foundation Individual Director nominations are open
19:02:58 #link https://lists.openinfra.dev/pipermail/foundation/2022-November/003105.html The CFP for the OpenInfra Summit 2023 is open
19:03:12 that's all i can think of though
19:03:44 #topic Bastion Host Updates
19:03:51 #link https://review.opendev.org/q/topic:prod-bastion-group
19:03:55 #link https://review.opendev.org/q/topic:bridge-ansible-venv
19:04:19 looks like a few changes have merged since we last discussed this. ianw anything urgent or otherwise not captured by the two change topics that we should look at?
19:04:37 One idea I had was maybe we should consolidate to a single topic for review even if there are distinct trees of change happening?
19:05:15 yeah i can clean up; i think prod-bastion-group is really now about being in a position to run parallel jobs
19:05:37 which is basically "setup source in one place, then fire off jobs"
19:06:29 ah so maybe another topic for "things we need to do before turning off the old server" ?
19:06:49 the bridge-ansible-venv; one i'll get back to is us storing the host keys for our servers and deploying to /etc/ssh
19:07:02 fungi had some good points on making that better, so that's wip, but i'll get to that soon
19:07:47 (the idea being that when we start a new bridge, i'm trying to make it so we have as few manual steps as possible :)
19:08:06 ++
19:08:09 so writing down the manual steps has been a good way to try and think of ways to codify them :)
19:08:21 it's a good approach, but gets weird if you don't include all ip addresses along with the hostnames, and we have that info in the inventory already
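
(A minimal sketch of the point above, for context: a system-wide /etc/ssh/ssh_known_hosts entry only matches when the name or address ssh is given appears on that line, so generating the file from the inventory means listing the hostname together with every IP address. The inventory layout and values below are illustrative assumptions, not the actual system-config format.)

    # Hypothetical sketch: render /etc/ssh/ssh_known_hosts entries from inventory
    # data. The inventory layout and host values here are assumptions for
    # illustration only.
    inventory = {
        "example01.opendev.org": {
            "addresses": ["198.51.100.10", "2001:db8::10"],
            "host_key": "ssh-ed25519 AAAAC3NzaC1...exampleonly",
        },
    }

    def known_hosts_lines(inv):
        lines = []
        for hostname, info in inv.items():
            # One entry listing the name and every address, so ssh matches it
            # whether it is invoked with the hostname or a literal IP.
            hosts = ",".join([hostname] + info["addresses"])
            lines.append(f"{hosts} {info['host_key']}")
        return lines

    print("\n".join(known_hosts_lines(inventory)))
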
19:08:33 the only other one is
19:08:34 #link https://review.opendev.org/c/opendev/system-config/+/861284
19:08:53 which converts our launch-node into a small package installed in a venv on the bridge node
19:09:07 and that should address our paramiko needs?
19:09:12 I'll have to take a look at that
19:09:27 i'm going to need to launch a new server for mm3 this week probably, so will try to give that change a closer look
19:10:42 Great, I'll have need of that too for our next topic
19:10:51 #topic Upgrading old Servers
19:10:52 yep, it fixes that issue, and i think is a path to help with openstacksdk versions too
19:10:52 if we need two venvs with different versions -- well that's not great, but at least possible
19:11:05 #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades
19:11:49 later this week I'm hoping to put a dent into some of these. I think we've basically sorted out all of the logistical challenges for doing jammy things once 861284 has landed
19:11:57 so ya I'll try to review that and continue to make progress here
19:12:33 I don't think there is much else to say about this now. We've hit the point where we just need to start doing the upgrades.
19:13:06 #topic Mailman 3
19:13:25 fungi has been pushing on this again. We ran into some problems with updated images that I think should be happier now
19:13:40 in preparing to do a final round of import tests, i got stuck on upstream updates to the container image tag we were tracking which slightly broke our setup automation
19:13:55 and fixing those in turn broke our forked images
19:14:06 which i think i have sorted (waiting for zuul to confirm)
19:14:17 the good news is that this is giving us a sneak preview of the sorts of changes that need to be made for mm3 updates
19:14:28 I wouldn't say the story there is great, but it should at least be doable
19:14:59 but pending successful repeat of a final round of migration testing, i hope to boot a future mm3 prod server later this week and send announcements for migration maintenance maybe?
19:15:00 the main issue being the way mailman deals with dependencies and needing specific versions of stuff
19:15:08 fungi: ++
19:15:13 yeah, it's dependency hell, more or less
19:16:00 anyway, might be worth looking at the calendar for some tentative dates
19:16:25 i don't expect we need to provide a ton of advance notice for lists.opendev.org and lists.zuul-ci.org, but a couple of weeks heads-up might be nice
19:16:39 which puts us in early december at the soonest
19:16:48 that seems reasonable
19:16:55 ++
19:17:24 based on my tests so far, importing those two sites plus dns cut-over is all doable inside of an hour
19:17:49 lists.openstack.org will need a few hours minimum, but i won't want to tackle that until early next year
19:18:19 should i be looking at trying to do the initial sites over a weekend, or do we think a friday is probably acceptable?
19:18:41 I think weekdays should be fine. Both lists are quite low traffic
19:18:54 thinking about friday december 2, or that weekend (3..4)
19:19:04 the second is bad for me, but I won't let that stop you
19:19:13 assuming things look good by the end of this week i can send a two-week warning on friday
19:20:13 could shoot for friday december 9 instead, if that works better for folks, but that's getting closer to holidays
19:20:39 i expect to be travelling at the end of december, so won't be around a computer as much
19:20:49 i think almost no notice is required and you should feel free to do it at your convenience
19:21:24 yeah, from a sending and receiving messages standpoint it should be essentially transparent
19:21:27 (low traffic lists + smtp queueing)
19:22:00 for people doing list moderation, the webui will be down, and when it comes back for them it will need a new login (which they'll be able to get into their accounts for via password recovery steps)
19:23:33 that's really the biggest impact. that and some unknowns around how dkim signatures on some folks' messages may stop validating if the new mailman alters posts in ways that v2 didn't
19:24:37 maybe we should have a test ml where people concerned about their mail setup could test this?
19:24:52 frickler: unfortunately that would require setting up an entirely new domain
19:25:12 I think if we didn't have to do so much on a per-domain basis this would be easier to test and transition
19:25:28 but starting with low traffic domains is a reasonable stand-in
19:25:48 anyway I agree. I don't think a ton of notice is necessary. But sending some notice is probably a good idea as there will be user facing changes
19:25:48 not upfront, but add something like test@lists.opendev.org
19:25:52 would require a new domain before the migration, but could be a test ml on a migrated domain later if people want to test through it without bothering legitimate lists with noise posts
19:25:57 frickler: oh I see, ya not a bad idea
19:26:17 we've had a test ml in the past, but dropped it during some domain shuffles in recent years
19:26:24 i have no objection to adding one
19:26:25 so that people from the large lists could test before those get moved
19:27:17 anything else on this topic or should we move on?
19:27:34 if we're talking fridays, the next possible friday would be november 25, but since the 24th is a holiday around here i'm probably not going to be around much on the 25th
19:27:59 fungi: maybe we should consider a monday instead? and do the 7th or 28th?
19:29:05 it's easier for me to help on a monday, but also i don't imagine i'd be much help ;)
19:29:34 mondays mean smaller tolerance for extra downtime and more people actively trying to use stuff
19:29:49 but fridays mean subtle problems may not get spotted for days
19:29:54 so it's hard to say which is worse
19:29:56 right, but as corvus points out the risk there is minimal
19:30:11 for openstack that might change, but for opendev.org and zuul-ci.org I think it is fine?
19:30:36 i have a call at 17:00 utc on the 28th but am basically open otherwise
19:30:51 i can give it a shot. later in the day also means ianw is on hand
19:31:14 or earlier in the day for frickler
19:31:18 and we can always reschedule if we get closer and decide that timing is bad
19:32:03 yeah, i guess we can revisit in next week's meeting for a go/no-go
19:32:09 sounds good
19:32:14 no need to burn more time on today's agenda
19:32:18 #topic Updating python base images
19:32:24 #link https://review.opendev.org/c/opendev/system-config/+/862152
19:32:41 This change has the reviews it needs. Assuming nothing else comes up tomorrow I plan to merge it then rebuild some of our images as well
19:33:02 At this point this is mostly a heads up that image churn will occur but it is expected to largely be a noop
19:33:37 #topic Etherpad container log growth
19:33:42 #link https://review.opendev.org/c/opendev/system-config/+/864060
19:34:10 This change has been approved. I half expected we'd need to manually restart the container to have it change behavior though
19:34:23 I'll try to check on it after lunch today and can manually restart it at that point if necessary
19:34:34 This should make the etherpad service far more reliable which is great
19:35:20 #topic Quo vadis Storyboard
19:35:26 #link https://lists.opendev.org/pipermail/service-discuss/2022-October/000370.html
19:35:51 I did send that update to the mailing list last week explicitly asking for users that would like to keep using storyboard so that we can plan accordingly
19:35:57 (and maybe convince them to help maintain it :) )
19:36:08 I have not seen any movement on that thread since then though
19:36:25 i've added a highlight of that topic to this week's foundation newsletter too, just to get some added visibility
19:36:45 It does look like the openstack TC is not interested in making any broad openstack wide decisions either. Which means it is unlikely we'll get openstack pushing in any single direction.
19:37:10 I think we should keep the discussion open for another week then consider feedback collected and use that to make decisions on what we should be doing
19:37:59 Any other thoughts or concerns about storyboard?
19:38:23 i had a user request in #storyboard this morning, but it was fairly easily resolved
19:38:31 duplicate account, colliding e-mail address
19:38:50 deactivated the old account and removed the e-mail address from it, then account autocreation for the new openid worked
19:40:11 #topic Vexxhost server rescue behavior
19:40:18 I did more testing of this and learned a bit
19:40:33 For normally launched instances resolving the disk label collision does fix things.
19:41:00 For BFV instances melwitt pointed me at the tempest testing for bfv server rescues and in that testing they set very specific image disk type and bus options
19:41:32 I suspect that we need an image to be created with those properties that are properly set for the volume setup in vexxhost. Then theoretically this would work
19:42:06 In both cases I think that vexxhost should consider creating a dedicated rescue image. Possibly one for bfv and one for non bfv. But with labels set (or uuid used) and the appropriate flags
19:42:32 mnaser: ^ I don't think this is urgent, but it is also a nice feature to have. I'd be curious to know if you have any feedback on that as well
19:42:42 it sounds like bfv was something we previously needed but don't anymore; should we migrate to non-bfv?
19:43:11 corvus: that is probably worth considering as well. I did that with the newly deployed gitea load balancer
19:43:14 i suggest sticking to bfv, non-bfv means your data is sitting on local storage
19:43:36 risks are if the hv goes poof the data might be gone, so if it's a cattle then you're fine
19:43:45 but if it's a pet that might be a bit of a bad time
19:43:57 mnaser: I think for things like gitea and gerrit we would still mount a distinct data volume, but don't necessarily need the disk to be managed that way too. For the load balancer this is definitely a non issue
19:44:21 oh well in that case when you're factoring it in then you're good
19:44:26 yeah, we would just deploy a new load balancer in a matter of minutes, it has no persistent state whatsoever
19:45:04 but also I think these concerns are distinct. Server rescue should work particularly for a public cloud imo so that users can fix things up themselves.
19:45:13 then whether or not we boot bfv is something we should consider
19:45:46 definitely an interesting point to add to our launch node docs though? mostly things are "cattle" but with various degrees of how annoying it would be to restore i guess
19:46:02 In any case I just wanted to give an update on what I found. rescue can be made to work for non bfv instances on our end and possibly for bfv as well but I'm unsure what to set those image property values to for ceph volumes
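
(A hedged illustration of the image property idea above: nova's stable device rescue support consults the hw_rescue_device and hw_rescue_bus properties on the rescue image, and the tempest bfv rescue test sets them to disk/virtio. The openstacksdk calls below are a sketch only; the cloud, image, and server names are assumptions, and the right values for vexxhost's ceph-backed volumes are exactly the open question.)

    # Hedged sketch only: tag a rescue image with the properties nova's rescue
    # path looks at, then rescue a server with that image. The cloud, image,
    # and server names are assumptions, and the property values simply mirror
    # the tempest bfv rescue test; they may not be right for ceph volumes.
    import openstack

    conn = openstack.connect(cloud="vexxhost")

    image = conn.image.find_image("dedicated-rescue-image")  # hypothetical image
    conn.image.update_image(
        image,
        hw_rescue_device="disk",
        hw_rescue_bus="virtio",
    )

    server = conn.compute.find_server("example-server")  # hypothetical server
    conn.compute.rescue_server(server, image_ref=image.id)
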
19:46:58 #topic Replacing Twitter
19:47:09 we currently have a twitter account to post our status bot alerts to
19:47:34 frickler has put this topic up asking if we should consider a switch to fosstodon instead
19:47:51 yes, not urgent for anything, but we should at least prepare imo
19:48:00 I think that is a reasonable thing to do, but I have no idea what that requires in the status bot code. I assume we'd need a new driver since I doubt mastodon and twitter share an api
19:48:07 is it just a credential/url change for the api in statusbot's integration, or does it need a different api?
19:48:28 yeah, same thing i was wondering
19:48:29 definitely a different api
19:48:50 i had a quick look and not *too* hard
19:49:27 i guess if someone has time to write the necessary bindings for whatever library implements that, i'm okay with it, but i don't use either twitter or mastodon so can't speak to the exodus situation there
19:49:42 i don't feel like we need to abandon ship on twitter, but also adding mastodon seems like very little cost.
19:49:51 there isn't a technical reason to remove twitter support or stop posting there. (of course, there may be non-technical reasons, but there always have been). the twitter support itself is a tertiary output (after irc and wiki)
19:50:18 right we could post to both locations (as long as we can still login to twitter which has apparently become an issue for 2fa users)
19:50:18 that sums up my position on it as well, yes
19:50:27 and i agree that adding mastodon is desirable if there are folks who would like to receive info there
19:50:50 opendevinfra i think have chosen fosstodon as a host -- it seems reasonable. it broadly shares our philosophy, only caveat is that it is english only (for moderation purposes)
19:51:11 not that i think we send status in other languages ...
19:51:21 I would say if you are interested and have the time to do it then go for it :) This is unlikely to be a priority but also something that shouldn't take long I don't expect
19:51:21 fosstodon seems like an appropriate place, but note that they currently have a waitlist
19:52:31 yeah, personally i have an account and like it enough to pitch in some $ via the patreon. it seems like a more sustainable model to pay for things you like
19:53:03 maybe we can get on the waitlist now so that it is ready when we have the driver written?
19:53:11 but ya I agree fosstodon seems appropriate
19:53:17 it looks like waitlist processing is not slow
19:53:40 anecdote: https://fosstodon.org/@acmegating waitlisted and approved within a few hours
19:53:43 well i can quickly put in "opendevinfra" as a name, if we like
19:53:54 +1
19:54:07 that matches the twitter account we're using, right?
19:54:13 yes
19:54:13 yep
19:54:13 if so, sgtm
19:54:37 well i'll do that and add it to the usual places, and i think we can make statusbot talk to it pretty quick
19:54:44 sounds good
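
(For a sense of what "make statusbot talk to it" involves: posting to Mastodon is a single authenticated HTTP call to the instance's statuses endpoint, so a new driver alongside the existing twitter output mostly needs an instance URL and an access token in its config. The sketch below is not the statusbot implementation; the function name and token handling are assumptions.)

    # Rough sketch of the API call a statusbot mastodon driver would wrap; the
    # function name and configuration handling are assumptions, not the real
    # statusbot code.
    import requests

    def post_status(instance_url, access_token, text):
        # Mastodon's REST API: POST /api/v1/statuses with a bearer token.
        resp = requests.post(
            f"{instance_url}/api/v1/statuses",
            headers={"Authorization": f"Bearer {access_token}"},
            data={"status": text},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["url"]

    # Example (hypothetical token):
    # post_status("https://fosstodon.org", "TOKEN", "The Gerrit service is being restarted for an upgrade")
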
19:55:03 #topic Open Discussion
19:55:23 There were a few more things I was hoping to get to that weren't on the agenda. Let's see if we can cover them really quickly
19:55:34 I've got a change up to upgrade our Gerrit version
19:55:51 #link https://review.opendev.org/c/opendev/system-config/+/864217 Upgrade Gerrit to 3.5.4
19:56:00 This change needs its parent too in order to land
19:56:14 If that lands sometime this week I can be around to restart Gerrit to pick it up at some point
19:56:48 openstack/ansible-role-zookeeper was renamed to windmill/ansible-role-zookeeper and we've since created a new openstack/ansible-role-zookeeper
19:57:01 yeah, we recently created a new repository for openstack whose name collides with a redirect for an old repository we moved out of openstack (different project entirely, just happens to use the same name). i've got a held node and am going to poke at options
19:57:05 I didn't expect this to cause problems because we have a bunch of foo/project-config repos but because there was a prior rename we have redirects in place which this runs afoul of
19:57:25 in addition to fixing this in gitea one way or another we should look into updating our testing to call these problems out
19:58:09 And finally, nodepool needs newer openstacksdk in order to run under python3.11 (because old sdk uses pythonisms that were deprecated and removed in 3.11). However new openstacksdk previously didn't work with our nodepool and clouds
19:58:38 also there's an interesting git-review patch adding patch set descriptions, which looks useful to me https://review.opendev.org/c/opendev/git-review/+/864098 some concern on whether more sophisticated url mangling might be needed, maybe have a look if you're interested
19:58:40 corvus has a nodepool test script thing that I'm hoping to try and use to test this without doing a whole nodepool deployment to see if openstacksdk updates have made things better (and if not identify the problems)
19:58:43 heh, well if it can happen it will ... is the only problem really that apache is sending things the wrong way?
19:59:10 ianw: it's gitea itself redirecting, but ya I think that may be the only problem?
19:59:15 ianw: not apache but gitea
19:59:44 frickler: interesting, I'm not even sure I know what that feature does in gerrit. I'll have to take a look
19:59:50 and I'd also like to learn other roots' opinions on the Ubuntu FIPS token patch, if I'm in a minority I might be fine with getting outvoted
20:00:08 clarkb: you can see it in the patch, the submitter used their patch version
20:00:52 frickler: excellent that will help :)
20:00:59 https://review.opendev.org/c/openstack/project-config/+/861457 for fips
20:01:02 frickler: re FIPS I think that is more a question for openstack
20:01:10 yeah, the main issue with things like gerrit patchset descriptions is that we currently can't add regression tests for newer gerrit features unless we can get our git-review tests able to deploy newer gerrit versions
20:01:13 I don't think it runs afoul of our expectations from a hosting side
20:01:17 i would have expected a new project creation to invalidate the gitea redirects. regardless of why it didn't work out, the last time i looked, the redirects were a gitea db entry. probably can be fixed manually, but if so, then we should remember to record that in our yaml files for repo moves since we have assumed that we should be able to reconstruct the gitea redirect mappings from that data alone.
20:01:32 corvus: yup ++
20:01:47 corvus: yes, i plan to comment or comment out the relevant entry in opendev/project-config
20:01:52 frickler: from the hosting side of things this is a big part of why I don't think we should have fips specific images
20:01:58 after i finish playing with the held node to determine options
20:02:16 frickler: but we already allow jobs to interact with proprietary services (quay is/was for example)
20:02:42 We are at time now. Feel free to continue discussion in #opendev or on the mailing list. Thank you for your time everyone
20:03:02 Next week is a big US holiday week but I expect I'll be around through tuesday and probably most of wednesday
20:03:13 I don't expect to be around much thursday and friday
20:03:19 #endmeeting