19:01:51 <clarkb> #startmeeting infra
19:01:51 <opendevmeet> Meeting started Tue Nov 22 19:01:51 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:51 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:51 <opendevmeet> The meeting name has been set to 'infra'
19:02:00 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-November/000381.html Our Agenda
19:02:22 <clarkb> #topic Announcements
19:02:35 <clarkb> fungi: are individual director nominations still open?
19:03:24 <fungi> yes, until january
19:03:31 <fungi> i think?
19:03:52 <fungi> no, december 16
19:04:03 <fungi> #link https://lists.openinfra.dev/pipermail/foundation/2022-November/003104.html 2023 Open Infrastructure Foundation Individual Director nominations are open
19:04:15 <clarkb> thanks
19:04:40 <fungi> also cfp is open for the vancouver summit in june
19:04:57 <fungi> #link https://lists.openinfra.dev/pipermail/foundation/2022-November/003105.html The CFP for the OpenInfra Summit 2023 is open
19:06:01 <clarkb> #topic Bastion Host Updates
19:06:08 <clarkb> Let's dive right in
19:06:30 <clarkb> the bastion is doing centrally managed known hosts now. We also have a dedicated venv for launch node now
19:07:10 <clarkb> fungi discovered that this venv isn't working with rax though. It needed older cinderclient, and with that fixed it now needs something to address networking stuff with rax. iirc this was the problem that corvus and ianw ran into on old bridge? I think we need older osc maybe?
19:07:19 <clarkb> fungi: that may be why my osc is pinned back on bridge01 fwiw
19:07:59 <fungi> well, the cinderclient version only gets in the way if you want to use launch-node's --volume option in rackspace, but it seems there's also a problem with the default networks behavior
19:08:22 <clarkb> right, I suspect that the reason I pinned back osc but not cinderclient is I was not doing volume stuff but was doing server stuff
19:08:24 <ianw> ... that does sound familiar
19:08:29 <fungi> rackspace doesn't have neutron api support, and for whatever reason osc thinks it should
19:08:43 <fungi> maybe we need to add another option to our clouds.yaml for it?
19:08:52 <clarkb> ianw: I'm pretty sure it's the same issue that corvus ran into and helped you solve when you ran into it. And I suspect it was fixed for me on bridge01 with older osc
19:09:16 <clarkb> fungi: if you can figure that out you win a prize. unfortunately the options for stuff like that aren't super clear
19:09:34 <ianw> (https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2022-10-24.log.html#t2022-10-24T22:56:16 was what i saw when bringing up new bridge)
19:09:59 <fungi> looks like the server create command is hitting a problem in openstack.resource._translate_response() where it complains about the networks metadata
19:10:01 <ianw> "Bad networks format"
19:10:12 <fungi> yeah, that's it
19:11:46 <clarkb> we can solve the venv problem outside the meeting but I wanted to call it out
19:11:48 <fungi> okay, so we think ~root/corvus-venv (on old bridge?) has a working set?
19:11:58 <clarkb> fungi: ya
19:12:03 <fungi> yeah, that's enough for me to dig deeper
19:12:06 <fungi> thanks
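A minimal sketch of how the launch venv's clients could be checked and pinned back, assuming the known-good versions are whatever ~root/corvus-venv on the old bridge contains; the venv path and version bounds below are placeholders, not confirmed values:

    # Placeholder path for the launch-node venv on the new bridge.
    VENV=/path/to/launch-node-venv
    # Inspect the known-good venv on the old bridge for its client versions.
    ~root/corvus-venv/bin/pip freeze | grep -iE 'openstackclient|cinderclient|openstacksdk'
    # Pin the suspect clients back in the new venv; the version bounds are hypothetical.
    $VENV/bin/pip install 'python-openstackclient<6.0' 'python-cinderclient<8.0'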
19:12:06 <clarkb> are there any other bridge updates or changes to call out?
19:12:45 <ianw> the ansible 6 work is close
19:13:00 <ianw> #link https://review.opendev.org/q/topic:bridge-ansible-update
19:13:08 <ianw> anything with a +1 can be reviewed :)
19:13:16 <clarkb> oh right I pulled up the change there that I haven't reviewed yet but saw it was huge compared to the others and got distracted :)
19:13:28 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/865092
19:13:48 <ianw> is one for disabling known_hosts now that we have the global one
19:14:05 <clarkb> ianw: I think you can go ahead and approve that one if you like. I mostly didn't approve it just so that we could take a minute to consider if there was a valid use case for not doing it
19:15:04 <ianw> ok, i can't think of one :)  i have in my todo to look at making it a global thing ... because i don't think we should have any logins between machines that we don't explicitly codify in system-config
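For context, the usual OpenSSH way to force all host key trust through one managed file looks roughly like the sketch below; this illustrates the approach in general, not necessarily what change 865092 implements:

    # Illustrative client config, e.g. dropped into /etc/ssh/ssh_config.d/
    Host *
        # Ignore per-user known_hosts files so nothing gets trusted ad hoc.
        UserKnownHostsFile /dev/null
        # Only trust host keys recorded in the centrally managed global file.
        GlobalKnownHostsFile /etc/ssh/ssh_known_hosts
        StrictHostKeyChecking yes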
19:15:12 <ianw> #link https://review.opendev.org/q/topic:prod-bastion-group
19:15:28 <ianw> has a couple of changes still if we want to push on the parallel running of jobs.
19:15:47 <ianw> (it doesn't do anything in parallel, but sets up the dependencies so they could, in theory)
19:16:03 <clarkb> ++ I do want to push on that. But I also want to be able to observe things as they change
19:16:13 <clarkb> probably won't dig into that until next week
19:16:16 <ianw> that is very high touch, and i don't have 100% confidence it wouldn't cause problems, so not a fire and forget
19:16:20 <ianw> ++
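For reference, declaring that ordering in the deploy pipeline's project config looks roughly like this; the job names are illustrative placeholders, not the actual changes under the topic above:

    - project:
        deploy:
          jobs:
            - infra-prod-base
            - infra-prod-service-example:      # illustrative job name
                dependencies:
                  - infra-prod-base            # must complete before this job runs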
19:16:52 <ianw> other than that, i just want to write up something on the migration process, and i think it's (finally!) done
19:17:01 <clarkb> excellent. Thanks again
19:17:14 <ianw> there are no more points i can think of that we can really automate more
19:18:05 <clarkb> sounds like we can move on to the next item
19:18:07 <clarkb> #topic Upgrading Servers
19:18:25 <clarkb> I was hoping to make progress on another server last week then I got sick and now this is behind a few other efforts :/
19:18:46 <clarkb> that said one of those efforts is mailman3 and does include a server upgrade. Lets just skip ahead to talk about that
19:18:49 <clarkb> #topic Mailman 3
19:19:05 <clarkb> changes for mailman3 config management have landed
19:19:20 <clarkb> after fungi picked this back up and addressed some issues with newer mailman and django
19:19:46 <clarkb> I believe the next step is to launch a node, add it to our inventory and get mailman3 running on it. Then we can plan migrations for opendev and zuul?
19:19:57 <clarkb> er I guess they are already planned. We can go ahead and perform them :)
19:20:45 <fungi> yeah, the migration plan is outlined here:
19:20:57 <fungi> #link https://etherpad.opendev.org/p/mm3migration Mailman 3 Migration Plan
19:22:18 <clarkb> anything else to add to this?
19:22:59 <ianw> sounds great, thanks!
19:23:01 <fungi> i'm thinking since we said let's do go/no-go in the meeting today for a monday 2022-11-28 maintenance, i'm leaning toward no-go since i'm currently struggling to get the server created and expect to not be around much on thursday/friday (also a number of people are quite unlikely to see an announcement this week)
19:23:22 <clarkb> fungi: that is reasonable. no objection from me
19:23:36 <ianw> yeah i wouldn't want to take this one on alone :)
19:24:21 <fungi> later next week might work, we could do a different day than friday given clarkb's unavailability
19:24:45 <fungi> also apologies for typos and slow response, i seem to be suffering some mighty lag/packet loss at the moment
19:25:23 <clarkb> yup later that week works for me. I'm just not around that friday
19:25:25 <fungi> anyway, nothing else on this topic
19:25:26 <clarkb> (the 2nd)
19:25:42 <fungi> mainly working to get the prod server booted and added to inventory
19:25:43 <clarkb> cool, let's move on to the next thing
19:25:53 <clarkb> #topic Using pip wheel in python base images
19:26:17 <clarkb> The change to update how our assemble system creates wheels in our base images has landed
19:26:36 <clarkb> At this point I don't expect any problems due to the testing I did with nodepool which covered siblings and all that
19:26:50 <clarkb> But if you do see problems please make note of them so that they can be debugged, but reverting is also fine
19:26:58 <clarkb> mostly just a heads up
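For anyone unfamiliar with the mechanism, the general two-stage pattern is to build wheels for the project and all of its dependencies in the builder image, then install only from that wheel directory in the final image. A rough sketch of the pattern (not the exact assemble script):

    # builder stage: build wheels for the project and everything it needs
    pip wheel --wheel-dir /output/wheels .
    # final stage: install offline from the prebuilt wheels only
    pip install --no-index --find-links /output/wheels example-project  # placeholder package name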
19:28:35 <clarkb> #topic Vexxhost instance rescuing
19:28:55 <clarkb> I didn't want this to fall off our minds without some sort of conclusion
19:29:11 <clarkb> For normally booted nodes we can create a custom boot image that uses a different device label for /
19:29:27 <clarkb> we can create this with dib and upload it, or make a snapshot of an image we've done surgery on in the cloud.
19:29:58 <clarkb> However, for BFV nodes we would need a special image with special metadata flags and I have no idea what the appropriate flags would be
19:30:38 <clarkb> given that, do we want to change anything about how we operate in vexxhost? previously people had asked about console login
19:31:15 <clarkb> I'm personally hopeful that public clouds would make this work out of the box for users. But that isn't currently the case so I'm open to ideas
19:31:57 <ianw> i guess review is the big one ...
19:32:17 <ianw> i guess console login only helps if it gets to the point of starting tty's ... so not borked initrd etc.
19:32:35 <clarkb> ya its definitely a subset of issues
19:32:58 <clarkb> maybe our effort is best spent trying to help mnaser figure out what that metadata for bfv instances is?
19:33:08 <clarkb> er metadata for bfv rescue images is
19:33:41 <fungi> yes, being able to mount the broken server's rootfs on another server instance is really the only 100% solution
19:33:47 <ianw> it does feel like there's not much replacement for it
19:34:16 <ianw> i'm trying to think of things in production that have stopped boot
19:34:33 <clarkb> ianw: the lists server is the main one. But thats due to its ancientness
19:34:39 <clarkb> (which we are addressing by replacing it)
19:34:42 <ianw> a disk being down in the large static volume and not mounting was one i can think of
19:35:24 <ianw> i think i had to jump into rescue mode for that once
19:35:56 <fungi> for that case, having a ramdisk image in the bootloader is useful
19:36:22 <fungi> but not for cases where the hypervisor can't fire the bootloader at all (rackspace pv lists.o.o)
19:37:06 <clarkb> ya it definitely seems like rescuing is the most versatile tool. I'll see if I can learn anything more about how to configure ceph for it
19:37:26 <clarkb> in particular jrosser had input and we might be able to try setting metadata like jrosser did and see if it works :)
19:38:11 <clarkb> and if we can figure it out then other vexxhost users will potentially benefit too
19:38:15 <clarkb> so worthwhile I think
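For reference, uploading a purpose-built rescue image and attaching metadata to it would look something like the sketch below; the hw_rescue_* properties are an assumption borrowed from nova's stable device rescue feature and may not be the flags vexxhost's BFV rescue actually requires:

    # Upload a dib-built rescue image and set candidate rescue properties.
    openstack image create \
      --file rescue.qcow2 --disk-format qcow2 --container-format bare \
      --property hw_rescue_bus=virtio \
      --property hw_rescue_device=disk \
      opendev-rescue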
19:39:09 <clarkb> #topic Quo vadis Storyboard
19:39:25 <clarkb> It's been about two weeks since I sent out another email asking for more input
19:39:35 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-October/000370.html
19:39:49 <clarkb> we've not had any responses since my last email. I think that means we should probably consider feedback gathered
19:40:51 <clarkb> At this point I think our next step is to work out amongst ourselves (but probably still on the mailing list?) what we think opendev's next steps should be
19:41:31 <clarkb> to summarize the feedback from our users, it seems there aren't any specifically interested in using storyboard or helping to maintain/develop it.
19:41:47 <clarkb> There are users interested in migrating from storyboard to launchpad
19:41:48 <ianw> the options are really to upgrade it to something maintainable or somehow gracefully shut it down?
19:42:13 <frickler> do we know whether there is any other storyboard deployment beside ours?
19:42:22 <clarkb> ianw: yes I think ultimately where that puts us is we need to decide if we are willing to maintain it ourselves (including necessary dev work). This would require a shift in our priorities considering it hasn't been updated yet
19:42:39 <clarkb> ianw: and if we don't want to do that figure out if migrating to something else is feasible
19:42:56 <clarkb> frickler: there were once upon a time but I'm not sure today
19:44:08 <fungi> as always people can fork or take up maintenance later if it's something they're using, regardless of whether we're maintaining it
19:45:03 <clarkb> right I'm not too worried about others unless they are willing to help us. And so far no one has indicated that is something they can or want to do
19:45:26 <clarkb> I don't think we should make any decisions in the meeting. But I wanted to summarize where we've ended up and what our two likely paths forward appear to be
19:45:27 <ianw> it feels like if we're going to expend a fair bit of effort, it might be worth thinking about putting that into figuring out something like using gitea issues
19:45:56 <clarkb> ianw: my main concern with that is I don't think we can effectively disable repo usage and allow issues
19:46:06 <clarkb> its definitely something that can be investigated though
19:46:29 <fungi> right now there are a lot of features in gitea which we avoid supporting because we aren't allowing people to have gitea accounts
19:46:47 <clarkb> and a number of features we explicitly don't want
19:47:07 <fungi> as soon as we do allow gitea accounts, we need to more carefully go through the features and work out which ones can be turned off or which we're stuck supporting because people will end up using them
19:47:17 <clarkb> But we've got really good CI for gitea so testing setups shouldn't be difficult if people want to look into that.
19:47:48 <fungi> also not allowing authenticated activities has shielded us from a number of security vulnerabilities in gitea
19:47:57 <clarkb> ++
19:48:37 <clarkb> anyway I do think we should continue the discussion on the mailing list. I can write a followup indicating that we have options like maintain it ourselves and do the work, migrate to something else (lp and/or gitea) and try to outline the effort required for each?
19:48:51 <clarkb> Do we want that in a separate thread? Or should I keep rolling forward with the existing one
19:49:27 <ianw> i do agree, but it feels like it would be useful time spent investigating that path
19:52:58 <ianw> it probably also loops back to our authentication options, something that would also be useful to work on i guess
19:53:21 <clarkb> ya at the end of the day everything ends up intertwined somehow and we're doing our best to try and prioritize :)
19:53:43 <clarkb> (fwiw I think the recent priorities around gerrit and zuul and strong CI have been the right choice. They have enabled us to move more quickly on other things as necessary)
19:54:10 <clarkb> sounds like no strong opinion on where exactly I send the email. I'll take a look at the existing thread and decide if a follow up makes sense or not
19:54:19 <clarkb> but I'll try to get that out later today or tomorrow and we can pick up discussion there
19:54:41 <fungi> agreed, i would want to find time to put into the sso spec if we went that route
19:56:19 <clarkb> #topic Open Discussion
19:56:24 <clarkb> Anything else before we run out of time?
19:57:07 <frickler> just thx to ianw for the mastodon setup
19:57:13 <fungi> reminder that i don't expect to be around much thursday through the weekend
19:57:19 <clarkb> fungi: me too
19:57:21 <frickler> that went much easier than I had expected
19:57:30 <clarkb> my kids actually have wednesday - monday out of school
19:57:33 <clarkb> I wish I realized that sooner
19:57:51 <fungi> well, if you're not around tomorrow i won't tell anyone ;)
19:58:11 <ianw> ++ enjoy the turkeys
19:59:51 <clarkb> and we are at time. Thank you for your time everyone. We'll be back next week
19:59:54 <fungi> thanks clarkb!
19:59:54 <clarkb> #endmeeting