19:00:02 <clarkb> #startmeeting infra
19:00:02 <opendevmeet> Meeting started Tue May 20 19:00:02 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:02 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:02 <opendevmeet> The meeting name has been set to 'infra'
19:00:10 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/VLMXFS7RL6WB3XG26DGRRRLL72WOZ7YM/ Our Agenda
19:00:22 <clarkb> #topic Announcements
19:00:51 <clarkb> I didn't have anything super important to announce
19:01:31 <clarkb> was there anything to announce from anyone else?
19:02:47 <clarkb> Sounds like no
19:02:52 <clarkb> #topic Zuul-launcher image builds
19:03:00 <clarkb> mnasiadka has continued to push this along
19:03:05 <clarkb> #link https://review.opendev.org/c/opendev/zuul-providers/+/949696 Rocky Images
19:03:53 <clarkb> reviewing that is on my todo list, and I think that concludes the image porting from nodepool into zuul-launcher except for gentoo and openeuler
19:04:30 <clarkb> corvus was there anything else to add to this topic?
19:06:10 <clarkb> I guess not. That said there is a related topic of the whole CentOS 10 hardware requirement problem that is probably worth calling out here too
19:06:51 <clarkb> tonyb did some investigating by manually booting nodes in each cloud region we operate within and checking the cpus there. tl;dr is that every cloud but rax classic should support CentOS 10 x86-64-v3 requirements
19:07:08 <clarkb> https://paste.opendev.org/show/827859/
19:07:23 <clarkb> that also roughly aligns with where we are able to support nested virt labels
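(For context, checking x86-64-v3 support comes down to looking for a specific set of CPU feature flags on the booted node. Below is a minimal, hypothetical Python sketch of that kind of check; it is not necessarily the method tonyb used, and the flag names follow /proc/cpuinfo's spelling, where lzcnt shows up as abm.)

    # Hypothetical check for the x86-64-v3 microarchitecture level that
    # CentOS Stream 10 requires, based on /proc/cpuinfo flags.
    import sys

    V3_FLAGS = {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "movbe", "abm", "xsave"}

    def supports_x86_64_v3(path="/proc/cpuinfo"):
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return V3_FLAGS <= set(line.split(":", 1)[1].split())
        return False

    if __name__ == "__main__":
        ok = supports_x86_64_v3()
        print("x86-64-v3 supported:", ok)
        sys.exit(0 if ok else 1)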
19:07:59 <corvus_> on images: we should be running some changes which are intended to cut down on some minor issues we were seeing
19:08:06 <corvus_> but i haven't had a chance to check on that yet
19:08:25 <clarkb> I suggested earlier today that testing of centos 10 images (glean, dib, etc) can simply rely on those labels for now. Then if/when we get to deploying images directly for centos 10 stream we can give them nested virt only labels
19:08:39 <corvus_> (we were in an image upload loop because we were missing some image builds).  i'll check on that soon.
19:08:42 <clarkb> then the typical special label rules apply (use them when necessary, be on the lookout for problems)
19:08:45 <clarkb> corvus_: ack good to know
19:09:40 <clarkb> but then the next steps for centos 10 stream will be getting the testing of centos 10 stream sorted out for glean and dib. Then once we're happy with that we can decide if/how we're deploying images directly with zuul-launcher/nodepool
19:10:00 <clarkb> the glean change for network manager keyfiles needs some testing updates that mnasiadka offered to make
19:10:23 <clarkb> that's probably step 0, then we can update dib (and do a dib release), and then consider what to do in zuul-launcher/nodepool
19:10:45 <clarkb> anything else related to image builds?
19:10:52 <corvus> what do you mean "what to do"?
19:11:01 <corvus> like, which one to use?
19:11:05 <corvus> or some other question?
19:11:46 <clarkb> corvus: mostly how we can support centos 10 as a top level label/image within our environment given the hardware requirements
19:12:04 <clarkb> corvus: for testing dib and glean (which build and boot a nested centos 10) I think we can just use nested virt labels for those jobs
19:12:35 <clarkb> but when it comes to adding centos 10 stream into nodepool/zuul-launcher do we want to only add it as a nested virt label, add it as a regular label only in clouds that can boot it, not add it at all, etc
19:13:01 <clarkb> the thought I had about adding it as a nested virt label is it gives you some indication that it's different which might be lost if we add it normally and just don't upload to some clouds
19:13:12 <clarkb> but that distinction may not be very important either
19:13:31 <corvus> i see, more policy questions around an image that is supported by 40% of resources
19:13:38 <clarkb> right
19:14:08 <corvus> thx. that's all i have on the topic
19:14:25 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:14:34 <clarkb> #link https://www.gerritcodereview.com/3.11.html
19:14:49 <clarkb> as mentioned last week if you can pull this up and skim over the notes there for things to be cautious of that would be great
19:14:55 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.11 Planning Document for the eventual Upgrade
19:15:10 <clarkb> I did start trying to collect info in this document but it's early so it needs more work. But feel free to add concerns to that document
19:15:20 <clarkb> 104.130.253.194 is a held Gerrit 3.11 node for testing purposes.
19:15:38 <clarkb> I did a quick look around ^ and things don't appear very different visually
19:16:09 <clarkb> There are two pre upgrade tasks I would like to get done this week though. The first is updating our images to 3.10.6 and 3.11.3 and the other is moving the images to quay
19:16:14 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/949778 Update Gerrit images to 3.10.6 and 3.11.3
19:16:19 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/882900 Host Gerrit images on quay.io
19:16:36 <clarkb> assuming nothing comes up before tomorrow morning I'll probably start approving things then with plans to restart the service as necessary
19:17:30 <clarkb> which brings us to the last Gerrit related update I had: Shutting down gerrit can race with new changes being indexed. If we shut down before the change is properly indexed then subsequent pushes can create a new change with a different change number using the same change id on the same branch
19:18:12 <clarkb> I discussed this a bunch upstream and Luca says he always reindexes after restarting gerrit and this is the workaround we should be using for now
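(For reference, Gerrit exposes an online reindex of individual changes via its SSH command interface. A minimal sketch of that workaround follows, assuming an account with the needed capabilities; the change numbers shown are placeholders, and a full reindex of everything is the other option.)

    import subprocess

    def reindex_changes(change_numbers, host="review.opendev.org", port=29418):
        # Ask Gerrit to reindex specific changes via 'gerrit index changes'.
        cmd = ["ssh", "-p", str(port), host, "gerrit", "index", "changes"]
        cmd += [str(n) for n in change_numbers]
        subprocess.run(cmd, check=True)

    # e.g. the changes pushed just before the restart (placeholder numbers)
    reindex_changes([949778, 882900])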
19:18:51 <clarkb> historically many of our restarts have coincided with version upgrades, many of which automatically reindex, so that mitigates things. But when you restart for other reasons or restart to upgrade versions that don't require reindexing you can end up in this trap
19:20:01 <clarkb> I also filed an issue upstream and captured some of mfick's design thoughts on addressing this. Basically we write a flag file to disk for the change(s) that have been pushed before they are indexed recording the thread id responsible for indexing the changes. Then we can run a monitor thread that looks for those files that don't have a corresponding running thread in the
19:20:03 <clarkb> system. If it finds them it will reindex the changes listed
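(To make that design a little more concrete, here is a rough illustration in Python rather than Gerrit's Java; every name and path is hypothetical. The idea is simply: write a flag file per change recording the indexing thread, then have a monitor reindex anything whose recorded thread is no longer alive.)

    import json
    import threading
    from pathlib import Path

    FLAG_DIR = Path("/var/gerrit/index-flags")  # hypothetical location

    def write_flag(change_number):
        # Record which thread is responsible for indexing this change.
        FLAG_DIR.mkdir(parents=True, exist_ok=True)
        flag = FLAG_DIR / f"{change_number}.json"
        flag.write_text(json.dumps({"thread": threading.get_ident()}))
        return flag

    def reindex_orphans(reindex):
        # Reindex changes whose indexing thread died before finishing.
        live = {t.ident for t in threading.enumerate()}
        for flag in FLAG_DIR.glob("*.json"):
            if json.loads(flag.read_text()).get("thread") not in live:
                reindex(int(flag.stem))
                flag.unlink()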
19:20:21 <clarkb> I get the sense that no one is going to work on that unless I dig into it myself. Which is fine, but I'm not sure when I'll have time to figure that out
19:20:43 <fungi> i take it they don't want to block graceful shutdown until the pending index queue is empty
19:20:45 <clarkb> so anyway tl;dr we should reindex changes after gerrit restarts which is relevant to updating to 3.10.6 and/or moving to quay
19:21:05 <clarkb> fungi: yes that was the impression I got. I asked if graceful shutdown should be more graceful and got those other suggestions instead
19:21:17 <fungi> fun
19:21:23 <clarkb> fungi: I think the reason for that is apparently there are some other situations that can lead to this
19:21:34 <clarkb> so it's best to have the monitoring system that catches things up quickly in all cases not just restarts
19:21:57 <fungi> yeah, i guess that way it's also robust in the face of ungraceful stopping
19:22:01 <clarkb> (they weren't super specific on those conditions but HA gerrit has a system in place to deal with a similar problem for example)
19:22:29 <clarkb> unfortunately the ha system requires comparing indexes between gerrits so doesn't work standalone
19:23:10 <clarkb> overall I feel like we're making slow progress on the 3.11 upgrade, but other things need to be done first just to get them out of the way
19:23:18 <clarkb> so I'll keep pushing on those items and hopefully get that done this week
19:23:28 <clarkb> any other gerrit concerns/issues/thoughts?
19:24:15 <clarkb> #topic Upgrading old servers
19:24:37 <clarkb> This is something I wanted to have time for but then got sniped by gerrit related stuff (and other things like PBR which we can talk about during open discussion)
19:24:50 <clarkb> did anyone else have server upgrade updates?
19:25:04 <tonyb> Not from me
19:25:34 <clarkb> fungi: with everything else going on I'm assuming we have no refstack updates yet
19:25:48 <fungi> nope
19:25:53 <fungi> sorry!
19:25:54 <clarkb> #topic Working through our TODO list
19:25:59 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:26:23 <clarkb> just our regular reminder we've got this list. If you'd like to help out feel free to look it over and send any questions our way. And if we have new things that are back burnered we can add them to the list
19:26:34 <clarkb> But I don't have any updates to this. Just a reminder it exists
19:26:40 <clarkb> #topic Rotating mailman 3 logs
19:26:52 <clarkb> fungi: I think this change is still open. I was thinking this would be another good one to get in this week
19:26:57 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/948478
19:27:16 <fungi> yah, happy to self-approve if there's nobody else to review it
19:27:32 <clarkb> I think that would be great. I can help with monitoring and debugging if it comes to that
19:27:53 <fungi> done
19:28:01 <clarkb> #topic opendev.org Matrix Homeserver has been Updated by EMS
19:28:21 <clarkb> on May 12 we got a notice from EMS that they would be upgrading our homeserver which would come with additional authentication requirements
19:28:46 <clarkb> They applied that update on May 19 at about 11:30UTC
19:29:08 <clarkb> since then the matrix eavesdrop bot and gerritbot both continue to operate so I don't think this update impacted the bots
19:29:16 <clarkb> if you notice that either bot does stop working as expected please let us know
19:30:08 <clarkb> #topic OFTC Matrix bridge no longer supporting new users
19:30:31 <clarkb> People have begun noticing that some people communicating on IRC don't end up having their messages sent across the bridge
19:30:49 <clarkb> it appears that anyone with a new nick on IRC using an IRC client is affected
19:30:54 <clarkb> #link https://github.com/matrix-org/matrix-appservice-irc/issues/1851
19:31:38 <clarkb> I suspect this is related to the warning the matrix foundation made about not being able to support the bridges without additional funding. It seems like they may just be allowing things to die on the vine. The bridges are still running but issues like this are not addressed/debugged
19:32:06 <clarkb> unfortunately there is no official word on what they are doing with the bridges. But the functionality has definitely degraded in the last few weeks
19:32:22 <clarkb> This is not ideal and I think gives us a few things to consider.
19:33:25 <clarkb> We could go back to suggesting/encouraging people use matrix clients for matrix and irc clients for irc and not try to bridge. We could host a bridge either by paying EMS to do so (this could be a big increase in our homeserver hosting costs) or running one ourselves. Finally, we could embrace matrix more and move more in that direction
19:33:56 <clarkb> if we do look into hosting our own bridge this blog post has some useful info: https://postmarketos.org/blog/2025/03/31/matrix-bridge-migration/
19:35:08 <corvus> if there's any appetite to pay ems, i'd be happy to come up with a cost estimate, but it's definitely non-zero, certainly at least like $50/mo.
19:35:21 <fungi> i use an irc client with a matrix protocol plugin, though it's not without its rough edges and shortcomings
19:35:24 <clarkb> Personally, I've been reasonably happy with matrix since we've started using it for some things (like zuul). In particular I think it represents a good compromise between IRC and slack/discord with the important bits for each party being present in matrix. Matrix preserves the openness and federation from IRC. Matrix operates over HTTPS and has clients that are more familiar
19:35:26 <clarkb> to those who might want to use slack or discord
19:36:28 <tonyb> I think hosting our own bridge while we work with the community on moving off of IRC... or at least minimising IRC. That said I don't know what that will look like in terms of the EMS bill
19:37:09 <corvus> given that people are still interested in a better experience for new users for all openinfra projects, i think leaning more into matrix would be a good idea, set a good example, and maybe add some inertia.
19:37:32 <clarkb> ya and I guess to be clear I'm not really suggesting we sign up to migrate openstack for example
19:37:52 <clarkb> but opendev could move its comms into matrix and that is a much more straightforward move. Set an example then maybe others would follow
19:38:15 <corvus> right.  i'm hoping to smooth the way.  :)
19:38:19 <clarkb> I don't think frickler was able to attend meetings this evening and I know frickler has had thoughts on this in the past. I'd be curious if any of those opinions have changed
19:38:28 <tonyb> We'd need good docs; as I said I found matrix very confusing to set up, and I'm still not confident I've done it optimally
19:38:53 <fungi> as in picking non-default settings in the element webclient?
19:39:47 <fungi> it does seem like it's tuned more for people coming from slack and discord type interfaces rather than catering to people looking for a more irc-like interface, if that's what you mean
19:39:56 <tonyb> No more fundamental like home server choice
19:40:03 <fungi> oh
19:40:47 <fungi> it seemed to me like the guided new account configuration in element was straightforward and just picked matrix.org by default
19:41:14 <clarkb> I'm not sure there is an optimal choice there. There is easy mode (just use matrix.org) and there is host your own. The choice will depend on how far into the matrix federated server network you want to go
19:41:34 <tonyb> Okay.
19:41:42 <clarkb> I suspect the vast majority of our users would use matrix.org for the simplicity and I don't think there is anything wrong with that (I do that myself)
19:42:10 <fungi> yeah, it may be that people who go into this expecting complicated instructions are finding those instead of the easy ones
19:42:11 <corvus> it's kind of an interesting question; we're not used to choosing "what kind of autonomy over your online identity do you want?", but with matrix, bluesky, and mastodon, etc, maybe that will change?  but yeah, it doesn't have to be hard and we can tell people the easy mode.
19:42:18 <tonyb> Just repeating that I found it nontrivial but that can be dealt with via docs
19:43:09 <tonyb> I worry that I picked easy mode and now I can't update to hard should I want to.
19:43:17 <tonyb> Anyway that's a tangent
19:44:14 <clarkb> and to be clear I don't think we're making any decisions here. I wanted to bring up the observed issue and then what I consider to be viable options for proceeding. I do think this may be a good push in the direction of using matrix which should address problems people have complained about with IRC
19:44:23 <fungi> the simple way is to use the element web client account creation that presents minimal options, and then if you want to set up a different client to connect to that account or create a different account on another homeserver you can do those things later after you have a better grasp of the fundamentals
19:44:33 <clarkb> but yes that potentially brings new problems. However, we've been using matrix a fair bit with zuul and I haven't found anything I would consider a deal breaker
19:45:01 <clarkb> Most of the issues I've had with matrix have to do with encryption which we would not do for an opendev room
19:45:47 <clarkb> mull it over for the next week and if there are strong opinions or new things to consider we can discuss in more depth next week
19:46:10 <corvus> thanks clarkb !
19:46:40 <clarkb> I guess the meetbot would not work so we'd stop getting fancy meeting notes until we updated an existing matrix bot or wrote a new one
19:46:52 <clarkb> but I don't think that is a deal breaker particularly if the switch is scoped to opendev
19:47:07 <clarkb> (it's one meeting each week and I can even produce notes manually if necessary)
19:47:16 <clarkb> #topic Open Discussion
19:47:36 <clarkb> I didn't think to put this on the agenda but probably should've. PBR has until ~october before setuptools breaks it completely
19:47:51 <clarkb> OpenStack is aware of this and stephen finucane has been looking into workarounds
19:48:20 <clarkb> for OpenDev I'm wondering if we should consider using something like setuptools scm since we don't rely on the "fancier" PBR features like auto version bumping based on commit message content
19:48:47 <clarkb> The main feature we'd lose is the git hash for the commit used to build the package. Which is admittedly a very nice feature to have particularly with the supply chain concerns of 2025
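(A minimal sketch of what the setuptools_scm route could look like for a simple project; this is brainstorming, not an agreed migration, and a real change would also touch pyproject.toml and drop the pbr hooks from setup.cfg.)

    # hypothetical setup.py for a project switched from pbr to setuptools_scm
    from setuptools import setup

    setup(
        # derive the package version from git tags, much like pbr does today
        use_scm_version=True,
        setup_requires=["setuptools_scm"],
    )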
19:49:23 <clarkb> again this is thinking out loud/brainstorming that I want to bring up as there may be better ideas out there that i haven't considered yet.
19:49:42 <clarkb> And we probably can just stick with the status quo and help update PBR so that it works well into the future (something that fungi and I are likely to be doing either way)
19:50:01 <fungi> yeah
19:50:49 <clarkb> but unlike openstack, opendev's python tools aren't tightly coupled to pbr features in a way that would prevent us from switching, so I thought I'd throw that out there as an option
19:51:19 <clarkb> and we have plenty of time. Consider yourselves all warned and we should keep an eye on this over time and see if we need to change anything and where we can possibly help out
19:51:31 <clarkb> that was all I had. Anything else?
19:51:33 <fungi> unrelated minor item... i've taken an initial pass at moving a bunch of the content from the opendev.org main page into the infra manual, if anyone else is interested in reviewing that pair of changes:
19:51:36 <fungi> #link https://review.opendev.org/c/opendev/infra-manual/+/949924 Add frequently asked questions
19:51:38 <fungi> #link https://review.opendev.org/c/opendev/system-config/+/949939 Remove content duplicated in the Infra Manual FAQ
19:51:47 <corvus> mordred did some related work with poetry (and i extended that a bit to use some dynamic version stuff).  just mentioning that as potentially relevant work in the space.
19:52:27 <clarkb> fungi: both changes have plenty of +2's; are you looking for more consensus?
19:52:49 <clarkb> just wondering if we should proceed or if you'd like broader input given it affects the "front page"
19:52:52 <fungi> yeah, just didn't want to approve it if there are outstanding questions folks may have
19:53:19 <fungi> it seems there's at least some support of this direction
19:53:54 <fungi> but that content has been there a long time, so giving a little longer to review isn't a problem, mainly want to make sure everyone is aware it's there
19:54:23 <fungi> aware the changes to move it exist, i mean
19:54:42 <fungi> i don't want this to surprise any regular participants in the meeting, at least
19:55:17 <fungi> so if you don't want that content moved off the main page, please raise objections on the changes asap
19:55:21 <clarkb> ++
19:56:19 <corvus> is there a preview of the homepage?
19:56:38 <corvus> https://44a737a17d7e744da544-e8c985e942bf44b286d3f5e0d40a9d67.ssl.cf5.rackcdn.com/openstack/c6b69397f8f24c6aa2e4f293a612c23e/bridge99.opendev.org/screenshots/gitea-main.png
19:56:44 <clarkb> https://44a737a17d7e744da544-e8c985e942bf44b286d3f5e0d40a9d67.ssl.cf5.rackcdn.com/openstack/c6b69397f8f24c6aa2e4f293a612c23e/bridge99.opendev.org/screenshots/gitea-main.png
19:56:44 <corvus> i guess?
19:56:46 <clarkb> yup that
19:56:51 <fungi> yes, exactly
19:57:36 <fungi> i'd like to make some updates to the page, but whittling down what's there already is a first step
19:57:43 <corvus> i could leave this as a comment, or make a followup, but since we're discussing... how about cloud donors at the bottom?
19:58:13 <corvus> i don't mean to slight them at all... the bottom of the page is a pretty important part of the page too.  :)
19:58:24 <fungi> it was at the bottom; i moved it up the page for better visibility, though if others think that's not important then i can split it into a separate change for further discussion
19:58:55 <clarkb> I'm indifferent to that. I think trimming the content on the most important bits (what it is, who supports it, how to contact us) is a good idea though
19:59:18 <fungi> yeah, i'm relatively meh on what order we put those remaining sections in
19:59:26 <clarkb> I think by trimming the content all the content becomes more important so the order is less important to me
19:59:47 <corvus> the new location puts them before the service manifesto or our own contact info
20:00:36 <clarkb> that's a good point. It breaks the content in half I guess
20:00:44 <fungi> also i'm aware the prose could use a little rework after the other content removal in order to flow better; i didn't make any changes to the content that was there and just either left sections or moved them verbatim (but moved the donors list up the page to improve visibility)
20:00:50 <corvus> i don't feel strongly about it, but a progression from "what we do, why we do, how we do" from top to bottom would sort of make sense to me
20:00:58 <corvus> and have some symmetry
20:01:05 <clarkb> ya I think that does make sense from a content flow
20:01:09 <clarkb> but also we are at time
20:01:14 <fungi> wfm, thanks
20:01:22 <corvus> definitely not a -1 :)
20:01:23 <clarkb> thanks everyone. Feel free to continue the discussion in #opendev or on the mailing list
20:01:26 <clarkb> #endmeeting