19:00:02 <clarkb> #startmeeting infra
19:00:02 <opendevmeet> Meeting started Tue May 20 19:00:02 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:02 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:02 <opendevmeet> The meeting name has been set to 'infra'
19:00:10 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/VLMXFS7RL6WB3XG26DGRRRLL72WOZ7YM/ Our Agenda
19:00:22 <clarkb> #topic Announcements
19:00:51 <clarkb> I didn't have anything super important to announce
19:01:31 <clarkb> was there anything to announce from anyone else?
19:02:47 <clarkb> Sounds like no
19:02:52 <clarkb> #topic Zuul-launcher image builds
19:03:00 <clarkb> mnasiadka has continued to push this along
19:03:05 <clarkb> #link https://review.opendev.org/c/opendev/zuul-providers/+/949696 Rocky Images
19:03:53 <clarkb> reviewing that is on my todo list, and I think that concludes the image porting from nodepool into zuul-launcher except for gentoo and openeuler
19:04:30 <clarkb> corvus was there anything else to add to this topic?
19:06:10 <clarkb> I guess not. That said there is a related topic of the whole CentOS 10 hardware requirement problem that is probably worth calling out here too
19:06:51 <clarkb> tonyb did some investigating by manually booting nodes in each cloud region we operate within and checking the cpus there.
tl;dr is that every cloud but rax classic should support CentOS 10 x86-64-v3 requirements
19:07:08 <clarkb> https://paste.opendev.org/show/827859/
19:07:23 <clarkb> that also roughly aligns with where we are able to support nested virt labels
19:07:59 <corvus_> on images: we should be running some changes which i intended to cut down on some minor issues we were seeing
19:08:06 <corvus_> but i haven't had a chance to check on that yet
19:08:25 <clarkb> I suggested earlier today that we can suggest testing of centos 10 images (glean, dib, etc) simply rely on those labels for now. Then if/when we get to deploying images directly for centos 10 stream we can give them nested virt only labels
19:08:39 <corvus_> (we were in an image upload loop because we were missing some image builds). i'll check on that soon.
19:08:42 <clarkb> then the typical special label rules apply (use them when necessary, be on the lookout for problems)
19:08:45 <clarkb> corvus_: ack good to know
19:09:40 <clarkb> but then the next steps for centos 10 stream will be getting the testing of centos 10 stream sorted out for glean and dib. Then once we're happy with that we can decide if/how we're deploying images directly with zuul-launcher/nodepool
19:10:00 <clarkb> the glean change for network manager keyfiles needs some testing updates that mnasiadka offered to make
19:10:23 <clarkb> that's probably step 0, then we can update dib (and do a dib release), then we're considering what to do in zuul-launcher/nodepool
19:10:45 <clarkb> anything else related to image builds?
19:10:52 <corvus> what do you mean "what to do"?
19:11:01 <corvus> like, which one to use?
19:11:05 <corvus> or some other question?
19:11:46 <clarkb> corvus: mostly how we can support centos 10 as a top level label/image within our environment given the hardware requirements
19:12:04 <clarkb> corvus: for testing dib and glean (which build and boot a nested centos 10) I think we can just use nested virt labels for those jobs
19:12:35 <clarkb> but when it comes to adding centos 10 stream into nodepool/zuul-launcher do we want to only add it as a nested virt label, add it as a regular label only in clouds that can boot it, not add it at all, etc
19:13:01 <clarkb> the thought I had about adding it as a nested virt label is it gives you some indication that it's different which might be lost if we add it normally and just don't upload to some clouds
19:13:12 <clarkb> but that distinction may not be very important either
19:13:31 <corvus> i see, more policy questions around an image that is supported by 40% of resources
19:13:38 <clarkb> right
19:14:08 <corvus> thx. that's all i have on the topic
19:14:25 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:14:34 <clarkb> #link https://www.gerritcodereview.com/3.11.html
19:14:49 <clarkb> as mentioned last week if you can pull this up and skim over the notes there for things to be cautious of that would be great
19:14:55 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.11 Planning Document for the eventual Upgrade
19:15:10 <clarkb> I did start trying to collect info in this document but it's early so needs more work. But feel free to add concerns into that document
19:15:20 <clarkb> 104.130.253.194 is a held Gerrit 3.11 node for testing purposes.
19:15:38 <clarkb> I did a quick look around ^ and things don't appear very different visually
19:16:09 <clarkb> There are two pre upgrade tasks I would like to get done this week though.
The first is updating our images to 3.10.6 and 3.11.3 and the other is moving the images to quay
19:16:14 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/949778 Update Gerrit images to 3.10.6 and 3.11.3
19:16:19 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/882900 Host Gerrit images on quay.io
19:16:36 <clarkb> assuming nothing comes up before tomorrow morning I'll probably start approving things then with plans to restart the service as necessary
19:17:30 <clarkb> which brings us to the last Gerrit related update I had: Shutting down gerrit can race with new changes being indexed. If we shut down before the change is properly indexed then subsequent pushes can create a new change with a different change number using the same change id on the same branch
19:18:12 <clarkb> I discussed this a bunch upstream and Luca says he always reindexes after restarting gerrit and this is the workaround we should be using for now
19:18:51 <clarkb> historically many of our restarts have coincided with version upgrades many of which automatically reindex so that mitigates things. But when you restart for other reasons or restart to upgrade versions that don't require reindexing you can end up in this trap
19:20:01 <clarkb> I also filed an issue upstream and captured some of mfick's design thoughts on addressing this. Basically we write a flag file to disk for the change(s) that have been pushed before they are indexed recording the thread id responsible for indexing the changes. Then we can run a monitor thread that looks for those files that don't have a corresponding running thread in the
19:20:03 <clarkb> system. If it finds them it will reindex the changes listed
19:20:21 <clarkb> I get the sense that no one is going to work on that unless I dig into it myself.
Which is fine, but I'm not sure when I'll have time to figure that out
19:20:43 <fungi> i take it they don't want to block graceful shutdown until the pending index queue is empty
19:20:45 <clarkb> so anyway tl;dr we should reindex changes after gerrit restarts which is relevant to updating to 3.10.6 and/or moving to quay
19:21:05 <clarkb> fungi: yes that was the impression I got. I asked if graceful shutdown should be more graceful and got those other suggestions instead
19:21:17 <fungi> fun
19:21:23 <clarkb> fungi: I think the reason for that is apparently there are some other situations that can lead to this
19:21:34 <clarkb> so it's best to have the monitoring system that catches things up quickly in all cases not just restarts
19:21:57 <fungi> yeah, i guess that way it's also robust in the face of ungraceful stopping
19:22:01 <clarkb> (they weren't super specific on those conditions but HA gerrit has a system in place to deal with a similar problem for example)
19:22:29 <clarkb> unfortunately the ha system requires comparing indexes between gerrits so doesn't work standalone
19:23:10 <clarkb> overall I feel like we're making slow progress on the 3.11 upgrade, but other things need to be done first just to get them out of the way
19:23:18 <clarkb> so I'll keep pushing on those items and hopefully get that done this week
19:23:28 <clarkb> any other gerrit concerns/issues/thoughts?
19:24:15 <clarkb> #topic Upgrading old servers
19:24:37 <clarkb> This is something I wanted to have time for but then got sniped by gerrit related stuff (and other things like PBR which we can talk about during open discussion)
19:24:50 <clarkb> did anyone else have server upgrade updates?
19:25:04 <tonyb> Not from me
19:25:34 <clarkb> fungi: with everything else going on I'm assuming we have no refstack updates yet
19:25:48 <fungi> nope
19:25:53 <fungi> sorry!
19:25:54 <clarkb> #topic Working through our TODO list
19:25:59 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:26:23 <clarkb> just our regular reminder we've got this list. If you'd like to help out feel free to look it over and send any questions our way. And if we have new things that are back burnered we can add them to the list
19:26:34 <clarkb> But I don't have any updates to this. Just a reminder it exists
19:26:40 <clarkb> #topic Rotating mailman 3 logs
19:26:52 <clarkb> fungi: I think this change is still open. I was thinking this would be another good one to get in this week
19:26:57 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/948478
19:27:16 <fungi> yah, happy to self-approve if there's nobody else to review it
19:27:32 <clarkb> I think that would be great. I can help with monitoring and debugging if it comes to that
19:27:53 <fungi> done
19:28:01 <clarkb> #topic opendev.org Matrix Homeserver has been Updated by EMS
19:28:21 <clarkb> on May 12 we got a notice from EMS that they would be upgrading our homeserver which would come with additional authentication requirements
19:28:46 <clarkb> They applied that update on May 19 at about 11:30 UTC
19:29:08 <clarkb> since then the matrix eavesdrop bot and gerritbot both continue to operate so I don't think this update impacted the bots
19:29:16 <clarkb> if you notice that either bot does stop working as expected please let us know
19:30:08 <clarkb> #topic OFTC Matrix bridge no longer supporting new users
19:30:31 <clarkb> People have begun noticing that some people communicating on IRC don't end up having their messages sent across the bridge
19:30:49 <clarkb> it appears that anyone with a new nick on IRC using an IRC client is affected
19:30:54 <clarkb> #link https://github.com/matrix-org/matrix-appservice-irc/issues/1851
19:31:38 <clarkb> I suspect this is related to the warning the matrix foundation made about not being able to support the bridges without
additional funding. It seems like they may just be allowing things to die on the vine. They are running but issues like this are not addressed/debugged
19:32:06 <clarkb> unfortunately there is no official word on what they are doing with the bridges. But the functionality has definitely degraded in the last few weeks
19:32:22 <clarkb> This is not ideal and I think gives us a few things to consider.
19:33:25 <clarkb> We could go back to suggesting/encouraging people use matrix clients for matrix and irc clients for irc and not try to bridge. We could host a bridge either by paying EMS to do so (this could be a big increase in our homeserver hosts) or running one ourselves. Finally, we could embrace matrix more and move more in that direction
19:33:56 <clarkb> if we do look into hosting our own bridge this blog post has some useful info: https://postmarketos.org/blog/2025/03/31/matrix-bridge-migration/
19:35:08 <corvus> if there's any appetite to pay ems, i'd be happy to come up with a cost estimate, but it's definitely non-zero, certainly at least like $50/mo.
19:35:21 <fungi> i use an irc client with a matrix protocol plugin, though it's not without its rough edges and shortcomings
19:35:24 <clarkb> Personally, I've been reasonably happy with matrix since we've started using it for some things (like zuul). In particular I think it represents a good compromise between IRC and slack/discord with the important bits for each party being present in matrix. Matrix preserves the openness and federation from IRC. Matrix operates over HTTPS and has clients that are more familiar
19:35:26 <clarkb> to those who might want to use slack or discord
19:36:28 <tonyb> I think, hosting our own bridge while we work with the community on moving off of IRC. .... OR at least minimising IRC.
That said I don't know what that will look like in terms of the EMS bill
19:37:09 <corvus> given that people are still interested in a better experience for new users for all openinfra projects, i think leaning more into matrix would be a good idea, set a good example, and maybe add some inertia.
19:37:32 <clarkb> ya and I guess to be clear I'm not really suggesting we sign up to migrate openstack for example
19:37:52 <clarkb> but opendev could move its comms into matrix and that is a much more straightforward move. Set an example then maybe others would follow
19:38:15 <corvus> right. i'm hoping to smooth the way. :)
19:38:19 <clarkb> I don't think frickler was able to attend meetings this evening and I know frickler has had thoughts on this in the past. I'd be curious if any of those opinions have changed
19:38:28 <tonyb> We'd need good docs, as I said I found matrix very confusing to set up, and I'm still not confident I've done it optimally
19:38:53 <fungi> as in picking non-default settings in the element webclient?
19:39:47 <fungi> it does seem like it's tuned more for people coming from slack and discord type interfaces rather than catering to people looking for a more irc-like interface, if that's what you mean
19:39:56 <tonyb> No, more fundamental, like home server choice
19:40:03 <fungi> oh
19:40:47 <fungi> it seemed to me like the guided new account configuration in element was straightforward and just picked matrix.org by default
19:41:14 <clarkb> I'm not sure there is an optimal choice there. There is easy mode just use matrix.org and there is host your own. The choice will depend on how far into the matrix federated server network you want to go
19:41:34 <tonyb> Okay.
19:41:42 <clarkb> I suspect the vast majority of our users would use matrix.org for the simplicity and I don't think there is anything wrong with that (I do that myself)
19:42:10 <fungi> yeah, it may be that people who go into this expecting complicated instructions are finding those instead of the easy ones
19:42:11 <corvus> it's kind of an interesting question; we're not used to choosing "what kind of autonomy over your online identity do you want?", but with matrix, bluesky, and mastodon, etc, maybe that will change? but yeah, it doesn't have to be hard and we can tell people the easy mode.
19:42:18 <tonyb> Just repeating that I found it nontrivial but that can be dealt with with docs
19:43:09 <tonyb> I worry that I picked easy mode and now I can't update to hard should I want to.
19:43:17 <tonyb> Anyway that's a tangent
19:44:14 <clarkb> and to be clear I don't think we're making any decisions here. I wanted to bring up the observed issue and then what I consider to be viable options for proceeding. I do think this may be a good push in the direction of using matrix which should address problems people have complained about with IRC
19:44:23 <fungi> the simple way is to use the element web client account creation that presents minimal options, and then if you want to set up a different client to connect to that account or create a different account on another homeserver you can do those things later after you have a better grasp of the fundamentals
19:44:33 <clarkb> but yes that potentially brings new problems. However, we've been using matrix a fair bit with zuul and I haven't found anything I would consider a deal breaker
19:45:01 <clarkb> Most of the issues I've had with matrix have to do with encryption which we would not do for an opendev room
19:45:47 <clarkb> mull it over for the next week and if there are strong opinions or new things to consider we can discuss in more depth next week
19:46:10 <corvus> thanks clarkb !
19:46:40 <clarkb> I guess the meetbot would not work so we'd stop getting fancy meeting notes until we updated an existing matrix bot or wrote a new one
19:46:52 <clarkb> but I don't think that is a deal breaker particularly if the switch is scoped to opendev
19:47:07 <clarkb> (it's one meeting each week and I can even produce notes manually if necessary)
19:47:16 <clarkb> #topic Open Discussion
19:47:36 <clarkb> I didn't think to put this on the agenda but probably should've. PBR has until ~october before setuptools breaks it completely
19:47:51 <clarkb> OpenStack is aware of this and stephen finucane has been looking into workarounds
19:48:20 <clarkb> for OpenDev I'm wondering if we should consider using something like setuptools scm since we don't rely on the "fancier" PBR features like auto version bumping based on commit message content
19:48:47 <clarkb> The main feature we'd lose is the git hash for the commit used to build the package. Which is admittedly a very nice feature to have particularly with the supply chain concerns of 2025
19:49:23 <clarkb> again this is thinking out loud/brainstorming that I want to bring up as there may be better ideas out there that I haven't considered yet.
19:49:42 <clarkb> And we probably can just stick with the status quo and help update PBR so that it works well into the future (something that fungi and I are likely to be doing either way)
19:50:01 <fungi> yeah
19:50:49 <clarkb> but unlike openstack opendev's python tools aren't tightly coupled to pbr features in a way that would prevent us from switching so I thought I'd throw that out there as an option
19:51:19 <clarkb> and we have plenty of time. Consider yourselves all warned and we should keep an eye on this over time and see if we need to change anything and where we can possibly help out
19:51:31 <clarkb> that was all I had. Anything else?
19:51:33 <fungi> unrelated minor item...
i've taken an initial pass at moving a bunch of the content from the opendev.org main page into the infra manual, if anyone else is interested in reviewing that pair of changes:
19:51:36 <fungi> #link https://review.opendev.org/c/opendev/infra-manual/+/949924 Add frequently asked questions
19:51:38 <fungi> #link https://review.opendev.org/c/opendev/system-config/+/949939 Remove content duplicated in the Infra Manual FAQ
19:51:47 <corvus> mordred did some related work with poetry (and i extended that a bit to use some dynamic version stuff). just mentioning that as potentially relevant work in the space.
19:52:27 <clarkb> fungi: both changes have plenty of +2's, are you looking for more consensus?
19:52:49 <clarkb> just wondering if we should proceed or if you'd like broader input given it affects the "front page"
19:52:52 <fungi> yeah, just didn't want to approve it if there are outstanding questions folks may have
19:53:19 <fungi> it seems there's at least some support of this direction
19:53:54 <fungi> but that content has been there a long time, so giving a little longer to review isn't a problem, mainly want to make sure everyone is aware it's there
19:54:23 <fungi> aware the changes to move it exist, i mean
19:54:42 <fungi> i don't want this to surprise any regular participants in the meeting, at least
19:55:17 <fungi> so if you don't want that content moved off the main page, please raise objections on the changes asap
19:55:21 <clarkb> ++
19:56:19 <corvus> is there a preview of the homepage?
19:56:38 <corvus> https://44a737a17d7e744da544-e8c985e942bf44b286d3f5e0d40a9d67.ssl.cf5.rackcdn.com/openstack/c6b69397f8f24c6aa2e4f293a612c23e/bridge99.opendev.org/screenshots/gitea-main.png
19:56:44 <clarkb> https://44a737a17d7e744da544-e8c985e942bf44b286d3f5e0d40a9d67.ssl.cf5.rackcdn.com/openstack/c6b69397f8f24c6aa2e4f293a612c23e/bridge99.opendev.org/screenshots/gitea-main.png
19:56:44 <corvus> i guess?
19:56:46 <clarkb> yup that
19:56:51 <fungi> yes, exactly
19:57:36 <fungi> i'd like to make some updates to the page, but whittling down what's there already is a first step
19:57:43 <corvus> i could leave this as a comment, or make a followup, but since we're discussing... how about cloud donors at the bottom?
19:58:13 <corvus> i don't mean to slight them at all... the bottom of the page is a pretty important part of the page too. :)
19:58:24 <fungi> it was at the bottom, i moved it up the page for better visibility though if others think that's not important then i can split it into a separate change for further discussion
19:58:55 <clarkb> I'm indifferent to that. I think trimming the content on the most important bits (what it is, who supports it, how to contact us) is a good idea though
19:59:18 <fungi> yeah, i'm relatively meh on what order we put those remaining sections in
19:59:26 <clarkb> I think by trimming the content all the content becomes more important so the order is less important to me
19:59:47 <corvus> the new location puts them before the service manifesto or our own contact info
20:00:36 <clarkb> thats a good point. It breaks the content in half I guess
20:00:44 <fungi> also i'm aware the prose could use a little rework after the other content removal in order to flow better, i didn't make any changes to the content that was there and just either left sections or moved them verbatim (but moving the donors list up the page to improve visibility)
20:00:50 <corvus> i don't feel strongly about it, but a progression from "what we do, why we do, how we do" from top to bottom would sort of make sense to me
20:00:58 <corvus> and have some symmetry
20:01:05 <clarkb> ya I think that does make sense from a content flow
20:01:09 <clarkb> but also we are at time
20:01:14 <fungi> wfm, thanks
20:01:22 <corvus> definitely not a -1 :)
20:01:23 <clarkb> thanks everyone.
Feel free to continue the discussion in #opendev or on the mailing list
20:01:26 <clarkb> #endmeeting