Tuesday, 2025-05-20

tonybI am around for the meeting but I'm multitasking (driving) so I'll be laggy and sporadic 18:48
clarkbyou should probably focus on driving and do that safely18:48
fungiyes, that18:49
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue May 20 19:00:02 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/VLMXFS7RL6WB3XG26DGRRRLL72WOZ7YM/ Our Agenda19:00
clarkb#topic Announcements19:00
clarkbI didn't have anything super important to announce19:00
clarkbwas there anything to announce from anyone else?19:01
clarkbSounds like no19:02
clarkb#topic Zuul-launcher image builds19:02
clarkbmnasiadka has continued to push this along19:03
clarkb#link https://review.opendev.org/c/opendev/zuul-providers/+/949696 Rocky Images19:03
clarkbreviewing that is on my todo list, and I think that concludes the image porting from nodepool into zuul-launcher except for gentoo and openeuler19:03
clarkbcorvus was there anything else to add to this topic?19:04
clarkbI guess not. That said there is a related topic of the whole CentOS 10 hardware requirement problem that is probably worth calling out here too19:06
clarkbtonyb did some investigating by manually booting nodes in each cloud region we operate within and checking the cpus there. tl;dr is that every cloud but rax classic should support CentOS 10 x86-64-v3 requirements19:06
clarkbhttps://paste.opendev.org/show/827859/19:07
clarkbthat also roughly aligns with where we are able to support nested virt labels19:07
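[Editor's sketch, not part of the log] The log does not say how the CPUs were checked on each manually booted node. One way to do it on Linux, assuming the flag names that `/proc/cpuinfo` reports, is shown below. The function name `missing_v3_flags` is hypothetical. The sketch only looks at the features that x86-64-v3 adds and does not verify the older v1/v2 baseline.

```python
# Flags that Linux lists in /proc/cpuinfo for the features added by the
# x86-64-v3 level. "abm" is how Linux reports LZCNT support.
# Assumption: checking these additions is enough for a quick triage.
X86_64_V3_FLAGS = frozenset(
    {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}
)


def missing_v3_flags(cpuinfo_text):
    """Return the sorted x86-64-v3 flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return sorted(X86_64_V3_FLAGS - set(value.split()))
    # No flags line at all (for example on a non-x86 host).
    return sorted(X86_64_V3_FLAGS)
```

On a test node this would be called with the contents of `/proc/cpuinfo`; an empty result suggests the node can likely run an x86-64-v3 build.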
corvus_on images: we should be running some changes which i intended to cut down on some minor issues we were seeing19:07
corvus_but i haven't had a chance to check on that yet19:08
clarkbI suggested earlier today that we can suggest testing of centos 10 images (glean, dib, etc) simply rely on those labels for now. Then if/when we get to deploying images directly for centos 10 stream we can give them nested virt only labels19:08
corvus_(we were in an image upload loop because we were missing some image builds).  i'll check on that soon.19:08
clarkbthen the typical special label rules apply (use them when necessary, be on the lookout for problems)19:08
clarkbcorvus_: ack good to know19:08
clarkbbut then the next steps for centos 10 stream will be getting the testing of centos 10 stream sorted out for glean and dib. Then once we're happy with that we can decide if/how we're deploying images directly with zuul-launcher/nodepool19:09
clarkbthe glean change for network manager keyfiles needs some testing updates that mnasiadka offered to make19:10
clarkbthats probably step 0, then we can update dib (and do a dib release), then we're considering what to do in zuul-launcher/nodepool19:10
clarkbanything else related to image builds?19:10
corvuswhat do you mean "what to do"?19:10
corvuslike, which one to use?19:11
corvusor some other question?19:11
clarkbcorvus: mostly how we can support centos 10 as a top level label/image within our environment given the hardware requirements19:11
clarkbcorvus: for testing dib and glean (which build and boot a nested centos 10) I think we can just use nested virt labels for those jobs19:12
clarkbbut when it comes to adding centos 10 stream into nodepool/zuul-launcher do we want to only add it as a nested virt label, add it as a regular label only in clouds that can boot it, not add it at all, etc19:12
clarkbthe thought I had about adding it as a nested virt label is it gives you some indication that its different which might be lost if we add it normally and just don't upload to some clouds19:13
clarkbbut that distinction may not be very important either19:13
corvusi see, more policy questions around an image that is supported by 40% of resources19:13
clarkbright19:13
corvusthx. that's all i have on the topic19:14
clarkb#topic Gerrit 3.11 Upgrade Planning19:14
clarkb#link https://www.gerritcodereview.com/3.11.html19:14
clarkbas mentioned last week if you can pull this up and skim over the notes there for things to be cautious of that would be great19:14
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.11 Planning Document for the eventual Upgrade19:14
clarkbI did start trying to collect info in this document but its early so needs more work. But feel free to add concerns into that document19:15
clarkb104.130.253.194 is a held Gerrit 3.11 node for testing purposes.19:15
clarkbI did a quick look around ^ and things don't appear very different visually19:15
clarkbThere are two pre upgrade tasks I would like to get done this week though. The first is updating our images to 3.10.6 and 3.11.3 and the other is moving the images to quay19:16
clarkb#link https://review.opendev.org/c/opendev/system-config/+/949778 Update Gerrit images to 3.10.6 and 3.11.319:16
clarkb#link https://review.opendev.org/c/opendev/system-config/+/882900 Host Gerrit images on quay.io19:16
clarkbassuming nothing comes up before tomorrow morning I'll probably start approving things then with plans to restart the service as necessary19:16
clarkbwhich brings us to the last Gerrit related update I had: Shutting down gerrit can race with new changes being indexed. If we shutdown before the change is properly indexed then subsequent pushes can create a new change with a different change number using the same change id on the same branch19:17
clarkbI discussed this a bunch upstream and Luca says he always reindexes after restarting gerrit and this is the workaround we should be using for now19:18
clarkbhistorically many of our restarts have coincided with version upgrades many of which automatically reindex so that mitigates things. But when you restart for other reasons or restart to upgrade versions that don't require reindexing you can end up in this trap19:18
clarkbI also filed an issue upstream and captured some of mfick's design thoughts on addressing this. Basically we write a flag file to disk for the change(s) that have been pushed before they are indexed recording the thread id responsible for indexing the changes. Then we can run a monitor thread that looks for those files that don't have a corresponding running thread in the19:20
clarkbsystem. If it finds them it will reindex the changes listed19:20
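[Editor's sketch, not part of the log] The flag-file idea described above can be illustrated roughly as follows. This is a toy Python model of the design as summarized here, not Gerrit code (Gerrit is written in Java) and not an upstream implementation. Every name in it (`mark_pending`, `clear_pending`, `find_orphans`, `recover`, the `.pending` suffix) is hypothetical.

```python
import os
import threading


def mark_pending(marker_dir, change_number, thread_id):
    """Record on disk that thread_id is about to index change_number."""
    path = os.path.join(marker_dir, f"{change_number}.pending")
    with open(path, "w") as f:
        f.write(str(thread_id))
    return path


def clear_pending(marker_dir, change_number):
    """Remove the marker once the change has been indexed."""
    os.remove(os.path.join(marker_dir, f"{change_number}.pending"))


def find_orphans(marker_dir, live_thread_ids):
    """Return change numbers whose recorded indexing thread is not running."""
    orphans = []
    for name in sorted(os.listdir(marker_dir)):
        if not name.endswith(".pending"):
            continue
        with open(os.path.join(marker_dir, name)) as f:
            owner = int(f.read().strip())
        if owner not in live_thread_ids:
            orphans.append(int(name[: -len(".pending")]))
    return orphans


def recover(marker_dir, reindex):
    """Monitor step: reindex every orphaned change, then drop its marker."""
    live = {t.ident for t in threading.enumerate()}
    for change in find_orphans(marker_dir, live):
        reindex(change)
        clear_pending(marker_dir, change)
```

A marker left behind by a thread that no longer exists marks a change that may never have been indexed, so the monitor reindexes it. A real implementation would also have to handle thread id reuse, which this sketch ignores.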
clarkbI get the sense that no one is going to work on that unless I dig into it myself. Which is fine, but I'm not sure when I'll have time to figure that out19:20
fungii take it they don't want to block graceful shutdown until the pending index queue is empty19:20
clarkbso anyway tl;dr we should reindex changes after gerrit restarts which is relevant to updating to 3.10.6 and/or moving to quay19:20
clarkbfungi: yes that was the impression I got. I asked if graceful shutdown should be more graceful and got those other suggestions instead19:21
fungifun19:21
clarkbfungi: I think the reason for that is apparently there are some other situations that can lead to this19:21
clarkbso its best to have the monitoring system that catches things up quickly in all cases not just restarts19:21
fungiyeah, i guess that way it's also robust in the face of ungraceful stopping19:21
clarkb(they weren't super specific on those conditions but HA gerrit has a system in place to deal with a similar problem for example)19:22
clarkbunfortunately the ha system requires comparing indexes between gerrits so doesn't work standalone19:22
clarkboverall I feel like we're making slow progress on the 3.11 upgrade, but other things need to be done first just to get them out of the way19:23
clarkbso I'll keep pushing on those items hopefully get that done this week19:23
clarkbany other gerrit concerns/issues/thoughts?19:23
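[Editor's sketch, not part of the log] The reindex-after-restart workaround discussed in this topic could be triggered with Gerrit's `gerrit index start` SSH command. The log does not say which mechanism OpenDev uses, so that choice is an assumption. The helper below only builds the command line. The function name, host and user are hypothetical placeholders, and the calling account needs the matching server maintenance capability.

```python
import shlex


def reindex_command(host, user, port=29418, index="changes"):
    """Build the ssh argv that asks Gerrit to start an online reindex."""
    return [
        "ssh", "-p", str(port), f"{user}@{host}",
        # --force starts a reindex even if the index version is current.
        "gerrit", "index", "start", index, "--force",
    ]


# Placeholder host and user, for illustration only.
print(shlex.join(reindex_command("review.example.org", "admin")))
```

The resulting list can be passed to `subprocess.run` once the service has finished restarting.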
clarkb#topic Upgrading old servers19:24
clarkbThis is something I wanted to have time for but then got sniped by gerrit related stuff (and other things like PBR which we can talk about during open discussion)19:24
clarkbdid anyone else have server upgrade updates?19:24
tonybNot from me19:25
clarkbfungi: with everything else going on I'm assuming we have no refstack updates yet19:25
funginope19:25
fungisorry!19:25
clarkb#topic Working through our TODO list19:25
clarkb#link https://etherpad.opendev.org/p/opendev-january-2025-meetup19:25
clarkbjust our regular reminder we've got this list. If you'd like to help out feel free to look it over and send any questions our way. And if we have new things that are back burnered we can add them to the list19:26
clarkbBut I don't have any updates to this. Just a reminder it exists19:26
clarkb#topic Rotating mailman 3 logs19:26
clarkbfungi: I think this change is still open. I was thinking this would be another good one to get in this week19:26
clarkb#link https://review.opendev.org/c/opendev/system-config/+/94847819:26
fungiyah, happy to self-approve if there's nobody else to review it19:27
clarkbI think that would be great. I can help with monitoring and debugging if it comes to that19:27
fungidone19:27
clarkb#topic opendev.org Matrix Homeserver has been Updated by EMS19:28
clarkbon May 12 we got a notice from EMS that they would be upgrading our homeserver which would come with additional authentication requirements19:28
clarkbThey applied that update on May 19 at about 11:30UTC19:28
clarkbsince then the matrix eavesdrop bot and gerritbot both continue to operate so I don't think this update impacted the bots19:29
clarkbif you notice that either bot does stop working as expected please let us know19:29
clarkb#topic OFTC Matrix bridge no longer supporting new users19:30
clarkbPeople have begun noticing that some people communicating on IRC don't end up having their messages sent across the bridge19:30
clarkbit appears that anyone with a new nick on IRC using an IRC client is affected19:30
clarkb#link https://github.com/matrix-org/matrix-appservice-irc/issues/185119:30
clarkbI suspect this is related to the warning the matrix foundation made about not being able to support the bridges without additional funding. It seems like they may just be allowing things to die on the vine. They are running but issues like this are not addressed/debugged19:31
clarkbunfortunately there is no official word on what they are doing with the bridges. But the functionality has definitely degraded in the last few weeks19:32
clarkbThis is not ideal and I think gives us a few things to consider.19:32
clarkbWe could go back to suggesting/encouraging people use matrix clients for matrix and irc clients for irc and not try to bridge. We could host a bridge either by paying EMS to do so (this could be a big increase in our homeserver costs) or running one ourselves. Finally, we could embrace matrix more and move more in that direction19:33
clarkbif we do look into hosting our own bridge this blog post has some useful info: https://postmarketos.org/blog/2025/03/31/matrix-bridge-migration/19:33
corvusif there's any appetite to pay ems, i'd be happy to come up with a cost estimate, but it's definitely non-zero, certainly at least like $50/mo.19:35
fungii use an irc client with a matrix protocol plugin, though it's not without its rough edges and shortcomings19:35
clarkbPersonally, I've been reasonably happy with matrix since we've started using it for some things (like zuul). In particular I think it represents a good compromise between IRC and slack/discord with the important bits for each party being present in matrix. Matrix preserves the openness and federation from IRC. Matrix operates over HTTPS and has clients that are more familiar19:35
clarkbto those who might want to use slack or discord19:35
tonybI think, hosting our own bridge while we work with the community on moving off of IRC. .... OR at least minimising IRC.   That said I don't know what that will look like in terms of the EMS bill19:36
corvusgiven that people are still interested in a better experience for new users for all openinfra projects, i think leaning more into matrix would be a good idea, set a good example, and maybe add some inertia.19:37
clarkbya and I guess to be clear I'm not really suggesting we sign up to migrate openstack for example19:37
clarkbbut opendev could move its comms into matrix and that is a much more straightforward move. Set an example then maybe others would follow19:37
corvusright.  i'm hoping to smooth the way.  :)19:38
clarkbI don't think frickler was able to attend meetings this evening and I know frickler has had thoughts on this in the past. I'd be curious if any of those opinions have changed19:38
tonybWe'd need good docs, as I said I found matrix very confusing to set up, and I'm still not confident I've done it optimally 19:38
fungias in picking non-default settings in the element webclient?19:38
fungiit does seem like it's tuned more for people coming from slack and discord type interfaces rather than catering to people looking for a more irc-like interface, if that's what you mean19:39
tonybNo more fundamental like home server choice 19:39
fungioh19:40
fungiit seemed to me like the guided new account configuration in element was straightforward and just picked matrix.org by default19:40
clarkbI'm not sure there is an optimal choice there. There is easy mode just use matrix.org and there is host your own. The choice will depend on how far into the matrix federated server network you want to go19:41
tonybOkay.19:41
clarkbI suspect the vast majority of our users would use matrix.org for the simplicity and I don't think there is anything wrong with that (I do that myself)19:41
fungiyeah, it may be that people who go into this expecting complicated instructions are finding those instead of the easy ones19:42
corvusit's kind of an interesting question; we're not used to choosing "what kind of autonomy over your online identity do you want?", but with matrix, bluesky, and mastodon, etc, maybe that will change?  but yeah, it doesn't have to be hard and we can tell people the easy mode.19:42
tonybJust repeating that I found it nontrivial but that can be dealt with with docs19:42
tonybI worry that I picked easy mode and now I can't update to hard should I want to.19:43
tonybAnyway that's a tangent 19:43
clarkband to be clear I don't think we're making any decisions here. I wanted to bring up the observed issue and then what I consider to be viable options for proceeding. I do think this may be a good push in the direction of using matrix which should address problems people have complained about with IRC19:44
fungithe simple way is to use the element web client account creation that presents minimal options, and then if you want to set up a different client to connect to that account or create a different account on another homeserver you can do those things later after you have a better grasp of the fundamentals19:44
clarkbbut yes that potentially brings new problems. However, we've been using matrix a fair bit with zuul and I haven't found anything I would consider a deal breaker19:44
clarkbMost of the issues I've had with matrix have to do with encryption which we would not do for an opendev room19:45
clarkbmull it over for the next week and if there are strong opinions or new things to consider we can discuss in more depth next week19:45
corvusthanks clarkb !19:46
clarkbI guess the meetbot would not work so we'd stop getting fancy meeting notes until we updated an existing matrix bot or wrote a new one19:46
clarkbbut I don't think that is a deal breaker particularly if the switch is scoped to opendev19:46
clarkb(its one meeting each week and I can even produce notes manually if necessary)19:47
clarkb#topic Open Discussion19:47
clarkbI didn't think to put this on the agenda but probably should've. PBR has until ~october before setuptools breaks it completely19:47
clarkbOpenStack is aware of this and stephen finucane has been looking into workarounds19:47
clarkbfor OpenDev I'm wondering if we should consider using something like setuptools scm since we don't rely on the "fancier" PBR features like auto version bumping based on commit message content19:48
clarkbThe main feature we'd lose is the git hash for the commit used to build the package. Which is admittedly a very nice feature to have particularly with the supply chain concerns of 202519:48
clarkbagain this is thinking out loud/brainstorming that I want to bring up as there may be better ideas out there that i haven't considered yet.19:49
clarkbAnd we probably can just stick with the status quo and help update PBR so that it works well into the future (something that fungi and I are likely to be doing either way)19:49
fungiyeah19:50
clarkbbut unlike openstack opendev's python tools aren't tightly coupled to pbr features in a way that would prevent us from switching so I thought I'd throw that out there as an option19:50
clarkband we have plenty of time. Consider yourselves all warned and we should keep an eye on this over time and see if we need to change anything and where we can possibly help out19:51
clarkbthat was all I had. Anything else?19:51
fungiunrelated minor item... i've taken an initial pass at moving a bunch of the content from the opendev.org main page into the infra manual, if anyone else is interested in reviewing that pair of changes:19:51
fungi#link https://review.opendev.org/c/opendev/infra-manual/+/949924 Add frequently asked questions19:51
fungi#link https://review.opendev.org/c/opendev/system-config/+/949939 Remove content duplicated in the Infra Manual FAQ19:51
corvusmordred did some related work with poetry (and i extended that a bit to use some dynamic version stuff).  just mentioning that as potentially relevant work in the space.19:51
clarkbfungi: both changes have plenty of +2's are you looking for more consensus?19:52
clarkbjust wondering if we should proceed or if you'd like broader input given it affects the "front page"19:52
fungiyeah, just didn't want to approve it if there are outstanding questions folks may have19:52
fungiit seems there's at least some support of this direction19:53
fungibut that content has been there a long time, so giving a little longer to review isn't a problem, mainly want to make sure everyone is aware it's there19:53
fungiaware the changes to move it exist, i mean19:54
fungii don't want this to surprise any regular participants in the meeting, at least19:54
fungiso if you don't want that content moved off the main page, please raise objections on the changes asap19:55
clarkb++19:55
corvusis there a preview of the homepage?19:56
corvushttps://44a737a17d7e744da544-e8c985e942bf44b286d3f5e0d40a9d67.ssl.cf5.rackcdn.com/openstack/c6b69397f8f24c6aa2e4f293a612c23e/bridge99.opendev.org/screenshots/gitea-main.png19:56
clarkbhttps://44a737a17d7e744da544-e8c985e942bf44b286d3f5e0d40a9d67.ssl.cf5.rackcdn.com/openstack/c6b69397f8f24c6aa2e4f293a612c23e/bridge99.opendev.org/screenshots/gitea-main.png19:56
corvusi guess?19:56
clarkbyup that19:56
fungiyes, exactly19:56
fungii'd like to make some updates to the page, but whittling down what's there already is a first step19:57
corvusi could leave this as a comment, or make a followup, but since we're discussing... how about cloud donors at the bottom?19:57
corvusi don't mean to slight them at all... the bottom of the page is a pretty important part of the page too.  :)19:58
fungiit was at the bottom, i moved it up the page for better visibility though if others think that's not important then i can split it into a separate change for further discussion19:58
clarkbI'm indifferent to that. I think trimming the content on the most important bits (what it is, who supports it, how to contact us) is a good idea though19:58
fungiyeah, i'm relatively meh on what order we put those remaining sections in19:59
clarkbI think by trimming the content all the content becomes more important so the order is less important to me19:59
corvusthe new location puts them before the service manifesto or our own contact info19:59
clarkbthats a good point. It breaks the content in half I guess20:00
fungialso i'm aware the prose could use a little rework after the other content removal in order to flow better, i didn't make any changes to the content that was there and just either left sections or moved them verbatim (but moving the donors list up the page to improve visibility)20:00
corvusi don't feel strongly about it, but a progression from "what we do, why we do, how we do" from top to bottom would sort of make sense to me20:00
corvusand have some symmetry20:00
clarkbya I think that does make sense from a content flow20:01
clarkbbut also we are at time20:01
fungiwfm, thanks20:01
corvusdefinitely not a -1 :)20:01
clarkbthanks everyone. Feel free to continue the discussion in #opendev or on the mailing list20:01
clarkb#endmeeting20:01
opendevmeetMeeting ended Tue May 20 20:01:26 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:01
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2025/infra.2025-05-20-19.00.html20:01
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-05-20-19.00.txt20:01
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2025/infra.2025-05-20-19.00.log.html20:01
corvusthanks!20:01
