Tuesday, 2025-05-27

19:00 <clarkb> meeting time
19:00 <clarkb> #startmeeting infra
19:00 <opendevmeet> Meeting started Tue May 27 19:00:26 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00 <opendevmeet> The meeting name has been set to 'infra'
19:00 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/WTHTKBQ5IUYSAX6ITU7F46PBDATVMYCU/ Our Agenda
19:01 <clarkb> #topic Announcements
19:01 <clarkb> I don't have anything to announce. Did anyone else have something?
19:03 <clarkb> sounds like no. We can probably continue in that case
19:03 <clarkb> #topic Zuul-launcher image builds
19:03 <clarkb> #link https://review.opendev.org/c/opendev/zuul-providers/+/949696 Rocky Images
19:03 <clarkb> this change just merged
19:04 <clarkb> There was an issue with jobs hitting POST_FAILUREs that seems to have resolved itself. Probably some issue with the internet or one of our dependencies
19:05 <clarkb> The next steps here that I am aware of are to switch to using zuul provided image types and to use the zuul-jobs role to upload to swift
19:05 <clarkb> https://review.opendev.org/c/opendev/zuul-providers/+/951018
19:05 <corvus> i think the only thing i'd add to that is that there is a change out there to remove no_log if we feel comfortable with that
19:06 <clarkb> https://review.opendev.org/c/opendev/zuul-providers/+/949944
19:06 <clarkb> https://review.opendev.org/c/opendev/zuul-providers/+/948989 this is the no_log change
19:06 <corvus> yep
19:06 <clarkb> I think I'm ok with ^ if we approve it when people can check the results afterwards (so that we can rotate creds quickly if necessary)
19:06 <clarkb> I'll +2 it but not approve
19:06 <corvus> definitely want buy-in on that; it needs a few more +2s at least
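
For context, Ansible's no_log: true suppresses a task's logged output so credentials used during image uploads don't leak into build logs; removing it trades that safety for debuggability. A minimal sketch of the pattern under discussion, assuming a hypothetical upload task (names and variables here are illustrative, not the actual zuul-providers role):

    - name: Upload built image to Swift    # hypothetical task name
      ansible.builtin.command: >-
        openstack object create {{ swift_container }} {{ image_path }}
      no_log: true    # suppresses output that could echo cloud credentials into the job log

Dropping no_log makes upload failures debuggable from the job log, at the cost that anything the command echoes becomes visible, hence the plan above to approve only when people can check the results and rotate creds quickly if needed.
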
19:07 <clarkb> corvus: any preference in order between the zuul-jobs role switch and the zuul image type source?
19:07 <corvus> image type
19:07 <corvus> then role
19:08 <clarkb> can I recheck that one now that rocky images have landed?
19:08 <clarkb> that one == image type
19:08 <corvus> ++ thanks
19:08 <clarkb> https://review.opendev.org/c/opendev/zuul-providers/+/949944 has been rechecked
19:08 <clarkb> #topic Gerrit shutdown problems
19:09 <clarkb> last week we restarted gerrit to update from 3.10.5 to 3.10.6 and unfortunately our sigint didn't seem to shut down gerrit cleanly
19:10 <fungi> and now we think this is cache cleanup taking too long?
19:10 <clarkb> we ended up waiting for the 5 minute timeout before docker compose issued a sigkill. On the restart prior to this one we managed to test things and sigint did work then
19:10 <clarkb> so ya I started brainstorming what could be different and one difference is the size of caches and our use of h2 cache db compaction
19:11 <clarkb> I think it is possible that shutdown is slow because it is trying to compact things and not doing so quickly enough. The total compaction time should be about 4 minutes max if done serially, which is less than our timeout. but if the shutdown needs at least a minute to do other things...
19:11 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/950595 One theory is that h2 compaction time may be slowing down shutdown enough to time out
19:12 <clarkb> I've got this change to remove the compaction timeout increase (the default is 200ms, so compaction should be very quick after we remove the config). This won't apply until the next restart because the current config is still in place for this restart (compaction happens on shutdown, not startup)
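
The 5-minute window mentioned above is the container's stop grace period: on down/stop, compose sends the container's configured stop signal and escalates to SIGKILL once the grace period expires. A minimal sketch of the relevant compose options (values illustrative of this deployment, not copied from the actual system-config):

    services:
      gerrit:
        stop_signal: SIGINT      # the signal compose sends first on `docker-compose down`
        stop_grace_period: 5m    # SIGKILL follows if the process is still running after this
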
19:13 <clarkb> I'd like to propose that the next time we've got a clean block of time to restart gerrit we do this: land 950595 and wait for it to apply, then manually issue a SIGHUP using kill to bypass the podman SIGHUP apparmor issue and see if SIGHUP behaves differently, since the compaction config change won't be in effect for that shutdown anyway
19:13 <fungi> sgtm
19:13 <clarkb> basically use SIGHUP to gather more data on this next restart. Then after that restart we will be running without compaction timeout increases, which means the restart after next can attempt to use sigint again
19:13 <clarkb> and if compaction is the problem then we should see sigint become more reliable. If it isn't then I want to know if hup vs int is something we can measure more accurately
19:14 <corvus> ++
19:14 <clarkb> great. In that case let me know if you want to restart Gerrit and I can help. Or if you have to restart gerrit for urgent reasons, try to remember to use kill -HUP $GERRITPID ; docker-compose down ; docker-compose up -d
19:15 <clarkb> that should be safe since we don't auto restart gerrit, so docker compose will notice that the container is not running, then down will delete the container and we can start fresh on the up -d
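
A sketch of that sequence, assuming the Gerrit PID is looked up via docker inspect (the container name here is hypothetical; substitute whatever the deployment actually calls it):

    # Signal the container's main process directly, bypassing podman/apparmor signal proxying
    GERRITPID=$(docker inspect --format '{{.State.Pid}}' gerrit)   # container name is hypothetical
    sudo kill -HUP "$GERRITPID"
    docker-compose down     # cleans up the now-stopped container
    docker-compose up -d    # starts a fresh one
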
19:15 <clarkb> otherwise I'll keep in mind that I want to do that soonish and try to make time for it
19:15 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:15 <corvus> I don't restart gerrit often, but when I do, I use kill -HUP $GERRITPID ; docker-compose down ; docker-compose up -d
19:16 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.11 Planning Document for the eventual Upgrade
19:16 <clarkb> I haven't made a ton of progress on this since we last spoke. I was hoping to get some of the pre steps out of the way, like the 3.10.6 upgrade and switching the image location to quay, but that got derailed by the shutdown issue
19:17 <clarkb> I suppose switching the image location to quay is not something that should impact gerrit shutdown behavior, so that should be safe to do in conjunction with the earlier restart proposals
19:17 <clarkb> so hopefully I can sneak that in soon too
19:17 <clarkb> I think that is the last major pre-pre step. Then it's all testing things and double checking behavior changes that we'll have to accommodate
19:17 <clarkb> #link https://www.gerritcodereview.com/3.11.html
19:18 <clarkb> if you have time to look over the release notes and make notes in the etherpad about things that you think deserve testing or attention please do so
19:19 <clarkb> I was hoping to have things a bit further along for an early june upgrade but I'm not sure that is feasible now, just with other stuff I know I need to get done in the next couple of weeks
19:19 <clarkb> but we'll see, maybe mid june is still doable
19:19 <clarkb> Any questions or concerns or comments about the Gerrit 3.11 upgrade?
19:20 <fungi> not from my end
19:20 <clarkb> #topic Upgrading old servers
19:20 <clarkb> no updates here from me. fungi: I don't think we have any word on refstack yet, do we?
19:21 <fungi> nope, sorry
19:21 <fungi> been distracted recently
19:22 <clarkb> ya I have similar distractions
19:22 <clarkb> #topic Working through our TODO list
19:22 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:23 <clarkb> just our weekly reminder to anyone listening that if they would like to help out more, starting with the list at the bottom of this etherpad is a good place to start. Happy to answer any questions anyone may have about the list too
19:23 <clarkb> #topic OFTC Matrix bridge no longer supporting new users
19:23 <clarkb> I mentioned this to the openstack TC last week and gouthamr did some testing and it seemed to work for him
19:24 <clarkb> so I'm not sure if this is a persistent issue, or maybe it only affects subsets of users (specific server, client, I dunno)
19:24 <clarkb> #link https://github.com/matrix-org/matrix-appservice-irc/issues/1851
19:24 <clarkb> the issue did get closed, as others have noticed it seems to be working since a bot restart too
19:24 <clarkb> that occurred ~May 19
19:24 <corvus> there was definitely a bridge restart
19:24 <clarkb> long story short this may not be super urgent and irc users should be able to talk to matrix users once again
19:25 <corvus> we're in the same place we were: the bridge has an unknown future.
19:25 <clarkb> right. Things are functional today but in limbo for the future
19:25 <corvus> but we also now have some mixed evidence: it may be subject to bitrot; and fixing that bitrot may happen; but if it does, it may not be a high priority
19:25 <corvus> so... "unknown" :)
19:26 <clarkb> Personally I would still be happy to migrate opendev onto matrix if we want to go that route.
19:26 <corvus> (it was broken for... 2 weeks?  many days at least)
19:27 <clarkb> after a week of thinking about our options here (irc for irc, matrix for matrix; pay for a bridge; host a bridge; move to matrix) did anyone else have opinions on what they'd like to see us do?
19:27 <clarkb> and again to be clear this would basically be for #opendev and #opendev-meeting (though maybe #opendev-meeting becomes a thread in #opendev)
19:28 <fungi> oh please not a thread
19:28 <clarkb> not talking about moving openstack or anyone else. Just our opendev specific synchronous comms channels
19:28 <clarkb> fungi: ha
19:28 <fungi> you were just baiting me, i'm sure
19:29 <clarkb> to justify my position on this: I think having a single room, whether that be IRC or Matrix, is valuable. Matrix enables us to cater to those using matrix-to-IRC today without forcing them to figure out persistent connections for scrollback etc. And we don't have to give up on using open source tools
19:30 <clarkb> then from a user perspective I've largely been happy using matrix, particularly when encryption is not involved. The only real issues I've had have been in rooms with encryption, which we would not configure for opendev as it would be public and logged anyway
19:30 <corvus> ++ and we continue to blaze a trail for other openinfra projects to follow in addressing their own issues
19:31 <clarkb> and given the regular care and feeding bridges appear to need, I worry that either paying for one or hosting one would just be more effort and time we could spend elsewhere
19:31 <corvus> (to be clear re encryption, the issues are usually that it works too well, not the other way, so... could be worse :)
19:32 <corvus> i agree, i don't love the bridge idea at the opendev/openinfra level. i think it works best either network-wide or very small (individual/team)
19:32 <clarkb> so I guess thinking about next steps here: do we think I should make a formal proposal on service-discuss? or do we want to have rough consensus among us before proposing things more broadly on the list?
19:33 <fungi> i still struggle a bit to make matrix something i can pay attention to the way i can irc, but that's down to my obscure workflows and preferences not being met well by existing client options, so i try not to let that cloud my opinion
19:34 <corvus> i think a consensus check would be good
19:35 <corvus> then take it wider if no one violently objects
19:35 <clarkb> in that case can the other infra-roots let me know what they are thinking as far as options here go? feel free to pm or email me or discuss publicly further
19:35 <clarkb> then based on that I can make a formal proposal if appropriate
19:35 <clarkb> I don't think we need to do the polling in this meeting. But please do follow up
19:36 <fungi> i'm willing to go along with and supportive of whatever others want to propose for this
19:36 <clarkb> ack
19:36 <fungi> but i don't have any strong opinions either way
19:36 <clarkb> I think we can move on for now and follow up when I have a bit more feedback
19:36 <clarkb> #topic Enabling hashtags globally
19:37 <fungi> this on the other hand ;)
19:37 <clarkb> corvus brought this up again and asks if we need a process to enable this globally for registered users
19:37 <clarkb> I think that if we set this in All-Projects then any existing configuration for specific projects limiting this would continue to limit things, so we wouldn't immediately break those users' use cases
19:37 <fungi> what's the config option (if anyone knows off the top of their heads)?
19:38 <clarkb> fungi: editHashtags
19:38 <corvus> we're seeing some folks have issues since it's now bifurcated, so some projects allow it and others don't, and being able to set them on a group of changes across a bunch of projects would be useful :)
19:38 <clarkb> given that existing configs should continue to win, I'm thinking we can probably proceed with enabling this in All-Projects to address the 80% case
19:38 <fungi> thanks, just making sure i know what to git grep so i can figure out who in openstack to reach out to if they're already overriding it in different ways
19:38 <clarkb> then after we've enabled it and things haven't burnt down for a week or two, we can reach out and get those other projects to drop their specific configs
19:38 <fungi> the main hurdle is that some projects enabled it only for the change owner and core review teams
19:39 <fungi> so i wanted to look to see who might have done that, exactly
19:39 <clarkb> ya I'm hoping that if we just go ahead and enable it then we've got examples of how we don't need to limit it anymore
19:39 <clarkb> looks like it's about a 50-50 split between registered users and core groups in project-config
19:40 <clarkb> I think the main hurdle here has been that All-Projects isn't managed by project-config, so one of us has to update it using admin creds, which is annoying (but doable)
19:40 <fungi> at quick count, 11 openstack acls restrict editHashtags to core reviewers
19:41 <corvus> i'm very happy to do the typing if/when it's decided
19:41 <clarkb> oh I guess since editHashtags isn't marked exclusive we wouldn't have their specific rules override the global rule
19:41 <fungi> but it looks like 9 out of the 11 are managed by the technical committee directly, so maybe only a few groups of folks for me to reach out to about it
19:41 <clarkb> we would essentially make the specific rules obsolete/redundant with a global rule
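
For reference, this grant lives in the All-Projects project.config; a minimal sketch of the global enablement being proposed (standard Gerrit ACL syntax; the exact ref pattern is a deployment choice):

    [access "refs/heads/*"]
      editHashtags = group Registered Users

Because Gerrit permissions are additive unless a section lists them under exclusiveGroupPermissions, the per-project grants discussed above would simply become redundant alongside this global one rather than overriding it.
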
19:42 <fungi> basically, it looks like the tc restricted hashtag use in ~all of their own repositories
19:42 <clarkb> for a process, how about we announce our intent to change this on service-announce (or -discuss if we think this is too much noise for announce), give the tc a week to object, and if that doesn't happen corvus can do the typing
19:43 <clarkb> fungi: or do you think you want to reach out directly first, since only openstack has the special rules, and we can change it as soon as we get the all clear?
19:43 <corvus> sgtm
19:43 <clarkb> I'm happy to draft and send the announcement if we go that route
19:43 <clarkb> I should be able to do that this afternoon
19:43 <corvus> (that sounds even better tm)
19:44 <fungi> i can handle the direct outreach, sure. it's just the tc and kolla teams, looks like
19:44 <fungi> 2 kolla repos, 9 tc repos
19:44 <clarkb> fungi: cool, do you want to do that post announcement or do you think we can forego the announcement?
19:45 <fungi> i wouldn't forego the announcement, because it'll still be a behavior change for all projects
19:45 <clarkb> ack I'll send that out today with an announced All-Projects update on June 3
19:45 <clarkb> corvus: ^ does that timing work for your typing driver?
19:45 <fungi> and i can do outreach more easily after the announcement if i'm able to refer people back to it
19:46 <corvus> yep
19:46 <clarkb> excellent
19:46 <clarkb> #topic Adding CentOS 10 Stream Support to Glean, DIB, and Nodepool
19:46 <clarkb> (assuming that's decided we can move on to the next topic)
19:47 <clarkb> CentOS 10 Stream drops NetworkManager support for the old /etc/sysconfig/network-scripts ifcfg compatibility layer
19:47 <clarkb> this means you have to configure interfaces with NetworkManager directly, which requires updates to glean
19:47 <clarkb> #link https://review.opendev.org/c/opendev/glean/+/941672 Glean NetworkManager Keyfile support
19:48 <clarkb> I think this change is basically there at this point to enable that (there is one small open question, but it shouldn't impact many people if anyone, and using this as a forcing function to get their feedback seems useful at this point. Inline comments have details)
19:48 <clarkb> Reviews on that are helpful
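
For context, NetworkManager keyfiles are INI-style files under /etc/NetworkManager/system-connections/. A sketch of the kind of static configuration glean would need to emit (interface name and addresses are illustrative, not taken from the glean change itself):

    # /etc/NetworkManager/system-connections/eth0.nmconnection (root-owned, mode 0600)
    [connection]
    id=eth0
    type=ethernet
    interface-name=eth0

    [ipv4]
    method=manual
    address1=203.0.113.10/24,203.0.113.1
    dns=203.0.113.53;

    [ipv6]
    method=auto
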
19:48 <clarkb> Then with glean sorted out we can figure out diskimage-builder support
19:48 <clarkb> #link https://review.opendev.org/c/openstack/diskimage-builder/+/934045 DIB support for CentOS 10 Stream
19:49 <clarkb> Getting the DIB testing of CentOS 10 Stream working has been somewhat complicated for two reasons. The first is that CentOS 10 Stream requires x86-64-v3 hardware capabilities, which rax classic does not provide (the other clouds do apparently, but that still means only ~40% of our cloud resources can boot CentOS 10 Stream, which is not ideal)
19:50 <clarkb> This requirement has impacted dib's nodepool based testing and functest chroot based testing, as code built for centos 10 stream is executed in both cases and needs to handle those cpu instructions
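
As a quick check of whether a host meets the x86-64-v3 baseline (AVX2, BMI2, FMA, and friends), glibc 2.33+ reports the microarchitecture levels it detects; a hedged one-liner, assuming a typical x86-64 Linux guest:

    # Prints lines like "x86-64-v3 (supported, searched)" on glibc >= 2.33
    /lib64/ld-linux-x86-64.so.2 --help | grep 'x86-64-v'
    # Or look for a few of the defining CPU flags directly
    grep -o -E 'avx2|bmi2|fma' /proc/cpuinfo | sort -u
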
19:50 <clarkb> the current plan for dib testing is to rely on nested virt labels, which aren't available in rax classic
19:51 <clarkb> the last major complication related to this in dib is updating nodepool devstack deployments to configure the devstack nested VM cpu type (by default devstack uses some old cpu type to simplify testing of openstack things like live migration)
19:51 <clarkb> the plan there is to switch over to running devstack and dib without nodepool so that we can have greater control over the devstack configuration and don't need to update nodepool and zuul-jobs related stuff for this big corner case
19:51 <clarkb> I think tonyb was looking into this
19:52 <clarkb> Then the second issue is that centos 10 stream's upstream disk images label / and /boot using partition uuids in the partition table
19:52 <clarkb> this breaks dib's detection of filesystems during image builds in the centos element
19:52 <clarkb> I've asked that we not add workarounds to dib until we've tried to get centos 10 stream to fix their partition labels instead
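
To see the distinction at issue (filesystem labels and UUIDs versus GPT partition-table entries), lsblk can show both side by side; an illustrative check against any booted image:

    # LABEL/UUID come from the filesystem; PARTLABEL/PARTUUID come from the partition table
    lsblk -o NAME,FSTYPE,LABEL,UUID,PARTLABEL,PARTUUID
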
19:53 <clarkb> but this means we may land initial centos 10 stream support in dib only with the centos-minimal element, which builds things from scratch and doesn't use the upstream image as a starting point
19:53 <clarkb> but overall I think we have a plan to land some sort of support for CentOS 10 stream in dib that is also tested
19:54 <clarkb> Once that happens we'll have to consider whether or not we're comfortable adding CentOS 10 Stream images to nodepool/zuul-launcher that can only run on 40% of our cloud resources
19:54 <clarkb> consider this a warning to start mulling that over
19:55 <clarkb> I'm somewhat concerned that it will become a hack to avoid using the 60% of our resources, creating extra contention for the other 40%
19:55 <clarkb> but it's a bit early to worry about that. If you have time to review the glean change I think that is reviewable at this point
19:55 <clarkb> then the dib stuff is close, but you may wish to wait for the testing job fixups before worrying about proper review
19:56 <clarkb> #topic Open Discussion
19:56 <clarkb> that was all I had to say about CentOS 10 stream and that was the last thing on the agenda. Anything else with the last ~4 minutes of our hour?
19:58 <fungi> not from me
19:58 <clarkb> sounds like that may be everything then
19:59 <clarkb> thank you everyone. We'll be back here at the same time and location if I don't have any last minute stuff come up (there is a small possibility this happens next week...)
19:59 <clarkb> #endmeeting
19:59 <opendevmeet> Meeting ended Tue May 27 19:59:40 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
19:59 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2025/infra.2025-05-27-19.00.html
19:59 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-05-27-19.00.txt
19:59 <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2025/infra.2025-05-27-19.00.log.html
19:59 <corvus> thanks!
20:00 <fungi> thanks clarkb!
