Tuesday, 2024-05-14

clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue May 14 19:00:28 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/Q3ZNOUTLIF3HWCUX7CQI7ITFBYC3KCCF/ Our Agenda19:01
clarkb#topic Announcements19:01
clarkbI'm planning to take Friday off and probably Monday as well and do a long weekend19:01
fricklermonday is a public holiday here19:01
fungii am back from my travels and trying to catch up again19:02
clarkboh nice. The monday after is a holiday here but I'm likely going to work that monday and instead do the long weekend this weekend19:02
clarkb#topic Upgrading Old Servers19:03
clarkbnot sure if tonyb is around but I think we can dive right in19:03
clarkbthe meetpad cluster has been upgraded to jammy \o/19:03
tonybYup  I'm here19:04
clarkbthe old servers are still around but in the emergency file and shutdown19:04
tonybso next is wiki and/or cacti19:04
clarkbI think we can probably delete them in the near future as I've just managed to use meetpad successfully without issue19:04
tonybGreat.  I'll remove them today/tomorrow19:04
tonybWRT cacti there is an approved spec to implement prometheus19:05
clarkbtonyb: sounds good. In addition to what is next I'm thinking we should try and collapse our various todo lists into a single one for server upgrades. Do you want to do that or should I take that on?19:05
tonybI can do it.19:05
clarkbtonyb: yes there is. The main bit that needs effort is likely the agent reporting from servers to prometheus. I think running prometheus should be straightforward with a volume backing the metric data and containers for the service (that's still non-zero effort but is fairly straightforward)19:06
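
A minimal sketch of that deployment shape, assuming the upstream images and default paths rather than the eventual OpenDev configuration:

    # prometheus itself, with a named volume backing the metric data
    docker run -d --name prometheus \
        -p 9090:9090 \
        -v prometheus-data:/prometheus \
        prom/prometheus

    # a per-host reporting agent, e.g. node_exporter
    docker run -d --name node-exporter --net host --pid host \
        -v /:/host:ro,rslave \
        quay.io/prometheus/node-exporter --path.rootfs=/host
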
tonybSo how long would cacti be around once we have prometheus? 19:06
clarkbtonyb: that is a good question. I suspect we can keep it around for some time in a locked down state for historical access, but don't need to worry about that being public which does reduce a number of concerns19:07
tonybI looked at cacti and it seems like it should be doable on jammy or noble which I think would avoid the need for extended OS support from Canonical19:07
clarkbI think currently cacti will give us a year of graphing by default so that's probably a good goal to aim for if possible19:07
clarkbI'm not sure if older data is accessible at all19:07
tonybOkay.  I'll work on that.19:08
corvusyep 1 year no older data19:08
tonybgot it.  I have the start of a plan19:09
clarkbthanks, feel free to reach out with any other questions or concerns19:09
tonybI was also thinking .... we have a number of "simple" services that just got put onto jammy.  Is it a good, bad, or indifferent idea to pull them onto noble?19:10
clarkbtonyb: I don't think it is a bad idea, but I also think they are less urgent19:10
tonybI worry that doing so will mean more pressure when getting away from noble, but it also simplifies the infra19:10
tonybOkay.  Noted19:10
clarkbhistorically we've been happy to skip the intermediate LTS and just go to the next one19:10
clarkbthough I think we've got a ton of stuff on focal, so there may be some imbalance in the distribution19:11
tonybif there are ones that are easy or make sense I'll see about including them19:11
clarkbwe also don't have noble test nodes yet so testing noble system-config nodes isn't yet doable19:11
clarkbwhich is related to the next couple of topics19:11
clarkblets jump into those19:12
clarkb#topic Adding Noble Test Nodes19:12
clarkbI think we're still at the step of needing to mirror noble. I may have time to poke at that later this week depending on how some of this gerrit upgrade prep goes19:12
clarkbOnce we've managed to write all that data into afs we can then add noble nodes to nodepool. I don't think we need a dib release for that because only glean and our own elements changed for noble19:13
tonybSounds good.19:13
clarkbbut if my understanding of that is wrong please let me know and we can make a new dib release too19:13
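
A quick spot check once the mirroring is done might look like the following (the mirror hostname is illustrative; any of the regional mirrors would do):

    curl -sI https://mirror.dfw.rax.opendev.org/ubuntu/dists/noble/Release
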
clarkb#topic AFS Volume Cleanup19:14
clarkbafter our meeting last week all the mergeable topic:drop-ubuntu-xenial changes merged19:14
clarkbthank you for the reviews. At this point we are no longer building wheels for xenial and a number of unused jobs have been cleared out. That said there is still much to do19:15
clarkbnext on my list for this item is retiring devstack-gate19:15
clarkbit is/was a big xenial consumer and openstack doesn't use it anymore so I'm going to retire it and pull it out of zuul entirely19:15
clarkbthen when that is done the next step is pruning the rest of the xenial usage for python jobs and nodejs and so on. I suspect this is where momentum will slow down and/or we'll just pull the plug on some stuff19:16
fungihopefully the blast radius for that is vanishingly small, probably non-openstack projects which haven't been touched in years19:16
fungifor devstack-gate i mean19:16
clarkbya a lot of the x/* stuff has xenial jobs too19:16
clarkbI expect many of those will end up removed from zuul (without full retirement)19:16
fungialso means we should be able to drop a lot of the "legacy" jobs converted from zuul v219:17
clarkbsimilar to what we did with fedora cleanup and windmill19:17
clarkboh good point19:17
fungitripleo was one of the last stragglers depending on legacy jobs, i think19:17
tonybfun.  never a dull moment ;P19:17
clarkbso ya keep an eye out for changes related to this and I'm sure I'll bug people when I've got a good set that need eyeballs19:19
clarkbslow progress is being made otherwise19:19
clarkb#topic Gerrit 3.9 Upgrade Planning19:19
clarkbLast week we landed a change to rebuild our gerrit images so that we would actually have an up to date 3.9 image after the last attempt failed (my fault, as the jobs I needed to run were not triggered)19:20
clarkbwe restarted Gerrit to pick up the corresponding 3.8 image update just to be sure everything there was happy and also upgraded mariadb to 10.11 at the same time19:20
clarkbI have since held some test nodes to go through the upgrade and downgrade process in order to capture notes for our planning etherpad19:21
fungii can't remember, did i see a change go by last week so that similar changes will trigger image rebuilds in the future?19:21
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.9 Upgrade prep and process notes19:21
tonybfungi: I think you are thinking wishfully 19:21
clarkbfungi: no change that I am aware of. I'm on the fence over whether or not we want that19:21
clarkbit's ambiguous to me whether the 3.9 image should be promoted when we aren't explicitly updating the 3.9 image and only modifying the job. Maybe it should19:22
fungitonyb: i'm a repo-half-full kinda guy19:22
clarkbother items of note here: testing seems to confirm the topic change limit should be a non-issue for the upgrade and our known areas of concern (since we never have that many open changes)19:23
clarkbthe limit is also forgiving enough to allow you to restore abandoned changes that push it over the limit19:23
clarkbcreating new changes with that topic or applying the topic to existing changes that would push it over the limit does error though19:23
fungiand only rejects attempts to push new ones or set the topic19:23
fungiyeah, that. perfect19:23
clarkbyup that appears to be the case through testing19:23
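
For reference, topics are set at push time with Gerrit's ref syntax; under 3.9 a push like this is rejected once the topic already exceeds the configured open-change limit (the branch and topic below are illustrative):

    git push origin HEAD:refs/for/master%topic=drop-ubuntu-xenial
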
clarkbI haven't seen any feedback on my change to add a release war build target to drop web fonts19:24
clarkbwould probably be good for other infra-root to form an opinion on whether or not we want to patch the build stuff locally for our prod builds or just keep using docs with web fonts19:24
fungiwhat was the concern there? were we leaking visitor details by linking to remotely-hosted webfont files?19:25
clarkbfungi: we are but only if you open docs I think19:25
fungioh, so not for the general interface19:26
fungijust the documentation part19:26
clarkband gerrit added the ability to disable that very recently when building docs. However they didn't expose that option when building release wars19:26
clarkbjust when building the documentation directly19:26
fungineat19:26
clarkbya just did a quick check. Gerrit proper serves the fonts it needs19:26
clarkbthe docs fetch them from google19:27
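
A rough version of that quick check (the host path and grep pattern are assumptions):

    # the Gerrit UI serves its own fonts; the bundled docs reference Google's
    curl -s https://review.opendev.org/Documentation/index.html \
        | grep -o 'fonts\.googleapis\.com[^"]*'
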
fungiprobably added the disable feature to aid in faster local gerrit doc development19:27
fungiand didn't consider that people might not want remote webfont links in production19:28
clarkbso anyway I pushed a change upstream to have a release war build target that depends on the documentation build without web fonts19:28
clarkbwe can apply that patch locally to our gerrit builds if we want19:28
clarkbor we can stick with the status quo until the change merges upstream19:28
clarkbfeedback on approach taken there very much welcome19:29
clarkb#link https://gerrit-review.googlesource.com/c/gerrit/+/424404 Upstream release war build without web fonts change19:30
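
For context, Gerrit's release war is built with Bazel, and the proposed change adds a parallel target depending on the webfont-free docs build (the new target's name below is hypothetical until the upstream change merges):

    # existing target: bundles docs that link Google-hosted webfonts
    bazelisk build release
    # proposed: an equivalent target built against the no-webfonts docs,
    # e.g. something like `bazelisk build release-nowebfonts` (name hypothetical)
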
clarkb#topic openstack.org DNS record cleanup19:30
clarkb#link https://paste.opendev.org/show/bVHJchKcKBnlgTNRHLVK/ Proposed DNS cleanup19:30
fungijust a quick request for folks to look through that list and let me know if there are any entries which i shouldn't delete19:30
clarkbfungi put together this list of openstack.org records that can be cleaned up. I skimmed the list and had a couple of records to call out, but otherwise seems fine19:30
clarkbapi.openstack.org redirects to developer.openstack.org so something is backing that name and record19:31
clarkband then I don't think we managed whatever sat behind the rss.cdn record so unsure if that is in use or expected to function19:31
fungioh, yep, i'm not sure why api.openstack.org was in the list19:31
fungii'll say it was a test to see if anyone read closely ;)19:32
fungiwe did manage the rss.cdn.o.o service, back around 2013 or so19:32
fungiit was used to provide an rss feed of changes for review19:32
clarkboh I remember that now.19:33
fungii set up the swift container for that, and anita did the frontend work i think19:33
fungiopenstackwatch, i think it was called?19:33
clarkbthat sounds right19:33
fungicronjob that turned gerrit queries into rss and uploaded the blob there19:34
clarkbthose were the only two I had questions about. Everything else looked safe to me19:35
clarkband now api.o.o is the only one I would remove from the list19:35
tonybNothing stands out to me, but I can do some naive checking to confirm19:36
fungioddly, i can't tell what's supplying the api.openstack.org redirect19:36
fungisomething we don't manage, inside an IP block allocated to Liquid Web, L.L.C.19:37
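
The sort of lookup behind that conclusion (commands shown for illustration; output omitted):

    dig +short api.openstack.org
    # then check who the returned address belongs to, e.g.:
    whois "$(dig +short api.openstack.org | tail -n1)" | grep -i 'orgname'
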
fungii guess we should host that redirect on static.o.o instead19:38
fungii'll push up a change for that and make the dns adjustment once it deploys19:39
clarkbsounds like a plan19:40
clarkb#topic Open Discussion19:40
clarkbAnything else?19:40
tonybVexxhost and nodepool19:40
clarkbI meant to mention this in the main channel earlier today but didn't: this is the classic reason why we have redundancy19:41
tonybfrickler: reported API issues and they seem to be ongoing19:41
clarkbI don't think this is an emergency19:41
clarkbhowever, it would be good to understand in case it's a systemic problem that would affect say gitea servers or review19:41
fricklerseeing the node failures I'm also wondering whether disabling that region would help19:41
tonybNo, not an emergency; I just don't want it to linger and am wondering when, or indeed if, we should act/escalate19:42
clarkbyes i think the next step may be to set max-servers to 0 in those region(s) then try manually booting instances19:43
clarkband debug from there19:43
clarkbthen hopefully that gives us information that can be used to escalate effectively19:44
tonybOkay.  That sounds reasonable.19:45
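
A hedged sketch of that debugging step (the cloud name, image, and flavor are illustrative):

    # after setting max-servers: 0 for the region in the nodepool config,
    # try booting an instance by hand to reproduce the API errors:
    openstack --os-cloud vexxhost server create \
        --image ubuntu-jammy --flavor v3-standard-2 \
        --network public --wait test-node-debug
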
clarkbsounds like that may be everything. Thank you everyone19:48
clarkbWe'll be back here same time and location next week19:48
corvusthanks!19:48
clarkband feel free to bring things up in irc or on the mailing list in the interim19:48
clarkb#endmeeting19:48
opendevmeetMeeting ended Tue May 14 19:48:53 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:48
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2024/infra.2024-05-14-19.00.html19:48
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-05-14-19.00.txt19:48
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2024/infra.2024-05-14-19.00.log.html19:48
