Tuesday, 2024-05-14

clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue May 14 19:00:28 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/Q3ZNOUTLIF3HWCUX7CQI7ITFBYC3KCCF/ Our Agenda19:01
clarkb#topic Announcements19:01
clarkbI'm planning to take Friday off and probably Monday as well and do a long weekend19:01
fricklermonday is a public holiday here19:01
fungii am back from my travels and trying to catch up again19:02
clarkboh nice. The monday after is a holiday here but I'm likely going to work that monday and instead do the long weekend this weekend19:02
clarkb#topic Upgrading Old Servers19:03
clarkbnot sure if tonyb is around but I think we can dive right in19:03
clarkbthe meetpad cluster has been upgraded to jammy \o/19:03
tonybYup  I'm here19:04
clarkbthe old servers are still around but in the emergency file and shutdown19:04
tonybso next is wiki and/or cacti19:04
clarkbI think we can probably delete them in the near future as I've just managed to use meetpad successfully without issue19:04
tonybGreat.  I'll remove them today/tomorrow19:04
tonybWRT cacti there is an approved spec to implement prometheus19:05
clarkbtonyb: sounds good. In addition to what is next I'm thinking we should try and collapse our various todo lists into a single one for server upgrades. Do you want to do that or should I take that on?19:05
tonybI can do it.19:05
clarkbtonyb: yes there is. The main bit that needs effort is likely the agent reporting from servers to prometheus. I think running prometheus should be straightforward with a volume backing the metric data and containers for the service (that's still non-zero effort but is fairly straightforward)19:06
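
A minimal sketch of that deployment shape, assuming the upstream images and default paths rather than the eventual OpenDev configuration:

    # prometheus itself, with a named volume backing the metric data
    docker run -d --name prometheus \
        -p 9090:9090 \
        -v prometheus-data:/prometheus \
        prom/prometheus

    # a per-host reporting agent, e.g. node_exporter
    docker run -d --name node-exporter --net host --pid host \
        -v /:/host:ro,rslave \
        quay.io/prometheus/node-exporter --path.rootfs=/host
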
tonybSo how long would cacti be around once we have prometheus? 19:06
clarkbtonyb: that is a good question. I suspect we can keep it around for some time in a locked down state for historical access, but don't need to worry about that being public which does reduce a number of concerns19:07
tonybI looked at cacti and it seems like it should be doable on jammy or noble which I think would avoid the need for extended OS support from Canonical19:07
clarkbI think currently cacti will give us a year of graphing by default so that's probably a good goal to aim for if possible19:07
clarkbI'm not sure if older data is accessible at all19:07
tonybOkay.  I'll work on that.19:08
corvusyep 1 year no older data19:08
tonybgot it.  I have the start of a plan19:09
clarkbthanks, feel free to reach out with any other questions or concerns19:09
tonybI was also thinking .... we have a number of "simple" services that just got put onto jammy.  Is it a good, bad, or indifferent idea to pull them onto noble?19:10
clarkbtonyb: I don't think it is a bad idea, but I also think they are less urgent19:10
tonybI worry that doing so will mean more pressure when getting away from noble, but it also simplifies the infra19:10
tonybOkay.  Noted19:10
clarkbhistorically we've been happy to skip the intermediate LTS and just go to the next one19:10
clarkbthough I think we've got a ton of stuff on focal, so there may be some imbalance in the distribution19:11
tonybif there are ones that are easy or make sense I'll see about including them19:11
clarkbwe also don't have noble test nodes yet so testing noble system-config nodes isn't yet doable19:11
clarkbwhich is related to the next couple of topics19:11
clarkblets jump into those19:12
clarkb#topic Adding Noble Test Nodes19:12
clarkbI think we're still at the step of needing to mirror noble. I may have time to poke at that later this week depending on how some of this gerrit upgrade prep goes19:12
clarkbOnce we've managed to write all that data into afs we can then add noble nodes to nodepool. I don't think we need a dib release for that because only glean and our own elements changed for noble19:13
tonybSounds good.19:13
clarkbbut if my understanding of that is wrong please let me know and we can make a new dib release too19:13
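
A quick spot check once the mirroring is done might look like the following (the mirror hostname is illustrative; any of the regional mirrors would do):

    curl -sI https://mirror.dfw.rax.opendev.org/ubuntu/dists/noble/Release
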
clarkb#topic AFS Volume Cleanup19:14
clarkbafter our meeting last week all the mergeable topic:drop-ubuntu-xenial changes merged19:14
clarkbthank you for the reviews. At this point we are no longer building wheels for xenial and a number of unused jobs have been cleared out. That said there is still much to do19:15
clarkbnext on my list for this item is retiring devstack-gate19:15
clarkbit is/was a big xenial consumer and openstack doesn't use it anymore so I'm going to retire it and pull it out of zuul entirely19:15
clarkbthen when that is done the next step is pruning the rest of the xenial usage for python jobs and nodejs and so on. I suspect this is where momentum will slow down and/or we'll just pull the plug on some stuff19:16
fungihopefully the blast radius for that is vanishingly small, probably non-openstack projects which haven't been touched in years19:16
fungifor devstack-gate i mean19:16
clarkbya a lot of the x/* stuff has xenial jobs too19:16
clarkbI expect many of those will end up removed from zuul (without full retirement)19:16
fungialso means we should be able to drop a lot of the "legacy" jobs converted from zuul v219:17
clarkbsimilar to what we did with fedora cleanup and windmill19:17
clarkboh good point19:17
fungitripleo was one of the last stragglers depending on legacy jobs, i think19:17
tonybfun.  never a dull moment ;P19:17
clarkbso ya keep an eye out for changes related to this and I'm sure I'll bug people when I've got a good set that need eyeballs19:19
clarkbslow progress is being made otherwise19:19
clarkb#topic Gerrit 3.9 Upgrade Planning19:19
clarkbLast week we landed a change to rebuild our gerrit images so that we would actually have an up to date 3.9 image after the last attempt failed (my fault, as the jobs I needed to run were not triggered)19:20
clarkbwe restarted Gerrit to pick up the corresponding 3.8 image update just to be sure everything there was happy and also upgraded mariadb to 10.11 at the same time19:20
clarkbI have since held some test nodes to go through the upgrade and downgrade process in order to capture notes for our planning etherpad19:21
fungii can't remember, did i see a change go by last week so that similar changes will trigger image rebuilds in the future?19:21
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.9 Upgrade prep and process notes19:21
tonybfungi: I think you are thinking wishfully 19:21
clarkbfungi: no change that I am aware of. I'm on the fence over whether or not we want that19:21
clarkbit's ambiguous to me whether the 3.9 image should be promoted when we aren't explicitly updating the 3.9 image and only modifying the job. Maybe it should19:22
fungitonyb: i'm a repo-half-full kinda guy19:22
clarkbother items of note here: testing seems to confirm the topic change limit should be a non-issue for the upgrade and our known areas of concern (since we never have that many open changes)19:23
clarkbthe limit is also forgiving enough to allow you to restore abandoned changes that push it over the limit19:23
clarkbcreating new changes with that topic or applying the topic to existing changes that would push it over the limit does error though19:23
fungiand only rejects attempts to push new ones or set the topic19:23
fungiyeah, that. perfect19:23
clarkbyup that appears to be the case through testing19:23
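
For reference, topics are set at push time with Gerrit's ref syntax; under 3.9 a push like this is rejected once the topic already exceeds the configured open-change limit (the branch and topic below are illustrative):

    git push origin HEAD:refs/for/master%topic=drop-ubuntu-xenial
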
clarkbI haven't seen any feedback on my change to add a release war build target to drop web fonts19:24
clarkbwould probably be good for other infra-root to form an opinion on whether or not we want to patch the build stuff locally for our prod builds or just keep using docs with web fonts19:24
fungiwhat was the concern there? were we leaking visitor details by linking to remotely-hosted webfont files?19:25
clarkbfungi: we are but only if you open docs I think19:25
fungioh, so not for the general interface19:26
fungijust the documentation part19:26
clarkband gerrit added the ability to disable that very recently when building docs. However they didn't expose that option when building release wars19:26
clarkbjust when building the documentation directly19:26
fungineat19:26
clarkbya just did a quick check. Gerrit proper serves the fonts it needs19:26
clarkbthe docs fetch them from google19:27
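
A rough version of that quick check (the host path and grep pattern are assumptions):

    # the Gerrit UI serves its own fonts; the bundled docs reference Google's
    curl -s https://review.opendev.org/Documentation/index.html \
        | grep -o 'fonts\.googleapis\.com[^"]*'
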
fungiprobably added the disable feature to aid in faster local gerrit doc development19:27
fungiand didn't consider that people might not want remote webfont links in production19:28
clarkbso anyway I pushed a change upstream to have a release war build target that depends on the documentation build without web fonts19:28
clarkbwe can apply that patch locally to our gerrit builds if we want19:28
clarkbor we can stick with the status quo until the change merges upstream19:28
clarkbfeedback on approach taken there very much welcome19:29
clarkb#link https://gerrit-review.googlesource.com/c/gerrit/+/424404 Upstream release war build without web fonts change19:30
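
For context, Gerrit's release war is built with Bazel, and the proposed change adds a parallel target depending on the webfont-free docs build (the new target's name below is hypothetical until the upstream change merges):

    # existing target: bundles docs that link Google-hosted webfonts
    bazelisk build release
    # proposed: an equivalent target built against the no-webfonts docs,
    # e.g. something like `bazelisk build release-nowebfonts` (name hypothetical)
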
clarkb#topic openstack.org DNS record cleanup19:30
clarkb#link https://paste.opendev.org/show/bVHJchKcKBnlgTNRHLVK/ Proposed DNS cleanup19:30
fungijust a quick request for folks to look through that list and let me know if there are any entries which i shouldn't delete19:30
clarkbfungi put together this list of openstack.org records that can be cleaned up. I skimmed the list and had a couple of records to call out, but otherwise seems fine19:30
clarkbapi.openstack.org redirects to developer.openstack.org so something is backing that name and record19:31
clarkband then I don't think we managed whatever sat behind the rss.cdn record so unsure if that is in use or expected to function19:31
fungioh, yep, i'm not sure why api.openstack.org was in the list19:31
fungii'll say it was a test to see if anyone read closely ;)19:32
fungiwe did manage the rss.cdn.o.o service, back around 2013 or so19:32
fungiit was used to provide an rss feed of changes for review19:32
clarkboh I remember that now.19:33
fungii set up the swift container for that, and anita did the frontend work i think19:33
fungiopenstackwatch, i think it was called?19:33
clarkbthat sounds right19:33
fungicronjob that turned gerrit queries into rss and uploaded the blob there19:34
clarkbthose were the only two I had questions about. Everything else looked safe to me19:35
clarkband now api.o.o is the only one I would remove from the list19:35
tonybNothing stands out to me, but I can do some naive checking to confirm19:36
fungioddly, i can't tell what's supplying the api.openstack.org redirect19:36
fungisomething we don't manage, inside an IP block allocated to Liquid Web, L.L.C.19:37
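
The sort of lookup behind that conclusion (commands shown for illustration; output omitted):

    dig +short api.openstack.org
    # then check who the returned address belongs to, e.g.:
    whois "$(dig +short api.openstack.org | tail -n1)" | grep -i 'orgname'
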
fungii guess we should host that redirect on static.o.o instead19:38
fungii'll push up a change for that and make the dns adjustment once it deploys19:39
clarkbsounds like a plan19:40
clarkb#topic Open Discussion19:40
clarkbAnything else?19:40
tonybVexxhost and nodepool19:40
clarkbI meant to mention this in the main channel earlier today but didn't: this is the classic reason why we have redundancy19:41
tonybfrickler: reported API issues and they seem to be ongoing19:41
clarkbI don't think this is an emergency19:41
clarkbhowever, it would be good to understand in case it's a systemic problem that would affect say gitea servers or review19:41
fricklerseeing the node failures I'm also wondering whether disabling that region would help19:41
tonybNo, not an emergency; I just don't want it to linger and am wondering when, or indeed if, we should act/escalate19:42
clarkbyes i think the next step may be to set max-servers to 0 in those region(s) then try manually booting instances19:43
clarkband debug from there19:43
clarkbthen hopefully that gives us information that can be used to escalate effectively19:44
tonybOkay.  That sounds reasonable.19:45
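
A hedged sketch of that debugging step (the cloud name, image, and flavor are illustrative):

    # after setting max-servers: 0 for the region in the nodepool config,
    # try booting an instance by hand to reproduce the API errors:
    openstack --os-cloud vexxhost server create \
        --image ubuntu-jammy --flavor v3-standard-2 \
        --network public --wait test-node-debug
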
clarkbsounds like that may be everything. Thank you everyone19:48
clarkbWe'll be back here same time and location next week19:48
corvusthanks!19:48
clarkband feel free to bring things up in irc or on the mailing list in the interim19:48
clarkb#endmeeting19:48
opendevmeetMeeting ended Tue May 14 19:48:53 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:48
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2024/infra.2024-05-14-19.00.html19:48
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-05-14-19.00.txt19:48
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2024/infra.2024-05-14-19.00.log.html19:48
