*** NeilHanlon_ is now known as NeilHanlon | 13:11 |
clarkb | almost meeting time | 18:59 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Aug 29 19:01:10 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/2LK5PHWBDIBZDHVLIEFKFZJKB3AEJZ45/ Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | Monday is a holiday in some parts of the world. | 19:01 |
clarkb | #topic Service Coordinator Election | 19:02 |
fungi | congratudolences | 19:02 |
clarkb | heh I was the only nominee so I'm it by default | 19:02 |
clarkb | feedback/help/interest in taking over in the future all welcome | 19:03 |
clarkb | just let me know | 19:03 |
clarkb | #topic Infra Root Google Account | 19:03 |
clarkb | This is me noting I still haven't tried to dig into that. I feel like I need to be in a forensic frame of mind for that and I just haven't had that lately | 19:03 |
clarkb | #topic Mailman 3 | 19:04 |
clarkb | Cruising along to a topic with good news! | 19:04 |
fungi | si | 19:04 |
clarkb | all of fungi's outstanding changes have landed and been applied to the server. This includes upgrading to the latest mailman3 | 19:04 |
clarkb | thank you fungi for continuing to push this along | 19:04 |
fungi | i think we've merged everything we expected to merge | 19:04 |
fungi | so far no new issues observed and known issues are addressed | 19:04 |
fungi | next up is scheduling migrations for the 5 remaining mm2 domains we're hosting | 19:05 |
clarkb | we have successfully sent and received email through it since the changes | 19:05 |
fungi | migrating lists.katacontainers.io first might be worthwhile, since that will allow us to decommission the separate server it's occupying | 19:05 |
fungi | we also have lists.airshipit.org which is mostly dead so nobody's likely to notice it moving anyway | 19:06 |
fungi | as well as lists.starlingx.io and lists.openinfra.dev | 19:06 |
clarkb | ya starting with airshipit and kata seems like a good idea | 19:07 |
fungi | then lastly, lists.openstack.org (which we should also save for last, it will be the longest outage and should definitely have a dedicated window to itself) | 19:07 |
clarkb | do you think we should do them sequentially or try to do blocks of a few at a time for the smaller domains | 19:07 |
fungi | i expect the openstack lists migration to require a minimum of 3 hours downtime | 19:07 |
fungi | i think maybe batches of two? so we could do airship/kata in one maintenance, openinfra/starlingx in another | 19:08 |
clarkb | sounds like a plan. We can also likely go ahead with those two blocks whenever we are ready | 19:08 |
clarkb | I don't think any of those projects are currently in the middle of release activity or similar | 19:08 |
fungi | i'll identify the most relevant mailing lists on each of those to send a heads-up to | 19:09 |
clarkb | I'm happy to be an extra set of hands/eyeballs during those migrations. I expect you'll be happy for any of us to participate | 19:10 |
fungi | mainly it's the list moderators who will need to be aware of interface changes | 19:10 |
fungi | and yes, all assistance is welcome | 19:10 |
fungi | the migration is mostly scripted now, the script i've been testing with is in system-config | 19:10 |
clarkb | great I guess let us know when you've got times picked and list moderators notified and we can take it from there | 19:11 |
fungi | will do. we can coordinate scheduling those outside the meeting | 19:12 |
clarkb | #topic Server Upgrades | 19:12 |
clarkb | Another topic where I've had some todos but haven't made progress yet | 19:12 |
clarkb | I do plan to clean up the old insecure ci registry server today and then I need to look at replacing some old servers | 19:12 |
clarkb | #topic Rax IAD image upload struggles | 19:13 |
clarkb | fungi: frickler: anything new to add here? What is the current state of image uploads for that region? | 19:13 |
fungi | i cleaned up all the leaked images in all regions | 19:13 |
fungi | there were about 400 each in dfw/ord and around 80 new in iad. now that things are mostly clean we should look for newly leaked images to see if we can spot why they're not getting cleaned up (if there are any, i haven't looked) | 19:14 |
fungi | also i'm not aware of a ticket for rackspace yet | 19:14 |
clarkb | would be great if we can put one of those together. I feel like I don't have enough of the full debug history to do it justice myself | 19:15 |
fungi | yeah, i'll try to put something together for that tomorrow | 19:16 |
frickler | I think if we could limit nodepool to upload no more than one image at a time, we would have no issue | 19:16 |
clarkb | I think we can do that but it's nodepool-builder instance wide. So we might need to run a special instance just for that region | 19:16 |
clarkb | (there is a flag for number of upload threads) | 19:17 |
clarkb | it would be clunky to do with current nodepool but possible | 19:17 |
frickler | so that would also build images another time just for that region? | 19:17 |
clarkb | yes | 19:18 |
clarkb | definitely not ideal | 19:18 |
frickler | the other option might be to delete other images and just run jammy jobs there? not sure how that would affect mixed nodesets | 19:18 |
clarkb | I think it would prevent mixed nodesets from running there but nodepool would properly avoid using that region for those nodesets | 19:19 |
clarkb | so ya that would work | 19:19 |
frickler | so I could delete the other images manually | 19:19 |
frickler | and then we can wait for the rackspace ticket to work | 19:19 |
clarkb | if things are okayish right now maybe see if we get a response on the ticket quickly otherwise we can refactor something like ^ or even look at nodepool changes to make it more easily "load balanced" | 19:20 |
frickler | well the issue is that the other images get older each day, not sure when that will start to cause issues in jobs | 19:21 |
clarkb | got it. The main risk is probably that we're ignoring possible bugfixes upstream of us. | 19:21 |
fungi | they are almost certainly already causing jobs to take at least a little longer since more git commits and packages have to be pulled over the network | 19:21 |
clarkb | definitely not ideal | 19:21 |
fungi | jobs which were hovering close to timeouts could be pushed over the cliff by that, i suppose | 19:22 |
fungi | or the increase in network activity could raise their chances that a stray network issue causes the job to be retried | 19:22 |
clarkb | ya maybe we should just focus on our default label (jammy) since most jobs run on that and let the others lie dormant/disabled/removed for now | 19:23 |
clarkb | ok anything else on this topic? | 19:24 |
frickler | ok, so I'll delete the other images, we can still reupload manually if needed | 19:24 |
corvus | what if... | 19:24 |
corvus | what if we set the upload threads to 1 globally; so don't make any other changes than that | 19:25 |
clarkb | corvus: we'll end up with more stale images everywhere. But maybe only by a few days, so that's ok? | 19:25 |
corvus | it would slow everything down, but would it be too much? or would that be okay? | 19:25 |
clarkb | I think the upper bound of image uploads on things that are "happy" is ~1hour | 19:26 |
frickler | I think it will be too much, 10 or so images times ~8 regions times ~30mins per image | 19:26 |
clarkb | so we'll end up about 5 ish days behind doing some quick math in my head on fuzzy numbers | 19:26 |
fungi | and we have fewer than 24 images presently | 19:26 |
corvus | yeah, like, what's our wall-clock time for uploading to everywhere? if that is < 24 hours then it's not a big deal? | 19:26 |
fungi | oh, upload to only one provider at a time too | 19:26 |
clarkb | 10 * 8 * .5 / 2 = 20 hours? | 19:26 |
corvus | (but also keeping in mind that we still have multiple builders, so it's not completely serialized) | 19:27 |
clarkb | .5 for half an hour per upload and /2 because we have two builders | 19:27 |
frickler | oh, that is per builder then, not global? | 19:27 |
frickler | so then we could still have two parallel uploads to IAD | 19:27 |
clarkb | frickler: yes its an option on the nodepool-builder process | 19:27 |
clarkb | frickler: yes | 19:27 |
corvus | (but of different images) | 19:28 |
corvus | (not that matters, just clarifying) | 19:28 |
corvus | so it'd go from 8 possible to 2 possible in parallel | 19:28 |
frickler | but that would likely still push those over the 1h limit according to what we tested | 19:28 |
clarkb | maybe it is worth trying since it is a fairly low effort change? | 19:28 |
clarkb | and reverting it is quick since we don't do anything "destructive" to cloud image content | 19:29 |
corvus | that's my feeling -- like i'm not strongly advocating for it since it's not a complete solution, but maybe it's easy and maybe close enough to good enough to buy some time | 19:29 |
frickler | yeah, ok | 19:30 |
clarkb | I'm up for trying it and if we find by the end of the week we are super behind we can revert | 19:30 |
corvus | yeah, if it doesn't work out, oh well | 19:30 |
clarkb | cool lets try that and take it from there (including a ticket to rax if we can manage a constructive write up) | 19:31 |
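As a point of reference, the single-upload-worker experiment discussed above comes down to one flag on the builder process (the "flag for number of upload threads" mentioned earlier), traded against the rough 10 images × 8 regions × ~0.5 h ÷ 2 builders ≈ 20 h estimate. Below is a minimal sketch of how that could look in a docker-compose style deployment; the image name, volume paths, and service layout are illustrative assumptions, not the actual system-config deployment.

```yaml
# Hypothetical builder service definition; image and paths are placeholders,
# not copied from system-config.
services:
  nodepool-builder:
    image: quay.io/zuul-ci/nodepool-builder:latest
    volumes:
      - /etc/nodepool:/etc/nodepool:ro
      - /opt/dib:/opt/dib
    # --upload-workers defaults to 8; setting it to 1 serializes uploads
    # per builder process, so with two builders at most two uploads run
    # in parallel across all clouds.
    command: >-
      nodepool-builder
      -c /etc/nodepool/nodepool.yaml
      --upload-workers 1
```

Reverting is just restoring the default value, which matches the "low effort, quick to revert" framing above.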
clarkb | #topic Fedora cleanup | 19:32 |
clarkb | #link https://review.opendev.org/c/opendev/base-jobs/+/892380 Remove the fedora-latest nodeset | 19:32 |
clarkb | I think we're readyish for this change? The nodes themselves are largely nonfunctional so if this breaks anything it won't be more broken than before? | 19:32 |
clarkb | then we can continue towards removing the labels and images from nodepool (which will make the above situation better too) | 19:33 |
clarkb | I'm happy to continue helping nudge this along as long as we're in rough agreement about impact and process | 19:33 |
corvus | i think zuul-jobs is ready for that. wfm. | 19:34 |
fungi | yeah, we dropped the last use of the nodeset we're aware of (was in bindep) | 19:35 |
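For context, the nodeset being removed is a short stanza in opendev/base-jobs roughly along these lines; this is a sketch from memory rather than the literal content of change 892380, and the node and label names may differ.

```yaml
# Illustrative only; the real definition in opendev/base-jobs may point
# at a different fedora label.
- nodeset:
    name: fedora-latest
    nodes:
      - name: fedora-latest
        label: fedora-36
```

Any job still referencing the nodeset after removal should surface as a Zuul configuration error rather than silently running on broken nodes.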
frickler | we are still building f35 images, too, btw | 19:35 |
clarkb | frickler: ah ok so we'll clean up multiple images | 19:35 |
clarkb | alright I'll approve that change later today if I don't hear any objections | 19:36 |
frickler | just remember to drop them in the right order (which I don't remember), so nodepool can clean them up on all providers | 19:36 |
clarkb | ya I'll have to think about the nodepool ordering after the zuul side is cleaner | 19:36 |
corvus | hopefully https://zuul-ci.org/docs/nodepool/latest/operation.html#removing-from-the-builder helps | 19:37 |
clarkb | ++ | 19:37 |
corvus | (but don't actually remove the provider at the end) | 19:37 |
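Roughly, the ordering described in the linked documentation (worth double-checking there) amounts to a two-pass edit of the nodepool config. The sketch below uses fedora-36 and rax-iad purely as placeholders, and per the note above the provider stanza itself is left in place.

```yaml
# Pass 1: stop using the image. Remove the label and each provider's
# diskimage entry so nodepool deletes the uploads in every cloud.
labels:
  # - name: fedora-36          # delete in pass 1
providers:
  - name: rax-iad              # provider block itself stays
    diskimages:
      # - name: fedora-36      # delete in pass 1
# Pass 2: once the uploads are gone everywhere, drop the build definition.
diskimages:
  # - name: fedora-36          # delete in pass 2
```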
clarkb | #topic Zuul Ansible 8 Default | 19:38 |
clarkb | We are ansible 8 by default in opendev zuul now everywhere but openstack | 19:38 |
clarkb | I brought up the plan to switch openstack to ansible 8 by default on Monday to the TC in their meeting today and no one screamed | 19:38 |
clarkb | It's also a holiday for some of us which should help a bit | 19:38 |
fungi | i'll be around in case it goes sideways | 19:39 |
clarkb | I plan to be around long enough in the morning (and probably longer) monday to land that change and monitor it a bit | 19:39 |
fungi | well, weather permitting anyway | 19:39 |
clarkb | ya I don't have any plans yet, but it is the day before my parents leave so might end up doing some family stuff but nothing crazy enough I can't jump on for debugging or a revert | 19:39 |
fungi | (things here might literally go sideways if the current storm track changes) | 19:39 |
clarkb | fungi: is that when the hurricane(s) might pass by? | 19:39 |
fungi | no, but if things get bad i'll likely be unavailable next week for cleanup | 19:40 |
frickler | if you prepare a patch and get it reviewed, I can also approve that earlier on monday and watch a bit | 19:40 |
corvus | i should also be around | 19:40 |
clarkb | frickler: can do | 19:40 |
clarkb | looks like it is just one hurricane at least now | 19:41 |
clarkb | franklin is predicted to go further north and east | 19:41 |
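For the record, the Monday change itself should be a small tenant-config edit using Zuul's default-ansible-version tenant attribute. A minimal sketch follows, assuming the openstack tenant entry looks roughly like this; the real entry carries far more settings.

```yaml
# Sketch of the relevant fragment only; the actual openstack tenant
# definition is much larger.
- tenant:
    name: openstack
    # Jobs that do not pin ansible-version themselves will switch to
    # Ansible 8 once this lands.
    default-ansible-version: "8"
```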
clarkb | #topic Python container updates | 19:42 |
fungi | yeah, idalia is the one we have to watch for now | 19:42 |
clarkb | #link https://review.opendev.org/q/hashtag:bookworm+status:open Next round of image rebuilds onto bookworm. | 19:42 |
clarkb | thank you corvus for pushing up another set of these. Other than the gerrit one I think we can probably land these whenever. For Gerrit we should plan to land it when we are able to restart the container just in case | 19:42 |
clarkb | particularly since the gerrit change bumps java up to java 17 | 19:43 |
corvus | o7 | 19:43 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/893073 Gitea bookworm migration. Does not use base python image. | 19:43 |
clarkb | I pushed a change for gitea earlier today that does not use the same base python images but those images will do a similar bullseye to bookworm bump | 19:43 |
clarkb | similar to gerrit, gitea probably deserves a bit of attention in this case to ensure that gerrit replication isn't affected. | 19:44 |
clarkb | I'm also happy to do more testing with gerrit and or gitea if we feel that is prudent | 19:44 |
clarkb | reviews and feedback very much welcome | 19:44 |
clarkb | #topic Open Discussion | 19:45 |
clarkb | Other things of note: we upgraded gitea to 1.20.3 and etherpad to 1.9.1 recently | 19:45 |
clarkb | It has been long enough that I don't expect trouble but something to be aware of | 19:45 |
fungi | yay upgrades. bigger yay for our test infrastructure which makes them almost entirely worry-free | 19:46 |
clarkb | I mentioned meetpad to someone recently and was told some group had tried it and ran into problems again. It may be worth doing a sanity check that it works as expected | 19:46 |
fungi | i'm free to do a test on it soon | 19:47 |
clarkb | I can do it after I eat some lunch. Say about 20:45UTC | 19:47 |
fungi | i may be in the middle of food at that time but can play it by ear | 19:48 |
clarkb | tox 4.10.0 + pyproject-api 1.6.0/1.6.1 appear to have blown up projects using tox. Tox 4.11.0 fixes it apparently so rechecks will correct it | 19:48 |
clarkb | debugging of this was happening during this meeting so it is very new :) | 19:48 |
corvus | in other news, nox did not break today | 19:49 |
clarkb | Oh I meant to mention to tonyb to feel free to jump into any of the above stuff or new things if still able/interested. I think you are busy with openstack election stuff right now though | 19:49 |
clarkb | sounds like that is everything. Thank you everyone! | 19:50 |
clarkb | #endmeeting | 19:50 |
opendevmeet | Meeting ended Tue Aug 29 19:50:32 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:50 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-08-29-19.01.html | 19:50 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-08-29-19.01.txt | 19:50 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-08-29-19.01.log.html | 19:50 |
tonyb | election and some internal stuff but noted | 19:50 |
clarkb | tonyb: mostly didn't want you to feel like we're pushing you out of any of this. We're like a river that keeps flowing and more than welcome to have people jump in when able :) | 19:51 |
tonyb | I totally get it. | 19:51 |
fungi | ever tried to drink a whole river? no? well now's your chance! | 19:52 |