| opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/957995 | 02:20 |
|---|---|---|
| *** clarkb is now known as Guest24593 | 07:31 | |
| *** dmellado62 is now known as dmellado | 13:32 | |
| fungi | clarkb: yes, i'm keeping them in the emergency list until they're all done | 14:07 |
| zigo | Hi there! | 14:38 |
| zigo | I'd like to backport https://review.opendev.org/c/openstack/watcher/+/958207 to unmaintained/zed, though there's no such branch anymore. What can I do, as there's only the EOL tag? | 14:38 |
| zigo | (that's for Watcher's OSSN-0094) | 14:38 |
| fungi | zigo: you would carry a local backport | 14:42 |
| zigo | You mean in Debian? Yeah, I have that already done. Though I would have preferred to share it. | 14:42 |
| fungi | it's really more of an openstack question, not an opendev question, but basically if you need unmaintained branches to stay open for longer then the people serving as caretakers for those branches would probably appreciate the help | 14:43 |
| fungi | if there are no volunteers to keep up minimal testing for them at least, then they get closed down (tagged and deleted) so that people will stop relying on them | 14:43 |
| zigo | Well, then IMO we should stop destroying branches, and keep them open just in case there's a security problem and people start to care again. | 14:45 |
| zigo | It's very much ok to keep them as unmaintained/X | 14:45 |
| zigo | I just heard Red Hat people are even backporting to Train. Why not share these patches then? | 14:46 |
| fungi | zigo: that's an excellent question to ask them | 14:46 |
| fungi | but i suspect it's because their work would violate upstream policies since they aren't backporting to newer branches when doing so | 14:47 |
| fungi | which leaves users on those versions without a clear upgrade path to newer versions | 14:47 |
| zigo | *I* do the work, and would be happy to share it for watcher unmaintained/zed. | 14:48 |
| zigo | I guess I should open a new thread on the list about this, since this isn't the first time it has happened. | 14:48 |
| fungi | i will say that from an opendev hosting perspective, we don't want projects leaving an ever-growing pile of branches around because every branch is additional configuration in the ci/cd system and it makes pruning old unused configuration impossible | 14:50 |
| zigo | I think it's very much ok to delete all the CI stuff around it, and just let downstream share patches without the CI. | 14:51 |
| fungi | workflows, processes and policies are built around many years of an assumption that unused branches will be deleted | 14:51 |
| zigo | Can I quote this ? :) | 14:51 |
| fungi | feel free! i know we've had this discussion ad nauseam, and every time openstack tries to appease you on this by coming up with new ways to leave branches open for longer you still complain | 14:52 |
| fungi | the latest attempt is the unmaintained branch policy, which has put a lot of additional strain on project maintainers and our systems | 14:53 |
| zigo | I do each time there's a new security fix that needs backporting, and we have no space to share work. :) | 14:53 |
| fungi | well, basically the technical committee came up with a way for interested downstream stakeholders to volunteer to take care of those branches, but when nobody volunteers to do that they get closed | 14:56 |
| fungi | https://governance.openstack.org/tc/resolutions/20230724-unmaintained-branches.html which was further amended by https://governance.openstack.org/tc/resolutions/20231114-amend-unmaintained-status.html | 14:57 |
| fungi | if that's not adequately solving the problem, then #openstack-tc would be a good place to discuss it (or on the openstack-discuss mailing list, but i'd recommend adding at least [tc] in the subject line) | 14:57 |
| Guest24593 | re the system strain: it wouldn't be so bad if people were actively caring for the branches, as presumably the ci jobs would get pruned or updated as necessary to keep things mostly working. The problem is when we create the branch under the assumption it will be cared for, then it is ignored, which orphans the system configurations, leaving others to clean them up when say opendev wants | 15:01 |
| Guest24593 | to drop a test platform. Or if zuul changes some syntax | 15:01 |
| Guest24593 | arg I'm a guest again | 15:01 |
| frickler | yeah, we could consider more actively cleaning up broken zuul configs, like deleting them completely after a while | 15:03 |
| Guest24593 | the thing I was trying to argue for was not to open the branches in the first place. Wait until someone volunteers. But I'm not sure what effect that may have on say tempest testing and branch defaults | 15:04 |
| *** Guest24593 is now known as clarkb | 15:05 | |
| clarkb | ok sorry about that I am me again | 15:05 |
| clarkb | looks like rax flex iad3 did use a couple of instances overnight. Not a lot of load there but non zero | 15:08 |
| clarkb | fwiw I'm not sure if we can easily tell zuul to ignore branches either | 15:08 |
| fungi | i suppose they could merge a change to replace all their pipeline configs with noop jobs, but that's still configuration on every branch. and even without any configuration on a branch at all zuul still needs to evaluate the branch contents to determine there is no configuration | 15:11 |
| clarkb | explicitly using a noop config is probably a nice way to represent it for humans if zuul won't complain | 15:13 |
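For context, swapping a branch's project pipelines over to Zuul's built-in `noop` job is a small change. Below is a minimal sketch of what that could look like, assuming a `.zuul.yaml` at the repo root and an `unmaintained/zed` branch purely for illustration; real projects may keep their config in `zuul.d/`, and the exact layout of frickler's tacker change may differ:

```bash
# sketch only: replace whatever CI config the branch carries with noop jobs
git checkout unmaintained/zed                 # hypothetical branch name
git rm -r --ignore-unmatch zuul.d/            # drop any in-repo zuul.d config
cat > .zuul.yaml <<'EOF'
- project:
    check:
      jobs:
        - noop
    gate:
      jobs:
        - noop
EOF
git add .zuul.yaml
git commit -m "Replace broken CI with noop on this unmaintained branch"
```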
| clarkb | infra-root https://review.opendev.org/c/opendev/system-config/+/957950 is a relatively straightforward change to drop testing of bionic servers with system-config | 15:18 |
| clarkb | I checked our fact cache and as far as I can tell we don't have any bionic servers any longer | 15:18 |
| clarkb | (then once zuul drops ansible 9 we can drop bionic from zuul-launcher completely and clean up our mirrors etc) | 15:18 |
| mattcrees[m] | Hi all. In the Blazar project, we still have a stable/pike branch available. I understand this is because the branch was made before Blazar was managed by opendev. Does anyone know how we'd go about removing this branch? | 15:19 |
| clarkb | mattcrees[m]: in general the openstack release team has permissions to manage branches within openstack projects. If they want you to clean it up then you'd need extra gerrit acl permissions on the project or have a gerrit admin do it for you | 15:21 |
| clarkb | I would check with them first. I believe they already have tools that script branch cleanups which does the eol tag first then drops the branch | 15:21 |
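Purely as an illustration of that sequence (the release team's actual tooling and tag naming may differ, and deleting a branch requires the corresponding Gerrit ACL permission), the manual shape of the cleanup is roughly:

```bash
# tag the branch tip as end-of-life, then remove the branch
git fetch origin stable/pike
git tag -s pike-eol origin/stable/pike -m "blazar stable/pike is end of life"
git push origin pike-eol
git push origin --delete stable/pike
```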
| mattcrees[m] | I see, thanks clarkb. I'll reach out to them | 15:22 |
| frickler | mattcrees[m]: clarkb: likely the branch was created before release-management was in place, so we'd need to delete it manually anyway. I'll add that to my todo list | 15:22 |
| fungi | yeah, i think this one may fall into a grey area where they've avoided managing existing branches that pre-date a project's inclusion in openstack, but it would be good to confirm with them first | 15:22 |
| fungi | ah sounds like we just did ;) | 15:22 |
| mattcrees[m] | Nice, thanks frickler | 15:23 |
| frickler | confirmed, deliverables/pike/blazar.yaml doesn't exist | 15:23 |
| frickler | for the zuul config issue, I created https://review.opendev.org/c/openstack/tacker/+/958219 as an example, seems to work fine. waiting for feedback from elodilles but maybe that can be a simple workaround for the pile of zuul config errors we still have | 15:24 |
| clarkb | frickler: that looks promising | 15:34 |
| elodilles | frickler: well, tacker has zuul config errors (and broken gates?) even on stable branches. so probably the tacker team should start with those, as i guess we don't want to set noop for the whole project... | 15:35 |
| clarkb | not for the whole project, but unmaintained branches seem like a good idea if they are broken since they are, well, unmaintained | 15:36 |
| elodilles | clarkb: but as i said, stable branches are broken too | 15:36 |
| clarkb | yes and those should be fixed | 15:37 |
| elodilles | i agree | 15:38 |
| clarkb | the extra old branches present extra problems because they tend to be even less cared for and also rely on old ci constructs that need to go away. Replacing them with noop jobs nicely addresses both problems | 15:38 |
| elodilles | anyway, i'm not against dropping the complete CI on unmaintained in this case, but i feel it's a bit drastic when there are broken stable branches, too | 15:40 |
| frickler | the thing is we do not notice the broken stable branches when the huge number of issues for unmaintained branches overwhelms that list. plus, it is an explicit requirement to keep unmaintained branches open, even though we are slacking at enforcing that requirement | 15:46 |
| fungi | yeah, stable branches with broken testing need to be fixed, unmaintained branches with broken testing are supposed to just get deleted | 15:49 |
| fungi | but it's an acceptable compromise to remove the testing on the unmaintained branches instead of deleting them immediately | 15:50 |
| elodilles | frickler: well, i could name a couple of places where we could add the noop, and after that the unmaintained branches will no longer be the majority of the zuul config errors (like monasca-* and openstack-ansible-tests) | 15:50 |
| elodilles | fungi: yepp, now that frickler has proposed the monasca 2023.1-eol patches we will be one step closer to that. | 15:51 |
| fungi | hopefully monasca ceases to be a problem soon (either because the person offering to adopt it fixes the jobs, or because the tc decides to go ahead with retiring it) | 15:51 |
| elodilles | yepp | 15:53 |
| frickler | elodilles: yes, but monasca will hopefully be retired, so I simply chose the next best other example I found | 16:00 |
| frickler | also someone seems to be actively working on fixing tacker at least for master https://review.opendev.org/c/openstack/tacker/+/956458 | 16:00 |
| frickler | which I haven't seen happen for many of the unmaintained branches (though maybe I didn't look closely enough?) | 16:01 |
| clarkb | ok I'm popping out now for the eyeball inspection. I'll be back in a bit | 16:31 |
| fungi | hope you come back with as many as you left with! | 16:31 |
| fungi | (or at least as many) | 16:32 |
| fungi | looks like backups failed today on kdc03 | 17:05 |
| fungi | aha, i think we install borg into a venv and the python version has changed, so we'll need ansible to recreate that venv. sound right? | 17:14 |
| fungi | the log has a traceback for importlib.metadata.PackageNotFoundError: No package metadata was found for borgbackup | 17:14 |
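A rough sketch of what ansible effectively has to redo here: rebuild the venv against the new interpreter and reinstall borgbackup into it. The `/opt/borg` path is an assumption for illustration, not necessarily where the role actually puts it:

```bash
# hypothetical manual equivalent of letting the ansible role recreate the venv
rm -rf /opt/borg
python3 -m venv /opt/borg
/opt/borg/bin/pip install --upgrade pip borgbackup
/opt/borg/bin/borg --version   # should no longer hit PackageNotFoundError
```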
| fungi | following https://docs.opendev.org/opendev/system-config/latest/afs.html#no-outage-server-maintenance for afs01.ord.openstack.org it looks like there are no rw volumes on it, so not moving any before upgrading | 17:29 |
| fungi | having to rm -rf /var/lib/docker/aufs on these too | 17:32 |
| clarkb | fungi: yes that sounds right | 17:40 |
| clarkb | re having ansible recreate the venv | 17:40 |
| clarkb | and then you'll have to do that again with the jump to noble. | 17:41 |
| fungi | i have a feeling with the additional work required for the afs01.dfw server, it will make the most sense to upgrade it from focal to jammy and then from jammy to noble, and start working my way back through the noble upgrades on the others | 17:42 |
| fungi | that way we don't have to move rw volumes off and back onto it more than once | 17:42 |
| clarkb | makes sense | 17:42 |
| clarkb | one thing I wondered about is if the dkms stuff is reinstalling the packages on these upgrades in order to rebuild against the new kernels | 17:43 |
| fungi | so basically having it be the last focal->jammy upgrade and then be the first jammy->noble upgrade | 17:43 |
| clarkb | it must as I'm pretty sure this is how we upgraded them in the past | 17:43 |
| fungi | and yes it is, i'm watching it right now | 17:43 |
| clarkb | cool | 17:43 |
| fungi | part of what takes so long with the upgrades | 17:43 |
| fungi | as for holding writes to the rw volume on afs01.dfw, i wonder whether we need to put zuul executors on hold too somehow | 17:47 |
| fungi | er, rw volumes | 17:47 |
| fungi | the mirror-update server can just be shut down temporarily, but it's not the only system we have doing writes into afs | 17:48 |
| clarkb | isn't that why we change the rw volume to the other server? | 17:48 |
| clarkb | or do we have to hold writes to do that? | 17:48 |
| fungi | oh! it's an either/or in the doc i guess | 17:49 |
| fungi | make sure i'm not misreading that | 17:49 |
| fungi | so if i move rw volumes from afs01.dfw to afs02.dfw then that in theory happens transparently and i don't have to block anything from writing | 17:50 |
| clarkb | that was my understanding though I haven't reread the docs | 17:51 |
| clarkb | but yes I thought the idea was to always keep the rw volumes up so that we didn't have to stop writers. Do the work only on the ro side and then let it resync | 17:51 |
| fungi | and https://grafana.opendev.org/d/9871b26303/afs seems to indicate that they both have the same amount of available space and sizes | 17:52 |
| fungi | afs01.ord.openstack.org is on jammy now, working on afs02.dfw.openstack.org next and saving afs01.dfw.openstack.org for last | 17:53 |
| fungi | afs seems to be at least functional on afs01.ord | 17:54 |
| fungi | still reporting all its same ro volume sites | 17:54 |
| fungi | all the volumes on afs02.dfw are confirmed to be ro too so no need to move any yet | 17:55 |
| clarkb | "Basically what we need to do is make sure that either no one needs the RW volumes hosted by a fileserver before taking it down or move the RW volume to another fileserver." | 18:04 |
| clarkb | yes I read that as two options are available to us and ensuring all of the RW volumes are on one fileserver and taking down the other avoids needing to stop all the writers | 18:04 |
| clarkb | the other options requires stopping all writers | 18:05 |
| fungi | cool, so i think we're fine here. the main unknown is how long the rw volume moves will take, but hopefully not long since there are synced ro equivalents of all of them | 18:06 |
| clarkb | I wonder if disabling cron jobs on mirror-update will make that go more smoothly/quickly | 18:11 |
| clarkb | it may still be worth doing even if not strictly necessary, if it speeds the process up | 18:11 |
| fungi | could just... stop crond too | 18:17 |
| fungi | or whatever systemd replaced it with | 18:17 |
| fungi | afs02.dfw.openstack.org is up on jammy and afs seems to work there still. now on to afs01.dfw.openstack.org, going to start moving its rw volumes to afs02.dfw.openstack.org | 18:21 |
| fungi | i'll move a small one initially and double-check it's still functional | 18:22 |
| clarkb | ++ | 18:22 |
| fungi | afs01.dfw.openstack.org had 57 rw volumes and 55 ro volumes | 18:25 |
| clarkb | that sounds right if all the rw volumes are there since I think one or two don't have ro volumes | 18:25 |
| clarkb | iirc it's ok for those volumes to go down. But it's worth double checking | 18:25 |
| fungi | docs-old and mirror.logs | 18:25 |
| clarkb | hrm is mirror.logs the volume that hosts: https://mirror.dfw.rax.opendev.org/logs/ ? | 18:26 |
| fungi | yeah | 18:26 |
| clarkb | if so then we probably do end up having writers to that volume and we need to do something about that | 18:26 |
| clarkb | afsmon and afs-release run super often. Then the others are the mirror cron jobs | 18:26 |
| fungi | oh, also the "service" volume only exists rw on afs01.dfw and there are no ro sites | 18:27 |
| clarkb | should we maybe add an ro volume for mirror.logs on afs02.dfw and then it can become the rw site? | 18:27 |
| clarkb | I think docs-old is unlikely to be an issue. And I'm not sure about server | 18:27 |
| clarkb | *service | 18:27 |
| fungi | and then there's a test.corvus volume which is ro on afs02.dfw but has no rw site at all | 18:27 |
| clarkb | I suspect that is partial cleanup that orphaned the RO volume | 18:28 |
| fungi | looks like there are a few volumes which don't have an ro replica on afs02.dfw | 18:28 |
| clarkb | fungi: some may be on afs01.ord | 18:28 |
| fungi | well, i mean there are some where the only ro volume is also on afs01.dfw, no other servers have a replica | 18:29 |
| fungi | afs02.dfw.openstack.org has 47 ro volumes | 18:29 |
| clarkb | got it. I was just calling out that afs02 may be the RO site but also afs01.ord could be | 18:29 |
| fungi | and afs01.ord.openstack.org has 17 | 18:30 |
| corvus | i don't need test.corvus | 18:30 |
| fungi | i figured, looked like it was just a missed deletion of a replica | 18:30 |
| fungi | but yeah, i'll need to audit the volumes on afs01.dfw to see which ones only have local replicas and no remote ones | 18:31 |
| fungi | this'll take a few minutes | 18:31 |
| fungi | okay, so these are the volumes with no remote replica, residing only on afs01.dfw: docs-old, mirror.logs, service, user | 18:38 |
| fungi | these have remote replicas on afs01.ord but not afs02.dfw: docs, docs.dev, mirror, project, project.airship, root.afs, root.cell, user.corvus | 18:38 |
| fungi | all other volumes with rw on afs01.dfw have a ro replica on afs02.dfw | 18:39 |
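For reference, that kind of audit can be done with the standard OpenAFS volume tools; a rough sketch (server names may need to be given as IPs depending on local host resolution):

```bash
# list every VLDB entry with a site on afs01.dfw, then eyeball the site lists
vos listvldb -server afs01.dfw.openstack.org -localauth
# or check a single volume's RW/RO sites directly
vos examine mirror.logs -localauth
```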
| clarkb | I think it's ok to have RO in ord and not dfw; we just have to make that site the new RW site temporarily | 18:39 |
| clarkb | which means the main thing to consider is whether docs-old, mirror.logs, service, and user need secondary sites. I think mirror.logs having a secondary site is a good idea so that we don't have to turn off all the logging and afs monitoring | 18:40 |
| clarkb | I suspect that not having access to user.corvus (because user is down) is something that won't be a big deal. Any idea what "service" is for? | 18:41 |
| fungi | i can probably still move its rw volume, it has no ro replica not even locally | 18:41 |
| clarkb | oh ya maybe that is the case. I guess in my head you had to have an RO first that gets promoted to RW but that is probably an inaccurate mental model | 18:42 |
| fungi | i think it just ends up having to move all the data | 18:42 |
| clarkb | got it, that makes sense. Whereas if you have an up to date RO copy it's a matter of flipping some attributes | 18:43 |
| fungi | vos move -id mirror.logs -toserver afs02.dfw.openstack.org -topartition vicepa -fromserver afs01.dfw.openstack.org -frompartition vicepa -localauth | 18:43 |
| fungi | 55M /afs/.openstack.org/mirror/logs | 18:44 |
| fungi | probably wouldn't take too long | 18:44 |
| clarkb | ya and they are in the same region too | 18:45 |
| fungi | 5 directories and 707 files | 18:45 |
| clarkb | I'm guessing service and user are even smaller | 18:46 |
| clarkb | docs-old is maybe old enough to not worry about? | 18:46 |
| fungi | right, i don't think we hook any readers up to it | 18:46 |
| fungi | and we definitely don't write to it | 18:46 |
| fungi | we kept it just in case | 18:47 |
| fungi | maybe it's time to tar it up and stick the file somewhere for posterity | 18:47 |
| clarkb | but also if size isnt massive could just move it too | 18:47 |
| fungi | also just realized that on afs01.dfw i have to use 104.130.138.161 instead of afs01.dfw.openstack.org in commands | 18:49 |
| fungi | moving mirror.logs rw volume to afs02.dfw.openstack.org took 16 seconds | 18:49 |
| clarkb | any idea why we need the ip? | 18:49 |
| fungi | has to do with the host lookup | 18:50 |
| fungi | vos: server 'afs01.dfw.openstack.org' not found in host table | 18:50 |
| clarkb | ah it gets back 127.0.0.1 | 18:50 |
| fungi | yeah | 18:50 |
| clarkb | I smell lunch so will be afk for a bit | 18:51 |
| fungi | /afs/.openstack.org/mirror/logs/ still has contents so i think we're good with that | 18:51 |
| clarkb | will you do the same for service and user? seems like a good idea given I'm still not quite sure what is under service | 18:52 |
| fungi | i've moved service and user to afs02.dfw as well just now, yes. they took only a few seconds each | 18:53 |
| clarkb | ack | 18:53 |
| fungi | i think it's just an anchor for the various service.foo volumes | 18:53 |
| fungi | i'm not going to bother moving docs-old | 18:53 |
| fungi | i'll work through the 8 that have ro replicas on afs01.ord next, moving the corresponding rw volumes there | 18:54 |
| Clark[m] | Oh if it is the anchor for those volumes we probably want ro copies on a second server? | 18:54 |
| fungi | though i also wonder why root.afs and root.cell have replicas on afs01.ord but not afs02.dfw | 18:55 |
| fungi | i assume those are special in some way | 18:55 |
| Clark[m] | Ord was the original alternative. But the small window size made it unworkable for large mirrors | 18:56 |
| Clark[m] | So we added the second dfw server after and prefer it for new stuff | 18:56 |
| fungi | right, basically i was wondering if those were forgotten | 18:56 |
| fungi | some volumes have remote replicas on both | 18:56 |
| Clark[m] | Or just unnecessary to move them since they are small | 18:57 |
| fungi | i'm switching docs rw to afs01.ord now, seeing how long it takes | 18:57 |
| fungi | answer so far: it's definitely not instantaneous (still in progress after ~10 minutes), i've got it running in a root screen session on afs01.dfw in case anyone needs that | 19:06 |
| fungi | still going after half an hour | 19:30 |
| fungi | and still going... i'll grab a bite to eat and check back in, hopefully that'll give it enough time. bbiab | 19:48 |
| clarkb | I've been doing laptop surgery. New wifi card seems to work a lot more reliably than the old one. Now to reinstall everything to clear out the mess of debugging steps I had previously applied | 20:37 |
| *** mtreinish_ is now known as mtreinish | 20:37 | |
| *** keekz_ is now known as keekz | 20:39 | |
| fungi | and the docs volume move is still in progress | 20:46 |
| *** ShadowJonathan_ is now known as ShadowJonathan | 20:55 | |
| *** keekz_ is now known as keekz | 20:55 | |
| *** dviroel_ is now known as dviroel | 20:55 | |
| *** clayg_ is now known as clayg | 20:55 | |
| *** naskio_ is now known as naskio | 20:55 | |
| *** dan_with__ is now known as dan_with_ | 20:55 | |
| *** thystips_ is now known as thystips | 20:55 | |
| *** rpittau_ is now known as rpittau | 20:55 | |
| *** tonyb_ is now known as tonyb | 20:55 | |
| *** gmaan_ is now known as gmaan | 20:55 | |
| *** clarkb is now known as Guest24679 | 20:58 | |
| Guest24679 | is anyone else having connectivity issues to oftc or is it more likely on my client side? | 21:39 |
| Guest24679 | the other irc networks I'm connected to don't seem to be bothered. But maybe it's an ipv4 vs ipv6 problem | 21:39 |
| tonyb | my connection dropped about 40 mins ago but has been stable since then. | 21:40 |
| fungi | i think everyone fell off the wagon | 21:41 |
| fungi | seems the wheels are back on now | 21:41 |
| Guest24679 | ack this is just the third time in 2 days so I've started to wonder if it was me or the servers | 21:41 |
| fungi | it's the first time it's gotten me, fwiw | 21:42 |
| fungi | or at least that i've noticed | 21:42 |
| fungi | https://meetings.opendev.org/irclogs/%23opendev/latest.log.html also hasn't been updating that i can see | 21:42 |
| Guest24679 | taking longer this time because I can't just identify, I have to figure out how to ghost myself | 21:43 |
| *** Guest24679 is now known as clarkb | 21:44 | |
| fungi | ah, though the raw version at https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2025-08-21.log has content at least | 21:44 |
| clarkb | looks like reclaim and regain are aliases of one another | 21:45 |
| clarkb | fungi: I assume afs is still working on that docs volume? | 21:47 |
| corvus | i'm going to restart the launchers, schedulers, and web in order to pick up some zuul provider configuration syntax changes | 21:48 |
| clarkb | ack | 21:49 |
| fungi | clarkb: yeah, it almost feels like having a ro replica at that location doesn't speed up moving the rw there at all and it's just transferring all the data anyway | 21:49 |
| clarkb | fungi: I wonder if that is because we're publishing docs updates via zuul jobs regularly | 21:50 |
| fungi | no idea, if it doesn't finish before i knock off in a bit, i'll pick it back up in the morning. at least i ran it under time so i'll know how long it took to finish | 21:51 |
| clarkb | oh good idea. I wouldn't have done that :) | 21:52 |
| corvus | it'd be cool if a shell would let you retroactively "time" something | 21:53 |
| corvus | like "time %1" | 21:53 |
| fungi | yeah, seems like it wouldn't take much for a shell to track that internally | 21:53 |
| corvus | apparently fish has a $CMD_DURATION variable set after each interactive command | 21:54 |
| fungi | oh cool! | 21:54 |
| tonyb | history -w and look at the timestamps | 21:54 |
| fungi | that's a tempting selling point for it | 21:55 |
| fungi | tonyb: that gets you start time but not end time, i think? | 21:55 |
| fungi | so if you catch the process right after it terminates that's mostly usable | 21:55 |
| fungi | if you leave and come back later and want to know when the process ended, i don't think history gets you that | 21:55 |
| tonyb | true. it'd be an approximation | 21:56 |
| corvus | i've been trying out fish on one of my machines... i'm growing addicted to its command completion | 21:56 |
| clarkb | tonyb: corvus anyone want to weigh in on https://review.opendev.org/c/opendev/system-config/+/957950 which gets system-config out of needing ansible 9 overrides? | 21:57 |
| tonyb | you could add something to a prompt_command to record timestamps | 21:57 |
| tonyb | but that's getting kinda clunky. fish's solution is much neater | 21:58 |
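A rough bash equivalent of that prompt_command idea, for the record; variable names here are made up and this assumes a bash new enough to support the DEBUG trap approach:

```bash
# record when each interactive command starts (DEBUG fires before commands),
# then compute the duration just before the next prompt is drawn, roughly
# matching fish's $CMD_DURATION behavior
trap '_cmd_start=${_cmd_start:-$SECONDS}' DEBUG
PROMPT_COMMAND='CMD_DURATION=$(( SECONDS - ${_cmd_start:-$SECONDS} )); unset _cmd_start'
# afterwards: echo "last command took ${CMD_DURATION}s"
```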
| fungi | i just noticed mirror.debian last updated 12 days ago. looking at the reprepro logs, it's complaining about a missing pubkey for signatures by B8E5F13176D2A7A75220028078DBA3BC47EF2265, so probably we need to add another key to our config but i don't think i have time for that tonight | 22:01 |
| corvus | clarkb: lgtm | 22:01 |
| fungi | see the end of /var/log/reprepro/debian.log on mirror-update.o.o for details | 22:02 |
| clarkb | fungi: I'm guessing the new key came in via trixie and they started signing the old stuff with it too? I can take a look | 22:02 |
| fungi | likely, that's what i'm assuming anyway | 22:02 |
| fungi | timing about lines up | 22:02 |
| tonyb | clarkb: lgtm. I guess approve at will | 22:03 |
| opendevreview | James E. Blair proposed opendev/zuul-providers master: Update bindep-fallback path https://review.opendev.org/c/opendev/zuul-providers/+/958247 | 22:04 |
| clarkb | thanks I've approved it | 22:04 |
| corvus | the change to remove the nodepool elements from openstack/project-config merged, but that exposed a place where zuul-providers was referencing that repo. https://review.opendev.org/958247 is needed to fix our image builds, and we need to fix our image builds in order to get images for rax-flex iad3 (i think we have probably exceeded our 3 day timeout for keeping the objects around) | 22:05 |
| corvus | since clarkb +2d that i went ahead and approved it. that's got several hours of work in the gate ahead of it still if anyone else has thoughts.... | 22:07 |
| corvus | actually it's failing in gate because | 22:07 |
| corvus | 2025-08-21 22:07:11,093 ERROR zuul.Launcher: zuul.exceptions.LaunchStatusException: Server in error state | 22:07 |
| corvus | we're getting that from osuosl | 22:07 |
| clarkb | corvus: if we can get the error message we can pass that along to Ramereth[m] | 22:08 |
| corvus | clarkb: zuul logs it if it gets one | 22:08 |
| corvus | and i don't see one | 22:08 |
| corvus | worth double checking on bridge though in case there's a bug in that | 22:09 |
| clarkb | corvus: via server show you mean? | 22:09 |
| corvus | (or maybe there are errors that only show up in detailed server gets, not server listings) | 22:09 |
| corvus | yeah | 22:09 |
| corvus | i'm on that | 22:09 |
| corvus | this command right? /usr/launcher-venv/bin/openstack --os-cloud opendevzuul-osuosl --os-region RegionOne server list | 22:10 |
| corvus | that's empty, i think zuul is deleting the error servers very quickly | 22:11 |
| clarkb | ya that command looks right | 22:11 |
| corvus | i think it may be time to consider this change: https://review.opendev.org/955797 | 22:11 |
| clarkb | we may need to boot something out of band and check the error or see if Ramereth[m] sees anything on the cloud side | 22:12 |
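If it comes to booting something out of band, one possible check looks like the following; the image, flavor, and network names are placeholders, and the `fault` field is only populated once a server lands in ERROR:

```bash
# boot a throwaway server in the same cloud/region zuul is using, then inspect
# the fault detail if it errors out
openstack --os-cloud opendevzuul-osuosl --os-region RegionOne \
    server create --image <image> --flavor <flavor> --network <network> error-debug-test
openstack --os-cloud opendevzuul-osuosl --os-region RegionOne \
    server show error-debug-test -f value -c fault
```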
| clarkb | corvus: for 955797 what causes zuul to ignore the failed image builds? They won't have the archive info listed because the failed jobs don't get to that point? | 22:12 |
| corvus | yep | 22:13 |
| corvus | they either produced an artifact or didn't, that's all we really care about | 22:13 |
| clarkb | +2 from me | 22:14 |
| corvus | another thing we could consider is doing another set of pipelines for the arm images. would let us group them together in buildsets. that's an alternative to 797, but we can also do both. | 22:16 |
| corvus | 797 gives us the swiss-cheese model of image builds: whatever builds, we use. making another set of pipelines lets us do all-or-none for x86, and all-or-none for arm. both are different from our current global-all-or-none. | 22:17 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Add Trixie keys to reprepro config https://review.opendev.org/c/opendev/system-config/+/958248 | 22:17 |
| corvus | (and, if we do both things, then we get swiss-cheese buildsets for each) | 22:17 |
| clarkb | corvus: considering each image build is largely its own thing (even each release within a distro is pretty independent) I think taking what we can get is probably best | 22:17 |
| clarkb | rather than splitting it up by arch or distro or whatever | 22:18 |
| corvus | wfm | 22:18 |
| clarkb | fungi: something like 958248 maybe | 22:18 |
| opendevreview | James E. Blair proposed opendev/zuul-providers master: Remove build_diskimage_image_name variable https://review.opendev.org/c/opendev/zuul-providers/+/956373 | 22:19 |
| clarkb | fungi: the key hash string you provided above doesn't seem to match the key hashes I found but I'm guessing thats just because its hashing something different | 22:20 |
| opendevreview | Merged opendev/zuul-providers master: Use label-defaults https://review.opendev.org/c/opendev/zuul-providers/+/956946 | 22:20 |
| corvus | i have not seen this error building images before: https://zuul.opendev.org/t/opendev/build/6cdc17c3b50c41d2a9daa5149813c1c7 | 22:21 |
| fungi | clarkb: looks like it's a related subkey: https://lists.debian.org/debian-devel-announce/2025/04/msg00001.html | 22:21 |
| corvus | other builds got past that point... so it seems like maybe a fluke, but i don't know the origin | 22:22 |
| clarkb | corvus: let me try cloning that repo locally from each backend I guess | 22:22 |
| corvus | clarkb: oh good idea | 22:22 |
| clarkb | could be gitea backend specific | 22:22 |
| fungi | clarkb: 958248 is adding 225629DF75B188BD which is the corresponding master key for that subkey, so that looks right to me | 22:23 |
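One way to double check that subkey-to-master-key mapping locally, assuming a recent enough debian-archive-keyring package is installed (key ID taken from the message above):

```bash
# show the master key plus its subkey fingerprints; the subkey from the
# reprepro error should show up under this master key
gpg --no-default-keyring \
    --keyring /usr/share/keyrings/debian-archive-keyring.gpg \
    --with-subkey-fingerprint --list-keys 225629DF75B188BD
```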
| clarkb | first thing I notice is that repo appears to be somewhat large.... | 22:23 |
| clarkb | like bigger than nova | 22:23 |
| corvus | oO | 22:24 |
| clarkb | https://opendev.org/cfn/computing-offload/src/branch/master/LingYaoSNIC/BCLinux they are just shoving rpms in there | 22:24 |
| clarkb | 456MB | 22:25 |
| clarkb | so maybe not bigger than nova but in the same range | 22:25 |
| corvus | give it time | 22:26 |
| clarkb | fungi: do you know who to talk to about this? I know you interacted with them a couple times then I tried to respond to them on the list. But this is really not what gerrit or git is good at... | 22:27 |
| clarkb | anyway gitea09 clones cleanly | 22:27 |
| fungi | i do not, sorry | 22:27 |
| fungi | horace may have some contacts | 22:27 |
| fungi | if memory serves, he follows some of their development effort | 22:28 |
| clarkb | thanks I'll followup there once I have a bit more info | 22:28 |
| fungi | he might also be able to get their attention more easily through wechat/weixin | 22:31 |
| opendevreview | Merged opendev/system-config master: Drop Bionic testing for OpenDev services, playbooks and roles https://review.opendev.org/c/opendev/system-config/+/957950 | 22:35 |
| clarkb | corvus: all 6 gitea backends clone that repo for me successfully right now | 22:35 |
| clarkb | corvus: the only other thought I've got is maybe a zuul merger is copying data over that is unhappy and that is propagating into the image builds? | 22:36 |
| clarkb | except I think we're just copying the old cache over not using the mergers? | 22:36 |
| clarkb | could be the repo is large enough to have hit a bit flip? | 22:36 |
| clarkb | I've asked horace if we can see about helping them use our tools more effectively (code review to prevent bad patches in the first place, artifact build and storage, git/gerrit/gitea, etc) | 22:47 |
| clarkb | corvus: tonyb: re arm64 there is a bookworm arm64 node booted right now that is active and not in an error state and I was able to ssh into it | 23:37 |
| clarkb | I also manually booted an ubuntu noble node and that worked (using our image) | 23:38 |
| clarkb | so maybe whatever the issue is has been resolved or is image specific? | 23:38 |
| clarkb | I'm going to delete my test node now | 23:39 |
| corvus | maybe | 23:39 |
| corvus | i can recheck something in a sec | 23:39 |
| corvus | but i just got distracted by the fact that all the x86 image builds failed somehow | 23:39 |
| corvus | oh | 23:40 |
| opendevreview | James E. Blair proposed opendev/zuul-providers master: Update bindep-fallback path https://review.opendev.org/c/opendev/zuul-providers/+/958247 | 23:40 |
| corvus | clarkb: ^ i got the path wrong | 23:41 |
| clarkb | +2 | 23:41 |
| corvus | that will produce a number of arm requests | 23:41 |
| clarkb | the last server listing I did shows 4 servers. 2 noble, 1 jammy, 1 bookworm all active on the cloud side | 23:42 |
| corvus | well, they're 2 minutes into the building state, that's promising | 23:43 |
| corvus | and some in use now | 23:43 |
| corvus | so yes, i guess the anomaly was temporally bounded :) | 23:44 |