ykarel | Hi, is the issue with the latest ubuntu jammy image known? Jobs using linuxbridge/openvswitch are broken as a buggy kernel shipped | 04:11 |
ykarel | reported https://bugs.launchpad.net/neutron/+bug/2091990 | 05:04 |
tonyb | ykarel: Have you reported that to the ubuntu kernel team/package? | 05:06 |
ykarel | tonyb, just commented on the linked bugs | 05:07 |
tonyb | ykarel: Okay I think I follow | 05:14 |
ykarel | also asking on #ubuntu-kernel on libera.chat, or is there any other place to report? | 05:15 |
ykarel | tonyb, any way to hold this update on infra side? | 05:16 |
tonyb | ykarel: Thanks, that's probably adequate. I assume if they need more info they'll ask and/or point you to the right process | 05:16 |
tonyb | ykarel: I'm looking into it | 05:16 |
ykarel | like reverting to previous good image and stopping new updates until there is new kernel fix | 05:16 |
ykarel | /etc/dib-builddate.txt 2024-12-17 07:45 | 05:17 |
tonyb | ykarel: I think it's recoverable, but it requires a little more nodepool ops experience than I have. I'm reading, but it might be a job for frickler | 05:43 |
tonyb | infra-root: ykarel noticed https://bugs.launchpad.net/neutron/+bug/2091990 with jammy kernels. Looking at the nodepool logs ubuntu-jammy-e6937ec44c254831b6a59904e5e3c655 contains the bad kernel but ubuntu-jammy-6b82bb6f7d00476c90a7ab18de68b645 contains the previous ... expected good kernel. I'm not sure of the right way to force the clouds+regions that have the latest image to go back one ... and pause uploading new versions | 05:46 |
frickler | tonyb: ykarel: ok, I did this now, let's see whether it helps, not sure how old the broken kernel build actually is https://paste.opendev.org/show/b04mPkWgdEVY6nE3kaRU/ | 05:51 |
frickler | maybe jamespage is also around and can help push a fix/revert on the ubuntu archive side | 05:52 |
tonyb | frickler: thanks | 05:58 |
frickler | #status log deleted the latest ubuntu-jammy image and paused new builds for it in an attempt to avoid a likely kernel bug | 06:00 |
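A rough sketch of what rolling back an image like this involves on the nodepool side (frickler's actual commands are in the linked paste; the CLI names below are current nodepool ones, but take the exact identifier that dib-image-delete wants from dib-image-list rather than from this sketch):

```shell
# Run on a nodepool builder/launcher host with access to ZooKeeper.

# Find the ubuntu-jammy builds and their states; note the identifier of the
# build that shipped the bad kernel
nodepool dib-image-list | grep ubuntu-jammy

# Delete that build; providers then fall back to the previous build, which is
# still uploaded (pausing new builds is covered further down in this log)
nodepool dib-image-delete <id-of-the-bad-ubuntu-jammy-build>
```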
opendevstatus | frickler: finished logging | 06:00 |
*** ykarel_ is now known as ykarel | 06:02 | |
ykarel | thx frickler tonyb | 06:04 |
tonyb | frickler: thanks, and I assume the launcher will detect clouds that have e693... (re)upload 6b82... and cycle the nodes | 06:05 |
ykarel | and how much time does ^ usually take? | 06:05 |
frickler | the clouds also keep the two copies, no reupload needed. the deletion is almost done, just openmetal seems to be very slow at it | 06:06 |
frickler | so nodes starting anywhere else now should already be using the old image | 06:07 |
frickler | note that jobs may use up existing nodes for a bit, though | 06:07 |
tonyb | frickler: thanks for the clarification | 06:07 |
ykarel | thx, me rechecks and will have results in some time | 06:08 |
frickler | ah, openmetal cannot delete images that are still in use. there is also an image 116d old that nodepool keeps trying to delete, I assume that may be the one we booted the mirror node from | 06:15 |
frickler | Failed to delete image with name or ID 'ubuntu-jammy-1734429394': ConflictException: 409: Client Error for url: http://192.168.3.254:9292/v2/images/eca58c19-ce6f-4e74-b3fa-a955293042cb, : 409 Conflict: Image eca58c19-ce6f-4e74-b3fa-a955293042cb could not be deleted because it is in use: The image cannot be deleted because it is in use through the backend store outside of Glance. | 06:15 |
frickler | not sure whether we want to look into that with openmetal folks or just accept it as a given | 06:16 |
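For the "in use" conflict above, a hedged sketch of how one might check from the cloud side what still references the image. These are standard openstackclient commands; the UUID is the one from the Glance error, and for boot-from-volume nodes the reference lives on the root volume rather than on the server:

```shell
# Servers that nova still associates with the image
openstack server list --all-projects \
    --image eca58c19-ce6f-4e74-b3fa-a955293042cb

# Per suspect volume, the source image is recorded in its image metadata
openstack volume show <volume-id>   # look at the volume_image_metadata field
```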
ykarel | ok, can confirm jobs started using the old kernel in the latest triggered job: KERNEL_VERSION=5.15.0-126-generic | 06:17 |
ykarel | ^ from Provider: openmetal-iad3 | 06:17 |
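The check ykarel describes is easy to reproduce on any node booted from the rolled-back image; a minimal sketch, nothing job-specific assumed:

```shell
# Image build date, written by diskimage-builder at image build time
cat /etc/dib-builddate.txt

# Running kernel; 5.15.0-126-generic is the previous, known-good version noted
# above, so a newer 5.15 build is suspect until the fixed kernel ships
uname -r
```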
tonyb | frickler: Ahh okay. I think it's definitely something we should look into, IIRC I mistakenly booted the mirror node in the wrong tenant, so perhaps we should move it, which would free up the image for deletion | 06:19 |
frickler | too bad that this bug will reinforce the "IPv6 is bad" impression for some people | 06:27 |
*** darmach5 is now known as darmach | 14:24 | |
clarkb | frickler: yes this is a fundamental problem with boot from volume imo. I wish that the cloud could logically separate the resources instead of making humans keep track of it all | 15:48 |
clarkb | frickler: tonyb ykarel does the kernel bug affect any releases other than jammy? | 15:50 |
frickler | clarkb: afaict one could configure cinder to flatten volumes after cloning from an image, not sure though how that'd affect performance | 15:50 |
frickler | clarkb: no, only 5.15 kernel affected | 15:51 |
clarkb | frickler: I'm saying that this shouldn't be a configuration thing at all. I should be able to delete an image even if things are booted from it. Ownership of that data in ceph or wherever should simply be attributed to the instance and not the image at that point. Basically I want hardlinks for images but instead we have symlinks to the glance-owned resource | 15:58 |
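clarkb's hardlink/symlink analogy maps onto how Glance and Cinder typically share a Ceph cluster: a volume cloned from an image is an RBD child of a protected snapshot on the Glance-owned RBD image, which is why Glance refuses the delete. A hedged sketch of inspecting and breaking that chain with plain rbd commands; the pool names are common defaults, not confirmed for this cloud:

```shell
# List clones that still depend on the glance image's snapshot
# (glance's RBD backend keeps each image behind a protected snapshot named "snap")
rbd children images/eca58c19-ce6f-4e74-b3fa-a955293042cb@snap

# Flattening a child copies the parent data into it and drops the dependency,
# roughly the "hardlink" semantics being asked for; cinder's RBD driver can
# also be told to flatten at clone time, as frickler notes above
rbd flatten volumes/volume-<cinder-volume-uuid>
```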
clarkb | frickler: looks like https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2091990 should tell us when it is safe to unpause the jammy builds too? | 16:27 |
frickler | clarkb: yes, I'm subscribed to that bug. do we want to write the pause into the config, too? I applied it manually this morning in order to avoid being too slow and having the old image getting lost | 16:31 |
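If the pause does get written into configuration rather than only applied by hand, it would presumably be a small change to the builder's diskimage definition. A sketch of what that might look like, with the exact file location in project-config left as an assumption:

```shell
# Hypothetical: in the nodepool builder config that defines the ubuntu-jammy
# diskimage, something like:
#
#   diskimages:
#     - name: ubuntu-jammy
#       pause: true   # hold builds/uploads until a fixed 5.15 kernel ships
#
# Unpausing later is just a revert of the same change.
```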
clarkb | that might be good. I'm also trying to understand if this affects any of our jammy nodes | 16:31 |
tonyb | clarkb: I wasn't able to determine that yesterday but it's on my to-do list for today | 22:01 |
tonyb | but it looks like the Ubuntu kernel team has already answered that question | 22:05 |
clarkb | tonyb: ya seems like oracular and jammy only | 22:08 |
clarkb | still no updates to ubuntu jammy proposed unfortunately | 22:27 |
tonyb | Yeah :/ | 22:27 |
tonyb | I wonder how/why noble was skipped | 22:27 |
clarkb | I get the sense that the SRU process for each kernel is somewhat independent of the others? | 22:29 |
clarkb | and those two just happened to be updated already? But I'm not positive of that | 22:30 |
*** haleyb is now known as haleyb|out | 22:47 | |
tonyb | Yeah, seems somewhat strange to me. Speaking only as an ex-kernel engineer and distro packager | 23:34 |
opendevreview | Clark Boylan proposed openstack/pbr master: Update PBR testing for Noble https://review.opendev.org/c/openstack/pbr/+/938030 | 23:48 |
clarkb | after leaving my comments on https://review.opendev.org/c/openstack/pbr/+/924216 I decided to try and figure out what actually is required, and the first step is getting CI running on noble. That said, the python3.12 unittests pass, so ya I think PBR itself works; it's largely the supporting structures that we need to update, then also maybe dealing with deprecations and the like | 23:49 |
clarkb | I've forced the docs job to noble in that first patchset to determine if we need setuptools in doc/requirements.txt. Currently the job runs on jammy so it just works | 23:49 |
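A hedged sketch of the kind of pin clarkb describes for exercising the docs job on noble; the job and nodeset names below are illustrative guesses, not the contents of the actual 938030 patchset:

```shell
# Hypothetical .zuul.yaml variant forcing the docs build onto a noble node:
#
#   - job:
#       name: pbr-docs-noble
#       parent: openstack-tox-docs
#       nodeset: ubuntu-noble
#
# If the sphinx build fails there over a missing setuptools/pkg_resources,
# that would argue for adding setuptools to doc/requirements.txt.
```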
clarkb | I wasn't a fan of precommit before ^ but now that precommit doesn't log what it is installing into the env that it runs commands out of, I'm really not a fan | 23:50 |
clarkb | I only know that hacking needs to be updated to make this work because I've had to debug this elsewhere, but in those cases I wasn't using pre-commit so it was easier | 23:51 |