Wednesday, 2024-12-18

ykarelHi, is the issue with the latest ubuntu jammy image known? jobs using linuxbridge/openvswitch are broken as a buggy kernel shipped04:11
ykarelreported https://bugs.launchpad.net/neutron/+bug/209199005:04
tonybykarel: Have you reported that to the ubuntu kernel team/package?05:06
ykareltonyb, just commented on the linked bugs05:07
tonybykarel: Okay I think I follow05:14
ykarelalso asking on #ubuntu-kernel on libera.chat, or is there any other place to report?05:15
ykareltonyb, any way to hold this update on infra side?05:16
tonybykarel: Thanks, that's probably adequate, I assume if they need more info they'll ask and/or point you to the right process05:16
tonybykarel: I'm looking into it05:16
ykarellike reverting to previous good image and stopping new updates until there is new kernel fix05:16
ykarel /etc/dib-builddate.txt 2024-12-17 07:4505:17
tonybykarel: I think it's recoverable, but it requires a little more nodepool ops than I have. I'm reading but it might be a job for frickler05:43
tonybinfra-root: ykarel noticed https://bugs.launchpad.net/neutron/+bug/2091990 with jammy kernels.  Looking at the nodepool logs ubuntu-jammy-e6937ec44c254831b6a59904e5e3c655 contains the bad kernel but ubuntu-jammy-6b82bb6f7d00476c90a7ab18de68b645 contains the previous ... expected good kernel.  I'm not sure of the right way to force the clouds+regions that have the latest image to go back one ... and pause uploading new versions05:46
fricklertonyb: ykarel: ok, I did this now, let's see whether it helps, not sure how old the broken kernel build actually is https://paste.opendev.org/show/b04mPkWgdEVY6nE3kaRU/05:51
fricklermaybe also jamespage is around and can help pushing a fix/revert on the ubuntu archive side05:52
tonybfrickler: thanks05:58
frickler#status log deleted the latest ubuntu-jammy image and paused new builds for it in an attempt to avoid a likely kernel bug06:00
opendevstatusfrickler: finished logging06:00
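For context, a rough sketch of what such a rollback can look like with the standard nodepool CLI; the exact commands frickler ran are in the paste linked above, and the build ID below is just the one tonyb mentioned, used illustratively:

    # find the current dib builds and their uploads per provider
    nodepool dib-image-list | grep ubuntu-jammy
    nodepool image-list | grep ubuntu-jammy

    # delete the newest (bad) build; the builder then removes its uploads
    # from each provider and launchers fall back to the previous build
    nodepool dib-image-delete ubuntu-jammy-e6937ec44c254831b6a59904e5e3c655

    # stopping new builds is done in the builder config by setting
    # "pause: true" on the ubuntu-jammy diskimage, not via the CLI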
*** ykarel_ is now known as ykarel06:02
ykarelthx frickler tonyb 06:04
tonybfrickler: thanks, and I assume the launcher will detect clouds that have e693... (re)upload 6b82... and cycle the nodes06:05
ykareland how much time ^ usually takes?06:05
fricklerthe clouds also keep the two copies, no reupload needed. the deletion is almost done, just openmetal seems to be very slow at it06:06
fricklerso nodes starting anywhere else now should already be using the old image06:07
fricklernote that jobs may use up existing nodes for a bit, though06:07
tonybfrickler: thanks for the clarification06:07
ykarelthx, me rechecks and will have results in some time06:08
fricklerah, openmetal cannot delete images that are still in use. there is also an image 116d old that nodepool keeps trying to delete, I assume that may be the one we booted the mirror node from06:15
fricklerFailed to delete image with name or ID 'ubuntu-jammy-1734429394': ConflictException: 409: Client Error for url: http://192.168.3.254:9292/v2/images/eca58c19-ce6f-4e74-b3fa-a955293042cb, : 409 Conflict: Image eca58c19-ce6f-4e74-b3fa-a955293042cb could not be deleted because it is in use: The image cannot be deleted because it is in use through the backend store outside of Glance.06:15
fricklernot sure whether we want to look into that with openmetal folks or just accept it as a given06:16
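For anyone picking up the openmetal question later, a rough way to see what is still holding the image, assuming a ceph-backed glance/cinder and admin access to the cluster (pool names are the conventional ones and may differ):

    # glance's rbd backend stores each image with a protected snapshot
    # named "snap"; boot-from-volume volumes are COW clones of it
    rbd info images/eca58c19-ce6f-4e74-b3fa-a955293042cb

    # list the clones that prevent the parent image from being deleted
    rbd children images/eca58c19-ce6f-4e74-b3fa-a955293042cb@snap

    # on the OpenStack side, volumes created from the image carry its
    # UUID in their volume_image_metadata
    openstack volume list --long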
ykarelok can confirm jobs started using old kernel in latest triggered job KERNEL_VERSION=5.15.0-126-generic06:17
ykarel^ from Provider: openmetal-iad306:17
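For reference, the node-side checks behind that confirmation (both values appear earlier in the discussion):

    # build date baked into the image by diskimage-builder
    cat /etc/dib-builddate.txt

    # running kernel; 5.15.0-126-generic is the known-good jammy kernel here
    uname -r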
tonybfrickler: Ahh okay.  I think it's definitely something we should look into, IIRC I mistakenly booted the mirror node in the wrong tenant, so perhaps we should move it which would free up the image for deletion06:19
fricklertoo bad that this bug will reinforce the "IPv6 is bad" impression for some people06:27
*** darmach5 is now known as darmach14:24
clarkbfrickler: yes this is a fundamental problem with boot from volume imo. I wish that the cloud could logically separate the resources instead of making humans keep track of it all15:48
clarkbfrickler: tonyb ykarel does the kernel bug affect any releases other than jammy?15:50
fricklerclarkb: afaict one could configure cinder to flatten volumes after cloning from an image, not sure though how that'd affect performance15:50
fricklerclarkb: no, only 5.15 kernel affected15:51
clarkbfrickler: I'm saying that this shouldn't be a configuration thing at all. I should be able to delete an image even if things are booted from it. Ownership of that data in ceph or wherever should simply be attributed to the instance and not the image at that point. Basically I want hardlinks for images but instead we have symlinks to the glance-owned resource15:58
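On frickler's flattening point: at the rbd level the parent link can also be broken after the fact; a minimal sketch, with a hypothetical volume name, noting that flattening copies the parent's data into the clone, trading space and I/O for independence from the glance image:

    # show the parent (the glance image snapshot) of a boot-from-volume clone
    rbd info volumes/volume-1234abcd   # volume name is hypothetical

    # copy the shared blocks into the clone so the glance image can go away
    rbd flatten volumes/volume-1234abcd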
clarkbfrickler: looks like https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2091990 should tell us when it is safe to unpause the jammy builds too?16:27
fricklerclarkb: yes, I'm subscribed to that bug. do we want to write the pause into the config, too? I applied it manually this morning in order to avoid being too slow and having the old image get lost16:31
clarkbthat might be good. I'm also trying to understand if this affects any of our jammy nodes16:31
tonybclarkb: I wasn't able to determine that yesterday but it's on my to-do list for today22:01
tonybbut it looks like the Ubuntu kernel team has already answered that question22:05
clarkbtonyb: ya seems like oracular and jammy only22:08
clarkbstill no updates to ubuntu jammy proposed unfortunately22:27
tonybYeah :/22:27
tonybI wonder how/why noble was skipped22:27
clarkbI get the sense that the SRU process for each kernel is somewhat independent of the others?22:29
clarkband those two just happened to be updated already? But I'm not positive of that22:30
*** haleyb is now known as haleyb|out22:47
tonybYeah, seems somewhat strange to me.  Speaking only as an ex kernel engineer and distro packager23:34
opendevreviewClark Boylan proposed openstack/pbr master: Update PBR testing for Noble  https://review.opendev.org/c/openstack/pbr/+/93803023:48
clarkbafter leaving my comments on https://review.opendev.org/c/openstack/pbr/+/924216 I decided to try and figure out what actually is required, and the first step is getting CI running on noble. That said, the python3.12 unittests pass so ya I think PBR itself works; it's largely the supporting structures that we need to update, then also maybe dealing with deprecations and the like23:49
clarkbI've forced the docs job to noble in that first patchset to determine if we need setuptools in doc/requirements.txt. Currently the job runs on jammy so it just works23:49
clarkbI wasn't a fan of precommit before ^ but now that precommit doesn't log what it is installing into the env that it runs commands out of, I'm really not a fan23:50
clarkbI only know that hacking needs to be updated to make this work because I've had to debug this elsewhere, but in those cases I wasn't using pre-commit so it was easier23:51
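On the pre-commit visibility complaint: as far as I know there's no flag that logs every package it installs per hook env, but a couple of things help when debugging (the cache path below is the default; PRE_COMMIT_HOME overrides it):

    # run hooks with more output
    pre-commit run --all-files --verbose

    # hook environments live in the pre-commit cache; each repo dir holds
    # a py_env-* virtualenv that can be inspected directly
    ls ~/.cache/pre-commit/
    for env in ~/.cache/pre-commit/repo*/py_env-*; do "$env/bin/pip" freeze; done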

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!