Nick | Message | Time |
---|---|---|
opendevreview | sean mooney proposed openstack/nova master: refactor numa claims https://review.opendev.org/c/openstack/nova/+/898056 | 03:35 |
opendevreview | sean mooney proposed openstack/nova master: refactor numa claims https://review.opendev.org/c/openstack/nova/+/898056 | 03:37 |
opendevreview | sean mooney proposed openstack/nova master: imporve nova object logging https://review.opendev.org/c/openstack/nova/+/898057 | 04:29 |
tobias-urdin | sean-k-mooney: when you have some seconds, based on https://review.opendev.org/c/openstack/nova/+/824048 and the issue from libvirt down expressed in https://gitlab.com/libvirt/libvirt/-/issues/161 I assume it would be frowned upon implementing the workaround in the nova layer by checking for cgroupsv1/cgroupsv2 and reintroducing the default | 09:08 |
tobias-urdin | cputune.shares value based on that (let's ignore the live migration part of it now and just focus on "having the correct value" based on the crazy decision made in libvirt).. any thoughts on that? | 09:08 |
opendevreview | Sylvain Bauza proposed openstack/nova master: WIP: add mtty support for vgpus https://review.opendev.org/c/openstack/nova/+/898100 | 09:40 |
bauzas | dansmith: I know that's too early for you but my devstack fails on your mtty patch https://paste.opendev.org/show/bwpHZPMuePifFk0JWi7v/ (just reply when you want) | 12:46 |
bauzas | sean-k-mooney: ^ | 12:46 |
bauzas | actually the whole stack is here https://paste.opendev.org/show/bLirDAMuGmxhLLod0S9p/ | 12:47 |
bauzas | stack@sbauza-dev2:~/devstack$ cat /etc/lsb-release | 12:49 |
bauzas | DISTRIB_ID=Ubuntu | 12:49 |
bauzas | DISTRIB_RELEASE=22.04 | 12:49 |
bauzas | DISTRIB_CODENAME=jammy | 12:49 |
bauzas | DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS" | 12:49 |
bauzas | stack@sbauza-dev2:~/devstack$ uname -r | 12:49 |
bauzas | 5.15.0-1037-kvm | 12:49 |
frickler | bauzas: is there some linux-modules-extra pkg for that kernel? | 13:19 |
dansmith | bauzas: looks like maybe it has module symbols from a different kernel | 13:22 |
dansmith | since it failed to create /opt/stack/kernel .. did you run it once, upgrade kernel, run again? | 13:22 |
bauzas | dansmith: it didn't fail to create kernel | 13:36 |
bauzas | (sorry was on a meeting) | 13:36 |
dansmith | didn't fail? | 13:36 |
dansmith | ++/opt/stack/nova/devstack/lib/mdev_samples:compile_mdev_samples:19 mkdir /opt/stack/kernel | 13:36 |
dansmith | mkdir: cannot create directory ‘/opt/stack/kernel’: File exists | 13:36 |
bauzas | the kernel modules, yeah | 13:36 |
bauzas | ah | 13:36 |
bauzas | probably because devstack failed before on OVS | 13:37 |
bauzas | but I hadn't upgraded my kernel | 13:37 |
dansmith | that's what I just said yeah :) | 13:37 |
dansmith | blow that away and try again, I'll add cleaning of that to clean.sh | 13:37 |
bauzas | want me to unstack, delete the kernel dir and stack ? | 13:37 |
bauzas | ok | 13:38 |
bauzas | I can try | 13:38 |
bauzas | it's a pure compute node, no controllers on it so should be small and quick | 13:38 |
dansmith | well, you could just run the plugin itself again (after cleaning the kernel dir) but re-stacking would be better | 13:39 |
bauzas | yeah | 13:39 |
bauzas | if that fails again, I'll run the plugin itself | 13:40 |
bauzas | dansmith: fwiw, I started to provide the mtty change on nova | 13:40 |
bauzas | I could have missed something (and I know I did, since I haven't modified the method that checks the mdevs when we restart, but meh not a problem for testing) but I think it's a start | 13:41 |
dansmith | cool | 13:42 |
bauzas | dansmith: failing again https://paste.opendev.org/show/bFTPQR1VBSJNQj2jrknG/ | 13:42 |
bauzas | this time, this is linking the modules | 13:43 |
bauzas | but still failing on the definitions | 13:43 |
dansmith | bauzas: show me "dpkg -l | grep linux" and "ls /usr/src" | 13:43 |
bauzas | cool | 13:43 |
dansmith | also uname -r and "ls -l /lib/modules/*/build" | 13:44 |
dansmith | worked fine for me and sean-k-mooney, but it gets weird if you have a bunch of mismatched kernels and things installed sometimes | 13:44 |
bauzas | dansmith: there you go https://paste.opendev.org/show/bSXwvpbc6Z96awnWuIRS/ | 13:45 |
bauzas | I can reclone the machine and reinstall devstack again | 13:46 |
bauzas | I mean, if my env is dirty, meh | 13:46 |
bauzas | probably better to start with fresh things | 13:46 |
dansmith | yeah that's a ton of kernel distraction, although it seems like it all lines up | 13:48 |
dansmith | however, | 13:48 |
dansmith | there are lots of reports out there about having linux-headers installed for the right kernel, but needing to reinstall it if it was installed by a previous version | 13:48 |
dansmith | not sure why, because it should be all separate | 13:49 |
dansmith | did you already have linux-headers for your kernel installed or did the first run of this install it for you? | 13:49 |
sean-k-mooney | you have a kvm variant of the kernel | 13:52 |
sean-k-mooney | so maybe that's the issue | 13:52 |
sean-k-mooney | i have Linux upstream-devstack 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | 13:52 |
dansmith | oh is there a separate headers package for the kvm variant? if so that's definitely it | 13:52 |
bauzas | I just rebuilt the instance I have as devstack2 | 13:53 |
dansmith | ah yep | 13:53 |
dansmith | that's the problem | 13:53 |
bauzas | and yeah, if the symbols are only on the plain kernel, that explains | 13:54 |
bauzas | not sure how I got this variant, probably because of the image I used in my cloud | 13:54 |
dansmith | so linux-headers-$kver won't work for the variants, because it needs to be linux-headers-kvm I guess | 13:54 |
dansmith | bauzas: so you can switch to the normal one? | 13:55 |
bauzas | yeah that's my guess | 13:55 |
bauzas | dansmith: too late, I rebuilt it | 13:55 |
dansmith | what does that mean, you installed the right headers package and built the modules? | 13:55 |
sean-k-mooney | i think they mean reinstalled the vm | 13:55 |
dansmith | that's what I said.. switch to the normal kernel, however that needs to happen :) | 13:56 |
dansmith | either with apt or rm :) | 13:56 |
sean-k-mooney | bauzas: are you using a cloud image or similar? i have never seen the kvm kernel actually installed in anything | 13:56 |
dansmith | yeah I'm not sure what it's for either | 13:56 |
sean-k-mooney | well it's a stripped down kernel without any hardware specific drivers | 13:56 |
dansmith | "Targeting KVM instance usage. This carries the minimum support required for a guest kernel." | 13:56 |
sean-k-mooney | but they don't use it for the cloud image | 13:57 |
sean-k-mooney | so i don't know who the audience for that is | 13:57 |
dansmith | yeah, probably for ultra size-conscious situations or something | 13:57 |
bauzas | sean-k-mooney: I used some internal cloud at my employer | 13:57 |
bauzas | stack@sbauza-dev2:~$ uname -r | 13:58 |
bauzas | 5.15.0-1017-kvm | 13:58 |
bauzas | okay, coming from the image itself | 13:58 |
bauzas | fun | 13:58 |
bauzas | I'm tempted to pick another image or download another kernel | 13:59 |
sean-k-mooney | yes just install a different kernel and reboot | 13:59 |
sean-k-mooney | or use the ubuntu image i uploaded | 13:59 |
bauzas | that will be the quickest | 13:59 |
dansmith | I can see if I can make it work either way, but yeah do the workaround for the moment | 14:00 |
sean-k-mooney | the official ubuntu cloud images have 5.15.0-48-generic | 14:00 |
sean-k-mooney | well that's an old image so it's probably outdated but i just confirmed that so you booted from a customized image | 14:01 |
dansmith | yeah so you can't even get the support packages for very old kernels, so you need to be running something current anyway | 14:02 |
dansmith | like they remove them | 14:02 |
bauzas | hum | 14:05 |
bauzas | which kernel version is currently running on our CI jobs ? | 14:06 |
dansmith | the image in our CI gets refreshed periodically for that reason | 14:06 |
bauzas | because I have the choices | 14:06 |
sean-k-mooney | i always do an apt dist-upgrade and reboot before running devstack for what it's worth | 14:06 |
sean-k-mooney | nightly for upstream ci | 14:06 |
dansmith | bauzas: https://zuul.opendev.org/t/openstack/build/3c70446c0452434a87166d336dc0dbc5/log/job-output.txt#22580 | 14:06 |
sean-k-mooney | bauzas: just install the generic kernel | 14:06 |
dansmith | linux-headers-5.15.0-86-generic | 14:06 |
bauzas | I'll then install linux-image-5.15.0-86-generic | 14:07 |
dansmith | that's what I'm running locally too | 14:07 |
sean-k-mooney | same | 14:07 |
dansmith | hmm, actually linux-headers-kvm resolves to linux-headers-$ver-kvm as well, so I'm not quite sure why this isn't just working, actually | 14:11 |
dansmith | oh but there's also linux-kvm-headers-$ver .. weird | 14:12 |
bauzas | ok, changing the kernel gave me some kernel tain at boot | 14:15 |
bauzas | taint* | 14:15 |
dansmith | at boot? | 14:15 |
bauzas | yeah | 14:15 |
bauzas | I'll just upload a fucking plain ubuntu image | 14:15 |
dansmith | lol | 14:15 |
dansmith | can you pastebin the taint message? really curious | 14:16 |
dansmith | unless your provider is inserting something of their own, I can't imagine... | 14:16 |
bauzas | argh, sorry, again already rebuilt | 14:21 |
bauzas | IIRC, it was about getting the disk | 14:22 |
bauzas | ok I need to go parenting my kid | 14:25 |
bauzas | bbiab | 14:25 |
dansmith | bauzas: fwiw, I can reproduce your issue if I install the kvm variant, so I'll see if I can figure out why that's not working | 14:50 |
bauzas | ack | 14:50 |
bauzas | that's crazy, I don't know yet why but the variant is automatically installed in my cloud env :facepalm | 14:50 |
dansmith | yeah | 14:51 |
dansmith | oh neither of the headers packages for that kernel contain the module symbols it seems | 14:52 |
dansmith | yeah totally different, weird | 14:54 |
bauzas | dansmith: yup and apparently I can't avoid this kernel variant to be installed in my cloud | 14:54 |
bauzas | I can try to use a bare kernel like the last time but my guess is that it won't boot | 14:55 |
dansmith | bauzas: can't avoid installing it or can't avoid booting to it? | 14:55 |
bauzas | both | 14:55 |
dansmith | how does it fail? | 14:55 |
bauzas | it's internal thing, so slack | 14:55 |
dansmith | bauzas: sean-k-mooney: ah, I got it.. the -kvm kernel has none of the vfio or mdev stuff enabled in config (because it's for virtual only) and so we have module symbols, but not symbols for the vfio or mdev base infrastructure | 15:19 |
dansmith | which makes sense of course | 15:19 |
dansmith | so we can't compile those mdev drivers because the infra they depend on is missing | 15:20 |
bauzas | dansmith: I eventually was able to create a new instance w/o that variant thanks to another image | 15:26 |
dansmith | cool, I think that's the way | 15:26 |
bauzas | but fwiw https://cloud-images.ubuntu.com/jammy/20231012/ provides a specific QEMU image with the kvm variant | 15:26 |
dansmith | yep, but we're simulating hardware here so we need a kernel with hardware stuff in it | 15:27 |
bauzas | so that's something we can't support | 15:27 |
bauzas | agreed | 15:27 |
-opendevstatus- | NOTICE: The lists.openstack.org site will be offline over the next few hours for migration to a new server | 15:31 |
opendevreview | Dan Smith proposed openstack/nova master: Compile mdev samples for nova-next https://review.opendev.org/c/openstack/nova/+/897708 | 15:58 |
dansmith | bauzas: this ^ hard fails on the kvm variant to make sure it's obvious | 16:01 |
dansmith | and also addresses some of sean-k-mooney's feedback | 16:02 |
bauzas | cool | 16:02 |
bauzas | my devstack node is still deploying the stack but I have good hopes | 16:08 |
bauzas | actually, my hope was wrong : E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 30066 (apt) | 16:08 |
bauzas | shit, need to respin | 16:09 |
dansmith | is auto upgrades running in the background? | 16:09 |
dansmith | that's a common thing | 16:09 |
sean-k-mooney | that's why i always do sudo apt update; sudo apt dist-upgrade -y; sudo reboot | 16:17 |
sean-k-mooney | before i run devstack for the first time | 16:17 |
sean-k-mooney | ok i also do "sudo apt install python3-dev libffi-dev libssl-dev gcc make git" too | 16:19 |
sean-k-mooney | devstack does pull those in as needed but it causes less issues if you do it upfront | 16:19 |
sean-k-mooney | that's just making sure that we can compile python c modules for pip deps | 16:21 |
bauzas | fwiw, was able to build the kernel modules with the plain ubuntu kernel | 16:43 |
bauzas | so I'll +2 dansmith's patch | 16:43 |
bauzas | still fighting with a last devstack issue, but unrelated to dan's | 16:43 |
dansmith | cool | 16:43 |
bauzas | dansmith: for some reason, I'm hitting hard https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1523 but it's just the first devstack start | 17:21 |
bauzas | so I wonder how to fix it | 17:21 |
dansmith | uh, leftover database? | 17:21 |
bauzas | probably | 17:22 |
bauzas | so, do you prefer that I directly write the file or removing the node ? | 17:22 |
dansmith | up to you, either way it's detecting the right thing | 17:24 |
bauzas | actually yeah, I recloned the host but this isn't a AIO | 17:24 |
bauzas | so the DB still exists | 17:24 |
bauzas | DB record* | 17:24 |
bauzas | okay, will write the file | 17:24 |
dansmith | right which is exactly what that check is for :P | 17:27 |
bauzas | dansmith: actually, I have to doublecheck but since state_path is /opt/stack/data/nova, this probably goes away when you unstack | 17:54 |
dansmith | only when you clean I think | 17:54 |
dansmith | but you said it's not an AIO right? | 17:54 |
bauzas | yeah, just a single compute connecting to another instance | 17:55 |
dansmith | oh you mean nova's state path gets deleted | 17:55 |
bauzas | yeah, my theory is when unstacking | 17:55 |
dansmith | right, but that's the thing it's checking for.. that you basically rebuilt a node with the same name in place | 17:55 |
dansmith | if you don't want that, copy the uuid file to /etc/nova and it will re-use the same thing over and over | 17:56 |
bauzas | right, on purpose :) | 17:56 |
dansmith | (assuming you don't delete /etc/nova I guess unstack might) | 17:56 |
dansmith | maybe we need to let devstack force a uuid so we can keep it for subsequent stack runs | 17:56 |
dansmith | I always do AIO so I never hit that | 17:56 |
bauzas | anyway, not sure we need to change anything, but that's a bit complicating a story of a 2-node devstack install with the n-cpu failing once | 17:56 |
bauzas | maybe, for the moment, my brain is doing concurrency between a devstack install and a customer bug | 17:57 |
bauzas | so I don't have the energy to think about any potential solution | 17:58 |
dansmith | it's doing the thing it's supposed to which is catching that you've recreated a compute node with the same name without deleting it from the database | 18:01 |
dansmith | so either delete it from the DB, or slap the uuid file into place | 18:01 |
dansmith | I'll add something to devstack to basically disable that behavior by letting you force the uuid to be the same on each stack | 18:02 |
dansmith | bauzas: https://review.opendev.org/c/openstack/devstack/+/898134 | 18:10 |
dansmith | put a uuid in NOVA_CPU_UUID= in your localrc and you should be good | 18:10 |
bauzas | yup | 18:11 |
bauzas | excellent thing | 18:11 |
bauzas | +1 from me | 18:11 |
opendevreview | Dan Smith proposed openstack/nova master: Do not manage CPU0's state https://review.opendev.org/c/openstack/nova/+/898137 | 19:20 |
dansmith | bauzas: yw ^ | 19:21 |
opendevreview | Sylvain Bauza proposed openstack/nova master: WIP: add mtty support for vgpus https://review.opendev.org/c/openstack/nova/+/898100 | 19:22 |
sean-k-mooney | dansmith: before i finish for tonight do we want to merge the mdev plugin or wait for bauzas to modify nova to work with it first | 20:26 |
sean-k-mooney | i'm leaning towards merge since we can trivially revert it if we don't end up using it for any reason but i said i would ask first | 20:27 |
bauzas | I still need to test devstack with precreated mdevs | 20:27 |
sean-k-mooney | ok we can wait then i was assuming your +2 meant you were ok to merge it but was not sure | 20:28 |
sean-k-mooney | i'm just closing browser tabs so said i would ask before i do | 20:28 |
sean-k-mooney | i can loop back to this on monday | 20:28 |
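
The root cause found at 15:19 (the -kvm kernel has the vfio and mdev infrastructure disabled in its config) can be checked before trying to build the sample modules. The following is only a rough sketch, not what the nova devstack plugin does: the helper names are hypothetical, the option names `CONFIG_VFIO` and `CONFIG_VFIO_MDEV` are an assumption about which symbols matter, and it assumes the config text comes from a file such as `/boot/config-<release>`.

```python
import re

# Assumed option names for the vfio/mdev base infrastructure; they are not
# quoted anywhere in the discussion above.
REQUIRED_OPTIONS = ("CONFIG_VFIO", "CONFIG_VFIO_MDEV")


def kernel_flavour(release):
    """Return the trailing flavour of a release string such as '5.15.0-1037-kvm'."""
    return release.rsplit("-", 1)[-1]


def missing_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not set to 'y' or 'm' in config_text."""
    enabled = set(
        re.findall(r"^(CONFIG_\w+)=[ym]$", config_text, flags=re.MULTILINE)
    )
    return [opt for opt in required if opt not in enabled]


# Hypothetical usage on a live host:
#   import os
#   release = os.uname().release
#   with open(f"/boot/config-{release}") as fh:
#       print(kernel_flavour(release), missing_options(fh.read()))
```

An empty list from `missing_options` would suggest the running kernel can host the mdev sample drivers; a non-empty one matches the situation hit here with the kvm flavour.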