Thursday, 2026-05-28

songwenpinghi team, can NVIDIA A100 MIG devices be passed through to a KVM VM?03:00
ralonsohHi folks, just a heads-up. I updated the Neutron policy file, using yours: https://github.com/openstack/nova/blob/380f657b5102707a3da676478e72fd96691e966b/doc/source/configuration/policy-concepts.rst06:22
ralonsohThis is my patch: https://review.opendev.org/c/openstack/neutron/+/99006906:22
ralonsohYou should remove the references to `enforce_scope`, that is now True, deprecated and marked for removal06:23
opendevreviewyatin proposed openstack/nova master: [DNM] Do not use sqlalchemy reserved metadata  https://review.opendev.org/c/openstack/nova/+/99042908:02
sahido/08:17
sahidquick question guys, there is a missing field "tags" in the unversioned notifications, i'm wondering whether we can fix that or perhaps unversioned notif is deprecated and we don't want to do anything here?08:19
gibisahid: unversioned is deprecated, we only add new stuff to versioned notifications08:22
sahidgibi: ack thank you08:45
sahidbtw i have noticed that tags field is missing during delete if I have a moment i will try to fix that soon08:45
opendevreviewBalazs Gibizer proposed openstack/nova master: Poison eventlet import in native threading mode  https://review.opendev.org/c/openstack/nova/+/98628209:01
gibisahid: ack, feel free to ping me with the patc09:01
gibih09:01
opendevreviewhuanhongda proposed openstack/nova master: Fix scheduler aggregate cache race on concurrent add_host  https://review.opendev.org/c/openstack/nova/+/99043709:05
sean-k-mooneyi'm wondering if we could run a little experiment in the nova gate for a week or two to see if we could finally get rid of the kernel panics09:16
sean-k-mooneyi was chatting to frickler in the infra channel and they had the insight that we may have been thinking about the root cause incorrectly and that the panic was actually caused by file system corruption09:17
sean-k-mooneythey pointed out that the current behavior looks as if the initial copy of the filesystem from the initramfs was interrupted09:18
sean-k-mooneycirros ships with a mostly empty rootfs (just a partition table) 09:18
sean-k-mooneyand copies the root on first boot09:18
sean-k-mooneyif we interrupt that then we will get a panic when we resize09:19
sean-k-mooneyso we inherently have a race in the ci right now09:19
sean-k-mooneyone we normally win because the root fs is like 10 MB or so09:19
sean-k-mooneybased on their suggestion i built a test image https://github.com/SeanMooney/cirros/releases/tag/cirros-d260527-x86_64-5a75ef2-test09:20
sean-k-mooneywhere the root is prepopulated and tested it with a quick devstack change09:20
sean-k-mooneyhttps://review.opendev.org/c/openstack/devstack/+/99034809:20
sean-k-mooneyso the experiment i would like to run in nova gate09:20
sean-k-mooneyis to override the cirros image we use in the 2-3 jobs we normally see the issue in09:21
sean-k-mooneyand point it at my image09:21
sean-k-mooneywhile i separately chat to frickler and the other cirros folks about how to do this upstream in the cirros project09:21
sean-k-mooneywe typically see 10-20 kernel panics a day on nova jobs09:22
sean-k-mooneyso what i'm hoping is if we run that experiment for a week or two that would drop to 009:22
sean-k-mooneyi'm going to create a patch to override the cirros image path for the relevant jobs so we can discuss more but i would be interested in what folks think09:23
sean-k-mooneyare we willing to run the experiment for 2 weeks to get the data? we can obviously quickly revert if needed09:24
sean-k-mooneymy current working theory is we actually had 2 different kernel panics, the first was resolved with the waiting-for-sshable changes for volume attach/detach, and the second was this file system corruption issue that we see with BFV + resize or BFV + any reboot i guess09:28
gibisean-k-mooney: your plan sounds sane to me09:29
sean-k-mooneywhat i can do is prepare the url override patch and add the topic to the irc meeting agenda for monday. if we are happy to proceed we can merge it after the meeting and then monitor it and see09:39
opendevreviewJoan Gilabert proposed openstack/nova-specs master: Repropose and update cyborg vGPU (mdev) support  https://review.opendev.org/c/openstack/nova-specs/+/96751511:06
*** jgilaber_ is now known as jgilaber12:02
opendevreviewThibaut Démaret proposed openstack/nova master: libvirt: add disk rotation_rate support for local disks  https://review.opendev.org/c/openstack/nova/+/97969312:05
kklimaszewskiHello, I've recently proposed a new nova spec for SPDK-based nova-provisioned storage backend: https://review.opendev.org/c/openstack/nova-specs/+/985676. Would anyone here be willing to review it/be its liaison? Also: Dominik (known here as Etua) has reproposed a spec from Ussuri for NUMA topology with resource providers: https://review.opendev.org/c/openstack/nova-specs/+/978570. Could someone here please review it?12:37
JayFsean-k-mooney: I think I need to find time, regardless of interruptions and urgency, to get your patches cleaned up and landed upstream. That's a radical improvement. https://usercontent.irccloud-cdn.com/file/i2G0mxRP/image.png14:49
JayFtl;dr: G-Research's (somewhat patched) 2024.1, one nova-compute that supported 1000 Ironic nodes was taking 65 minutes to restart. With three changes, that was reduced to 2 minutes.14:50
sean-k-mooneydamn14:51
sean-k-mooneyok14:51
JayF(NCI = "nova-compute-ironic" -- the internal GR moniker for nova-computes w/Ironic driver)14:51
JayFgiven step #1 is also the first thing we wanna remove -- moving the resource updates to Ironic -- gives me hope as well that we're on the right track14:51
sean-k-mooneyso step 1 is the dispatching of the update to the thread pool right14:52
sean-k-mooneyso you can run N threads to do the work14:53
sean-k-mooneyyou mentioned 32 workers but were they all used14:53
JayFpassed on the ask, I don't know14:53
sean-k-mooneyi assume you have not had time to test if say 8 or 16 has a linear effect?14:53
JayFto be clear: I don't have operational access to this cluster at all14:54
sean-k-mooneyno worries14:54
sean-k-mooneyjust wondering14:54
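The dispatch described here (hand each node's update to a pool of N threads) can be sketched as below. This is a minimal sketch that assumes the per-node work is independent; `update_node` and `update_all_nodes` are hypothetical names, not nova or Ironic functions, and the 32-worker default only echoes the number mentioned in the discussion.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def update_node(node_id):
    # Hypothetical placeholder for one node's resource update; it returns the
    # name of the worker thread that handled the node.
    return node_id, threading.current_thread().name


def update_all_nodes(node_ids, max_workers=32):
    """Run update_node for every node on a pool of max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(update_node, node_ids))
    # Distinct thread names show how many workers actually picked up work.
    workers_used = set(results.values())
    return results, workers_used
```

Timing `update_all_nodes` with 8, 16 and 32 workers, and looking at the size of `workers_used`, would be one way to check whether all workers are used and whether the speed-up is linear.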
JayFHonestly though, if we really get it down to 2 minutes14:54
JayFsomething like a graceful shutdown starts looking really good for HA updates14:54
sean-k-mooneyJayF: the vtpm thing is also interesting i think we may have fixed that lazy loading thing on master recently14:55
JayFif we could push Ironic queries to a secondary nova-compute while letting the first one complete actions in progress (or as discussed; hand off those in-progress actions from checkpoints)14:55
sean-k-mooneyJayF: https://review.opendev.org/c/openstack/nova/+/97703714:57
sean-k-mooneyso not merged yet14:57
sean-k-mooneyJayF: but that will fix the vtpm issue14:57
sean-k-mooneyit's an issue for libvirt as well14:57
sean-k-mooneyadded you to that so you should get a notification when that lands14:58
sean-k-mooneyi kind of forgot about that patch if i'm being honest but looks like i called out the ironic impact the last time i looked :)14:59
sean-k-mooneyJayF: so the 3 patches ye really need are https://review.opendev.org/c/openstack/nova/+/977037 https://review.opendev.org/c/openstack/nova/+/980676 and https://review.opendev.org/c/openstack/nova/+/980679/115:07
sean-k-mooneyone other thing that we could perhaps look into in the future is allowing nova-compute to run in an active/standby mode, or a better way to have multiple nova-computes per shard than the previous hashring approach15:16
JayFsean-k-mooney: I'm not convinced https://review.opendev.org/c/openstack/nova/+/977037 does the same as the downstream does15:16
sean-k-mooneyit does not skip the check but it removes the lazy load overhead15:17
JayFif not self.driver.capabilities.get('can_have_vtpm_instances', True):; return <--- he essentially just added this to the top of _validate_vtpm_configuration15:17
JayFand added that as a driver capability (Which ironic has set to false)15:17
sean-k-mooneyya that replaces the expensive lazy load with an if15:17
sean-k-mooneywhich is also valid15:17
sean-k-mooneyyep that will also work15:17
sean-k-mooneybut the libvirt driver or any that support it in the future would still be needlessly slow without the other fix15:18
sean-k-mooneywe can do both15:18
sean-k-mooneythose are not competing15:18
JayFperfect15:19
JayFI should have time this afternoon/tomorrow to look at my normal todo list, which has been cast aside with "do the startup benchmark on nova-compute" at the top for a few weeks15:19
sean-k-mooneyhum15:19
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/977037/6/nova/compute/manager.py#115115:19
sean-k-mooneyso that if was meant to skip this on ironic already15:20
sean-k-mooney...15:21
sean-k-mooney        if self.driver.capabilities.get('supports_vtpm', False):15:21
sean-k-mooney            return15:21
sean-k-mooneydo you see the bug....15:21
sean-k-mooneyJayF: a not would really help that do the right thing don't you think 15:23
sean-k-mooneyoh well no15:24
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1150-L116815:24
sean-k-mooneythe full function is 15:24
sean-k-mooneyworking slightly differently, so if the driver supports vtpm we return early15:25
sean-k-mooneyif the driver does not report support we check if there are any instances that request it 15:25
sean-k-mooneyand treat that as an error because you migrated a vm or similar15:25
JayFhttps://opendev.org/openstack/nova/src/branch/master/nova/virt/ironic/driver.py#L15615:25
sean-k-mooneythe problem with the logic is that15:26
sean-k-mooneyfor ironic it says it does not support it15:26
sean-k-mooneyand then we validate all of the instances anyway15:26
sean-k-mooneyeven though we know none of them should ever use it15:27
JayFthe code you linked is an early return?15:27
JayFif Ironic says we don't support it?15:27
sean-k-mooneyif ironic said it did support it, it would early exit15:28
sean-k-mooneythe logic in the validate function is incorrect15:28
JayFOH15:28
JayFIT'S REVERSED15:28
sean-k-mooneykind of ya15:28
sean-k-mooneythat's why i originally said a not would help15:28
sean-k-mooneywhat the function is for15:29
sean-k-mooneywas to detect instances that requested vtpm on hosts where it was disabled15:29
JayFYeah but this flows both ways; this also means you haven't been running this on drivers that *support* it15:29
JayFwhich seems like a more concerning bug, no?15:29
sean-k-mooneyso yes and no15:29
sean-k-mooneythis was kind of a temporary upgrade check of a sort15:29
sean-k-mooneythe code is buggy and we should fix it15:30
sean-k-mooneybut it's not just buggy for ironic15:30
sean-k-mooneyit's buggy in general15:30
sean-k-mooneyJayF: so the vtpm support in the libvirt driver is configurable15:32
sean-k-mooneyJayF: it depends on an additional software dependency called swtpm15:32
sean-k-mooneywhich when we added the feature was only shipped in some distros15:32
sean-k-mooneyso we made this a boolean flag15:32
sean-k-mooneyand this was meant to prevent starting an agent when you have instances that need vtpm support but you have it turned off15:33
sean-k-mooneyJayF: i'll try and write this up as a bug noting the ironic impact15:34
sean-k-mooneyand see if i can hack something together for this. 15:34
sean-k-mooneyJayF: but also feel free to push any patches ye have if ye get time15:34
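The startup check as described in this exchange, plus the extra capability short-circuit from the downstream patch, can be sketched roughly as follows. This is an illustrative sketch only, not nova's `_validate_vtpm_configuration`: `FakeDriver`, `FakeInstance`, the `stats` counter and the returned list are hypothetical, and only the two capability names are taken from the lines quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDriver:
    # Hypothetical stand-in for a virt driver; only capabilities matter here.
    capabilities: dict = field(default_factory=dict)


@dataclass
class FakeInstance:
    uuid: str
    wants_vtpm: bool = False


def validate_vtpm_configuration(driver, instances, stats):
    """Return UUIDs of instances that request vTPM on a host where it is off."""
    # Downstream-style shortcut: this driver can never host vTPM instances,
    # so there is nothing to look for.
    if not driver.capabilities.get('can_have_vtpm_instances', True):
        return []
    # Behaviour described above: return early when vTPM support is enabled.
    if driver.capabilities.get('supports_vtpm', False):
        return []
    offenders = []
    for inst in instances:
        stats['inspected'] += 1  # stands in for the per-instance lazy load
        if inst.wants_vtpm:
            offenders.append(inst.uuid)
    return offenders
```

With a driver that sets `can_have_vtpm_instances` to False the instance loop never runs, while a driver that merely has vTPM turned off still inspects every instance, which is the cost the lazy-load fix is meant to reduce.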
opendevreviewsean mooney proposed openstack/nova master: Fix unified limits to include all resource types  https://review.opendev.org/c/openstack/nova/+/97587215:42
sean-k-mooneyJayF: https://bugs.launchpad.net/nova/+bug/2154495 is the bug for that by the way17:38
sean-k-mooneyadding the new capability trait is one of the ways to fix that but i am going to see if i can first create a reproducer and then we can see what the best way forward is17:39
sean-k-mooneyi'm considering if i just want to wrap the body of the function in a check for the virt driver that is in use or if this check should move into the driver instead17:39
sean-k-mooneyall 3 approaches would work, it's just a question of what is the better long term approach17:40
sean-k-mooneyi'm really not sure why this is not part of self.driver.init_host()17:41
sean-k-mooneygiven it's a driver specific check in the first place17:41
isaacvicente[m]hi o/ is this bug still being tracked? https://bugs.launchpad.net/nova/+bug/206559918:16
isaacvicente[m]there's a patch for this but it seems to be abandoned, what do you guys think?18:16
sean-k-mooneyisaacvicente[m]: having both "imageRef": "f2285517-a996-40d3-b331-c8214ec66b77", and destination_type: volume is not necessarily invalid18:30
sean-k-mooneyprovided the bdm and the imageref are the same18:30
sean-k-mooneythe metadata in the domain xml is a debug interface so whether the image is included is mostly cosmetic18:31
sean-k-mooneythe real pain point would be if you were relying on the presence or absence of the image ref to determine if it's a BFV guest18:32
sean-k-mooneythat is currently really a hack18:32
sean-k-mooneyi have argued in the past that the image reference should be available for instances that are booted from volume18:33
sean-k-mooneyif they are created like this or it is stored on the volume as volume/image metadata18:34
sean-k-mooneyisaacvicente[m]: what is the issue you are actually having and trying to address here?18:34
sean-k-mooneythis is not something that is being actively worked on18:34
sean-k-mooneybut it's also not a priority to fix as it's mostly cosmetic and it's unclear that this request is invalid18:35
sean-k-mooneyfixing it would likely need a new microversion unless we decide to make this a 400 but that would break a set of existing users where functionally the call works and had no real negative side effect18:36
isaacvicente[m]I was searching for bugs to work on and found this, so I wish to know if the issue reported is relevant. If so, I would continue to work on the patch that already exists18:36
sean-k-mooneyisaacvicente[m]: oh ok 18:36
sean-k-mooneyit's an inconsistency in the api18:37
sean-k-mooneybut likely not one that is worth your time18:37
isaacvicente[m]So, as far as I understand this is not a bug, right? And could break some users' workflow18:37
sean-k-mooneyit's borderline18:37
sean-k-mooneyfixing it would break clients that pass both18:38
sean-k-mooneyunless we do it via a new microversion18:38
sean-k-mooneywhich would need a spec and would not be backportable18:38
isaacvicente[m]Hmm I get it18:38
sean-k-mooneyand if we were to add a new microversion i would argue it would be better to add a boot_from_volume boolean to server show instead18:38
sean-k-mooneyand always show the glance image if available for the root disk regardless of if it's boot from volume or not18:39
sean-k-mooneyisaacvicente[m]: https://bugs.launchpad.net/nova/+bug/2108980 and https://review.opendev.org/c/openstack/nova/+/95446018:46
sean-k-mooneywould be nicer to finish18:46
isaacvicente[m]Thanks sean-k-mooney! I will check it out18:49
sean-k-mooneyif i'm being entirely honest i don't think the low-hanging-fruit list for nova is really curated to low hanging fruit18:50
isaacvicente[m]yeah... some of them are definitely not for newcomers haha18:54
sean-k-mooneyi would almost say sorting bugs by age and finding one that is triaged but not in progress is a better way https://bugs.launchpad.net/nova/+bug/2154428 for example could be a nice one if fixed as suggested 18:55
isaacvicente[m]that's interesting, this will help me find some bugs to work on18:58
isaacvicente[m]that nova-scheduler aggregate bug seems a nice one, thanks again sean!19:05
opendevreviewGhanshyam Maan proposed openstack/nova-specs master: Spec for the graceful shutdown part2: Task tracking  https://review.opendev.org/c/openstack/nova-specs/+/98644719:19
gmaandansmith: ^^ added the generic way of marking untracked tasks19:20
opendevreviewsean mooney proposed openstack/nova master: Add vTPM startup validation reproducer  https://review.opendev.org/c/openstack/nova/+/99055119:29
opendevreviewsean mooney proposed openstack/nova master: Limit vTPM startup check to libvirt  https://review.opendev.org/c/openstack/nova/+/99055219:29
sean-k-mooneyJayF: ^ i think makes sense but as part of that i noticed that nova (at least for the libvirt driver) is doing 2 separate calls to objects.InstanceList.get_by_host during startup19:39
sean-k-mooneyso we can likely improve that again19:40
*** erlon6 is now known as erlon20:16
opendevreviewIsaac Silva proposed openstack/nova master: Disable interactive prompt on LVM image creation  https://review.opendev.org/c/openstack/nova/+/99057621:18
isaacvicente[m]I found this easy one sean-k-mooney, could you review please?21:21
opendevreviewIsaac Silva proposed openstack/nova master: Disable interactive prompt on LVM image creation  https://review.opendev.org/c/openstack/nova/+/99057622:03
sean-k-mooneyi saw that in the list too but the lvm driver is a little more niche 22:04
sean-k-mooneyit is however potentially a quick fix22:04
sean-k-mooneyso adding -y is an option; the reason this happens is because lvm does not delete the disk content by default22:05
sean-k-mooneywhen you delete a previous volume22:05
sean-k-mooneyso if you set https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.volume_clear to none22:06
sean-k-mooneythen when you recreate a volume you get that prompt22:07
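The `-y` option mentioned here can be sketched as follows. This is a minimal sketch, not nova's LVM code: `build_lvcreate_cmd` and its parameters are hypothetical names, and the helper only assembles an argument list from lvcreate's `-L`, `-n` and `--yes` options without running anything.

```python
def build_lvcreate_cmd(vg, lv, size_bytes, assume_yes=True):
    """Assemble an lvcreate argument list (hypothetical helper, nothing runs)."""
    cmd = ['lvcreate', '-L', '%db' % size_bytes, '-n', lv]
    if assume_yes:
        # --yes (short form -y) makes lvcreate assume "yes" instead of
        # prompting for confirmation interactively.
        cmd.append('--yes')
    cmd.append(vg)
    return cmd
```

Whether answering yes unconditionally is the right behaviour when old disk content is still present is the design question raised above, so the flag is kept optional in this sketch.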
