| songwenping | hi team, can NVIDIA A100 MIG devices be passed through to a KVM VM? | 03:00 |
|---|---|---|
| ralonsoh | Hi folks, just a heads-up. I updated the Neutron policy file, using yours: https://github.com/openstack/nova/blob/380f657b5102707a3da676478e72fd96691e966b/doc/source/configuration/policy-concepts.rst | 06:22 |
| ralonsoh | This is my patch: https://review.opendev.org/c/openstack/neutron/+/990069 | 06:22 |
| ralonsoh | You should remove the references to `enforce_scope`, that is now True, deprecated and marked for removal | 06:23 |
| opendevreview | yatin proposed openstack/nova master: [DNM] Do not use sqlalchemy reserved metadata https://review.opendev.org/c/openstack/nova/+/990429 | 08:02 |
| sahid | o/ | 08:17 |
| sahid | quick question guys, there is a missing field "tags" in the unversioned notifications, i'm wondering whether we can fix that or perhaps unversioned notif is deprecated and we don't want to do anything here? | 08:19 |
| gibi | sahid: unversioned is deprecated, we only add new stuff to versioned notifications | 08:22 |
| sahid | gibi: ack thank you | 08:45 |
| sahid | btw i have noticed that the tags field is missing during delete, if I have a moment i will try to fix that soon | 08:45 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Poison eventlet import in native threading mode https://review.opendev.org/c/openstack/nova/+/986282 | 09:01 |
| gibi | sahid: ack, feel free to ping me with the patc | 09:01 |
| gibi | h | 09:01 |
| opendevreview | huanhongda proposed openstack/nova master: Fix scheduler aggregate cache race on concurrent add_host https://review.opendev.org/c/openstack/nova/+/990437 | 09:05 |
| sean-k-mooney | im wondering if we could run a little experiment in the nova gate for a week or two to see if we could finally get rid of the kernel panics | 09:16 |
| sean-k-mooney | i was chatting to frickler in the infra channel and they had the insight that we may have been thinking about the root cause incorrectly and that the panic was actually caused by file system corruption | 09:17 |
| sean-k-mooney | they pointed out that the current behavior looks as if the initial copy of the filesystem from the initramfs was interrupted | 09:18 |
| sean-k-mooney | cirros ships with a mostly empty rootfs (just a partition table) | 09:18 |
| sean-k-mooney | and copies the root on first boot | 09:18 |
| sean-k-mooney | if we interrupt that then we will get a panic when we resize | 09:19 |
| sean-k-mooney | so we inherently have a race in the ci right now | 09:19 |
| sean-k-mooney | one we normally win because the root fs is like 10 MB or so | 09:19 |
| sean-k-mooney | based on their suggestion i built a test image https://github.com/SeanMooney/cirros/releases/tag/cirros-d260527-x86_64-5a75ef2-test | 09:20 |
| sean-k-mooney | where the root is prepopulated and tested it with a quick devstack change | 09:20 |
| sean-k-mooney | https://review.opendev.org/c/openstack/devstack/+/990348 | 09:20 |
| sean-k-mooney | so the experiment i would like to run in nova gate | 09:20 |
| sean-k-mooney | is to override the cirros image we use in the 2-3 jobs we normally see the issue in | 09:21 |
| sean-k-mooney | and point it at my image | 09:21 |
| sean-k-mooney | while i separately chat to frickler and the other cirros folks about how to do this upstream in the cirros project | 09:21 |
| sean-k-mooney | we typically see 10-20 kernel panics a day on nova jobs | 09:22 |
| sean-k-mooney | so what im hoping is if we run that experiment for a week or two that would drop to 0 | 09:22 |
| sean-k-mooney | im going to create a patch to override the cirros image path for the relevant jobs so we can discuss more but i would be interested in what folks think | 09:23 |
| sean-k-mooney | are we willing to run the experiment for 2 weeks to get the data? we can obviously quickly revert if needed | 09:24 |
| sean-k-mooney | my current working theory is we actually had 2 different kernel panics, the first was resolved with the waiting for sshable changes for volume attach/detach. and the second was this file system corruption issue that we see with BFV + resize or BFV + any reboot i guess | 09:28 |
| gibi | sean-k-mooney: your plan sounds sane to me | 09:29 |
| sean-k-mooney | what i can do is prepare the url override patch and add the topic to the irc meeting agenda for monday. if we are happy to proceed we can merge it after the meeting and then monitor it and see | 09:39 |
| opendevreview | Joan Gilabert proposed openstack/nova-specs master: Repropose and update cyborg vGPU (mdev) support https://review.opendev.org/c/openstack/nova-specs/+/967515 | 11:06 |
| *** | jgilaber_ is now known as jgilaber | 12:02 |
| opendevreview | Thibaut Démaret proposed openstack/nova master: libvirt: add disk rotation_rate support for local disks https://review.opendev.org/c/openstack/nova/+/979693 | 12:05 |
| kklimaszewski | Hello, I've recently proposed a new nova spec for SPDK-based nova-provisioned storage backend: https://review.opendev.org/c/openstack/nova-specs/+/985676. Would anyone here be willing to review it/be its liaison? Also: Dominik (known here as Etua) has reproposed a spec from Ussuri for NUMA topology with resource providers: https://review.opendev.org/c/openstack/nova-specs/+/978570. Could someone here please review it? | 12:37 |
| JayF | sean-k-mooney: I think I need to find time, regardless of interruptions and urgency, to get your patches cleaned up and landed upstream. That's a radical improvement. https://usercontent.irccloud-cdn.com/file/i2G0mxRP/image.png | 14:49 |
| JayF | tl;dr: G-Research's (somewhat patched) 2024.1, one nova-compute that supported 1000 Ironic nodes was taking 65 minutes to restart. With three changes, that was reduced to 2 minutes. | 14:50 |
| sean-k-mooney | damn | 14:51 |
| sean-k-mooney | ok | 14:51 |
| JayF | (NCI = "nova-compute-ironic" -- the internal GR moniker for nova-computes w/Ironic driver) | 14:51 |
| JayF | given step #1 is also the first thing we wanna remove -- moving the resource updates to Ironic -- gives me hope as well that we're on the right track | 14:51 |
| sean-k-mooney | so step 1 is the dispatching of the update to the thread pool right | 14:52 |
| sean-k-mooney | so you can run N threads to do the work | 14:53 |
| sean-k-mooney | you mentioned 32 workers but were they all used | 14:53 |
| JayF | passed on the ask, I don't know | 14:53 |
| sean-k-mooney | i assume you have not had time to test if say 8 or 16 has a linear effect? | 14:53 |
| JayF | to be clear: I don't have operational access to this cluster at all | 14:54 |
| sean-k-mooney | no worries | 14:54 |
| sean-k-mooney | just wondering | 14:54 |
| JayF | Honestly though, if we really get it down to 2 minutes | 14:54 |
| JayF | something like a graceful shutdown starts looking really good for HA updates | 14:54 |
| sean-k-mooney | JayF: the vtpm thing is also interesting, i think we may have fixed that lazy loading thing on master recently | 14:55 |
| JayF | if we could push Ironic queries to a secondary nova-compute while letting the first one complete actions in progress (or as discussed; hand off those in-progress actions from checkpoints) | 14:55 |
| sean-k-mooney | JayF: https://review.opendev.org/c/openstack/nova/+/977037 | 14:57 |
| sean-k-mooney | so not merged yet | 14:57 |
| sean-k-mooney | JayF: but that will fix the vtpm issue | 14:57 |
| sean-k-mooney | its an issue for libvirt as well | 14:57 |
| sean-k-mooney | added you to that so you should get a notification when that lands | 14:58 |
| sean-k-mooney | i kind of forgot about that patch if im being honest but looks like i called out the ironic impact the last time i looked :) | 14:59 |
| sean-k-mooney | JayF: so the 3 patches ye really need are https://review.opendev.org/c/openstack/nova/+/977037 https://review.opendev.org/c/openstack/nova/+/980676 and https://review.opendev.org/c/openstack/nova/+/980679/1 | 15:07 |
| sean-k-mooney | one other thing that we could perhaps look into in the future is allowing nova-compute to run in an active standby mode, or a better way to have multiple nova-computes per shard than the previous hashring approach | 15:16 |
| JayF | sean-k-mooney: I'm not convinced https://review.opendev.org/c/openstack/nova/+/977037 does the same as the downstream does | 15:16 |
| sean-k-mooney | it does not skip the check but it removes the lazy load overhead | 15:17 |
| JayF | if not self.driver.capabilities.get('can_have_vtpm_instances', True):; return <--- he essentially just added this to the top of _validate_vtpm_configuration | 15:17 |
| JayF | and added that as a driver capability (Which ironic has set to false) | 15:17 |
| sean-k-mooney | ya that replaces the expensive lazy load with an if | 15:17 |
| sean-k-mooney | which is also valid | 15:17 |
| sean-k-mooney | yep that will also work | 15:17 |
| sean-k-mooney | but the libvirt driver or any that support it in the future would still be needlessly slow without the other fix | 15:18 |
| sean-k-mooney | we can do both | 15:18 |
| sean-k-mooney | those are not competing | 15:18 |
| JayF | perfect | 15:19 |
| JayF | I should have time this afternoon/tomorrow to look at my normal todo list, which has been cast aside with "do the startup benchmark on nova-compute" at the top for a few weeks | 15:19 |
| sean-k-mooney | hum | 15:19 |
| sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/977037/6/nova/compute/manager.py#1151 | 15:19 |
| sean-k-mooney | so that if was meant to skip this on ironic already | 15:20 |
| sean-k-mooney | ... | 15:21 |
| sean-k-mooney | if self.driver.capabilities.get('supports_vtpm', False): | 15:21 |
| sean-k-mooney | return | 15:21 |
| sean-k-mooney | do you see the bug.... | 15:21 |
| sean-k-mooney | JayF: a not would really help that do the right thing dont you think | 15:23 |
| sean-k-mooney | oh well no | 15:24 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1150-L1168 | 15:24 |
| sean-k-mooney | the full function is | 15:24 |
| sean-k-mooney | working slightly differently, so if the driver supports vtpm we return early | 15:25 |
| sean-k-mooney | and if the driver does not report support we check if there are any instances that request it | 15:25 |
| sean-k-mooney | and treat that as an error because you migrated a vm or similar | 15:25 |
| JayF | https://opendev.org/openstack/nova/src/branch/master/nova/virt/ironic/driver.py#L156 | 15:25 |
| sean-k-mooney | the problem with the logic is that | 15:26 |
| sean-k-mooney | for ironic it says it does not support it | 15:26 |
| sean-k-mooney | and then we validate all of the instances anyway | 15:26 |
| sean-k-mooney | even though we know none of them should ever use it | 15:27 |
| JayF | the code you linked is an early return? | 15:27 |
| JayF | if Ironic says we don't support it? | 15:27 |
| sean-k-mooney | if ironic said it did support it, it would early exit | 15:28 |
| sean-k-mooney | the logic in the validate function is incorrect | 15:28 |
| JayF | OH | 15:28 |
| JayF | IT'S REVERSED | 15:28 |
| sean-k-mooney | kind of ya | 15:28 |
| sean-k-mooney | thats why i originally said a not would help | 15:28 |
| sean-k-mooney | what the function is for | 15:29 |
| sean-k-mooney | was to detect instances that requested vtpm on hosts where it was disabled | 15:29 |
| JayF | Yeah but this flows both ways; this also means you haven't been running this on drivers that *support* it | 15:29 |
| JayF | which seems like a more concerning bug, no? | 15:29 |
| sean-k-mooney | so yes and no | 15:29 |
| sean-k-mooney | this was kind of a temporary upgrade check of a sort | 15:29 |
| sean-k-mooney | the code is buggy and we should fix it | 15:30 |
| sean-k-mooney | but its not just buggy for ironic | 15:30 |
| sean-k-mooney | its buggy in general | 15:30 |
| sean-k-mooney | JayF: so the vtpm support in the libvirt driver is configurable | 15:32 |
| sean-k-mooney | JayF: it depends on an additional software dependency called swtpm | 15:32 |
| sean-k-mooney | which when we added the feature was only shipped in some distros | 15:32 |
| sean-k-mooney | so we made this a boolean flag | 15:32 |
| sean-k-mooney | and this was meant to prevent starting an agent when you have instances that need vtpm support but you have it turned off | 15:33 |
| sean-k-mooney | JayF: ill try and write this up as a bug noting the ironic impact | 15:34 |
| sean-k-mooney | and see if i can hack something together for this. | 15:34 |
| sean-k-mooney | JayF: but also feel free to push any patches ye have if ye get time | 15:34 |
| opendevreview | sean mooney proposed openstack/nova master: Fix unified limits to include all resource types https://review.opendev.org/c/openstack/nova/+/975872 | 15:42 |
| sean-k-mooney | JayF: https://bugs.launchpad.net/nova/+bug/2154495 is the bug for that by the way | 17:38 |
| sean-k-mooney | adding the new capability trait is one of the ways to fix that but i am going to see if i can first create a reproducer and then we can see what the best way forward is | 17:39 |
| sean-k-mooney | im considering if i just want to wrap the body of the function in a check for the virt driver that is in use or if this check should move into the driver instead | 17:39 |
| sean-k-mooney | all 3 approaches would work, its just a question of what is the better long term approach | 17:40 |
| sean-k-mooney | im really not sure why this is not part of self.driver.init_host() | 17:41 |
| sean-k-mooney | given its a driver specific check in the first place | 17:41 |
| isaacvicente[m] | hi o/ is this bug still being tracked? https://bugs.launchpad.net/nova/+bug/2065599 | 18:16 |
| isaacvicente[m] | there's a patch for this but seems to be abandoned, what you guys think? | 18:16 |
| sean-k-mooney | isaacvicente[m]: having both "imageRef": "f2285517-a996-40d3-b331-c8214ec66b77" and destination_type: volume is not necessarily invalid | 18:30 |
| sean-k-mooney | provided the bdm and the imageref are the same | 18:30 |
| sean-k-mooney | the metadata in the domain xml is a debug interface so whether the image is included is mostly cosmetic | 18:31 |
| sean-k-mooney | the real pain point would be if you were relying on the presence or absence of the image ref to determine if its a BFV guest | 18:32 |
| sean-k-mooney | that is currently really a hack | 18:32 |
| sean-k-mooney | i have argued in the past that the image reference should be available for instances that are booted from volume | 18:33 |
| sean-k-mooney | if they are created like this or it is stored on the volume as volume/image metadata | 18:34 |
| sean-k-mooney | isaacvicente[m]: what is the issue you are actually having and trying to address here? | 18:34 |
| sean-k-mooney | this is not something that is being actively worked on | 18:34 |
| sean-k-mooney | but its also not a priority to fix as its mostly cosmetic and its unclear that this request is invalid | 18:35 |
| sean-k-mooney | fixing it would likely need a new microversion unless we decide to make this a 400, but that would break a set of existing users where functionally the call works and had no real negative side effect | 18:36 |
| isaacvicente[m] | I was searching for bugs to work on and found this, so I wish to know if the issue reported is relevant. If so, I would continue to work on the patch that already exists | 18:36 |
| sean-k-mooney | isaacvicente[m]: oh ok | 18:36 |
| sean-k-mooney | its an inconsistency in the api | 18:37 |
| sean-k-mooney | but likely not one that is worth your time | 18:37 |
| isaacvicente[m] | So, as far as I understand this is not a bug, right? And could break some users' workflow | 18:37 |
| sean-k-mooney | its borderline | 18:37 |
| sean-k-mooney | fixing it would break clients that pass both | 18:38 |
| sean-k-mooney | unless we do it via a new microversion | 18:38 |
| sean-k-mooney | which would need a spec and would not be backportable | 18:38 |
| isaacvicente[m] | Hmm I get it | 18:38 |
| sean-k-mooney | and if we were to add a new microversion i would argue it would be better to add a boot_from_volume boolean to server show instead | 18:38 |
| sean-k-mooney | and always show the glance image if available for the root disk regardless of if its boot from volume or not | 18:39 |
| sean-k-mooney | isaacvicente[m]: https://bugs.launchpad.net/nova/+bug/2108980 and https://review.opendev.org/c/openstack/nova/+/954460 | 18:46 |
| sean-k-mooney | would be nicer to finish | 18:46 |
| isaacvicente[m] | Thanks sean-k-mooney! I will check it out | 18:49 |
| sean-k-mooney | if im being entirely honest i dont think the low hanging fruit list for nova is really curated to low hanging fruit | 18:50 |
| isaacvicente[m] | yeah... some of them are definitely not for newcomers haha | 18:54 |
| sean-k-mooney | i would almost say sorting bugs by age and finding one that is triaged but not in progress is a better way. https://bugs.launchpad.net/nova/+bug/2154428 for example could be a nice one if fixed as suggested | 18:55 |
| isaacvicente[m] | thats interesting, this will help me finding some bugs to work on | 18:58 |
| isaacvicente[m] | that nova-scheduler aggregate bug seems a nice one, thanks again sean! | 19:05 |
| opendevreview | Ghanshyam Maan proposed openstack/nova-specs master: Spec for the graceful shutdown part2: Task tracking https://review.opendev.org/c/openstack/nova-specs/+/986447 | 19:19 |
| gmaan | dansmith: ^^ added the generic way of marking untracked tasks | 19:20 |
| opendevreview | sean mooney proposed openstack/nova master: Add vTPM startup validation reproducer https://review.opendev.org/c/openstack/nova/+/990551 | 19:29 |
| opendevreview | sean mooney proposed openstack/nova master: Limit vTPM startup check to libvirt https://review.opendev.org/c/openstack/nova/+/990552 | 19:29 |
| sean-k-mooney | JayF: ^ i think makes sense but as part of that i noticed that nova (at least for the libvirt driver) is doing 2 separate calls to objects.InstanceList.get_by_host during startup | 19:39 |
| sean-k-mooney | so we can likely improve that again | 19:40 |
| *** | erlon6 is now known as erlon | 20:16 |
| opendevreview | Isaac Silva proposed openstack/nova master: Disable interactive prompt on LVM image creation https://review.opendev.org/c/openstack/nova/+/990576 | 21:18 |
| isaacvicente[m] | I found this easy one sean-k-mooney, could you review please? | 21:21 |
| opendevreview | Isaac Silva proposed openstack/nova master: Disable interactive prompt on LVM image creation https://review.opendev.org/c/openstack/nova/+/990576 | 22:03 |
| sean-k-mooney | i saw that in the list too but the lvm driver is a little more niche | 22:04 |
| sean-k-mooney | it is however potentially a quick fix | 22:04 |
| sean-k-mooney | so adding -y is an option. the reason this happens is because lvm does not delete the disk content by default | 22:05 |
| sean-k-mooney | when you delete a previous volume | 22:05 |
| sean-k-mooney | so if you set https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.volume_clear to none | 22:06 |
| sean-k-mooney | then when you recreate a volume you get that prompt | 22:07 |
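
The vTPM startup check discussed between 15:17 and 15:34 may be easier to follow as code. The sketch below is not nova's implementation: `FakeDriver`, `FakeInstance`, `VTPMConfigError` and both function names are hypothetical stand-ins, and the capability values given to the Ironic-like driver are assumptions taken from the conversation. It only illustrates why a driver that reports no vTPM support still walks every instance, and how the extra `can_have_vtpm_instances` guard that JayF quoted would skip that walk.

```python
from dataclasses import dataclass, field


@dataclass
class FakeInstance:
    """Hypothetical stand-in for an instance record."""
    uuid: str
    wants_vtpm: bool = False


@dataclass
class FakeDriver:
    """Hypothetical stand-in for a virt driver with a capabilities dict."""
    capabilities: dict = field(default_factory=dict)


class VTPMConfigError(Exception):
    """Raised when an instance requests vTPM on a host that has it disabled."""


def validate_as_discussed(driver, load_instances):
    """Logic as described in the chat: return early only when vTPM IS supported.

    Returns the number of instances examined so the cost is visible.
    """
    if driver.capabilities.get('supports_vtpm', False):
        return 0
    checked = 0
    for inst in load_instances():
        checked += 1
        if inst.wants_vtpm:
            raise VTPMConfigError(inst.uuid)
    return checked


def validate_with_capability(driver, load_instances):
    """Same check, preceded by the guard quoted from the downstream patch."""
    if not driver.capabilities.get('can_have_vtpm_instances', True):
        return 0
    return validate_as_discussed(driver, load_instances)


# Assumed capability values for a driver that can never host vTPM instances.
ironic_like = FakeDriver({'supports_vtpm': False,
                          'can_have_vtpm_instances': False})
nodes = [FakeInstance(f"node-{i}") for i in range(1000)]

print(validate_as_discussed(ironic_like, lambda: nodes))     # 1000
print(validate_with_capability(ironic_like, lambda: nodes))  # 0
```

As noted in the log, the two fixes are not competing: the guard avoids the walk for drivers that can never have vTPM instances, while removing the lazy-load overhead helps any driver that still has to perform the walk.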