| Nick | Message | Time |
|---|---|---|
| gmaan | sean-k-mooney: I did not +W this as you might want to review it too, if not let me know and I can apply +W https://review.opendev.org/c/openstack/nova/+/974445 | 01:17 |
| gmaan | the rest of the series is good to go | 01:18 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: prepare to bump service version for live migration https://review.opendev.org/c/openstack/nova/+/962051 | 01:19 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: support live migration of `host` secret security https://review.opendev.org/c/openstack/nova/+/941483 | 01:19 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: support live migration of `deployment` secret security https://review.opendev.org/c/openstack/nova/+/925771 | 01:19 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: test live migration between hosts with different security https://review.opendev.org/c/openstack/nova/+/952629 | 01:19 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: add late check for supported TPM secret security https://review.opendev.org/c/openstack/nova/+/956975 | 01:19 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: opt-in to new TPM secret security via resize https://review.opendev.org/c/openstack/nova/+/962052 | 01:19 |
| opendevreview | melanie witt proposed openstack/nova master: DNM vtpm tempest https://review.opendev.org/c/openstack/nova/+/957477 | 01:19 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: bump service version to enable live migration https://review.opendev.org/c/openstack/nova/+/975724 | 01:19 |
| gmaan | sean-k-mooney: ah you already checked and left a comment there, got it now. I +W'd it. | 01:22 |
| opendevreview | Takashi Natsume proposed openstack/nova master: Update contributor guide for 2026.1 Gazpacho https://review.opendev.org/c/openstack/nova/+/961896 | 02:32 |
| opendevreview | Ghanshyam proposed openstack/nova master: DNM: test oslo.service set_service_opts_defaults https://review.opendev.org/c/openstack/nova/+/975739 | 02:37 |
| opendevreview | Ghanshyam proposed openstack/nova master: DNM: test oslo.service set_service_opts_defaults https://review.opendev.org/c/openstack/nova/+/975739 | 02:38 |
| opendevreview | Merged openstack/nova master: Libvirt event handling without eventlet https://review.opendev.org/c/openstack/nova/+/965949 | 03:14 |
| opendevreview | Merged openstack/nova master: SubclassSignatureTestCase to use NoDBTestCase as base https://review.opendev.org/c/openstack/nova/+/974861 | 03:14 |
| opendevreview | Merged openstack/nova master: Enable mypy on nova/utils.py https://review.opendev.org/c/openstack/nova/+/969936 | 03:43 |
| opendevreview | Ghanshyam proposed openstack/nova master: DNM: test oslo.service set_service_opts_defaults https://review.opendev.org/c/openstack/nova/+/975739 | 03:56 |
| opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Extend functional test coverage of UEFI boot guests https://review.opendev.org/c/openstack/nova/+/969263 | 06:36 |
| opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Add basic xml generation for firmware auto selection https://review.opendev.org/c/openstack/nova/+/969085 | 06:36 |
| opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Add capability to load loader and nvram from xml https://review.opendev.org/c/openstack/nova/+/969086 | 06:36 |
| opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Add capability to load smm feature from existing xml https://review.opendev.org/c/openstack/nova/+/969131 | 06:36 |
| opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Use firmware auto-selection by libvirt https://review.opendev.org/c/openstack/nova/+/969132 | 06:36 |
| opendevreview | Takashi Kajinami proposed openstack/nova master: AMD SEV: omit iommu='on' for virtio devices https://review.opendev.org/c/openstack/nova/+/909635 | 07:40 |
| opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Remove tpm support detection for libvirt < 8.0.0 https://review.opendev.org/c/openstack/nova/+/952308 | 07:40 |
| opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Drop redundant chown of tpm data directory https://review.opendev.org/c/openstack/nova/+/962446 | 07:41 |
| opendevreview | Merged openstack/nova master: Live migration with iothreads https://review.opendev.org/c/openstack/nova/+/975000 | 10:56 |
| opendevreview | John Garbutt proposed openstack/nova master: Make PCPUs not land on VCPUs by default https://review.opendev.org/c/openstack/nova/+/975779 | 11:18 |
| opendevreview | Max proposed openstack/nova master: fix: _get_guest_disk_device UnboundLocalError https://review.opendev.org/c/openstack/nova/+/975783 | 12:18 |
| LarsErikP | hi! Struggling a little bit with unified limits and PCI passthrough here. I have servers with multiple non-SR-IOV (type-PCI) GPUs that I want to pass through. I've set them up with a custom resource class, and referenced that class in the flavor together with the pci_passthrough:alias. But then the device is counted double in placement.. | 14:28 |
| LarsErikP | basically this bugs.launchpad.net/nova/+bug/2098496 but for type-PCI as well.. | 14:29 |
| LarsErikP | my goal here is of course to have unified limits on the custom resource class for these GPUs, but that doesn't play very well when the allocations in placement are wrongly counted | 14:30 |
| LarsErikP | am I doing anything wrong here? or is the bug I mentioned also valid for type-PCI hostdevs? | 14:30 |
| dansmith | LarsErikP: I could be wrong, but I don't think you should reference the custom class in placement if you're using PCI-in-placement.. nova will already allocate one for you and I think the flavor reference is adding a second | 14:37 |
| LarsErikP | right, but when the custom resource is not referenced in the flavor, the limit is not enforced | 14:38 |
| dansmith | ah, okay | 14:41 |
| dansmith | I would sync with melwitt when she's up later this morning | 14:42 |
| LarsErikP | I guess we can consider her highlighted? :P | 14:42 |
| dansmith | note that the bug you reference is about leaks in the allocation process not over-allocation in placement (IIRC), which both have been fixed and are not what you're seeing I think | 14:42 |
| dansmith | yup | 14:42 |
| LarsErikP | yeah, I've experienced this as described in the bug with type-VF devices as well. And that is very much fixed with that patch | 14:43 |
| LarsErikP | I got the same kind of symptoms before I applied that patch with type-VF devices. Requesting instances with both the resource class and pci_passthrough resulted in too much usage registered in placement. That is fixed | 14:45 |
| LarsErikP | but yeah. with my current problem, the allocations are correctly recorded in placement when I remove the custom resource class from the flavor | 14:46 |
| LarsErikP | but then again, I can't use limits for these resources :-( | 14:46 |
| LarsErikP | I can hack it though.. Using a custom resource for quota counting which I add to the RP for the compute host with the PCI-devices, and reference that in the flavor.. | 14:59 |
| sean-k-mooney | you should be able to use limits for the resource without doing resource: | 15:00 |
| gmaan | bauzas: I know you added it in your review but gate is green on this and ready for review https://review.opendev.org/c/openstack/nova/+/975242/4 | 15:00 |
| sean-k-mooney | LarsErikP: if you have pci in placement the quota check in nova is meant to validate the resource request from the alias as part of unified limits automatically | 15:00 |
| LarsErikP | hmm maybe I have to set pci/report_in_placement on nova-api nodes as well? This far, I've only set that to true on the compute nodes | 15:06 |
| sean-k-mooney | report, no, but you do need to set the filter scheduler one | 15:07 |
| LarsErikP | I have that on my nodes running nova-scheduler | 15:07 |
| LarsErikP | I have nova-api/apache2 running on separate hosts from the ones running conductor,scheduler etc | 15:09 |
| sean-k-mooney | LarsErikP: you do need to have it in nova-api yes | 15:45 |
| sean-k-mooney | LarsErikP: if you do not set https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.pci_in_placement in the nova-api config it will not properly translate the pci aliases to resource classes and unified limits won't work | 15:47 |
| sean-k-mooney | in https://docs.openstack.org/nova/latest/admin/pci-passthrough.html#pci-tracking-in-placement | 15:48 |
| sean-k-mooney | we state """Since nova 27.0.0 (2023.1 Antelope) scheduling and allocation of PCI devices in Placement can also be enabled via filter_scheduler.pci_in_placement config option set in the nova-api, nova-scheduler, and nova-conductor configuration. Please note that this should only be enabled after all the computes in the system is configured to report PCI inventory in Placement | 15:49 |
| sean-k-mooney | via enabling pci.report_in_placement. In Antelope flavor based PCI requests are support but Neutron port base PCI requests are not handled in Placement.""" | 15:49 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Move the concurrent builds to its own Executor https://review.opendev.org/c/openstack/nova/+/975694 | 15:49 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Move the concurrent builds to its own Executor https://review.opendev.org/c/openstack/nova/+/975694 | 15:51 |
| melwitt | dansmith, bauzas: I updated https://review.opendev.org/c/openstack/nova/+/962051 to remove a few mock decorators I realized I didn't need since MIN_COMPUTE_VTPM_LIVE_MIGRATION = None. otherwise it is the same as when dan had +2 | 16:09 |
| gmaan | bauzas: replied to your comment, I am planning to add doc when new RPC server will be used and also RPC versioning https://review.opendev.org/c/openstack/nova/+/975242 | 16:21 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Move the concurrent builds to its own Executor https://review.opendev.org/c/openstack/nova/+/975694 | 16:26 |
| opendevreview | Lajos Katona proposed openstack/nova master: Add regression test to repoduce bug 2140537 https://review.opendev.org/c/openstack/nova/+/975832 | 17:22 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Deprecate unlimited compute actions https://review.opendev.org/c/openstack/nova/+/975833 | 17:28 |
| melwitt | I just read through the backscroll ... thanks sean-k-mooney for the info about pci_in_placement, LarsErikP: lmk if unified limits still doesn't work after you try the config Sean mentioned. I'm not that familiar with the pci in placement code and it's not immediately clear to me if/how unified limits works with it | 17:32 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Move the concurrent builds to its own Executor https://review.opendev.org/c/openstack/nova/+/975694 | 17:33 |
| sean-k-mooney | melwitt: ack, if it doesn't that's a bug. the pci in placement code translates the pci alias into placement resource groups and resource/traits requests | 17:34 |
| sean-k-mooney | now i know it does that when calling placement | 17:34 |
| sean-k-mooney | but it's possible we don't do that before we check unified limits | 17:35 |
| sean-k-mooney | but there is a generic function for that | 17:35 |
| melwitt | sean-k-mooney: agreed it will be a bug if it doesn't work. I'm trying to look at the code and can't tell what's going on haha | 17:35 |
| sean-k-mooney | i always have to go looking for this as it's not in the file i expect it to be in | 17:36 |
| melwitt | on the unified limits side (nova/limit/placement.py) I think i only see use of the flavor and not yet seeing it being connected with anything else | 17:37 |
| melwitt | on the pci/resource requests side I don't understand anything :) | 17:37 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py | 17:38 |
| sean-k-mooney | so the ResourceRequest stuff is in there | 17:38 |
| opendevreview | Merged openstack/nova master: Use an executor to delay STOPPED events https://review.opendev.org/c/openstack/nova/+/974445 | 17:38 |
| melwitt | yeah I did find that part. it's the from_request_spec() is that where the pci in placement stuff comes from maybe? | 17:38 |
| sean-k-mooney | potentially yes | 17:40 |
| sean-k-mooney | but i'm not immediately seeing it | 17:40 |
| sean-k-mooney | of course it could be in the request spec class | 17:40 |
| sean-k-mooney | i think from here https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L224-L230 | 17:41 |
| melwitt | oh, yeah, I think limits is only taking stuff from flavor. bc it creates a "fake" RequestSpec object using only the flavor | 17:42 |
| sean-k-mooney | which would mean it would need to use the alias to convert the alias to a pci request | 17:43 |
| melwitt | ohh ok that's what you have been saying is the alias is in the flavor and then from there eventually it will turn into resource classes underneath | 17:44 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L664 | 17:53 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L503 | 17:54 |
| sean-k-mooney | so the request spec object has a generate_request_groups_from_pci_requests function that adds in the resource request from the neutron port but also the request from the pci alias | 17:54 |
| sean-k-mooney | if you're only looking at the flavor you would also be missing the resources from cyborg | 17:57 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L677-L679 calls | 17:59 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L663-L674 | 17:59 |
| sean-k-mooney | but that one calls from_request_spec | 18:00 |
| sean-k-mooney | with the fake object | 18:00 |
| sean-k-mooney | and that does not call generate_request_groups_from_pci_requests() | 18:00 |
| sean-k-mooney | so yes there is a bug in the integration of pci in placement and unified limits | 18:01 |
| sean-k-mooney | generate_request_groups_from_pci_requests is only called from RequestSpec.from_components | 18:01 |
| sean-k-mooney | so https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L665 needs to be a real request spec for https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L672 to be accurate | 18:03 |
| melwitt | sean-k-mooney: gotcha, thanks | 18:04 |
| sean-k-mooney | the note from john https://github.com/openstack/nova/blob/master/nova/limit/placement.py#L127-L131 | 18:05 |
| melwitt | haha I was literally just typing that | 18:05 |
| melwitt | yeah. so he mentions cyborg in there and now since pci in placement there's also pci | 18:06 |
| sean-k-mooney | we obviously overlooked updating that for pci in placement too | 18:06 |
| sean-k-mooney | yep, and as a result gpu/vgpu | 18:06 |
| melwitt | I think this may have landed before pci and placement happened but maybe I'm misremembering | 18:06 |
| sean-k-mooney | yep it may | 18:07 |
| sean-k-mooney | i think they were happening around the same time | 18:07 |
| sean-k-mooney | the pci spec expected unified limits to "just work" | 18:07 |
| sean-k-mooney | i think because it was expecting the request spec to have the same view | 18:07 |
| sean-k-mooney | missing the fact that this is only a partial request spec | 18:07 |
| melwitt | so LarsErikP you are good to file a bug for the unified limits non integration with pci. it's a known issue/limitation (see linked code comment above) but we could consider it a bug | 18:08 |
| melwitt | so I wonder would it be as simple as grabbing the full request spec in the api database if some alias is present in the flavor or something like that? just to avoid making the db query if we know we won't need to | 18:09 |
| sean-k-mooney | https://specs.openstack.org/openstack/nova-specs/specs/zed/approved/pci-device-tracking-in-placement.html#dependencies | 18:09 |
| melwitt | I see | 18:10 |
| sean-k-mooney | so we intended it to work, so clearly a bug from my perspective | 18:10 |
| melwitt | I see, ok | 18:11 |
| sean-k-mooney | and ya i was just looking to see. when we are doing the limit check its really early | 18:11 |
| sean-k-mooney | so i'm not sure how easy that would be | 18:11 |
| melwitt | yeah, mostly all in nova-api but we do have the "recheck" logic in nova-conductor | 18:12 |
| melwitt | so maybe we could intercept into there. if that is not also too early | 18:12 |
| sean-k-mooney | we can check in the conductor, yes, after scheduling and before we commit the allocation to placement | 18:14 |
| sean-k-mooney | we probably could check before calling the scheduler to do select destination in the conductor too but i don't know if that is a good idea or not | 18:15 |
| sean-k-mooney | the thing is i believe we do generate the pci requests in the api | 18:16 |
| sean-k-mooney | before we call the conductor | 18:16 |
| sean-k-mooney | so i think we can still validate it in the api | 18:16 |
| melwitt | that would be nice if we can | 18:16 |
| sean-k-mooney | we have _validate_flavor_image_nostatus for example https://github.com/openstack/nova/blob/master/nova/compute/api.py#L749 | 18:18 |
| sean-k-mooney | that's where we validate all the numa stuff | 18:18 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/compute/api.py#L920-L921 | 18:19 |
| sean-k-mooney | ok so https://github.com/openstack/nova/blob/master/nova/pci/request.py#L324 | 18:19 |
| sean-k-mooney | we could just use get_pci_requests_from_flavor | 18:19 |
| sean-k-mooney | to add the pci request to the fake request spec | 18:20 |
| sean-k-mooney | that won't cover cyborg or neutron requests but it would make LarsErikP's use case work | 18:20 |
| melwitt | if the pci requests are in the flavor then that's ideal :) | 18:21 |
| sean-k-mooney | yep in this case they are in the form of the pci_passthrough flavor extra spec | 18:22 |
| melwitt | I am relieved if this will not be complicated :D | 18:23 |
| sean-k-mooney | we can use that to create the pci request object and then add those to the request spec and then call generate_request_groups_from_pci_requests on the request spec | 18:23 |
| sean-k-mooney | well cyborg and neutron port will be harder but we don't need to fix all the edge cases in one go | 18:23 |
| sean-k-mooney | cyborg is doable from the flavor too by the way, i just need a call to cyborg to get the request from the device profile | 18:24 |
| sean-k-mooney | the neutron port requests are the only bit that is a little tricky | 18:24 |
| melwitt | agree that we need not fix them all at the same time. that's nice about cyborg too, also sounds pretty clean and simple | 18:25 |
| melwitt | I wonder though for neutron, would that not be a neutron quota thing? I guess the same could be asked of cyborg | 18:26 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/accelerator/cyborg.py#L90-L93 | 18:26 |
| sean-k-mooney | since nova is creating the allocation in placement for the packet per second resource class i think we have to enforce the check | 18:27 |
| melwitt | hm ok | 18:28 |
| sean-k-mooney | with that said we do also get the port request in the api | 18:28 |
| sean-k-mooney | we have to validate that all the uuids exist at a minimum | 18:28 |
| sean-k-mooney | for neutron qos to work you have to pre-create the ports | 18:29 |
| sean-k-mooney | if you don't do that and you just pass --network then we won't create the port until we hit the compute node and we won't allocate any bandwidth or qos resource request to the vm | 18:30 |
| sean-k-mooney | so i think we can fix that as well but we would need more than just bfv and the flavor to do that | 18:30 |
| sean-k-mooney | we would need the network request object or the ports | 18:31 |
| melwitt | I see | 18:33 |
| sean-k-mooney | i need to update a patch to fix some unit tests quickly. do you plan to follow up with a bug fix for this? if not i might get claude to create a reproducer and take a crack at fixing it | 18:34 |
| melwitt | sean-k-mooney: I will definitely work on it if no one else wants to -- I feel responsible for unified limits. but if you want, feel free to go ahead :) | 18:36 |
| sean-k-mooney | i'm kind of curious what will happen if i copy-paste the last few minutes into it and ask it to reproduce the bug we were discussing in a regression test | 18:36 |
| melwitt | I uploaded this fix for user-scoped quotas a while back because I felt bad https://review.opendev.org/c/openstack/nova/+/967148 haha | 18:39 |
| sean-k-mooney | feeling pride in the craftsmanship of the thing you create/maintain is good but you should not feel bad if there are bugs that no one noticed | 18:40 |
| melwitt | yeah :) no I felt bad about the idea of knowing about the problem but not fixing it | 18:41 |
| sean-k-mooney | ya i get that, that's half the reason i fix bugs | 18:42 |
| melwitt | so even though user-scoped quotas are legacy and "should not be used" I just put up a patch anyway | 18:42 |
| melwitt | the tough part is it's hard to get review on such things bc they are so "low prio" | 18:43 |
| sean-k-mooney | oh right we are doing 2 level quota now, global and project | 18:43 |
| melwitt | yeah. there is also supposed to be a TwoLevelEnforcer but iirc it is not implemented yet. I have seen John WIP patch for it but not sure if it ever got finished | 18:44 |
| melwitt | I think it might be complicated | 18:45 |
| sean-k-mooney | well claude is fixing my unit tests so i'll review your patch while i wait | 18:46 |
| melwitt | haha thanks. lmk if you need any reviews also :) | 18:47 |
| sean-k-mooney | well i'll need gibi and you or stephen to reapprove the patch that claude is fixing. but that is the last nova patch i'm actively working on at least for this week | 18:48 |
| melwitt | ah ok | 18:49 |
| sean-k-mooney | i'm sure i have bug fixes from months? years? ago open but nothing i'm actively looking for review on beyond the live migration patch i'm about to push | 18:49 |
| sean-k-mooney | how is the vtpm feature progressing? i keep meaning to try it out but not finding time | 18:50 |
| melwitt | going well. not sure we will get both the 'host' and 'deployment' modes merged this cycle but we should at least be able to get 'host' I think. and that's the more important one we wthink | 18:52 |
| melwitt | *we think | 18:53 |
| sean-k-mooney | you had a few devstack commits in WIP state that i came across before the break | 18:54 |
| sean-k-mooney | i meant to ask you about them when i got back in january but forgot | 18:54 |
| melwitt | oh yeah the swtpm and mdevctl install | 18:54 |
| sean-k-mooney | yes that was one of them | 18:54 |
| sean-k-mooney | is there a reason that was in wip state | 18:54 |
| melwitt | I don't know if someone else already did the mdevctl by now so maybe only need swtpm. I have been using it for running vtpm live migration in tempest | 18:55 |
| sean-k-mooney | mdevctl is an optional dep of libvirt so maybe its pulled in by default now | 18:55 |
| opendevreview | sean mooney proposed openstack/nova master: Support os-vif TAP pre-creation for OVS/OVN ports https://review.opendev.org/c/openstack/nova/+/973149 | 18:55 |
| melwitt | not really other than I wasn't sure if the patch is universally wanted. or if vtpm live migration in regular tempest would be accepted upstream. currently there is only whitebox testing but I found there is quite a bit we can do with regular tempest | 18:56 |
| melwitt | *only whitebox testing for vtpm | 18:56 |
| melwitt | I wrote a bunch of vtpm live migration tempest tests for regular tempest mostly for myself to more easily test all of the scenarios a ton of times | 18:57 |
| sean-k-mooney | well we will need it when we ever get around to finishing the mdev ci testing | 18:57 |
| melwitt | I hope they will be wanted upstream in general though, that would be nice | 18:57 |
| opendevreview | Lajos Katona proposed openstack/nova master: Use SDK for Neutron networks https://review.opendev.org/c/openstack/nova/+/928022 | 18:58 |
| sean-k-mooney | i dont see why we would not add it | 19:00 |
| sean-k-mooney | i mean if it works with generic tempest and has no hardware requirement, which vtpm does not, why would we not add them | 19:00 |
| melwitt | they only seem to work with virt_type = kvm, that's the only non optimal thing. I couldn't get vtpm to work with virt_type = qemu in upstream CI | 19:01 |
| sean-k-mooney | it should work with qemu | 19:02 |
| melwitt | so they can only run on some subset of the CI fleet but I have experienced no issues with getting hosts when I run them | 19:02 |
| sean-k-mooney | but ok we have a nested virt nodeset we can use for the relevant job | 19:02 |
| melwitt | I know it should but it would not work when I did it. like if you flip that flag to virt_type = qemu it will fail | 19:02 |
| melwitt | this is my setup https://review.opendev.org/c/openstack/nova/+/957477/46/.zuul.yaml | 19:03 |
| sean-k-mooney | so question on https://bugs.launchpad.net/nova/+bug/2131272 for one second. in nova the ram quota is 4000, you set the user quota to 4000 for user a | 19:04 |
| sean-k-mooney | user a and b are in the same project | 19:04 |
| melwitt | and I had to un-hardcode some things in our evacuate hook https://review.opendev.org/c/openstack/nova/+/957477/46/roles/run-evacuate-hook/tasks/main.yaml | 19:04 |
| sean-k-mooney | and you're expecting the project quota to still reject it with a 403 | 19:04 |
| sean-k-mooney | instead of the current 500 correct | 19:05 |
| sean-k-mooney | melwitt: some projects have jobs that run with nested virt nodes for speed (and to avoid the kernel panic we have). i don't see why nova should not use a nested virt node set if the feature needs it to be testable | 19:06 |
| sean-k-mooney | so if you remove LIBVIRT_TYPE: kvm and virt_type: kvm it fails | 19:07 |
| melwitt | yeah so this is kind of confusing and took a long time for me to figure out. but it's that you need a situation where the user has enough user-scoped quota to fulfill the request BUT the project-wide usage is too high to fulfill the request | 19:07 |
| melwitt | in order to reproduce the problem | 19:08 |
| sean-k-mooney | right i just wanted to make sure that that is the edge case you were trying to fix | 19:08 |
| melwitt | oh. yes that was | 19:09 |
| sean-k-mooney | the user quota is the quota for the instances that that user creates on any project, right | 19:09 |
| sean-k-mooney | but the project itself still needs to have quota | 19:09 |
| sean-k-mooney | which is what confused everyone | 19:09 |
| melwitt | I think the user-scoped is still nested under the project scope. but regardless of the user-scoped quota for a user, they are not supposed to be able to exceed the project quota with all of the project usage added together among all users in the same project | 19:10 |
| melwitt | so if you look at user-scoped quota and usage in isolation it might look like the request should pass but if the project has too much already existing usage due to other users in the project, that will affect the new request | 19:11 |
| melwitt | sean-k-mooney: yes when I did not have LIBVIRT_TYPE: kvm it failed. I don't remember the error bc it was months ago and I'm not sure if I commented it somewhere in the DNM patch. I should have but I might not have | 19:12 |
| melwitt | ok good I did: | 19:13 |
| melwitt | ERROR:system/cpus.c:504:qemu_mutex_lock_iothread_impl: assertion failed: (!qemu_mutex_iothread_locked()) | 19:13 |
| melwitt | Bail out! ERROR:system/cpus.c:504:qemu_mutex_lock_iothread_impl: assertion failed: (!qemu_mutex_iothread_locked()) | 19:13 |
| melwitt | 2025-09-23 21:47:17.508+0000: shutting down, reason=crashed | 19:13 |
| melwitt | there was no other indication of any problem in nova or libvirt logs that I found | 19:13 |
| melwitt | and I could not figure out or find the root cause of the guest crashing. bc the error reason just says "crashed" without other detail | 19:14 |
| sean-k-mooney | melwitt: does https://review.opendev.org/c/openstack/nova/+/967148/comment/bcfcff4e_9dbf054a/ make sense? i might be missing something | 19:24 |
| sean-k-mooney | melwitt: ok that is clearly a qemu bug that you were triggering | 19:24 |
| sean-k-mooney | that might be fixed | 19:25 |
| sean-k-mooney | melwitt: i feel like that is a bug that we may have reported and it may have been fixed | 19:26 |
| melwitt | sean-k-mooney: yeah what you said makes sense and I found this is why the message changes, it's re-raised as TooManyInstances https://github.com/openstack/nova/blob/a17b44f3eb16b9284ec8a6292bb942d803688e72/nova/compute/utils.py#L1181-L1184 | 19:31 |
| melwitt | sean-k-mooney: ok, I'll try running without nested virt and see what happens | 19:31 |
| sean-k-mooney | ah https://github.com/openstack/nova/blob/master/nova/exception.py#L1383-L1385 | 19:32 |
| sean-k-mooney | ok well the message could be slightly better but im ok with the current patch | 19:32 |
| sean-k-mooney | i just didn't see where it was being converted | 19:32 |
| sean-k-mooney | i would have expected TooManyInstances to be just for instance quota | 19:33 |
| melwitt | I'm fine with improving the message. I just didn't think about it | 19:35 |
| melwitt | you will find many things in quotas defy expectations | 19:35 |
| sean-k-mooney | :) | 19:35 |
| sean-k-mooney | well im +2 on the patch but happy to re review if you end up updating it | 19:35 |
| melwitt | thanks 🥹 | 19:38 |
| sean-k-mooney | melwitt: https://gitlab.com/qemu-project/qemu/-/issues/2978 | 19:46 |
| sean-k-mooney | my guess is that is the issue you hit in the ci job | 19:47 |
| melwitt | sean-k-mooney: thanks, I'll subscribe | 19:53 |
| melwitt | I guess they closed it but someone commented they're still hitting the issue after it was closed | 19:55 |
| sean-k-mooney | ya so if we recreated it in ci we could update it with an assertion that it still happens and provide the relevant libvirt xml | 19:55 |
| sean-k-mooney | i.e. show that it happens with vtpm for example | 19:56 |
| sean-k-mooney | the simplest way would be a second patch on your DNM to just go back to qemu and see if it explodes | 19:56 |
| melwitt | yeah. might as well | 19:58 |
| obre | Hi! There is a BP (add-amx-traits) thats been lying around since 2023, which describes a feature I would like to see in nova, and that I guess Im able to implement. The BP is not approved, and posted by someone else. Should I in some way update the BP and assign it to myself, or should I create a new one to get it approved and then Implemented? | 20:45 |
| opendevreview | sean mooney proposed openstack/nova master: Add regression test for unified limits PCI bug https://review.opendev.org/c/openstack/nova/+/975859 | 20:45 |
| sean-k-mooney | obre: there might be a poc of that. you could take over the blueprint but it would have to be for next cycle | 20:46 |
| sean-k-mooney | obre: https://review.opendev.org/c/openstack/os-traits/+/868149 | 20:46 |
| sean-k-mooney | so there is an os-traits patch but no nova patch. | 20:47 |
| sean-k-mooney | we would need to update the libvirt driver to report them i think as well | 20:47 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L13549-L13625 | 20:49 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/utils.py#L608-L622 | 20:49 |
| sean-k-mooney | they would need to be added here https://github.com/openstack/nova/blob/master/nova/virt/libvirt/utils.py#L61-L107 | 20:50 |
| sean-k-mooney | i.e. the mapping from the way the feature flag is reported by libvirt to the trait | 20:50 |
| sean-k-mooney | so that's small enough to be a specless blueprint | 20:50 |
| sean-k-mooney | obre: but you would have to propose it for 2026.2 and ask for it to be approved in the nova irc meeting | 20:51 |
| obre | Yes; I think it looks simple enough; But Im a bit uncertain for the process. | 20:51 |
| obre | How/where do I propose it? | 20:52 |
| sean-k-mooney | https://wiki.openstack.org/wiki/Meetings/Nova | 20:52 |
| obre | And are we in a feature-freeze or something for the G-release since that apparently is out of the question? | 20:52 |
| sean-k-mooney | i responded on the mailing list a few minutes ago for a different feature | 20:53 |
| sean-k-mooney | yes the blueprint and spec freeze was december 8th this cycle | 20:53 |
| obre | Right. | 20:53 |
| sean-k-mooney | master reopens for new feature work in march | 20:53 |
| sean-k-mooney | it's currently open for approved feature work and bug fixes | 20:53 |
| sean-k-mooney | but you can propose patches at any time | 20:53 |
| sean-k-mooney | so you can get it ready and show it works | 20:54 |
| sean-k-mooney | non client lib freeze i think is next week | 20:54 |
| sean-k-mooney | so it's realistically too late to add a new standard trait to os-traits | 20:54 |
| obre | Too late for H-release? | 20:54 |
| sean-k-mooney | for G | 20:55 |
| sean-k-mooney | its fine for H | 20:55 |
| sean-k-mooney | h is the release in september | 20:55 |
| obre | But I'm already too late for G-release in nova? And I guess it's fine to get both the Trait in and the nova-use of it for H? | 20:55 |
| sean-k-mooney | well technically i think october | 20:55 |
| sean-k-mooney | yep | 20:56 |
| sean-k-mooney | it's fine to rebase the patches and get everything lined up for H | 20:56 |
| sean-k-mooney | if you ask for an exception in the irc meeting on monday there is a very very small chance that the team will say yes | 20:56 |
| sean-k-mooney | but it would be much less stressful for you and everyone else to just do it early in H | 20:56 |
| obre | It feels like such a small change; So it would be interesting to ask nicely :) | 20:57 |
| sean-k-mooney | in this case i kind of agree. you could argue that it could be a wishlist bug | 20:58 |
| sean-k-mooney | i.e. just keeping the traits and feature flags in sync | 20:59 |
| obre | But regardless of G or H; I guess the process then is to add a point to the agenda of a nova meeting; and if the feature is approved based on the current BP I'll create a patch for nova (and rebase the patch for os-traits if that would be needed) and wait for the possibility to merge in March/April if it's for the H-release? | 20:59 |
| sean-k-mooney | but on the os-traits side we don't backport traits | 20:59 |
| sean-k-mooney | yes exactly | 20:59 |
| sean-k-mooney | you add the topic to the open discussion section of the etherpad | 20:59 |
| sean-k-mooney | * of the wiki | 21:00 |
| obre | It's certainly on the wishlist for my part :P I'm managing openstack for a University, and the amx-feature would be nice for some of our researchers :) | 21:00 |
| sean-k-mooney | you could also link to the irc logs of this conversation https://meetings.opendev.org/irclogs/%23openstack-nova/latest.log.html#openstack-nova.2026-02-05.log.html#t2026-02-05T20:45:42 | 21:01 |
| sean-k-mooney | to show i support the proposal in general | 21:01 |
| sean-k-mooney | you can use amx without the trait, you just can't schedule based on the capability automatically | 21:01 |
| sean-k-mooney | as a workaround you could add a CUSTOM_AMX trait for now | 21:02 |
| sean-k-mooney | either using provider.yaml or the placement api/cli | 21:02 |
| sean-k-mooney | of course if you're using the nova feature where you list multiple cpu models in the config and we give you the first one that matches based on your trait requests | 21:03 |
| sean-k-mooney | that won't work | 21:03 |
| sean-k-mooney | but if you have host-model or host-passthrough or a hardcoded cpu model on the host with amx | 21:04 |
| sean-k-mooney | the vm will get it even if the trait does not exist | 21:04 |
| obre | We are listing multiple CPU-models, and pick the first one based on Trait. | 21:04 |
| sean-k-mooney | ah :) | 21:05 |
| sean-k-mooney | i'm glad someone uses that feature | 21:05 |
| obre | Which allows me to have the bulk of my VMs run on all my compute-nodes; and then have a smaller subset available to people needing more modern IA's. | 21:05 |
| obre | And it works great :) | 21:05 |
| obre | And I'm exposing it to my users through "generations" of flavors: https://www.ntnu.no/wiki/spaces/skyhigh/pages/114296806/Flavors+of+instances | 21:06 |
| sean-k-mooney | ah yes that's a good approach. especially if you are also good and do not modify your flavors after they are in use | 21:07 |
| obre | And now I'm starting to have quite a few compute-nodes with amx-support; which sparks my interest in this patch :) | 21:07 |
| obre | Yeah; we try not to modify flavors. | 21:07 |
| sean-k-mooney | i have never run a lab/cloud for more than my team but i just cheated and created an AZ per cpu generation | 21:08 |
| sean-k-mooney | so if you cared you just specified the az you wanted, otherwise the scheduler just selected a host for you and you got whatever you landed on | 21:09 |
| obre | Having the possibility to run "old generations" on newer CPUs is to me very valuable when it comes to flexibility during upgrades. And I think I need the "list of multiple cpu models" for that to work. | 21:10 |
| obre | At least I'm struggling to see how to accomplish it with AZs. | 21:10 |
| sean-k-mooney | ya that's the only way to support that usecase, at least without leaving performance on the table | 21:10 |
| sean-k-mooney | you are doing it correctly. the reason i said i was cheating is that was before the feature existed and we used host-passthrough | 21:11 |
| obre | Makes sense. | 21:11 |
| obre | I've been running this platform since Icehouse or Juno, so there has been some evolution over time to end up doing it like this. We started with host-passthrough, but we found the need for the flexibility when the compute-nodes are very heterogeneous. | 21:13 |
| obre | And now we are scaling the platforms up quite significantly; as we are bailing on VMware in the org as well :) So lots of fun times ahead. | 21:14 |
| sean-k-mooney | ya vmware bills seem to have that effect now | 21:14 |
| obre | In the edu-sector the increases are mind-boggling. | 21:15 |
| obre | To an extent that it's difficult to believe it's true :) | 21:15 |
| obre | But I have seen the quotes. So I know I'm not dreaming. | 21:16 |
| obre | Anyways; thanks a lot for your responses. It's been very valuable! But now it's evening here; so I'll go afk. | 21:17 |
| opendevreview | sean mooney proposed openstack/nova master: Fix unified limits to include PCI resource classes https://review.opendev.org/c/openstack/nova/+/975872 | 21:48 |
| sean-k-mooney | melwitt: ^ LarsErikP | 21:53 |
| sean-k-mooney | we should file a proper bug and decide if we are going to fix it for cyborg and neutron ports etc. but at least for cyborg i think it should be relatively simple | 21:54 |
| sean-k-mooney | gibi: i had to rebase this and fix some unit test failures after os-vif was promoted. https://review.opendev.org/c/openstack/nova/+/973149 would you mind rereviewing it tomorrow | 21:59 |
| opendevreview | sean mooney proposed openstack/nova master: enable tap creation in nova-live-migration https://review.opendev.org/c/openstack/nova/+/975500 | 21:59 |
| opendevreview | sean mooney proposed openstack/nova master: Fix unified limits to include PCI resource classes https://review.opendev.org/c/openstack/nova/+/975872 | 22:11 |
| sean-k-mooney | https://bugs.launchpad.net/nova/+bug/2140631 | 22:20 |
| *** | haleyb is now known as haleyb\|out | 22:21 |
| melwitt | sean-k-mooney: cool thanks, I will look | 22:23 |
| melwitt | it would be super if LarsErikP can try out the patch also | 22:25 |
| opendevreview | melanie witt proposed openstack/nova master: DNM: vtpm tempest without nested virt https://review.opendev.org/c/openstack/nova/+/975874 | 22:31 |
| opendevreview | melanie witt proposed openstack/nova master: DNM: vtpm tempest without nested virt https://review.opendev.org/c/openstack/nova/+/975874 | 22:33 |
| opendevreview | sean mooney proposed openstack/nova master: Add regression test for unified limits PCI bug https://review.opendev.org/c/openstack/nova/+/975859 | 22:36 |
| opendevreview | sean mooney proposed openstack/nova master: Fix unified limits to include PCI resource classes https://review.opendev.org/c/openstack/nova/+/975872 | 22:36 |
| sean-k-mooney | melwitt: yep i have not deployed this on real hardware so it would be nice if it was on LarsErikP's system. i'm going to call it a night but i just updated the patches with the bug links and topic | 22:42 |
| melwitt | yeah.. I was imagining he has an env already set up (hence asking in the channel today) so it would be cool if it would be an easy effort thing | 22:46 |
| LarsErikP | hello! I've been afk since I left work. Scrolled through the backlog now, and I see that I sparked quite a discussion/conversation here :P I'll look into the patches tomorrow and see if I can test something. I have a live non-production env running, so I can probably test the patches quite easily | 22:55 |
| LarsErikP | melwitt: sean-k-mooney: ^ | 22:55 |
| LarsErikP | Thanks a lot! And I'm so delighted that this probably just wasn't a pebcak :p | 22:56 |
| melwitt | it's great to have people using unified limits so we can find out and fix issues :) | 22:57 |
| LarsErikP | Yeah, that has been this week's project for me. Testing out the migration to unified limits. And the killer feature we really want is exactly this - having quotas on e.g. GPUs. | 23:01 |
| melwitt | ++ | 23:02 |
| LarsErikP | I guess I should do some testing with type-VF devices here as well. But I guess that's going to work better, as they are actually counted correctly in placement when i have the resource class in the flavor | 23:03 |
| LarsErikP | with the fix for the bug I mentioned earlier today, that is | 23:03 |
| LarsErikP | I'll look into the details of what you've been doing, and what I should test when I get back to work tomorrow =) Good night Norway time ;) Thanks for all the efforts! | 23:05 |
| melwitt | yeah, earlier Sean said after the fix the remaining gaps they see are for cyborg resources and neutron ports. the former seems like a fix would be simple but for neutron it's more complicated | 23:06 |
| melwitt | gnite! | 23:06 |
| melwitt | sean-k-mooney: if you were curious, still a fail without nested virt https://review.opendev.org/c/openstack/nova/+/975874 sample instance log https://zuul.opendev.org/t/openstack/build/009d9ddc64354457a0b926adf753899c/log/compute1/logs/libvirt/libvirt/qemu/instance-00000019_log.txt | 23:55 |
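
The mapping sean-k-mooney describes at 20:50 (from the feature flag name as libvirt reports it to a trait) can be sketched roughly as below. This is only an illustration under stated assumptions, not nova's actual code: the dictionary, the function name `traits_for_features`, and the three AMX trait names are hypothetical, since the os-traits patch had not merged at the time of this conversation. The libvirt feature names `amx-tile`, `amx-int8` and `amx-bf16` are likewise assumed.

```python
# Hypothetical sketch: map CPU feature flags (as reported by libvirt)
# to trait names. The trait names are assumptions for illustration only;
# they are not taken from os-traits or nova.
FEATURE_TO_TRAIT = {
    "amx-tile": "HW_CPU_X86_AMX_TILE",
    "amx-int8": "HW_CPU_X86_AMX_INT8",
    "amx-bf16": "HW_CPU_X86_AMX_BF16",
}


def traits_for_features(features):
    """Return {trait_name: supported} for every trait in the mapping.

    `features` is any iterable of feature flag names; matching is
    case-insensitive. Flags without a mapping entry are ignored.
    """
    present = {name.lower() for name in features}
    return {
        trait: flag in present
        for flag, trait in FEATURE_TO_TRAIT.items()
    }


if __name__ == "__main__":
    # A host that reports tile and int8 support but not bf16.
    print(traits_for_features(["avx2", "AMX-TILE", "amx-int8"]))
```

Reporting every mapped trait with an explicit true/false value, rather than only the supported ones, makes it easy for a caller to both add traits that are present and remove traits that are no longer reported.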