opendevreview | suiong ng proposed openstack/nova stable/2024.2: Fix parameter order in add_instance_info_to_node https://review.opendev.org/c/openstack/nova/+/943994 | 01:12 |
---|---|---|
opendevreview | Amit Uniyal proposed openstack/nova master: DNM: enable NFS backend job https://review.opendev.org/c/openstack/nova/+/944008 | 06:19 |
opendevreview | Amit Uniyal proposed openstack/nova master: DNM: enable NFS backend job https://review.opendev.org/c/openstack/nova/+/944008 | 06:25 |
opendevreview | Amit Uniyal proposed openstack/nova master: DNM: enable NFS backend job https://review.opendev.org/c/openstack/nova/+/944008 | 08:02 |
opendevreview | Amit Uniyal proposed openstack/nova master: DNM: enable NFS backend job https://review.opendev.org/c/openstack/nova/+/944008 | 08:15 |
opendevreview | Amit Uniyal proposed openstack/nova master: DNM: enable NFS backend job https://review.opendev.org/c/openstack/nova/+/944008 | 08:28 |
ralonsoh | hello folks, where exactly in Nova compute the VF is created and the VLAN ID defined? | 08:32 |
ralonsoh | can you point me to the code? | 08:32 |
ralonsoh | I'm talking about SR-IOV ports | 08:32 |
ralonsoh | bauzas, sean-k-mooney ^ | 08:40 |
sean-k-mooney[m] | the vlan id is compute by neturon | 08:41 |
sean-k-mooney[m] | we just apply the value ye provide to the vf | 08:41 |
ralonsoh | sean-k-mooney[m], the VLAN ID is set by Neutron? | 08:41 |
sean-k-mooney[m] | no its set by nova | 08:42 |
ralonsoh | no no, I know the VLAN info is provided by Nova | 08:42 |
ralonsoh | but where this VLAN ID is set in the VF? | 08:42 |
sean-k-mooney[m] | but we use the value form the neutron network accosated with the neutron port | 08:42 |
ralonsoh | by Neutron* (not Nova) | 08:42 |
sean-k-mooney[m] | its set by nova in the libvirt xml | 08:42 |
ralonsoh | ok, this is what I was looking for | 08:43 |
sean-k-mooney[m] | ill find the code one sec | 08:43 |
ralonsoh | thanks! | 08:43 |
sean-k-mooney | so our LibvirtConfigGuestInterface has a vlan property https://github.com/openstack/nova/blob/master/nova/virt/libvirt/config.py#L1912 which is rendered into the xml for type direct and hostdeve vnic_type=direct is hostdev in this case and vnic_type=macvtap is direct... | 08:46 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/config.py#L2015-L2019 | 08:46 |
ralonsoh | right, so the VLAN is set in the interface definition | 08:46 |
ralonsoh | thanks a lot! | 08:47 |
sean-k-mooney | that invloked form here https://github.com/openstack/nova/blob/master/nova/virt/libvirt/vif.py#L630-L657 | 08:48 |
ralonsoh | yes, this vif plug method I was aware of it | 08:49 |
sean-k-mooney | ralonsoh: so yes it defiend in the libvirt defintion and it will program the VLAN filter via the PF before it assigned the VF to qemu | 08:49 |
ralonsoh | but I didn't see where the VLAN was defined | 08:49 |
ralonsoh | but is in the XML | 08:49 |
ralonsoh | sean-k-mooney, ups, now you are here | 08:50 |
ralonsoh | qq about resource providers | 08:50 |
sean-k-mooney | sure | 08:50 |
ralonsoh | in Neutron, if we add BW info for an agent (SRIOV, OVN, etc) and a network | 08:50 |
ralonsoh | Neutron will use the API to create the RP, the inventories, etc | 08:50 |
ralonsoh | what if we remove the RP? | 08:51 |
ralonsoh | I don't think we have considered that | 08:51 |
ralonsoh | at least in OVN, that reads the RP info not from the configuration but from the chassis (so there is no API restart) | 08:51 |
sean-k-mooney | so the design at the time considered removing an RP that has allcoation ot be operator error | 08:51 |
ralonsoh | so if there is an error in the configuration (for example, the physnet name is incorrect) | 08:52 |
ralonsoh | do we need to manually remove the RP ? | 08:52 |
opendevreview | Maksim Malchuk proposed openstack/nova stable/2024.2: Allow hyphen in cinder catalog_info service-type https://review.opendev.org/c/openstack/nova/+/944009 | 08:52 |
ralonsoh | ah ah, so if there is allocation, it will fail | 08:52 |
sean-k-mooney | so it depends. we proposed but did not implemtne the idea of owner triats | 08:53 |
sean-k-mooney | the idea was that neutron or nvoa could add an owner trait to all RPs it owns | 08:53 |
ralonsoh | we use custom names for the physnets | 08:54 |
sean-k-mooney | that way it could check fi ther ewere any that were created that should not exist | 08:54 |
ralonsoh | hmmm I think this "deletion" operations are going to be more difficult than expected | 08:54 |
opendevreview | Maksim Malchuk proposed openstack/nova stable/2024.1: Allow hyphen in cinder catalog_info service-type https://review.opendev.org/c/openstack/nova/+/944010 | 08:54 |
sean-k-mooney | ralonsoh: right so you use thing slik same subtree | 08:54 |
ralonsoh | yes, the host->agent subtree | 08:55 |
ralonsoh | so we "own" this RP | 08:55 |
sean-k-mooney | ya so you can get all of the sub RPs in the subtree that match your nameign convention | 08:55 |
opendevreview | Maksim Malchuk proposed openstack/nova stable/2023.2: Allow hyphen in cinder catalog_info service-type https://review.opendev.org/c/openstack/nova/+/944011 | 08:55 |
sean-k-mooney | and delete any that shoudl not exist | 08:55 |
sean-k-mooney | if you have a OWNER_NEUTRON trait on them its safer | 08:55 |
sean-k-mooney | as even if there are two service usign the same naming convetion you can look at only the ones you created | 08:56 |
sean-k-mooney | that makes deleting it simper | 08:56 |
ralonsoh | yeah, this could be a good improvement for sure | 08:56 |
sean-k-mooney | for now however you know they starte with or contain phsenet right | 08:56 |
ralonsoh | that will mark our RP with no doubt | 08:56 |
ralonsoh | sean-k-mooney, and the agent RP, that is handled by Neutron | 08:57 |
sean-k-mooney | so you can must look at the RPs in the subtree for the agent | 08:57 |
sean-k-mooney | and remove any that are not in you current config. | 08:57 |
ralonsoh | ok, I think I know (maybe) how to do this | 08:57 |
ralonsoh | reference: https://issues.redhat.com/browse/OSPRH-14597 | 08:57 |
ralonsoh | I'll open a LP bug too now | 08:57 |
sean-k-mooney | if they have allcoation it indicates that the removal migt be in error | 08:57 |
ralonsoh | yes, this is important: we can't remove used RP | 08:58 |
sean-k-mooney | i trhink placement blocks it by default but i dont recall | 08:58 |
sean-k-mooney | when we are removign compute nodes we loop overa ll the allcoation on the RP and delete them first | 08:58 |
ralonsoh | yes, you are right | 08:58 |
sean-k-mooney | then delete the rp | 08:58 |
ralonsoh | in any case, I'll try this first | 08:59 |
sean-k-mooney | ralonsoh: if you can tell there is a rename | 09:00 |
sean-k-mooney | you can also do what is called a reshape | 09:00 |
sean-k-mooney | whish is the operation of moving allcoation between RPs | 09:00 |
sean-k-mooney | they are non tivial but its how we evovled the stucture over time | 09:01 |
sean-k-mooney | https://docs.openstack.org/api-ref/placement/#id86 | 09:01 |
sean-k-mooney | so for a rename you would add the new rp, reshape the inventories/allocation from the old then delete the old RP | 09:02 |
ralonsoh | I don't think we need to implement this reshape (I think) | 09:03 |
ralonsoh | just remove the RP that is un-configured | 09:03 |
sean-k-mooney | ya in this case you dont | 09:03 |
sean-k-mooney | at least not for that jira | 09:03 |
sean-k-mooney | just letting you know that api exists if you need it in the future | 09:03 |
ralonsoh | for sure, good to know this! | 09:04 |
sahid | o/ | 09:51 |
sahid | quick question guys, is there a way to stop provision on a specific AZ, or at least a way to do it | 09:52 |
sahid | I'm sure there is something but I can't remember how | 09:52 |
sean-k-mooney | sahid: no | 09:54 |
sean-k-mooney | sahid: not in general at least | 09:54 |
sean-k-mooney | you could proably hack it usign some of the filters | 09:54 |
sahid | I think it should be an interesting feature, don't you think? | 09:56 |
sean-k-mooney | in general no :) | 09:56 |
sean-k-mooney | we could add someign like disablign an az | 09:57 |
sean-k-mooney | it has a staus but it does nothing | 09:57 |
sahid | ahah ...I hould have bet for your response :-) | 09:57 |
sean-k-mooney | today the status fo an az is just if any compute service is up and not disabled | 09:57 |
sean-k-mooney | sahid: why do you want to disable it? | 09:58 |
sahid | sean-k-mooney: it's to validate the resilience if we lose an AZ | 10:00 |
sahid | they do such check for everything | 10:00 |
sean-k-mooney | AZ are not for fault tollerance | 10:01 |
sean-k-mooney | so if your usign them for that you are starting form a broken foundation | 10:01 |
sahid | it's not the point | 10:01 |
sean-k-mooney | allowing an az to be disabled externally woudl be an api change and need a spec. im not entirly sure that a good thing but its not terible | 10:05 |
sean-k-mooney | today you could jsut disbale the computes theat are in an az to get the same effect | 10:05 |
sahid | sean-k-mooney: it's basically what we are going to do, but they are lot of computes to disable I was trying to think about something else | 10:07 |
sahid | i don't say it's something nova wants, perhpas it's something useful only for us | 10:08 |
sean-k-mooney | the problem is really what are the semantics | 10:08 |
sean-k-mooney | do we filter it form the list of AZs | 10:08 |
sean-k-mooney | do we impact any lifecycle operation on hosts assocated with the az | 10:09 |
sean-k-mooney | i.e. disabel reboot? start? | 10:09 |
sahid | from our use case it is just about to stop provisoning on a specific az | 10:10 |
sahid | wich may beuseful for upgrade or deployment if we can do that easily | 10:10 |
sean-k-mooney | i feel like this would be a new trait added to all host of a given az and an prefilter to forbid that trait | 10:10 |
sahid | sean-k-mooney: in that way we don't need an API change, right? | 10:16 |
sean-k-mooney | no you still do if you want a api call to be able to disable the az | 10:17 |
sean-k-mooney | if you just want to add the trait manually you would not need an api change | 10:17 |
sean-k-mooney | you can do that with the aggreate set commands i belive | 10:17 |
sahid | yes, probably having an API call is better, your right | 10:18 |
sahid | than doing some tweaks on resources | 10:18 |
sahid | should I start to think about it, it is something that nova would be interested? | 10:19 |
sean-k-mooney | it looks like we never merged the helper command to set a trait on all RPs in an aggreate | 10:19 |
sean-k-mooney | so you would have ot loop over all of them and set it one by one without an api call on the nova side | 10:20 |
sean-k-mooney | which is the same or more work then disabling all the computes | 10:21 |
sahid | yes that is why having an api would make more sense if we go in that direction, but there is perhaps a different solution, to indicate to the scheduler that this az is disabled | 10:23 |
sahid | as it has to go throught all the aggregates | 10:23 |
sean-k-mooney | so the schduler does not do anything with AZs anymore directly, the az filter was remvoed 3 releases ago | 10:30 |
sean-k-mooney | we have had the abity to do az enforcement in placment for 3-4 years at this point | 10:30 |
sahid | oh i did not noticed that | 10:31 |
sean-k-mooney | the prefilter was added in rocky, became the default in xena and the filter was removed in bobcat | 10:32 |
sean-k-mooney | sahid: that why im saying this woudl jsut be a trait on the RPs | 10:32 |
sean-k-mooney | and we woudl usdate the pre-filter to add it as a forbiden trait | 10:32 |
sean-k-mooney | it would work the same way the compute status pre-filter does https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py#L242-L255 | 10:34 |
sahid | yes.. that does not really make sense to loop over all the RPs, if we really have to do that I guess we can just create an API to disable all host per AZ | 10:36 |
sahid | but this is more something like orchestration | 10:36 |
sahid | so that does not make sens to have it in nova aswell i guess | 10:36 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Add service version for Epoxy https://review.opendev.org/c/openstack/nova/+/944017 | 10:40 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Update min support for Flamingo https://review.opendev.org/c/openstack/nova/+/944018 | 10:40 |
opendevreview | Sylvain Bauza proposed openstack/nova-specs master: Move Epoxy implemented specs https://review.opendev.org/c/openstack/nova-specs/+/944019 | 10:49 |
opendevreview | sean mooney proposed openstack/nova master: only show standard image properties in server show. https://review.opendev.org/c/openstack/nova/+/942413 | 11:29 |
sean-k-mooney | gibi: ^ fixed the remaining sample tests for 2.100 | 11:30 |
andrewbonney | Hi all. If anyone has chance I'd appreciate a follow up review on a long-standing small bug fix: https://review.opendev.org/c/openstack/nova/+/919961 | 11:37 |
gibi | sean-k-mooney: cool, thanks, +2 | 11:45 |
sean-k-mooney | andrewbonney: the core of the change looks ok. i dont think there is really enough tetst coverage to prevent regressing this, would you mind add a functional test that assert the correct behavior when you shleve? | 11:51 |
andrewbonney | I'll take a look, thanks | 11:52 |
opendevreview | Takashi Kajinami proposed openstack/nova master: doc: Drop deprecated [api] auth_strategy https://review.opendev.org/c/openstack/nova/+/944028 | 12:09 |
opendevreview | Takashi Kajinami proposed openstack/nova master: doc: Remove non-existent [service_user] auth_strategy https://review.opendev.org/c/openstack/nova/+/944029 | 12:09 |
opendevreview | Tobias Urdin proposed openstack/nova master: Add minute to instance_usage_audit_period https://review.opendev.org/c/openstack/nova/+/941488 | 13:10 |
*** haleyb|out is now known as haleyb | 13:30 | |
opendevreview | Andrew Bonney proposed openstack/nova master: Don't reset port dns_name when shelving instances https://review.opendev.org/c/openstack/nova/+/919961 | 13:56 |
andrewbonney | sean-k-mooney: ^ hopefully that's better. Will see what CI thinks, but tests work locally | 13:58 |
sean-k-mooney | andrewbonney: ya assoming that funtional test passes it actully assert the behavior you wanted to change | 14:00 |
sean-k-mooney | lets see if ci is happy | 14:00 |
sean-k-mooney | andrewbonney: thanks for the update | 14:00 |
sean-k-mooney | andrewbonney: im a little conferned with how you implemeneted the list_exteions overried by the way | 14:01 |
andrewbonney | Happy to take advice - that's a copy from something similar in a regression test | 14:02 |
sean-k-mooney | i think you are only modifyign the neutron fixture that is local to this test | 14:02 |
sean-k-mooney | but its very close to a patten that can leak changes between tests | 14:02 |
sean-k-mooney | andrewbonney: https://github.com/openstack/nova/blob/a329c103cbc864591f232b195016407e407bdb5d/nova/tests/functional/regressions/test_bug_1888395.py#L32-L49 ya its probaly fine | 14:03 |
andrewbonney | That's the one | 14:03 |
sean-k-mooney | i wrote that so if its wrong its on me anyway :) | 14:05 |
andrewbonney | :) | 14:05 |
opendevreview | Merged openstack/python-novaclient stable/2025.1: Update .gitreview for stable/2025.1 https://review.opendev.org/c/openstack/python-novaclient/+/943762 | 14:09 |
opendevreview | Dan Smith proposed openstack/os-traits master: Add HW_PCI_ONE_TIME_USE trait https://review.opendev.org/c/openstack/os-traits/+/944049 | 14:09 |
opendevreview | Merged openstack/python-novaclient stable/2025.1: Update TOX_CONSTRAINTS_FILE for stable/2025.1 https://review.opendev.org/c/openstack/python-novaclient/+/943763 | 14:11 |
opendevreview | Merged openstack/nova master: Add service version for Epoxy https://review.opendev.org/c/openstack/nova/+/944017 | 14:23 |
opendevreview | Dmitriy Chubinidze proposed openstack/nova master: doc: nova service_user wrong auth_url https://review.opendev.org/c/openstack/nova/+/938680 | 14:34 |
dansmith | sean-k-mooney: gibi: do we allow passing through a PF like a whole device? it seems like that's probably a bad idea, unless we check that a PF with VFs has none of those VFs available for use by nova, right? | 14:35 |
sean-k-mooney | yes we do | 14:35 |
sean-k-mooney | without pci in placement we only allow pfs if non of the VFs are allocated | 14:35 |
sean-k-mooney | with pci in placment you can only list the PF or the VFs | 14:36 |
sean-k-mooney | but not both for the same device | 14:36 |
dansmith | allocated...currently? or available for allocation at all? | 14:36 |
sean-k-mooney | allcoated currently | 14:36 |
dansmith | wow, okay | 14:36 |
sean-k-mooney | it was a usecase that telcos really wanted | 14:36 |
dansmith | either way, if pci in placement has a strict either-or rule, that helps | 14:36 |
sean-k-mooney | ya to make the accounting work we could not have it chaging dynmicaly | 14:37 |
dansmith | can I tell that something is a virtual function just because it has the physfunc (or whatever) in its /sys directory? | 14:37 |
sean-k-mooney | yes | 14:37 |
dansmith | cool | 14:37 |
sean-k-mooney | we look at the pci capablites which are reported in the nodedevxml | 14:37 |
gibi | dansmith: yep PCI in Placement has a strict either or check | 14:37 |
dansmith | cool | 14:37 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L1373-L1430 | 14:38 |
sean-k-mooney | that is how th elibvirt driver decied what type of device it is | 14:38 |
gibi | https://github.com/openstack/nova/blob/0d484ce37d86e989c8abdf57aec5e334f68206ef/nova/tests/functional/libvirt/test_pci_in_placement.py#L206-L239 this is the functional test proving that nova rejectes dependent devices if pci in placement is enabled | 14:39 |
opendevreview | sean mooney proposed openstack/nova master: only show standard image properties in server show. https://review.opendev.org/c/openstack/nova/+/942413 | 14:59 |
opendevreview | sean mooney proposed openstack/nova master: [tests] add priting of sample and templete paths https://review.opendev.org/c/openstack/nova/+/944054 | 14:59 |
sean-k-mooney | dansmith: ^ i think thats what you wanted | 14:59 |
opendevreview | Dan Smith proposed openstack/nova master: [tests] Add printing of sample and template paths https://review.opendev.org/c/openstack/nova/+/944054 | 15:05 |
dansmith | sean-k-mooney: got 'em thanks | 15:05 |
dansmith | bauzas: thanks, I need to quickly tweak that work item title if you can re-apply your +2 after I do, that would be appreciate | 15:28 |
dansmith | d | 15:28 |
bauzas | dansmith: sure | 15:28 |
opendevreview | Dan Smith proposed openstack/nova-specs master: Add one-time-use-devices spec https://review.opendev.org/c/openstack/nova-specs/+/943486 | 15:29 |
dansmith | that ^ just makes the work item reflect the discussion and the pre-review on the PoC | 15:30 |
sean-k-mooney | dansmith: just an fyi i have been putting off doing some ramp on horizon stuff for like 2 weeks so im tryign to spend the rest of the week doing that. ill loop back to the spec firday or early next week | 15:32 |
dansmith | sean-k-mooney: no worries, we've made a lot of quick progress and I have lots of smaller items to do (like all the test coverage and startup checks) so I'm good for a while for sure | 15:32 |
dansmith | surely appreciate sean-k-mooney gibi and bauzas helping sanity check everything early so I can work on implementing the things | 15:33 |
bauzas | dansmith: I'll try to do code review | 15:33 |
bauzas | #startmeeting nova | 16:02 |
opendevmeet | Meeting started Tue Mar 11 16:02:21 2025 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:02 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:02 |
opendevmeet | The meeting name has been set to 'nova' | 16:02 |
elodilles | o/ | 16:03 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:03 |
dansmith | o/ | 16:03 |
bauzas | I guess we can start | 16:05 |
* gibi arrives lae | 16:06 | |
gibi | *late | 16:06 |
bauzas | #topic Bugs (stuck/critical) | 16:07 |
bauzas | #info No Critical bug | 16:07 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:07 |
bauzas | anything to discuss about bugs ? | 16:08 |
bauzas | looks not | 16:10 |
bauzas | #topic Gate status | 16:10 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:12 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status | 16:12 |
bauzas | all periodics are green | 16:13 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:13 |
bauzas | #info Please try to provide meaningful comment when you recheck | 16:13 |
bauzas | anything about bugs ? | 16:14 |
Uggla | o/ | 16:17 |
bauzas | #topic Release Planning | 16:17 |
bauzas | #link https://releases.openstack.org/epoxy/schedule.html | 16:17 |
bauzas | #info Nova deadlines are set in the above schedule | 16:18 |
bauzas | #link https://etherpad.opendev.org/p/nova-epoxy-rc-potential | 16:18 |
elodilles | fyi the RC1 patches are already proposed: | 16:18 |
elodilles | https://review.opendev.org/c/openstack/releases/+/943941 | 16:18 |
elodilles | and | 16:18 |
elodilles | https://review.opendev.org/c/openstack/releases/+/943947 | 16:18 |
elodilles | (please add a -1 if something is in the queue to merge o:)) | 16:20 |
bauzas | nova-cores, please look at this etherpad, we have a number of patches to merge before we can propose RC1 | 16:20 |
bauzas | elodilles: will do | 16:20 |
bauzas | and saw those | 16:20 |
elodilles | thanks o/ | 16:20 |
* bauzas running another meeting at the same time :( | 16:20 | |
* bauzas definitely needs to pass the PTL baton to Uggla soon | 16:20 | |
bauzas | as a reminder, there is a section in the etherpad for adding bugfixes worth to be delivered | 16:23 |
dansmith | even without the upcoming baton change, probably best to let someone else run the meeting if you have a conflict | 16:23 |
bauzas | yeah it just went over for 24 mins :( | 16:24 |
bauzas | anyway, moving on | 16:24 |
bauzas | you'll see myself speaking in the channel asking cores to review thinfs | 16:24 |
bauzas | #topic PTG planning | 16:25 |
bauzas | #info Next PTG will be held on Apr 7-11 | 16:25 |
bauzas | #link https://etherpad.opendev.org/p/nova-2025.2-ptg | 16:25 |
bauzas | and that's it | 16:25 |
bauzas | #topic Stable Branches | 16:25 |
bauzas | elodilles: please | 16:25 |
elodilles | yepp | 16:25 |
elodilles | nothing special to report: | 16:26 |
elodilles | #info stable gates seem to be healthy | 16:26 |
elodilles | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:26 |
bauzas | cool | 16:26 |
elodilles | (and we have some stable/2025.1 already) | 16:26 |
bauzas | #topic vmwareapi 3rd-party CI efforts Highlights | 16:26 |
bauzas | fwiesel seems not to be present | 16:26 |
bauzas | moving on then | 16:26 |
bauzas | #topic Open discussion | 16:26 |
bauzas | nothing in the agenda | 16:27 |
bauzas | anything anyone ? | 16:27 |
opendevreview | mitya-eremeev-2 proposed openstack/nova master: Add more thorough handling of evacuated instances https://review.opendev.org/c/openstack/nova/+/944058 | 16:30 |
bauzas | looks we can close the meeting | 16:30 |
bauzas | thanks all | 16:30 |
bauzas | #endmeeting | 16:30 |
opendevmeet | Meeting ended Tue Mar 11 16:30:48 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:30 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2025/nova.2025-03-11-16.02.html | 16:30 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2025/nova.2025-03-11-16.02.txt | 16:30 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2025/nova.2025-03-11-16.02.log.html | 16:30 |
dansmith | bauzas: ack on the previous going over, sorry I just assumed you meant you intended to be superman and run two meetings at once :P | 16:31 |
elodilles | thanks o/ | 16:31 |
bauzas | dansmith: well, I really have no choices | 16:31 |
bauzas | hence myself leaving the hot seat, or I was about to burn | 16:31 |
ratailor | melwitt, could you please review https://review.opendev.org/c/openstack/nova/+/873901 and provide your feedback ? | 16:38 |
ratailor | gibi, could you please remove -1 here https://review.opendev.org/c/openstack/nova/+/933107 and provide your feedback ? | 16:38 |
gibi | ratailor: done | 16:45 |
ratailor | gibi, Thanks! | 16:45 |
dansmith | gibi: if you're still around.. it looks to me like we don't do all the "is this a physical function?" checking if the devspec is a pattern instead of a single device | 17:13 |
dansmith | I assume because there could be multiple things matched that have different characteristics | 17:14 |
dansmith | so you think we need to examine all the matching ones so we can abort if your spec matches a VF? | 17:14 |
gibi | auch. Yeah you are right, at that point we only have the dev_spec that can be wildcard and match to multiple devices. I'm not sure how the remote managed check below works then as that does some physical function check. Anyhow that means we cannot validate that input in at the device_spec parsing code and need to add it later in the pci_placemenet_translator | 17:16 |
dansmith | gibi: it just punts for patterns | 17:16 |
gibi | OK | 17:16 |
dansmith | gibi: well, my translator code already skips VFs | 17:16 |
gibi | not nice but meh | 17:16 |
dansmith | meaning, it won't reserve for VFs.. I could log something there though | 17:16 |
sean-k-mooney | we determin if its a phsyica function by looking at the libvirt nodedev xml when generiging the set of pci dicts and then we filter those by the devspec regexes ectra | 17:17 |
gibi | dansmith: yeah I think we can check in the translator that if the OTU is requested for a VF as that is clearly invalid. Currently your code ignoring it but that is missleading as the user expects that the OTU marked device will be reseved but it won't be | 17:17 |
gibi | so I suggest to raise there and let the deployer know that what is requested not supported | 17:18 |
dansmith | gibi: you mean the deployer.. the user can't see any of this | 17:18 |
dansmith | but okay | 17:18 |
gibi | yeah the deployer | 17:18 |
gibi | sorry | 17:18 |
gibi | I think we already have late input validation for the dependend device handling in the translator (when both the PF and its VFs are matched) | 17:18 |
dansmith | okay | 17:19 |
gibi | https://github.com/openstack/nova/blob/0d484ce37d86e989c8abdf57aec5e334f68206ef/nova/compute/pci_placement_translator.py#L146-L151 | 17:19 |
dansmith | gibi: okay so that's where we try to add children, and explode if we already have the parent? so I can just check OTU there and refuse similarly I guess? | 17:21 |
sean-k-mooney | add child is passed a pcidevce object so it could have also checked dev.dev_type | 17:21 |
dansmith | but no need to even check because if we're calling add_child and one_time_use=true, then fail right? | 17:22 |
sean-k-mooney | i mean you can fail there that pretty late | 17:24 |
dansmith | right but, see above.. when we're checking the devspec early, we don't have enough into to make a determination for patterns | 17:24 |
sean-k-mooney | dansmith: the devspec cant be used to tell if its a pf | 17:25 |
dansmith | ...right, I understand | 17:25 |
sean-k-mooney | but the device we get back form the driver alredy know if its a pf or not | 17:25 |
sean-k-mooney | i think you cna check in here https://github.com/openstack/nova/blob/0d484ce37d86e989c8abdf57aec5e334f68206ef/nova/pci/manager.py#L111 | 17:26 |
gibi | yeah you can check it in the pci manager as well. There we also have the both the dev and it's matching dev_spec | 17:28 |
sean-k-mooney | right roughtly here https://github.com/openstack/nova/blob/0d484ce37d86e989c8abdf57aec5e334f68206ef/nova/pci/manager.py#L135 | 17:28 |
sean-k-mooney | dev is the list of dicts form the virt driver and pci_dev_spec is the matched spec | 17:29 |
dansmith | okay | 17:29 |
sean-k-mooney | so here the dict has been encreced by the tags form the devspec https://github.com/openstack/nova/blob/0d484ce37d86e989c8abdf57aec5e334f68206ef/nova/pci/manager.py#L140 | 17:30 |
sean-k-mooney | and you can add a check before we add it to the devices list | 17:30 |
sean-k-mooney | you will need to add onetime use here https://github.com/openstack/nova/blob/master/nova/pci/devspec.py#L419 as well | 17:31 |
gibi | I would do it in the translator where you filter the VFs out that is the closest to the actual logic assuming no VFs are OTU | 17:31 |
sean-k-mooney | the translator is not the only place we do filtering but that an option yes | 17:32 |
gibi | the OTU feature hard depends on the translator | 17:32 |
gibi | it acutally implementes the OTU reservation | 17:32 |
sean-k-mooney | ya which is weiered to me because i expect this to really be done in the pci manager | 17:33 |
gibi | as we cannot put it early in the dev_spec parsing, the second best thing for me would be to put the check to the same place where the reserveation logic is | 17:33 |
sean-k-mooney | but i get why its doen that way | 17:33 |
dansmith | gibi: yeah that feels a lot cleaner to me, just because it's so contextually-relevant there | 17:33 |
gibi | if we implement OTU with the PCI tracker then I would agree with you, but we are going with implementing it in Placement | 17:33 |
gibi | dansmith: yepp | 17:34 |
sean-k-mooney | i kind of hate that we have any pci logic out soid eof the pci and hardware.py modules but that ship has sailed... | 17:34 |
gibi | sean-k-mooney: it is not hard to move the pci_placement_translator under the pci module | 17:38 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add functional reproducer for bug 2102038 https://review.opendev.org/c/openstack/nova/+/944061 | 17:39 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Multiple spec per PCI alias limitation https://review.opendev.org/c/openstack/nova/+/944062 | 17:39 |
sean-k-mooney | ^ is not valid config | 17:55 |
sean-k-mooney | gibi: we can have two differntly named aliases refer to the same vedor_id and product_id | 17:56 |
sean-k-mooney | but you cant have 2 alises with the same name and differnt contett | 17:57 |
gibi | sean-k-mooney: you can :) | 18:00 |
gibi | not with PCI in Placement but without it | 18:00 |
sean-k-mooney | it was a bug if it was ever there | 18:00 |
sean-k-mooney | it was never inteded to be supproted | 18:00 |
gibi | the spec in the alias is a list | 18:00 |
sean-k-mooney | yes | 18:00 |
sean-k-mooney | but that is so you can define multiple | 18:01 |
gibi | nope a single named alias can have a list of attached spec dicts | 18:01 |
sean-k-mooney | it was always ment to map to exactly one value | 18:01 |
sean-k-mooney | that not how it was intended to work | 18:01 |
gibi | I have to drop now but we can discuss it tomorrow | 18:02 |
gibi | https://github.com/openstack/nova/blob/0d484ce37d86e989c8abdf57aec5e334f68206ef/nova/pci/request.py#L172 | 18:03 |
gibi | this is very nova appends more spec to the same alias | 18:03 |
gibi | s/very/where/ | 18:03 |
* gibi disappears | 18:03 | |
sean-k-mooney | ya i just found that | 18:04 |
sean-k-mooney | i can tell you tha tthat is very problematic | 18:04 |
sean-k-mooney | sriov live migraiton and the gpu live migration depend on that never happeinging | 18:05 |
sean-k-mooney | it looks like that was added before i started workign on nova in the very early days of pci support | 18:06 |
sean-k-mooney | but that breaks other assumtions made over the years | 18:06 |
sean-k-mooney | i guess it does not break sriov live migration because of the hotplug | 18:09 |
sean-k-mooney | the vgpu live migraiton does not expect this nor does pci in placment | 18:09 |
gibi | yeah pci in placement explicitly forbid this, just not with the proper response hence the bug and reproduce above | 18:12 |
sean-k-mooney | we need to add a check to prevent merging aliais with diffent resouce classes | 18:12 |
sean-k-mooney | like https://github.com/openstack/nova/blob/0d484ce37d86e989c8abdf57aec5e334f68206ef/nova/pci/request.py#L168-L170 | 18:13 |
gibi | we have a clear reject in the pci in placement code already https://github.com/openstack/nova/blob/0d484ce37d86e989c8abdf57aec5e334f68206ef/nova/objects/request_spec.py#L504-L528 | 18:13 |
gibi | just the exception is not translated properly hence the 500 return from nova-api | 18:14 |
sean-k-mooney | that failign for a diffent reason | 18:14 |
sean-k-mooney | the check is vlaid there | 18:14 |
sean-k-mooney | but this really should be a config error in other case | 18:14 |
sean-k-mooney | but ya lets pick this up again tomorrow | 18:14 |
sean-k-mooney | there are several thinkg our config parsing has allwoed over the years that i woudl condier invliad an dthis is right at the top of the list for me | 18:15 |
sean-k-mooney | specificly merging 2 alisie with a diffent product_id | 18:16 |
sean-k-mooney | merging none conflictign values is kind of ok | 18:17 |
sean-k-mooney | but the orign behavior is not in my opipion | 18:17 |
opendevreview | Dan Smith proposed openstack/nova master: Support "one-time-use" PCI devices https://review.opendev.org/c/openstack/nova/+/943816 | 18:48 |
opendevreview | mitya-eremeev-2 proposed openstack/nova master: Add more thorough handling of evacuated instances https://review.opendev.org/c/openstack/nova/+/944058 | 20:46 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!