Friday, 2026-02-06

LarsErikPmelwitt: sean-k-mooney: ok, so I read through the backlog more thoroughly now. tl;dr you have already filed a bug so I don't need to do that, right? And I see there is a patch submitted. I'll see what I can do, testing on hardware here08:08
LarsErikPhmmm.. not really lucky so far. It might be because I'm applying the patches to an Epoxy installation? I've patched the two files in question on my servers running nova-api, scheduler and conductor. Limits are still not enforced08:49
LarsErikPand, when I add filter_scheduler/pci_in_placement to nova.conf on my nova-api server (i'm not running scheduler etc here) I get NoValidHost and this: https://paste.openstack.org/show/b0WQMUvscHxq6VXtGafT/08:50
LarsErikPah. nvm. I had to patch the pci/stats.py on the compute-node as well (from the other bug I mentioned earlier)09:00
LarsErikPso - with the correct config and patches on the correct hosts here. The patches seem to work09:06
LarsErikPthe limit on my CUSTOM_K80 resource class was enforced, and the only thing I had in the flavor was the pci_passthrough alias09:06
gibisean-k-mooney: melwitt: thanks for the pci in placement vs unified limit investigation. I agree with your conclusions. I think we should be able to validate the PCI resources in the nova-api as the translation from PCI alias to InstancePCIRequest happens in the nova-api already09:18
gibiyou can count on my reviews in the fix09:19
opendevreviewAntonin Ruan proposed openstack/nova-specs master: Temp URL download spec  https://review.opendev.org/c/openstack/nova-specs/+/97588309:22
LarsErikPgibi: fwiw I think I've managed to do a valid test on hardware on my side as well. Confirm it works09:29
gibiLarsErikP: nice. Thanks for doing the testing09:30
LarsErikPno problem! Happy to see how fast this was investigated and had a proposed fix! I'm impressed =)09:32
LarsErikPI kinda assumed I'd just done something wrong on the configuration side anyways :P09:33
gibiLarsErikP: sometimes nova team can act quickly :)09:52
gibisean-k-mooney: melwitt: I left review comments in the pci in placement vs unified limits fix. I feel that the current proposed fix is a bit hackish and there should be a better way to use the existing code infra to get the data the unified limits codepath needs09:53
sean-k-mooneygibi: there is, which would be to not create the fake request spec10:40
sean-k-mooneygibi: and probably other ways10:40
gibisean-k-mooney: yeah I would go with creating a proper request_spec via from_components10:43
gibithat does the PCI translation automatically10:44
opendevreviewHiroshi Tsuji proposed openstack/nova master: Limit virtio-net multiqueue per instance  https://review.opendev.org/c/openstack/nova/+/97589010:46
sean-k-mooneyyep i was just saying that in the review i dont like that resources_from_flavor and resources_for_limits exist10:47
sean-k-mooneyto me that is the root of the bug we should always have been using the full request spec with the full set of requirements for unified limits10:47
sean-k-mooneygibi: the issue with that is this is happening very early in the api10:49
sean-k-mooneyi dont know if we have created a request spec yet or if we are still dealing with the build_request etc10:49
sean-k-mooneythe current check is way before we call the conductor and in the current call site we do not have the instance object but its probably not too far away10:50
opendevreviewHiroshi Tsuji proposed openstack/nova master: Limit virtio-net multiqueue per instance  https://review.opendev.org/c/openstack/nova/+/97589010:51
opendevreviewHiroshi Tsuji proposed openstack/nova master: Limit virtio-net multiqueue per instance  https://review.opendev.org/c/openstack/nova/+/97589010:53
sean-k-mooneygibi: thanks for the feedback ill set aside some time to figure out if we can sanely build the full request spec and get all the resources to fix this more generally. i noted that the current patch only fixes pci but there is also neutron ports and cyborg and i was going to have to fix those in later patches anyway10:57
sean-k-mooneyill see if i can start by expanding the reproducer with neutron qos and cyborg examples. and then we can go from there10:58
opendevreviewHiroshi Tsuji proposed openstack/nova master: Limit virtio-net multiqueue via flavor and image metadata  https://review.opendev.org/c/openstack/nova/+/97589411:22
opendevreviewHiroshi Tsuji proposed openstack/os-traits master: Add trait for virtio-net multiqueue limit  https://review.opendev.org/c/openstack/os-traits/+/97589511:26
opendevreviewHiroshi Tsuji proposed openstack/os-traits master: Add trait for virtio-net multiqueue limit  https://review.opendev.org/c/openstack/os-traits/+/97589511:31
opendevreviewMerged openstack/nova master: Support os-vif TAP pre-creation for OVS/OVN ports  https://review.opendev.org/c/openstack/nova/+/97314911:35
sean-k-mooneyoh  :) ralonsoh ^11:36
ralonsohsean-k-mooney, checking now11:37
sean-k-mooneyralonsoh: so everything on the nova/os-vif side is now merged 11:37
ralonsohyeah, ykarel sent me https://review.opendev.org/c/openstack/neutron/+/956910 to review now11:37
ralonsohbecause everything is merged11:37
sean-k-mooneycool11:38
nisha04Hi nova core reviewers, could you review the patch about exception handling in nova cache https://review.opendev.org/c/openstack/nova/+/974695 when you have time. Thanks!12:58
opendevreviewBalazs Gibizer proposed openstack/nova master: Compute to use a common long running task executor  https://review.opendev.org/c/openstack/nova/+/97590713:10
amorinhey sean, me again on the pci group / gpu topic. I am wondering if there is something out there regarding managing the nvswitch configuration from nova?13:56
amorinI have this small FabricManagerClient that I can use to split the GPU into multiple partitions, but afaik for now, it needs to be done outside of nova13:58
sean-k-mooneyamorin: no and we do not plan to ever do that14:00
sean-k-mooneynot in nova at least14:00
amorinso the recommendation is to freeze the config to fit the flavors needs by aggregate or something like this?14:01
sean-k-mooneyamorin: i dont currently have access to nvswitch capable hardware to test but on your systems are you able to reconfigure the inter-gpu connections 14:02
amorinyup14:02
sean-k-mooneyi.e. dynamically change the grouping of the gpus via nvswitch at runtime14:02
amorinexactly14:02
sean-k-mooneyok so we will probably end up implementing support for that in cyborg then14:02
sean-k-mooneyamorin: i would love to talk to you about this in detail but i have to join a call14:04
amorinno worries, we can talk later :)14:04
amorinI dont use cyborg on my side, I need to dig into this14:04
sean-k-mooneyamorin: 3 and a half hours of back to back meetings means my brain is starting to get tired15:54
sean-k-mooneyamorin: cyborg has been mostly dormant for the last few years15:55
sean-k-mooneyamorin: one of the topics that the "compute team" as in nova/placement teams will be discussing in the ptg is how we evolve accelerator management in openstack and where the lines between nova and cyborg and other services would be.15:56
sean-k-mooneyamorin: as it stands doing nvswitch management is out of scope of where the nova core team currently wants to draw the boundary and its possibly in scope for cyborg perhaps even in the next 12-18 months15:57
amorinbased on what I can read from cyborg doc, it would make sense15:57
sean-k-mooneyamorin: the reason im phrasing it that way is i dont think my team will have time to get to it next cycle 2026.2 but its actually in our top 5 ish things to look at implementing15:58
amorinwhile it's not yet sure, we may have someone ready to work on this on our side, so maybe that could help15:59
sean-k-mooneyright now we are in the process of pursuing test/dev hardware for gpus and other things and that will take some time to get in labs but if you have such hardware i would be very interested to learn about what it can do and how you would like to use it15:59
sean-k-mooneyalso yes many hands make light work and all16:00
*** erlon1 is now known as erlon16:01
amorinI will have to go soon, but maybe I can prepare some data to explain what we want to achieve with the hardware we have16:02
sean-k-mooneysure. i am planning to have a cyborg ptg and possibly a nova cyborg cross project session for the next ptg16:03
sean-k-mooneybut i would be happy to hop on a call or continue this at another time on irc or mail before then16:03
sean-k-mooneyi assume that OVN has a lot of customer demand for ai workloads and this would help operationally solve the gpu allocation problem16:04
sean-k-mooneyor rather the time it takes to prepare gpus for customers16:05
opendevreviewBalazs Gibizer proposed openstack/nova master: Create an executor wrapper that has task limit per type  https://review.opendev.org/c/openstack/nova/+/97592416:13
gibisean-k-mooney: gmaan: dansmith: bauzas: I added example implementations for our options in https://etherpad.opendev.org/p/eventlet-executor-semaphore-limit I made a couple of discoveries (options B and C are non-viable, option D is cheaper than I thought) 16:15
gibiI drop now but I will read back later here or in the etherpad or in the patches16:15
sean-k-mooneygibi: ya i think thats a monday me problem16:16
sean-k-mooneyill try and take a look but im mostly done for the day i think. enjoy your weekend16:17
gmaangibi: ack, I will check16:17
dansmithgibi: sorry I'm pretty deep on my stack and haven't looked at any of it16:19
opendevreviewTobias Urdin proposed openstack/nova master: Add a timeout for `ceph df` command  https://review.opendev.org/c/openstack/nova/+/97592716:52
opendevreviewLajos Katona proposed openstack/nova master: Add regression test to repoduce bug 2140537  https://review.opendev.org/c/openstack/nova/+/97583217:03
opendevreviewLajos Katona proposed openstack/nova master: WIP: Fix for bug 2140537  https://review.opendev.org/c/openstack/nova/+/97593417:16
-opendevstatus- NOTICE: Gerrit on review.opendev.org will experience a short outage while we upgrade it to 3.11.817:50
opendevreviewsean mooney proposed openstack/nova master: Add regression test for unified limits resource bug  https://review.opendev.org/c/openstack/nova/+/97585920:11
opendevreviewsean mooney proposed openstack/nova master: Fix unified limits to include all resource types  https://review.opendev.org/c/openstack/nova/+/97596120:11
opendevreviewsean mooney proposed openstack/nova master: Fix unified limits to include all resource types  https://review.opendev.org/c/openstack/nova/+/97587220:13
gibithanks folks, no worries. Have a nice weekend you all20:25
gibiit seems we have a broken gate https://zuul.opendev.org/t/openstack/builds?job_name=openstacksdk-functional-devstack&skip=021:22
melwitttesttools.matchers._impl.MismatchError: 'tenant_id' not in {'id': '57671080-6eb2-4c7f-9e64-c61edfa8995e', 'project_id': '1ef2e6c3fc2d4f6b92fdd8d49f0a05a1', 'port_id': '5b39bbfc-6d3d-440e-9c7a-e56e434ef897', 'network_id': '8aa6b6e2-e0c5-4f96-bb87-5509e30d340c', 'subnet_id': '6f52d8be-cf68-4552-b57d-37142cea65cc', 'subnet_ids': ['6f52d8be-cf68-4552-b57d-37142cea65cc']}22:03
melwittlooks like something removed tenant_id from something and this one openstacksdk test expects it to be there22:04
opendevreviewmelanie witt proposed openstack/nova stable/2024.2: libvirt: Get info with abs path, rebase with rel path  https://review.opendev.org/c/openstack/nova/+/97597222:09
gmaanand since 1.5 hrs it started failing22:13
melwittI'm looking around to try to find what caused it22:14
gmaanthanks22:17
melwittit looks like it might have been this or something similar https://review.opendev.org/c/openstack/neutron/+/97298222:21
melwittthe timing doesn't seem to match though22:21
melwittthe failing test adds a router interface and then expects tenant_id to be in the response https://github.com/openstack/openstacksdk/blob/master/openstack/tests/functional/cloud/test_router.py#L20522:22
melwittso I was looking for neutron API type of changes that could have caused a change in the response parameters22:23
gmaanyeah, this BP seems to be removing it from the API as well https://blueprints.launchpad.net/neutron/+spec/keystone-v322:28
gmaanit seems an unnecessary API-breaking change but if the neutron team has decided to do it then I think we can fix the sdk test 22:28
melwittyeah, I wondered if they don't have more backward compat requirements with no microversion?22:30
gmaanyeah, it should have been done with microversion at least22:31
gmaanmaybe we should skip this job to unblock the gate and leave it to the neutron and sdk teams to fix either the sdk test or the neutron API22:33
gmaanit blocked devstack gate also22:33
melwittyeah, I see it failing in cinder, glance, swift, also22:34
gmaanyeah, i remember on one of the backward incompatible failures, we added the sdk jobs in most of the projects22:35
melwittyeah, we need to add it in neutron project :)22:36
gmaanah, we are missing it at the right place :)22:37
melwittI'm not really sure how the change happened, it's not reflected on their api-ref https://docs.openstack.org/api-ref/network/v2/index.html#add-interface-to-router so I'm thinking it might be something where a db or object change goes straight through to the API :/22:38
gmaanmelwitt: i added it in integrated template but neutron seems not using that integrated one. I think I should propose it to re-add that https://review.opendev.org/c/openstack/tempest/+/801920/2/zuul.d/integrated-gate.yaml#31222:40
melwittah I see22:41
gmaanI think none of the neutron members are online now, maybe good to send it to the ML.22:45
gmaan I need to go pick up my son from school22:45
*** gmaan is now known as gmaan_afk22:46
melwittsure I will email ML22:46
