| LarsErikP | melwitt: sean-k-mooney: ok, so I read through the backlog more thoroughly now. tl;dr you have already filed a bug so I don't need to do that, right? And I see there is a patch submitted. I'll see what I can do, testing on hardware here | 08:08 |
|---|---|---|
| LarsErikP | hmmm.. not really lucky so far. It might be because I'm applying the patches to an Epoxy installation? I've patched the two files in question on my servers running nova-api, scheduler and conductor. Limits are still not enforced | 08:49 |
| LarsErikP | and, when I add filter_scheduler/pci_in_placement to nova.conf on my nova-api server (i'm not running scheduler etc here) I get NoValidHost and this: https://paste.openstack.org/show/b0WQMUvscHxq6VXtGafT/ | 08:50 |
| LarsErikP | ah. nvm. I had to patch the pci/stats.py on the compute-node as well (from the other bug I mentioned earlier) | 09:00 |
| LarsErikP | so - with the correct config and patches on the correct hosts here. The patches seem to work | 09:06 |
| LarsErikP | the limit on my CUSTOM_K80 resource class was enforced, and the only thing I had in the flavor was the pci_passthrough alias | 09:06 |
| gibi | sean-k-mooney: melwitt: thanks for the pci in placement vs unified limit investigation. I agree with your conclusions. I think we should be able to validate the PCI resources in the nova-api as the translation from PCI alias to InstancePCIRequest happens in the nova-api already | 09:18 |
| gibi | you can count on my reviews in the fix | 09:19 |
| opendevreview | Antonin Ruan proposed openstack/nova-specs master: Temp URL download spec https://review.opendev.org/c/openstack/nova-specs/+/975883 | 09:22 |
| LarsErikP | gibi: fwiw I think I've managed to do a valid test on hardware on my side as well. Confirm it works | 09:29 |
| gibi | LarsErikP: nice. Thanks for doing the testing | 09:30 |
| LarsErikP | no problem! Happy to see how fast this was investigated and had a proposed fix! I'm impressed =) | 09:32 |
| LarsErikP | I kinda assumed I'd just done something wrong on the configuration side anyways :P | 09:33 |
| gibi | LarsErikP: sometimes nova team can act quickly :) | 09:52 |
| gibi | sean-k-mooney: melwitt: I left review comments in the pci in placement vs unified limits fix. I feel that the current proposed fix is a bit hackish and there should be a better way to use the existing code infra to get the data the unified limits codepath needs | 09:53 |
| sean-k-mooney | gibi: there is, which would be to not create the fake request spec | 10:40 |
| sean-k-mooney | gibi: and probably other ways | 10:40 |
| gibi | sean-k-mooney: yeah I would go with creating a proper request_spec via from_components | 10:43 |
| gibi | that does the PCI translation automatically | 10:44 |
| opendevreview | Hiroshi Tsuji proposed openstack/nova master: Limit virtio-net multiqueue per instance https://review.opendev.org/c/openstack/nova/+/975890 | 10:46 |
| sean-k-mooney | yep i was just saying that in the review, i dont like that resources_from_flavor and resources_for_limits exist | 10:47 |
| sean-k-mooney | to me that is the root of the bug, we should always have been using the full request spec with the full set of requirements for unified limits | 10:47 |
| sean-k-mooney | gibi: the issue with that is this is happening very early in the api | 10:49 |
| sean-k-mooney | i dont know if we have created a request spec yet or if we are still dealing with the build_request etc. | 10:49 |
| sean-k-mooney | the current check is way before we call the conductor and in the current call site we do not have the instance object but its probably not too far away | 10:50 |
| opendevreview | Hiroshi Tsuji proposed openstack/nova master: Limit virtio-net multiqueue per instance https://review.opendev.org/c/openstack/nova/+/975890 | 10:51 |
| opendevreview | Hiroshi Tsuji proposed openstack/nova master: Limit virtio-net multiqueue per instance https://review.opendev.org/c/openstack/nova/+/975890 | 10:53 |
| sean-k-mooney | gibi: thanks for the feedback, ill set aside some time to figure out if we can sanely build the full request spec and get all the resources to fix this more generally. i noted that the current patch only fixes pci but there are also neutron ports and cyborg and i was going to have to fix those in later patches anyway | 10:57 |
| sean-k-mooney | ill see if i can start by expanding the reproducer with neutron qos and cyborg examples. and then we can go from there | 10:58 |
| opendevreview | Hiroshi Tsuji proposed openstack/nova master: Limit virtio-net multiqueue via flavor and image metadata https://review.opendev.org/c/openstack/nova/+/975894 | 11:22 |
| opendevreview | Hiroshi Tsuji proposed openstack/os-traits master: Add trait for virtio-net multiqueue limit https://review.opendev.org/c/openstack/os-traits/+/975895 | 11:26 |
| opendevreview | Hiroshi Tsuji proposed openstack/os-traits master: Add trait for virtio-net multiqueue limit https://review.opendev.org/c/openstack/os-traits/+/975895 | 11:31 |
| opendevreview | Merged openstack/nova master: Support os-vif TAP pre-creation for OVS/OVN ports https://review.opendev.org/c/openstack/nova/+/973149 | 11:35 |
| sean-k-mooney | oh :) ralonsoh ^ | 11:36 |
| ralonsoh | sean-k-mooney, checking now | 11:37 |
| sean-k-mooney | ralonsoh: so everything on the nova/os-vif side is now merged | 11:37 |
| ralonsoh | yeah, ykarel send me https://review.opendev.org/c/openstack/neutron/+/956910 to review now | 11:37 |
| ralonsoh | because everything is merged | 11:37 |
| sean-k-mooney | cool | 11:38 |
| nisha04 | Hi nova core reviewers, could you review the patch about exception handling in nova cache https://review.opendev.org/c/openstack/nova/+/974695 when you have time. Thanks! | 12:58 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Compute to use a common long running task executor https://review.opendev.org/c/openstack/nova/+/975907 | 13:10 |
| amorin | hey sean, me again on the pci group / gpu topic. I am wondering if there is something out there regarding managing the nvswitch configuration from nova? | 13:56 |
| amorin | I have this small FabricManagerClient that I can use to split the GPU in multiples partitions, but afaik for now, it needs to be done outside of nova | 13:58 |
| sean-k-mooney | amorin: no and we do not plan to ever do that | 14:00 |
| sean-k-mooney | not in nova at least | 14:00 |
| amorin | so the recommendation is to freeze the config to fit the flavors needs by aggregate or something like this? | 14:01 |
| sean-k-mooney | amorin: i dont currently have access to nvswitch-capable hardware to test but on your systems are you able to reconfigure the inter-gpu connections | 14:02 |
| amorin | yup | 14:02 |
| sean-k-mooney | i.e. dynamically change the grouping of the gpus via nvswitch at runtime | 14:02 |
| amorin | exactly | 14:02 |
| sean-k-mooney | ok so we will probably end up implementing support for that in cyborg then | 14:02 |
| sean-k-mooney | amorin: i would love to talk to you about this in detail but i have to join a call | 14:04 |
| amorin | no worries, we can talk later :) | 14:04 |
| amorin | I dont use cyborg on my side, I need to dig into this | 14:04 |
| sean-k-mooney | amorin: 3 and a half hours of back to back meetings means my brain is starting to get tired | 15:54 |
| sean-k-mooney | amorin: cyborg has been mostly dormant for the last few years | 15:55 |
| sean-k-mooney | amorin: one of the topics that the "compute team", as in the nova/placement teams, will be discussing at the ptg is how we evolve accelerator management in openstack and where the lines between nova and cyborg and other services would be. | 15:56 |
| sean-k-mooney | amorin: as it stands doing nvswitch management is out of scope for where the nova core team currently wants to draw the boundary and it is possibly in scope for cyborg, perhaps even in the next 12-18 months | 15:57 |
| amorin | based on what I can read from cyborg doc, it would make sense | 15:57 |
| sean-k-mooney | amorin: the reason im phrasing it that way is i dont think my team will have time to get to it next cycle, 2026.2, but its actually in our top 5 ish things to look at implementing | 15:58 |
| amorin | while it's not yet sure, we may have someone ready to work on this on our side, so maybe that could help | 15:59 |
| sean-k-mooney | right now we are in the process of pursuing test/dev hardware for gpus and other things and that will take some time to get in labs but if you have such hardware i would be very interested to learn about what it can do and how you would like to use it | 15:59 |
| sean-k-mooney | also yes many hands make light work and all | 16:00 |
| | *** erlon1 is now known as erlon | 16:01 |
| amorin | I will have to go soon, but maybe I can prepare some data to explain what we want to achieve with the hardware we have | 16:02 |
| sean-k-mooney | sure. i am planning to have a cyborg ptg and possibly a nova/cyborg cross-project session for the next ptg | 16:03 |
| sean-k-mooney | but i would be happy to hop on a call or continue this at another time on irc or mail before then | 16:03 |
| sean-k-mooney | i assume that OVN has a lot of customer demand for ai workloads and this would help operationally solve the gpu allocation problem | 16:04 |
| sean-k-mooney | or rather the time it takes to prepare gpus for customers | 16:05 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Create an executor wrapper that has task limit per type https://review.opendev.org/c/openstack/nova/+/975924 | 16:13 |
| gibi | sean-k-mooney: gmaan: dansmith: bauzas: I added example implementations for our options in https://etherpad.opendev.org/p/eventlet-executor-semaphore-limit I made a couple of discoveries (options B and C are non-viable, option D is cheaper than I thought) | 16:15 |
| gibi | I drop now but I will read back later here or in the etherpad or in the patches | 16:15 |
| sean-k-mooney | gibi: ya i think thats a monday-me problem | 16:16 |
| sean-k-mooney | ill try and take a look but im mostly done for the day i think. enjoy your weekend | 16:17 |
| gmaan | gibi: ack, I will check | 16:17 |
| dansmith | gibi: sorry I'm pretty deep on my stack and haven't looked at any of it | 16:19 |
| opendevreview | Tobias Urdin proposed openstack/nova master: Add a timeout for `ceph df` command https://review.opendev.org/c/openstack/nova/+/975927 | 16:52 |
| opendevreview | Lajos Katona proposed openstack/nova master: Add regression test to repoduce bug 2140537 https://review.opendev.org/c/openstack/nova/+/975832 | 17:03 |
| opendevreview | Lajos Katona proposed openstack/nova master: WIP: Fix for bug 2140537 https://review.opendev.org/c/openstack/nova/+/975934 | 17:16 |
| | -opendevstatus- NOTICE: Gerrit on review.opendev.org will experience a short outage while we upgrade it to 3.11.8 | 17:50 |
| opendevreview | sean mooney proposed openstack/nova master: Add regression test for unified limits resource bug https://review.opendev.org/c/openstack/nova/+/975859 | 20:11 |
| opendevreview | sean mooney proposed openstack/nova master: Fix unified limits to include all resource types https://review.opendev.org/c/openstack/nova/+/975961 | 20:11 |
| opendevreview | sean mooney proposed openstack/nova master: Fix unified limits to include all resource types https://review.opendev.org/c/openstack/nova/+/975872 | 20:13 |
| gibi | thanks folks, no worries. Have a nice weekend you all | 20:25 |
| gibi | it seems we have a broken gate https://zuul.opendev.org/t/openstack/builds?job_name=openstacksdk-functional-devstack&skip=0 | 21:22 |
| melwitt | testtools.matchers._impl.MismatchError: 'tenant_id' not in {'id': '57671080-6eb2-4c7f-9e64-c61edfa8995e', 'project_id': '1ef2e6c3fc2d4f6b92fdd8d49f0a05a1', 'port_id': '5b39bbfc-6d3d-440e-9c7a-e56e434ef897', 'network_id': '8aa6b6e2-e0c5-4f96-bb87-5509e30d340c', 'subnet_id': '6f52d8be-cf68-4552-b57d-37142cea65cc', 'subnet_ids': ['6f52d8be-cf68-4552-b57d-37142cea65cc']} | 22:03 |
| melwitt | looks like something removed tenant_id from something and this one openstacksdk test expects it to be there | 22:04 |
| opendevreview | melanie witt proposed openstack/nova stable/2024.2: libvirt: Get info with abs path, rebase with rel path https://review.opendev.org/c/openstack/nova/+/975972 | 22:09 |
| gmaan | and it started failing about 1.5 hrs ago | 22:13 |
| melwitt | I'm looking around to try to find what caused it | 22:14 |
| gmaan | thanks | 22:17 |
| melwitt | it looks like it might have been this or something similar https://review.opendev.org/c/openstack/neutron/+/972982 | 22:21 |
| melwitt | the timing doesn't seem to match though | 22:21 |
| melwitt | the failing test adds a router interface and then expects tenant_id to be in the response https://github.com/openstack/openstacksdk/blob/master/openstack/tests/functional/cloud/test_router.py#L205 | 22:22 |
| melwitt | so I was looking for neutron API type of changes that could have caused a change in the response parameters | 22:23 |
| gmaan | yeah, this BP seems to be removing it from the API as well https://blueprints.launchpad.net/neutron/+spec/keystone-v3 | 22:28 |
| gmaan | it seems an unnecessary API-breaking change but if the neutron team has decided to do it then I think we can fix the sdk test | 22:28 |
| melwitt | yeah, I wondered if they don't have more backward compat requirements with no microversion? | 22:30 |
| gmaan | yeah, it should have been done with microversion at least | 22:31 |
| gmaan | maybe we should skip this job to unblock the gate and leave it to the neutron and sdk teams to fix either the sdk test or the neutron API | 22:33 |
| gmaan | it blocked devstack gate also | 22:33 |
| melwitt | yeah, I see it failing in cinder, glance, swift, also | 22:34 |
| gmaan | yeah, i remember after one of the backward-incompatible failures, we added the sdk jobs in most of the projects | 22:35 |
| melwitt | yeah, we need to add it in neutron project :) | 22:36 |
| gmaan | ah, we are missing it at the right place :) | 22:37 |
| melwitt | I'm not really sure how the change happened, it's not reflected on their api-ref https://docs.openstack.org/api-ref/network/v2/index.html#add-interface-to-router so I'm thinking it might be something where a db or object change goes straight through to the API :/ | 22:38 |
| gmaan | melwitt: i added it in the integrated template but neutron seems not to be using that integrated one. I think I should propose to re-add that https://review.opendev.org/c/openstack/tempest/+/801920/2/zuul.d/integrated-gate.yaml#312 | 22:40 |
| melwitt | ah I see | 22:41 |
| gmaan | I think none of the neutron members are online now, maybe good to send it to the ML. | 22:45 |
| gmaan | I need to go to pick up my son from school | 22:45 |
| | *** gmaan is now known as gmaan_afk | 22:46 |
| melwitt | sure I will email ML | 22:46 |
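
For readers following the `tenant_id` gate failure discussed at the end of the log, here is a minimal sketch of a check that tolerates either key in the add-router-interface response. It is an illustration only: `owner_id` is a hypothetical helper and not part of openstacksdk or neutron, and nothing in the log says the sdk test was or will be fixed this way. The response body is copied from the `MismatchError` quoted above.

```python
# Hypothetical helper for illustration; not an openstacksdk or neutron API.
def owner_id(interface_info):
    """Return the owning project ID from a router-interface response body.

    Prefers 'project_id' and falls back to the legacy 'tenant_id' key.
    """
    for key in ("project_id", "tenant_id"):
        if key in interface_info:
            return interface_info[key]
    raise KeyError("neither 'project_id' nor 'tenant_id' is present")


# Response body as quoted in the MismatchError in the log: 'tenant_id' is absent.
response = {
    "id": "57671080-6eb2-4c7f-9e64-c61edfa8995e",
    "project_id": "1ef2e6c3fc2d4f6b92fdd8d49f0a05a1",
    "port_id": "5b39bbfc-6d3d-440e-9c7a-e56e434ef897",
    "network_id": "8aa6b6e2-e0c5-4f96-bb87-5509e30d340c",
    "subnet_id": "6f52d8be-cf68-4552-b57d-37142cea65cc",
    "subnet_ids": ["6f52d8be-cf68-4552-b57d-37142cea65cc"],
}

print(owner_id(response))
```

A check written this way would pass against both the old and the new response shape, which is one possible reading of "fix the sdk test"; whether the neutron API should keep returning `tenant_id` is the open question raised in the log.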