*** tosky is now known as Guest18397 | 07:37 | |
*** tosky_ is now known as tosky | 07:37 | |
jlejeune | hello, I'm also impacted by this bug: https://bugs.launchpad.net/nova/+bug/2084238, do you know if someone has begun working on it? | 09:41 |
---|---|---|
jlejeune | I confirm that a request-id is set for new instances (created after migration to 2023.1) but that field stays at NULL for instances created before the migration | 10:03 |
gibi | jlejeune: I don't know about any work on that | 10:03 |
jlejeune | ok thanks, maybe it's something the migration tools should take care of | 10:04 |
jlejeune | to fill in a random request-id for allocated devices | 10:05 |
gibi | jlejeune: i don't have the time to dig into it right now. But I think this is a valid bug that needs to be fixed | 10:12 |
opendevreview | ribaudr proposed openstack/nova master: Reproducer for bug 2114951 https://review.opendev.org/c/openstack/nova/+/952894 | 14:10 |
opendevreview | ribaudr proposed openstack/nova master: Fix bug 2114951 https://review.opendev.org/c/openstack/nova/+/952895 | 14:10 |
opendevreview | Andriy Kurilin proposed openstack/nova master: Reuse 'detail:get_all_tenants' policy in server get api https://review.opendev.org/c/openstack/nova/+/952896 | 14:33 |
sean-k-mooney | the request_id functionality was added a very very long time ago | 14:57 |
sean-k-mooney | jlejeune: this feels like you skipped running an online data migration | 14:58 |
sean-k-mooney | oh the one i was thinking of was a different field https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L221 | 14:59 |
sean-k-mooney | it's the uuid on the device, not the request id | 15:00 |
jlejeune | sean-k-mooney: exactly | 15:01 |
sean-k-mooney | the request id should always be set in recent versions of openstack | 15:02 |
sean-k-mooney | it was added 11 years ago | 15:03 |
elodilles | sean-k-mooney: sorry for pinging you directly but you already reviewed the patch on master branch o:) could you please take a quick look/review on the stable/* backports of this patch? https://review.opendev.org/q/I56705bce8ee4354cd5cb1577a520c2d1c525f57b | 15:03 |
jlejeune | in recent versions, yes, it's filled, but not on the stein release for example | 15:03 |
sean-k-mooney | jlejeune: it should have been there in stein too | 15:04 |
jlejeune | sean-k-mooney: hm | 15:04 |
jlejeune | it's not the case at all | 15:05 |
sean-k-mooney | elodilles: sure | 15:05 |
elodilles | sean-k-mooney: o// thanks in advance! | 15:05 |
sean-k-mooney | the field is nullable; when the request is from a neutron port the request id is the neutron port uuid, i believe | 15:09 |
jlejeune | sean-k-mooney: indeed, the request_id field was added 11 years ago | 15:09 |
jlejeune | I don't understand why it's null in my situation | 15:09 |
sean-k-mooney | when the request source is the flavor i believe it is expected to be null, but i would have to dig into this in detail | 15:09 |
sean-k-mooney | well there is request_id and requester_id | 15:10 |
sean-k-mooney | you're specifically asking about request.request_id | 15:10 |
sean-k-mooney | the requester id is the one that holds the neutron port, but we also backfill the request_id with a new uuid when we do that | 15:11 |
sean-k-mooney | jlejeune: is this a pci device related to a pci alias or a neutron port? | 15:13 |
jlejeune | to a pci alias | 15:15 |
sean-k-mooney | ack, so the alias was the older usage of this field as far as i recall, which is where we would least expect to need to backfill | 15:17 |
sean-k-mooney | jlejeune: this is where it should be generated https://github.com/openstack/nova/blob/64ca204c9cf497b0dcfff2d3a24b0dd795a57d1d/nova/pci/request.py#L261 | 15:20 |
sean-k-mooney | https://github.com/openstack/nova/commit/ccab6fed463337c029459469c76e92af3b96fa06 | 15:20 |
jlejeune | sean-k-mooney: ok, thanks for the commit, indeed I haven't backported it to my stein sources... that could explain it | 15:28 |
sean-k-mooney | oh you did a feature backport of the pci in placement code | 15:32 |
sean-k-mooney | ya this was new in zed | 15:32 |
jlejeune | ho there are a lot of missing commits: https://review.opendev.org/q/topic:%22bp/pci-device-tracking-in-placement%22 | 15:35 |
sean-k-mooney | it was a non-trivial feature, which is why we did not backport it downstream in redhat | 15:35 |
sean-k-mooney | well not the only reason | 15:35 |
sean-k-mooney | we are actually still in the process of fully graduating it to full support in our antelope based product | 15:36 |
sean-k-mooney | while the final QE work is happening it's "tech preview", although it does work | 15:36 |
sean-k-mooney | gibi: stephenfin: if ye have time for a very short patch, please review https://review.opendev.org/c/openstack/nova/+/952306 to enable the memballoon optimisation | 15:45 |
sean-k-mooney | Uggla: and when you have time can you mark https://blueprints.launchpad.net/nova/+spec/automatic-memballoon-freeing as approved | 15:45 |
Uggla | sean-k-mooney sure. | 15:46 |
opendevreview | Merged openstack/nova stable/2025.1: [tool] Fix backport validator for non-SLURP https://review.opendev.org/c/openstack/nova/+/951968 | 15:54 |
Uggla | sean-k-mooney I spoke a bit too fast. I tracked your SLBP in the etherpad document as approved. But atm I can't set it in launchpad, I'm lacking the rights to do it. So I'll set all approved SLBP as soon as I can. | 16:07 |
gibi | sean-k-mooney: was this memballoon bp approved? or will you bring it up for approval at the next nova meeting? | 16:11 |
gibi | my memories are vague | 16:11 |
Uggla | gibi, I approved it. | 16:14 |
Uggla | sorry I = we | 16:14 |
gibi | cool | 16:18 |
gibi | thanks | 16:18 |
opendevreview | Elod Illes proposed openstack/osc-placement master: DNM: gate health test https://review.opendev.org/c/openstack/osc-placement/+/952913 | 16:23 |
gibi | sean-k-mooney: left some questions in the patch. The impl looks good btw. | 16:24 |
opendevreview | Elod Illes proposed openstack/python-novaclient master: DNM: gate health test https://review.opendev.org/c/openstack/python-novaclient/+/952928 | 16:27 |
gibi | sean-k-mooney: FYI I think this will be the first use of the round robin placement a_c strategy out in the field https://bugs.launchpad.net/nova/+bug/2114947 | 16:31 |
opendevreview | Stephen Finucane proposed openstack/nova master: api: Add response body schemas for images APIs https://review.opendev.org/c/openstack/nova/+/952284 | 16:37 |
opendevreview | Stephen Finucane proposed openstack/nova master: api: Separate volume, snapshot and volume attachments https://review.opendev.org/c/openstack/nova/+/952347 | 16:37 |
opendevreview | Stephen Finucane proposed openstack/nova master: api: Add response body schemas for volumes APIs https://review.opendev.org/c/openstack/nova/+/952348 | 16:37 |
opendevreview | Stephen Finucane proposed openstack/nova master: api: Add response body schemas for snapshots APIs https://review.opendev.org/c/openstack/nova/+/952349 | 16:37 |
opendevreview | Stephen Finucane proposed openstack/nova master: api: Add response body schemas for volume attachments APIs https://review.opendev.org/c/openstack/nova/+/952350 | 16:37 |
opendevreview | Stephen Finucane proposed openstack/nova master: tests: Use valid UUIDs for cinder resources https://review.opendev.org/c/openstack/nova/+/952935 | 16:37 |
opendevreview | Stephen Finucane proposed openstack/nova master: api: Only apply "soft" additionalProperties validation to requests https://review.opendev.org/c/openstack/nova/+/952936 | 16:37 |
opendevreview | Stephen Finucane proposed openstack/nova master: api: Correct expected errors https://review.opendev.org/c/openstack/nova/+/951640 | 16:41 |
sean-k-mooney | gibi: did it result in a bug or is it the fix for the bug? | 16:46 |
sean-k-mooney | Uggla: i can set it as approved if you want in launchpad. at least in this case | 16:47 |
sean-k-mooney | updated https://blueprints.launchpad.net/nova/+spec/automatic-memballoon-freeing | 16:47 |
Uggla | sean-k-mooney, no it is ok, when I am able to do it, I will check all approved SLBP and set them accordingly. | 16:48 |
gibi | sean-k-mooney: they just hit the bug, I suggested to tune placement, we will see if this will help them | 16:51 |
sean-k-mooney | gibi: ah you're saying it can be fixed by it, i think | 16:51 |
gibi | the reporter is pretty responsive so hopefully we get feedback | 16:51 |
sean-k-mooney | cool | 16:52 |
sean-k-mooney | i'm not entirely sure if this is the same since it appears to have allocated a device with the wrong? resource class | 16:52 |
gibi | which reminds me that we still do not have upstream testing with the breadth-first strategy due to the blocked devstack patch I have to go back to | 16:52 |
sean-k-mooney | oh they are not doing it properly | 16:53 |
gibi | sean-k-mooney: originally they had a config issue not having pci_in_placement enabled | 16:53 |
sean-k-mooney | well | 16:53 |
gibi | now they enabled it and are hitting a timeout | 16:53 |
sean-k-mooney | they seem to have both a custom trait and resource class | 16:53 |
sean-k-mooney | device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4F:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" } | 16:54 |
sean-k-mooney | they should just be using the resource class, not the trait | 16:54 |
sean-k-mooney | it won't break anything to have both | 16:54 |
gibi | yeah | 16:54 |
gibi | we can follow up with that once they see it working | 16:55 |
sean-k-mooney | but the other thing they are trying to do, which is kind of unsupported, | 16:55 |
sean-k-mooney | is allocating multiple VFs | 16:56 |
sean-k-mooney | so these are gpus | 16:56 |
sean-k-mooney | and they are using pci passthrough to use the VFs | 16:56 |
sean-k-mooney | which i guess means they are trying to use the new way of doing vgpu that we added last cycle | 16:56 |
sean-k-mooney | with managed = no | 16:57 |
sean-k-mooney | but while you should be able to have multiple devices in one vm | 16:57 |
sean-k-mooney | we have not actually tested that | 16:57 |
sean-k-mooney | i guess there is no real reason from a nova perspective why that would not work | 16:58 |
sean-k-mooney | we know it did or does work for generic vfs | 16:58 |
sean-k-mooney | so the only reason for it not to work with gpus would be a hardware/driver limitation | 16:58 |
sean-k-mooney | oh ok they are doing bad things | 16:59 |
sean-k-mooney | openstack flavor create 8xRTX-ADA-48Q --private \ | 16:59 |
sean-k-mooney | --ram 4096 --vcpu 4 --disk 0 \ | 16:59 |
sean-k-mooney | --property "resources:CUSTOM_NVIDIA_RTX6000_ADA_48Q"=1 \ | 16:59 |
sean-k-mooney | --property "trait:CUSTOM_NVIDIA_RTX6000_ADA_48Q"="required" \ | 16:59 |
sean-k-mooney | --property "pci_passthrough:alias"="rtx6000-ada-48q:8" | 16:59 |
sean-k-mooney | openstack flavor set --project admin 8xRTX-ADA-48Q | 16:59 |
sean-k-mooney | they are requesting 8 of the gpu VFs via the alias but also overwriting the resources: and trait: request | 16:59 |
sean-k-mooney | they should not be setting | 17:00 |
sean-k-mooney | --property "resources:CUSTOM_NVIDIA_RTX6000_ADA_24Q"=1 \ | 17:00 |
sean-k-mooney | --property "trait:CUSTOM_NVIDIA_RTX6000_ADA_24Q"="required" \ | 17:00 |
sean-k-mooney | or well the relevant lines from the same flavor | 17:00 |
opendevreview | Merged openstack/nova stable/2024.2: [tool] Fix backport validator for non-SLURP https://review.opendev.org/c/openstack/nova/+/951969 | 17:16 |
gibi | yepp they should not set the rc and trait on the flavor just in the alias | 17:16 |
sean-k-mooney | i commented on the bug to that effect | 17:24 |
opendevreview | sean mooney proposed openstack/os-resource-classes master: Add VCPU_SHARES resource class for CPU performance tiering https://review.opendev.org/c/openstack/os-resource-classes/+/952951 | 21:50 |
opendevreview | Merged openstack/nova master: Fix disable memballoon device https://review.opendev.org/c/openstack/nova/+/945621 | 22:51 |
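
The flavor issue discussed around 16:59 to 17:16 (requesting devices through `pci_passthrough:alias` while also setting `resources:` and `trait:` extra specs for the same custom class) can be sketched as a small check. This is only an illustration, not anything from nova: the helper names `parse_pci_alias_spec` and `find_conflicting_extra_specs` are hypothetical, and it assumes every `resources:CUSTOM_*` or `trait:CUSTOM_*` key on such a flavor refers to the aliased PCI device, which may not hold for every deployment.

```python
def parse_pci_alias_spec(spec):
    """Split an alias extra spec such as "rtx6000-ada-48q:8" into (name, count) pairs.

    Hypothetical helper; assumes the comma-separated "name:count" form.
    """
    pairs = []
    for item in spec.split(","):
        name, _, count = item.strip().partition(":")
        pairs.append((name, int(count)))
    return pairs


def find_conflicting_extra_specs(extra_specs):
    """Return custom resources:/trait: keys set alongside a PCI alias.

    Assumption: on a flavor that uses pci_passthrough:alias, these keys
    duplicate what the alias already requests and should be dropped.
    """
    if "pci_passthrough:alias" not in extra_specs:
        return []
    return sorted(
        key
        for key in extra_specs
        if key.startswith(("resources:CUSTOM_", "trait:CUSTOM_"))
    )


# Extra specs taken from the flavor quoted in the log.
flavor_extra_specs = {
    "resources:CUSTOM_NVIDIA_RTX6000_ADA_48Q": "1",
    "trait:CUSTOM_NVIDIA_RTX6000_ADA_48Q": "required",
    "pci_passthrough:alias": "rtx6000-ada-48q:8",
}

requested = parse_pci_alias_spec(flavor_extra_specs["pci_passthrough:alias"])
conflicts = find_conflicting_extra_specs(flavor_extra_specs)
```

For the flavor in the log, `conflicts` lists the two keys that sean-k-mooney and gibi say should be removed, leaving only the alias on the flavor.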