Thursday, 2026-05-07

opendevreviewJoan Gilabert proposed openstack/cyborg master: Move cyborg-tempest job definitions to cyborg  https://review.opendev.org/c/openstack/cyborg/+/98762010:32
tafkamaxHi, I just deployed cyborg and am planning to use it to pass PGPUs to VMs for now.10:40
tafkamaxI just asked in the kolla chat as well, as it is active and there is some knowledge there too.10:41
opendevreviewJoan Gilabert proposed openstack/cyborg master: Move cyborg-tempest job definitions to cyborg  https://review.opendev.org/c/openstack/cyborg/+/98762010:42
tafkamaxThe GPU shows up in `openstack accelerator device show UUID`... (full message at <https://matrix.org/oftc/media/v1/media/download/AW1EbeU-tbWQQQrjpuO6aBPema1ZFNxz1H-Oe8zT3IzpZWc2PNyPKxAaRLOwQuK9lv5coKnS2iHnahg2PqYfBgFCeeSiEWiwAG1hdHJpeC5vcmcvbmZOem9Rc2Zqak9zb2JialhsU0RSWmNB>)10:42
opendevreviewJoan Gilabert proposed openstack/cyborg-tempest-plugin master: Move job definitions to cyborg repo  https://review.opendev.org/c/openstack/cyborg-tempest-plugin/+/98762110:42
tafkamaxthe traits are pretty empty for it though:... (full message at <https://matrix.org/oftc/media/v1/media/download/AUewwyd-zCY46PWfpF0JsBLctL5NmRnri3j9bK6462IJYTg7QZOmF_RjeTDUs9FvS2nB6diGWvBU9MVBF2J4ynFCeeSiF-zAAG1hdHJpeC5vcmcvZ0xUUWRHYUNOdHhzVlp0RENUU2tsY2xT>)10:42
tafkamaxI also tried this accelerator profile... (full message at <https://matrix.org/oftc/media/v1/media/download/ARQew_Tex2aH3YGmBZDRiWwCFloG5Cdc0UaMQ5z9BfAuBKNfoNRQyiIb9zs5J0HZola9q9RhLGjJY1Y3Jb_PGcVCeeSiJUsQAG1hdHJpeC5vcmcvUVdVeEtqaUNXeXBZT2xOd3dnSmZ4c0hv>)10:43
tafkamaxAny thoughts?10:43
jgilabertafkamax, hi! the traits look correct to me. Was the flavor created correctly?11:01
tafkamax openstack flavor show 6943fd8e-08ea-4541-bc11-a167f270e98f... (full message at <https://matrix.org/oftc/media/v1/media/download/AXUWetSMAsXVtnlrnt-ox0XHF4gtEKxjUQ1zhRXjy7dCvygSAi7WqnZJr4jUfNLe3gDhYf-3t8FlaROgiijBg15CeeSjQ8dQAG1hdHJpeC5vcmcvZnZ3ZHdiZ0tkbUhUUnluTUxrRWtJUkpt>)11:03
opendevreviewJoan Gilabert proposed openstack/cyborg master: Move cyborg-tempest job definitions to cyborg  https://review.opendev.org/c/openstack/cyborg/+/98762011:05
opendevreviewJoan Gilabert proposed openstack/cyborg-tempest-plugin master: Move job definitions to cyborg repo  https://review.opendev.org/c/openstack/cyborg-tempest-plugin/+/98762111:05
jgilaberis there any more detail in the nova logs? it looks like placement might be reporting that there are no available gpus11:11
jgilabercould nova maybe be configured to pass through the gpu as well?11:12
tafkamaxWe haven't explicitly configured nova for that.11:13
jgilaberack, thanks, can you also check what 'openstack accelerator arq list' reports?11:13
tafkamaxkolla-ansible 2025.1 deployment type11:14
tafkamaxhmm empty11:15
tafkamaxoh I need to do that beforehand?11:15
chandankumarDo we need to create the device profile with resource class like this https://paste.openstack.org/raw/bJEmpgye27s5h4tEMlTf/11:17
chandankumarfor pci and fake fpga device, we pass resource class11:18
tafkamaxLike PGPU?11:18
chandankumarsorry it should be GPU11:20
tafkamaxok, thanks, so that comes from openstack accelerator device show command11:21
chandankumarhttps://github.com/openstack/cyborg/blob/f111946df6713aa64efa29dd025d47839241c529/cyborg/common/constants.py#L11511:29
chandankumar "PGPU": orc.PGPU,11:29
chandankumar    "VGPU": orc.VGPU,11:29
chandankumarpgpu - for physical gpu11:29
chandankumarpgpu - for physical gpu and vgpu - for virtual gpu11:30
chandankumarCyborg also has GPU.11:30
tafkamaxI have this in log: 2026-05-07 14:39:05.152 7 WARNING cyborg.accelerator.drivers.gpu.nvidia.sysinfo [-] Unable to load vGPU_type from [gpu_devices] Ensure "enabled_vgpu_types" is set if the gpuis virtualized. but I dont have vgpu enabled. Is this just informational?11:41
tafkamaxfrom nova scheduler then: 2026-05-07 14:40:05.475 1087 ERROR nova.scheduler.client.report [req-3a504845-350b-4532-a33d-f0334e44db4c req-f8f77ce8-89c9-42c9-90e5-a9e7d6eaaf33 204c13dfae0b4214ae00b15a95a5d180 87aa79e7272a4f9b9e66e4582fb28c93 - - default default] Failed to retrieve allocation candidates from placement API for filters:11:45
tafkamaxRequestGroup(aggregates=[],forbidden_aggregates=set([]),forbidden_traits=set([]),in_tree=8b9f6c21-0c4a-458a-8541-d481921c6d08,provider_uuids=[],requester_id=None,required_traits=set([]),resources={MEMORY_MB=16384,VCPU=8},use_same_provider=False),11:45
tafkamaxRequestGroup(aggregates=[],forbidden_aggregates=set([]),forbidden_traits=set([]),in_tree=None,provider_uuids=[],requester_id='device_profile_0',required_traits=set(['CUSTOM_NVIDIA_26B9']),resources={},use_same_provider=True)11:45
chandankumarI am not sure about the first warning. It might be informative message.11:48
tafkamaxDo I need to enable more filters or something?11:48
chandankumarFrom last error, it does not have any resources.11:49
chandankumarNot sure it is linked with the "Failed to retrieve allocation candidates" error11:50
* tafkamax sent a code block: https://matrix.org/oftc/media/v1/media/download/AeGI7UNDRjSQKyaYmmY-Z1mBBe7wRM3gMOvtpUjizKsu24kuRJkZjISUG5IC_WTUfV0znrjfz4QyJsrfIj7nbK5CeeSl9e8AAG1hdHJpeC5vcmcvWUtEV2p0TVhQc2dNWGRDdGxHWEJYSXZu11:50
chandankumarI will wait for other people to take a look11:50
chandankumarMaybe the resource type should be PGPU or VGPU11:50
chandankumarcan you try with PGPU?11:51
tafkamaxYeah, I will try. Also, some commands give this output in CLI:11:51
tafkamaxopenstack accelerator device enable 4dc9f844-75d4-4e6c-a252-b9986816cc5b11:51
tafkamax'Proxy' object has no attribute 'enable_device'11:51
tafkamaxIs this expected?11:51
tafkamaxDo I need to do the arq bind thing as well?11:57
chandank`tafkamax: yes, I get a similar "'Proxy' object has no attribute 'enable_device'" error in my env with both enable and disable 11:58
chandank`you can open a bug for this on https://bugs.launchpad.net/openstack-cyborg11:59
tafkamaxAww damn12:01
tafkamax openstack accelerator device attribute list12:03
tafkamax'Proxy' object has no attribute 'attributes'12:03
tafkamaxthis also return the same12:03
tafkamaxhttps://bugs.launchpad.net/openstack-cyborg/+bug/215179212:03
chandank`https://paste.openstack.org/raw/bRpUV3OrPTrGUImCllzA/ - openstack accelerator device attribute list output from master devstack vm12:04
tafkamaxCould it be openstacksdk or some client lib version?12:05
tafkamaxI am using from 2025.1 UC:... (full message at <https://matrix.org/oftc/media/v1/media/download/AZYa4b2-0u7aEhweDKgZL_Hoq4YysFkqdAt284_KUhwWGucPCZc6Mi3CuUP0OQA3sVMa9LPSYo5LiskzetFd6NFCeeSm4TQgAG1hdHJpeC5vcmcvRmVaa0JwUENuSWlhUFRjdm5OUm5Bc3h2>)12:06
tafkamaxAccelerator client initialized using OpenStackSDK: <openstack.accelerator.v2._proxy.Proxy object at 0x7f1aad7e4dd0>12:10
tafkamax'Proxy' object has no attribute 'attributes'12:10
tafkamaxusing -vvv12:10
chandank`openstacksdk              4.11.0 is used in master. 12:12
chandank`Can you also add openstacksdk and python-cyborgclient version in the bug? If anyone fixes it, we can backport it back12:12
tafkamaxedited12:14
tafkamax2025.2 UC worked for attribute list12:14
* tafkamax sent a code block: https://matrix.org/oftc/media/v1/media/download/Ac5ss1UANXX71xOr-qPTkqzVWnkfPDTGM0R7-rgAeQAnVFslfRU342ShXaTzcRU2RvKH7B6EXL2KbWVsePcdQmBCeeSnXAiwAG1hdHJpeC5vcmcvQ2ZoZmpPSWdzWk9YVFNwYVlzWHZOcm5o12:14
chandank`we are missing some backport then12:15
tafkamaxbut the enable command seems to be not working indeed12:16
tafkamaxhttps://review.opendev.org/c/openstack/openstacksdk/+/883238?usp=search12:20
tafkamaxdoes this need to be backported?12:20
tafkamaxits april 4 2025, so after 2025.1 ?12:21
jgilaberyes that commit is missing in 2025.1 https://github.com/openstack/openstacksdk/tree/stable/2025.1/openstack/accelerator12:24
chandank`We have a upgrade job, let me push one patch to get the error in CI12:25
tafkamaxBut regarding the enable/disable that just seems to be missing?12:27
tafkamaxE.g. this needs to be looked in the cyborg API spec and implemented i presume12:27
tafkamaxdocs.openstack.org/api-ref/accelerator/#enable-a-device ?12:30
jgilaberfrom a quick glance looks like the controller for that API exists https://github.com/openstack/cyborg/blob/79384661ce73984d2eef05dbee800507d36e997c/cyborg/api/controllers/v2/devices.py#L17312:31
chandank`yes, it is implemented in cyborg side12:31
chandank`something is missing or broken on cyborgclient side12:31
tafkamaxoh okay12:33
tafkamaxSo: https://github.com/openstack/python-cyborgclient/blob/master/cyborgclient/osc/v2/device.py#L14112:34
sean-k-mooneyya it's a known gap12:40
sean-k-mooneythere are a few issues with the current cli12:40
sean-k-mooneyand the way it's using the sdk12:40
sean-k-mooneythose are things we plan to fix over this cycle now that we are trying to more actively maintain cyborg again12:41
sean-k-mooneytafkamax: thanks for filing the bug, it will help to have a backlog of the specific breakages12:42
sean-k-mooneychandank`: P in PGPU stands for physical because it's only used for the physical function, but this is something we will likely evolve in newer drivers12:43
sean-k-mooneythe type in the device list and the resource class are not expected to be the same12:43
tafkamaxaha okay, so I should always look for attribute list, when creating a profile12:47
tafkamaxOkay. So I need to enable the device actually for it to be able to "bind" to an VM?12:51
tafkamaxHmm I will try to use an API call for enable then.12:52
sean-k-mooneytafkamax: so the device profile needs to match the resource class used, but the attributes api allows you to override that12:58
sean-k-mooneytafkamax: we will use a default one based on the driver that manages the device, but if you wanted to use a different resource class the attributes api provides a way to configure it12:59
tafkamaxoh okay, so my device is ID 8, I can add a custom attribute to that device ID 8.13:00
sean-k-mooneymy expectation is that over the next 12-18 months we are going to revisit how many of the drivers work and ensure there is a declarative way to do that via the config file as well13:00
sean-k-mooneytafkamax: yes if you wanted to add custom traits to it you could do that via placement directly but you can also do that via the attributes api13:00
sean-k-mooneyat least in theory13:00
sean-k-mooneythis is an area i have not spent too much time on yet and the testing is light13:01
sean-k-mooneyso if you find bugs please let us know13:01
tafkamaxok13:01
sean-k-mooneyone of the topics from the PTG we won't have time for this cycle was "how to evolve the api"13:02
sean-k-mooneycurrently i think having devices, attributes and deployables as 3 separate apis is a bit confusing13:02
sean-k-mooneyi think eventually the devices api would be a better home for attributes for example13:03
sean-k-mooneyi.e. include the attributes in the device show and provide /devices/<id>/attributes subpaths for adding/removing them13:03
tafkamaxso how would you enable the device if the CLI does not work. Via curl to the API endpoint?13:04
sean-k-mooneyso it should be enabled by default13:04
sean-k-mooneybut yes13:04
sean-k-mooneyunfortunately via curl13:04
sean-k-mooneyso you would do an openstack token issue13:04
sean-k-mooneyto get a keystone token and then curl the endpoint with that token13:04
tafkamaxyep, thats what i was thinking. will do a script for it now for testing13:05
tafkamaxthanks for the quick responses here13:05
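The token-plus-curl workaround described above can also be sketched in Python. This is only a minimal sketch under stated assumptions: the `/v2/devices/<uuid>/enable` path matches the POST that shows up in the API log later in this conversation, and `X-Auth-Token` is the usual Keystone token header, but the endpoint URL and the `build_enable_request` helper are hypothetical names used for illustration.

```python
import urllib.request


def build_enable_request(endpoint, device_uuid, token):
    """Build (but do not send) a POST to the device enable API.

    `endpoint` is assumed to be the accelerator service root without the
    version suffix; check the service catalog of your own deployment.
    """
    url = f"{endpoint.rstrip('/')}/v2/devices/{device_uuid}/enable"
    return urllib.request.Request(
        url,
        data=b"",                      # empty body, the call only flips state
        method="POST",
        headers={"X-Auth-Token": token},
    )


# Hypothetical endpoint and placeholder token, for illustration only.
# A real token would come from `openstack token issue`.
req = build_enable_request(
    "http://controller.example:6666/accelerator",
    "4dc9f844-75d4-4e6c-a252-b9986816cc5b",
    "PLACEHOLDER_TOKEN",
)
print(req.get_method(), req.full_url)

# To actually send it (not executed here):
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before pointing the script at a live API.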
sean-k-mooneyjust one point of clarification: while the resource class is an attribute on the device i'm not sure that you can modify it today13:06
sean-k-mooneyyou can add and remove additional attributes13:06
sean-k-mooneybut i'm not sure the api allows you to override ones generated by the driver13:06
sean-k-mooneyand the docs don't actually tell you one way or another so that's on my todo list to figure out13:07
sean-k-mooneyi'll need to go back to the original attributes spec and compare that to the final code13:07
tafkamaxoh okay13:08
tafkamaxHmm I am trying to enable the gpu via API and it gives me a 204:13:48
tafkamax2026-05-07 16:45:59.843 1097 INFO eventlet.wsgi.server [None req-78038ad2-f1fe-4f2c-a984-aa6fd469ab34 204c13dfae0b4214ae00b15a95a5d180 1cc33bde294848818c8a462ad9d221a9 - - default default]  "POST /v2/devices/4dc9f844-75d4-4e6c-a252-b9986816cc5b/enable HTTP/1.1" status: 204  len: 278 time: 0.434958013:48
tafkamaxwhen using device show the status value is empty:| status            |                                                       |13:48
tafkamaxI think it might be enabled by default. Not sure though14:18
tafkamaxI found the placement API request from logs: 14:28
tafkamax2026-05-07 17:19:35.782 1081 INFO placement.requestlog [req-e9714860-9a55-4040-b055-1402f814745d req-c938965f-e5f0-4e5b-854c-6399de72e6e2 f28c1a6ab3704064bd656bdd2d3db679 9e4bc1e8a4ef469695c83b671c090a34 - - default default] redacted "GET14:28
tafkamax/allocation_candidates?in_tree=8b9f6c21-0c4a-458a-8541-d481921c6d08&limit=1000&requireddevice_profile_0=CUSTOM_NVIDIA_26B9&resources=MEMORY_MB%3A16384%2CVCPU%3A8&resourcesdevice_profile_0=PGPU%3A1&root_required=COMPUTE_ACCELERATORS%2C%21COMPUTE_STATUS_DISABLED" status: 200 len: 53 microversion: 1.3614:28
tafkamaxi need to see what this returns14:29
tafkamaxOk I modified the query14:50
tafkamaxand removed root_required14:50
tafkamaxand got results14:50
tafkamaxwait maybe the compute node is disabled because it was in maintenance 😅😅😅😅14:51
tafkamaxEnabled the hypervisor and tried again14:54
tafkamaxSeems like it booted.14:55
tafkamaxthe VM14:55
tafkamaxAnd the VM can see the device in lspci!14:57
tafkamaxroot@test-vm-gpu:~# lspci... (full message at <https://matrix.org/oftc/media/v1/media/download/AYWYRn0ernbP2iFEup_KDgOi8sTs-6dPqBKmAXcisI3S1lBaDW-A9B5u8Dr7jSbXxWudb-A2N9Xzt87A53zswTVCeeSwsnywAG1hdHJpeC5vcmcvdktwZlBsT0FKSmdXZ1VqT0RqRVdSUnh5>)14:58
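The placement query quoted from the logs above can be taken apart to see why it returned no candidates while the compute node was disabled. The following is a rough sketch using only the Python standard library; the helper names `split_query` and `without_root_required` are made up for illustration. In placement's `root_required` syntax a leading `!` marks a forbidden trait, so the request excludes any root provider carrying `COMPUTE_STATUS_DISABLED`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Path and query string as logged by placement in the conversation above.
LOGGED = (
    "/allocation_candidates?in_tree=8b9f6c21-0c4a-458a-8541-d481921c6d08"
    "&limit=1000&requireddevice_profile_0=CUSTOM_NVIDIA_26B9"
    "&resources=MEMORY_MB%3A16384%2CVCPU%3A8"
    "&resourcesdevice_profile_0=PGPU%3A1"
    "&root_required=COMPUTE_ACCELERATORS%2C%21COMPUTE_STATUS_DISABLED"
)


def split_query(path):
    """Return the query parameters as a dict of decoded single values."""
    return {k: v[0] for k, v in parse_qs(urlsplit(path).query).items()}


def without_root_required(path):
    """Rebuild the request path with the root_required filter dropped."""
    params = split_query(path)
    params.pop("root_required", None)
    return urlsplit(path).path + "?" + urlencode(params)


params = split_query(LOGGED)
root_traits = params["root_required"].split(",")
required = [t for t in root_traits if not t.startswith("!")]
forbidden = [t[1:] for t in root_traits if t.startswith("!")]

print("required on root provider: ", required)
print("forbidden on root provider:", forbidden)
print("debug query:", without_root_required(LOGGED))
```

Dropping `root_required` is only useful as a debugging step, as in the conversation: if candidates appear without it, the compute node's root provider traits are the thing to check.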
opendevreviewsean mooney proposed openstack/cyborg master: Fix rule:allow policy bypass on device/deployable/attribute APIs  https://review.opendev.org/c/openstack/cyborg/+/98768015:05
opendevreviewsean mooney proposed openstack/cyborg master: Set project_id on ARQ creation and binding  https://review.opendev.org/c/openstack/cyborg/+/98768115:05
opendevreviewsean mooney proposed openstack/cyborg master: Add project_id backfill for existing ARQs  https://review.opendev.org/c/openstack/cyborg/+/98768215:05
opendevreviewsean mooney proposed openstack/cyborg master: Enforce project-scoped access for ARQs  https://review.opendev.org/c/openstack/cyborg/+/98768315:05
opendevreviewsean mooney proposed openstack/cyborg master: Require service token for bound ARQ operations  https://review.opendev.org/c/openstack/cyborg/+/98768415:05
opendevreviewsean mooney proposed openstack/cyborg master: Document ARQ ownership and service tokens  https://review.opendev.org/c/openstack/cyborg/+/98768515:05
opendevreviewsean mooney proposed openstack/cyborg master: Mark conductor ARQ delete methods for removal in RPC v2  https://review.opendev.org/c/openstack/cyborg/+/98768615:05
chandank`tafkamax: awesome, 15:15
tafkamaxthanks for the help and good that some bugs were found :-)15:16
chandank`here is our driver doc https://docs.openstack.org/cyborg/latest/configuration/drivers.html, please have a look. Do share if there is anything we can improve on the doc side or any other issues you hit; feel free to open bugs so that we can address them in future. :-)15:17
tafkamaxregarding the actual config it was rather intuitive. kolla-ansible did its magic and I just saw that for PGPU I needed to enable the 15:20
tafkamax[agent]15:20
tafkamaxenabled_drivers = nvidia_gpu_driver15:20
tafkamaxI didn't initially understand all the stuff in `openstack accelerator <command>`15:20
tafkamaxI guess the understanding issue was how to create a working "profile", e.g. look into openstack accelerator attribute list and if it is an rc use resources:<attribute>:1 and if it is a trait use trait:<attribute>:required15:22
tafkamaxand an NB: don't look at the attributes under `openstack accelerator device list` or `openstack accelerator device show <uuid>` as these are not used in profiles. Did I understand this correctly?15:24
tafkamaxAlso this page is not present in the indexmenu on the left side of screen: https://docs.openstack.org/cyborg/latest/admin/15:26
tafkamaxI just found this link via search15:26
sean-k-mooneyya so one of the things we are missing is an end to end workflow guide15:30
sean-k-mooneythat has some of the info required15:30
sean-k-mooneyin https://docs.openstack.org/cyborg/latest/admin/#user-requests15:30
sean-k-mooneybut what i would like to add going forward is a better end to end "how do i make this work" guide to help new operators properly configure it15:31
sean-k-mooneylet me explain briefly15:34
sean-k-mooney              "trait:CUSTOM_FPGA_TRAITS":"required",15:34
sean-k-mooney              "resources:FPGA":"1",15:34
sean-k-mooneyin the device profile the resources: part is describing a countable resource that will be assigned15:35
sean-k-mooneyand trait: keys are qualitative traits that must also be advertised on the device15:35
sean-k-mooneyfor a gpu this could be a cuda level or something like that15:35
sean-k-mooneya device profile can have more than 1 device request15:36
sean-k-mooneythis is expressed in the groups section15:36
sean-k-mooneyeach group can be allocated from a different resource provider15:36
sean-k-mooneytypically you will have resources:<something>:115:37
sean-k-mooneybut if that something is divisible, say ssd storage, you could ask for say resources:CUSTOM_SSD_GB:10015:38
sean-k-mooneyas an example15:38
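The explanation above, one group per device request with `resources:` keys for countable resources and `trait:` keys for required traits, can be sketched as a small Python helper. This is illustrative only: `device_request_group` and the profile name `pgpu-profile` are hypothetical, and the resource class and trait are the ones that appear in the placement request earlier in this log.

```python
import json


def device_request_group(resource_class, count=1, required_traits=()):
    """Build one entry of a device profile's "groups" list.

    resources:<class> carries the amount requested (as a string, matching
    the example quoted above); each trait:<name> is marked "required".
    """
    group = {f"resources:{resource_class}": str(count)}
    for trait in required_traits:
        group[f"trait:{trait}"] = "required"
    return group


# One physical GPU that must advertise the CUSTOM_NVIDIA_26B9 trait.
gpu_group = device_request_group("PGPU", 1, ["CUSTOM_NVIDIA_26B9"])

# A divisible resource, following the SSD storage example above.
ssd_group = device_request_group("CUSTOM_SSD_GB", 100)

# Hypothetical profile with two device requests, one per group.
profile = {"name": "pgpu-profile", "groups": [gpu_group, ssd_group]}
print(json.dumps(profile, indent=2))
```

Each dictionary in `groups` is a separate device request, so the two groups here could in principle be satisfied by different resource providers.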
sean-k-mooneyi added https://docs.openstack.org/cyborg/latest/configuration/drivers.html15:39
sean-k-mooneyas a stop gap to have some info on how to configure each driver15:39
sean-k-mooneybut that only covers the config options currently15:40
tafkamaxoh okay15:40
sean-k-mooneyi would like to have a per driver doc going forward that provides an end to end example for each of the drivers15:40
sean-k-mooneyincluding a sample device profile15:40
sean-k-mooneytafkamax: in the interim https://specs.openstack.org/openstack/cyborg-specs/specs/train/implemented/device-profiles.html15:42
sean-k-mooneyis the spec that defined what device profiles are and how they are expected to work15:42
sean-k-mooneyand https://specs.openstack.org/openstack/cyborg-specs/specs/train/implemented/cyborg-nova-placement.html covers how this works with regard to placement15:43
sean-k-mooneytafkamax: you were asking about the enable/disable api before15:45
sean-k-mooneyhttps://specs.openstack.org/openstack/cyborg-specs/specs/2023.2/approved/disable-enable-device.html15:45
sean-k-mooneythat was new in bobcat and not completed fully15:45
tafkamaxAha okay, that makes sense then why its like this15:57
opendevreviewMerged openstack/cyborg master: Fix rule:allow policy bypass on device/deployable/attribute APIs  https://review.opendev.org/c/openstack/cyborg/+/98768018:28
opendevreviewMerged openstack/cyborg master: Set project_id on ARQ creation and binding  https://review.opendev.org/c/openstack/cyborg/+/98768118:38
opendevreviewMerged openstack/cyborg master: Add project_id backfill for existing ARQs  https://review.opendev.org/c/openstack/cyborg/+/98768218:38
opendevreviewMerged openstack/cyborg master: Enforce project-scoped access for ARQs  https://review.opendev.org/c/openstack/cyborg/+/98768318:38
opendevreviewMerged openstack/cyborg master: Require service token for bound ARQ operations  https://review.opendev.org/c/openstack/cyborg/+/98768418:38
