Monday, 2024-11-11

<opendevreview> Si Snow proposed openstack/nova master: Improve the use of castellan.key_manager  https://review.opendev.org/c/openstack/nova/+/934225 [01:30]
<opendevreview> Si Snow proposed openstack/nova master: Improve the use of castellan.key_manager  https://review.opendev.org/c/openstack/nova/+/934225 [03:24]
*** __ministry is now known as Guest9049 [04:04]
<opendevreview> Ghanshyam proposed openstack/nova master: Remove default override for config options policy_file  https://review.opendev.org/c/openstack/nova/+/934578 [05:44]
<ratailor> #openstack-nova Can I get attention on my open patches? Some of them have a +1 from zuul but are waiting for reviews from cores. [10:05]
<ratailor> https://review.opendev.org/q/owner:ratailor@redhat.com+status:open [10:05]
<ratailor> especially https://review.opendev.org/c/openstack/nova/+/928933, where the api-samples tests are failing. I tried to debug it but couldn't find what's missing. It would be great if anyone with expertise in api-samples could help there. [10:06]
<gibi> ratailor: added hints [12:42]
<ratailor> gibi, thanks! [12:43]
<ratailor> but there is no API action yet, because the finish method of instance action is not called anywhere as of now. [12:43]
<ratailor> that's why it's the first patch in the bp, to add support for showing the same to the user in the show API response. [12:44]
<gibi> ratailor: do you mean that the finish_time is always None after your first patch? If so, then the template for the API test should expect None, not a strtime [12:47]
<gibi> ratailor: but also we tend to order the patches differently, to have the API change as the last one in the series. [12:48]
<ratailor> gibi, let me check if a version bump resolves this. [12:48]
<gibi> the version bump only resolves some of the tests, the version API tests [12:48]
<ratailor> this is the only patch for the implementation [12:48]
<ratailor> but we have strtime for other time-related fields, like start_time, updated_at etc. [12:49]
<ratailor> gibi, also could you please review my other patches, which have been waiting for review for some time: https://review.opendev.org/c/openstack/nova/+/873901  https://review.opendev.org/c/openstack/nova/+/911108  https://review.opendev.org/c/openstack/nova/+/901810 [12:52]
<gibi> ratailor: I cannot commit to other reviews right now. [12:52]
<gibi> but let's discuss the finish_time further, as I'm confused [12:53]
<ratailor> gibi, whenever you have time later, but please add those to your review list. :) [12:53]
<ratailor> gibi, yeah, sure. [12:53]
<gibi> 13:44 < ratailor> that's why its the first patch in bp to add support for showing the same to user in show api response. [12:53]
<gibi> vs [12:53]
<gibi> ratailor> this is the only patch for implementation [12:53]
<gibi> so will there be other patches in the bp? [12:54]
<ratailor> no, for nova this is the only patch. And if osc- and sdk-related changes are required, then those too. [12:55]
<gibi> or is this just adding a field to the API that is always None? [12:55]
<ratailor> so, long story short, we have this LP bug https://bugs.launchpad.net/nova/+bug/2058928 which requires calling the finish method of the instance action, which we don't do currently. [12:55]
<ratailor> there is already a finish_time field in the InstanceAction object, but we don't show it to the user in the instance action show API response. [12:56]
<ratailor> so this bp is just to add support for showing finish_time in the instance action show response. [12:57]
<ratailor> after that there will be a series of patches to call the finish method of instance action wherever we start the action but don't finish it yet (which is the case for every API where the instance_action start method is called). [12:58]
<ratailor> so currently this is the case [12:58]
<ratailor> https://paste.opendev.org/show/bKShu6Vcc0Q9Ej4nzZgj/ [12:58]
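The lifecycle ratailor describes can be pictured with a toy model. This is a hypothetical sketch, not nova's InstanceAction object: it only assumes that start_time is stamped when an API starts the action and that finish_time is set solely by a finish call, so a microversion that just exposes the field returns None until the finish calls exist.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ToyInstanceAction:
    """Hypothetical stand-in for an instance action record (not nova's class)."""

    action: str
    start_time: Optional[datetime] = None
    finish_time: Optional[datetime] = None

    def start(self) -> None:
        # Stamped when the API starts the action, which nova already does.
        self.start_time = datetime.now(timezone.utc)

    def finish(self) -> None:
        # The call that, per the discussion, is not made anywhere yet.
        self.finish_time = datetime.now(timezone.utc)

    def to_api(self) -> dict:
        # Hypothetical "show" body once finish_time is exposed in the API.
        def fmt(ts: Optional[datetime]) -> Optional[str]:
            return ts.isoformat() if ts is not None else None

        return {
            "action": self.action,
            "start_time": fmt(self.start_time),
            "finish_time": fmt(self.finish_time),
        }


if __name__ == "__main__":
    act = ToyInstanceAction("stop")
    act.start()
    print(act.to_api()["finish_time"])  # None: field exposed, never populated
    act.finish()
    print(act.to_api()["finish_time"] is not None)  # True after finish()
```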
<gibi> ratailor: OK. So it does mean that after the new microversion, finish_time will be None for all responses until the further patches are added to actually call action_finish. [13:00]
<ratailor> gibi, yes. [13:00]
<gibi> then 1) you need to change the API sample template to expect None instead of strtime [13:00]
<ratailor> gibi, but then do we need to change it back to strtime once we start calling the finish method of instance action for any API? [13:01]
<gibi> 2) at some point in the future patches that test will fail, as it will start returning a time instead of None, which is a bit problematic, but I need to see how others feel about that [13:01]
<gibi> hence my point about the ordering of patches [13:01]
<gibi> I would rather expose a field in the API after that field has a meaningful value [13:02]
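gibi's points 1) and 2) can be illustrated with a simplified, hypothetical matcher; it is not nova's api-samples code, and the timestamp regex is an assumption. A template expecting "%(strtime)s" rejects a response whose finish_time is null, and a template rewritten to expect null breaks as soon as a later patch populates the field.

```python
import json
import re

# Simplified timestamp regex; an assumption, nova's real "strtime" pattern
# in its api-samples test base may be stricter or looser.
STRTIME = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"


def matches_template(template: str, response: dict) -> bool:
    """Hypothetical check of a JSON response against a '%(strtime)s' template.

    The template must be written exactly as json.dumps(..., sort_keys=True)
    would render the response; only the placeholder is treated as a regex.
    """
    placeholder = re.escape('"%(strtime)s"')
    pattern = re.escape(template).replace(placeholder, '"' + STRTIME + '"')
    return re.fullmatch(pattern, json.dumps(response, sort_keys=True)) is not None


if __name__ == "__main__":
    expects_time = '{"finish_time": "%(strtime)s"}'
    expects_null = '{"finish_time": null}'
    today = {"finish_time": None}  # before any finish call exists
    later = {"finish_time": "2024-11-11T13:00:00.000000"}  # after finish calls land
    print(matches_template(expects_time, today))  # False: point 1)
    print(matches_template(expects_null, today))  # True
    print(matches_template(expects_null, later))  # False: point 2)
```

This is why ordering the API exposure last avoids touching the sample twice.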
<ratailor> but the bp required only one patch to expose the field to the user. [13:02]
<ratailor> the field is already there, we are just missing the functionality to show it to the user. [13:02]
<gibi> bauzas: sean-k-mooney: dansmith: ^^ any opinion about the ordering here? https://review.opendev.org/c/openstack/nova/+/928933 would expose instance action finish_time in a new microversion, but at that point the field would always be None and later patches will add the population of finish_time. [13:03]
<gibi> ratailor: always showing None to the user is not useful in my eyes. It becomes useful as soon as it can be not None. Maybe you can add one patch that calls finish for at least one action and then test the API via that action. [13:05]
<sean-k-mooney> i feel like there is little point in exposing it until it works [13:05]
<sean-k-mooney> so to me that should be the last patch [13:05]
<ratailor> gibi, yes, that's not useful. [13:06]
<ratailor> gibi, sean-k-mooney ack. I will prepare one patch for a single API and see if that works. And if it requires all patches, then I will add as many as required for the other APIs. [13:07]
<ratailor> gibi, sean-k-mooney thanks for your suggestions. Also please provide your reviews on my other patches. [13:08]
<gibi> ratailor: I think the action list tests create a VM and stop it. If you fix those first, then the api sample test will probably work. [13:18]
<ratailor> gibi, ack. sure. [13:19]
<ratailor> gibi, I will then try to add the finish call to the stop API first and see how it goes. [13:21]
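One way to make sure "finish" is recorded on the same path that recorded "start" is a context manager. The sketch below is hypothetical: recorded_action and stop_instance are illustrative names, not nova APIs, and a real implementation would likely also record the result or error message.

```python
import contextlib
from datetime import datetime, timezone


@contextlib.contextmanager
def recorded_action(log: dict, name: str):
    """Hypothetical helper: stamp start/finish times around an operation.

    Not nova's API; it only illustrates calling "finish" on the same code
    path that called "start", even when the operation raises.
    """
    entry = {"action": name,
             "start_time": datetime.now(timezone.utc),
             "finish_time": None}
    log[name] = entry
    try:
        yield entry
    finally:
        # Runs on success and on error, so finish_time is not left unset.
        entry["finish_time"] = datetime.now(timezone.utc)


def stop_instance(log: dict) -> None:
    # Hypothetical stand-in for the stop API path mentioned above.
    with recorded_action(log, "stop"):
        pass  # the actual power-off work would happen here
```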
<ratailor> gibi, thanks for the hint. :) [13:21]
*** ratailor is now known as ratailor|dinner [13:22]
<opendevreview> Rajesh Tailor proposed openstack/nova master: Fix instance vm_state during shelve  https://review.opendev.org/c/openstack/nova/+/934294 [13:50]
*** ratailor|dinner is now known as ratailor [14:02]
<dansmith> gibi: agree, no point until it's useful [14:24]
<gibi> sean-k-mooney, dansmith: thanks for chiming in, that was my immediate feeling as well [14:35]
<opendevreview> Merged openstack/nova-specs master: fix footnote refernces  https://review.opendev.org/c/openstack/nova-specs/+/934493 [15:06]
<moot> Hello! I am having trouble with my GPU being detected as a netdev when trying to set up vGPU managed by nova; I am on Caracal. [15:20]
<moot> While trying to understand what could have gone wrong, I stumbled across this bit of code and I am wondering if there is an indentation error: https://github.com/openstack/nova/blob/stable/2024.1/nova/virt/libvirt/host.py#L1487-L1489 [15:20]
<moot> If there is not, I am really not sure what can cause these errors: [15:20]
<moot> "Could not get a PF mac for 0000:41:00.0 _get_sriov_netdev_details /var/lib/kolla/venv/lib/python3.11/site-packages/nova/virt/libvirt/host.py:1455" [15:20]
<moot> "Cannot get MAC address of the PF 0000:41:00.0. It is probably attached to a guest already _get_pf_details /var/lib/kolla/venv/lib/python3.11/site-packages/nova/virt/libvirt/host.py:1318" [15:20]
<sean-k-mooney> so there are two factors. First, in the pci dev-spec it's important to make sure you have not set physical_network on the entry for the GPUs [15:22]
<sean-k-mooney> second, i believe in newer releases of nova that is just a warning, not an error [15:23]
<sean-k-mooney> 2024.1 should not error out, for example [15:23]
<zigo> Mickael is already running Caracal. [15:25]
<sean-k-mooney> moot: one other thing to note is that vGPU is not the same thing as pci passthrough, and today we do not support using the VFs from nvidia vGPUs directly [15:25]
<sean-k-mooney> intel and amd gpus support sriov, and that can be used with nova's generic pci passthrough support [15:26]
<sean-k-mooney> but nvidia's vGPU support uses mdevs today. We are looking at adding support for the vfio-variant drivers in 2025.1, but that work has not reached the POC stage yet [15:27]
<moot> but if I have: [15:29]
<moot> [devices] [15:29]
<moot> enabled_mdev_types = nvidia-746 [15:29]
<moot> [mdev_nvidia-746] [15:29]
<moot> device_addresses = 0000:41:00.0 [15:29]
<moot> in nova.conf, as described in https://docs.openstack.org/nova/latest/admin/virtual-gpu.html [15:29]
<moot> a resource provider should appear, right? [15:29]
<sean-k-mooney> it should, yes. Are you actually getting an error in the logs? [15:30]
<sean-k-mooney> the code you are referencing here https://github.com/openstack/nova/blob/stable/2024.1/nova/virt/libvirt/host.py#L1487-L1489 is not related to vgpus [15:30]
<sean-k-mooney> the _get_sriov_netdev_details function has a graceful fallback in the case that the VF is not a nic [15:35]
<moot> It is neither an error nor a warning, just a DEBUG log [15:35]
<sean-k-mooney> right, this one https://github.com/openstack/nova/blob/stable/2024.1/nova/virt/libvirt/host.py#L1455 [15:35]
<sean-k-mooney> you can ignore that [15:35]
<sean-k-mooney> it's not meant for operators, it's meant for developers to know which code path was taken [15:36]
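To see why that DEBUG line is harmless for a GPU, here is a simplified sketch of the kind of fallback described above; it is not nova's _get_sriov_netdev_details. On Linux a NIC's PCI node has a net/<ifname>/address file in sysfs, while a GPU has no net/ directory, so the lookup just logs at DEBUG and returns None.

```python
import logging
import os
from typing import Optional

LOG = logging.getLogger(__name__)


def pf_mac_or_none(pci_addr: str,
                   sysfs: str = "/sys/bus/pci/devices") -> Optional[str]:
    """Return the MAC of a PCI device's netdev, or None if it is not a NIC.

    Simplified sketch, not nova's code: a GPU has no net/ directory under its
    sysfs node, so the lookup fails and we only log at DEBUG level.
    """
    net_dir = os.path.join(sysfs, pci_addr, "net")
    try:
        ifname = sorted(os.listdir(net_dir))[0]
        with open(os.path.join(net_dir, ifname, "address")) as f:
            return f.read().strip()
    except (OSError, IndexError):
        LOG.debug("Could not get a PF mac for %s", pci_addr)
        return None
```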
<moot> okay, but the RP is never created. I enabled DEBUG hoping I would find some clue [15:37]
<sean-k-mooney> ack, well we can probably help you debug that, but that's not related to that log :) [15:37]
<sean-k-mooney> can you post the output of "ls /sys/class/mdev_bus/*/mdev_supported_types" somewhere [15:38]
<sean-k-mooney> let's see if your host is properly configured to be capable of supporting mdevs on that device and with that type [15:38]
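The same check can be scripted. This sketch walks the kernel's mediated-device sysfs layout (/sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/available_instances) and maps each mdev-capable parent device to its types; the function name and the "treat unreadable as 0" choice are illustrative assumptions.

```python
import os
from typing import Dict


def mdev_types(root: str = "/sys/class/mdev_bus") -> Dict[str, Dict[str, int]]:
    """Map each mdev-capable parent device to {type_id: available_instances}.

    Scripted version of `ls /sys/class/mdev_bus/*/mdev_supported_types`,
    based on the kernel's mediated-device sysfs layout.
    """
    result: Dict[str, Dict[str, int]] = {}
    if not os.path.isdir(root):
        return result  # no mdev-capable device on this host
    for parent in sorted(os.listdir(root)):
        types_dir = os.path.join(root, parent, "mdev_supported_types")
        if not os.path.isdir(types_dir):
            continue
        result[parent] = {}
        for type_id in sorted(os.listdir(types_dir)):
            path = os.path.join(types_dir, type_id, "available_instances")
            try:
                with open(path) as f:
                    result[parent][type_id] = int(f.read().strip())
            except (OSError, ValueError):
                result[parent][type_id] = 0  # unreadable: treat as none free
    return result
```

If the PF address (0000:41:00.0 here) is absent from the keys while its VFs are present, the type is advertised on the VFs, which is what sean-k-mooney spots in the paste.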
<moot> sure! [15:40]
<moot> https://paste.opendev.org/show/buohdB4juV5JTLRA0mna/ [15:40]
<sean-k-mooney> ah, i see the problem [15:40]
<sean-k-mooney> so 0000:41:00.0 is the PF, correct? [15:40]
<sean-k-mooney> the physical gpu [15:40]
<moot> yes [15:41]
<sean-k-mooney> you have enabled MIG mode on the gpu, which moves the mdevs from the PF to the VF [15:41]
<sean-k-mooney> or VFs plural [15:41]
<sean-k-mooney> so you need to update device_addresses with the pci addresses of the VFs [15:41]
<sean-k-mooney> specifically, you should only list a number of addresses equal to the number of nvidia-746 instances that mdev type can create [15:42]
<sean-k-mooney> if you do not enable MIG mode with "sriov-manage -e" then all the mdevs will work in time-slicing mode and will be reported on the PF [15:43]
<sean-k-mooney> in other words, whether you list the PF address or the VF addresses depends on whether you're using time-sliced vGPUs or MIG-mode vGPUs [15:45]
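The PF-versus-VF rule can be sketched as a small helper. It is hypothetical, not part of nova: it assumes you already know how many instances of the type the card supports (max_instances, e.g. from the vendor docs) and that each VF carries at most one mdev of the type. VF addresses come from the PF's virtfnN symlinks in sysfs.

```python
import glob
import os
from typing import Dict, List, Set


def vf_addresses(pf: str, sysfs: str = "/sys/bus/pci/devices") -> List[str]:
    """PCI addresses of a PF's VFs, read from its virtfnN symlinks."""
    links = glob.glob(os.path.join(sysfs, pf, "virtfn*"))
    # Sort numerically by N so virtfn10 comes after virtfn9.
    links.sort(key=lambda p: int(os.path.basename(p)[len("virtfn"):]))
    return [os.path.basename(os.path.realpath(p)) for p in links]


def pick_device_addresses(pf: str, mdev_type: str,
                          parents: Dict[str, Set[str]],
                          vfs: List[str], max_instances: int) -> List[str]:
    """Hypothetical helper choosing what to put in device_addresses.

    parents maps each mdev-capable device to the mdev types it advertises.
    If the PF itself advertises the type (time-sliced case), list the PF.
    Otherwise list at most max_instances VFs that advertise the type,
    assuming each such VF hosts one mdev of that type.
    """
    if mdev_type in parents.get(pf, set()):
        return [pf]
    capable = [vf for vf in vfs if mdev_type in parents.get(vf, set())]
    return capable[:max_instances]
```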
<moot> The A2 nvidia GPU should not support MIG mode, if I am not mistaken? [15:48]
<moot> I tried to provide the pci addresses of the VFs in the nova.conf device_addresses field and it worked, but I wanted to let nova manage the whole PF. [15:48]
<moot> I don't think I enabled MIG mode; I enabled my VFs using: /usr/lib/nvidia/sriov-manage -e 0000:41:00.0 [15:48]
<moot> I think I am already in time-slicing mode [15:51]
<opendevreview> Takashi Kajinami proposed openstack/nova master: Skip functional tests on pre-commit config update  https://review.opendev.org/c/openstack/nova/+/933192 [15:54]
<moot> I am double-checking the MIG mode you talked about, to be sure [15:56]
<moot> I get: [15:59]
<moot> >> sudo nvidia-smi -mig 0 [15:59]
<moot> Unable to disable MIG Mode for GPU 00000000:41:00.0: Not Supported [15:59]
<moot> >> sudo nvidia-smi -mig 1 [15:59]
<moot> Unable to enable MIG Mode for GPU 00000000:41:00.0: Not Supported [15:59]
<sean-k-mooney> so i think you are only meant to run "/usr/lib/nvidia/sriov-manage -e" if you don't want time-slicing [16:15]
<sean-k-mooney> with that said, nvidia keeps changing how that works [16:15]
<sean-k-mooney> so even if that was true once, it may not be anymore [16:16]
<moot> MIG mode is not enabled on my GPU, I can see that using nvidia-smi -q [16:16]
<sean-k-mooney> moot: if you don't have any use case for using the mdevs on the host, you can omit the device address and instead define the maximum mdevs, i think [16:16]
<sean-k-mooney> oh, that's missing from our docs entirely [16:18]
<sean-k-mooney> that might be because it's an option in a dynamic config section [16:18]
<sean-k-mooney> https://github.com/openstack/nova/blob/master/nova/conf/devices.py#L108-L115 [16:19]
<sean-k-mooney> that is not invoked as part of our docs generation. not really sure how to fix that [16:20]
<moot> I will try that, but it does not work in a scenario with multiple PFs, right? [16:20]
<sean-k-mooney> it does, provided you do not want to use different mdev types per PF [16:21]
<sean-k-mooney> the second you want to do that, you need to statically partition them [16:21]
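The two configuration styles discussed here can be rendered with Python's configparser. This is a sketch: render_mdev_conf is a hypothetical helper, and max_instances is assumed to be the name of the option at the linked lines of nova/conf/devices.py, so verify it against your release before use.

```python
import configparser
import io
from typing import List, Optional


def render_mdev_conf(mdev_type: str,
                     addresses: Optional[List[str]] = None,
                     max_instances: Optional[int] = None) -> str:
    """Render a nova.conf fragment enabling one mdev type.

    addresses pins the type to specific PCI devices (static partitioning,
    needed once different PFs should expose different types). Without it,
    every device exposing the type is considered; max_instances (assumed
    option name) then caps how many mdevs of the type are used.
    """
    conf = configparser.ConfigParser()
    conf["devices"] = {"enabled_mdev_types": mdev_type}
    section = {}
    if addresses:
        section["device_addresses"] = ",".join(addresses)
    if max_instances is not None:
        section["max_instances"] = str(max_instances)
    if section:
        conf["mdev_" + mdev_type] = section
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()
```

For example, render_mdev_conf("nvidia-746", max_instances=4) emits a [devices] section plus an [mdev_nvidia-746] section with only the cap, which matches the "omit the device address" suggestion above.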
<moot> even if I am using time-sliced GPUs? [16:22]
<sean-k-mooney> so in the past, for time-sliced vGPU you did not use sriov-manage -e <address> [16:22]
<sean-k-mooney> so it might be worth testing without doing that and seeing if the mdevs move back to the PF [16:23]
<sean-k-mooney> that is how it used to work before nvidia released cards that support sriov [16:23]
<moot> you mean testing without enabling VFs, so no sriov-manage -e <address>? [16:23]
<sean-k-mooney> nvidia have unfortunately changed this behavior on the same card with different versions of their driver/firmware [16:23]
<sean-k-mooney> moot: correct [16:24]
<sean-k-mooney> if you don't enable sriov, i would expect the mdev capability to be advertised on the PF instead, and then in nova you could list just the PF address [16:24]
<moot> I can do that. I will look into contributing to the docs if you think they need some updates [16:24]
<sean-k-mooney> that's how it worked on the T and older generations of cards [16:25]
<sean-k-mooney> this is kind of why we just say "look at the vendor docs" for that part [16:25]
<sean-k-mooney> i.e. https://docs.nvidia.com/vgpu/5.0/pdf/grid-vgpu-user-guide.pdf [16:26]
<moot> well, the nvidia docs just say to enable the VFs, if I am not mistaken [16:26]
<moot> btw I tried removing the pci address from nova.conf and a resource provider was created for each VF [16:29]
<moot> "If not set, it implies that we use the maximum allowed by the type." [16:29]
<zigo> sean-k-mooney: Thanks for helping my (intern) colleague who was kind of lost! :) [16:45]
<sean-k-mooney> zigo: vgpus are not a kind thing to expose interns to :P happy to help [16:46]
<sean-k-mooney> this is one of the cases where nova is kind of helpless to document properly, because it not only depends on the vendor but also on the card and the driver used [16:47]
<sean-k-mooney> and, you know, the phase of the moon, your choice in whiskey and whether you are feeling lucky today [16:47]
<zigo> sean-k-mooney: Yeah, but that one intern is special and knows his stuff ... :P [16:47]
<sean-k-mooney> moot: if you do see a way to make the docs better, please feel free to [16:48]
<zigo> I'll beat him until all is right! :P [16:55]
<moot> sean-k-mooney: I tried not enabling SRIOV as discussed, but the PF does not seem to be mdev capable, so no RP is created. [17:02]
<moot> when I specify neither the device id nor the max mdev number, nova does seem to manage well. I did not test extensively; I will do that tomorrow! [17:02]
<moot> Anyway, thank you for your help. I will try my best to find a way to add this information to the docs! [17:02]
<opendevreview> Ghanshyam proposed openstack/nova master: Remove default override for config options policy_file  https://review.opendev.org/c/openstack/nova/+/934578 [21:29]
