opendevreview | Sahid Orentino Ferdjaoui proposed openstack/nova master: scheduler: AggregateMultitenancyIsolation to support unlimited tenant https://review.opendev.org/c/openstack/nova/+/896512 | 08:41 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Add debug logging to can_fit_pagesize https://review.opendev.org/c/openstack/nova/+/899942 | 10:11 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/2023.1: Translate VF network capabilities to port binding https://review.opendev.org/c/openstack/nova/+/898945 | 10:20 |
opendevreview | sean mooney proposed openstack/nova master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/nova/+/899753 | 11:10 |
opendevreview | sean mooney proposed openstack/nova master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/nova/+/899753 | 11:14 |
opendevreview | sean mooney proposed openstack/os-vif master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/os-vif/+/899946 | 11:21 |
opendevreview | sean mooney proposed openstack/placement master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/placement/+/899947 | 11:23 |
opendevreview | sean mooney proposed openstack/os-traits master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/os-traits/+/899948 | 11:24 |
opendevreview | sean mooney proposed openstack/os-resource-classes master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/os-resource-classes/+/899949 | 11:25 |
opendevreview | sean mooney proposed openstack/python-novaclient master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/python-novaclient/+/899950 | 11:27 |
*** ozzzo1 is now known as ozzzo | 11:50 | |
*** eandersson0 is now known as eandersson | 11:50 | |
opendevreview | sean mooney proposed openstack/osc-placement master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/osc-placement/+/899955 | 11:54 |
greatgatsby | Good day. Having trouble converting a `nova boot` command to the `openstack server create` equivalent. With `nova boot` I can write a one-liner that creates and boots from a volume with delete-on-instance-termination set, using --block-device and shutdown=remove. However, with `openstack server create` the only thing I can get to work is first doing a --boot-from-volume and then using `openstack volume update --delete-on-termination`. | 12:40 |
opendevreview | sean mooney proposed openstack/nova master: Use centos-9-stream for functional-py39 job https://review.opendev.org/c/openstack/nova/+/899959 | 12:44 |
greatgatsby | If I try to use a similar --block-device arg for `openstack server create` I get an error saying that one of --image --image-property --volume --snapshot is required | 12:44 |
sean-k-mooney | are you trying to add a data volume or do boot from volume? | 12:46 |
greatgatsby | sean-k-mooney: thanks for the reply. Boot from volume, where the volume is created from an image. I seem to be able to do it as a one-liner with `nova boot` but not `openstack server create` | 12:50 |
frickler | the "one of ... is required" is a bug that should be fixed in recent osc iirc | 12:50 |
sean-k-mooney | frickler: its not a bug | 12:50 |
sean-k-mooney | so if you want to do it in one line | 12:50 |
sean-k-mooney | you need to use --block-device-mapping | 12:51 |
greatgatsby | that says it's deprecated in the help | 12:51 |
sean-k-mooney | well you can actually use --block-device as well i think; that is the replacement for --block-device-mapping | 12:52 |
sean-k-mooney | but you have to specify more than just the delete on terminate flag | 12:52 |
greatgatsby | that's what I tried, basically the 1:1 --block-device I was using from nova boot, but then I get the error about those other flags being required | 12:52 |
sean-k-mooney | right, so i don't think this was intended to be a 1:1 port | 12:53 |
greatgatsby | I'll test with a newer version of the openstack CLI. We're using explicit versions specified by our kolla (yoga) version | 12:53 |
sean-k-mooney | it may have been a osc bug | 12:53 |
greatgatsby | 1:1 in that I use the equivalent params | 12:53 |
sean-k-mooney | we would expect osc to map to the api parameters but im not sure if novaclient was doing some magic internally | 12:54 |
sean-k-mooney | greatgatsby: i think --block-device-mapping will work around the limitation on older osc versions and hopefully newer versions of osc will also work | 12:55 |
greatgatsby | nice, thanks a lot for the help, I'll test both those scenarios | 12:56 |
sean-k-mooney | frickler: was this what you were thinking of? https://github.com/openstack/python-openstackclient/commit/89e5d67a16bd86af7bbe8480252686e2eef55095 | 13:01 |
sean-k-mooney | because that is detecting if you forget to pass --image with --boot-from-volume | 13:01 |
sean-k-mooney | greatgatsby: this is where your error is coming from | 13:04 |
sean-k-mooney | https://github.com/openstack/python-openstackclient/blob/master/openstackclient/compute/v2/server.py#L1714-L1725 | 13:04 |
sean-k-mooney | at least i think so | 13:04 |
sean-k-mooney | oh no its https://github.com/openstack/python-openstackclient/blob/master/openstackclient/compute/v2/server.py#L1762-L1769 | 13:05 |
sean-k-mooney | greatgatsby: so i think what is missing is that you likely don't have boot_index=0 set in the --block-device arguments? | 13:06 |
greatgatsby | if I do `openstack server create --block-device source_type=image,uuid=<uuid>,destination_type=volume,volume_size=10,delete_on_termination=true` I still get an error about one of those other flags being required, OSC 5.8.0 | 13:06 |
greatgatsby | I tried with boot_index as well | 13:07 |
sean-k-mooney | can you paste the output of the command somewhere so we can review | 13:08 |
sean-k-mooney | since that does not have boot_index that should raise but i want to see if i can find the exact error message in code | 13:09 |
greatgatsby | I'm on two different networks (openstack for work, home network for IRC) but I'll pastebin what I'm entering... couple seconds to type it out. Really appreciate the help. | 13:09 |
greatgatsby | if you just want the output, it's literally just the openstack server create help followed by: | 13:09 |
greatgatsby | openstack server create: error: one of the arguments --image --image-property --volume --snapshot is required | 13:10 |
greatgatsby | my command is: | 13:10 |
sean-k-mooney | ok let me see if i can find that, as the code i linked to has a slightly different message so it's failing somewhere else | 13:10 |
greatgatsby | openstack --os-cloud foo server create --flavor medium --block-device source_type=image,uuid=<image-uuid>,destination_type=volume,volume_size=10,delete_on_termination=true,boot_index=0 test-vm | 13:11 |
sean-k-mooney | ya that looks valid to me | 13:12 |
greatgatsby | OSC 5.8.0 | 13:12 |
sean-k-mooney | https://github.com/openstack/python-openstackclient/commit/91277e7e51849d197554b633a579c92116a5afc4 | 13:14 |
sean-k-mooney | greatgatsby: so ^ is the fix frickler was referring to and it's in 6.1.0+ | 13:15 |
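For reference, the two one-liners being compared, as a minimal sketch: the flavor name and image UUID are placeholders, the `nova boot` keys are the classic novaclient ones greatgatsby describes, and the `openstack` form (his command pasted above) needs python-openstackclient >= 6.1.0.

```bash
# legacy novaclient: boot from an image-backed volume that is deleted on termination
nova boot --flavor medium \
  --block-device source=image,id=<image-uuid>,dest=volume,size=10,shutdown=remove,bootindex=0 \
  test-vm

# OSC equivalent; on < 6.1.0 this trips the argument-group bug fixed above
openstack server create --flavor medium \
  --block-device source_type=image,uuid=<image-uuid>,destination_type=volume,volume_size=10,delete_on_termination=true,boot_index=0 \
  test-vm
```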
greatgatsby | nice, thanks a lot. Hopefully we'll be upgrading off yoga soon and will also upgrade our recommended OSC version. | 13:15 |
sean-k-mooney | well in general i recommend people install osc in a python venv from pip | 13:16 |
greatgatsby | we try to go by the version matrix for all the python utils and dependencies | 13:16 |
sean-k-mooney | so that they can use the latest independent of the cloud | 13:16 |
greatgatsby | that's exactly what we do | 13:16 |
greatgatsby | we still use the versions according to the docs... maybe we should reconsider that? | 13:16 |
sean-k-mooney | ya so in general osc should work with any release | 13:17 |
sean-k-mooney | so i generally suggest using the latest version | 13:17 |
sean-k-mooney | the cli should be stable and generally only have additive changes | 13:17 |
sean-k-mooney | but if you have a lot of internal docs, using a fixed version might make sense | 13:18 |
greatgatsby | https://releases.openstack.org/yoga/index.html#service-client-projects | 13:18 |
sean-k-mooney | that's just documenting the versions that were released in yoga | 13:19 |
sean-k-mooney | it's not a recommendation to use those versions to interact with a yoga cloud | 13:19 |
greatgatsby | oh, that's great to know, thanks, I'll pass that on! | 13:19 |
greatgatsby | again, really appreciate getting help in here, can be tough banging one's head against the desk for too long | 13:20 |
sean-k-mooney | the reason that list is good to reference is co-installability | 13:20 |
sean-k-mooney | if you stick to the versions in that list | 13:20 |
sean-k-mooney | then we are committing to keeping them co-installable | 13:20 |
sean-k-mooney | if you deviate from that you might have compatibility issues | 13:20 |
sean-k-mooney | but if you are using osc on your laptop in its own venv | 13:21 |
sean-k-mooney | you dont need to really consider that | 13:21 |
greatgatsby | sorry what do you mean by co installability? Just want to pass this info on and not get pushback | 13:21 |
sean-k-mooney | so we maintain that list mainly for distro packagers | 13:21 |
sean-k-mooney | the ability to install all the release artifacts in the same python environment | 13:21 |
greatgatsby | I see, ok, thanks again | 13:22 |
opendevreview | Merged openstack/nova master: Fix rebuild compute RPC API exception for rolling-upgrades https://review.opendev.org/c/openstack/nova/+/899235 | 13:22 |
sean-k-mooney | we test for co-installability in our ci system and partly enforce that with upper-constraints.txt in the requirements repo for external deps | 13:23 |
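A minimal sketch of the venv install sean-k-mooney recommends; the venv path is arbitrary.

```bash
# install the latest OSC in its own venv, independent of the cloud's release
python3 -m venv ~/.venvs/osc
~/.venvs/osc/bin/pip install --upgrade pip python-openstackclient
~/.venvs/osc/bin/openstack --version
```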
opendevreview | sean mooney proposed openstack/nova master: Use centos-9-stream for functional-py39 job https://review.opendev.org/c/openstack/nova/+/899959 | 13:32 |
opendevreview | sean mooney proposed openstack/nova master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/nova/+/899753 | 13:32 |
opendevreview | sean mooney proposed openstack/nova master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/nova/+/899753 | 13:34 |
dansmith | bauzas: are you around to explain something to me about the RP patch? | 13:36 |
bauzas | dansmith: yup, I'm here now | 13:36 |
bauzas | thanks for having started to look at it, btw. | 13:36 |
bauzas | (me also needs to look at your spec) | 13:37 |
dansmith | yeah sorry I forgot you were off yesterday until after I pinged | 13:37 |
bauzas | so, shoot your questions | 13:37 |
bauzas | have you seen the bug report btw.? | 13:37 |
bauzas | I tried to summarize the problem in there | 13:37 |
dansmith | my questions are actually *about* the stuff in the bug :) | 13:37 |
bauzas | https://bugs.launchpad.net/nova/+bug/2041519 | 13:37 |
bauzas | okay | 13:37 |
bauzas | and why I did that, I guess ? | 13:38 |
dansmith | so I get the inventory before, then after one allocation, one RP goes to zero | 13:38 |
bauzas | more than one RP but yeah | 13:38 |
dansmith | no, only one according to your report | 13:38 |
bauzas | and this is not exacly *always* after | 13:38 |
bauzas | the fact is that we automatically create an mdev if there isn't an unused one already | 13:39 |
dansmith | but what i don't get is why after the second allocation they *all* go to zero.. that would imply we're using inventory wrong | 13:39 |
bauzas | okay, gonna clarify | 13:39 |
bauzas | we are not counting mdevs for inventories | 13:39 |
bauzas | what we count as VGPU resources is *either* unused mdevs *or* available_instances | 13:40 |
dansmith | oh, hang on | 13:40 |
bauzas | which means something you can allocate to a guest | 13:40 |
dansmith | re-reading this with fresh eyes, I think I was confusing the very similar sysfs and placement dumps | 13:41 |
dansmith | so it's the mdev availability that goes off a cliff | 13:41 |
bauzas | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L8090-L8133 | 13:42 |
bauzas | that's where we calculate inventories | 13:42 |
dansmith | yep | 13:42 |
bauzas | available_instances is a value that the kernel driver gives | 13:43 |
bauzas | but, | 13:43 |
dansmith | right | 13:43 |
bauzas | once you create a mdev, the value decreases by 1 | 13:43 |
bauzas | that doesn't mean that the mdev will be used by a guest | 13:43 |
sean-k-mooney | bauzas: yep, which is why you are meant to be internally tracking which devices the mdevs are created from | 13:43 |
bauzas | if you basically precreate all mdev resources, then available_instances is 0 for all the gpus | 13:43 |
sean-k-mooney | and using that to compute the inventory total as (available_instances + mdevs from that device) | 13:44 |
dansmith | does nvidia-472 imply that more than 1/total share of the card is used? | 13:44 |
bauzas | but the inventory still tracks that you can bind N vGPUs | 13:44 |
bauzas | dansmith: that's the crux of the problem | 13:44 |
bauzas | the number of vGPUs can differ per type | 13:45 |
sean-k-mooney | bauzas: that problem is solved by enforcing that the device address | 13:45 |
dansmith | that feels like we're using placement incorrectly to have inventory of unequal resources | 13:45 |
bauzas | nvidia-472 can create up to two vGPUs | 13:45 |
bauzas | so there are three possibilities | 13:45 |
bauzas | either available_instances = 2 | 13:45 |
bauzas | or, available_instances = 1 and one mdev exists that's not used by a guest | 13:45 |
bauzas | or, there are two mdevs available, which are both unused | 13:46 |
bauzas | with SR-IOV, things change | 13:46 |
sean-k-mooney | bauzas: the decreasing inventory behavior is an nvidia-specific issue with VFs from the same card | 13:46 |
bauzas | now, with SR-IOV, instead of having one single pci device that has available_instances = 2 (and then only one RP), | 13:46 |
bauzas | then we have *two* VFs, each of them having available_instances = 1 | 13:47 |
bauzas | but, | 13:47 |
bauzas | since this is type-dependent, nvidia doesn't create only 2 VFs but 16 | 13:47 |
bauzas | (for that single GPU) | 13:47 |
sean-k-mooney | yes | 13:47 |
bauzas | (if you have A2, IIRC this will create 32 virtual functions) | 13:48 |
dansmith | can someone answer the question: that's a function of nvidia-472 right? | 13:48 |
bauzas | so, say you define nvidia-472, this means that only 2 virtual functions have to be used | 13:48 |
sean-k-mooney | dansmith: the VF can support many mdev types | 13:48 |
dansmith | and if they are slicing differently with (nvidia-xxx right)? then the math would be different/ | 13:48 |
bauzas | if you use another type (say nvidia-473), then you could use *four* virtual functions | 13:48 |
bauzas | dansmith: correct | 13:49 |
dansmith | yeah | 13:49 |
sean-k-mooney | dansmith: yep the math changes depending on which mdev type you create | 13:49 |
sean-k-mooney | but the number of VF is fixed | 13:49 |
bauzas | the fact is | 13:49 |
sean-k-mooney | so not all of them can actually be allocated | 13:49 |
bauzas | given the type, | 13:49 |
sean-k-mooney | for all types | 13:49 |
bauzas | once you create mdevs for all the VFs corresponding to the type, | 13:49 |
dansmith | right, and could it be some combination of two types? or only one type of slicing per card? | 13:49 |
bauzas | then other VFs start magically having available_instances=0 | 13:49 |
sean-k-mooney | dansmith: that depends on the card | 13:50 |
bauzas | you can't mix types per GPU, nvidia doesn't support it | 13:50 |
bauzas | well, | 13:50 |
sean-k-mooney | actually it does | 13:50 |
bauzas | for time-sliced GPUs | 13:50 |
bauzas | for MIG GPUs, you can, but that's a waaaaay different matter | 13:50 |
sean-k-mooney | but only in a specific mode for specific types on specific cards | 13:50 |
sean-k-mooney | yep | 13:50 |
bauzas | let's not overcomplicate this | 13:50 |
sean-k-mooney | lol | 13:50 |
sean-k-mooney | do you have a time machine | 13:50 |
dansmith | well, my point is, | 13:50 |
dansmith | it seems really uncool to expose all these single-inventory RPs in general | 13:51 |
dansmith | like it feels as though we modeled it wrong | 13:51 |
sean-k-mooney | ya so we should be grouping per PF | 13:51 |
sean-k-mooney | as each physical device should be its own RP | 13:52 |
sean-k-mooney | but since we never actually added official support for sriov mode | 13:52 |
sean-k-mooney | what we did instead was tell people | 13:52 |
bauzas | dansmith: the problem is that we don't know preemptively which type consumes how many VFs | 13:52 |
sean-k-mooney | just use the pci address of the vf instead of the pf | 13:52 |
bauzas | so the simplest solution is to tell people to statically allocate mdevs | 13:52 |
sean-k-mooney | bauzas: that's not really the problem but it is one of the issues | 13:53 |
bauzas | sean-k-mooney: that *IS* the problem in our case | 13:53 |
dansmith | bauzas: yeah, I think there are ways we could handle that, but yeah it's not ideal for sure | 13:53 |
dansmith | unless we make people annotate the config in that way | 13:53 |
sean-k-mooney | well we agreed in the ptg to make them do that ^ | 13:53 |
bauzas | if I was able to get a clue of how many VFs are available for a type, I could prevent nova from creating RPs for all 16 of them | 13:53 |
sean-k-mooney | we agreed to make device_address required | 13:54 |
bauzas | sean-k-mooney: no, we said at the PTG that allocating an mdev is cheap | 13:54 |
bauzas | so we could continue to do this | 13:54 |
sean-k-mooney | bauzas: yes we can; that's not what i said above | 13:54 |
sean-k-mooney | we agreed that going forward they need to specify which devices we create inventories from | 13:54 |
sean-k-mooney | by requiring the device_address | 13:54 |
bauzas | sean-k-mooney: sure | 13:54 |
sean-k-mooney | nova can continue to do the mdev creation | 13:54 |
bauzas | but this doesn't prevent operator fat fingers, right? | 13:54 |
dansmith | sean-k-mooney: agreed to make who do what? | 13:55 |
sean-k-mooney | the key is to just not list more VFs than can be created | 13:55 |
sean-k-mooney | dansmith: https://docs.openstack.org/nova/latest/configuration/config.html#devices | 13:55 |
bauzas | sean-k-mooney: also, with SR-IOV, this will be hard for operators to know which VFs to allocate btw. | 13:55 |
bauzas | the fucking nvidia docs never tell you about nvidia-XXX | 13:56 |
sean-k-mooney | for legacy reasons (only for the first mdev type) specifying device_address is optional | 13:56 |
dansmith | sean-k-mooney: meaning in [mdev_nvidia-35]/this_takes_n_vfs=10 right? | 13:56 |
dansmith | that sounds good to me | 13:56 |
bauzas | the only way for an operator to know what a nvidia type is doing is by getting the /name value and match it with docs | 13:56 |
dansmith | default to 1 and if they don't fill it in, then the math will be wrong | 13:56 |
sean-k-mooney | dansmith: not quite | 13:56 |
bauzas | dansmith: no, sean-k-mooney is suggesting we explicitly define in nova.conf all the pci addresses of the VFs | 13:57 |
sean-k-mooney | dansmith: you would look at the mdev type and list n vfs | 13:57 |
bauzas | no new option | 13:57 |
dansmith | sean-k-mooney: isn't that what I just said? | 13:57 |
sean-k-mooney | we could do it differently however | 13:57 |
bauzas | [devices] enabled_mdev_types = nvidia-35, nvidia-36 [mdev_nvidia-35] device_addresses = 0000:84:00.0,0000:85:00.0 [vgpu_nvidia-36] device_addresses = 0000:86:00.0 | 13:57 |
bauzas | you would then specify in nova.conf all the VFs | 13:57 |
bauzas | at least the ones you gonna allocate | 13:58 |
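bauzas's pasted config, laid out as it would sit in nova.conf (the section names and addresses are exactly his example):

```ini
[devices]
enabled_mdev_types = nvidia-35, nvidia-36

[mdev_nvidia-35]
device_addresses = 0000:84:00.0,0000:85:00.0

[vgpu_nvidia-36]
device_addresses = 0000:86:00.0
```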
sean-k-mooney | dansmith: so we could do something better | 13:58 |
bauzas | sean-k-mooney: so sean, are you going to prefer to -1 my fix and say "please rather statically allocate all VFs" ? | 13:58 |
bauzas | https://review.opendev.org/c/openstack/nova/+/899625 | 13:59 |
dansmith | -1 to static, I can't imagine that's really going to make people happy | 13:59 |
bauzas | not sure our operators will appreciate | 13:59 |
bauzas | yeah | 13:59 |
sean-k-mooney | dansmith: what we could do is per mdev type declare how many mdevs it can have | 13:59 |
sean-k-mooney | and then specify that device_address should be the PF | 13:59 |
bauzas | that's risky math | 14:00 |
sean-k-mooney | so then we would have one device address (and placement inventory) per physical gpu | 14:00 |
dansmith | I still don't understand how that's not what I said above and why that's not a good solution | 14:00 |
bauzas | because nothing in the sysfs structure tells us which PF is the VF's parent | 14:00 |
sean-k-mooney | and we would just not use available_instances at all for the placement total | 14:00 |
sean-k-mooney | bauzas: it really should | 14:00 |
dansmith | suuuurely it does | 14:00 |
bauzas | sean-k-mooney: if you don't use available_instances, then you're asking people to precreate the mdevs | 14:00 |
sean-k-mooney | that might be missing in libvirt but it will be in sysfs | 14:00 |
sean-k-mooney | bauzas: no im not | 14:01 |
sean-k-mooney | im asking them to read the nvidia docs | 14:01 |
dansmith | I have to do another meeting for a bit | 14:01 |
sean-k-mooney | and tell us how many it can allocate | 14:01 |
dansmith | surely we could also have a generic listing for the common types | 14:01 |
bauzas | sean-k-mooney: I don't get then what you said | 14:01 |
bauzas | (15:00:31) sean-k-mooney: and we would just not use available_instances at all for the placement total | 14:01 |
bauzas | dansmith: that's like boiling the ocean | 14:02 |
bauzas | types are hardware-dependent | 14:02 |
sean-k-mooney | bauzas: im saying for the placement inventory we read total from the config option, not from available_instances | 14:02 |
dansmith | nvidia-472 differs in size on different cards? | 14:02 |
bauzas | https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#supported-gpus-grid-vgpu | 14:02 |
sean-k-mooney | dansmith: nvidia are creating an mdev type per card | 14:02 |
bauzas | dansmith: count every line in each tab, you'll get the number of mdev types | 14:02 |
dansmith | ack I understand | 14:03 |
bauzas | dansmith: no, the other way, nvidia-472 only stands for A100-20C which is only a type supported by A100 cards | 14:03 |
dansmith | yup, I got it | 14:03 |
sean-k-mooney | bauzas: let's park this for a bit and create an etherpad or something to collaborate on | 14:04 |
bauzas | formerly, before the magic world of SRIOV GPUs, available_instances was giving us the number of VGPUs a PCI device was able to create | 14:04 |
sean-k-mooney | it's simpler to discuss when we have some of the constraints written down, like in a spec | 14:04 |
bauzas | sean-k-mooney: there is already the bug report | 14:05 |
sean-k-mooney | i would like something we can edit | 14:05 |
bauzas | I'm not a big fan of writing a spec for fixing this | 14:05 |
sean-k-mooney | bauzas: i think you have to | 14:05 |
sean-k-mooney | depending on the fix | 14:05 |
sean-k-mooney | there are ways to do the minimal fix as a bug | 14:06 |
dansmith | I think it's clear we need to document this more than just some sysfs dumps in a bug :) | 14:06 |
bauzas | sean-k-mooney: have you seen my patch already ? | 14:06 |
sean-k-mooney | but if we want to fix this properly we need to change how we are doing mdev management | 14:06 |
bauzas | as a reminder, mdev inventory isn't a problem with non-SRIOV GPUs | 14:06 |
bauzas | so whatever we decide will necessarily be opt-in | 14:06 |
sean-k-mooney | it's not a problem for all sriov devices | 14:06 |
bauzas | and keeping the legacy behaviour too | 14:07 |
sean-k-mooney | it's literally a problem just with how nvidia did it | 14:07 |
bauzas | "it's literally a problem just with how nvidia did it" I should quote it, print it and put it on my front wall | 14:07 |
bauzas | the way that available_instances isn't a static number is also a terrible driver implementation, if you want MHO | 14:08 |
sean-k-mooney | before i left intel, i used a prototype intel QAT implementation which had an sriov VF per engine type, and they supported mdevs per VF with independent availability | 14:08 |
dansmith | is it really a problem though? I mean it kinda seems like this is the point of mdevs, no? to be allocatable in more dynamic ways than just static slices of a card? | 14:08 |
bauzas | true | 14:09 |
sean-k-mooney | dansmith: mdevs from different pci endpoints are intended to not affect each other | 14:09 |
sean-k-mooney | that is what nvidia started breaking | 14:09 |
sean-k-mooney | the framework allows them to be dynamic | 14:09 |
dansmith | there are multiple pci devices per card? or you mean the vfs? | 14:09 |
bauzas | oh wait | 14:10 |
bauzas | [stack@lenovo-sr655-01 ~]$ cat /sys/class/mdev_bus/0000\:41\:01.0/mdev_supported_types/nvidia-472/description | 14:10 |
bauzas | num_heads=1, frl_config=60, framebuffer=20480M, max_resolution=4096x2400, max_instance=2 | 14:10 |
bauzas | we can get the number of VGPUs in the description | 14:10 |
sean-k-mooney | bauzas: the description is not a standard format | 14:10 |
bauzas | this is horribly driver-specific | 14:10 |
bauzas | yeah | 14:11 |
sean-k-mooney | yes it is, it's literally just a text field | 14:11 |
dansmith | but it's something | 14:11 |
sean-k-mooney | it's not something nova should read | 14:11 |
dansmith | it's something an operator could read in order to put the math in the config :) | 14:12 |
sean-k-mooney | grep . /sys/class/mdev_bus/0000\:41\:01.0/mdev_supported_types/nvidia-472/* | 14:12 |
bauzas | operators already read: | 14:12 |
bauzas | [stack@lenovo-sr655-01 ~]$ cat /sys/class/mdev_bus/0000\:41\:01.0/mdev_supported_types/nvidia-472/name | 14:12 |
bauzas | GRID A100-20C | 14:12 |
sean-k-mooney | bauzas: can you run ^ | 14:12 |
dansmith | I also wouldn't say nova *shouldn't* read it.. I think we could read and parse it as long as we don't fail if it has changed | 14:12 |
bauzas | and then Ctrl-F that string into https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#supported-gpus-grid-vgpu | 14:12 |
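Pulling the sysfs probes from this exchange together — a sketch an operator could run, assuming the card address from bauzas's paste:

```bash
TYPE=/sys/class/mdev_bus/0000:41:01.0/mdev_supported_types/nvidia-472

# how many more instances of this type the device can create right now
cat "$TYPE/available_instances"

# driver-specific description (note max_instance=2) and the marketing name
# to Ctrl-F in the nvidia docs
cat "$TYPE/description"
cat "$TYPE/name"

# or dump everything readable at once, as sean-k-mooney suggests
grep . "$TYPE"/*
```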
sean-k-mooney | dansmith: yes the operator can read it | 14:13 |
sean-k-mooney | dansmith: so would you be ok with adding max_instances to the mdev_type config section | 14:13 |
sean-k-mooney | or is that still too static | 14:14 |
bauzas | sean-k-mooney: if we go this way, this wouldn't be backportable, right? | 14:14 |
dansmith | well, I'm not sure that's a solution entirely, but I think making the operator tell us some of the math so we can do it better on the backend makes sense | 14:14 |
bauzas | I really tried to propose https://review.opendev.org/c/openstack/nova/+/899625 as a backportable mechanism | 14:14 |
dansmith | I'd need to hear the whole story before "being okay" with it, but that seems like a good direction to me | 14:14 |
bauzas | dansmith: finish your call and then we could do a gmeet | 14:15 |
dansmith | sure | 14:15 |
sean-k-mooney | bauzas: i dont really like that because it implies you're still doing the math incorrectly | 14:15 |
bauzas | I have the PTG summary to write | 14:15 |
dansmith | I don't love the "quick, delete the provider" approach either, FWIW | 14:15 |
bauzas | sean-k-mooney: it implies people were lazy enough to not explicitly tell us which PCI devices to allocate | 14:15 |
bauzas | dansmith: that's understandable, but then you're pro-statically defining a list of VFs or an integer counting allocatable VFs | 14:16 |
sean-k-mooney | bauzas: and we are not internally tracking which device the mdev is allocated from, meaning we are not mapping the allocations correctly in placement | 14:16 |
bauzas | (just clarifying the options) | 14:16 |
sean-k-mooney | removing the rp works if and only if we guarantee that the mdevs are created by packing the RPs | 14:17 |
bauzas | sean-k-mooney: if you want to have my frank opinion, I'm not a big fan of changing all the allocations we make | 14:17 |
dansmith | bauzas: I'm for doing better math, and it sounds like some hints for that math (in config) are necessary | 14:17 |
bauzas | sean-k-mooney: I very humbly think you're wrong on the fact we miscount | 14:17 |
bauzas | we count mdevs against RPs very correctly | 14:18 |
dansmith | still would need to understand the whole picture *and* how we migrate to it, impact to existing people, and whether or not it's worth it if mdevs are going away | 14:18 |
bauzas | the latter conditional is hard to guess | 14:18 |
bauzas | but those are fair assumptions | 14:19 |
bauzas | and I'm not particularly attached to the bugfix itself | 14:19 |
bauzas | but again, I have my wills : a simple approach not redesigning our placement tracking which would be backportable | 14:20 |
sean-k-mooney | bauzas: in sriov mode is available_instances always 1? | 14:20 |
sean-k-mooney | because for intel's implementation (which they never released) it was not | 14:20 |
bauzas | see the bug report, yeah, until all mdevs are created | 14:20 |
bauzas | I literally wrote that in the bug description | 14:20 |
sean-k-mooney | what i dont like about that is that it's a vendor-specific detail | 14:21 |
bauzas | hell yeah, hence why I wanted to make the fix agnostic | 14:21 |
bauzas | we're just doing some math in some method that says 'oh look, that inventory is now 0, but meh, I don't care about updating it' | 14:22 |
sean-k-mooney | well your solution does not work if the VFs have available_instances>1 initially | 14:22 |
bauzas | this is honestly to me a wrong approach, and at least a LOG warning should tell the operator something weird happened | 14:22 |
bauzas | beg your pardon ? | 14:22 |
sean-k-mooney | simple case | 14:23 |
sean-k-mooney | we have one mdev capable device | 14:23 |
sean-k-mooney | it has an mdev type that allows 2 allocations total | 14:23 |
sean-k-mooney | it creates n VFs | 14:23 |
sean-k-mooney | each vf reports available_instances=2 | 14:23 |
bauzas | (assuming you're not talking about the nvidia driver then) | 14:24 |
sean-k-mooney | i allocate one mdev from each of two different vfs | 14:24 |
sean-k-mooney | you can delete the other vf rps | 14:24 |
sean-k-mooney | but we would be misreporting the 2 remaining rps | 14:24 |
bauzas | are you talking of something that doesn't exist now but might as a future possibility ? | 14:25 |
sean-k-mooney | im talking generically | 14:25 |
bauzas | again, I don't understand your logic, but let's imagine your scenario for a sec | 14:26 |
bauzas | say I have 16 VFs, each of them with available_instances=2 | 14:26 |
bauzas | say now I create one mdev twice, each on a distinct VF | 14:26 |
bauzas | for 2 VFs, available_instances would be equal to 1 | 14:27 |
bauzas | two mdevs would exist | 14:27 |
bauzas | so the inventory wouldn't change for those | 14:27 |
sean-k-mooney | right but only 2 can be created | 14:27 |
bauzas | we would still count : total=2 | 14:27 |
sean-k-mooney | i guess what would happen is that when i allocate the first mdev | 14:27 |
sean-k-mooney | the available_instances on all vfs would go to 1 | 14:27 |
sean-k-mooney | and the second one it would go to 0 | 14:27 |
bauzas | okay, then it's the driver's responsibility to change available_instances to 0 for all the VFs | 14:28 |
bauzas | including the two ones we use | 14:28 |
sean-k-mooney | ya | 14:28 |
bauzas | total would be one for 2 RP inventories then, right? | 14:28 |
bauzas | total=1 I mean | 14:28 |
sean-k-mooney | so the only mdev hardware i have used that supports sriov vfs did not have that problem | 14:28 |
bauzas | anyway, this isn't possible with nvidia SR-IOV, all VFs *have* available_instances=1 | 14:29 |
bauzas | you lost me, honestly | 14:29 |
bauzas | if you're talking of T4s | 14:29 |
bauzas | what we had was basically non-SRIOV accounting | 14:30 |
sean-k-mooney | no im talking about hardware i used at intel and trying to think if it would break | 14:30 |
sean-k-mooney | i think the delete you are proposing would work | 14:30 |
bauzas | tbc, I've experienced a race condition | 14:30 |
sean-k-mooney | but i dont like the idea of us creating and deleting rps in general based on workload usage | 14:30 |
bauzas | since we update resource providers on a periodic task | 14:31 |
bauzas | I was able to create 3 VMs (and allocate three VGPU allocations) | 14:31 |
bauzas | that's why I added the claim stuff | 14:31 |
bauzas | so, see, again, I'm not particularly attached to do this | 14:32 |
bauzas | if we go with statically defining which VFs are going to be used, fair | 14:32 |
sean-k-mooney | bauzas: we normally run the update available resources call elsewhere | 14:32 |
bauzas | I have checked | 14:32 |
sean-k-mooney | bauzas: we explicitly do it from the compute manager in move claims | 14:32 |
bauzas | we do run the update *before* the spawn | 14:32 |
bauzas | not *after* :) | 14:32 |
dansmith | bauzas: agreed, deleting RPs as a result of allocation feels very wrong | 14:33 |
sean-k-mooney | dansmith: the only mitigation is we don't delete the mdev after the instance is moved/deleted | 14:33 |
sean-k-mooney | so provided the compute node is not restarted | 14:33 |
sean-k-mooney | we will do the delete once | 14:33 |
dansmith | could we set reserved=total instead of deleting maybe? | 14:33 |
sean-k-mooney | but the extra RPs will get recreated after a host reboot | 14:33 |
bauzas | dansmith: we would need to implement the reverse logic then | 14:34 |
sean-k-mooney | dansmith: reserved=total is what i was originally thinking but the issue is total is going to 0 | 14:34 |
sean-k-mooney | and total can't be 0 | 14:34 |
sean-k-mooney | so we would need to do reserved=total=1 | 14:34 |
bauzas | and when to unreserve ? | 14:34 |
bauzas | say I delete my mdev ? | 14:34 |
bauzas | I'm a stubborn operator that deletes the mdevs, y'know | 14:35 |
sean-k-mooney | or you do a host reboot | 14:35 |
bauzas | honestly, I like the config approach | 14:35 |
sean-k-mooney | so ya the reserved=total approach is more complex than delete | 14:35 |
dansmith | sean-k-mooney: right I meant reserved=old_total | 14:35 |
bauzas | if you specify the max VGPUs per type, then we could do this | 14:36 |
bauzas | I feel a disturbance in the force when I talk about deleting RPs | 14:36 |
sean-k-mooney | bauzas: it's slightly more complex than that | 14:36 |
bauzas | I used the term "strawman", I think it was the right wording | 14:37 |
sean-k-mooney | max mdevs needs nova to understand which physical device the mdevs are from | 14:37 |
bauzas | that is correct | 14:37 |
bauzas | we know the parent device | 14:37 |
bauzas | for a mdev | 14:37 |
bauzas | we already use this information to track | 14:38 |
bauzas | in nova today | 14:38 |
sean-k-mooney | well in the nvidia case the max mdev per type applies to the grandparent device | 14:38 |
bauzas | what we don't track *now* is the parent of an mdev-capable device | 14:38 |
bauzas | and we can't rely on libvirt, it gives you the PCI-E root PCI address (I tested :) ) | 14:38 |
bauzas | sean-k-mooney: that's what libvirt is giving us https://paste.opendev.org/show/b3MHPT8VoIOBWw1tim2o/ | 14:40 |
bauzas | we know the parent addr tho | 14:40 |
bauzas | <capability type='phys_function'> <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/> | 14:40 |
sean-k-mooney | it tells you the parent device | 14:41 |
bauzas | so you were right, we could use this | 14:41 |
sean-k-mooney | <parent>pci_0000_40_01_1</parent> | 14:41 |
bauzas | sean-k-mooney: no, you're wrong, that's not the PF address :) | 14:41 |
bauzas | [stack@lenovo-sr655-01 ~]$ lspci | grep 40:01.1 | 14:42 |
bauzas | 40:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge | 14:42 |
sean-k-mooney | ok can you give me access to that system so i can take a look | 14:42 |
bauzas | that's the PCI bridge address :) | 14:42 |
bauzas | but I told you, we have the parent PF address in the device XML | 14:42 |
sean-k-mooney | i originally said we could get the info from sysfs because i know how to look it up there | 14:42 |
sean-k-mooney | we can get it from libvirt but i dont remember the element off the top of my head | 14:43 |
sean-k-mooney | if we have it great | 14:43 |
sean-k-mooney | its <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/> | 14:43 |
sean-k-mooney | from the capability type='phys_function' YES | 14:44 |
bauzas | yeah, and thru sysfs, we have access to the physical function in /sys/class/mdev_bus/0000\:41\:00.4/physfn/ | 14:46 |
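A sketch of resolving a VF's parent PF both ways, reusing the VF address from bauzas's paste:

```bash
# sysfs: physfn is a symlink from the VF back to its parent PF
readlink -f /sys/class/mdev_bus/0000:41:00.4/physfn
# -> .../0000:41:00.0

# libvirt: the nodedev XML carries the same address in <capability type='phys_function'>
virsh nodedev-dumpxml pci_0000_41_00_4 | grep -A1 phys_function
```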
dansmith | okay done with my other meeting | 14:46 |
* dansmith reads back | 14:46 | |
bauzas | but that's not a trivial change | 14:46 |
bauzas | we need to implement some libvirt config change | 14:46 |
sean-k-mooney | we can and should use the info from libvirt since it's there | 14:46 |
sean-k-mooney | bauzas: we already have parsing logic for this in the pci tracker code in the libvirt driver | 14:46 |
bauzas | we already get information of the mdev capable devices thru libvirt | 14:46 |
bauzas | sean-k-mooney: oh you're right | 14:47 |
bauzas | dansmith: take time to digest | 14:47 |
bauzas | sounds less bugfix-y, more feature-y than my patch but I can do it | 14:48 |
bauzas | sean-k-mooney: fwiw, this is where we get information for a mdev cap device https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L8284 | 14:49 |
sean-k-mooney | bauzas: we should separate the backportable fix from the final fix | 14:49 |
sean-k-mooney | bauzas: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L1321 is the functionality that extracts the pci info from the node dev | 14:49 |
bauzas | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L8263 is where we get the capacity | 14:50 |
bauzas | so sounds easier than originally thought | 14:50 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L1351-L1363 | 14:50 |
sean-k-mooney | that's actually where we are parsing it | 14:50 |
dansmith | so max_mdevs_per_device works only because once you allocate one mdev type from a card, that card is locked into that type for the rest of the allocations right? | 14:50 |
bauzas | dansmith: you're correct, there is no heterogenous support | 14:50 |
artom | sean-k-mooney, we never updated the Neutron docs for the socket PCI NUMA affinity policy: https://docs.openstack.org/api-ref/network/v2/index.html#numa-affinity-policy | 14:50 |
sean-k-mooney | dansmith: in the simple case yes. but we also only support one mdev_type per card in nova | 14:50 |
bauzas | yup, there is no behavioural change in support | 14:51 |
sean-k-mooney | artom: we never updated the neutron extension for it | 14:51 |
sean-k-mooney | artom: it's not a docs issue; i believe we need an api change | 14:51 |
dansmith | sean-k-mooney: wait, is that the case even before we allocate? | 14:51 |
artom | sean-k-mooney, so we can't actually set it on the port at all? | 14:51 |
sean-k-mooney | dansmith: yes, which means we break if they allocate the wrong type manually | 14:52 |
bauzas | are we still talking of GPUs or PCI NUMA affinity now ? | 14:52 |
sean-k-mooney | dansmith: we have had customers do that | 14:52 |
artom | Anyways, I interrupted a convo with my dumb questions, carry on, I'll go look at the code | 14:52 |
sean-k-mooney | bauzas: im talking to artom about something else | 14:52 |
artom | bauzas, ignore me and PCI policies | 14:52 |
bauzas | artom: no worries, we're reaching some agreement soon I guess | 14:52 |
bauzas | dansmith: assigning an mdev type to a GPU is statically defined in nova | 14:53 |
sean-k-mooney | artom: https://github.com/openstack/neutron-lib/blob/master/neutron_lib/constants.py#L676-L681 | 14:53 |
dansmith | okay I'm confused | 14:53 |
dansmith | I thought ya'll said the opposite earlier | 14:53 |
bauzas | changing the value once you create mdevs is a breaking change that we don't support | 14:53 |
sean-k-mooney | artom: so the api will not accept it and we are missing the osc support too | 14:53 |
dansmith | right I understand that part | 14:54 |
dansmith | are we going to gmeet? | 14:54 |
bauzas | sure I can fire one | 14:54 |
dansmith | just saying, since artom showed up we are all muddy here :) | 14:54 |
artom | Yep, I barged in like a rhino on a sugar rush, apologies | 14:55 |
* dansmith blinks | 14:55 | |
bauzas | https://meet.google.com/hqy-bgjb-muk for people fancy discussing nvidia nightmare | 14:55 |
sean-k-mooney | dansmith: for what it's worth the patch bauzas proposed should work given the hardware that is available to buy today | 14:55 |
sean-k-mooney | it's just depending on some assumptions we have in nova and in how the current hardware works | 14:55 |
opendevreview | Will Szumski proposed openstack/nova stable/yoga: Ironic nodes with instance reserved in placement https://review.opendev.org/c/openstack/nova/+/867912 | 16:49 |
jovial | Is the nova-emulation job busted on stable/yoga? Was interested in progressing this patch: https://review.opendev.org/c/openstack/nova/+/898554 | 17:41 |
bauzas | sean-k-mooney: dansmith: so, if you don't mind, I'm gonna work on the VF filtering case with a max_instances config option | 17:42 |
bauzas | and then we could nitpick on what we want | 17:42 |
dansmith | yep | 17:44 |
bauzas | dansmith: fwiw, there is also https://review.opendev.org/c/openstack/nova/+/899406 | 17:44 |
bauzas | with that patch, pas-ha[m] is forcing operators to explicitly define all the device_addresses | 17:45 |
bauzas | but I personally think that adding a max_instances config option is painless for ops | 17:45 |
bauzas | so we could mix both | 17:45 |
bauzas | meaning that we would force ops to either explicitly specify the device addresses *or* give the number of VFs that nova needs to create | 17:46 |
bauzas | this sounds very consistent to me | 17:46 |
dansmith | max_instances seems much more useful to me so people don't have to encode pci device addresses in their config *just* to avoid the broken behavior | 17:50 |
bauzas | yeah so we will keep the existing behaviour but add another option for automatic discovery | 17:51 |
bauzas | dansmith: I definitely appreciate your help on that one | 17:51 |
bauzas | and I really want to give you review credits back on your own series | 17:52 |
sean-k-mooney | dansmith: i have some concerns with allowing config without pci addresses | 17:57 |
sean-k-mooney | i.e. if you allow some of your gpus to be used for pci passthrough and others for vgpu | 17:58 |
dansmith | right, that's a good reason *to* do it, but not a good reason to *require* it, IMHO | 17:58 |
sean-k-mooney | if you dont use device_address in that case we could pass through a device that was allowed in the pci devspec but not disallowed in the mdev config section | 17:58 |
sean-k-mooney | dansmith: i dont know, i always felt that it's a pseudo security issue to not explicitly list devices we can expose to a guest | 17:59 |
sean-k-mooney | i get that it sucks to need to be explicit | 17:59 |
sean-k-mooney | but that's the trade-off | 17:59 |
dansmith | but in the mdev case you're not just blindly exposing them, | 18:00 |
sean-k-mooney | to be clear we have this tradeoff in the pci tracker too | 18:00 |
dansmith | you've already opted into "all devices with this mdev type" right? | 18:00 |
sean-k-mooney | yep so it's the same as just putting vendor_id and product_id in the pci dev_spec | 18:01 |
sean-k-mooney | aka the pci whitelist | 18:01 |
dansmith | I don't think that's the same really | 18:01 |
opendevreview | Stephen Finucane proposed openstack/nova master: docs: Add documentation on server groups https://review.opendev.org/c/openstack/nova/+/899979 | 18:01 |
dansmith | well, if you do both vendor *and* product I guess it is | 18:01 |
sean-k-mooney | to me there is no difference between mdev type and (vendor_id and product_id) | 18:01 |
dansmith | but not like worrying that 8086 is going to allow your root bridge to go through or something | 18:01 |
dansmith | but I really don't see the problem with that level of specificity | 18:02 |
stephenfin | bauzas: melwitt: https://review.opendev.org/c/openstack/nova/+/899979 came up as a result of an OpenShift question. I figure you'd both be interested in that addition to our docs | 18:02 |
sean-k-mooney | dansmith: i think it's more an issue with how the config options are arranged than anything else | 18:03 |
stephenfin | I've also proposed a helper alias for OSC to make requesting server groups slightly nicer https://review.opendev.org/c/openstack/python-openstackclient/+/899975 | 18:03 |
sean-k-mooney | dansmith: we split the mdevs into multiple dynamic config sections to influence this | 18:03 |
bauzas | sean-k-mooney: if we can ease your pain, we can document the fact that max_instances automatically passes *all* GPUs to nova | 18:03 |
sean-k-mooney | which to me makes it harder to know if a device will be allowed or not | 18:03 |
bauzas | sean-k-mooney: and we could mention using device_addresses as an alternative for people wary of passing all the GPUs | 18:04 |
sean-k-mooney | bauzas: just make -1 or None the default and use it as a sentinel to mean all. | 18:04 |
bauzas | lemme provide the patch first | 18:04 |
sean-k-mooney | bauzas: well i think we should be recommending device_address as the way people should do it in general | 18:05 |
dansmith | sean-k-mooney: that doesn't work right? that gives us the wrong behavior we have now | 18:05 |
sean-k-mooney | dansmith: it does, but i think bauzas was going to have it be wrong out of the box too | 18:05 |
sean-k-mooney | unless i misunderstood | 18:06 |
bauzas | tbc I was documenting in the config doc helper that max_instances should be set to the number of mediated devices a device can create | 18:06 |
sean-k-mooney | but were you going to recommend using device_addresses as what people should do by default? | 18:07 |
bauzas | we could | 18:07 |
sean-k-mooney | personally i would prefer requiring you to do "device_address=*" to get the use-any-device-with-this-mdev-type behavior, by the way | 18:08 |
sean-k-mooney | or saying you must set | 18:08 |
sean-k-mooney | either device_address or max_instances | 18:08 |
bauzas | yeah those options are conflicting, for sure | 18:10 |
bauzas | I can mark those this way with oslo.config | 18:10 |
sean-k-mooney | but can you also make the mutually exclusive group required | 18:11 |
sean-k-mooney | and the dynamic config section required | 18:11 |
bauzas | we're on the same page | 18:12 |
sean-k-mooney | well there is one thing im not sure you and i agree on | 18:12 |
bauzas | [mdev_nvidia-XXX] would become mandatory if we merge https://review.opendev.org/c/openstack/nova/+/899406 | 18:12 |
sean-k-mooney | ah | 18:12 |
sean-k-mooney | ya that | 18:12 |
bauzas | what's inside this section is mutually exclusive | 18:12 |
bauzas | either device_addresses or max_instances | 18:13 |
bauzas | wfy ? | 18:13 |
sean-k-mooney | so you're ok with requiring that if an mdev type is in enabled_mdev_types | 18:13 |
sean-k-mooney | we must have the config section | 18:13 |
bauzas | that's what we agreed at the PTG | 18:13 |
sean-k-mooney | [mdev_<mdev name>] | 18:13 |
sean-k-mooney | ok then, ya, with that condition i think im good with your proposal | 18:13 |
bauzas | in order to ease people's lives, I'm just gonna add max_instances as a mutually exclusive option | 18:13 |
bauzas | if you're SR-IOV based, we'll pick N VFs for you | 18:14 |
bauzas | if you're not SRIOV, we'll just pick N devices for you | 18:14 |
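A sketch of the nova.conf shape being discussed; max_instances is the option proposed in this conversation, not something nova supports today, and the type and addresses are illustrative:

```ini
[devices]
enabled_mdev_types = nvidia-472

[mdev_nvidia-472]
# either list the VFs/devices explicitly ...
#device_addresses = 0000:41:00.4,0000:41:00.5
# ... or (proposed; mutually exclusive with device_addresses) let nova
# pick that many devices for you
max_instances = 2
```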
melwitt | stephenfin: cool, thanks. I'll check it out | 18:15 |
sean-k-mooney | bauzas: we could just make max_instances apply to the sriov case | 18:15 |
sean-k-mooney | since the non sriov case just works | 18:16 |
bauzas | with the non-sriov gpus, you need to specify all the cards | 18:16 |
sean-k-mooney | in the device list | 18:16 |
bauzas | giving them a way to say "I don't care" | 18:16 |
bauzas | seems good to me | 18:16 |
sean-k-mooney | ya so just add * to device_addresses | 18:17 |
sean-k-mooney | well im flexible on how we do that | 18:17 |
bauzas | we need implementation patches and then we could do anything we want | 18:18 |
sean-k-mooney | im ok with having a way to say "expose all devices of this type", i just would like it to be a little more explicit than it is today | 18:18 |
bauzas | btw. I'm on PTO tomorrow | 18:18 |
sean-k-mooney | right, so if we want to defer the details of how we do that for now | 18:18 |
sean-k-mooney | that works for me | 18:18 |
sean-k-mooney | ya im on pto tomorrow and monday | 18:18 |
bauzas | cool | 18:18 |
bauzas | I'll see if I can draft something before I leave | 18:19 |
bauzas | now that I have a machine, this is waaaay easier than before | 18:19 |
sean-k-mooney | yep hardware both sucks and makes life simpler | 18:19 |
bauzas | sean-k-mooney: I made the wrong assumption, for non-sriov, max_instances is indeed meaningless | 18:36 |
sean-k-mooney | cool | 18:36 |
bauzas | so yeah we maybe need a wildcard | 18:38 |
bauzas | that said, we captured the opposite https://etherpad.opendev.org/p/nova-caracal-ptg#L669 | 18:38 |
bauzas | but I recently discovered that every single day is full of adventures and news when it comes to nvidia | 18:38 |
opendevreview | sean mooney proposed openstack/nova master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/nova/+/899753 | 19:44 |
pas-ha[m] | <bauzas> "dansmith: fwiw, there is also..." <- bauzas: I don't think I force anything with this patch, it should be 100% backward compatible - if no explicit devices are defined together with the single enabled mdev type, everything should work as before. | 20:23 |
pas-ha[m] | at least that was my intention, not sure I actually tested for that :-) | 20:23 |
sean-k-mooney | with a single mdev type, but if you had 2 or more then the config section would be required for each, correct? | 20:24 |
* sean-k-mooney has not looked at the patch in a few hours and lost context | 20:25 | |
sean-k-mooney | hum so https://review.opendev.org/c/openstack/nova/+/899406/1/nova/virt/libvirt/driver.py#7933 | 20:26 |
sean-k-mooney | is just always registering the config options | 20:26 |
sean-k-mooney | but not necessarily enforcing it | 20:26 |
sean-k-mooney | because you still default it here https://review.opendev.org/c/openstack/nova/+/899406/1/nova/virt/libvirt/driver.py#7958 | 20:26 |
sean-k-mooney | pas-ha[m]: so ya i think your version is technically still backward compatible | 20:27 |
sean-k-mooney | pas-ha[m]: the fact your change has no unit test changes means either it's backward compatible or we have no test coverage for this | 20:28 |
pas-ha[m] | sean-k-mooney: I bet the latter :-D | 20:28 |
sean-k-mooney | unfortunately you might be right | 20:29 |