*** vishalmanchanda has quit IRC | 00:05 | |
*** k_mouza has joined #openstack-nova | 00:05 | |
*** k_mouza has quit IRC | 00:10 | |
*** brinzhang has joined #openstack-nova | 00:22 | |
*** tosky has quit IRC | 00:36 | |
*** sean-k-mooney1 has joined #openstack-nova | 00:40 | |
*** sean-k-mooney has quit IRC | 00:41 | |
*** martinkennelly has quit IRC | 00:41 | |
*** macz_ has joined #openstack-nova | 01:04 | |
*** sapd1 has quit IRC | 01:05 | |
*** macz_ has quit IRC | 01:09 | |
*** mlavalle has quit IRC | 01:27 | |
*** iurygregory_ has joined #openstack-nova | 01:30 | |
*** dviroel_ has joined #openstack-nova | 01:31 | |
*** iurygregory has quit IRC | 01:38 | |
*** dviroel has quit IRC | 01:38 | |
*** ttx has quit IRC | 01:38 | |
*** spatel_ has joined #openstack-nova | 01:39 | |
*** dviroel_ is now known as dviroel | 01:39 | |
*** macz_ has joined #openstack-nova | 01:40 | |
*** macz_ has quit IRC | 01:45 | |
*** ttx has joined #openstack-nova | 01:47 | |
*** spatel_ has quit IRC | 02:10 | |
*** hamalq has quit IRC | 02:14 | |
*** macz_ has joined #openstack-nova | 02:22 | |
*** macz_ has quit IRC | 02:27 | |
openstackgerrit | Jinsheng Zhang proposed openstack/nova-specs master: Add nova support ironic instance port group network metadata spec https://review.opendev.org/c/openstack/nova-specs/+/779644 | 02:33 |
---|---|---|
*** spatel_ has joined #openstack-nova | 03:05 | |
*** jamesdenton has joined #openstack-nova | 03:42 | |
*** jamesdenton has quit IRC | 03:50 | |
*** jamesdenton has joined #openstack-nova | 03:51 | |
*** k_mouza has joined #openstack-nova | 03:56 | |
*** gyee has quit IRC | 03:58 | |
*** k_mouza has quit IRC | 04:00 | |
*** links has joined #openstack-nova | 04:08 | |
*** macz_ has joined #openstack-nova | 04:23 | |
*** macz_ has quit IRC | 04:27 | |
*** vishalmanchanda has joined #openstack-nova | 04:59 | |
*** ratailor has joined #openstack-nova | 05:09 | |
*** spatel_ has quit IRC | 05:43 | |
*** macz_ has joined #openstack-nova | 05:50 | |
*** macz_ has quit IRC | 05:55 | |
*** jamesdenton has quit IRC | 05:57 | |
*** jamesden_ has joined #openstack-nova | 05:57 | |
*** sapd1_x has quit IRC | 06:13 | |
*** whoami-rajat has joined #openstack-nova | 06:23 | |
*** LinPeiWen has quit IRC | 07:13 | |
*** ralonsoh has joined #openstack-nova | 07:21 | |
*** khomesh24 has joined #openstack-nova | 07:29 | |
*** LinPeiWen has joined #openstack-nova | 07:32 | |
*** slaweq has quit IRC | 07:42 | |
*** ociuhandu has joined #openstack-nova | 07:45 | |
*** slaweq has joined #openstack-nova | 07:46 | |
*** dklyle has quit IRC | 07:48 | |
*** ociuhandu has quit IRC | 07:51 | |
*** slaweq has quit IRC | 07:52 | |
*** slaweq has joined #openstack-nova | 07:54 | |
openstackgerrit | Merged openstack/nova stable/ussuri: Default user_id when not specified in check_num_instances_quota https://review.opendev.org/c/openstack/nova/+/777217 | 07:54 |
openstackgerrit | Merged openstack/nova stable/ussuri: Add regression test for bug 1914777 https://review.opendev.org/c/openstack/nova/+/777218 | 07:55 |
openstack | bug 1914777 in OpenStack Compute (nova) victoria "Possible race condition between n-cpu and n-api when deleting a building instance" [High,In progress] https://launchpad.net/bugs/1914777 - Assigned to melanie witt (melwitt) | 07:55 |
openstackgerrit | Merged openstack/nova stable/ussuri: Handle instance = None in _local_delete_cleanup https://review.opendev.org/c/openstack/nova/+/777219 | 08:02 |
*** ociuhandu has joined #openstack-nova | 08:05 | |
*** lpetrut has joined #openstack-nova | 08:06 | |
*** xinranwang has joined #openstack-nova | 08:08 | |
gibi | good morning | 08:09 |
*** tesseract has joined #openstack-nova | 08:13 | |
*** ociuhandu has quit IRC | 08:16 | |
*** ociuhandu has joined #openstack-nova | 08:17 | |
*** ociuhandu has quit IRC | 08:17 | |
*** ociuhandu has joined #openstack-nova | 08:18 | |
*** brinzhang_ has joined #openstack-nova | 08:20 | |
*** brinzhang has quit IRC | 08:23 | |
*** ociuhandu has quit IRC | 08:23 | |
yonglihe | good morning | 08:27 |
*** rpittau|afk is now known as rpittau | 08:27 | |
*** ociuhandu has joined #openstack-nova | 08:28 | |
*** tesseract has quit IRC | 08:29 | |
*** tesseract has joined #openstack-nova | 08:31 | |
openstackgerrit | Yongli He proposed openstack/nova master: Smartnic support - cyborg drive https://review.opendev.org/c/openstack/nova/+/771362 | 08:40 |
openstackgerrit | Yongli He proposed openstack/nova master: smartnic support - new vnic type https://review.opendev.org/c/openstack/nova/+/771363 | 08:40 |
openstackgerrit | Yongli He proposed openstack/nova master: smartnic support https://review.opendev.org/c/openstack/nova/+/758944 | 08:40 |
*** tosky has joined #openstack-nova | 08:46 | |
*** MrClayPole has quit IRC | 08:47 | |
openstackgerrit | Adit Sarfaty proposed openstack/nova master: Retry on vmware create_vm when it fails https://review.opendev.org/c/openstack/nova/+/764586 | 08:47 |
*** MrClayPole has joined #openstack-nova | 08:48 | |
*** andrewbonney has joined #openstack-nova | 09:03 | |
*** derekh has joined #openstack-nova | 09:09 | |
*** lucasagomes has joined #openstack-nova | 09:11 | |
*** sapd1 has joined #openstack-nova | 09:16 | |
bauzas | good morning | 09:20 |
* bauzas wonders whether the gate is impacted by the OVH SBG fab issue | 09:21 | |
yonglihe | sean-k-mooney: refer to patch comments, really hope that address those concerns. for first patch commit message changed thanks. | 09:25 |
*** LinPeiWen has quit IRC | 09:26 | |
gibi | bauzas: jobs are runnig this morning as far as I see | 09:31 |
bauzas | gibi: I haven't seen jobs running in any sbg 1 to 4 dcs | 09:36 |
bauzas | but /me gives hugs to OVH folks around | 09:37 |
bauzas | my backups are impacted, but meh ;) | 09:37 |
*** martinkennelly has joined #openstack-nova | 09:47 | |
*** rcernin has quit IRC | 09:54 | |
*** jangutter_ has joined #openstack-nova | 09:55 | |
*** jangutter has quit IRC | 09:59 | |
*** nightmare_unreal has joined #openstack-nova | 10:12 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Remove dead error handling code https://review.opendev.org/c/openstack/nova/+/779704 | 10:15 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP zuul: Replace grenade and nova-grenade-multinode with grenade-multinode https://review.opendev.org/c/openstack/nova/+/778885 | 10:19 |
*** swp20 has joined #openstack-nova | 10:21 | |
*** ociuhandu has quit IRC | 10:23 | |
*** ociuhandu has joined #openstack-nova | 10:24 | |
*** ociuhandu has quit IRC | 10:31 | |
*** ociuhandu has joined #openstack-nova | 10:31 | |
stephenfin | kashyap: Can you take a look at https://review.opendev.org/c/openstack/nova/+/779304/ today, please? | 10:32 |
stephenfin | lyarwood: Seen this before? https://zuul.opendev.org/t/openstack/build/a19ff22927a045df9f8982d7ef678238/log/controller/logs/screen-n-cpu.txt#16367 | 10:36 |
*** masterpe has quit IRC | 10:36 | |
stephenfin | WARNING: Failed to get udev device handler for device /dev/sda1.\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for device(8,15)\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for device(8,15)\n ... | 10:36 |
stephenfin | artom: Look what I see in some build logs -> This host appears to have multiple sockets per NUMA node. The `socket` PCI NUMA affinity will not be supported. | 10:38 |
stephenfin | artom: from https://zuul.opendev.org/t/openstack/build/a19ff22927a045df9f8982d7ef678238/log/controller/logs/screen-n-cpu.txt | 10:38 |
*** jangutter has joined #openstack-nova | 10:42 | |
lyarwood | stephenfin: yeah ./me finds the bug | 10:42 |
lyarwood | stephenfin: https://bugs.launchpad.net/cinder/+bug/1901783 | 10:43 |
openstack | Launchpad bug 1901783 in Cinder "volume delete fails because cinder-rootwrap lvs fails with exit code 139" [Undecided,Triaged] | 10:43 |
stephenfin | oh, fun. This one is happening in nova (via privsep) | 10:44 |
lyarwood | oh really? | 10:44 |
lyarwood | sorry I didn't actually click through | 10:44 |
*** LinPeiWen has joined #openstack-nova | 10:44 | |
* lyarwood looks | 10:44 | |
stephenfin | all good | 10:44 |
stephenfin | https://a59f59e90e7507f14599-2cecd17ff9ea5f4d02c12f6d5fc5aedd.ssl.cf1.rackcdn.com/776681/7/check/nova-lvm/a19ff22/controller/logs/screen-n-cpu.txt | 10:44 |
stephenfin | search for 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43 | 10:44 |
lyarwood | ah right it's the nova-lvm job | 10:44 |
lyarwood | that makes sense | 10:44 |
stephenfin | (the instance) | 10:44 |
stephenfin | yup, agreed | 10:45 |
*** jangutter_ has quit IRC | 10:45 | |
kashyap | stephenfin: Mornin, I have it open; will definitely look today. (Just finishing something more time-constrained.) | 10:47 |
stephenfin | kashyap: Cool. Just remember feature freeze is tomorrow | 10:47 |
stephenfin | so this is also pretty time constrained | 10:47 |
kashyap | stephenfin: Oh, yeah. Darn | 10:48 |
kashyap | stephenfin: I see that you've read the messy/complicated situation here: https://bugzilla.redhat.com/show_bug.cgi?id=1929357#c5 | 10:49 |
openstack | bugzilla.redhat.com bug 1929357 in libvirt "UEFI: Provide a way how to configure different combinations of secure boot enabled/disabled and keys enrolled/not enrolled" [Medium,New] - Assigned to phrdina | 10:49 |
*** k_mouza has joined #openstack-nova | 10:49 | |
stephenfin | yup | 10:49 |
kashyap | Sigh, so we have to do custom-parsing still | 10:50 |
stephenfin | the libvirt auto-configure feature isn't ready for primetime so I've side-stepped it and gone straight to the (QEMU) source | 10:50 |
stephenfin | Yes, but I think what we've done is definitely better than the previous hardcoded list | 10:50 |
stephenfin | so it's an improvement, just not as much of an improvement as we'd hoped for | 10:51 |
kashyap | stephenfin: What do you mean "straight to the source"? You mean using directly the firmware descriptor files shipped by OVMF/EDK2? | 10:51 |
stephenfin | yes | 10:51 |
kashyap | stephenfin: Agree, on removing the hard-coded list | 10:51 |
kashyap | stephenfin: A detail: each distro (depending on the distro) ships them as part of either EDK2 or OVMF package. I did it for Fedora; and filed bugs for Debian and Ubuntu: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932269 | 10:52 |
openstack | Debian bug 932269 in ovmf "Ship the firmware "descriptor files" as part of the 'ovmf' package" [Normal,Fixed] | 10:52 |
stephenfin | I checked. Ubuntu 20.04 has them, as does Debian now | 10:52 |
kashyap | Excellent; by now they should | 10:53 |
stephenfin | And I found references to them for SUSE | 10:53 |
stephenfin | All in the standard locations too, thankfully | 10:53 |
kashyap | stephenfin: Yes, I've worked w/ SUSE folks too in the past to get it worked in | 10:53 |
stephenfin | bauzas: Friendly reminder that the UEFI secure boot series is all green now, save for errant gate failures, and ready for your attention | 10:58 |
artom | stephenfin, is that... bad? | 11:00 |
stephenfin | It's unexpected, no? | 11:01 |
artom | Well, from the nodepool VMs, where we would expect everything to be in one node and socket, yeah | 11:01 |
stephenfin | Though I guess the gate jobs are VMs and therefore not really representative of real hardware | 11:01 |
*** masterpe has joined #openstack-nova | 11:01 | |
artom | stephenfin, well, see here: https://zuul.opendev.org/t/openstack/build/a19ff22927a045df9f8982d7ef678238/log/controller/logs/screen-n-cpu.txt#892-901 | 11:03 |
artom | It put each CPU in its own socket | 11:03 |
artom | So my expectation is clearly wrong :) | 11:03 |
stephenfin | \o/ | 11:04 |
*** jangutter has quit IRC | 11:07 | |
*** dtantsur|afk is now known as dtantsur | 11:07 | |
*** jangutter has joined #openstack-nova | 11:07 | |
*** belmoreira has joined #openstack-nova | 11:08 | |
*** k_mouza has quit IRC | 11:15 | |
*** k_mouza has joined #openstack-nova | 11:15 | |
*** jangutter_ has joined #openstack-nova | 11:18 | |
*** macz_ has joined #openstack-nova | 11:19 | |
*** sapd1 has quit IRC | 11:21 | |
*** jangutter has quit IRC | 11:22 | |
*** macz_ has quit IRC | 11:24 | |
bauzas | stephenfin: yup, I was looking at other series, but I'll review them again this afternoon | 11:30 |
*** iurygregory_ is now known as iurygregory | 11:33 | |
yonglihe | sean-k-mooney: I try to work out your concern and that bug is about LM/resize, that's out of scope. refer to comments, there are some idea about that. | 11:40 |
*** jangutter has joined #openstack-nova | 11:43 | |
yonglihe | sean-k-mooney: I did not dig into that bug, just say: It's seems that's data out of sync somehow, change how to store the data won't help? (fix me) | 11:43 |
*** jangutter_ has quit IRC | 11:46 | |
*** mgariepy has quit IRC | 11:49 | |
*** derekh has quit IRC | 11:52 | |
*** derekh has joined #openstack-nova | 11:54 | |
alex_xu | sean-k-mooney1: yonglihe I looked those two bugs, hope I understand that correctly https://review.opendev.org/c/openstack/nova/+/771363/11/nova/objects/network_request.py#42 | 11:54 |
*** macz_ has joined #openstack-nova | 12:00 | |
*** macz_ has quit IRC | 12:06 | |
sean-k-mooney1 | artom: nova default to creating 1 socket per vcpu | 12:07 |
sean-k-mooney1 | so in the gate vms you will have 8 socket and 1 numa node generated implcitly by qemu/libvirt | 12:08 |
sean-k-mooney1 | even for non numa vms | 12:08 |
sean-k-mooney1 | stephenfin: by the way passing the function form the driver.py to host.py goes away in a later patch where i actully move the fucntions too host.py | 12:09 |
stephenfin | sean-k-mooney1: could you do that first? | 12:09 |
sean-k-mooney1 | we dont actully use the function that take the callback until that patch | 12:09 |
* stephenfin goes for lunch - bbiab | 12:09 | |
sean-k-mooney1 | maybe but it will be a bit of a pain to split back out the patches | 12:09 |
sean-k-mooney1 | i can do that it more a question of time | 12:10 |
sean-k-mooney1 | artom: would you be ok removing the numa toplogy object form the pci tracker and adding in a numa to socket map instead | 12:11 |
*** ociuhandu has quit IRC | 12:12 | |
sean-k-mooney1 | although if we have to account for the many socket to 1 numa node of the gate i guess we need to retink that abit | 12:12 |
*** ociuhandu has joined #openstack-nova | 12:12 | |
sean-k-mooney1 | in the gate all pci devices will be connected to the first socket | 12:12 |
*** ociuhandu has quit IRC | 12:12 | |
sean-k-mooney1 | we only have 1 pcie root | 12:12 |
sean-k-mooney1 | and there is also only 1 numa node by default | 12:13 |
*** ociuhandu has joined #openstack-nova | 12:13 | |
sean-k-mooney1 | a hack would be to have that numa to socket dict be int->list[int] and for you to just use the first socket form the list | 12:13 |
sean-k-mooney1 | we did discuss that at one point | 12:14 |
*** sean-k-mooney1 is now known as sean-k-mooney | 12:16 | |
sean-k-mooney | alex_xu: its the same bug just upstream and downstream version | 12:17 |
sean-k-mooney | alex_xu: the bug is not related to claiming the pci devics | 12:17 |
sean-k-mooney | alex_xu: we correctly claim the pci device the issue is we dont update the neutron port profile correctrly as part of unshelve | 12:18 |
*** ociuhandu has quit IRC | 12:18 | |
sean-k-mooney | alex_xu: and we generate the xml with the pci address stored in the profile | 12:18 |
sean-k-mooney | so we use the pci address form its previous host | 12:18 |
sean-k-mooney | so the vm we are unsleving is the one using the incorrect device | 12:19 |
alex_xu | sean-k-mooney: yes, so for cyborg, we should update the port with new arq uuid | 12:19 |
sean-k-mooney | yes | 12:19 |
alex_xu | so I'm thinking we need the network_request.arq_uuid again to pass that new arq from conductor to compute | 12:20 |
sean-k-mooney | and we should update the profile with the claimed pci address for sriov too we just dont today | 12:20 |
alex_xu | yea | 12:20 |
sean-k-mooney | alex_xu: well network_request is not stored in the db right | 12:20 |
alex_xu | it is fine, we needn't the old arq | 12:20 |
sean-k-mooney | im uncomfortable with having the neutron port binding be the only place we persit that | 12:20 |
sean-k-mooney | we need it for things like hard reboot no? | 12:21 |
alex_xu | we needn't, I check that also, we build network_info from the neutron directly | 12:21 |
sean-k-mooney | we do and store it in the network info cache | 12:21 |
sean-k-mooney | my point is i dont think we should be using the value sotre in neutron to generate the xml | 12:22 |
sean-k-mooney | we should be tryign to do it without using the network info cache | 12:22 |
sean-k-mooney | so that user cant acidentally alter the port profile in a way that can brake nova | 12:22 |
alex_xu | but all the action is using that network info, even for boot instance | 12:22 |
sean-k-mooney | right and that is a design flaw | 12:23 |
sean-k-mooney | that info can be currpted | 12:23 |
sean-k-mooney | i have seen it happen in customer deployment many times | 12:23 |
alex_xu | I'm thinking if the user done that, whether it means there already have other thing goes wrong | 12:24 |
sean-k-mooney | generally it can be recoverd but we should really treat that as write only | 12:24 |
sean-k-mooney | its possible yes | 12:24 |
alex_xu | so binding profile is only for neutron use, but nova should have it own copy | 12:25 |
alex_xu | the hard thing is if the nova copy is different with neutron one, what should nova do | 12:25 |
sean-k-mooney | binding profile was intended to be used to pass info from nova to neutorn in one direction | 12:25 |
sean-k-mooney | alex_xu: use the nova one | 12:26 |
artom | sean-k-mooney, yeah, that is a good idea | 12:26 |
alex_xu | ok | 12:26 |
alex_xu | then correct the neutron info | 12:26 |
artom | And to internalize the function that generates that map into the NUMATopology object itself | 12:26 |
sean-k-mooney | alex_xu: https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/portbindings.py#L31-L34 | 12:26 |
artom | Not sure we have enough time before FF though | 12:26 |
sean-k-mooney | alex_xu: correct field that differ ideally | 12:26 |
sean-k-mooney | but we cant do that based on the info cache we have today | 12:27 |
artom | sean-k-mooney, also, if those _filter functions are really meant to be purely functional, then @classmethod is not the way to do it - you put them outside the class entirely :) | 12:27 |
artom | I'll see what I can hack up today | 12:27 |
sean-k-mooney | artom: why class method are prefectly valid ways to do that | 12:27 |
alex_xu | for yongli's patch, should we block it for that issue? | 12:28 |
sean-k-mooney | you coudl put them outside the class entirely or even make them static | 12:28 |
alex_xu | I feel it is hard for yongli to fix that, since it is boarder issue | 12:28 |
sean-k-mooney | alex_xu: am no we could proceed and clean this up next cycle | 12:28 |
alex_xu | yea | 12:28 |
*** jraju__ has joined #openstack-nova | 12:28 | |
sean-k-mooney | alex_xu: im just uncomfortable realying on that | 12:28 |
sean-k-mooney | i was hoping we could ask cybrog for the arq using the neutorn port uuid | 12:29 |
sean-k-mooney | and have cyborg be the souce of truth | 12:29 |
*** links has quit IRC | 12:29 | |
alex_xu | but nothing we can do it now, I don't think system metadata is good place, since we said before, we shouldn't anything to it, since it without version control | 12:29 |
sean-k-mooney | alex_xu: actully i thought the preferece was to use it for things linke this over adding more tables | 12:30 |
sean-k-mooney | alex_xu: we can have version contol by using ovos | 12:30 |
sean-k-mooney | it allow use to store addtional info without db migrations | 12:30 |
sean-k-mooney | but ok lets put this issue assign for now and maybe add a todo | 12:31 |
sean-k-mooney | alex_xu: can we maybe add a test to vaoidate unshelve and ensure that the port profile is updated with the new arq? | 12:31 |
*** ratailor has quit IRC | 12:31 | |
alex_xu | sean-k-mooney: I don't yongli implement that, I remember he said unshelve is out of scope for the spec | 12:32 |
sean-k-mooney | didnt we add support for shelve with cyborg recently | 12:32 |
sean-k-mooney | if its not supproted with neutron port then we would need to block it yes | 12:33 |
alex_xu | I think when yongli's spec merged, the shelve with cyborg doesn't merge yet | 12:33 |
alex_xu | I think the missing part is update the new arq uuid | 12:34 |
sean-k-mooney | yep | 12:34 |
sean-k-mooney | so we allow shelve now https://github.com/openstack/nova/blob/master/nova/compute/api.py#L4169 | 12:34 |
sean-k-mooney | if the compute service version is new enough | 12:35 |
sean-k-mooney | alex_xu: ok what we can do is proceed as it is | 12:36 |
sean-k-mooney | and when i submit the patch to block shelve for sriov instance with a workaoutnd config option i can include cyborg too | 12:37 |
sean-k-mooney | infact it will do it for both automaticlly | 12:37 |
alex_xu | ok, so we want a explictly block for shelve instance? | 12:37 |
sean-k-mooney | i was just going to check if any of the port vnic types were in VNIC_TYPES_DIRECT_PASSTHROUGH which the cyborg ones are | 12:38 |
* alex_xu is trying to figure out what cyborg unshelve do | 12:38 | |
sean-k-mooney | yes becuse a vm form one tenatn can break another tenat by unshelving | 12:38 |
sean-k-mooney | so like numa migration i want to block shelve for all instance using neutron sriov port with a [workaround]/allow_unsafe_shelve to renable it and backprot that | 12:39 |
sean-k-mooney | then i plan to fix it as a bug | 12:39 |
sean-k-mooney | so we can fix shleve for both sriov and cybrog by ensuring the port profile si updated for the new arq | 12:40 |
sean-k-mooney | so i planned to return a 403 permission deined | 12:40 |
sean-k-mooney | unless you had [workaround]/allow_unsafe_shelve=true or you had a fixed verion of nova | 12:41 |
sean-k-mooney | alex_xu: does that sound resaonable? | 12:41 |
sean-k-mooney | gmann: ^ that should be ok form the api point of view yes? | 12:41 |
sean-k-mooney | gmann: the exisitng issue for sriov ports is a long term bug so i dont want to have to make an api micorverion bump for that | 12:42 |
sean-k-mooney | 403 can be retruned form this already and i think that should be valid? | 12:42 |
alex_xu | sean-k-mooney: or we have a check whether instance have a port with device profile? then we needn't an workaround option | 12:43 |
sean-k-mooney | gmann: the current block we have uses https://github.com/openstack/nova/blob/ab07507e5cfce6232fef373d07ff92ea704541da/nova/exception.py#L158-L159 | 12:43 |
*** ociuhandu has joined #openstack-nova | 12:44 | |
sean-k-mooney | alex_xu: i want the workaound for people that are using this with sriov today | 12:44 |
sean-k-mooney | alex_xu: i know that it currenlty does the wrong thing and you can end up using the wrong device | 12:44 |
sean-k-mooney | which is why i want to blcok it | 12:44 |
sean-k-mooney | but we have had custoemr manually update the port profile and hard reboot the instace to match the claimed device | 12:45 |
sean-k-mooney | its a horable hack but they really wanted to keep shelve | 12:45 |
alex_xu | sean-k-mooney: ok, got it | 12:45 |
sean-k-mooney | so if they want to shoot them selves in the foot then that is what the workaroudn is for | 12:45 |
alex_xu | ok | 12:46 |
sean-k-mooney | alex_xu: if your ok with handeling the shelve issue this way i guess im ok with using the requeted_networks for now | 12:47 |
sean-k-mooney | and the port profile | 12:47 |
alex_xu | sean-k-mooney: yes, I'm ok with that | 12:47 |
alex_xu | then we need to wait yonglihe back, he is going to take his boy back from school :) | 12:47 |
*** ociuhandu has quit IRC | 12:48 | |
sean-k-mooney | ill go link this converation to the gerrit review | 12:49 |
alex_xu | cool | 12:49 |
sean-k-mooney | im more or less ok with the patch other then that but still want to see if gibi is ok with defining the constanct before they are need | 12:50 |
*** ociuhandu has joined #openstack-nova | 12:51 | |
alex_xu | yea, definitely need gibi's feedback | 12:51 |
sean-k-mooney | we may also need some extra test coverage for the vnic_types i havent fully review the last patch | 12:51 |
*** ociuhandu has quit IRC | 13:02 | |
*** ociuhandu has joined #openstack-nova | 13:05 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP zuul: Replace grenade and nova-grenade-multinode with grenade-multinode https://review.opendev.org/c/openstack/nova/+/778885 | 13:07 |
*** nicolasbock has quit IRC | 13:11 | |
*** nicolasbock has joined #openstack-nova | 13:11 | |
*** mgariepy has joined #openstack-nova | 13:12 | |
sean-k-mooney | artom: so since your pci socket policy patch conflit with both my port numa patch and vdpa series we proably should set them up in a chain | 13:18 |
*** ociuhandu has quit IRC | 13:19 | |
artom | sean-k-mooney, sure. Which one of yours is more likely to land at this point? | 13:19 |
artom | I guess put that one on top of my socket stuff? | 13:19 |
sean-k-mooney | vdpa obviously goes at the end since that is the lesast likely to be ready | 13:19 |
artom | Yeah, what I thought :) | 13:19 |
sean-k-mooney | ya the port numa one | 13:19 |
*** ociuhandu has joined #openstack-nova | 13:20 | |
artom | I'm in the process of adding some patches to my series, I'll let you know when it's pushed? | 13:20 |
sean-k-mooney | would you be oke putting your patch on top of the port numa one? if i rebase it on top of master to pic up the patches of your that already merged | 13:20 |
sean-k-mooney | hum i guess yours already had 2 +2s buyt you need to change it for the multi socket thing | 13:21 |
sean-k-mooney | i dont think there are outstanding quesion left for mine. | 13:22 |
artom | sean-k-mooney, mind linking your series? | 13:22 |
sean-k-mooney | its one patch or the prot numa polices https://review.opendev.org/c/openstack/nova/+/773792 | 13:22 |
artom | Ah, there's only the one? https://review.opendev.org/c/openstack/nova/+/773792 | 13:22 |
sean-k-mooney | ya i should have a second for numa vswitch but i wont get to that until next cycle so just one for sriov | 13:23 |
*** brinzhang_ has quit IRC | 13:23 | |
artom | There aren't serious conflicts there, though | 13:23 |
sean-k-mooney | i was going to say maybe defer to stephenfin and gibi for order | 13:23 |
artom | Maybe some trivial "context code" in the tests | 13:23 |
sean-k-mooney | correct its minor | 13:23 |
*** brinzhang_ has joined #openstack-nova | 13:24 | |
sean-k-mooney | just dont wnat them to fight in the gate | 13:24 |
sean-k-mooney | sicne one of the two will merge first | 13:24 |
sean-k-mooney | and the other need to be respun | 13:24 |
artom | sean-k-mooney, lemme drop the kids off | 13:24 |
artom | Back in ~1 hour | 13:24 |
*** ociuhandu has quit IRC | 13:25 | |
*** jangutter_ has joined #openstack-nova | 13:25 | |
sean-k-mooney | sure ill wait for stephenfin or gibi to comment and ill do whatever they say | 13:26 |
sean-k-mooney | if they say put user first ill rebase on top of https://review.opendev.org/c/openstack/nova/+/772779 | 13:26 |
*** ociuhandu has joined #openstack-nova | 13:27 | |
*** jangutter has quit IRC | 13:30 | |
*** ociuhandu has quit IRC | 13:33 | |
*** sapd1 has joined #openstack-nova | 13:38 | |
stephenfin | sean-k-mooney: what's the question? | 13:43 |
*** jangutter_ has quit IRC | 13:44 | |
sean-k-mooney | stephenfin: artoms socket patch conflict with both my port numa and vdpa patches so i thin we need to stack them | 13:44 |
sean-k-mooney | stephenfin: question is port numa first or socket | 13:45 |
sean-k-mooney | with vpda last | 13:45 |
stephenfin | socket | 13:45 |
stephenfin | it's closest to merge | 13:45 |
*** jangutter has joined #openstack-nova | 13:45 | |
stephenfin | in fact it's approved | 13:45 |
sean-k-mooney | right but ig cant merge since it broken | 13:45 |
stephenfin | huh? | 13:46 |
sean-k-mooney | artom need to fix suport form multipel socekt on one numa node | 13:46 |
sean-k-mooney | by making it a list and taking the first socket | 13:46 |
sean-k-mooney | to make it work in the gate vms | 13:46 |
stephenfin | can't that be a follow-up? | 13:46 |
stephenfin | as it's an edge condition | 13:46 |
sean-k-mooney | i gues but i also am not happy with the numa topology object being passed as a copy to the pci tracker | 13:46 |
stephenfin | is there real hardware with that design, out of curiosity? | 13:46 |
sean-k-mooney | we should have passed a dict of numa:socket | 13:47 |
stephenfin | yeah, neither was I but I couldn't find a better way | 13:47 |
stephenfin | that not much better | 13:47 |
sean-k-mooney | well we should have passed it to the function | 13:47 |
stephenfin | it's still information that can get out of sync | 13:47 |
sean-k-mooney | that cant | 13:47 |
sean-k-mooney | we dont supprot memory hotplug on the compute host | 13:47 |
stephenfin | and the NUMATopology can, for what we need? | 13:48 |
sean-k-mooney | so the numa node to sockt mapping wont get out of daty | 13:48 |
stephenfin | *what we need it for there? | 13:48 |
sean-k-mooney | stephenfin: its a copy so the pinned cpus and mempages wil be out of date | 13:48 |
stephenfin | but we don't use that, right? | 13:48 |
stephenfin | so we're simply passing too much information, some of which will get out-of-date | 13:48 |
sean-k-mooney | right but we dont have any comment stating that it will be incorrect and if we started trying to use it it could break | 13:48 |
sean-k-mooney | stephenfin: yep | 13:49 |
stephenfin | okay, fair | 13:49 |
stephenfin | I'm happy to look at alternatives so | 13:49 |
sean-k-mooney | i would have prefered not to store it in the class too and pass it to the filter evne if that involed passing it donw the layres but the too mugh datat it my main concnern | 13:49 |
stephenfin | but I wouldn't block on it, because of how close feature freeze is | 13:49 |
*** ociuhandu has joined #openstack-nova | 13:49 | |
stephenfin | yeah, artom and I went over that in the review | 13:50 |
sean-k-mooney | ya so we coudl reduce the amount of data we pass in a folowup | 13:50 |
stephenfin | I believe him when he said it was too ugly | 13:50 |
sean-k-mooney | i just dislike the fact i need to change my mental model | 13:50 |
sean-k-mooney | the filter were not ment to use any data not passed into them | 13:51 |
sean-k-mooney | which is why they were class metods | 13:51 |
sean-k-mooney | to signal that | 13:51 |
stephenfin | did you see my other idea? | 13:51 |
stephenfin | from https://review.opendev.org/c/openstack/nova/+/774149/12 | 13:51 |
stephenfin | "Spitballing. How about adding 'host_socket' and maybe 'host_cell' attributes to 'InstanceNUMACell'? The latter isn't really necessary, since we use 'id' for that right now, but it would let us decouple this in the future (it's a confusing design, IMO). That former would let us avoid passing host NUMA information through to the PCI manager." | 13:51 |
sean-k-mooney | nope i only saw this after the patches were merged | 13:51 |
stephenfin | Well then, thoughts on that? Unnecessary duplication or potentially useful? | 13:52 |
stephenfin | host_cell should probably read host_node too | 13:52 |
sean-k-mooney | well i orginally wanted to put the socket in the extra_info dict in the pcidevice | 13:52 |
sean-k-mooney | so we did not need the lookup | 13:52 |
sean-k-mooney | but let me read that a few times | 13:53 |
sean-k-mooney | stephenfin: am adding host_socket to instanceNumaCell wont help unfortunetly | 13:53 |
sean-k-mooney | stephenfin: we dont have the socket of the pci device | 13:54 |
sean-k-mooney | if we did that and we added host_socket to the pci extra info then yes | 13:54 |
yonglihe | sean-k-mooney, alex_xu gibi: thanks, guys. | 13:54 |
stephenfin | oh, right | 13:54 |
stephenfin | darn | 13:54 |
stephenfin | then nvm | 13:55 |
sean-k-mooney | so what artom has merged in the first to pathces works its just got sharp edges | 13:55 |
sean-k-mooney | and if we reduced it to a dict of numa node to list of sockets | 13:56 |
sean-k-mooney | and then jsut did mapping[cell.id][0] | 13:56 |
sean-k-mooney | i think that would be enough | 13:56 |
sean-k-mooney | and we coudl do that in a follow up | 13:56 |
sean-k-mooney | so add a properyt or function to the host numa object to return that mappign dict and pass that in to the pci_tracker when we contuct it | 13:57 |
sean-k-mooney | that also get rid of some of the nested loops | 13:57 |
sean-k-mooney | stephenfin: ill rebase my port patch on his then but what do you think of ^ | 13:58 |
stephenfin | wfm | 13:58 |
stephenfin | but as a follow-up, IMO | 13:58 |
stephenfin | just to de-risk the whole enterprise | 13:58 |
sean-k-mooney | sure | 13:59 |
*** spatel_ has joined #openstack-nova | 14:01 | |
*** ociuhandu has quit IRC | 14:01 | |
*** ociuhandu has joined #openstack-nova | 14:01 | |
kashyap | stephenfin: I'm looking at the UEFI code; I'm commenting there as I go along (to facilitate parallel processing). I see that you're already neck-deep in another discussion here; no rush | 14:03 |
kashyap | stephenfin: Another thing on my mind (maybe you've already done it somwhere, and I need to catch up): a "guard" / check that denies secure boot for non-q35 machine types -- it's a q35-only feature | 14:04 |
* kashyap keeps lookin | 14:04 | |
stephenfin | kashyap: I don't need to do that explicitly | 14:06 |
stephenfin | but I do, to be safe | 14:06 |
stephenfin | the firmware metadata files won't report support for pc-i440fx-* machine types with the secure-boot feature | 14:06 |
stephenfin | so we'll hit https://review.opendev.org/c/openstack/nova/+/779302/2/nova/virt/libvirt/host.py#1506 | 14:06 |
stephenfin | see also line 1499 in that | 14:07 |
stephenfin | one of the reasons I invested so much time in beefing up the FakeLibvirtFixture was to "automate" as much of this as possible | 14:07 |
kashyap | stephenfin: Indeed, the JSON files report, correctly so, only for pc-q35-* (ugly name; but it's QEMU's "historical decision) | 14:07 |
stephenfin | correct | 14:07 |
kashyap | stephenfin: I see. Yeah, I've seen your work fly-by on the test / fixtures stuff. Fine work | 14:08 |
stephenfin | with that being said https://review.opendev.org/c/openstack/nova/+/776681/7/nova/virt/libvirt/driver.py#5837 | 14:08 |
stephenfin | That's mostly for test purposes | 14:08 |
stephenfin | actually, no, I remember - it's so we can decided whether to enable secure boot or not in the 'optional' case | 14:09 |
*** jamesden_ is now known as jamesdenton | 14:09 | |
stephenfin | if we didn't have that, we'd attempt to enable it and fail | 14:09 |
kashyap | stephenfin: Ah, I didn't get tot he _check-secure_boot_support() method yet; looks good | 14:09 |
kashyap | Yeah, we don't want to enable secure Boot by "default" (may people tend to opt out of the UEFI complexity, for good reasons) | 14:12 |
*** jrollen is now known as jroll | 14:13 | |
sean-k-mooney | stephenfin: artom summerised that here https://review.opendev.org/c/openstack/nova/+/772779/17#message-744974dfe27df3782bd2e6c066aa1b241fc7136a | 14:14 |
sean-k-mooney | ill be afk untill the top of the hour and ill rebase my patches then | 14:14 |
gibi | stephenfin, sean-k-mooney, artom: fixing the sharp edges in artoms socket policy series as a followup works for me. I can review both that and the port numa policy when they are put in order. The order does not matter to me I have the intention to push both through :) | 14:14 |
sean-k-mooney | :) | 14:16 |
gibi | and we have more than a day! :D | 14:16 |
bauzas | I'm switching away a bit from stephenfin's uefi secure boot series while 50% of the changes are now approved | 14:22 |
bauzas | who wants reviews here? | 14:22 |
bauzas | sean-k-mooney: vDPA series or the policy one ? | 14:23 |
* bauzas offers time | 14:23 | |
bauzas | gibi: nothing crucial for your eyes ? | 14:23 |
stephenfin | gibi: plus a week for them to actually get through the gate /o\ | 14:23 |
stephenfin | sigh | 14:23 |
stephenfin | it really hates my db migration series | 14:23 |
bauzas | stephenfin: well, no worries if they are accepted before FF | 14:23 |
bauzas | if they are in the gate, we can recheck if needed | 14:24 |
artom | stephenfin, sean-k-mooney, so one thing that hadn't occurred to me what we could do is still store something in self (let's say self.numa_topology for now), but access it from consume_requests() and supports_requests(), which are already at the object level, and pass it to the _filter() methods, and we can keep them classmethods in that way | 14:24 |
bauzas | even if we're past the feature freeze | 14:24 |
artom | But to be honest, I'm burnt out on the thing. It's good enough as is, I'll remove my -1, merge, and that way folks are free to concentrate on other stuff before FF | 14:25 |
artom | Instead of endlessly revisiting this | 14:25 |
stephenfin | ack | 14:27 |
gibi | stephenfin: I will recheck the db series into W or at least "die" trying :) | 14:27 |
sean-k-mooney | artom: ya that would have worked. | 14:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: WIP: Change default policy for GET '/os-hypervisors' API https://review.opendev.org/c/openstack/nova/+/765798 | 14:28 |
sean-k-mooney | i have a ptg topic on reworking the pci handeling and fixing bugs | 14:28 |
artom | sean-k-mooney, you have *all* the PTG topics ;) | 14:28 |
sean-k-mooney | artom: lets think about it as part of that next cycle | 14:28 |
stephenfin | gibi: gmann: That's my approach to the 'os-hypervisors as project admin' change ^ | 14:28 |
sean-k-mooney | artom: ya i know i feel bad for adding them but we might just skip some | 14:28 |
stephenfin | gibi: gmann: tbh, for what it gives us I'm thinking it's less and less a good idea | 14:28 |
sean-k-mooney | artom: more a list of what i knwo we shoudl addrss right now then what we plann on adressing next cycle | 14:29 |
stephenfin | bauzas: ta for the reviews, btw :) | 14:29 |
gibi | artom: as I said follow up work for me for the refactor. Do we know that socket - numa cardinality issue on the gate is just gate test node specific and fixebale later? | 14:29 |
bauzas | stephenfin: sorry for stopping but I want to zap here | 14:29 |
sean-k-mooney | artom: i plan to stick through or otherwise note which one wont be worked on in xena after we discuss downstream what we have capsity to work on | 14:30 |
bauzas | so, which one to pick ? | 14:30 |
artom | gibi, the thing stephenfin pointed out? It's normal, our nodepool VMs are set up that way | 14:30 |
stephenfin | bauzas: all good. chances are something will fail in the gate so no point bulk approving them | 14:30 |
artom | gibi, the new code correctly noped out when it saw there were multiple sockets per NUMA node | 14:30 |
bauzas | stephenfin: that's the usual game | 14:30 |
gibi | artom: does it also mean that we cannot test this now on the gate in tempest? | 14:31 |
sean-k-mooney | gibi: it can happen on really old hardware with a frontside bus. its unlkely to happen on any moderne (made in the last 12 years) hardware | 14:31 |
bauzas | gibi: again, nothing to point out specifically ? | 14:31 |
bauzas | if not, I'll choose sean-k-mooney's vDPA stuff as it's HPC | 14:31 |
artom | gibi, I mean we never could we don't have PCI devices :) | 14:31 |
gibi | sean-k-mooney: thanks, so it is just gate env issue, cool | 14:31 |
sean-k-mooney | gibi: multipl socket per numa ndoe imples the memory contoler was on the motherboard not the cpu | 14:31 |
gibi | artom: correct, sorry :) | 14:31 |
sean-k-mooney | intell moved it to the cpu 12 years ago with nehelem | 14:31 |
gibi | OK, then I'm fine with the socket policy series as is | 14:32 |
bauzas | that reminds me, any barebone server you could recommend me for running an openstack env ? | 14:32 |
artom | gibi, cool, thanks! | 14:32 |
gibi | let's merge it and do whathever refactor you agree on as a followup | 14:32 |
sean-k-mooney | bauzas: i got one from ebay for 1600 or so but i allo have older ones for about 500 | 14:32 |
gibi | it also means that Sean's port policy patch should go on top | 14:33 |
sean-k-mooney | bauzas: thre are lots you can get second hand depending on your budget | 14:33 |
bauzas | sean-k-mooney: woah that's expensive | 14:33 |
sean-k-mooney | bauzas: if you dont have an atic get a tower server | 14:33 |
gmann | stephenfin: on shelve block: by this you mean you will return this in new code based on config right ? "the current block we have uses https://github.com/openstack/nova/blob/ab07507e5cfce6232fef373d07ff92ea704541da/nova/exception.py#L158-L159" | 14:33 |
gmann | sean-k-mooney: ^^ | 14:33 |
gmann | not stephenfin | 14:33 |
bauzas | sean-k-mooney: I have a cave, but it's already fully packed so a barebone would get my preference | 14:33 |
gmann | stephenfin: ack, sorry i missed that part of filtering. I will check that | 14:34 |
bauzas | a tower is nice but too huge for the place I'd put it in there | 14:34 |
sean-k-mooney | bauzas: i have to run but we can talk about it later if you want you can get ceaper system if you hapy to go older in 200-500 price point | 14:34 |
bauzas | sean-k-mooney: I'd appreciate it | 14:34 |
bauzas | I honestly don't need a lot of power | 14:35 |
bauzas | I just want rapid disks and RAM | 14:35 |
bauzas | for spinning VMs | 14:35 |
sean-k-mooney | gmann am it would return that for vms with cyborg prots and a similer new excpetion for vms with sriov port | 14:35 |
bauzas | I was both frustrated and ashamed by gibi testing my own series :p | 14:35 |
sean-k-mooney | gmann: the new one woudl also just inherit form forbidden | 14:35 |
stephenfin | gmann: Ack. Like I said, I don't really like it myself. Relying on this obscure aggregate metadata property that was previously only used for a specific filter and optional pre-filter feels wrong | 14:35 |
sean-k-mooney | gmann: so no change in error code | 14:35 |
gibi | bauzas: no worries. I'm lucky as I have access to a lab with decent hardware (even with SRIOV capable NICs) | 14:36 |
bauzas | gibi: that's what I honestly'd like to have | 14:36 |
stephenfin | gmann: If it were me, I'd probably just drop the idea and remove that idea from the spec but perhaps there are people who really want this? :-D | 14:36 |
sean-k-mooney | but ya basically return that for shelve based vnic_type in (direct,direct-pyshical ...) all the sriov or cyborg vnic types | 14:36 |
gmann | sean-k-mooney: +1, as long as it inherit from forbidden error, wsgi layer will handle it as 403 otherwise you need to handle it explicitly in API controller. | 14:37 |
sean-k-mooney | ok ill add you to the patch to review if that is ok when i get time to write it | 14:37 |
gmann | sean-k-mooney: and yes 403 return for error cases does not need microversion as it is already generic return code as you mentioned. | 14:37 |
yonglihe | gibi: hope you have some bandwidth . I can fix comments at you guys night. wish me lucky. -:) | 14:37 |
gibi | yonglihe: I plan to get to the smartnic series before I leave for today | 14:38 |
gibi | yonglihe: have a nice night | 14:38 |
gmann | stephenfin: but if we drop we again go back to the problem of using PROJECT_AMDIN for POST /server API on specific host | 14:38 |
stephenfin | gmann: That's a fair point | 14:38 |
yonglihe | gibi: thanks have a good day. | 14:39 |
stephenfin | gmann: In that case, we can persist with it. I wasn't sure of the use case but that sounds like a good one. | 14:39 |
* gibi goes to look at the os-hypervisor policy patch | 14:40 | |
*** fnordahl has quit IRC | 14:42 | |
gmann | stephenfin: 1 comment, for case where there is no 'filter_tenant_id' in host's aggregate then that host we can return for project admin | 14:49 |
stephenfin | oh, that makes things more useful | 14:49 |
stephenfin | isn't that a lot of information to expose though? | 14:50 |
stephenfin | Maybe not. They'll just see the summary view so ID, hostname, state, and status | 14:51 |
gibi | that would basically mean that all project admin in a non tenant separated cloud see every compute | 14:51 |
gmann | yeah | 14:51 |
gibi | do we allow project admins to boot on all the computes in that case directly? | 14:51 |
gibi | I mean what exactly this means? | 14:52 |
gibi | 15:38 < gmann> stephenfin: but if we drop we again go back to the problem of using | 14:52 |
gmann | all you mean all or only host's computes with no 'filter_tenant_id' | 14:52 |
gibi | PROJECT_AMDIN for POST /server API on specific host | 14:52 |
gibi | does the above mean that we want to allow project admins to boot on specific compute host? | 14:53 |
stephenfin | I'm looking and we already do | 14:53 |
gmann | if it pass the filter_tenant_id filter (host match with filter_tenant_id or there is host with no filter_tenant_id) then that compute canbe allowed and rest 403 | 14:53 |
gibi | hm, that feels powerful | 14:53 |
stephenfin | but gmann's comment there says we should change that to system admin | 14:54 |
gibi | especially in a non tenant separated cloud | 14:54 |
*** ociuhandu has quit IRC | 14:54 | |
gmann | stephenfin: no, i mean system reader get everything, and project admin only go with those filter_tenant_id filters | 14:54 |
*** ociuhandu has joined #openstack-nova | 14:54 | |
stephenfin | isn't that what I've done? | 14:55 |
gmann | may be we have to check if context is scoped to system or project | 14:55 |
gmann | this one? https://review.opendev.org/c/openstack/nova/+/765798/5..6/nova/api/openstack/compute/hypervisors.py#197 | 14:55 |
gmann | and you will keep default policy as SYSTEM_READER? | 14:56 |
stephenfin | oh, are you talking about the context I use in my call to 'objects.Aggregate.get_by_metadata'? | 14:57 |
stephenfin | I'm changing the 'os_compute_api:os-hypervisors:list' policy to project admin or system reader | 14:58 |
stephenfin | but you'll note that check doesn't pass a context argument | 14:58 |
stephenfin | so it will never pass for a project admin, right? | 14:58 |
stephenfin | while it will pass for a system admin or system reader | 14:59 |
stephenfin | since they don't need context | 14:59 |
stephenfin | it's a bit of a hack, I'll admit, but it seemed logical to me | 14:59 |
gmann | stephenfin: it will pass with project admin too right with ' 'os_compute_api:os-hypervisors:list' policy to project admin or system reader' | 14:59 |
gmann | it will fail for project member/reader | 15:00 |
stephenfin | the check will fail (non-fatally) for project admins, which means we'll enter that block | 15:00 |
gmann | but with this it would fail right?> -https://review.opendev.org/c/openstack/nova/+/765798/6/nova/policies/hypervisors.py#37 | 15:01 |
gmann | *would not | 15:01 |
*** jraju__ has quit IRC | 15:01 | |
stephenfin | I'm confused. Let's start from the top | 15:02 |
stephenfin | The policy has been changed from "system reader" to "system reader or project admin" | 15:02 |
stephenfin | We check that here https://review.opendev.org/c/openstack/nova/+/765798/5..6/nova/api/openstack/compute/hypervisors.py#277 | 15:02 |
*** ociuhandu has quit IRC | 15:03 | |
stephenfin | If e.g. a project reader tries to access this API, they'll get a HTTP 403 | 15:03 |
*** ociuhandu has joined #openstack-nova | 15:03 | |
stephenfin | we then check the *same policy* again here https://review.opendev.org/c/openstack/nova/+/765798/5..6/nova/api/openstack/compute/hypervisors.py#197 | 15:03 |
stephenfin | only this time we do not provide "target={'project_id': context.project_id}" | 15:04 |
gmann | yeah 277 protect from project member/reader | 15:04 |
stephenfin | you must provide that to pass any project checks | 15:04 |
gmann | ahh, i did not notice the target | 15:04 |
gmann | got it, little confusing to read | 15:04 |
stephenfin | so that check will return False for project members but true for system admin/member/reader | 15:04 |
stephenfin | *project admins | 15:05 |
gmann | yeah | 15:05 |
stephenfin | yeah, like I said, it's a bit of hack | 15:05 |
*** zzzeek has quit IRC | 15:05 | |
stephenfin | that's what I was trying to call out with this comment | 15:05 |
stephenfin | # don't pass project information, forcing the check to be a system | 15:05 |
stephenfin | # admin check | 15:05 |
gmann | or simple we can check context.system_scope == 'all' or whatever exact value | 15:05 |
stephenfin | I should have said "don't provide project information as part of the target" | 15:05 |
stephenfin | that would be simpler :) | 15:06 |
*** zzzeek has joined #openstack-nova | 15:06 | |
stephenfin | what happens if we don't have scoping enabled though? | 15:06 |
stephenfin | would that result in admins not being able to list all hypervisors? | 15:06 |
gmann | https://github.com/openstack/oslo.context/blob/0d02866365bc8b779aef9ebd0b79a52c96ae40e5/oslo_context/context.py#L251 | 15:07 |
gmann | humm that is tricky, in old way we allowed to list all hypervisors | 15:08 |
gmann | so as long as we support deprecated policy it will pass the 277 and @197 if you can check "if CONF.oslo_policy.enforce_scope and context.system_scope == 'all'" then we can support old way too | 15:09 |
gmann | i mean only enable that check for CONF.oslo_policy.enforce_scope=true | 15:09 |
gmann | if false then we do old way to keep compatibility | 15:10 |
gmann | means return all hyperviros | 15:10 |
stephenfin | Yup, good idea | 15:11 |
stephenfin | okay, so back to the response if 'filter_tenant_id' is not set on any aggregate | 15:11 |
stephenfin | are you suggesting we return all hypervisors for project admins if 'filter_tenant_id=project_id' is not present on any aggregate? | 15:12 |
stephenfin | that would allow users to know all compute nodes in a given deployment | 15:12 |
stephenfin | which seems too much | 15:12 |
gmann | yeah, as filter_tenant_id is not there on aggregate scheduler filter aggregate_multitenancy_isolation allow any project to boot instance, | 15:14 |
gmann | https://github.com/openstack/nova/blob/0e7cd9d1a95a30455e3c91916ece590454235e0e/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L54 | 15:14 |
*** __ministry1 has joined #openstack-nova | 15:14 | |
gmann | i mean that is how host is kept open for all project | 15:14 |
gibi | I project admin in a non tenant isolated cloud should not be allowed to explicitly select any target host, I think | 15:14 |
stephenfin | there's a big difference between being able to run an instance on any host and being able to get information on all those hosts though | 15:15 |
gibi | gmann: the filters allows using hosts but for me the question is does the project admin needs to know which host is which? | 15:15 |
gibi | stephenfin: +1 | 15:15 |
stephenfin | we don't even expose the host that the instance is on for non-admins | 15:15 |
stephenfin | as in the 'OS-EXT-SRV-ATTR:hypervisor_hostname' attribute | 15:16 |
gmann | for this use case, only hstname is needed for create server request | 15:17 |
gmann | yeah for non-admin we will block list hypervisor or create instance on specific host | 15:17 |
gmann | gibi's has good point 'I project admin in a non tenant isolated cloud should not be allowed to explicitly select any target host, I think' | 15:18 |
stephenfin | I just checked. It fails on a DevStack deployment for me | 15:18 |
gmann | non tenant isolated cloud is anotehr thing to consider | 15:18 |
stephenfin | . devstack/openrc | 15:18 |
stephenfin | $ openstack --os-compute-api-version 2.latest server create --flavor m1.tiny --image cirros-0.5.1-x86_64-disk --network adec0800-859b-4aee-bf51-75eecb7aacf2 --hypervisor-hostname devstack-1 --wait test-server | 15:18 |
stephenfin | Policy doesn't allow compute:servers:create:requested_destination to be performed. (HTTP 403) (Request-ID: req-094bc901-ca54-4c9d-80dc-36f21b9d30ee) | 15:18 |
stephenfin | So unless an admin changes that policy, this information isn't very helpful | 15:19 |
stephenfin | even for the tenant isolated cloud | 15:20 |
stephenfin | Am I missing something? | 15:20 |
gmann | yeah this one https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L189 | 15:20 |
stephenfin | Ah, wait, I am. 'source devstack/openrc' won't set things up for a project admin | 15:22 |
stephenfin | it'll use the demo user | 15:22 |
stephenfin | so I could wrangle things to use a project admin and it should pass since it's actually using this policy https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L205 | 15:23 |
*** ociuhandu has quit IRC | 15:23 | |
stephenfin | but the TODO there indicates that the use of PROJECT_ADMIN is a mistake that we need to fix | 15:23 |
stephenfin | and we shouldn't really be letting project admins select their host | 15:23 |
stephenfin | at least that's how I read that TODO | 15:24 |
gmann | yeah that is the use case end up with hypervors policy change | 15:24 |
*** ociuhandu has joined #openstack-nova | 15:24 | |
gmann | first we discussed of letting system to create instance for project need vm on specific host as project admin cannot get host info | 15:24 |
gmann | but later we agreed to do policy change in hypervisor itself and let project admin to list host | 15:25 |
gmann | In current situation, with default policy, unless user get system and project scope both in their token they cannot boot instance on specific host. | 15:26 |
gmann | use system scope power to get host info and then project admin power to boot instance with host | 15:26 |
*** ociuhandu has quit IRC | 15:26 | |
*** ociuhandu has joined #openstack-nova | 15:26 | |
stephenfin | Okay, so this is an alternative to a new microversion for the 'POST /server' API? | 15:27 |
gmann | yes | 15:27 |
stephenfin | Instead of adding e.g. a 'project_id' field to the request body for that | 15:27 |
stephenfin | Gotcha | 15:27 |
gmann | exactly | 15:27 |
stephenfin | So my gut says for this to happen, we'd have to insist on tenant-isolation | 15:28 |
gmann | gibi: stephenfin but if we want to be more secure, I think returning no host if no matching 'filter_tenant_id' is also fine for me | 15:29 |
gmann | and if we get any use case to allow non-'filter_tenant_id=project_id' host then we can see | 15:29 |
gmann | i mean the current way stephenfin doing. | 15:30 |
stephenfin | I think that's necessary. Listing all hosts in a deployment is too much power | 15:30 |
*** __ministry1 has quit IRC | 15:30 | |
gibi | I agree | 15:31 |
gmann | yeah, and it contradict our new concept of system vs project .. | 15:31 |
stephenfin | (Continuing to think out loud) Even with that though, it's still odd that we will now allow users to list some hypervisors but we won't show them the 'OS-EXT-SRV-ATTR:hypervisor_hostname' attribute | 15:31 |
stephenfin | But maybe not as odd as having to ask a system admin to create an instance for you | 15:31 |
stephenfin | gmann: This is a tricky problem :) | 15:31 |
gmann | stephenfin: humm, yeah but let it be strict use case of 'you want to create server on host so you can get hypervisor list but no where else we will return host info' | 15:32 |
gibi | how I see 1) if there is strict tenant isolation and I'm the admin of that tenant then it is OK that I can see the hosts dedicated to my tenant and also it is OK that I can specify which host a VM lands on 2) if there is no strict tenant isolation then a project admin should not see more hypervisor info than a project member and also should not be able to specify which host a VM lands on | 15:33 |
gmann | +1 | 15:33 |
gibi | in short if there is isolation then inside an isolated buble the project admin can do whathever | 15:33 |
gibi | I don't care | 15:33 |
gibi | but as soon as tenants interact | 15:34 |
sean-k-mooney | gibi: i can live it that. i was leanign to be more generous and say if ther eis not strict isolation for this tenat | 15:34 |
gibi | the project admin should no have super power | 15:34 |
sean-k-mooney | then it can see all host that are not isolated | 15:34 |
stephenfin | gibi: so should we add another check to 'POST /servers'? | 15:34 |
gmann | can we add this info in api-ref? this will clarify the usage for this API. | 15:34 |
sean-k-mooney | since that would be the set of host they could boot on | 15:34 |
gmann | at lest by seeing how confusing those API combination is | 15:34 |
stephenfin | if I'm a project admin, the host I request must belong to an aggregate I'm isolated to | 15:34 |
gibi | agree ^^ | 15:35 |
sean-k-mooney | requireing isolation is ok too | 15:35 |
gmann | stephenfin: gibi for POST server we do not need as they do not know the host info | 15:35 |
stephenfin | gmann: you could conceivably guess | 15:35 |
*** jamesdenton has quit IRC | 15:35 | |
gmann | ohk on agrregate check for POST eyx | 15:35 |
gmann | yes | 15:35 |
stephenfin | What do we return? HTTP 403? | 15:35 |
stephenfin | I'm not sure if this warrants a microversion or not. Policy changes like this are weird | 15:36 |
sean-k-mooney | i think that is what the existing policy would return | 15:36 |
*** jamesdenton has joined #openstack-nova | 15:36 | |
sean-k-mooney | if a project_member did it | 15:36 |
openstackgerrit | Dan Smith proposed openstack/nova master: Make nova-ceph-multistore use policy.yaml https://review.opendev.org/c/openstack/nova/+/779815 | 15:36 |
gmann | yeah 403 is better, if they do not pass policy permision or and host does not elong/allow them | 15:36 |
stephenfin | not for a project admin though | 15:36 |
stephenfin | currently they can guess a hostname and request that they be booted on that | 15:37 |
gmann | yeah for project admin, policy allow | 15:37 |
sean-k-mooney | stephenfin: ya and the filter will prevent it | 15:37 |
sean-k-mooney | if the host is not in there set | 15:37 |
stephenfin | right, but it didn't before | 15:37 |
sean-k-mooney | well assuming your using it | 15:37 |
stephenfin | so that's a change in behavior | 15:37 |
gmann | or by bribing the system user :) | 15:37 |
stephenfin | ergo, what do we respond with and does this need a microversion | 15:37 |
sean-k-mooney | stephenfin: not the aggreate tenatn isolation fitler or placment version would block the boot if isolatio was configured | 15:37 |
stephenfin | but not if it wasn't | 15:38 |
sean-k-mooney | correct | 15:38 |
gmann | for non isolated cloud, it is change in behavior | 15:38 |
sean-k-mooney | if it was not then even as a normal user you can guess the host by using the AZ:host syntax | 15:38 |
sean-k-mooney | i dont think we need to do the check on server create | 15:39 |
sean-k-mooney | we should leave that up to placment/the schduler fileters | 15:39 |
sean-k-mooney | for /os-hyperviors we shoudl check | 15:39 |
gmann | but for non isolated cloud, anyone can boot on host if they know host name? | 15:39 |
sean-k-mooney | with an az yes | 15:40 |
sean-k-mooney | you can jsut do --avaiablity-zone $az_name:$hostname | 15:40 |
gmann | but we are restricting in list hypervisors, do not return if no tenant isolation | 15:40 |
sean-k-mooney | i think --host is admin only? | 15:40 |
sean-k-mooney | gmann: i was originally suggeting returning all hyperviors if no isolation for what it worth | 15:41 |
gmann | yeah but with current discussion it seem that is too much info | 15:41 |
sean-k-mooney | ya it seams to have pivioted | 15:42 |
stephenfin | sean-k-mooney: '--hypervisor-hostname' is project-admin scoped https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L217 | 15:42 |
sean-k-mooney | my view on this is its the admins choice to grant project admin to a tenant | 15:42 |
gmann | IMO, if we restrict hypervisor list then we should do the same for boot instance also otherwise we are opening a loop hole in our API | 15:42 |
sean-k-mooney | so if they care about restircting the view they shoudl configure isolation | 15:42 |
sean-k-mooney | if they dont then they dont | 15:42 |
sean-k-mooney | gmann: we are restricting the hypervior list using the schduler metadta | 15:43 |
*** dklyle has joined #openstack-nova | 15:43 | |
sean-k-mooney | gmann: so if you have isolation configured you get it for free | 15:43 |
stephenfin | can we defer this to Xena? | 15:43 |
gmann | sean-k-mooney: and what we discussed now is restrict if no tenant isolcation too | 15:43 |
stephenfin | this is starting to feel like it warrants its own spec | 15:44 |
gmann | yeah may be we should. | 15:44 |
sean-k-mooney | yep so we should keep the consitent | 15:44 |
*** lpetrut has quit IRC | 15:44 | |
gmann | and let discuss in PTG on all cases | 15:44 |
stephenfin | given the security implications and discussion around API changes | 15:44 |
sean-k-mooney | either restic both or dont | 15:44 |
stephenfin | *changes to other APIs | 15:44 |
sean-k-mooney | when no isolation is configured for the tenant | 15:44 |
gmann | and we can discuss host info in GET server also for such project admin | 15:44 |
stephenfin | yeah, I'm in favour of the same behavior for both listing hypervisors and creating servers on a particular hypervisor | 15:45 |
gmann | yeah, same behavior is needed whatever we agreed to | 15:45 |
sean-k-mooney | gmann: getting hypervior_hostname is proably valid if you can do the hypervior list | 15:45 |
gmann | yeah, and i think we return that in PUT/REBUILD server API also which also we can discuss | 15:46 |
*** ociuhandu has quit IRC | 15:46 | |
*** ociuhandu has joined #openstack-nova | 15:46 | |
sean-k-mooney | it basicaly come down too this. show project admins be aware of the hypervior that there vms can run on | 15:46 |
gmann | so 1. we can update the current spec to remove the policy change 2create new spec for Xena and discuss in PTG ? | 15:46 |
sean-k-mooney | if yes then it should see it on all apis | 15:46 |
sean-k-mooney | where its relevent | 15:47 |
gmann | gibi: sean-k-mooney stephenfin ^^ hope it is ok to update spec at this stage as this is to remove the things we agreed to do | 15:48 |
sean-k-mooney | and then there is the quetion of if it should require tenatn isolation or not | 15:48 |
gmann | not on adding anything new | 15:48 |
sean-k-mooney | gmann: we can update specs at any time to reflect reality | 15:48 |
stephenfin | gmann: Yeah, I'll update the spec and add an item to the PTG to discuss this | 15:48 |
gmann | +1, thanks | 15:48 |
sean-k-mooney | gmann: this wont get moved into implemneted anyway | 15:48 |
gmann | sean-k-mooney: oh why? | 15:49 |
gibi | gmann: yeah, spec update and a new spec is OK to me | 15:49 |
*** ociuhandu has quit IRC | 15:49 | |
sean-k-mooney | gmann: well its not done right | 15:49 |
gmann | stephenfin:gibi sean-k-mooney i can create the new xena spec for PTG | 15:49 |
gibi | gmann: thanks | 15:49 |
*** ociuhandu has joined #openstack-nova | 15:49 | |
stephenfin | thanks | 15:49 |
sean-k-mooney | or is enough of the spec done to mark the wallby one as implemented? | 15:49 |
gibi | gmann: and If no discussion happens until the PTG on the spec then I suggest to raise this topic on the PTG too | 15:49 |
gmann | sean-k-mooney: actually that is some special case we want to handle in that spec otherwise original spec of 'standardize the hypervisor API' is done | 15:49 |
gmann | gibi: ack, make sense | 15:50 |
sean-k-mooney | oh its part of that | 15:50 |
sean-k-mooney | sorry ya | 15:50 |
gibi | sean-k-mooney: I put the os-hypervisor API moderinzation bp to implemented as I got from stephenfin that the policy part is just a nice to have | 15:50 |
sean-k-mooney | so that spec can move to implmetned and then xena one for the edgecase | 15:50 |
gmann | yeah | 15:50 |
sean-k-mooney | cool | 15:50 |
gibi | OK, we are in wild agreement | 15:51 |
*** jangutter_ has joined #openstack-nova | 15:51 | |
sean-k-mooney | it would appear so. its funny how that often looks liek an argument :) | 15:52 |
*** jangutter has quit IRC | 15:55 | |
*** macz_ has joined #openstack-nova | 15:57 | |
*** macz_ has quit IRC | 15:57 | |
*** macz_ has joined #openstack-nova | 15:58 | |
*** LinPeiWen has quit IRC | 16:00 | |
*** derekh has quit IRC | 16:00 | |
openstackgerrit | Stephen Finucane proposed openstack/nova-specs master: Remove policy changes from modernize-os-hypervisors-api spec https://review.opendev.org/c/openstack/nova-specs/+/779821 | 16:00 |
stephenfin | gmann, gibi: ^ | 16:00 |
stephenfin | also added to the PTG doc | 16:01 |
gibi | stephenfin: on it | 16:02 |
gibi | and thanks | 16:02 |
*** derekh has joined #openstack-nova | 16:03 | |
*** amodi has quit IRC | 16:05 | |
*** spatel_ has quit IRC | 16:08 | |
*** amodi has joined #openstack-nova | 16:08 | |
*** ociuhandu has quit IRC | 16:15 | |
gmann | stephenfin: thanks, lgtm | 16:18 |
*** spatel_ has joined #openstack-nova | 16:19 | |
*** gyee has joined #openstack-nova | 16:22 | |
*** ociuhandu has joined #openstack-nova | 16:24 | |
*** ociuhandu has quit IRC | 16:24 | |
*** ociuhandu has joined #openstack-nova | 16:25 | |
*** ociuhandu has quit IRC | 16:29 | |
*** jangutter has joined #openstack-nova | 16:35 | |
*** ociuhandu has joined #openstack-nova | 16:35 | |
*** ociuhandu has quit IRC | 16:37 | |
*** ociuhandu has joined #openstack-nova | 16:37 | |
*** takamatsu has quit IRC | 16:37 | |
openstackgerrit | Merged openstack/nova-specs master: Remove policy changes from modernize-os-hypervisors-api spec https://review.opendev.org/c/openstack/nova-specs/+/779821 | 16:38 |
*** jangutter_ has quit IRC | 16:39 | |
*** ociuhandu has quit IRC | 16:51 | |
dansmith | bauzas: I never saw any replies to my comments on the rpc bump patch | 16:55 |
dansmith | bauzas: did you not notice, or just think they were all stupid? :) | 16:56 |
bauzas | dansmith: I forgot to look at it, my bad. | 16:56 |
dansmith | okay | 16:56 |
bauzas | dansmith: but I quickly saw them | 16:56 |
bauzas | reviews ftw atm | 16:56 |
dansmith | okay none of them are critical obviously | 16:56 |
bauzas | dansmith: I can look at them now while you're on | 16:56 |
bauzas | (at least once my weekly meeting is done) | 16:57 |
dansmith | okay | 16:57 |
*** lucasagomes has quit IRC | 16:58 | |
*** mgariepy has quit IRC | 16:59 | |
*** ociuhandu has joined #openstack-nova | 17:00 | |
*** khomesh24 has quit IRC | 17:00 | |
*** derekh has quit IRC | 17:02 | |
*** derekh has joined #openstack-nova | 17:02 | |
*** rpittau is now known as rpittau|afk | 17:05 | |
*** mlavalle has joined #openstack-nova | 17:06 | |
stephenfin | gmann: Any chance you could take a look at https://review.opendev.org/c/openstack/nova/+/778550 ? | 17:09 |
stephenfin | It's pretty late in the day, but it's a fairly simple microversion | 17:09 |
gmann | stephenfin: sure, I will check, may be tomorrow I have to go out today after noon. | 17:12 |
stephenfin | sweet, thanks :) | 17:12 |
*** dtantsur is now known as dtantsur|afk | 17:15 | |
*** xinranwang has quit IRC | 17:18 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: api: Rename 'parameter_types.hostname' -> 'fqdn' https://review.opendev.org/c/openstack/nova/+/778549 | 17:22 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: api: Add support for 'hostname' parameter https://review.opendev.org/c/openstack/nova/+/778550 | 17:22 |
*** jangutter_ has joined #openstack-nova | 17:26 | |
stephenfin | sean-k-mooney: okay, got to the end of the series. I see what you mean about patch ordering | 17:28 |
stephenfin | are you working on that series rn? | 17:29 |
sean-k-mooney | not yet but soon | 17:30 |
sean-k-mooney | im going to move the neutorn port one now | 17:30 |
sean-k-mooney | on top of artom change | 17:30 |
sean-k-mooney | then go back to vdpa | 17:30 |
*** jangutter has quit IRC | 17:31 | |
* gibi leaves for today | 17:31 | |
sean-k-mooney | gibi: o/ enjoy your evening | 17:31 |
stephenfin | cool, mind if I take a quick swing at it? | 17:32 |
stephenfin | I think I know what you're going for | 17:32 |
sean-k-mooney | if you want go for it | 17:32 |
sean-k-mooney | i tried to break it out at first to be logicly split but then i found later when i got the real hardware i need to fix some thing and some of those fixed end in the last patch | 17:33 |
sean-k-mooney | i would like to bring those fix to the correct patch | 17:33 |
*** tesseract has quit IRC | 17:34 | |
*** sapd1 has quit IRC | 17:34 | |
bauzas | dansmith: okay, I looked at all your comments and thanks for them | 17:38 |
*** takamatsu has joined #openstack-nova | 17:38 | |
bauzas | dansmith: you made a great point, which is I made extra work for free when trying to continue to support 5.0 | 17:38 |
bauzas | instead of 5.12 | 17:38 |
dansmith | bauzas: well, not "for free", it clearly cost you something :D | 17:38 |
bauzas | dansmith: I probably misunderstood that we were pinning to the least supported version from all computes, which is *not* 5.0 | 17:39 |
dansmith | (but I know what you mean) | 17:39 |
dansmith | bauzas: well, I think you aren't able to faithfully support 5.0 anyway, was my point | 17:39 |
bauzas | yeah | 17:39 |
bauzas | the point is, we're entering the RC period and touching my change for removing stuff could require me some further work | 17:40 |
bauzas | so maybe let's just pretend we can support 5.0, which is impossible anyway | 17:40 |
dansmith | yes, although this is pretty much the time to be doing that | 17:41 |
dansmith | but as you wish | 17:41 |
bauzas | I'll first reply to the other comments and do a quick respin | 17:41 |
dansmith | ack | 17:41 |
bauzas | and if we have time, I can look at removing the unnecessary bits | 17:41 |
bauzas | which should simplify my change | 17:42 |
bauzas | but here, baby steps | 17:42 |
bauzas | to secure the RPC bump anyway | 17:42 |
dansmith | sounds good | 17:42 |
sean-k-mooney | dansmith: since your here and talking about this topic | 17:43 |
sean-k-mooney | dansmith: when can we do ovo major version bumps | 17:44 |
dansmith | you guys talking about me being here now makes me sad :/ | 17:44 |
sean-k-mooney | do they have to align with RPC bumps or not | 17:44 |
dansmith | they do not | 17:44 |
sean-k-mooney | well it was actully a time zone reference not avaiablity | 17:44 |
dansmith | I did an instance bump long ago, let me show you my scars... | 17:44 |
*** mgariepy has joined #openstack-nova | 17:45 | |
dansmith | but if you find that it should be a good map of how to do it for the worst case, and the newer backport manifest stuff likely makes it easier nowadays | 17:45 |
sean-k-mooney | ok that is fine so we have a few comment for some other objects like the compute_node object to do X i 2.0 | 17:45 |
bauzas | I remember the instance major bump pain | 17:45 |
dansmith | instance is like the worst case though, compute node is probably easier since we don't pass it around everywhere | 17:46 |
sean-k-mooney | ya i was more wondering what the rules were around it | 17:46 |
bauzas | good point, but still something difficult | 17:46 |
bauzas | sean-k-mooney: the idea is, you have to understand which services are using the object | 17:46 |
bauzas | and which version of it | 17:46 |
sean-k-mooney | yep we can check what the max version is supproted on the dest right | 17:47 |
sean-k-mooney | and then we backlevel before sending | 17:47 |
bauzas | it's not just the compute | 17:47 |
dansmith | sean-k-mooney: the rules are similar to the rpc interface, but you have to be able to support both version of the object across the boundary.. compute has to support the 1.x object in case another compute sends it (although likely not the case for compute_node), | 17:47 |
bauzas | or the object being passed over the wire | 17:47 |
dansmith | and of course conductor has to be able to support both | 17:47 |
bauzas | and other services also have to understand the new minimum version | 17:48 |
sean-k-mooney | yep that is more or less what i was assuming regarding supproting both | 17:48 |
bauzas | if they get the object | 17:48 |
bauzas | I mean, if they rehydrate it | 17:48 |
dansmith | yup | 17:48 |
bauzas | sean-k-mooney: which ovo object are you considering to bump ? | 17:48 |
sean-k-mooney | its somewhat bounded now that we have the check for min compute service version now | 17:49 |
bauzas | computenode is also used by a lot of services, hence my concern | 17:49 |
sean-k-mooney | e.g. that it now enforce n+1 max delta | 17:49 |
sean-k-mooney | bauzas: well im not nessisaly suggesting we do that one next cycle | 17:49 |
sean-k-mooney | just trying to figure out when we are allowed too | 17:49 |
bauzas | yup, gotcha | 17:50 |
bauzas | should we now discuss about raising the API minimums, just for the fun ? | 17:50 |
bauzas | REST* APIs | 17:50 |
bauzas | :) | 17:50 |
sean-k-mooney | well we have a todo to stop declaring the numa_toplogy object as a stign adn declar it as an ovo filed | 17:50 |
dansmith | bauzas: it is, but not passed around between them as much right? | 17:50 |
sean-k-mooney | form like juno | 17:51 |
sean-k-mooney | bauzas: so remove in 2.0 came up when i review artoms patchs | 17:51 |
sean-k-mooney | which is why it was on my mind since we addded anouther one of those comments | 17:51 |
bauzas | dansmith: trying to think about it | 17:52 |
bauzas | dansmith: as a nested object, can't promise it isn't used | 17:52 |
sean-k-mooney | give that todo is 5 years old https://github.com/openstack/nova/blob/ab07507e5cfce6232fef373d07ff92ea704541da/nova/objects/compute_node.py#L84-L86 | 17:52 |
dansmith | what other object includes computenode in it? | 17:52 |
bauzas | good question | 17:52 |
dansmith | bauzas: you said "as a nested object" ... | 17:53 |
bauzas | dansmith: the Service object at least, I remember writing the nesting | 17:53 |
dansmith | okay but I don't think that gets passed around either | 17:54 |
bauzas | hah, and the RequestSpec one | 17:54 |
dansmith | reqspec has computenode in it? | 17:54 |
bauzas | the SchedulerRetries one | 17:54 |
bauzas | it has a list of cn objects in it | 17:54 |
sean-k-mooney | do we use that any more | 17:55 |
bauzas | which itself is nested in the spec object | 17:55 |
sean-k-mooney | i tought we dont do retreis since placment | 17:55 |
sean-k-mooney | we use alternate hosts | 17:55 |
dansmith | that seems weird because those are from different DBs, but maybe we just jam them in there to pass around for the retries? | 17:55 |
sean-k-mooney | dansmith: we should check if its still needed i think that might have been for when we used the retryfilter | 17:56 |
bauzas | sean-k-mooney: good point, it's no longer used | 17:56 |
dansmith | yeah | 17:56 |
dansmith | well, I certainly hadn't remembered that we do that, but changing it would mean more bumps, so.. | 17:56 |
dansmith | however, | 17:56 |
dansmith | I'm not sure it's really likely to be a problem | 17:56 |
dansmith | it's sourced from the control services, so if they put 1.x in there until everyone is upgraded, | 17:57 |
dansmith | it should be fine | 17:57 |
sean-k-mooney | ya again just raised this because i was trying to under stand when we can adress the Do X in next major version comments that are in some objects | 17:57 |
sean-k-mooney | many of which have been there for years at this point | 17:58 |
sean-k-mooney | https://github.com/openstack/nova/commit/1337890ace918fa2555046c01c8624be014ce2d8 was the instance one so i guess reviewing that is a good start. | 18:01 |
sean-k-mooney | oh ya i rembere the obj_relationships stuff | 18:02 |
sean-k-mooney | oh i see we had a _BaseInstance and InstanceV1 and InstanceV2 class for a cycle kind of like the proxy for rpc | 18:05 |
sean-k-mooney | then that patch actully drop V1 after we were shoudl everytin supported v2 | 18:06 |
sean-k-mooney | i vaguely remeber this happening https://github.com/openstack/nova/commit/713d8cb0777afb9fe4f665b9a40cac894b04aacb added 2 objects. | 18:07 |
*** takamatsu has quit IRC | 18:10 | |
*** andrewbonney has quit IRC | 18:19 | |
*** tbarron has joined #openstack-nova | 18:37 | |
*** ralonsoh has quit IRC | 18:41 | |
*** nightmare_unreal has quit IRC | 18:44 | |
*** zul has quit IRC | 18:54 | |
*** mgariepy has quit IRC | 19:03 | |
*** mgariepy has joined #openstack-nova | 19:06 | |
*** ociuhandu has quit IRC | 19:12 | |
*** ociuhandu has joined #openstack-nova | 19:12 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: add constants for vnic type vdpa https://review.opendev.org/c/openstack/nova/+/770474 | 19:18 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Add 'VDPA' to 'PciDeviceType' https://review.opendev.org/c/openstack/nova/+/777481 | 19:18 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Add vDPA nodedev parsing https://review.opendev.org/c/openstack/nova/+/770533 | 19:18 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Add guest generation for vDPA https://review.opendev.org/c/openstack/nova/+/770532 | 19:18 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: pci: Add vDPA vnic to PCI request mapping and filtering https://review.opendev.org/c/openstack/nova/+/778350 | 19:18 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: add hw:mlock extra spec https://review.opendev.org/c/openstack/nova/+/778347 | 19:18 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Move PCI host device parsing to 'host' https://review.opendev.org/c/openstack/nova/+/779851 | 19:18 |
stephenfin | sean-k-mooney: there you go | 19:18 |
*** ociuhandu has quit IRC | 19:19 | |
*** ociuhandu has joined #openstack-nova | 19:20 | |
sean-k-mooney | thanks. ill take a look and test it out shortly. | 19:22 |
*** ociuhandu has quit IRC | 19:22 | |
*** ociuhandu has joined #openstack-nova | 19:23 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Add 'VDPA' to 'PciDeviceType' https://review.opendev.org/c/openstack/nova/+/777481 | 19:31 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Move PCI host device parsing to 'host' https://review.opendev.org/c/openstack/nova/+/779851 | 19:31 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Add vDPA nodedev parsing https://review.opendev.org/c/openstack/nova/+/770533 | 19:31 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Add guest generation for vDPA https://review.opendev.org/c/openstack/nova/+/770532 | 19:31 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: pci: Add vDPA vnic to PCI request mapping and filtering https://review.opendev.org/c/openstack/nova/+/778350 | 19:31 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: add hw:mlock extra spec https://review.opendev.org/c/openstack/nova/+/778347 | 19:31 |
*** brinzhang0 has joined #openstack-nova | 19:33 | |
*** ociuhandu_ has joined #openstack-nova | 19:34 | |
*** dpawlik6 has joined #openstack-nova | 19:34 | |
*** macz_ has quit IRC | 19:35 | |
*** macz_ has joined #openstack-nova | 19:36 | |
*** mugsie__ has joined #openstack-nova | 19:37 | |
*** tosky_ has joined #openstack-nova | 19:37 | |
*** benj_- has joined #openstack-nova | 19:37 | |
*** dosaboy_ has joined #openstack-nova | 19:38 | |
*** whoami-rajat has quit IRC | 19:40 | |
*** artom has quit IRC | 19:42 | |
*** ociuhandu has quit IRC | 19:42 | |
*** brinzhang_ has quit IRC | 19:42 | |
*** tosky has quit IRC | 19:42 | |
*** mjturek has quit IRC | 19:42 | |
*** dpawlik has quit IRC | 19:42 | |
*** benj_ has quit IRC | 19:42 | |
*** dosaboy has quit IRC | 19:42 | |
*** mugsie has quit IRC | 19:42 | |
*** Underknowledge has quit IRC | 19:42 | |
*** dpawlik6 is now known as dpawlik | 19:42 | |
*** Underknowledge has joined #openstack-nova | 19:43 | |
*** irclogbot_0 has quit IRC | 19:44 | |
*** jamesdenton has quit IRC | 19:45 | |
*** mugsie__ is now known as mugsie | 19:47 | |
*** jamesdenton has joined #openstack-nova | 19:48 | |
*** belmoreira has quit IRC | 19:52 | |
*** belmoreira has joined #openstack-nova | 19:58 | |
*** artom has joined #openstack-nova | 20:01 | |
openstackgerrit | Merged openstack/nova master: libvirt: Stop passing around virt_type, caps https://review.opendev.org/c/openstack/nova/+/775689 | 20:12 |
openstackgerrit | Merged openstack/nova master: libvirt: Add missing type hints https://review.opendev.org/c/openstack/nova/+/775688 | 20:13 |
*** tosky_ is now known as tosky | 20:16 | |
openstackgerrit | sean mooney proposed openstack/nova master: Support per port numa policies with SR-IOV https://review.opendev.org/c/openstack/nova/+/773792 | 20:23 |
sean-k-mooney | artom: ^ im having troble getting the socket test i addded here working https://review.opendev.org/c/openstack/nova/+/773792/10/nova/tests/functional/libvirt/test_pci_sriov_servers.py#1254 | 20:24 |
*** takamatsu has joined #openstack-nova | 20:25 | |
artom | sean-k-mooney, I wonder if it's because I set the 'socket' attribute on the pools in my filter method | 20:31 |
artom | And the pools that get passed to remove_device() don't have it | 20:31 |
artom | So it fails the equality test | 20:31 |
sean-k-mooney | its there in the host state object {"physnets": [], "tunneled": false}, "nova_object.changes": ["physnets", "tunneled"]}, "socket": 1}, | 20:32 |
sean-k-mooney | actully ill copy this to pastbin that not waht i wanted | 20:32 |
sean-k-mooney | http://paste.openstack.org/show/803443/ | 20:34 |
artom | Hrmm, yeah, if I add: | 20:34 |
artom | del pool_keys['devices'] | 20:34 |
artom | if 'socket' in pool_keys: | 20:34 |
artom | del pool_keys['socket'] | 20:34 |
artom | In _find_pool() | 20:34 |
artom | It makes the tests pass | 20:34 |
artom | OK, I have to run | 20:34 |
artom | Hopefully that helped a little bit, talk in ~1 hour | 20:34 |
sean-k-mooney | ok ya that might be the issue | 20:35 |
sean-k-mooney | but if it is your current patch might break neutron sriov ports? ill do some testing locally and try and see if its just a functest issue or a real one | 20:36 |
sean-k-mooney | the other polices do work however so i dont know it might just be an edgecase i need to fix to make both work correctly | 20:37 |
artom | I suspect it's a logic error in my patches | 20:38 |
artom | Though I do wonder how come my own functional tests passed...? | 20:38 |
* artom -> really run | 20:38 | |
*** tosky has quit IRC | 20:49 | |
*** tosky has joined #openstack-nova | 20:49 | |
*** derekh has quit IRC | 20:52 | |
*** k_mouza has quit IRC | 20:57 | |
*** ociuhandu_ has quit IRC | 20:59 | |
*** dklyle has quit IRC | 21:10 | |
*** ociuhandu has joined #openstack-nova | 21:12 | |
*** mtreinish has joined #openstack-nova | 21:15 | |
sean-k-mooney | artom: i get the error on real hardware too. | 21:15 |
*** martinkennelly has quit IRC | 21:20 | |
*** belmoreira has quit IRC | 21:20 | |
artom | sean-k-mooney, I wonder if if has something to do with VFs and PFs? | 21:31 |
*** flaviof has quit IRC | 21:31 | |
*** flaviof has joined #openstack-nova | 21:31 | |
sean-k-mooney | im not sure | 21:34 |
sean-k-mooney | did you test your code on real hardware | 21:34 |
sean-k-mooney | hum if i do some hacking in /sys to put all my nics on numa 0 i can boot again | 21:35 |
sean-k-mooney | basiclly echo 0 | sudo tee /sys/class/net/eth*/device/numa_node | 21:36 |
sean-k-mooney | then stop nova compute and libvirt fully and start them again | 21:36 |
artom | I didn't | 21:36 |
*** vdrok has quit IRC | 21:37 | |
sean-k-mooney | when that was -1 it was failing i hadd to add pool_keys.pop('socket', None) | 21:37 |
sean-k-mooney | to find_pool | 21:37 |
*** vdrok has joined #openstack-nova | 21:37 | |
sean-k-mooney | ill try removing that again | 21:37 |
sean-k-mooney | i can test it with pci alis too i guess i just need to spend some time to do it | 21:39 |
*** NobodyCam has quit IRC | 21:39 | |
*** NobodyCam has joined #openstack-nova | 21:40 | |
artom | That's the thing, I wasn't touching /sys, or indeed the PCI devices | 21:40 |
artom | Besides calculating the socket and saving it in pool['socket'] | 21:41 |
sean-k-mooney | ya well this is what is printign for the final resouce view | 21:41 |
sean-k-mooney | PciDevicePool(count=1,numa_node=0,product_id='10c9',tags={dev_type='type-PF',physical_network='public'},vendor_id='8086'), PciDevicePool(count=7,numa_node=0,product_id='10ca',tags={dev_type='type-VF',parent_ifname='eth1',physical_network='public'},vendor_id='8086')] | 21:41 |
sean-k-mooney | so the socket is not being added there incorerctly | 21:41 |
*** dklyle has joined #openstack-nova | 21:42 | |
*** ociuhandu has quit IRC | 21:42 | |
sean-k-mooney | a pic dev look more or less normal too | 21:43 |
sean-k-mooney | {"dev_id": "pci_0000_01_11_3", "address": "0000:01:11.3", "product_id": "10ca", "vendor_id": "8086", "numa_node": 0, "label": "label_8086_10ca", "dev_type": "type-VF", "parent_addr": "0000:01:00.1", "parent_ifname": "eth1", "capabilities": {"network": ["rx", "tx", "sg", "tso", "gso", "gro", "rxvlan", "txvlan", "txudptnl"]}} | 21:43 |
sean-k-mooney | strange it seam to be workign without the pop(socket ,none) | 21:45 |
artom | sean-k-mooney, hrmm, I really think it's the parent PF issue | 21:45 |
artom | It doesn't have the socket | 21:45 |
*** TheJulia has quit IRC | 21:45 | |
sean-k-mooney | ya maybe althou it shoudl | 21:45 |
*** TheJulia has joined #openstack-nova | 21:45 | |
sean-k-mooney | well in the same way the VF will | 21:45 |
artom | The logging I added to remove_device() and _find_pool() logs what I expect with your tests | 21:45 |
artom | But not with my functional tests | 21:46 |
*** johnsom has quit IRC | 21:46 | |
artom | Meaning - it's never called, because in my func tests it only tests PFs (or "standard" PCI) | 21:46 |
artom | (Not sure which) | 21:46 |
*** johnsom has joined #openstack-nova | 21:46 | |
artom | By testing with VFs and their parent PFs, you've uncovered an issue | 21:46 |
sean-k-mooney | standard is not a PF | 21:46 |
sean-k-mooney | standard is a device that does not supprot sriov | 21:47 |
artom | But it has no parent, is my point | 21:47 |
sean-k-mooney | a PF is type-PF | 21:47 |
sean-k-mooney | standard is type-PCI | 21:47 |
artom | The point is not the type, it's the parent device (or lack thereof) | 21:47 |
sean-k-mooney | correct you do not have to mark the parent as unavaible | 21:48 |
sean-k-mooney | as we do with VFs | 21:48 |
sean-k-mooney | or the child as unaviable with PFs | 21:48 |
sean-k-mooney | so you are not calling _handle_device_dependents | 21:49 |
sean-k-mooney | in your tests | 21:49 |
artom | Nope :) | 21:49 |
artom | Not that it as intentional | 21:49 |
sean-k-mooney | well you might be but its a noop for you | 21:49 |
artom | Or that I was aware it existed | 21:50 |
sean-k-mooney | you are calling it | 21:50 |
sean-k-mooney | but i does nothing for STANDARD/type-PCI | 21:50 |
artom | Yeah, if I do for pool in pools: | 21:51 |
artom | if 'socket' in pool: | 21:51 |
artom | del pool['socket'] | 21:51 |
artom | At the end of my new filter | 21:52 |
artom | It fixes your func tests | 21:52 |
artom | I guess you've made your point about being purely functional and not having side effects ;) | 21:52 |
sean-k-mooney | making it it a pure fucntion may or may not have caught it depending on if we wrote the test right | 21:53 |
artom | I think the "best" solution would be to not set pool['socket'] altogether, and come at this from another angle: figure out which NUMA nodes are allowed, and filter on those | 21:54 |
artom | No side effects | 21:54 |
sean-k-mooney | well you should not be setting pool sockets | 21:55 |
sean-k-mooney | and you dont need too if you did what was in my last comment | 21:55 |
*** slaweq has quit IRC | 21:55 | |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/772779/17#message-744974dfe27df3782bd2e6c066aa1b241fc7136a | 21:56 |
sean-k-mooney | the body of your filter could be basically this right | 21:57 |
sean-k-mooney | vm_sockets = [mappings[cell.id] for cell in numa_cells] | 21:57 |
sean-k-mooney | pools_to_remove = [] | 21:57 |
sean-k-mooney | for pool in pools: | 21:57 |
sean-k-mooney | pool_socket = mappings.get(pool['numa_node']) | 21:57 |
sean-k-mooney | if not (pci_socket and pool_socket in vm_sockets): | 21:57 |
sean-k-mooney | pools_to_remove += pool | 21:57 |
sean-k-mooney | return [pool for pool in pools if pool not in pools_to_remove] | 21:57 |
artom | There are some unbound variables in there, so I can't quite grok it | 21:57 |
artom | But I can come up with something equivalent | 21:57 |
*** masayukig has quit IRC | 21:58 | |
sean-k-mooney | where mappings is basically {numa_node:[socket_id,socket_id] | 21:58 |
*** masayukig has joined #openstack-nova | 21:58 | |
sean-k-mooney | artom: mappings is the only one right | 21:58 |
artom | pci_socket as well | 21:58 |
sean-k-mooney | oh that is pool_socket | 21:59 |
sean-k-mooney | i forcot to rename it | 21:59 |
sean-k-mooney | artom: although maybe that shoudl be | 22:00 |
sean-k-mooney | if not pool_socket or pool_socket in vm_sockets: | 22:00 |
sean-k-mooney | for socket policy you dont allow device with no numa node correct | 22:01 |
sean-k-mooney | so that should be an or not an and | 22:01 |
*** spatel_ has quit IRC | 22:04 | |
artom | sean-k-mooney, well, I did a thing, and it made all tests pass. | 22:06 |
artom | It's brute force and dumb, but "easy" to understand | 22:06 |
artom | Shall I push? | 22:06 |
sean-k-mooney | artom: so i can "fix" this in my patch by adding pool_keys.pop('socket', None) to _find_pool | 22:06 |
artom | It'll kick my last patch out of the gate, and we'll have to get +2s and +As again tomorrow | 22:07 |
sean-k-mooney | i dont know its technically broken but you have a followup or i can fix it in my patch | 22:07 |
sean-k-mooney | we proably shoudl kick it out of the gate as much as i hate that | 22:08 |
artom | Yeah, it's the correct thing to do | 22:08 |
artom | As it, it's broken with any parent-having device | 22:08 |
sean-k-mooney | ya | 22:08 |
*** vishalmanchanda has quit IRC | 22:08 | |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: pci: implement the 'socket' NUMA affinity policy https://review.opendev.org/c/openstack/nova/+/772779 | 22:09 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Support per port numa policies with SR-IOV https://review.opendev.org/c/openstack/nova/+/773792 | 22:09 |
sean-k-mooney | so the minimal fix is just add one line pool_keys.pop('socket', None) to _find_pool | 22:09 |
*** ociuhandu has joined #openstack-nova | 22:10 | |
sean-k-mooney | ah i see what you did | 22:10 |
sean-k-mooney | are you still ok with rewriting it again in the followup | 22:11 |
sean-k-mooney | that should work as is though | 22:11 |
sean-k-mooney | ill test it now | 22:11 |
artom | sean-k-mooney, yeah, we can improve in a follow up | 22:12 |
artom | For now, do the simple stupid thing that's easy to understand and review, and works ;) | 22:12 |
sean-k-mooney | ok i have the unit and funct test running for those locally now. im going to go get somethign to eat and when i get back ill test boot real vms and ill comment on the review | 22:15 |
sean-k-mooney | then ill rebase the vdpa change on top of both patches and test those | 22:16 |
*** zzzeek has quit IRC | 22:16 | |
artom | That's... optimistic ;) | 22:16 |
artom | Though I suppose if that was the only outstanding issue with the port NUMA policies, might as well aim for the moon ;) | 22:16 |
sean-k-mooney | i have some nits form stephen it looks like i mised but ya i think its the only issue | 22:17 |
*** zzzeek has joined #openstack-nova | 22:18 | |
sean-k-mooney | these https://review.opendev.org/c/openstack/nova/+/773792/11#message-8d43910738d79b4f530fd6b4176074ad6f53c543 | 22:18 |
*** ociuhandu has quit IRC | 22:18 | |
openstackgerrit | Merged openstack/os-resource-classes master: Fix hacking min version to 3.0.1 https://review.opendev.org/c/openstack/os-resource-classes/+/727557 | 22:20 |
*** rcernin has joined #openstack-nova | 22:27 | |
*** k_mouza has joined #openstack-nova | 22:42 | |
*** k_mouza has quit IRC | 22:43 | |
*** k_mouza has joined #openstack-nova | 22:43 | |
*** k_mouza has quit IRC | 22:47 | |
*** k_mouza has joined #openstack-nova | 22:53 | |
*** k_mouza has quit IRC | 22:57 | |
dansmith | melwitt: easy stats: https://review.opendev.org/c/openstack/nova/+/779815 | 22:58 |
dansmith | converting the single-line glance policy I write from old json to new yaml, so I can write to it easier from a child job to change other stuff | 22:59 |
melwitt | +2 | 23:01 |
dansmith | m'thanks | 23:01 |
*** k_mouza has joined #openstack-nova | 23:28 | |
*** rpittau|afk has quit IRC | 23:37 | |
*** rpittau|afk has joined #openstack-nova | 23:37 | |
*** aarents has quit IRC | 23:38 | |
openstackgerrit | sean mooney proposed openstack/nova master: Support per port numa policies with SR-IOV https://review.opendev.org/c/openstack/nova/+/773792 | 23:38 |
*** k_mouza has quit IRC | 23:55 | |
*** k_mouza has joined #openstack-nova | 23:59 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!