sean-k-mooney | hyang[m]: it's merged, it's not released yet | 00:59 |
opendevreview | melanie witt proposed openstack/nova master: Poison usage of eventlet spawn_n() in tests https://review.opendev.org/c/openstack/nova/+/818042 | 03:52 |
*** abhishekk is now known as akekane|home | 05:12 | |
*** akekane|home is now known as abhishekk | 05:12 | |
EugenMayer | It seems like the first boot of an instance (with cloud-init) has different results in networking than all the others after it. Is that intended? | 07:10 |
EugenMayer | am I right that using --user-data has 2 flavours: with --config-drive true it will expect meta-data-like content there, while without a config drive YAML-based cloud-init files are expected? So the same --user-data feeds 2 different subsystems on init? | 07:47 |
EugenMayer | (reading https://docs.openstack.org/nova/queens/user/config-drive.html) | 07:48 |
gibi_ | o/ morning nova | 08:03 |
*** gibi_ is now known as gibi | 08:03 | |
bauzas | good spec review day, Nova | 08:40 |
* gibi already on it | 08:45 | |
opendevreview | Merged openstack/nova-specs master: Repropose flavour and image defined ephemeral storage encryption https://review.opendev.org/c/openstack/nova-specs/+/810867 | 09:13 |
jengbers | Morning, I have been searching OpenStack history, but I haven't been able to find why it is not possible to add existing instances to server groups. Does anyone know if that is just never discussed or if there is a fundamental technical problem? | 09:33 |
jengbers | We have been adding servers to server groups by changing the database and the scheduler has always handled this well. | 09:33 |
gibi | jengbers: the problem is consistency. When you add a server to a group you can create a situation where the group membership and the actual placement of the instance contradict each other | 09:38 |
gibi | and the question is what to do then | 09:38 |
gibi | a) move the instance to restore consistency | 09:39 |
gibi | b) reject the addition of the instance to the group | 09:39 |
gibi | c) allow temporary inconsistency and let the next move operation on the instance fix the group | 09:40 |
gibi | I think we never agreed which direction to take | 09:40 |
sean-k-mooney | there was a proposal to extend it recently to allow this, but the suggestion then was to have the add do a migration, which I don't think is the right approach | 09:43 |
sean-k-mooney | it's particularly a problem for the affinity policy, since that is more likely to break than the anti-affinity policy | 09:43 |
kashyap | bauzas: I might not be able to make the meeting today; I see it's at 17u CET :-( | 09:48 |
kashyap | Morning, BTW | 09:48 |
bauzas | kashyap: ah ok, no worries then | 09:48 |
bauzas | kashyap: just add notes to your specless ask in the agenda, so we can discuss there | 09:49 |
kashyap | bauzas: Yep; doing that now | 09:50 |
kashyap | bauzas: If there are any questions, you can ask me here, I'll answer when I'm back later in the evening. I'm away from 17-18u CET | 09:50 |
jengbers | I guess option b) would be the least surprising. | 09:52 |
gibi | jengbers: with option b) the problem is that the user must first somehow move the instance to the proper place (but the user has no tool for that) and then add it to the group | 09:59 |
opendevreview | Merged openstack/nova-specs master: Store and allow libvirt instance device buses and models to be updated https://review.opendev.org/c/openstack/nova-specs/+/810235 | 10:00 |
sean-k-mooney | gibi: if you have the correct weigher enabled I believe you can ask for an instance to be created on the same host or a different host to a specific instance via a scheduler hint | 10:01 |
sean-k-mooney | it certainly requires a lot of knowledge of nova and the instance to do correctly | 10:01 |
gibi | sean-k-mooney: if you use SameHost / DifferentHost filters then you don't need server groups | 10:02 |
sean-k-mooney | you can't, as a normal user, cold migrate to the same host or similarly align them all | 10:02 |
sean-k-mooney | gibi: ya that is also kind of true | 10:02 |
gibi | they implement similar logic but in a very different way :) | 10:02 |
sean-k-mooney | yep | 10:03 |
sean-k-mooney | I do wish we had ways to make server groups more useful | 10:03 |
sean-k-mooney | but it's kind of hard to extend them for the reason above | 10:03 |
gibi | sean-k-mooney: for that we need to solve jengbers' problem and also extend the logic to support multiple groups per instance (or nested groups) | 10:03 |
gibi | both are painfully missing but hard to solve | 10:04 |
gibi | bauzas: I'm done with the spec sweep. I could not really comment on the ironic one https://review.opendev.org/c/openstack/nova-specs/+/815789 and it seems nobody commented yet | 10:05 |
gibi | the rest of the specs have feedback | 10:05 |
sean-k-mooney | ya. on second thought, we should have "server aggregates" in parallel to server groups. you know, so we can aggregate servers, give that aggregate of servers a name, and even have some metadata that can be shared, like "this server is the primary of the aggregate", and just not have it related to VM placement at all | 10:05 |
bauzas | gibi: I still have 3 specs to look at | 10:05 |
sean-k-mooney | that way we can pretend server groups dont exist :) | 10:05 |
bauzas | gibi: but OK, and thanks for the fish | 10:05 |
gibi | sean-k-mooney: :D | 10:06 |
jengbers | gibi, sean-k-mooney: If it was only possible for admins, that could work, because they can also migrate servers, but for users it seems quite hard. | 10:14 |
gibi | jengbers: yeah that could work. Feel free to propose a spec about the new API to get wider discussion around it | 10:15 |
jengbers | On the other hand, they can power off and start an instance. I guess that would mean it is started on a different hypervisor. | 10:16 |
sean-k-mooney | really there are 2 paths we could take: 1) allow a normal user to ask nova to cold/live migrate an instance to be consistent with a server group that it is currently not a member of, and then allow them to add the server to the group after, rejecting the request if the policy is violated | 10:20 |
sean-k-mooney | or 2) we can have the server group add trigger the migration as part of the request | 10:21 |
kashyap | Is this failing for anyone else too? | 10:24 |
kashyap | tempest.api.compute.admin.test_live_migration.LiveAutoBlockMigrationV225Test.test_live_migration_with_trunk [108.238299s] ... FAILED | 10:24 |
gibi | kashyap: could you link the test run? | 10:25 |
kashyap | gibi: https://zuul.opendev.org/t/openstack/build/632f8ed30e9a4a04a32648843f227ef3 | 10:25 |
jkulik | we've extended the server-groups API downstream to allow adding servers after the fact. we opted not to allow adding servers if this would go against the server group's rules | 10:25 |
gibi | looking | 10:25 |
gibi | kashyap: I think you got hit by https://bugs.launchpad.net/neutron/+bug/1940425 | 10:27 |
jkulik | it helps customers if they already spawned an instance and forgot the server-group and now want to spawn another instance in some affinity to the existing one | 10:28 |
gibi | the stack trace is the same | 10:28 |
gibi | jkulik, jengbers: so both of you would like the same behavior, you should team up proposing this upstream :) | 10:28 |
kashyap | gibi: Oh, thank you | 10:28 |
kashyap | gibi: Now what? ... Should I do a "recheck 1940425"? | 10:29 |
kashyap | Or pray to the ju-ju at the bottom of the sea? Or... | 10:29 |
gibi | kashyap: yepp, recheck bug 1940425 | 10:29 |
jkulik | https://github.com/sapcc/nova/commit/7220be3968ee1dd257c9add88228cc5bb9857795 is the main commit downstream for us | 10:29 |
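A minimal sketch of the option (b) check being discussed here, with illustrative names; this is not the actual downstream code jkulik links, just the shape of the validation:

```python
# Sketch of option (b): reject adding a server to a group when its
# current host already violates the group policy. Names are illustrative.
def can_add_to_group(instance_host, group_policy, member_hosts):
    if group_policy == 'affinity':
        # All members must share a single host, so the candidate must
        # already live on that host (or the group must still be empty).
        return all(host == instance_host for host in member_hosts)
    if group_policy == 'anti-affinity':
        # No two members may share a host.
        return instance_host not in member_hosts
    # soft-affinity / soft-anti-affinity are weigher-only policies, so
    # membership can always be granted.
    return True
```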
jkulik | gibi: yes, we talked internally already about proposing this upstream, but small team, much work :/ | 10:30 |
gibi | kashyap: I added your run to the bug; maybe that way we can get attention on the failure, as it is still happening | 10:30 |
kashyap | gibi: Thx for the quick spot | 10:30 |
gibi | jkulik: no pressure, I know that type of frustration | 10:31 |
bauzas | jkulik: we tried discussing this upstream in the past, but operators are very afraid of the race conditions it creates | 10:35 |
bauzas | jkulik: the problem is, in a distributed service model like Nova, you can't get a valid answer on whether you can do it, because when you validate you don't ask the nova-compute service | 10:36 |
jkulik | bauzas: that reminds me ... we wanted to change the DB to disallow having a server in multiple server-groups to help with races. we haven't done that, yet. thanks :D | 10:37 |
bauzas | in theory we should hold new instance creations per compute once you ask for adding a new instance to the group | 10:38 |
jkulik | our problem is a little different, still, as we use VMware and not libvirt. thus, we have a lot of hidden hypervisors as nova only sees the cluster. therefore, hard anti-affinity doesn't really matter for us that much | 10:40 |
jkulik | customers want to make sure they run on different hypervisors and thus we sync the server-groups to the VMware clusters. VMware then migrates VMs around to make sure the rules apply. | 10:40 |
jkulik | i.e. most of our customers depend on soft-anti-affinity, which is a Weigher in nova-scheduler anyways | 10:42 |
kashyap | bauzas: Alright, added it to the Open Discussion here: https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 11:43 |
opendevreview | Rajat Dhasmana proposed openstack/nova-specs master: Add spec for volume backed server rebuild https://review.opendev.org/c/openstack/nova-specs/+/809621 | 11:44 |
dmitriis | gibi: tyvm for the feedback | 12:10 |
gibi | dmitriis: you are welcome. it is a well written spec, thanks for putting in the effort | 12:10 |
sean-k-mooney | I don't know if I hit send on my response to the last version of it | 12:22 |
sean-k-mooney | gibi you are correct, the PCI passthrough filter will be sufficient without the prefilter | 12:23 |
sean-k-mooney | but the prefilter can help reduce the set if we only report the new trait on hosts that have off-path devices | 12:23 |
sean-k-mooney | and the capability to use them, of course | 12:23 |
sean-k-mooney | I'll re-review that spec later today | 12:24 |
gibi | sean-k-mooney: yes, exactly my argument, the prefilter is not mandatory but it is good to have | 12:26 |
*** mdbooth1 is now known as mdbooth | 12:53 | |
dmitriis | sean-k-mooney: ack, ty for confirming | 13:00 |
sean-k-mooney | I don't currently have access to hardware to test what you have done, but I may have access before the end of the cycle. if I do I might reach out to you and try to test it end to end, although I don't know if I will have time to do that or not | 13:02 |
elodilles | bauzas: I'll update the meeting wiki #stable section now if that is not interfering with you right now | 13:24 |
dmitriis | sean-k-mooney: btw, fnordahl and I have done end-to-end testing of this in a lab. Here's a PPA https://launchpad.net/~fnordahl/+archive/ubuntu/smartnic-enablement that was used in the process (the WIP reviews are in use there). It has https://listman.redhat.com/archives/libvir-list/2021-November/msg00431.html included as well - I am trying to get someone to review it sooner rather than later. | 13:26 |
dmitriis | it doesn't yet have the prefilter and compute capability parts that were recently added to the spec but I will work on updating the WIP review soon with that and on raising a relevant os-traits change | 13:28 |
dmitriis | We had a VM booted with a floating IP assigned which we then connected to via a router. The flows were offloaded into the ConnectX-6 chip present on BF2. | 13:30 |
sean-k-mooney | I think I have pinged that patch to people downstream already, but I'll let the virt team know | 13:31 |
dmitriis | ack, tyvm | 13:32 |
dmitriis | sean-k-mooney: besides testing overlays we also tried using VLAN provider networks. That worked as well but the only thing to note there is that collocating VMs with ports attached to overlay networks via PCI devices with the ones that are directly attached to VLAN networks is going to be problematic with the current whitelist based lookup | 13:33 |
dmitriis | implementation. | 13:33 |
dmitriis | entries in the whitelist get a physnet tag (either null for overlay networks or a physnet label) | 13:34 |
sean-k-mooney | correct they do | 13:34 |
dmitriis | but there is only one vendor/device id pair | 13:34 |
sean-k-mooney | and technically null was never intended to be supported | 13:34 |
sean-k-mooney | we never had a nova feature to support overlays with pci devices | 13:35 |
sean-k-mooney | they exploited a lack of null checking and it happened to work | 13:35 |
sean-k-mooney | dmitriis: anyway, back to your point: why is that problematic? | 13:36 |
dmitriis | sean-k-mooney: heh, yes, I wasn't aware of the history but the hardware offload docs explicitly mention that null needs to be used https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html#configure-nodes-vxlan-configuration | 13:36 |
sean-k-mooney | dmitriis: yes that was never intended to work | 13:36 |
sean-k-mooney | but people now have it in production | 13:36 |
sean-k-mooney | dmitriis: https://bugs.launchpad.net/nova/+bug/1915282 | 13:37 |
dmitriis | sean-k-mooney: IIRC PCI requests come with a specific physnet parameter (or null). So when PCI stats are looked at, this parameter is used for lookup | 13:37 |
dmitriis | let me find that code again | 13:37 |
sean-k-mooney | yes, we end up passing Python None in the pci request | 13:37 |
sean-k-mooney | because the physnet of a VXLAN or Geneve network is not set | 13:38 |
sean-k-mooney | that will match the null physnet specified in the whitelist | 13:38 |
dmitriis | ((Pdb)) request | 13:39 |
dmitriis | InstancePCIRequest(alias_name=<?>,count=1,is_new=<?>,numa_policy=<?>,request_id=c3a87cba-323a-4203-bca7-0916927dcd5b,requester_id='28ea5b12-729c-46b4-b441-518fe786ea10',spec=[{physical_network=None,remote_managed='True'}]) | 13:39 |
dmitriis | I had something like this ^ | 13:39 |
sean-k-mooney | yep | 13:39 |
sean-k-mooney | that should work | 13:39 |
sean-k-mooney | that is the python None | 13:39 |
sean-k-mooney | note it's not quoted | 13:39 |
sean-k-mooney | that will match physical_network=null | 13:40 |
dmitriis | ah, maybe that's an old note that I have. It has since been fixed to use a string | 13:40 |
sean-k-mooney | you have to use "'physical_network':null" not "'physical_network':'null'" in the whitelist | 13:40 |
sean-k-mooney | like this passthrough_whitelist={ "vendor_id":"15b3", "product_id":"101e", "physical_network":null } | 13:41 |
sean-k-mooney | that enables a ConnectX-6 Dx for overlay networking | 13:42 |
sean-k-mooney | dmitriis: if you want to have some VFs for vlan/flat and others for geneve tunnels, you need to use the address field to partition the VFs into groups | 13:43 |
dmitriis | sean-k-mooney: I suppose that could be one way to do it | 13:44 |
sean-k-mooney | dmitriis: this is because tunnels were never meant to be supported at all, so we never implemented a way to allow a device to be part of multiple physnets | 13:44 |
sean-k-mooney | dmitriis: if it were not for the fact that this was used in production we would have closed this as a security bug and blocked the use of null; the details are in the bug | 13:45 |
dmitriis | sean-k-mooney: yeah, makes sense. I think that documenting this and suggesting address-based partitioning as a workaround is viable for now | 13:46 |
sean-k-mooney | dmitriis: the tl;dr is we use a JSON parser to parse the whitelist, and in JSON unquoted null is mapped to the Python None object, which just happens to be what we get when we parse the physnet from networks that don't have one | 13:46 |
sean-k-mooney | which is why physical_network=None in the pci request will actually match | 13:47 |
sean-k-mooney | since that is also the Python None object, not the string 'None' | 13:48 |
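A quick, standard-library-only illustration of the parsing behaviour sean-k-mooney describes (runnable as-is):

```python
import json

# The whitelist value is run through a JSON parser, so an unquoted null
# becomes Python's None -- the same value Nova derives as the physnet of
# a tunnelled (VXLAN/Geneve) network, which is why the two match.
spec = json.loads('{"vendor_id": "15b3", "product_id": "101e", '
                  '"physical_network": null}')
print(spec['physical_network'] is None)  # True -> matches overlay requests

# A quoted "null" would be a four-character string instead, and would
# never match the None coming from the PCI request.
print(json.loads('{"physical_network": "null"}')['physical_network'] == 'null')  # True
```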
sean-k-mooney | fyi the docs for the whitelist are not great, but in case you don't know, we support both bash-style globs and Python regex expressions in the address field | 13:49 |
sean-k-mooney | and we support both in either the string or dict form | 13:50 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#pci.passthrough_whitelist has some examples | 13:50 |
dmitriis | sean-k-mooney: I recall some other place in Nova where I had to use a string instead (trying to find where so maybe I wrongly brought this up here). | 13:50 |
sean-k-mooney | there might be; if you find it let me know and I might know the history, or it might just be a bug | 13:51 |
dmitriis | sean-k-mooney: that's what we used in the lab | 13:52 |
dmitriis | passthrough_whitelist = [{"vendor_id": "15b3", "product_id": "101e", "physical_network": null, "remote_managed": "true"}] | 13:52 |
dmitriis | and for physnets: passthrough_whitelist = [{"vendor_id": "15b3", "product_id": "101e", "physical_network": "physnet1", "remote_managed": "true"}] | 13:53 |
sean-k-mooney | not at the same time, right? | 13:53 |
sean-k-mooney | if you add the address field you could use both, but both look valid to me: the first for geneve and the second for flat/vlan | 13:54 |
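For illustration, a hypothetical nova.conf fragment combining both entries via the address field sean-k-mooney mentions; the VF addresses are invented, and only the shape of the entries is the point:

```ini
[pci]
# Hypothetical: pin one VF to a VLAN physnet and another to overlay (null physnet) use.
passthrough_whitelist = {"address": "0000:82:00.2", "vendor_id": "15b3", "product_id": "101e", "physical_network": "physnet1", "remote_managed": "true"}
passthrough_whitelist = {"address": "0000:82:00.3", "vendor_id": "15b3", "product_id": "101e", "physical_network": null, "remote_managed": "true"}
```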
sean-k-mooney | huh interesting | 13:55 |
sean-k-mooney | the VFs for the ConnectX-6 on the BlueField-2 have the same vendor and product ID as a normal ConnectX-6 | 13:55 |
bauzas | elodilles: sure, please do it, I'll just update the wikipage after you | 13:59 |
dmitriis | sean-k-mooney: ack on the address field usage. | 14:06 |
dmitriis | sean-k-mooney: I don't have a separate ConnectX-6 at hand but BF2 has ConnectX-6 in it. Let me check the PCI ID DB - I think I've seen different ids but maybe that's for something else. | 14:07 |
elodilles | bauzas: done, thanks (I might have overused the info and link markers o:) feel free to edit :)) | 14:11 |
bauzas | elodilles: ack, thanks | 14:11 |
dmitriis | sean-k-mooney: so the PF is different but VFs look like the ones from a "regular" ConnectX-6. | 14:13 |
dmitriis | PF: | 14:13 |
dmitriis | 82:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01) | 14:13 |
dmitriis | 82:00.0 0200: 15b3:a2d6 (rev 01) | 14:13 |
dmitriis | Subsystem: 15b3:0061 | 14:13 |
dmitriis | VF: | 14:13 |
dmitriis | 82:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function (rev 01) | 14:13 |
dmitriis | 82:00.3 0200: 15b3:101e (rev 01) | 14:13 |
dmitriis | Subsystem: 15b3:0061 | 14:13 |
dmitriis | so careful "remote_managed" tagging is needed | 14:14 |
sean-k-mooney | ack good to know | 14:14 |
sean-k-mooney | dmitriis: you can use the address of the PF and the vendor/product ID of the VF to whitelist all the VFs that belong to that PF | 14:29 |
sean-k-mooney | just so you know | 14:29 |
sean-k-mooney | dmitriis: that behavior is not well known | 14:29 |
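A hypothetical entry of the kind just described, reusing the PF address from dmitriis's lspci output above (the exact key combination is illustrative):

```ini
[pci]
# 0000:82:00.0 is the BlueField-2 PF; an entry whose address resolves to a
# PF matches all the VFs under it, so only the VF product_id is listed.
passthrough_whitelist = {"address": "0000:82:00.0", "product_id": "101e", "physical_network": null, "remote_managed": "true"}
```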
dmitriis | sean-k-mooney: didn't know that (not surprisingly), thanks for the info. | 14:32 |
dmitriis | sean-k-mooney: btw, BF2 does bonding at the ARM CPU side transparently to the hypervisor | 14:33 |
dmitriis | and there's an option to hide the inactive PF for the hypervisor side: https://docs.mellanox.com/display/BlueFieldSWv24011082/BlueField%20Link%20Aggregation | 14:34 |
dmitriis | that makes it easier for OpenStack deployers/operators since only one PF needs to be taken into account | 14:35 |
dmitriis | sean-k-mooney: so this is the place where I had to use a string (instead of a bool; not None, so my earlier reference was not correct) https://review.opendev.org/c/openstack/nova/+/812111/3/nova/network/neutron.py#2295 - that's where a device spec is dynamically generated (not based on flavor or image properties). | 14:42 |
sean-k-mooney | I'm on a call but I'll look it up after, thanks | 14:45 |
dmitriis | ack | 14:47 |
Adri2000 | hi, I've got a race condition issue on ussuri and victoria when resizing an instance... specifically this is with /var/lib/nova/instances on NFS, and the following happens sometimes when resizing an instance where a cold migration is triggered: `qemu-img resize` will be run on the new compute node before the old compute node has fully released the lock on the disk file; this will put the instance in ERROR state. does that ring a bell to anyone? | 15:03 |
Adri2000 | ERROR nova.compute.manager [req-...] [instance: 6ca672fd-8746-441f-bbca-6baa3234bb5e] Setting instance vm_state to ERROR: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. Command: qemu-img resize /var/lib/nova/instances/6ca672fd-8746-441f-bbca-6baa3234bb5e/disk Exit code: 1 Stdout: '' | 15:03 |
Adri2000 | Stderr: "qemu-img: Could not open '/var/lib/nova/instances/6ca672fd-8746-441f-bbca-6baa3234bb5e/disk': Could not open '/var/lib/nova/instances/6ca672fd-8746-441f-bbca-6baa3234bb5e/disk': Permission denied\n" | 15:03 |
sean-k-mooney | Adri2000: are you using nfsv3 | 15:04 |
Adri2000 | sean-k-mooney: `/var/lib/nova/instances type nfs4 (rw,relatime,vers=4.1...` | 15:04 |
sean-k-mooney | ok, NFSv3 has locking issues; v4.1 improves the situation, but I'd recommend v4.2+ | 15:05 |
sean-k-mooney | lyarwood: does ^ seem familiar to you | 15:05 |
sean-k-mooney | Adri2000: I believe there are some tunables in the mount options that can be used to help resolve this | 15:07 |
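An illustrative mount entry following that advice; the export path is a placeholder, and the only substantive change from the mount shown above is bumping vers to 4.2:

```
# /etc/fstab (illustrative): move the instances share from NFS 4.1 to 4.2+
nfs-server:/export/nova  /var/lib/nova/instances  nfs4  rw,relatime,vers=4.2  0  0
```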
sean-k-mooney | Adri2000: are you using raw images? | 15:11 |
bauzas | gibi: I'll have to hardstop our meeting by 5:50pm our TZ | 15:11 |
gibi | bauzas: ack | 15:12 |
bauzas | in case we have to continue discussing, could you be chairing it ? | 15:12 |
sean-k-mooney | dmitriis: oh there | 15:13 |
sean-k-mooney | str(self._is_remote_managed(vnic_type)), | 15:13 |
sean-k-mooney | dmitriis: ya that makes sense | 15:13 |
Adri2000 | sean-k-mooney: qcow3 images. one nfs option I have currently is local_lock=none, maybe I should look into this one. | 15:13 |
sean-k-mooney | dmitriis: technically the tags are defined to be of type string | 15:14 |
sean-k-mooney | so it's a dict of string to string | 15:14 |
sean-k-mooney | Adri2000: ack, the reason I asked is that apparently the locking behavior is different in qemu for raw vs qcow | 15:15 |
dmitriis | sean-k-mooney: ack | 15:15 |
sean-k-mooney | dmitriis: https://github.com/openstack/nova/blob/master/nova/pci/devspec.py#L262-L263 | 15:17 |
dmitriis | sean-k-mooney: yep, makes sense | 15:18 |
dmitriis | sean-k-mooney: btw, the ovn-vif repo is now up under ovn-org https://github.com/ovn-org/ovn-vif | 15:18 |
sean-k-mooney | yes i saw your comment | 15:19 |
sean-k-mooney | just looking at the code | 15:19 |
dmitriis | ack | 15:19 |
sean-k-mooney | am I right in assuming we do not want to allow these devices to be used for flavor-based PCI passthrough? | 15:19 |
dmitriis | sean-k-mooney: yes, they won't be of much use without being plugged appropriately. Not the VFs at least. | 15:20 |
sean-k-mooney | ya | 15:21 |
sean-k-mooney | I'm wondering if we should explicitly block that | 15:21 |
sean-k-mooney | unfortunately I don't see a trivial way to do that | 15:21 |
sean-k-mooney | although we might already do that | 15:22 |
dmitriis | sean-k-mooney: I guess we could exclude devices from search results if remote_managed is present but not requested | 15:22 |
sean-k-mooney | yep | 15:22 |
sean-k-mooney | i was just going to provide an example | 15:22 |
sean-k-mooney | we do this in other cases already | 15:23 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L411-L433 | 15:23 |
sean-k-mooney | That filters out PFs if you did not ask for one | 15:23 |
sean-k-mooney | dmitriis: so you can copy/paste https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L520-L535 | 15:24 |
sean-k-mooney | and then add a new function that will filter out remote-managed devices if not requested | 15:24 |
dmitriis | sean-k-mooney: https://review.opendev.org/c/openstack/nova/+/812111/3/nova/network/neutron.py#2295 | 15:24 |
sean-k-mooney | I did this recently when I added support for vDPA https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L540 | 15:25 |
dmitriis | actually, I'm explicitly passing remote_managed=False | 15:25 |
sean-k-mooney | that won't work | 15:25 |
dmitriis | sean-k-mooney: even with this? https://review.opendev.org/c/openstack/nova/+/812111/3/nova/pci/stats.py#111 | 15:25 |
sean-k-mooney | it will break existing deployments on upgrade, as their existing devices won't have remote_managed=False | 15:25 |
sean-k-mooney | and it would only apply to PCI requests from ports | 15:26 |
sean-k-mooney | dmitriis: that would work, but we might end up doing a data migration of all existing rows | 15:26 |
sean-k-mooney | dmitriis: ok, we can review this as part of the code review rather than the spec. | 15:27 |
sean-k-mooney | I'm just finishing reading it now and I'll approve it shortly | 15:27 |
dmitriis | sean-k-mooney: ack, I am open to adding a filter as you suggested | 15:27 |
sean-k-mooney | either would work, but one involves updating every row in the pci devices table with remote_managed=false :) | 15:28 |
dmitriis | right, I would certainly like to avoid introducing a change that would break with a stale state in PCI stats | 15:28 |
sean-k-mooney | the important thing is there is not a gap in the design | 15:28 |
dmitriis | agreed | 15:28 |
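A hypothetical sketch of the filter being discussed, modelled on the existing unrequested-PF and vDPA filters in nova/pci/stats.py linked above; the function name and field handling are assumptions, not merged code:

```python
# Hypothetical sketch (not merged code): drop remote-managed device pools
# unless the request explicitly asked for a remote-managed device. This
# mirrors the _filter_pools_for_unrequested_pfs approach rather than
# tagging every existing device row with remote_managed=False.
def _filter_pools_for_unrequested_remote_managed(pools, request):
    requested = any(
        spec.get('remote_managed') == 'True' for spec in request.spec)
    if not requested:
        pools = [pool for pool in pools
                 if pool.get('remote_managed') != 'True']
    return pools
```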
*** artom__ is now known as artom | 15:34 | |
sean-k-mooney | dmitriis: ok, I captured some of my thoughts from this conversation in the spec, but +2 +W from me | 15:43 |
sean-k-mooney | dmitriis: feel free to ping me to review the implementation too. I do not have +2 rights on the code repo, but I'll try to spend some time reviewing it end to end next week | 15:45 |
dmitriis | sean-k-mooney: tyvm. I'll try to get the code updated with some of the latest changes by then. Still have to extend func tests to cover more cases but there are some already. | 15:46 |
dmitriis | sean-k-mooney: speaking of other lifecycle operations, I've spent some time looking at the recent VF hot-plug/unplug changes so I may revisit some of the unsupported operations at a later point | 15:47 |
bauzas | reminder : nova weekly meeting starts in 13 mins here in this #chan | 15:47 |
dmitriis | maybe we can actually make things like cold migration work, just need to review that further | 15:47 |
sean-k-mooney | ack | 15:47 |
sean-k-mooney | dmitriis: it might just work | 15:48 |
sean-k-mooney | there is very little on the nova side that will need to be updated | 15:48 |
sean-k-mooney | also for the live migration | 15:48 |
dmitriis | sean-k-mooney: yes, we might need to document the need for extra slots to be added via the new config | 15:48 |
dmitriis | https://review.opendev.org/c/openstack/nova/+/545034/16/nova/conf/libvirt.py | 15:49 |
sean-k-mooney | dmitriis: that is really only needed for q35 | 15:49 |
sean-k-mooney | and we already have a config option to add extra slots in that case | 15:49 |
sean-k-mooney | yep that one | 15:49 |
dmitriis | ack | 15:49 |
sean-k-mooney | the pc machine type has 24 or 32 PCI slots by default | 15:50 |
sean-k-mooney | for q35 the default behavior is to allocate all that are required for your VM, plus 1 free for hotplug | 15:50 |
sean-k-mooney | oh... | 15:51 |
sean-k-mooney | there might be a bug in sriov live migration with q35 | 15:51 |
dmitriis | From the guest OS perspective, the PCI addressing is tied to the virtual PCI topology. Hopefully it is consistent across migration so that device naming doesn't change for the guest while the MAC is reprogrammed anyway. | 15:52 |
sean-k-mooney | i did most of my testing with pc, and when I tested with q35 I don't know if I tested with more than one SR-IOV NIC | 15:52 |
sean-k-mooney | dmitriis: we don't guarantee it will be | 15:52 |
sean-k-mooney | so it might change | 15:53 |
dmitriis | sean-k-mooney: ah, good to know. Changing PCI addresses will change persistent device names tied to PCI addresses. | 15:53 |
sean-k-mooney | yes the way around that is to leverage device role tagging | 15:54 |
sean-k-mooney | but really we want qemu/kvm/nvidia to finish implementing live migration support for vDPA | 15:54 |
sean-k-mooney | so that we can just leave the vDPA device attached | 15:54 |
sean-k-mooney | dmitriis: by the way, at some point we likely need to consider how to support vDPA + BlueField-2 | 15:55 |
sean-k-mooney | we can get the simple version working first however. | 15:56 |
dmitriis | sean-k-mooney: yes, I agree. There are two cases: software and hardware vDPA. For soft vDPA there is an extra agent needed on the hypervisor host. | 15:56 |
dmitriis | so that definitely has some challenges | 15:56 |
sean-k-mooney | I'm hoping we can simply not specify a device_type and rely on remote_managed=True | 15:56 |
dmitriis | another interesting area is Scalable Functions (SFs) which rely on mdev and a vendor-specific driver | 15:56 |
sean-k-mooney | well, maybe not; we can see | 15:57 |
sean-k-mooney | dmitriis: yes i have worked with that in the past | 15:57 |
dmitriis | it kind of erases the benefits of hardware virtio tbh | 15:57 |
sean-k-mooney | it's not clear whether the mdev-based approach will go to market or not, at least from the vendor I was working with | 15:57 |
sean-k-mooney | well, the mdev implementation can be in hardware and present virtio too | 15:58 |
sean-k-mooney | it predates the vDPA bus | 15:58 |
dmitriis | ah, in that case, I take it back :^) | 15:58 |
whoami-rajat | Hi, just to be sure the nova meeting is in this channel right? | 15:59 |
dmitriis | I was also thinking of what CXL would bring and how much churn will it introduce to the existing PCI management implementation in Nova | 15:59 |
sean-k-mooney | the prototype I was working on used an FPGA to implement virtio in "hardware", but the long-term plan was to do that in an ASIC. I just don't know if they have pivoted to vDPA now or not, but it was mdev-based at the time | 15:59 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Nov 16 16:00:10 2021 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
gibi | o/ | 16:00 |
elodilles | o/ | 16:00 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:00 |
bauzas | good 'day, everyone ;) | 16:00 |
whoami-rajat | Hi | 16:00 |
opendevreview | Merged openstack/nova-specs master: Integration With Off-path Network Backends https://review.opendev.org/c/openstack/nova-specs/+/787458 | 16:00 |
gmann | o/ | 16:01 |
bauzas | I'll have to hardstop working in 45-ish mins, sooo | 16:01 |
bauzas | #chair gibi | 16:01 |
opendevmeet | Current chairs: bauzas gibi | 16:01 |
bauzas | sorry again | 16:01 |
gibi | so I will take the rest | 16:01 |
* bauzas is a taxi | 16:01 | |
bauzas | anyway, let's start | 16:02 |
bauzas | #topic Bugs (stuck/critical) | 16:02 |
bauzas | #info No Critical bug | 16:02 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 28 new untriaged bugs (+3 since the last meeting) | 16:02 |
bauzas | #help Nova bug triage help is appreciated https://wiki.openstack.org/wiki/Nova/BugTriage | 16:02 |
bauzas | I'm really a sad panda | 16:02 |
bauzas | in general, I'm triaging bugs on Tuesday, but I forgot about our today's spec review day :) | 16:03 |
bauzas | so I'll look at the bugs tomorrow | 16:03 |
bauzas | in case people want to help us, <3 | 16:03 |
bauzas | any bug to discuss ? | 16:03 |
bauzas | #link https://storyboard.openstack.org/#!/project/openstack/placement 33 open stories (+1 since the last meeting) in Storyboard for Placement | 16:04 |
bauzas | about this... | 16:04 |
bauzas | I tried to find which story was new :) | 16:04 |
bauzas | but the last story was already the one I knew | 16:05 |
bauzas | so, in case people know... | 16:05 |
dansmith | o/ | 16:05 |
gibi | bauzas: if at some point I have time I can try to dig, but I'm pretty full at the moment | 16:06 |
bauzas | also, Storyboard is a bit... slow, I'd say | 16:06 |
bauzas | it takes at least 5 secs every time to look at a story | 16:06 |
bauzas | I mean, for stories, maybe we should use Facebook then ? :p | 16:07 |
bauzas | (heh, :p ) | 16:07 |
* bauzas was joking in case people didn't know | 16:07 | |
bauzas | OK, this looks like a bad joke | 16:08 |
bauzas | moving on :p | 16:08 |
bauzas | #topic Gate status | 16:08 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:08 |
bauzas | nothing new | 16:08 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status | 16:08 |
bauzas | now placement-nova-tox-functional-py38 job works again :) | 16:09 |
bauzas | thanks ! | 16:09 |
bauzas | #topic Release Planning | 16:09 |
bauzas | #info Yoga-1 is due Nov 18th #link https://releases.openstack.org/yoga/schedule.html#y-1 | 16:10 |
bauzas | which is in 2 days | 16:10 |
bauzas | nothing really to say about it | 16:10 |
bauzas | #info Spec review day is today | 16:10 |
bauzas | I think I reviewed all the specs but one (but I see this one was merged ;) ) | 16:10 |
bauzas | thanks to all who already reviewed specs | 16:11 |
gibi | yeah I think we pushed forward all the open specs | 16:11 |
whoami-rajat | Sorry if I'm interrupting, but I had one doubt regarding my spec | 16:12 |
bauzas | we merged 3 specs today | 16:12 |
bauzas | whoami-rajat: no worries, we can discuss this spec if you want during the open discussion topic | 16:12 |
whoami-rajat | ack thanks bauzas | 16:12 |
bauzas | whoami-rajat: but what is your concern ? | 16:12 |
bauzas | a tl;dr if you prefer | 16:13 |
bauzas | for other specs, I'll mark the related blueprints accepted in Launchpad by tomorrow | 16:14 |
whoami-rajat | bauzas, so I'm working on the reimage spec for volume backed instances and we decided to send connector details with the reimage API call and cinder will do the attachment update (this was during PTG), Lee pointed out that we should follow our current mechanism of nova doing attachment update like we do for other operations | 16:14 |
bauzas | ok, if this is a technical question, let's discuss this during the open discussion topic as I said | 16:15 |
whoami-rajat | sure, np | 16:15 |
bauzas | ok, next topic then | 16:15 |
bauzas | #topic Review priorities | 16:15 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement)+label:Review-Priority%252B1 | 16:15 |
bauzas | #info https://review.opendev.org/c/openstack/nova/+/816861 bauzas proposing a documentation change for helping contributors to ask for reviews | 16:16 |
bauzas | gibi already provided some comments on it | 16:16 |
bauzas | I guess the concern is how to help contributors to ask for reviews priorities like we did with the etherpad | 16:16 |
bauzas | but if we have a consensus saying that it is not an issue, I'll stop | 16:17 |
bauzas | but my only concern is that I think asking people to come on IRC and ping folks is difficult so we could use gerrit | 16:18 |
gibi | what is more difficult? Finding the reason for a fault in nova code and fixing it, or joining IRC to ask for review help? | 16:19 |
sean-k-mooney | well, one you "might" be able to do offline/async | 16:19 |
sean-k-mooney | the other involves talking to people, albeit by text | 16:20 |
sean-k-mooney | unfortunately those are sometimes non-overlapping skill sets | 16:20 |
bauzas | gibi: I'm just thinking of on and off contributors that just provide bugfixes | 16:20 |
gibi | doing code review is talking to people via text :) | 16:20 |
bauzas | but let's continue discussing this in the proposal, I don't want to drag everyone's attention on this now | 16:21 |
sean-k-mooney | bauzas: for one-off patches I think the expectation should still be on us to watch the patches come in and help them | 16:21 |
sean-k-mooney | rather than assuming they will use any tools we provide | 16:21 |
bauzas | sean-k-mooney: yeah but then how to discover them ? | 16:21 |
bauzas | either way, let's discuss this by Gerrit :p | 16:22 |
sean-k-mooney | well, if it's a similar time zone I watch for the IRC bot commenting with the patches | 16:22 |
sean-k-mooney | if I don't recognise it or the name, I open it | 16:22 |
sean-k-mooney | and then one of us can request the review priority in gerrit or publicise the patch to others | 16:22 |
bauzas | that's one direction | 16:23 |
sean-k-mooney | if there is something in gerrit I can set, I'm happy to do that on patches when I think they are ready; otherwise I'll just ping them to ye as I do now | 16:23 |
bauzas | either way, we have a large number of items for the open discussion topic, so let's move on | 16:23 |
sean-k-mooney | ack | 16:24 |
bauzas | #topic Stable Branches | 16:24 |
bauzas | elodilles: fancy copy/pasting or do you want me to do so ? | 16:24 |
elodilles | either way is OK :) | 16:24 |
bauzas | I can do it | 16:25 |
bauzas | #info stable gates' status look OK, no blocked branch | 16:25 |
bauzas | #info final ussuri nova package release was published (21.2.4) | 16:25 |
bauzas | #info ussuri-em tagging patch is waiting for final python-novaclient release patch to merge | 16:25 |
bauzas | #link https://review.opendev.org/c/openstack/releases/+/817930 | 16:26 |
bauzas | #link https://review.opendev.org/c/openstack/releases/+/817606 | 16:26 |
bauzas | #info intermittent volume detach issue: afaik Lee has an idea and started to work on how it can be fixed: | 16:26 |
bauzas | #link https://review.opendev.org/c/openstack/tempest/+/817772/ | 16:26 |
bauzas | any question ? | 16:26 |
elodilles | thanks :) | 16:26 |
bauzas | looks like none | 16:27 |
bauzas | #topic Sub/related team Highlights | 16:27 |
gibi | the volume detach issue feels more and more like it is not related to detach | 16:27 |
bauzas | #undo | 16:27 |
opendevmeet | Removing item from minutes: #topic Sub/related team Highlights | 16:27 |
gibi | the kernel panic happens before we issue detach | 16:28 |
elodilles | gibi: true | 16:28 |
gibi | it is either related to the attach or the live migration itself | 16:28 |
gibi | I have trial patches placing sleeps in different places to see where we are too fast https://review.opendev.org/c/openstack/nova/+/817564 | 16:28 |
bauzas | which stable branches are impacted ? | 16:28 |
gibi | stable/victoria | 16:28 |
bauzas | ubuntu focal-ish I guess ? | 16:29 |
bauzas | ack thanks | 16:29 |
elodilles | (and other branches as well, but might be different root causes) | 16:29 |
gibi | I only see kernel panic in stable/victoria (a lot) and one single failure in stable/wallaby | 16:29 |
gibi | so if there are detach issues in older stable that is either not causing kernel panic, or we don't see the panic in the logs | 16:30 |
bauzas | I guess kernel versions are different between branches | 16:30 |
bauzas | right? | 16:30 |
bauzas | could we imagine somehow to verify another kernel version for stable/victoria | 16:31 |
bauzas | ? | 16:31 |
gibi | we tested with guest cirros 0.5.1 (victoria default) and 0.5.2 (master default); it is reproducible with both | 16:31 |
bauzas | ack so unrelated | 16:31 |
gibi | there is a summary here https://bugs.launchpad.net/nova/+bug/1950310/comments/8 | 16:31 |
bauzas | #link https://bugs.launchpad.net/nova/+bug/1950310/comments/8 explaining the guest kernel panic related to stable/victoria branch | 16:32 |
sean-k-mooney | ya, the few cases I looked at with you last week were all happening before detach | 16:32 |
sean-k-mooney | so it's either the attach or live migration | 16:32 |
gibi | sean-k-mooney: I have more logs in the runs of https://review.opendev.org/c/openstack/nova/+/817564 if you are interested | 16:32 |
sean-k-mooney | i looked downstream at our qemu bugs but didn't see anything relevant | 16:32 |
sean-k-mooney | gibi: sure, I'll try to take a look, probably tomorrow | 16:33 |
sean-k-mooney | but I'll open it in a tab | 16:33 |
gibi | sean-k-mooney: thanks, I will retrigger that patch for a couple times to see if the current sleep before the live migration helps | 16:33 |
bauzas | a good sleep always helps | 16:34 |
bauzas | :) | 16:34 |
elodilles | :] | 16:34 |
sean-k-mooney | when sleep does not work we can also try a trusty print statement | 16:34 |
gibi | sleep is not there as a solution but as troubleshooting, to see at which step we are too fast :D | 16:35 |
* sean-k-mooney is dismayed by how many race conditions __don't__ appear when you use print for debugging | 16:35 | |
gibi | and I do have a lot of print(server.console)-like statements in the tempest test :D | 16:35 |
sean-k-mooney | I think we can move on, but it's good you were able to confirm we were attaching before the kernel finished booting | 16:36 |
sean-k-mooney | at least in some cases | 16:36 |
sean-k-mooney | that at least lends weight to the idea that we are racing | 16:36 |
bauzas | ok, let's move on | 16:37 |
gibi | ack | 16:37 |
bauzas | again, large agenda today | 16:37 |
bauzas | #topic Sub/related team Highlights | 16:37 |
bauzas | Libvirt : lyarwood ? | 16:37 |
bauzas | I guess nothing to tell | 16:38 |
bauzas | moving on to the last topic | 16:38 |
bauzas | #topic Open discussion | 16:38 |
bauzas | whoami-rajat: please queue | 16:39 |
whoami-rajat | thanks! | 16:39 |
bauzas | (kashyapc) Blueprint for review: "Switch to 'virtio' as the default display device" -- https://blueprints.launchpad.net/nova/+spec/virtio-as-default-display-device | 16:39 |
bauzas | this is a specless bp ask | 16:39 |
bauzas | kashyap said " The full rationale is in the blueprint; in short: "cirrus" display device has many limitations and is "considered harmful"[1] by QEMU graphics maintainers since 2014." | 16:39 |
bauzas | do we need a spec for this bp or are we OK for approving it by now ? | 16:40 |
whoami-rajat | so lyarwood had a concern with my reimage spec, we agreed to pass the connector info to reimage API (cinder) and cinder will do attachment update and return the connection info with events payload | 16:40 |
gibi | I think we don't need a spec; this is pretty self-contained in the libvirt driver | 16:40 |
bauzas | kashyap was unable to attend the meeting today | 16:40 |
whoami-rajat | (in PTG) | 16:40 |
sean-k-mooney | i think we are OK with approving it; the main thing to call out is that we will be changing it for existing instances too | 16:40 |
bauzas | whoami-rajat: please hold, sorry | 16:40 |
whoami-rajat | oh ok | 16:40 |
gibi | the only open question we had with sean-k-mooney is how to change the default | 16:40 |
gibi | but kashyap tested that changing the default during hard reboot does not cause any trouble for guests | 16:41 |
gibi | as the new video device has a fallback VGA mode | 16:41 |
bauzas | gibi: I'm thinking hard of any potential upgrade implication | 16:41 |
sean-k-mooney | right, so when we discussed this before we decided to change it only for new instances to avoid upgrade issues | 16:41 |
bauzas | correct | 16:41 |
sean-k-mooney | our downstream QE tested this with Windows guests and Linux guests and both seemed to be OK with the change | 16:41 |
bauzas | I'm in favor of not touching the running instances | 16:42 |
bauzas | or asking to rebuild them | 16:42 |
gibi | we are not touching running instances, we only touch hard-rebooting instances | 16:42 |
sean-k-mooney | so kashyap has implemented this for all instances | 16:42 |
bauzas | gibi: which happens when you stop/start, right? | 16:42 |
gibi | right | 16:42 |
sean-k-mooney | bauzas: yes, as gibi says it will only take effect when the XML is next regenerated | 16:42 |
gibi | it happens while the guest is not running | 16:42 |
*** akekane_ is now known as abhishekk | 16:43 | |
gibi | it is not an unplug/plug for a running guest | 16:43 |
bauzas | do we want admins to opt-in instances ? | 16:43 |
bauzas | or do we agree it would be done automatically? | 16:43 |
sean-k-mooney | it will happen on start/stop, hard reboot, or a non-live move operation | 16:43 |
gibi | bauzas: I trust kashyap that it is safe to change this device | 16:44 |
bauzas | do we also want to have a nova-status upgrade check for yoga about this ? | 16:44 |
sean-k-mooney | no | 16:44 |
bauzas | gibi: me too | 16:44 |
sean-k-mooney | why would we need to? | 16:44 |
sean-k-mooney | we are not removing support for cirrus | 16:44 |
gibi | we don't remove cirrus | 16:44 |
sean-k-mooney | just no longer the default | 16:44 |
gibi | yepp | 16:44 |
sean-k-mooney | gibi: context is that downstream it is being removed from RHEL 9 | 16:45 |
bauzas | sean-k-mooney: sure, that just means that long-living instances could continue running cirrus | 16:45 |
sean-k-mooney | so we need to care about it for our product | 16:45 |
sean-k-mooney | actually, cirrus is not being removed in RHEL 9 | 16:45 |
sean-k-mooney | but rather in RHEL 10 | 16:45 |
sean-k-mooney | bauzas: yep, which I think is OK | 16:46 |
sean-k-mooney | we could have a nova-status check, but it would have to run on the compute nodes | 16:46 |
sean-k-mooney | which is kind of not nice | 16:46 |
sean-k-mooney | since it would have to check the XMLs | 16:46 |
bauzas | I know | 16:46 |
sean-k-mooney | so I would not add it, personally | 16:46 |
bauzas | I'm just saying that we enter a time that could last long | 16:47 |
gibi | I agree, we don't need upgrade check | 16:47 |
sean-k-mooney | shall we continue this in the patch review | 16:48 |
bauzas | but agreed on the fact that this is not a problem until cirrus support is removed, and that is not an upstream question | 16:48 |
bauzas | sean-k-mooney: you're right, nothing needing a spec | 16:48 |
bauzas | #agreed https://blueprints.launchpad.net/nova/+spec/virtio-as-default-display-device is accepted as specless BP for the Yoga release timeframe | 16:49 |
bauzas | moving on | 16:49 |
gibi | \o/ | 16:49 |
bauzas | next item | 16:49 |
bauzas | (kashyapc) Blueprint for review: "Add ability to control the memory used by fully emulated QEMU guests -- https://blueprints.launchpad.net/nova/+spec/control-qemu-tb-cache | 16:49 |
bauzas | again, a specless bp ask | 16:49 |
bauzas | he said " This blueprint allows us to configure how much memory a plain-emulated (TCG) VM, which is what OpenStack CI uses. Recently, QEMU changed the default memory used by TCG VMs to be much higher, thus reducing the no. of VMs you TCG could run per host. Note: the libvirt patch required for this will be in libvirt-v7.10.0 (December 2021)." | 16:49 |
bauzas | " See this issue for more details: https://gitlab.com/qemu-project/qemu/-/issues/693 (Qemu increased memory usage with TCG)" | 16:49 |
sean-k-mooney | I'm a little torn on this | 16:50 |
sean-k-mooney | I'm not sure I like this being a per-host config option | 16:50 |
sean-k-mooney | but it's also breaking existing deployments | 16:50 |
sean-k-mooney | so we can't really address that with flavor extra specs or image properties | 16:51 |
sean-k-mooney | since it would be a pain for operators to use | 16:51 |
gibi | but that requires rebuild of existing instances | 16:51 |
sean-k-mooney | yep | 16:51 |
sean-k-mooney | so with that in mind, the config option probably is the way to go | 16:51 |
sean-k-mooney | just need to bear in mind it might change after a hard reboot if you live migrate | 16:51 |
gibi | yeah, config as a first step; if more fine-grained control is needed later we can add an extra spec | 16:51 |
bauzas | there are libvirt dependencies | 16:52 |
sean-k-mooney | if we capture the "this should really be the same on all hosts in a region" piece in the docs, I'm OK with this | 16:52 |
sean-k-mooney | bauzas: and qemu deps | 16:52 |
bauzas | you need a recent libvirt in order to be able to use it | 16:52 |
bauzas | right | 16:52 |
sean-k-mooney | it's only supported on QEMU 5.0+ | 16:52 |
gibi | sean-k-mooney: yeah that make sense to document | 16:52 |
sean-k-mooney | so we will need a libvirt version and QEMU version check in the code | 16:53 |
sean-k-mooney | which is fine, we know how to do that | 16:53 |
bauzas | so, if this is a config option, the docs have to explain which versions you need | 16:53 |
sean-k-mooney | yep | 16:53 |
bauzas | we would otherwise expose something unusable for most people | 16:53 |
sean-k-mooney | the only tricky bit will be live migration | 16:53 |
sean-k-mooney | if the dest is not new enough but the source host is | 16:54 |
bauzas | correct, the checks ? | 16:54 |
sean-k-mooney | we will need to make sure we validate that | 16:54 |
bauzas | right | 16:54 |
bauzas | but this looks to me like an implementation detail | 16:54 |
bauzas | all of this seems not needing a spec, right? | 16:54 |
bauzas | upgrade concerns are N/A | 16:54 |
bauzas | as you explicitely need a recent qemu | 16:55 |
sean-k-mooney | em, the live migration check will be a little complex, but other than that I don't see a need for a spec | 16:55 |
sean-k-mooney | I'm a little concerned about the live migration check, which is what makes me hesitate to say no spec | 16:55 |
bauzas | we can revisit this decision if the patch goes hairy | 16:55 |
sean-k-mooney | yes | 16:55 |
sean-k-mooney | that works for me | 16:55 |
gibi | works for me too | 16:56 |
sean-k-mooney | I think we have the hypervisor version available in the conductor, so I think we can do it without an RPC/object change | 16:56 |
bauzas | #agreed https://blueprints.launchpad.net/nova/+spec/control-qemu-tb-cache can be a specless BP but we need to know more about the live migration checks before we approve | 16:56 |
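A hypothetical sketch of the version gating discussed above, as a libvirt-driver method; the constants, option name, and exception are illustrative, though has_min_version() is the helper the libvirt driver already uses for checks like this:

```python
from nova import conf
from nova import exception

CONF = conf.CONF

# Illustrative minimums per the discussion: libvirt 7.10.0, QEMU 5.0.
MIN_LIBVIRT_TB_CACHE = (7, 10, 0)
MIN_QEMU_TB_CACHE = (5, 0, 0)

def _check_tb_cache_support(self):
    # Hypothetical option name: refuse to start the compute service if
    # the option is set but the hypervisor cannot honour it.
    if CONF.libvirt.tb_cache_size and not self._host.has_min_version(
            lv_ver=MIN_LIBVIRT_TB_CACHE, hv_ver=MIN_QEMU_TB_CACHE):
        raise exception.InvalidConfiguration(
            'The [libvirt]/tb_cache_size option requires libvirt >= '
            '7.10.0 and QEMU >= 5.0.')
```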
bauzas | gibi: sean-k-mooney: does what I wrote work for you? | 16:56 |
sean-k-mooney | +1 | 16:57 |
bauzas | ok, | 16:57 |
bauzas | next topic is ganso | 16:57 |
bauzas | and eventually, whoami-rajat | 16:57 |
ganso | hi! | 16:57 |
bauzas | ganso: you have one min :) | 16:57 |
ganso | so my question is about adding hw_vif_multiqueue_enabled setting to flavors | 16:57 |
ganso | it was removed from the original spec | 16:57 |
ganso | https://review.opendev.org/c/openstack/nova-specs/+/128825/comment/7ad32947_73515762/#90 | 16:57 |
ganso | today it can be only used in image properties | 16:58 |
ganso | does it make sense at all semantically in a flavor, or is this something that only makes sense as an image property? | 16:58 |
sean-k-mooney | ya, this came up semi-recently | 16:58 |
sean-k-mooney | i think we can just add this to the flavor | 16:58 |
bauzas | the other way would be a concern to me | 16:58 |
ganso | ok. Would this require a spec? | 16:58 |
bauzas | as users could use a new property | 16:59 |
sean-k-mooney | well, image properties are for exposing things that affect the virtualised hardware | 16:59 |
bauzas | but given we already accept this for images, I don't see a problem with accepting it as a flavor extra spec | 16:59 |
sean-k-mooney | so in general you want that to be user-settable | 16:59 |
ganso | great | 17:00 |
bauzas | sean-k-mooney: right, I was just explaining that image > flavor seems not debatable while flavor > image seems to be up for discussion | 17:00 |
ganso | to me it sounds simple enough to not require a spec, do you agree? | 17:00 |
bauzas | good question | 17:00 |
bauzas | but we're overtime | 17:00 |
sean-k-mooney | https://blueprints.launchpad.net/nova/+spec/multiqueue-flavor-extra-spec | 17:00 |
sean-k-mooney | this is the implemation https://review.opendev.org/q/topic:bp/multiqueue-flavor-extra-spec | 17:01 |
bauzas | ganso: whoami-rajat: let's continue discussing your concerns after the meeting | 17:01 |
bauzas | #endmeeting | 17:01 |
opendevmeet | Meeting ended Tue Nov 16 17:01:10 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:01 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2021/nova.2021-11-16-16.00.html | 17:01 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2021/nova.2021-11-16-16.00.txt | 17:01 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2021/nova.2021-11-16-16.00.log.html | 17:01 |
whoami-rajat | ack | 17:01 |
bauzas | I need to leave | 17:01 |
sean-k-mooney | ganso: stephenfin was working on this before he moved team last cycle | 17:01 |
bauzas | ganso: about your ask, I'll defer the specless bp acceptance to next week | 17:01 |
sean-k-mooney | ganso: i think we can do it as a specless blueprint | 17:01 |
bauzas | ganso: but we can basically agree on this without waiting for it to be papered | 17:02 |
sean-k-mooney | ganso: all of the code is there; I just didn't get time to pick it back up after stephenfin moved, so if you want to pick it up please do | 17:02 |
* bauzas needs to leave | 17:02 | |
ganso | sean-k-mooney, bauzas thank you very much!! | 17:02 |
gibi | whoami-rajat: would be nice to have the reimage discussion when lyarwood is present | 17:03 |
gibi | I don't feel knowledgeable enough in cinder | 17:03 |
whoami-rajat | gibi, ok, just wanted the team's thoughts on it, can you suggest a time that would be suitable? | 17:04 |
gibi | whoami-rajat: try to ping lyarwood tomorrow | 17:05 |
whoami-rajat | ok | 17:06 |
gibi | both bauzas and I were +2 on your spec, so a quick chat with lyarwood would be enough | 17:06 |
whoami-rajat | ack, i will fix the gate failure and see what lyarwood thinks about it | 17:06 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Revert project-specific APIs for servers https://review.opendev.org/c/openstack/nova/+/816206 | 17:19 |
kashyap | bauzas: Just back: on the blueprint for changing video model to "virtio": yes, you can trust the test results posted in the change. As noted, I've got it properly integration-tested for Windows and Linux guests with Red Hat virt QE | 17:37 |
kashyap | Also, gibi --^ (Thanks for the trust :)) | 17:37 |
gibi | kashyap: :) | 17:38 |
kashyap | gibi: On context: it is not specific to downstream RHEL9 removing it (as sean-k-mooney phrased it). *Regardless* of what RHEL9 does it is not a good default. That's the bare argument. | 17:38 |
kashyap | "it" == Cirrus, I mean. | 17:38 |
kashyap | gibi: On the tb-cache thing: as a reminder, it is mostly used by CI setups that can't have KVM. All sensible production users will use KVM | 17:39 |
kashyap | Unless they have some need to run emulated-only guests -- because the performance is cripplingly slow compared to hardware-accelerated virt | 17:40 |
gibi | yeah good point | 17:40 |
gibi | it is for a specific non-production use case | 17:40 |
clarkb | kashyap: there are production use cases for emulation though. For example docker image builds for different architectures (we do a bunch of that) | 17:40 |
clarkb | That doesn't concern nova, but it should be something that qemu/libvirt consider | 17:41 |
clarkb | basically the emulation use case shouldn't simply be dismissed | 17:41 |
kashyap | clarkb: Heya. Fully agree - that's a valid use-case. :-) But I was speaking from a compute-workload point of view: 90% of them are on the KVM driver | 17:42 |
kashyap | clarkb: Enabling cross-arch builds is one of the appealing points, sure. | 17:42 |
kashyap | clarkb: Although, my use of "sensible production users" is a bit dismissive, I agree. Sorry :) | 17:43 |
sean-k-mooney | kashyap: rackspace used to run their public cloud using QEMU for x86 on POWER hardware for a long time | 17:43 |
kashyap | sean-k-mooney: Sure; but it's also far, far less secure. And upstream QEMU doesn't make any security guarantees | 17:45 |
sean-k-mooney | kashyap: yep and that is fine for many | 17:45 |
sean-k-mooney | especially if they use SELinux/containers to add an extra layer of security around the QEMU instance | 17:46 |
kashyap | sean-k-mooney: Sure; as long as they're aware of it. I just double-checked with the QEMU folks: they "explicitly *disclaim* any security for TCG" | 17:46 |
sean-k-mooney | yes I know | 17:46 |
sean-k-mooney | it's in their wiki | 17:46 |
kashyap | Public docs: https://qemu-project.gitlab.io/qemu/system/security.html | 17:47 |
sean-k-mooney | https://www.qemu.org/docs/master/system/security.html#non-virtualization-use-case | 17:47 |
kashyap | Yep. | 17:47 |
kashyap | sean-k-mooney: Note, though: SELinux/AppArmor can mitigate *some* of the risk, but as the QEMU folks say elsewhere: "depending on the config you can still have *massive* holes you can drive a truck through" (Cc: gibi, clarkb) | 17:50 |
clarkb | sure, I'm not saying it is a good idea for production cloud VM usage. But I do think there are valid use cases out there | 17:51 |
kashyap | Agreed. I was just tempering the "production cloud w/ TCG" point of view. In case any lurkers are observing this conversation, I wanted to flag the security implications here | 17:52 |
sean-k-mooney | I don't really think it's a debate; there have been several large-scale production clouds that did run with just QEMU | 17:53 |
sean-k-mooney | depending on your security model it may or may not be an issue | 17:53 |
kashyap | Also Rackspace used to offer Xen too. Not just plain QEMU. | 18:01 |
kashyap | I don't want to belabour this point. I wonder who these "largescale clouds" are. Overall, any serious user who wants to run non-toy compute workloads will not use plain emulation. | 18:02 |
kashyap | Anyhow...time to wrap up the day. | 18:03 |
*** tosky is now known as Guest6054 | 18:05 | |
*** tosky_ is now known as tosky | 18:05 | |
*** tosky_ is now known as tosky | 18:48 | |
dasp | sean-k-mooney: I opened the BP like you suggested but didn't tag it for yoga properly, so it may have been missed: https://blueprints.launchpad.net/nova/+spec/configurable-no-compression-image-types | 19:11 |
sean-k-mooney | em, we will tag it when it's reviewed, but you just need to add it to the meeting agenda by updating the wiki | 19:12 |
opendevreview | Rodrigo Barbieri proposed openstack/nova master: Add 'hw:vif_multiqueue_enabled' flavor extra spec https://review.opendev.org/c/openstack/nova/+/792356 | 19:12 |
dasp | sean-k-mooney: thanks, done | 19:17 |
*** mdbooth5 is now known as mdbooth | 19:35 | |
opendevreview | Artom Lifshitz proposed openstack/nova master: DNM: Test token expiration during live migration https://review.opendev.org/c/openstack/nova/+/817778 | 20:19 |
*** tosky is now known as Guest6070 | 22:42 | |
*** tosky_ is now known as tosky | 22:42 | |
*** tosky is now known as Guest6073 | 23:07 | |
*** tosky_ is now known as tosky | 23:07 |