gibi | sean-k-mooney: when you are up, we can go through the open questions in the PCI tracking spec | 08:24 |
---|---|---|
ralonsoh | gibi, qq if you know this. For live-migration, should all libvirt hosts have the same "virsh secret-list"? | 09:10 |
gibi | ralonsoh: o/ good question. I guess you are looking at a situation with encrypted volumes | 09:14 |
ralonsoh | gibi, yes, I've deployed two nodes with devstack and ceph | 09:15 |
gibi | hm, I dont seem to find any code in nova that would move the secret | 09:18 |
gibi | I'm not sure how this works but seems like barbican is involved too | 09:27 |
gibi | ralonsoh: as melwitt took over the ephemeral encryption work from lyarwood I expect that she might know more about the volume encryption case as well | 09:31 |
ralonsoh | gibi, thanks, I'm going to try to deploy this env without encryption, if possible | 09:32 |
opendevreview | Jorhson Deng proposed openstack/nova master: Clear the ignore_hosts before starting evacuate https://review.opendev.org/c/openstack/nova/+/841089 | 09:42 |
sean-k-mooney | do you mean the ceph secret | 09:43 |
sean-k-mooney | if so we expect the operator to distribute that | 09:43 |
ralonsoh | sean-k-mooney, this is in a devstack installation | 09:43 |
sean-k-mooney | yep | 09:44 |
ralonsoh | so one compute is requesting its own secret id | 09:44 |
ralonsoh | and the other compute is asking for other | 09:45 |
ralonsoh | and I don't know if that should match the ceph.conf fsid | 09:45 |
sean-k-mooney | hum i could try and deploy this and see. but the devstack roles just copy the keyfiles and config to the compute https://github.com/openstack/devstack/blob/master/roles/sync-controller-ceph-conf-and-keys/tasks/main.yaml | 09:50 |
sean-k-mooney | then on the compute you set REMOTE_CEPH=True | 09:50 |
ralonsoh | sean-k-mooney, yes, that's set | 09:51 |
sean-k-mooney | https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L834-L841= | 09:51 |
sean-k-mooney | the secret config seems to be the same | 09:51 |
sean-k-mooney | https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L244-L259= | 09:52 |
ralonsoh | sean-k-mooney, yeah, it is. In any case, I'll deploy the second compute node again | 09:52 |
sean-k-mooney | ack | 09:53 |
sean-k-mooney | i think from looking at that that the secret and keyfiles should be the same on all hosts | 09:53 |
ralonsoh | sean-k-mooney, one question | 09:53 |
ralonsoh | in the first node, the fsid is 37... | 09:54 |
ralonsoh | and the "sudo virsh secret-list " is d5... | 09:54 |
ralonsoh | this is wrong... | 09:54 |
ralonsoh | pffff | 09:54 |
ralonsoh | sean-k-mooney, sorry, I've re-deployed the first controller | 10:17 |
ralonsoh | and CEPH has changed the /etc/ceph/ceph.conf fsid number | 10:18 |
ralonsoh | that means the virsh secret and the fsid are different now | 10:18 |
sean-k-mooney | yes, it will change every time you stack | 10:21 |
ralonsoh | sean-k-mooney, but what ID should I use now? | 10:22 |
sean-k-mooney | ralonsoh: on a different topic has anyone raised support virtio-failover in neutron | 10:23 |
sean-k-mooney | ralonsoh: likely the one for the controller | 10:23 |
ralonsoh | sorry what? | 10:23 |
sean-k-mooney | that basically answers my question :) virtio-failover is like a form of automatic bonding | 10:23 |
ralonsoh | ah, not that I'm aware of | 10:24 |
sean-k-mooney | you can declare one virtio device as a failover device if the primary loses connectivity | 10:24 |
ralonsoh | in any case, I'll check it later | 10:24 |
ralonsoh | nope, that is usually done inside the VM | 10:24 |
sean-k-mooney | no worries im 99% sure that its not supported | 10:24 |
ralonsoh | sean-k-mooney, last q | 10:24 |
ralonsoh | about this fsid | 10:25 |
sean-k-mooney | ralonsoh: virtio failover allows qemu to ask the guest virtio driver to do the failover automatically in the guest | 10:25 |
sean-k-mooney | ralonsoh: sure | 10:25 |
ralonsoh | if "virsh secret-list" is one number | 10:25 |
ralonsoh | and in ceph.conf I have another one | 10:25 |
ralonsoh | then which one should I use in the second compute? | 10:25 |
sean-k-mooney | the secret uuid is the cinder_ceph_uuid | 10:28 |
ralonsoh | yes, and it's generated by devstack-ceph | 10:28 |
ralonsoh | CEPH_FSID=$(uuidgen) | 10:28 |
sean-k-mooney | yep so you could double check the cinder config | 10:28 |
sean-k-mooney | and determine which is the correct uuid value | 10:28 |
sean-k-mooney | ill quickly deploy with ceph and see if we can compare | 10:29 |
gibi | Uggla: responded in the manila spec | 10:33 |
sean-k-mooney | gibi: im just going to grab coffee but we can chat about the pci spec when ever suits | 10:38 |
sean-k-mooney | ill be back in 10 mins | 10:39 |
gibi | I will grab lunch so I will ping you later | 10:45 |
sean-k-mooney | cool no rush. | 10:47 |
gibi | sean-k-mooney: OK, I'm back | 11:33 |
gibi | sean-k-mooney: so the first question is simple. Will nova create the custom resource class mentioned in the [pci]device_list or we expect the deployer to pre-create that | 11:38 |
gibi | https://review.opendev.org/c/openstack/nova-specs/+/791047/4/specs/zed/approved/pci-device-tracking-in-placement.rst#156 | 11:38 |
gibi | I think in the vgpu case nova creates the custom RC | 11:39 |
gibi | as the config does not have a full RC but just some typename | 11:40 |
gibi | and nova generates the RC name from it | 11:40 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: log number of green(thread|let)s periodically https://review.opendev.org/c/openstack/nova/+/841040 | 11:48 |
sean-k-mooney | gibi: o/ am yes i think nova should create the custom resource classes in placement | 11:50 |
sean-k-mooney | the reason for this is we want to use CUSTOM_<VENDOR_ID>_<PRODUCT_ID> | 11:51 |
sean-k-mooney | when no RC has been specified in the device list | 11:51 |
sean-k-mooney | so i think it would be a better end user experience if those custom resource classes were created automatically | 11:52 |
gibi | so resource_class=foobar is OK and nova will create CUSTOM_FOOBAR in placement | 11:52 |
sean-k-mooney | ah are you asking if nova should normalise and prepend the CUSTOM_ | 11:52 |
gibi | yep, as a follow up :) | 11:52 |
gibi | follow up question | 11:53 |
sean-k-mooney | i think that would be workable. i would prefer to encourage them to set CUSTOM_<whatever> | 11:53 |
sean-k-mooney | but i think its fine to prepend and normalise automatically | 11:54 |
gibi | a bit more user friendly if we normalize and prepend | 11:54 |
gibi | so I will go with that | 11:54 |
sean-k-mooney | yep just so long as we are smart and only prepend when needed | 11:54 |
gibi | OK, I can make it smart to avoid double custom | 11:55 |
gibi | OK | 11:55 |
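The "smart" normalisation agreed on above (prepend and normalise, but avoid a double CUSTOM_) could be sketched roughly like this; `normalize_rc_name` is a hypothetical helper for illustration, not the actual nova code:

```python
import re

def normalize_rc_name(name: str) -> str:
    """Turn an operator-supplied resource_class value into a valid
    placement custom resource class name: uppercase, only [A-Z0-9_],
    and prepend CUSTOM_ only when it is not already there."""
    rc = re.sub(r"[^A-Z0-9_]", "_", name.upper())
    if not rc.startswith("CUSTOM_"):
        rc = "CUSTOM_" + rc
    return rc
```

So `resource_class=foobar` would become `CUSTOM_FOOBAR`, while an already well-formed `CUSTOM_GPU_GOLD` passes through unchanged.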
gibi | next one is future proofing the RC | 11:55 |
gibi | https://review.opendev.org/c/openstack/nova-specs/+/791047/4/specs/zed/approved/pci-device-tracking-in-placement.rst#169 | 11:55 |
gibi | I think we support the case today when pci alias and neutron sriov are configured in the same deployment | 11:56 |
gibi | also I think it is possible to consume the same type-VF either from alias or from port | 11:57 |
gibi | so for this case we need an RC name for the type-VF RP that is known before the scheduling for the sriov case | 11:57 |
gibi | SRIOV_NET_VF could be used for that | 11:57 |
gibi | as that already exists in os-traits | 11:58 |
sean-k-mooney | yes you can consume type-vf via alias | 11:58 |
sean-k-mooney | because VF and sriov have nothing to do with networking | 11:58 |
sean-k-mooney | you can have VFs for gpus or ssds | 11:58 |
gibi | true | 11:58 |
sean-k-mooney | the physical_network tag in the device list | 11:59 |
sean-k-mooney | is what marks it as a nic and is required for neutron consumption | 11:59 |
sean-k-mooney | im not sure if we explicitly prevent aliases from consuming devices with physical_network set | 11:59 |
sean-k-mooney | but we could do that going forward i guess | 11:59 |
gibi | I think we did not prevent it today | 11:59 |
sean-k-mooney | ack | 12:00 |
sean-k-mooney | so for device with physical_network set | 12:00 |
gibi | we can simply say that resource_class will not be applicable if physical_network tag is present, and nova will use standard RC for these devices | 12:00 |
sean-k-mooney | i think its fine to mark them as SRIOV_NET_VF or SRIOV_NET_PF | 12:00 |
gibi | cool | 12:00 |
sean-k-mooney | we would likely track vdpa devices as SRIOV_NET_VF too | 12:01 |
sean-k-mooney | although i guess that could be SRIOV_NET_VDPA | 12:01 |
sean-k-mooney | that actually might make more sense now that i think of it | 12:01 |
gibi | yeah. but we don't have to decide it now, I just needed to make sure the that the current spec is future proof | 12:02 |
sean-k-mooney | so yes making RC and physnet mutually exclusive i think is correct | 12:02 |
sean-k-mooney | yep | 12:02 |
gibi | OK | 12:02 |
gibi | next one | 12:02 |
sean-k-mooney | so you descoped the current spec to just the alias based passthrough case yes | 12:02 |
gibi | yes | 12:02 |
sean-k-mooney | cool im fine with that by the way | 12:02 |
gibi | it is still complex enough | 12:02 |
sean-k-mooney | i want the neutron way to work too but it does not need to be in the initial mvp | 12:03 |
gibi | I just keep an eye on things not to create a dead end with the current spec | 12:03 |
sean-k-mooney | yep | 12:03 |
sean-k-mooney | ok so back to your next question :) | 12:03 |
gibi | OK, the next one is simple. With the current proposal the RP is named <hostname>_pci_0000_84_00_0 | 12:03 |
gibi | if we follow the PGPU naming | 12:04 |
gibi | is that OK? | 12:04 |
gibi | we could have a full normal PCI address if we want as the RP name charset is not restricted | 12:04 |
sean-k-mooney | am so i was not planning to use the label from libvirt | 12:04 |
sean-k-mooney | the nodedev name pci_0000_84_00_0 | 12:05 |
sean-k-mooney | is not considered stable by them | 12:05 |
gibi | ahh, good to know | 12:05 |
gibi | then I think it is better not to rely on it | 12:05 |
sean-k-mooney | so i was thinking it would be <hostname>_<pci address in linux format> | 12:05 |
gibi | so like in DDDD:BB:AA.FF format? | 12:05 |
sean-k-mooney | yes | 12:05 |
gibi | OK, we can do that | 12:05 |
sean-k-mooney | is : allowed | 12:06 |
gibi | yes it is | 12:06 |
gibi | the RP name is free text | 12:06 |
sean-k-mooney | ok we could normalise | 12:06 |
sean-k-mooney | if not | 12:06 |
gibi | the traits and RCs are restricted | 12:06 |
sean-k-mooney | if we wanted to have pci_0000_84_00_0 by the way i would prefer that nova generated that | 12:06 |
sean-k-mooney | rather than using the value directly from libvirt | 12:06 |
sean-k-mooney | but im ok with using the bdf format above DDDD:BB:AA.FF | 12:07 |
gibi | ack | 12:08 |
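The RP naming convention settled on here (`<hostname>_<pci address in linux BDF format>`) could be sketched like this; `pci_rp_name` and its validation are a hypothetical illustration, not the spec's actual code:

```python
import re

# Linux PCI address: domain:bus:device.function, e.g. 0000:84:00.0
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")

def pci_rp_name(hostname: str, pci_address: str) -> str:
    """Build the per-device resource provider name from the compute
    hostname and the device's Linux BDF address. RP names are free
    text in placement, so ':' and '.' are allowed."""
    if not BDF_RE.match(pci_address):
        raise ValueError(f"not a Linux PCI address: {pci_address}")
    return f"{hostname}_{pci_address}"
```

Since nova holds the canonical address in its own PCI tracking data, it can generate this name itself rather than taking the libvirt nodedev name, which is not considered stable.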
gibi | next one | 12:08 |
sean-k-mooney | so one thing related to this | 12:08 |
gibi | go | 12:08 |
sean-k-mooney | even if you add the device to the device list using devname instead of the address we would still use the pci address in the RP name yes | 12:08 |
gibi | yes | 12:09 |
sean-k-mooney | i just want to make sure that detail is hidden from placement | 12:09 |
sean-k-mooney | cool | 12:09 |
gibi | if we need the devname for any reason during the scheduling we can add that as a trait | 12:09 |
sean-k-mooney | yes we could. currently its not used in the alias or pci request object | 12:09 |
sean-k-mooney | so we should not | 12:09 |
sean-k-mooney | but if we did a trait would be workable | 12:10 |
gibi | if not needed then we wont add it :) | 12:10 |
sean-k-mooney | :) | 12:11 |
gibi | so the next question is related | 12:11 |
gibi | what traits nova needs to add automatically? | 12:11 |
gibi | only the ones mentioned in the device_list | 12:11 |
gibi | ? | 12:11 |
gibi | or we want to automate things like adding capabilities as traits | 12:11 |
sean-k-mooney | ah so yes the capability traits should be added in my view | 12:12 |
sean-k-mooney | they were intended to be reported to placement originally | 12:12 |
gibi | OK, that make sense | 12:12 |
gibi | do we have in the code somewhere listed what are the capabilities we parse? or do we parse everything? | 12:13 |
sean-k-mooney | so if i remember correctly you were suggesting allowing additive-only traits to be listed in the device_list | 12:13 |
sean-k-mooney | gibi: im looking for the code now | 12:13 |
sean-k-mooney | but we had code to normalise the capabilities and report them to placement that ralonsoh wrote in the past | 12:13 |
gibi | sean-k-mooney: yepp the current spec only supports additive traits | 12:13 |
sean-k-mooney | https://review.opendev.org/q/topic:bp%252Fenable-sriov-nic-features | 12:14 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/466051 specifically | 12:15 |
sean-k-mooney | but you might be able to reuse some of the other work | 12:15 |
sean-k-mooney | gibi: the traits have already been added to os-traits https://github.com/openstack/os-traits/blob/master/os_traits/hw/nic/offload.py | 12:16 |
gibi | thanks, this helps | 12:17 |
gibi | so lets report capabilities as traits | 12:17 |
gibi | and then I will amend the spec to allow removing traits via device_list | 12:17 |
gibi | to disable capability | 12:18 |
sean-k-mooney | yep. right now this only makes sense for neutron nics really | 12:18 |
sean-k-mooney | since we dont really gather capabilities for other devices | 12:18 |
gibi | ahh | 12:18 |
gibi | so no generic PCI caps | 12:18 |
sean-k-mooney | although remote_managed might be the exception | 12:18 |
sean-k-mooney | well for the remote managed devices we now have the vpd | 12:18 |
sean-k-mooney | capability | 12:18 |
sean-k-mooney | that is not yet a trait so maybe we would want to report that | 12:19 |
sean-k-mooney | gibi: i think for the initial version we could keep it to just operator provided traits | 12:19 |
sean-k-mooney | if we want to keep it simple | 12:19 |
sean-k-mooney | then in the future we can auto discover device capabilities and report them if that makes sense | 12:20 |
gibi | yeah that make sense, it is easy to add later | 12:20 |
gibi | so keeping traits just additive now as well | 12:20 |
gibi | OK | 12:21 |
gibi | next one | 12:22 |
gibi | https://review.opendev.org/c/openstack/nova-specs/+/791047/4/specs/zed/approved/pci-device-tracking-in-placement.rst#288 | 12:22 |
gibi | What to do if both ``resource_class`` and ``vendor_id`` and ``product_id`` are provided in the alias? | 12:22 |
sean-k-mooney | good question. i guess we have two options | 12:23 |
sean-k-mooney | first is consider that an error | 12:23 |
sean-k-mooney | second is use the resource_class for placement queries | 12:23 |
sean-k-mooney | and the rest for the pci/numa filter | 12:23 |
gibi | the only use case I can think of is that a deployer uses a generic RC but then later wants to refine the alias via product id | 12:23 |
sean-k-mooney | my secret plan is to eventually remove the need for the alias | 12:24 |
sean-k-mooney | so over time it would be nice if we could move to just having the resource class in the alias | 12:24 |
gibi | do you want to go with flavor extra_spec based resource? | 12:24 |
sean-k-mooney | yes and no | 12:25 |
sean-k-mooney | i currently hate that we allow grouping in the extra_specs | 12:25 |
sean-k-mooney | so part of me wants to keep the alias as we can insulate operators from that | 12:25 |
sean-k-mooney | but i do kind of like the idea of having just resources:CUSTOM_<whatever>=1 | 12:26 |
gibi | OK, I get the goal that we want to move deployers to RC based alias in the future and if the deployer still want product id based filtering then the deployer can use traits for that | 12:26 |
sean-k-mooney | so i think having the RC take precedence for the placement query and allowing vendor_id and product_id makes sense | 12:26 |
gibi | OK | 12:27 |
sean-k-mooney | so really what i would like is for operators to tag devices with the custom resource class and use the RC name as the "alias" | 12:28 |
gibi | so we keep the PCIFilter to keep filtering for vendor / product | 12:28 |
gibi | at least for now | 12:28 |
gibi | the RC name as alias make sense | 12:28 |
sean-k-mooney | ya for now although in principle i think you could turn it off if we did this right | 12:28 |
sean-k-mooney | my idea is | 12:29 |
gibi | yes if we let the product id filtering case go and there is no SRIOV in the deployment then we can turn off the PCIFilter | 12:29 |
gibi | neutron based SRIOV | 12:30 |
sean-k-mooney | device_list = some device -> RC gpu_gold | 12:30 |
sean-k-mooney | and then you could just ask for RC gpu_gold in the alias | 12:30 |
sean-k-mooney | right now the reason i dont want to go directly to resources:gpu_gold=1 | 12:31 |
sean-k-mooney | is that it introduces problems with vgpu and generic mdev usage | 12:31 |
sean-k-mooney | it should be resolvable | 12:31 |
sean-k-mooney | i.e. if we see that the resource does not match any of the generic_mdev types listed in the config we know its a pci passthrough request | 12:31 |
sean-k-mooney | but i thought that would complicate the spec more than needed initially | 12:32 |
gibi | ahh yeah | 12:33 |
gibi | thanks for the background | 12:33 |
gibi | next | 12:33 |
gibi | I feel that both you and I want to keep the dependent device handling supported. But as stephenfin said, it is a lot of complexity | 12:34 |
gibi | so just double checking that you still think this is needed | 12:34 |
gibi | as per https://review.opendev.org/c/openstack/nova-specs/+/791047/4/specs/zed/approved/pci-device-tracking-in-placement.rst#320 | 12:34 |
sean-k-mooney | honestly i would like to be able to remove it. but im concerned by the upgrade impact | 12:34 |
sean-k-mooney | i do know we have customers that want to dynamically choose if they consume a device as a PF or VF when they boot the workload | 12:35 |
sean-k-mooney | but its not very reliable today | 12:35 |
sean-k-mooney | as in its easy for vms to consume 1 vf on all the devices | 12:36 |
gibi | I also think that there are many deployments out there where this was used without knowing it. I mean if somebody whitelisted a PF that had VFs then the VFs became schedulable automatically | 12:36 |
sean-k-mooney | basically meaning you cannot allocate the PF even though you could have allocated the vfs differently | 12:36 |
sean-k-mooney | right so i think what stephenfin had in mind was if you whitelist the PF and it has VFs we would only expose the VFs | 12:37 |
sean-k-mooney | where as today unless you use the product_id to filter | 12:37 |
sean-k-mooney | we expose both the VFs and PFs | 12:37 |
sean-k-mooney | if we maintain the current behavior we obviously need to dynamically adjust the reserved value | 12:37 |
gibi | yes, that is the complexity | 12:38 |
sean-k-mooney | to emulate the unclaimable state | 12:38 |
gibi | but it is solveable | 12:38 |
gibi | I think I will keep this open for bauzas or other reviews to chime in | 12:38 |
sean-k-mooney | sure | 12:38 |
sean-k-mooney | we decided to reduce flexibility for cpu pinning | 12:39 |
sean-k-mooney | with isolate | 12:39 |
sean-k-mooney | and we know that not everyone was happy with that | 12:39 |
sean-k-mooney | we can elect to do the same here but we need to be deliberate about it and communicate it well if we want to force this change | 12:40 |
ralonsoh | sean-k-mooney, sorry, I was having lunch | 12:40 |
ralonsoh | what do you need? | 12:40 |
sean-k-mooney | if we can live with the complexity then we probably should keep it | 12:40 |
sean-k-mooney | ralonsoh: i found it | 12:40 |
gibi | OK, I will plan with the complexity | 12:40 |
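The dynamic `reserved` adjustment sean-k-mooney mentions above could look roughly like this. This is a simplified sketch with hypothetical names, not the spec's actual design: consuming any VF makes the parent PF unschedulable, and consuming the PF makes all of its remaining VFs unschedulable:

```python
def dependent_reserved(pf_in_use: bool, vfs_in_use: int, total_vfs: int):
    """Compute the placement 'reserved' values that emulate dependent
    device handling for a PF (inventory of 1) and its child VFs
    (inventory of total_vfs).

    Returns (pf_reserved, vf_reserved)."""
    # any allocated VF makes the whole PF unclaimable
    pf_reserved = 1 if vfs_in_use > 0 else 0
    # an allocated PF makes every not-yet-allocated VF unclaimable
    vf_reserved = total_vfs - vfs_in_use if pf_in_use else 0
    return pf_reserved, vf_reserved
```

The complexity is that these values must be re-computed and pushed to placement every time a claim changes, rather than being static inventory.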
ralonsoh | ah perfect | 12:40 |
sean-k-mooney | ralonsoh: https://review.opendev.org/q/topic:bp%252Fenable-sriov-nic-features | 12:40 |
sean-k-mooney | ralonsoh: your code for tracking pci devices in placement | 12:40 |
sean-k-mooney | ralonsoh: we are just discussing the spec to enable it. gibi will be taking on that feature | 12:41 |
ralonsoh | perfect | 12:41 |
gibi | ralonsoh: https://review.opendev.org/c/openstack/nova-specs/+/791047/4/specs/zed/approved/pci-device-tracking-in-placement.rst this is the spec if you are interested :) | 12:41 |
ralonsoh | sure | 12:41 |
gibi | sean-k-mooney: so I have one more open question | 12:42 |
sean-k-mooney | gibi: go for it | 12:42 |
gibi | upgrade | 12:42 |
gibi | obviously rolling upgrade is a pain | 12:42 |
sean-k-mooney | ah well you burn down the datacenter and build a new one in the ashes | 12:42 |
sean-k-mooney | obviously the least painful approach | 12:42 |
gibi | in the PCPU case we did a fallback query | 12:42 |
gibi | to allow scheduling to not-yet upgraded computes | 12:43 |
sean-k-mooney | yep | 12:43 |
sean-k-mooney | we could make the prefilter configurable | 12:43 |
gibi | I'm not sure how we did the allocation in that case | 12:43 |
gibi | but | 12:43 |
gibi | in the PCI case if we select a host based on the fallback query then on that host the scheduler will not allocate PCI devices in placement | 12:43 |
sean-k-mooney | right so i would not use a fallback | 12:44 |
sean-k-mooney | by default we should report the inventories to placement | 12:44 |
sean-k-mooney | and have a prefilter | 12:44 |
sean-k-mooney | the prefilter would add the pci device request to the query | 12:44 |
sean-k-mooney | and we disable it by default in zed | 12:44 |
sean-k-mooney | then enable it by default in AA | 12:44 |
sean-k-mooney | so you would rolling upgrade to Zed | 12:45 |
sean-k-mooney | then enable the prefilter once all host are upgraded | 12:45 |
sean-k-mooney | you likely would have to then do a heal-allocations like command to update the allocations of existing instances | 12:45 |
gibi | ahh i see | 12:46 |
sean-k-mooney | we could also have a nova-status check | 12:46 |
gibi | so we report devices but we don't allocate yet | 12:46 |
sean-k-mooney | yep | 12:46 |
gibi | then when every compute is ready we do a heal and then start allocating | 12:46 |
sean-k-mooney | yes using the claim in the pci_devices table | 12:47 |
gibi | yepp we keep using the claim and the pci_device table | 12:47 |
gibi | anyhow | 12:47 |
gibi | as that tracks exact VF PCI addresses | 12:47 |
gibi | Placement wont | 12:47 |
sean-k-mooney | yep so from the controllers | 12:47 |
sean-k-mooney | we will have all the info in the pci_devices table to heal the allocations | 12:48 |
sean-k-mooney | since we also have the parent addresses | 12:48 |
sean-k-mooney | we can construct the RP names | 12:48 |
gibi | yepp | 12:49 |
sean-k-mooney | we could consider | 12:49 |
sean-k-mooney | if we can activate the filter based on min compute service version | 12:49 |
sean-k-mooney | if we were to do that we would likely need the compute agent to heal the allocations automatically | 12:50 |
sean-k-mooney | perhaps on startup or in the update_available_resources periodic task | 12:50 |
sean-k-mooney | im not sure if we want that level of complexity but we already do reshapes in init_host | 12:51 |
sean-k-mooney | do you think that is too much "magic"/complexity | 12:52 |
sean-k-mooney | it would make the operator experience much nicer as it would just start working once everything was upgraded | 12:52 |
gibi | this reshape will be just RP creation, we won't move things | 12:52 |
gibi | so calling that code from periodic feels OK | 12:52 |
gibi | if we have that then it is safe to enable the prefilter automatically | 12:53 |
gibi | by the compute min version | 12:53 |
sean-k-mooney | right so when the compute-agent starts with the new code, the first time it creates the inventories it would also update the allocations | 12:53 |
sean-k-mooney | and then the prefilter would activate once all computes are upgraded | 12:53 |
sean-k-mooney | based on min version check | 12:54 |
sean-k-mooney | the problem i see with this would be move operations before the prefilter is enabled | 12:54 |
sean-k-mooney | unless we have it continue to heal | 12:54 |
sean-k-mooney | until the min version reaches the required version | 12:54 |
sean-k-mooney | there would be some inconsistency for a time but the pci_tracker would enforce the correct behavior with regards to not over subscribing | 12:55 |
gibi | yeah we have the pci tracker and pci claim as a fallback | 12:55 |
gibi | so we can move even if the prefilter is disabled | 12:56 |
gibi | just have to have a way to heal the placement allocation | 12:56 |
gibi | eventually | 12:56 |
sean-k-mooney | yep | 12:56 |
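The healing step discussed in this exchange, rebuilding placement allocations from the pci_devices table, could look roughly like this. The field and function names are hypothetical for illustration; the key point from the conversation is that for a type-VF the RP name is derived from the parent PF address:

```python
def heal_pci_allocations(hostname, pci_rows):
    """Build a per-instance placement allocations structure from
    pci_devices-style rows (dicts with instance_uuid, address,
    parent_addr, dev_type, and rc, the resource class the device is
    reported under)."""
    allocations = {}
    for row in pci_rows:
        if row["instance_uuid"] is None:
            continue  # device not claimed by any instance, nothing to heal
        # type-VF inventories live on the parent PF's resource provider
        addr = row["parent_addr"] if row["dev_type"] == "type-VF" else row["address"]
        rp_name = f"{hostname}_{addr}"
        inst = allocations.setdefault(row["instance_uuid"], {})
        rp = inst.setdefault(rp_name, {})
        rp[row["rc"]] = rp.get(row["rc"], 0) + 1
    return allocations
```

Placement only tracks counts per RP and resource class; the exact VF addresses remain in the pci_devices table, as gibi notes above.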
gibi | OK, I think I got my answers | 12:56 |
gibi | thank you for your time | 12:56 |
gibi | I really appreciate it | 12:56 |
sean-k-mooney | which again can use current and min service version to disable the healing when its not needed | 12:57 |
sean-k-mooney | no worries | 12:57 |
sean-k-mooney | im excited to see this moving forward | 12:57 |
sean-k-mooney | will you summarise this in the spec | 12:57 |
sean-k-mooney | perhaps link to the irc logs | 12:57 |
gibi | I will do the summary | 12:57 |
gibi | and linking to the log | 12:57 |
gibi | then I will respin the spec | 12:57 |
gibi | and trim the questions | 12:58 |
gibi | I'm excited to start coding up some of these in nova and watch them fail in the func env :) | 12:58 |
gibi | it will be fun | 12:58 |
sean-k-mooney | gibi: on a related note you reviewed Uggla's spec. there was kind of an open question regarding updating the AZ when you specify a host, did you weigh in on that? | 12:59 |
gibi | I saw it and I think it was settled, I had no objection. But then I will doublecheck | 12:59 |
sean-k-mooney | gibi: ack ill try and review it again shortly so | 12:59 |
sean-k-mooney | gibi: on a more selfish note i could also use your input on something else but its not super urgent https://review.opendev.org/c/openstack/nova/+/841017/1/nova/virt/libvirt/driver.py | 13:01 |
sean-k-mooney | i dont think that is 100% correct but it works for vdpa, i need to test it with VFs and other vnic-types | 13:02 |
* gibi clicks | 13:02 | |
sean-k-mooney | basically we are currently unplugging neutron interfaces using _detach_pci_dev for suspend | 13:02 |
sean-k-mooney | that does not work for vdpa and im pretty sure it does not work in general | 13:03 |
sean-k-mooney | so i need to verify that and file a bug | 13:03 |
gibi | I never tried suspend with PCI / neutron SRIOV. So I can neither confirm nor deny that it works | 13:07 |
sean-k-mooney | it used to but its been a very long time since i checked it. | 13:07 |
sean-k-mooney | so ya i need to test it with different backends | 13:08 |
gibi | but your comment seems valid that if something is an interface then it cannot be detached as a hostdev | 13:08 |
sean-k-mooney | i have the ability to test hardware offloaded ovs and sriov at home and i still have the servers i was using for vdpa although ill be giving those back today | 13:09 |
sean-k-mooney | so i can see if i can test the differnt combinations | 13:10 |
sean-k-mooney | i think self.detach_interface(context, instance, vif) should work however in all cases | 13:10 |
sean-k-mooney | i dont really know why we have special handling for the host dev elements | 13:10 |
sean-k-mooney | detach_interface | 13:11 |
sean-k-mooney | is meant to be the abstraction here and its what is called when we call detach from the api | 13:11 |
gibi | yepp detach_interface dynamically uses hostdev or interface config object | 13:12 |
sean-k-mooney | so i think i can just factor out the common code from https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L9811-L9829= | 13:12 |
sean-k-mooney | making the migrate_data optional effectively | 13:13 |
gibi | yepp | 13:13 |
gibi | that seems doable | 13:13 |
sean-k-mooney | basically i would just take in a list of vifs | 13:13 |
sean-k-mooney | so what im wondering is whether it is better to adapt the current function as i did in the wip patch | 13:14 |
sean-k-mooney | or jsut do the refactor | 13:15 |
sean-k-mooney | and call detach_interface | 13:15 |
sean-k-mooney | via _detach_direct_passthrough_vifs | 13:15 |
gibi | I would do the refactor and call detach_interface but I'm biased with the detach_interface code :D | 13:15 |
sean-k-mooney | well see i trust the detach_interface code more | 13:16 |
sean-k-mooney | its better tested | 13:16 |
sean-k-mooney | ok thanks ill try and confirm my speculation that suspend was broken and file a bug | 13:17 |
gibi | cool | 13:17 |
sean-k-mooney | one thing i need to bring up in the team meeting tomorrow is how to track the vdpa work | 13:18 |
sean-k-mooney | https://review.opendev.org/q/topic:bug%252F1970467 the non WIP patch is the bug fix | 13:18 |
sean-k-mooney | for the move operations that actually work | 13:18 |
sean-k-mooney | the next 3 add attach/detach, suspend and hotplug live migration | 13:19 |
sean-k-mooney | i feel like the last 3 should be a specless blueprint or maybe a small spec | 13:19 |
*** dasm|off is now known as dasm | 13:27 | |
gibi | I'm OK with both directions. If there is no API change then I'm fine with specless but if you have open questions then those are easy to discuss via a spec | 13:27 |
sean-k-mooney | there are no api changes other than me removing the api block on the operation, however i think i should be adding a compute service version bump for live migration | 13:29 |
sean-k-mooney | to support rolling upgrade | 13:29 |
sean-k-mooney | i dont have that in the wip code | 13:29 |
gibi | I think this still can fly as specless | 13:31 |
sean-k-mooney | ack that is what i was hoping but if others felt differently i just wanted to get the spec up quickly | 13:32 |
gibi | yeah it is worth asking | 13:32 |
opendevreview | Balazs Gibizer proposed openstack/nova-specs master: PCI device tracking in Placement https://review.opendev.org/c/openstack/nova-specs/+/791047 | 14:36 |
gibi | sean-k-mooney: updated according to our discussion ^^ | 14:36 |
* Uggla plays with tempest today. Unshelve to host should have a tempest test. | 16:36 | |
opendevreview | ribaudr proposed openstack/python-novaclient master: Microversion 2.91: Support specifying destination host to unshelve https://review.opendev.org/c/openstack/python-novaclient/+/831651 | 16:55 |
opendevreview | Merged openstack/nova stable/xena: Test aborting queued live migration https://review.opendev.org/c/openstack/nova/+/836145 | 17:24 |
opendevreview | Merged openstack/nova stable/xena: Add functional tests to reproduce bug #1960412 https://review.opendev.org/c/openstack/nova/+/836146 | 17:24 |
opendevreview | Merged openstack/nova stable/xena: Clean up when queued live migration aborted https://review.opendev.org/c/openstack/nova/+/836147 | 17:24 |
opendevreview | Merged openstack/nova stable/yoga: Retry in CellDatabases fixture when global DB state changes https://review.opendev.org/c/openstack/nova/+/840734 | 19:05 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Reproduce bug 1952745 https://review.opendev.org/c/openstack/nova/+/841170 | 21:26 |
artom | I'm kinda proud of ^^ | 21:26 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Reproduce bug 1952745 https://review.opendev.org/c/openstack/nova/+/841170 | 21:36 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Reproduce bug 1952745 https://review.opendev.org/c/openstack/nova/+/841170 | 21:41 |
*** dasm is now known as dasm|off | 22:16 | |
opendevreview | melanie witt proposed openstack/nova stable/ussuri: Define new functional test tox env for placement gate to run https://review.opendev.org/c/openstack/nova/+/840771 | 23:55 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!