| *** mhen_ is now known as mhen | 02:43 | |
| kklimaszewski | Hello, I'm Karol Klimaszewski - one of the presenters of NUMA in Placement at OpenInfra. While my colleague Dominik is still working on the patches related to that feature, I wanted to ask about another nova feature made by CloudFerro that was mentioned during the conversation after our presentation. Well, it is more like two features. The first is an ephemeral storage backend for the libvirt driver based on local NVMe disks managed by SPDK. | 10:10 |
| kklimaszewski | The second is adding to nova the ability to handle multiple ephemeral storage backends on one nova-compute. I was wondering if there is any interest in us introducing those features upstream? And if yes, is creating a blueprint and a spec proposal in the nova-specs opendev repo a good first step to make it clear what we are trying to introduce? | 10:10 |
| opendevreview | Stephen Finucane proposed openstack/nova master: WIP: libvirt: Ensure LibvirtDriver._host is initialized https://review.opendev.org/c/openstack/nova/+/967005 | 10:14 |
| stephenfin | gibi: interested in your thoughts on ^ | 10:15 |
| stephenfin | If you think it makes sense, I can fix up the tests (or try to bribe sean-k-mooney to do it for me...) | 10:15 |
| sean-k-mooney | gibi did you get a resolution to your lock question? i also don't know if that is really required. | 11:33 |
| sean-k-mooney | gibi looking at the usage | 11:34 |
| sean-k-mooney | https://github.com/openstack/nova/blob/b7d50570c7a79a38b0db6476ccb3c662b237f69b/nova/compute/provider_tree.py#L269-L278 | 11:35 |
| sean-k-mooney | im not seeing any get_or_create type pattern where we would either get a cached value or create it | 11:36 |
| sean-k-mooney | at least not in that function, but we obviously are doing exists checks elsewhere https://github.com/openstack/nova/blob/b7d50570c7a79a38b0db6476ccb3c662b237f69b/nova/compute/provider_tree.py#L409-L423 | 11:37 |
| sean-k-mooney | the ironic case sort of makes sense but i wonder if there is also an interaction with sharing resource providers and/or provider.yaml at play | 11:38 |
| sean-k-mooney | gibi: looking at the patch that introduced it, it's clear that this exists to prevent concurrent modification of the provider tree while iterating over it, so a remove and a find-or-add can happen at the same time. why is it not using, say, the compute node uuid instead of a single lock? the only thing that makes sense to me in the original patch is the for loop over the compute nodes. i | 11:49 |
| sean-k-mooney | personally would have modeled this slightly differently. the placement resources managed by a compute agent are a forest of trees, not a single tree. so i would not have accepted a list of compute nodes, however you would still have needed a list of roots for sharing resource providers. i think if we upleveled the compute nodes to a separate map of cn uuid to provider tree, the | 11:49 |
| sean-k-mooney | lock in the provider tree could use the cn.uuid but the map would still likely need to have a single lock | 11:49 |
| sean-k-mooney | we did refactor this a bit but we never actually turned it into a forest of trees | 11:52 |
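A minimal sketch (hypothetical names, not nova code) of the forest-of-trees modelling described above: one provider tree per compute node uuid, each locked on its own cn.uuid, with a single lock only on the map of trees itself:

```python
import threading


class ProviderForest:
    """Illustrative only: a map of compute node uuid -> per-tree state."""

    def __init__(self):
        self._map_lock = threading.Lock()  # guards adding/removing whole trees
        self._trees = {}                   # cn uuid -> {'lock': ..., 'providers': ...}

    def _tree_for(self, cn_uuid):
        with self._map_lock:
            if cn_uuid not in self._trees:
                self._trees[cn_uuid] = {
                    'lock': threading.Lock(),  # per-tree lock keyed by cn.uuid
                    'providers': {},
                }
            return self._trees[cn_uuid]

    def update_provider(self, cn_uuid, rp_uuid, data):
        # updates to the trees of different compute nodes do not contend
        tree = self._tree_for(cn_uuid)
        with tree['lock']:
            tree['providers'][rp_uuid] = data
```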
| opendevreview | sean mooney proposed openstack/nova master: ensure correct cleanup of multi-attach volumes https://review.opendev.org/c/openstack/nova/+/916322 | 11:56 |
| opendevreview | sean mooney proposed openstack/nova master: ensure correct cleanup of multi-attach volumes https://review.opendev.org/c/openstack/nova/+/916322 | 11:57 |
| gibi | stephenfin: replied in the commit | 12:28 |
| gibi | sean-k-mooney: I'm questioning the need of a named and therefore global lock for any instance of ProviderTree in the process. Even if it is as simple as a libvirt compute with just a single root provider. I don't see why we need to ensure that two copies of the ProviderTree are not accessed at the same time. I totally understand why we would use a threading.Lock per ProviderTree instance so concurrent | 12:32 |
| gibi | access to the data is synchronized. | 12:32 |
| gibi | but anyhow I'm keeping the global named lock for the ProviderTree in https://review.opendev.org/c/openstack/nova/+/956091 I just feel like it is totally unnecessary | 12:33 |
| sean-k-mooney | yep i realise that. im just wondering if there is any sharing here between the reshapes that can happen due to OTU devices and the periodics | 12:33 |
| sean-k-mooney | i dont think we share the same PT between threads today | 12:34 |
| sean-k-mooney | so a simple lock is likely enough | 12:34 |
| gibi | when we update placement we update that based on a specific ProviderTree instance. A copy of that object does not matter as it is an independent copy | 12:34 |
| sean-k-mooney | the real question for me is do we ever store this PT in a shared location like the resource tracker | 12:34 |
| sean-k-mooney | or is it only constructed inside a function | 12:35 |
| gibi | we pass around PTs | 12:35 |
| gibi | and we also copy them around in some cases | 12:35 |
| gibi | because we pass them around, a threading.Lock is needed | 12:35 |
| sean-k-mooney | right, so it's passed down the call stack but not in shared mutable storage, either because of a copy or how it's constructed | 12:35 |
| sean-k-mooney | im not directly seeing a need for it to be a named lock either for what its worth | 12:36 |
| gibi | even if we pass it to a shared mutable storage, passing an instance there with a threading lock is enough; we don't need to share the lock across the copies because they are independent deep copies | 12:36 |
| gibi | actually in nova-compute a named lock is never needed as the compute is a single process | 12:37 |
| gibi | so we only sync across threads so a threading.lock is enough | 12:37 |
| gibi | in the scheduler or conductor a named lock might be needed iff we share data across worker processes | 12:37 |
| gibi | like DB state | 12:38 |
| gibi | anyhow, I'll stop here. I just noticed it and it felt strange, but I can ignore it and be safe that it will keep working as before | 12:39 |
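For illustration, a minimal sketch (invented class names, not the real ProviderTree) of the two locking styles being contrasted above: an oslo lockutils named lock is shared by every instance in the process, while a per-instance threading.Lock only synchronizes threads touching that one object, and an independent deep copy gets its own fresh lock:

```python
import copy
import threading

from oslo_concurrency import lockutils


class NamedLockTree:
    """All instances in the process contend on one named lock."""

    @lockutils.synchronized('provider-tree')
    def update(self, rp_uuid, inventory):
        ...


class InstanceLockTree:
    """Each instance (and each deep copy) has its own lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def update(self, rp_uuid, inventory):
        with self._lock:  # only syncs threads sharing *this* object
            self._data[rp_uuid] = inventory

    def __deepcopy__(self, memo):
        new = InstanceLockTree()  # independent copy, independent lock
        new._data = copy.deepcopy(self._data, memo)
        return new
```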
| opendevreview | Balazs Gibizer proposed openstack/nova master: Compute manager to use thread pools selectively https://review.opendev.org/c/openstack/nova/+/966016 | 14:20 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Libvirt event handling without eventlet https://review.opendev.org/c/openstack/nova/+/965949 | 14:20 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Run nova-compute in native threading mode https://review.opendev.org/c/openstack/nova/+/965467 | 14:20 |
| gibi | gmaan: FYI https://review.opendev.org/c/openstack/nova/+/966016 proposes changing the default value of some of our config options. After the discussion with dansmith and sean-k-mooney we think this is OK from a compatibility perspective given we add an upgrade relnote. Do you agree? | 14:21 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Compute manager to use thread pools selectively https://review.opendev.org/c/openstack/nova/+/966016 | 14:32 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Libvirt event handling without eventlet https://review.opendev.org/c/openstack/nova/+/965949 | 14:32 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Run nova-compute in native threading mode https://review.opendev.org/c/openstack/nova/+/965467 | 14:32 |
| opendevreview | Dan Smith proposed openstack/nova master: Test nova-next with >1 parallel migrations https://review.opendev.org/c/openstack/nova/+/966447 | 14:33 |
| gibi | stephenfin: after a bit of deliberation and a failed trial I'm OK with the direction in 967005 | 15:08 |
| gibi | 967005 | 15:08 |
| gibi | https://review.opendev.org/c/openstack/nova/+/967005 | 15:08 |
| gibi | (wtf happens with my copy paste buffer) | 15:08 |
| gibi | stephenfin: will you fix the 700 failed unit tests or are you looking for a volunteer to take over? | 15:09 |
| dansmith | gibi: I still haven't looked at your patch to fix.. are you saying that one is not working and this is the one we need to pursue? | 15:10 |
| gibi | nope, this is a followup on top of mine | 15:10 |
| dansmith | okay | 15:10 |
| gibi | to prevent later patches re-introducing the issue | 15:10 |
| gibi | ~ the generic issue of calling driver methods before driver.init_host | 15:11 |
| stephenfin | gibi: sounds good: I had tried an alternative involving a metaclass but it was feeling...complicated | 15:11 |
| gibi | metaclass, I haven't thought about it. Would that sidestep our inheritance pattern? | 15:11 |
| gibi | anyhow | 15:12 |
| gibi | I'm OK not going there :) | 15:12 |
| opendevreview | Stephen Finucane proposed openstack/nova master: virt: Ensure init_host is always called https://review.opendev.org/c/openstack/nova/+/967051 | 15:12 |
| stephenfin | gibi: that's the patch (incomplete) | 15:12 |
| gibi | ahh OK, so that is basically automating injecting a wrapper into the normal virt calls. I think that will not work, as the virt interface function is not called by the implementation since most of the virt interface calls raise NotImplementedError | 15:14 |
| stephenfin | yeah, that's what I'd figured out before lunch: we'd need to use the metaclass on the actual implementation rather than the base class. And ideally only wrap methods defined in the base class (not that drivers should have other public methods) | 15:15 |
| gibi | yeah | 15:15 |
| gibi | then I think we felt the same pain | 15:16 |
| gibi | anyhow let me know if you need help with the unit test or need a re-review if you jump on the unit test by yourself | 15:16 |
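For reference, a rough sketch of the metaclass approach discussed above (hypothetical names, not the actual change in 967051): applied to the concrete driver, it wraps only the public methods the base class defines, so any call made before init_host() fails loudly:

```python
import functools


class ComputeDriver:
    """Stand-in for the virt driver interface; methods raise NotImplementedError."""

    def init_host(self, host):
        raise NotImplementedError()

    def spawn(self, context, instance):
        raise NotImplementedError()


class RequireInitHost(type):
    """Wrap base-interface methods on the concrete driver with a guard."""

    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, ns)
        base_methods = {n for n in vars(ComputeDriver) if not n.startswith('_')}
        for attr in base_methods - {'init_host'}:
            impl = getattr(cls, attr)
            if callable(impl):
                setattr(cls, attr, mcls._guard(impl))
        return cls

    @staticmethod
    def _guard(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            if not getattr(self, '_init_host_called', False):
                raise RuntimeError('%s called before init_host' % func.__name__)
            return func(self, *args, **kwargs)
        return wrapper


class FakeDriver(ComputeDriver, metaclass=RequireInitHost):
    def init_host(self, host):
        self._init_host_called = True

    def spawn(self, context, instance):
        return 'spawned'
```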
| *** jizaymes_ is now known as jizaymes | 16:04 | |
| vsaienko | hey sean-k-mooney: maybe it will be interesting: there are performance tests for different scenarios (asap2, vdpa with packed ring 8 queues, virtio single queue, virtio packed + 8 queues) https://jumpshare.com/s/xHU2UTQoKoeKwAWDFsa6 https://jumpshare.com/s/mLQ1u29xOyMftaULc0lJ so far VDPA does not show super significant improvements on 10Gb. | 16:07 |
| vsaienko | scenarios are iperf tests, arguments in each scenario description | 16:08 |
| sean-k-mooney | i think that is in line with expectation | 16:18 |
| vsaienko | I thought that vdpa would be nearly the same as sriov | 16:19 |
| sean-k-mooney | for tcp at least | 16:19 |
| vsaienko | maybe it will be closer on higher rates 50Gbps or 100Gbps | 16:20 |
| sean-k-mooney | vdpa can be, but the connectx6 dx is not doing vdpa fully in hardware | 16:20 |
| vsaienko | nvidia says they support hardware vdpa on connectx 6dx | 16:21 |
| sean-k-mooney | can you explain the labeling by the way? when you say switchdev you mean hardware offloaded ovs with a directly assigned VF i assume | 16:21 |
| sean-k-mooney | vsaienko: yes but it's not done fully in an asic, it does some of the data transformation using firmware/cpu compute on the nic | 16:22 |
| vsaienko | switchdev - sriov + asap2 hardware offload, vdpa-packed - its packed ring + 8 queues, virtio def - is default flavor, no multiqueue or packed ring, virtio-packed - packed ring + 8 queues | 16:23 |
| vsaienko | vdpa with hardware offload as well | 16:23 |
| vsaienko | between hosts there is a VXLAN network, mtu 1450 (no jumbo frames) in all configurations | 16:23 |
| sean-k-mooney | ok, because the default is using vdpa, so it's not correct to call it virtio | 16:23 |
| sean-k-mooney | ah, you're also adding vxlan overhead | 16:24 |
| sean-k-mooney | so is this inter-host traffic, i.e. 2 vms on different hosts, or 2 vms on the same host | 16:24 |
| vsaienko | virtio def is the default neutron port (=normal) which is virtio | 16:25 |
| sean-k-mooney | it's technically kernel vhost | 16:25 |
| vsaienko | all scenarios between VMs on different hosts | 16:25 |
| sean-k-mooney | virtio would be the virtio stack in ovs without the kernel vhost-net module offloading it | 16:26 |
| sean-k-mooney | this looks to me like you are hitting a code path that is not supported for hardware offload | 16:27 |
| sean-k-mooney | for example the connection tracker used for security groups is not fully offloadable to hardware | 16:28 |
| sean-k-mooney | are you using ml2/ovs? | 16:28 |
| vsaienko | it's ovs, no qos, no security groups; in tc I see that the traffic is offloaded | 16:28 |
| sean-k-mooney | well in https://jumpshare.com/s/xHU2UTQoKoeKwAWDFsa6 | 16:29 |
| sean-k-mooney | you are not hitting line rate even in the switchdev case, which is effectively just using sriov for the dataplane | 16:29 |
| sean-k-mooney | with ovs providing the control plane | 16:29 |
| vsaienko | yes | 16:30 |
| sean-k-mooney | that to me suggests you are hitting bottlenecks in the traffic generation as well | 16:30 |
| vsaienko | not sure; between VMs on the same hypervisor iperf can show more, it's the new version with multithreading | 16:31 |
| sean-k-mooney | well iperf is not really good for doing real testing | 16:31 |
| sean-k-mooney | you can use -P to have it use multiple cores | 16:31 |
| sean-k-mooney | but if you want to test this properly you either need a hardware traffic generator or something like trex | 16:32 |
| vsaienko | yes, there are scenarios with p1 - that's -P 1, and p30 - that's -P 30 | 16:32 |
| sean-k-mooney | https://trex-tgn.cisco.com/ | 16:32 |
| sean-k-mooney | right, but for 1400 byte packets you should be able to hit 10G even with 1 thread | 16:33 |
| sean-k-mooney | if the hosts are otherwise idle, with 1400 byte packets even standard kernel ovs should be able to hit 10G | 16:34 |
| sean-k-mooney | so to me the performance of these hosts/vms is very much below what i would expect | 16:34 |
| sean-k-mooney | vsaienko: what version of openstack are you using | 16:36 |
| vsaienko | but why does it provide 6+gbps with sriov then | 16:36 |
| vsaienko | its epoxy | 16:36 |
| vsaienko | 3.x openvswitch | 16:36 |
| sean-k-mooney | so 6GBPS is also too low for sriov | 16:36 |
| vsaienko | maybe indeed its a limitation of iperf | 16:37 |
| sean-k-mooney | so if you're using epoxy you should have https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L107-L126 | 16:37 |
| sean-k-mooney | but can you check in ovs what the qos policy on the ovs port is set to | 16:37 |
| sean-k-mooney | and confirm its linux-noop | 16:37 |
| vsaienko | qos plugin is disabled on my env | 16:37 |
| sean-k-mooney | that is not what im asking | 16:38 |
| vsaienko | but thanks for pointing this out, I haven't seen this option | 16:38 |
| sean-k-mooney | in newer versions of systemd or the kernel the default qdisc was changed to fq_codel | 16:38 |
| sean-k-mooney | now we added this option to make sure that that was disabled on the tap devices | 16:38 |
| vsaienko | interesting | 16:39 |
| sean-k-mooney | because if we don't it results in a massive performance hit | 16:39 |
| vsaienko | let me check this | 16:39 |
| vsaienko | it should be default | 16:39 |
| sean-k-mooney | yep | 16:39 |
| sean-k-mooney | so we need to check that 1) the policy in ovs set by os-vif is linux-noop | 16:39 |
| sean-k-mooney | but we should also check that the representor netdev added to ovs does not have a qos policy applied with tc | 16:40 |
| vsaienko | it's not explicitly set, so the default is picked | 16:40 |
| sean-k-mooney | right, so i'm asking you to confirm this because this was never tested for vdpa or hardware offloaded ovs | 16:41 |
| sean-k-mooney | it should get applied but it may not be | 16:41 |
| sean-k-mooney | and if it's not being set in the ovs db then it could reduce the performance | 16:42 |
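As a small illustration of those two checks (the device and port names are assumptions taken from the discussion, adjust for the real environment): inspect the qdisc tc shows on the representor netdev, and the ovs Port record whose qos column should reference the linux-noop policy os-vif is expected to set:

```python
import subprocess


def check_offload_port(dev, ovs_port):
    # an "fq_codel" root qdisc here would explain the slowdown
    print(subprocess.check_output(['tc', 'qdisc', 'show', 'dev', dev], text=True))
    # the Port record's 'qos' column should point at the linux-noop QoS set by os-vif
    print(subprocess.check_output(['ovs-vsctl', 'list', 'port', ovs_port], text=True))


# eth1 / enp33s0f1npf1vf1 is assumed to be the representor added to ovs
check_offload_port('eth1', 'eth1')
```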
| vsaienko | https://paste.openstack.org/show/bZeQE1UxDhnNWIpJmiCb/ | 16:43 |
| sean-k-mooney | yep, that's the problem | 16:44 |
| sean-k-mooney | it's using fq_codel | 16:44 |
| sean-k-mooney | create /etc/sysctl.d/99_qdisc.conf and add net.core.default_qdisc = pfifo | 16:44 |
| sean-k-mooney | then apply it and try doing it again with a new vm | 16:45 |
| vsaienko | ok thanks, let me try | 16:45 |
| sean-k-mooney | well that might work but lets confirm something | 16:46 |
| sean-k-mooney | what is eth1 in this case | 16:46 |
| sean-k-mooney | is it the representor netdev for the vdpa device | 16:46 |
| sean-k-mooney | i expect it is, based on the altname | 16:47 |
| sean-k-mooney | enp33s0f1npf1vf1 | 16:47 |
| sean-k-mooney | if that is the case and the performance improves when you set your default qdisc to pfifo | 16:47 |
| sean-k-mooney | then the issue you are hitting is that the os-vif change is not properly handling this edge case | 16:48 |
| sean-k-mooney | it was intended to fix https://bugs.launchpad.net/os-vif/zed/+bug/2017868 for standard kernel ovs | 16:49 |
| sean-k-mooney | vsaienko: if you're using iperf to test you should also ensure that you have disabled fq_codel in the guest. https://trex-tgn.cisco.com/ is a lot more work to set up but it would eliminate any worry that the traffic generator is the bottleneck. maybe bookmark that for future reading | 16:54 |
| vsaienko | ack let me try this | 16:57 |
| vsaienko | thanks | 16:58 |
| vsaienko | sean-k-mooney: pfifo improves situation https://jumpshare.com/s/GkgTEynhPdHlvWptjih5 | 17:29 |
| gmaan | gibi: RE default value change of config options (966016): yes it is fine as long as we are adding it in the upgrade release notes. oslo.config has a way to respect both old and new names when a config name is changed, but for a default value change, yes, upgrade notes are the right thing. | 17:31 |
| sean-k-mooney | vsaienko: ack, it will likely improve the other results too if you end up testing them | 17:33 |
| sean-k-mooney | we know it can limit performance to <10% from some reports | 17:33 |
| sean-k-mooney | different backends are affected more or less | 17:34 |
| sean-k-mooney | os-vif obviously has a blind spot for hardware offloaded ports that we should fix | 17:34 |
| sean-k-mooney | but the root of the issue was the change of behavior in ovs and systemd | 17:34 |
| sean-k-mooney | the default qdisc was changed about 3-4 years ago; it used to be pfifo by default | 17:35 |
| sean-k-mooney | ovs also changed to not ignoring/removing the qdisc on ports added to it by default | 17:35 |
| gibi | gmaan: thanks! I added the upgrade reno | 17:36 |
| sean-k-mooney | vsaienko: https://wiki.archlinux.org/title/Advanced_traffic_control#CoDel_and_Fair_Queueing_CoDel | 17:37 |
| sean-k-mooney | that is some of the history on this change; it happened in systemd 217 | 17:38 |
| opendevreview | Nicolai Ruckel proposed openstack/nova master: Preserve UEFI NVRAM variable store https://review.opendev.org/c/openstack/nova/+/959682 | 17:41 |
| gmaan | anyone face/know the issue of not seeing the 'compute.<hostname>' queue in a devstack env? I am listing all queues (sudo rabbitmqctl list_queues) and it shows all queues except the compute service queues (oslo.messaging creates three queues per compute service: 1. 'compute' 2. 'compute.<hostname>' 3. a compute fanout with uuids) | 17:44 |
| gmaan | it lists all other queues, for example scheduler, conductor, cinder queues | 17:45 |
| opendevreview | Nicolai Ruckel proposed openstack/nova master: Preserve UEFI NVRAM variable store https://review.opendev.org/c/openstack/nova/+/959682 | 17:47 |
| nicolairuckel | sean-k-mooney, I found a few more places with code that wasn't necessary anymore | 17:48 |
| nicolairuckel | I think it should be a lot cleaner now | 17:48 |
| sean-k-mooney | yep im +1 on your patch after skimming it | 17:48 |
| sean-k-mooney | ill upgrade to +2 after i have time to test it and read over it again | 17:48 |
| nicolairuckel | perfect :) | 17:48 |
| sean-k-mooney | gmaan: so it should be "compute.{CONF.host}" but if the queue is not there it normally is a networking or auth issue, i believe it can also happen if we exit because of, say, the missing compute node id file | 17:50 |
| sean-k-mooney | i.e. if the agent stops running and the queue gets deleted | 17:51 |
| gmaan | compute service is up, instances are booting fine so no issue on that side | 17:51 |
| sean-k-mooney | are you looking at the correct exchange? | 17:51 |
| gmaan | if you have devstack up and running, can you check if you see the compute queues? | 17:52 |
| gmaan | yes, nova, and i even checked all exchanges | 17:52 |
| sean-k-mooney | sure i have one | 17:52 |
| gmaan | before i boot a new VM and install devstack i just want to confirm if it's something on my machine | 17:52 |
| gmaan | thanks | 17:52 |
| sean-k-mooney | i have one on a host internally i can add you ssh key to it and you can ssh in and take a look if you want | 17:53 |
| gmaan | it's fine, if you can just check if there is any queue named 'compute*' or paste the output of sudo rabbitmqctl list_queues | 17:54 |
| sean-k-mooney | https://termbin.com/rzms | 17:55 |
| sean-k-mooney | im not seeing any | 17:56 |
| gmaan | yeah, same issue, no compute queue there | 17:56 |
| sean-k-mooney | ah | 17:57 |
| sean-k-mooney | so we use vhosts in devstack | 17:57 |
| gmaan | instances booting successfully means there's no issue in queue creation, but somehow it is not listed by the rabbitmqctl command | 17:57 |
| sean-k-mooney | and the computes are in nova_cell1 | 17:57 |
| gmaan | yeah vhosts we use | 17:58 |
| sean-k-mooney | https://termbin.com/01tb | 17:58 |
| sean-k-mooney | that's the output from "rabbitmqctl list_queues --vhost nova_cell1 | nc termbin.com 9999" | 17:58 |
| sean-k-mooney | compute.hibernal01 | 17:58 |
| sean-k-mooney | is the compute queue | 17:58 |
| gmaan | ah yeah, it is listed now. thanks sean-k-mooney ++ | 17:59 |
| sean-k-mooney | no worries. | 18:00 |
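A small illustrative helper (hypothetical, just wrapping the rabbitmqctl commands used above) that lists the compute queues per vhost, since in devstack the cell1 computes live under the nova_cell1 vhost rather than the default one:

```python
import subprocess


def compute_queues_by_vhost():
    vhosts = subprocess.check_output(
        ['sudo', 'rabbitmqctl', 'list_vhosts', '--quiet'], text=True).split()
    for vhost in vhosts:
        out = subprocess.check_output(
            ['sudo', 'rabbitmqctl', 'list_queues', '--vhost', vhost,
             '--quiet', 'name'], text=True)
        queues = [q for q in out.splitlines() if q.startswith('compute')]
        print(vhost, queues)


compute_queues_by_vhost()
```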
| -opendevstatus- NOTICE: The OpenDev team will be restarting Gerrit at approximately 2130 UTC in order to pick up the latest 3.10 bugfix release. | 20:32 | |
| *** haleyb is now known as haleyb|out | 22:56 | |