Thursday, 2025-11-13

*** mhen_ is now known as mhen02:43
kklimaszewskiHello, I'm Karol Klimaszewski - one of the presenters of NUMA in Placement at OpenInfra. While my colleague Dominik is still working on the patches related to that feature, I wanted to ask about another nova feature made by CloudFerro that was mentioned in the conversation after our presentation. Well, it is more like two features. The first is an ephemeral storage backend for the libvirt driver based on local NVMe disks managed by SPDK. 10:10
kklimaszewskiThe second is adding to nova the ability to handle multiple ephemeral storage backends on one nova-compute. I was wondering if there is any interest in us introducing those features upstream? And if yes, is creating a blueprint and a spec proposal in the nova-specs opendev repo a good first step to make it clear what we are trying to introduce?10:10
opendevreviewStephen Finucane proposed openstack/nova master: WIP: libvirt: Ensure LibvirtDriver._host is initialized  https://review.opendev.org/c/openstack/nova/+/96700510:14
stephenfingibi: interested in your thoughts on ^10:15
stephenfinIf you think it makes sense, I can fix up the tests (or try to bribe sean-k-mooney to do it for me...)10:15
sean-k-mooney gibi did you get a resolution to your lock question? I also don't know if that is really required.11:33
sean-k-mooney gibi  looking at the usage11:34
sean-k-mooneyhttps://github.com/openstack/nova/blob/b7d50570c7a79a38b0db6476ccb3c662b237f69b/nova/compute/provider_tree.py#L269-L27811:35
sean-k-mooneyI'm not seeing any get_or_create type pattern where we would either get a cached value or create it11:36
sean-k-mooneyat least not in that function, but we obviously are doing exists checks elsewhere https://github.com/openstack/nova/blob/b7d50570c7a79a38b0db6476ccb3c662b237f69b/nova/compute/provider_tree.py#L409-L42311:37
sean-k-mooneythe ironic case sort of makes sense but I wonder if there is also an interaction with sharing resource providers and/or provider.yaml at play11:38
sean-k-mooneygibi: looking at the patch that introduced it, it's clear that this exists to prevent concurrent modification of the provider tree while iterating over it, so that a remove and a find-or-add can't happen at the same time. why is it not using, say, the compute node uuid instead of a single lock? the only thing that makes sense to me in the original patch is the loop over the compute nodes. I11:49
sean-k-mooneypersonally would have modelled this slightly differently. the placement resources managed by a compute agent are a forest of trees, not a single tree. so I would not have accepted a list of compute nodes; however, you would still have needed a list of roots for sharing resource providers. I think if we uplevelled the compute nodes to a separate map of cn uuid to provider tree, the11:49
sean-k-mooneylock in the provider tree could use the cn.uuid, but the map would still likely need to have a single lock11:49
sean-k-mooneywe did refactor this a bit but we never actually turned it into a forest of trees11:52
opendevreviewsean mooney proposed openstack/nova master: ensure correct cleanup of multi-attach volumes  https://review.opendev.org/c/openstack/nova/+/91632211:56
opendevreviewsean mooney proposed openstack/nova master: ensure correct cleanup of multi-attach volumes  https://review.opendev.org/c/openstack/nova/+/91632211:57
gibistephenfin: replied in the commit 12:28
gibisean-k-mooney: I'm questioning the need for a named and therefore global lock across all instances of ProviderTree in the process. Even if it is as simple as a libvirt compute with just a single root provider, I don't see why we need to ensure that two copies of the ProviderTree are not accessed at the same time. I totally understand why we would use a threading.Lock per ProviderTree instance so that concurrent 12:32
gibiaccess to the data is synchronized.12:32
gibibut anyhow I'm keeping the global named lock for the ProviderTree in https://review.opendev.org/c/openstack/nova/+/956091 I just feel like it is totally unnecessary12:33
sean-k-mooneyyep I realise that. I'm just wondering if there is any sharing here between the reshapes that can happen due to OTU devices and the periodics12:33
sean-k-mooneyI don't think we share the same PT between threads today12:34
sean-k-mooneyso a simple lock is likely enough12:34
gibiwhen we update placement we update that based on a specific ProviderTree instance. A copy of that object does not matter as it is an independent copy12:34
sean-k-mooneythe real question for me is do we ever store this PT in a shared location like the resource tracker 12:34
sean-k-mooneyor is it only constructed inside a function12:35
gibiwe pass around PTs12:35
gibiand we also copy them around in some cases12:35
gibibecause we pass them around, a threading.Lock is needed12:35
sean-k-mooneyright so it's passed down the call stack but not kept in shared mutable storage, either because of a copy or because of how it's constructed12:35
sean-k-mooneyI'm not directly seeing a need for it to be a named lock either, for what it's worth12:36
gibieven if we pass it to shared mutable storage, we pass an instance there, so a threading lock is enough; we don't need to share the lock across the copies because they are independent deep copies12:36
gibiactually in nova-compute a named lock is never needed as the compute is a single process12:37
gibiso we only sync across threads, so a threading.Lock is enough12:37
gibiin the scheduler or conductor a named lock might be needed iff we share data across worker processes12:37
gibilike DB state12:38
gibianyhow I'll stop here. I just noticed it and it felt strange, but I can ignore it and be safe that it will keep working as before12:39
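
For context on the locking question above, here is a minimal sketch contrasting the two styles being debated; the class, method, and lock names are illustrative only and are not nova's actual ProviderTree code.

    import threading

    from oslo_concurrency import lockutils


    class PerInstanceLockedTree:
        """Each instance owns its own threading.Lock, so independent (deep)
        copies never block each other; only concurrent access to the same
        object is serialized."""

        def __init__(self):
            self._lock = threading.Lock()
            self._roots = {}

        def add_root(self, uuid, data):
            with self._lock:
                self._roots[uuid] = data


    class NamedLockTree:
        """Every instance in the process serializes on one named lock, so even
        fully independent copies block each other -- the behaviour questioned
        above. Note that an oslo named lock is still process-local unless
        external=True is passed, so it adds no cross-process safety either."""

        _LOCK_NAME = 'provider-tree'

        def __init__(self):
            self._roots = {}

        def add_root(self, uuid, data):
            with lockutils.lock(self._LOCK_NAME):
                self._roots[uuid] = data
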
opendevreviewBalazs Gibizer proposed openstack/nova master: Compute manager to use thread pools selectively  https://review.opendev.org/c/openstack/nova/+/96601614:20
opendevreviewBalazs Gibizer proposed openstack/nova master: Libvirt event handling without eventlet  https://review.opendev.org/c/openstack/nova/+/96594914:20
opendevreviewBalazs Gibizer proposed openstack/nova master: Run nova-compute in native threading mode  https://review.opendev.org/c/openstack/nova/+/96546714:20
gibigmaan: FYI https://review.opendev.org/c/openstack/nova/+/966016 proposes changing the default value of some of our config options. After the discussion with dansmith and sean-k-mooney we think this is OK from a compatibility perspective via having an upgrade relnote. Do you agree?14:21
opendevreviewBalazs Gibizer proposed openstack/nova master: Compute manager to use thread pools selectively  https://review.opendev.org/c/openstack/nova/+/96601614:32
opendevreviewBalazs Gibizer proposed openstack/nova master: Libvirt event handling without eventlet  https://review.opendev.org/c/openstack/nova/+/96594914:32
opendevreviewBalazs Gibizer proposed openstack/nova master: Run nova-compute in native threading mode  https://review.opendev.org/c/openstack/nova/+/96546714:32
opendevreviewDan Smith proposed openstack/nova master: Test nova-next with >1 parallel migrations  https://review.opendev.org/c/openstack/nova/+/96644714:33
gibistephenfin: after a bit of deliberation and a failed trial I'm OK with the direction in 96700515:08
gibi96700515:08
gibihttps://review.opendev.org/c/openstack/nova/+/96700515:08
gibi(wtf happens with my copy paste buffer)15:08
gibistephenfin: will you fix the 700 failed unit tests or are you looking for a volunteer to take over?15:09
dansmithgibi: I still haven't looked at your patch to fix.. are you saying that one is not working and this is the one we need to pursue?15:10
gibinope, this is a followup on top of mine15:10
dansmithokay15:10
gibito prevent later patches re-introducing the issue15:10
gibi~ the generic issue of calling driver methods before driver.init_host15:11
stephenfingibi: sounds good: I had tried an alternative involving a metaclass but it was feeling...complicated15:11
gibimetaclass, I haven't thought about it. Would that sidestep our inheritance pattern?15:11
gibianyhow15:12
gibiI'm OK not going there :)15:12
opendevreviewStephen Finucane proposed openstack/nova master: virt: Ensure init_host is always called  https://review.opendev.org/c/openstack/nova/+/96705115:12
stephenfingibi: that's the patch (incomplete)15:12
gibiahh OK, so that is basically automating injecting a wrapper into the normal virt calls. I think that will not work as the virt interface functions are not called by the implementation, since most of the virt interface calls raise NotImplementedError 15:14
stephenfinyeah, that's what I'd figured out before lunch: we'd need to use the metaclass on the actual implementation rather than the base class. And ideally only wrap methods defined in the base class (not that drivers should have other public methods)15:15
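
For illustration, a rough sketch of the metaclass idea being set aside here; the FakeDriver and guard names are hypothetical, and unlike what is suggested above it wraps every public method defined on the concrete class rather than only those defined on the base class.

    import functools


    class InitHostGuardMeta(type):
        """Wrap the public methods defined on a class so that calling any of
        them before init_host() raises instead of failing obscurely later."""

        def __new__(mcls, name, bases, namespace):
            cls = super().__new__(mcls, name, bases, namespace)
            for attr, value in list(namespace.items()):
                if attr.startswith('_') or attr == 'init_host':
                    continue
                if callable(value):
                    setattr(cls, attr, mcls._guard(value))
            return cls

        @staticmethod
        def _guard(func):
            @functools.wraps(func)
            def wrapper(self, *args, **kwargs):
                if not getattr(self, '_init_host_called', False):
                    raise RuntimeError(
                        '%s() called before init_host()' % func.__name__)
                return func(self, *args, **kwargs)
            return wrapper


    class FakeDriver(metaclass=InitHostGuardMeta):
        """Stand-in for a concrete virt driver implementation."""

        def init_host(self, host):
            self._init_host_called = True

        def list_instances(self):
            return []


    driver = FakeDriver()
    driver.init_host('compute1')   # without this, list_instances() raises
    assert driver.list_instances() == []
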
gibiyeah15:15
gibithen I think we felt the same pain15:16
gibianyhow let me know if you need help with the unit test or need a re-review if you jump on the unit test by yourself15:16
*** jizaymes_ is now known as jizaymes16:04
vsaienkohey sean-k-mooney: maybe it will be interesting: there are performance tests for different scenarios (asap2, vdpa with packed ring 8 queues, virtio single queue, virtio packed + 8 queues) https://jumpshare.com/s/xHU2UTQoKoeKwAWDFsa6 https://jumpshare.com/s/mLQ1u29xOyMftaULc0lJ so far VDPA does not show super significant improvements at 10Gb.16:07
vsaienkothe scenarios are iperf tests; the arguments are in each scenario's description16:08
sean-k-mooney I think that is in line with expectations16:18
vsaienkoI thought that vdpa would be nearly the same as sriov 16:19
sean-k-mooneyfor tcp at least16:19
vsaienkomaybe it will be closer on higher rates 50Gbps or 100Gbps16:20
sean-k-mooneyvdpa can be, but the connectx6 dx is not doing vdpa fully in hardware16:20
vsaienkonvidia says they support hardware vdpa on connectx-6 dx16:21
sean-k-mooneycan you explain the labeling by the way? when you say switchdev you mean hardware offloaded ovs with a directly assigned VF I assume16:21
sean-k-mooneyvsaienko: yes but it's not done fully in an asic; it does some of the data transformation using firmware/cpu compute on the nic16:22
vsaienkoswitchdev - sriov + asap2 hardware offload; vdpa-packed - packed ring + 8 queues; virtio def - default flavor, no multiqueue or packed ring; virtio-packed - packed ring + 8 queues16:23
vsaienkovdpa with hardware offload as well16:23
vsaienkobetween hosts there is a VXLAN network, mtu 1450 (no jumbo frames) on all configurations16:23
sean-k-mooneyok, because the default is using vdpa so it's not correct to call it virtio16:23
sean-k-mooneyah you are also adding vxlan overhead16:24
sean-k-mooneyso is this inter-host traffic, i.e. 2 vms on different hosts, or 2 vms on the same host16:24
vsaienkovirtio def is the default neutron port (=normal) which is virtio16:25
sean-k-mooneyit's technically kernel vhost16:25
vsaienkoall scenarios are between VMs on different hosts16:25
sean-k-mooneyvirtio would be the virtio stack in ovs without the kernel vhost-net module offloading it16:26
sean-k-mooneythis looks to me like you are hitting a code path that is not supported for hardware offload16:27
sean-k-mooneyfor example the connection tracker used for security groups is not fully offloadable to hardware16:28
sean-k-mooneyare you using ml2/ovs?16:28
vsaienkoit's ovs, no qos, no security groups; in TC I see that the traffic is offloaded16:28
sean-k-mooneywell in https://jumpshare.com/s/xHU2UTQoKoeKwAWDFsa616:29
sean-k-mooneyyou are not hitting line rate even in the switchdev case, which is effectively just using sriov for the dataplane16:29
sean-k-mooneywith ovs providing the control plane16:29
vsaienkoyes16:30
sean-k-mooneythat to me suggests you are hitting bottlenecks in the traffic generation as well16:30
vsaienkonot sure; between hosts on the same hypervisor iperf can show more, it's the new version with multithreading16:31
sean-k-mooneywell iperf is not really good for doing real testing16:31
sean-k-mooneyyou can use -P to have it use multiple cores16:31
sean-k-mooneybut if you want to test this properly you either need a hardware traffic generator or something like trex16:32
vsaienkoyes, there are scenarios with p1 - that's -P 1, and p30 - -P 3016:32
sean-k-mooneyhttps://trex-tgn.cisco.com/16:32
sean-k-mooneyright, but with 1400 byte packets you should be able to hit 10G even with 1 thread16:33
sean-k-mooneyif the hosts are otherwise idle, with 1400 byte packets even standard kernel ovs should be able to hit 10G16:34
sean-k-mooneyso to me the performance of this host/vms is very much below what I would expect16:34
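
As a back-of-the-envelope check on that claim (rough numbers, ignoring VXLAN/Ethernet/TCP header overhead):

    # packets per second needed to saturate a 10G link with ~1400-byte frames
    LINK_BPS = 10e9        # 10 Gbit/s
    FRAME_BYTES = 1400     # roughly the payload size with an MTU of 1450

    pps = LINK_BPS / (FRAME_BYTES * 8)
    print(f"~{pps:,.0f} packets/s (~0.9 Mpps) to fill 10G at {FRAME_BYTES}B")
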
sean-k-mooneyvsaienko: what version of openstack are you using16:36
vsaienkobut why with sriov does it provide 6+gbps16:36
vsaienkoit's epoxy16:36
vsaienko3.x openvswitch16:36
sean-k-mooneyso 6Gbps is also too low for sriov16:36
vsaienkomaybe indeed it's a limitation of iperf16:37
sean-k-mooneyso if you're using epoxy you should have https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L107-L12616:37
sean-k-mooneybut can you check in ovs what the qos policy on the ovs port is set to16:37
sean-k-mooneyand confirm it's linux-noop 16:37
vsaienkoqos plugin is disabled on my env16:37
sean-k-mooneythat is not what I'm asking16:38
vsaienkobut thanks for pointing this out, I haven't seen this option16:38
sean-k-mooneyin newer versions of systemd or the kernel the default qdisc was changed to fq_codel16:38
sean-k-mooneynow we added this option to make sure that that was disabled on the tap devices16:38
vsaienkointeresting16:39
sean-k-mooneybecause if we don't it results in a massive performance hit16:39
vsaienkolet me check this 16:39
vsaienkoit should be the default16:39
sean-k-mooneyyep16:39
sean-k-mooneyso we need to check that 1) the policy in ovs set by os-vif is linux-noop16:39
sean-k-mooneybut we should also check that the representor netdev added to ovs does not have a qos policy applied with tc16:40
vsaienkoit's not explicitly set, so the default is picked16:40
sean-k-mooneyright, so I'm asking you to confirm this because this was never tested for vdpa or hardware offloaded ovs16:41
sean-k-mooneyit should get applied but it may not be 16:41
sean-k-mooneyand if it's not being set in the ovs db then it could reduce the performance16:42
vsaienkohttps://paste.openstack.org/show/bZeQE1UxDhnNWIpJmiCb/16:43
sean-k-mooneyyep that's the problem16:44
sean-k-mooneyit's using fq_codel16:44
sean-k-mooneycreate /etc/sysctl.d/99_qdisk.conf and add net.core.default_qdisc = pfifo16:44
sean-k-mooneythen apply it and try doing it again with a new vm16:45
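
A tiny helper to spot the condition being discussed (illustrative only, not part of os-vif): check whether the host's default qdisc is fq_codel rather than pfifo.

    # net.core.default_qdisc is exposed through procfs on Linux
    from pathlib import Path

    qdisc = Path('/proc/sys/net/core/default_qdisc').read_text().strip()
    if qdisc == 'fq_codel':
        print('default qdisc is fq_codel; setting net.core.default_qdisc=pfifo '
              'as suggested above may help')
    else:
        print('default qdisc is %s' % qdisc)
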
vsaienkook thanks, let me try16:45
sean-k-mooneywell that might work but let's confirm something16:46
sean-k-mooneywhat is eth1 in this case16:46
sean-k-mooneyis it the representor netdev for the vdpa device16:46
sean-k-mooneyI expect it is, based on the altname16:47
sean-k-mooneyenp33s0f1npf1vf116:47
sean-k-mooneyif that is the case and the performance improves when you set your default qdisc to pfifo16:47
sean-k-mooneythen the issue you are hitting is that the os-vif change is not properly handling this edge case16:48
sean-k-mooneyit was intended to fix https://bugs.launchpad.net/os-vif/zed/+bug/2017868 for standard kernel ovs16:49
sean-k-mooneyvsaienko: if you're using iperf to test you should also ensure that you have disabled fq_codel in the guest. https://trex-tgn.cisco.com/ is a lot more work to set up but it would eliminate any worry that the traffic generator is the bottleneck. maybe bookmark that for future reading16:54
vsaienkoack let me try this16:57
vsaienkothanks 16:58
vsaienkosean-k-mooney: pfifo improves the situation https://jumpshare.com/s/GkgTEynhPdHlvWptjih517:29
gmaangibi: RE the default value change of config options (966016): yes it is fine as long as we are adding it to the upgrade release notes. oslo.config has a way to respect both old and new names when a config option is renamed, but for a default value change, yes, an upgrade note is the right thing.17:31
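
For reference, a minimal sketch of the distinction drawn here; the option names and values are made up, not nova's real options. oslo.config can bridge a rename via deprecated_name, but a changed default has nothing to bridge, hence the upgrade release note.

    from oslo_config import cfg

    opts = [
        # A renamed option: the old name is still honoured and logged as
        # deprecated, so operators need no immediate action.
        cfg.IntOpt('new_name',
                   deprecated_name='old_name',
                   default=10,
                   help='Hypothetical renamed option.'),
        # A changed default: there is nothing for oslo.config to reconcile,
        # so operators relying on the old default are told via an upgrade
        # release note instead.
        cfg.IntOpt('pool_size',
                   default=20,   # hypothetical; say it used to default to 10
                   help='Hypothetical option whose default changed.'),
    ]

    cfg.CONF.register_opts(opts)
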
sean-k-mooneyvsaienko: ack, it will likely improve the other results too if you end up testing them17:33
sean-k-mooneywe know it can limit performance to <10% from some reports17:33
sean-k-mooneydifferent backends are affected more or less17:34
sean-k-mooneyos-vif obviously has a blind spot for hardware offloaded ports that we should fix17:34
sean-k-mooneybut the root of the issue was the change in behaviour in ovs and systemd17:34
sean-k-mooneythe default qdisc was changed about 3-4 years ago; it used to be pfifo by default17:35
sean-k-mooneyovs also changed to not ignoring/removing the qdisc on ports added to it by default17:35
gibigmaan: thanks! I added the upgrade reno 17:36
sean-k-mooneyvsaienko: https://wiki.archlinux.org/title/Advanced_traffic_control#CoDel_and_Fair_Queueing_CoDel17:37
sean-k-mooneythat is some of the history on these changes; it happened in systemd 21717:38
opendevreviewNicolai Ruckel proposed openstack/nova master: Preserve UEFI NVRAM variable store  https://review.opendev.org/c/openstack/nova/+/95968217:41
gmaananyone face/know the issue of not seeing the 'compute.<hostname>' queue in a devstack env? I am listing all queues (sudo rabbitmqctl list_queues) and it shows all queues except the compute service queues (oslo.messaging creates three queues per compute service: 1. 'compute' 2. 'compute.<hostname>' 3. a compute fanout with uuids)17:44
gmaanit lists all other queues, for example the scheduler, conductor, and cinder queues 17:45
opendevreviewNicolai Ruckel proposed openstack/nova master: Preserve UEFI NVRAM variable store  https://review.opendev.org/c/openstack/nova/+/95968217:47
nicolairuckelsean-k-mooney, I found a few more places with code that wasn't necessary anymore17:48
nicolairuckelI think it should be a lot cleaner now17:48
sean-k-mooneyyep, I'm +1 on your patch after skimming it17:48
sean-k-mooneyI'll upgrade to +2 after I have time to test it and read over it again17:48
nicolairuckelperfect :)17:48
sean-k-mooneygmaan: so it should be "compute.{CONF.host}" but if the queue is not there it normally is a networking or auth issue; I believe it can also happen if we exit because of, say, a missing compute node id file 17:50
sean-k-mooneyi.e. if the agent stops running and the queue gets deleted17:51
gmaanthe compute service is up and instances are booting fine, so no issue on that side17:51
sean-k-mooneyare you looking at the correct exchange?17:51
gmaanif you have devstack up and running, can you check if you see the compute queues? 17:52
gmaanyes, nova, and I even checked all exchanges17:52
sean-k-mooneysure, I have one 17:52
gmaanbefore I boot a new VM and install devstack I just want to confirm if it is something on my machine17:52
gmaanthanks17:52
sean-k-mooneyI have one on a host internally; I can add your ssh key to it and you can ssh in and take a look if you want17:53
gmaanit's fine, if you can just check if there is any queue named 'compute*' or paste the output of sudo rabbitmqctl list_queues17:54
sean-k-mooneyhttps://termbin.com/rzms17:55
sean-k-mooneyI'm not seeing any17:56
gmaanyeah, same issue, no compute queue there17:56
sean-k-mooneyah17:57
sean-k-mooneyso we use vhosts in devstack17:57
gmaaninstances booting successfully means there is no issue with queue creation, but somehow it is not listed by the rabbitmq command17:57
sean-k-mooneyand the computes are in nova_cell117:57
gmaanyeah  vhosts we use17:58
sean-k-mooneyhttps://termbin.com/01tb17:58
sean-k-mooneythat's the output from "rabbitmqctl list_queues --vhost nova_cell1  | nc termbin.com 9999"17:58
sean-k-mooneycompute.hibernal01 17:58
sean-k-mooneyis the compute queue17:58
gmaanah yeah, it is listed now. thanks sean-k-mooney ++17:59
sean-k-mooneyno worries. 18:00
-opendevstatus- NOTICE: The OpenDev team will be restarting Gerrit at approximately 2130 UTC in order to pick up the latest 3.10 bugfix release.20:32
*** haleyb is now known as haleyb|out22:56

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!