| *** mhen_ is now known as mhen | 02:43 | |
| kklimaszewski | Hello, I'm Karol Klimaszewski - one of the presenters of NUMA in Placement at OpenInfra. While my colleague Dominik is still working on the patches related to that feature, I wanted to ask about another nova feature made by CloudFerro that was mentioned during the conversation after our presentation. Well, it is more like two features. The first is an ephemeral storage backend for the libvirt driver based on local NVMe disks managed by SPDK. | 10:10 |
| kklimaszewski | The second is adding to nova the ability to handle multiple ephemeral storage backends on one nova-compute. I was wondering if there is any interest in us introducing those features upstream? And if yes, is creating a blueprint and a spec proposal in the nova-specs opendev repo a good first step to make it clear what we are trying to introduce? | 10:10 |
| opendevreview | Stephen Finucane proposed openstack/nova master: WIP: libvirt: Ensure LibvirtDriver._host is initialized https://review.opendev.org/c/openstack/nova/+/967005 | 10:14 |
| stephenfin | gibi: interested in your thoughts on ^ | 10:15 |
| stephenfin | If you think it makes sense, I can fix up the tests (or try to bribe sean-k-mooney to do it for me...) | 10:15 |
| sean-k-mooney | gibi did you get a resolution to your lock question? i also don't know if that is really required. | 11:33 |
| sean-k-mooney | gibi looking at the usage | 11:34 |
| sean-k-mooney | https://github.com/openstack/nova/blob/b7d50570c7a79a38b0db6476ccb3c662b237f69b/nova/compute/provider_tree.py#L269-L278 | 11:35 |
| sean-k-mooney | im not seeing any get_or_create type pattern where we would either get a cached value or create it | 11:36 |
| sean-k-mooney | at least not in that function, but we obviously are doing exists checks elsewhere https://github.com/openstack/nova/blob/b7d50570c7a79a38b0db6476ccb3c662b237f69b/nova/compute/provider_tree.py#L409-L423 | 11:37 |
| sean-k-mooney | the ironic case sort of makes sense but i wonder if there is also an interaction with sharing resource providers and/or provider.yaml at play | 11:38 |
| sean-k-mooney | gibi: looking at the patch that introduced it, it's clear that this exists to prevent concurrent modification of the provider tree while iterating over it, so a remove and a find-or-add can happen at the same time. why is it not using, say, the compute node uuid instead of a single lock? the only thing that makes sense to me in the original patch is the for loop over the compute nodes. i | 11:49 |
| sean-k-mooney | personally would have modeled this slightly differently. the placement resources managed by a compute agent are a forest of trees, not a single tree. so i would not have accepted a list of compute nodes, however you would still have needed a list of roots for sharing resource providers. i think if we upleveled the compute nodes to a separate map of cn uuid to provider tree, the | 11:49 |
| sean-k-mooney | lock in the provider tree could use the cn.uuid but the map would still likely need to have a single lock | 11:49 |
| sean-k-mooney | we did refactor this a bit but we never actually turned it into a forest of trees | 11:52 |
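A minimal sketch (hypothetical names, not nova code) of the forest-of-trees modelling described above: one provider tree per compute node uuid, each locked on its own cn.uuid, with a single lock only on the map of trees itself:

```python
import threading


class ProviderForest:
    """Illustrative only: a map of compute node uuid -> per-tree state."""

    def __init__(self):
        self._map_lock = threading.Lock()  # guards adding/removing whole trees
        self._trees = {}                   # cn uuid -> {'lock': ..., 'providers': ...}

    def _tree_for(self, cn_uuid):
        with self._map_lock:
            if cn_uuid not in self._trees:
                self._trees[cn_uuid] = {
                    'lock': threading.Lock(),  # per-tree lock keyed by cn.uuid
                    'providers': {},
                }
            return self._trees[cn_uuid]

    def update_provider(self, cn_uuid, rp_uuid, data):
        # updates to the trees of different compute nodes do not contend
        tree = self._tree_for(cn_uuid)
        with tree['lock']:
            tree['providers'][rp_uuid] = data
```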
| opendevreview | sean mooney proposed openstack/nova master: ensure correct cleanup of multi-attach volumes https://review.opendev.org/c/openstack/nova/+/916322 | 11:56 |
| opendevreview | sean mooney proposed openstack/nova master: ensure correct cleanup of multi-attach volumes https://review.opendev.org/c/openstack/nova/+/916322 | 11:57 |
| gibi | stephenfin: replied in the commit | 12:28 |
| gibi | sean-k-mooney: I'm questioning the need of a named and therefore global lock for any instance of ProviderTree in the process. Even if it is as simple as a libvirt compute with just a single root provider. I don't see why we need to ensure that two copies of the ProviderTree are not accessed at the same time. I totally understand why we would use a threading.Lock per ProviderTree instance so concurrent | 12:32 |
| gibi | access to the data is synchronized. | 12:32 |
| gibi | but anyhow I'm keeping the global named lock for the ProviderTree in https://review.opendev.org/c/openstack/nova/+/956091 I just feel like it is totally unnecessary | 12:33 |
| sean-k-mooney | yep i realise that. im just wondering if there is any sharing here between the reshapes that can happen due to OTU devices and the periodics | 12:33 |
| sean-k-mooney | i dont think we share the same PT between threads today | 12:34 |
| sean-k-mooney | so a simple lock is likely enough | 12:34 |
| gibi | when we update placement we update that based on a specific ProviderTree instance. A copy of that object does not matter as it is an independent copy | 12:34 |
| sean-k-mooney | the real question for me is do we ever store this PT in a shared location like the resource tracker | 12:34 |
| sean-k-mooney | or is it only constructed inside a function | 12:35 |
| gibi | we pass around PTs | 12:35 |
| gibi | and we also copy them around in some cases | 12:35 |
| gibi | because we pass them around, a threading.Lock is needed | 12:35 |
| sean-k-mooney | right, so it's passed down the call stack but not in shared mutable storage, either because of a copy or how it's constructed | 12:35 |
| sean-k-mooney | im not directly seeing a need for it to be a named lock either for what its worth | 12:36 |
| gibi | even if we pass it to a shared mutable storage, passing an instance there with a threading lock is enough; we don't need to share the lock across the copies because they are independent deep copies | 12:36 |
| gibi | actually in nova-compute a named lock is never needed as the compute is a single process | 12:37 |
| gibi | so we only sync across threads so a threading.lock is enough | 12:37 |
| gibi | in the scheduler or conductor a named lock might be needed iff we share data across worker processes | 12:37 |
| gibi | like DB state | 12:38 |
| gibi | anyhow, I'll stop here. I just noticed it and it felt strange, but I can ignore it and be safe that it will keep working as before | 12:39 |
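For illustration, a minimal sketch (invented class names, not the real ProviderTree) of the two locking styles being contrasted above: an oslo lockutils named lock is shared by every instance in the process, while a per-instance threading.Lock only synchronizes threads touching that one object, and an independent deep copy gets its own fresh lock:

```python
import copy
import threading

from oslo_concurrency import lockutils


class NamedLockTree:
    """All instances in the process contend on one named lock."""

    @lockutils.synchronized('provider-tree')
    def update(self, rp_uuid, inventory):
        ...


class InstanceLockTree:
    """Each instance (and each deep copy) has its own lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def update(self, rp_uuid, inventory):
        with self._lock:  # only syncs threads sharing *this* object
            self._data[rp_uuid] = inventory

    def __deepcopy__(self, memo):
        new = InstanceLockTree()  # independent copy, independent lock
        new._data = copy.deepcopy(self._data, memo)
        return new
```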
| opendevreview | Balazs Gibizer proposed openstack/nova master: Compute manager to use thread pools selectively https://review.opendev.org/c/openstack/nova/+/966016 | 14:20 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Libvirt event handling without eventlet https://review.opendev.org/c/openstack/nova/+/965949 | 14:20 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Run nova-compute in native threading mode https://review.opendev.org/c/openstack/nova/+/965467 | 14:20 |
| gibi | gmaan: FYI https://review.opendev.org/c/openstack/nova/+/966016 proposes changing the default value of some of our config options. After the discussion with dansmith and sean-k-mooney we think this is OK from a compatibility perspective given we add an upgrade relnote. Do you agree? | 14:21 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Compute manager to use thread pools selectively https://review.opendev.org/c/openstack/nova/+/966016 | 14:32 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Libvirt event handling without eventlet https://review.opendev.org/c/openstack/nova/+/965949 | 14:32 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Run nova-compute in native threading mode https://review.opendev.org/c/openstack/nova/+/965467 | 14:32 |
| opendevreview | Dan Smith proposed openstack/nova master: Test nova-next with >1 parallel migrations https://review.opendev.org/c/openstack/nova/+/966447 | 14:33 |
| gibi | stephenfin: after a bit of deliberation and a failed trial I'm OK with the direction in 967005 | 15:08 |
| gibi | 967005 | 15:08 |
| gibi | https://review.opendev.org/c/openstack/nova/+/967005 | 15:08 |
| gibi | (wtf happens with my copy paste buffer) | 15:08 |
| gibi | stephenfin: will you fix the 700 failed unit tests or are you looking for a volunteer to take over? | 15:09 |
| dansmith | gibi: I still haven't looked at your patch to fix.. are you saying that one is not working and this is the one we need to pursue? | 15:10 |
| gibi | nope, this is a followup on top of mine | 15:10 |
| dansmith | okay | 15:10 |
| gibi | to prevent later patches re-introducing the issue | 15:10 |
| gibi | ~ the generic issue of calling driver methods before driver.init_host | 15:11 |
| stephenfin | gibi: sounds good: I had tried an alternative involving a metaclass but it was feeling...complicated | 15:11 |
| gibi | metaclass, I haven't thought about it. Would that sidestep our inheritance pattern? | 15:11 |
| gibi | anyhow | 15:12 |
| gibi | I'm OK not going there :) | 15:12 |
| opendevreview | Stephen Finucane proposed openstack/nova master: virt: Ensure init_host is always called https://review.opendev.org/c/openstack/nova/+/967051 | 15:12 |
| stephenfin | gibi: that's the patch (incomplete) | 15:12 |
| gibi | ahh OK, so that is basically automating injecting a wrapper into the normal virt calls. I think that will not work, as the virt interface function is not called by the implementation since most of the virt interface calls raise NotImplementedError | 15:14 |
| stephenfin | yeah, that's what I'd figured out before lunch: we'd need to use the metaclass on the actual implementation rather than the base class. And ideally only wrap methods defined in the base class (not that drivers should have other public methods) | 15:15 |
| gibi | yeah | 15:15 |
| gibi | then I think we felt the same pain | 15:16 |
| gibi | anyhow let me know if you need help with the unit test or need a re-review if you jump on the unit test by yourself | 15:16 |
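For reference, a rough sketch of the metaclass approach discussed above (hypothetical names, not the actual change in 967051): applied to the concrete driver, it wraps only the public methods the base class defines, so any call made before init_host() fails loudly:

```python
import functools


class ComputeDriver:
    """Stand-in for the virt driver interface; methods raise NotImplementedError."""

    def init_host(self, host):
        raise NotImplementedError()

    def spawn(self, context, instance):
        raise NotImplementedError()


class RequireInitHost(type):
    """Wrap base-interface methods on the concrete driver with a guard."""

    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, ns)
        base_methods = {n for n in vars(ComputeDriver) if not n.startswith('_')}
        for attr in base_methods - {'init_host'}:
            impl = getattr(cls, attr)
            if callable(impl):
                setattr(cls, attr, mcls._guard(impl))
        return cls

    @staticmethod
    def _guard(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            if not getattr(self, '_init_host_called', False):
                raise RuntimeError('%s called before init_host' % func.__name__)
            return func(self, *args, **kwargs)
        return wrapper


class FakeDriver(ComputeDriver, metaclass=RequireInitHost):
    def init_host(self, host):
        self._init_host_called = True

    def spawn(self, context, instance):
        return 'spawned'
```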
| *** jizaymes_ is now known as jizaymes | 16:04 | |
| vsaienko | hey sean-k-mooney: maybe it will be interesting: there are performance tests for different scenarios (asap2, vdpa with packed ring 8 queues, virtio single queue, virtio packed + 8 queues) https://jumpshare.com/s/xHU2UTQoKoeKwAWDFsa6 https://jumpshare.com/s/mLQ1u29xOyMftaULc0lJ so far VDPA does not show super significant improvements on 10Gb. | 16:07 |
| vsaienko | scenarios are iperf tests, arguments in each scenario description | 16:08 |
| sean-k-mooney | i think that is in line with expectation | 16:18 |
| vsaienko | I thought that vdpa would be nearly the same as sriov | 16:19 |
| sean-k-mooney | for tcp at least | 16:19 |
| vsaienko | maybe it will be closer on higher rates 50Gbps or 100Gbps | 16:20 |
| sean-k-mooney | vdpa can be, but the connectx6 dx is not doing vdpa fully in hardware | 16:20 |
| vsaienko | nvidia says they support hardware vdpa on connectx 6dx | 16:21 |
| sean-k-mooney | can you explain the labeling by the way? when you say switchdev you mean hardware offloaded ovs with a directly assigned VF i assume | 16:21 |
| sean-k-mooney | vsaienko: yes but it's not done fully in an asic, it does some of the data transformation using firmware/cpu compute on the nic | 16:22 |
| vsaienko | switchdev - sriov + asap2 hardware offload, vdpa-packed - its packed ring + 8 queues, virtio def - is default flavor, no multiqueue or packed ring, virtio-packed - packed ring + 8 queues | 16:23 |
| vsaienko | vdpa with hardware offload as well | 16:23 |
| vsaienko | between hosts there is a VXLAN network, mtu 1450 (no jumbo frames) in all configurations | 16:23 |
| sean-k-mooney | ok, because the default is using vdpa, so it's not correct to call it virtio | 16:23 |
| sean-k-mooney | ah, you're also adding vxlan overhead | 16:24 |
| sean-k-mooney | so is this inter-host traffic, i.e. 2 vms on different hosts, or 2 vms on the same host | 16:24 |
| vsaienko | virtio def is the default neutron port (=normal) which is virtio | 16:25 |
| sean-k-mooney | it's technically kernel vhost | 16:25 |
| vsaienko | all scenarios between VMs on different hosts | 16:25 |
| sean-k-mooney | virtio would be the virtio stack in ovs without the kernel vhost-net module offloading it | 16:26 |
| sean-k-mooney | this looks to me like you are hitting a code path that is not supported for hardware offload | 16:27 |
| sean-k-mooney | for example the connection tracker used for security groups is not fully offloadable to hardware | 16:28 |
| sean-k-mooney | are you using ml2/ovs? | 16:28 |
| vsaienko | it's ovs, no qos, no security groups; in tc I see that the traffic is offloaded | 16:28 |
| sean-k-mooney | well in https://jumpshare.com/s/xHU2UTQoKoeKwAWDFsa6 | 16:29 |
| sean-k-mooney | you are not hitting line rate even in the switchdev case, which is effectively just using sriov for the dataplane | 16:29 |
| sean-k-mooney | with ovs providing the control plane | 16:29 |
| vsaienko | yes | 16:30 |
| sean-k-mooney | that to me suggests you are hitting bottlenecks in the traffic generation as well | 16:30 |
| vsaienko | not sure; between VMs on the same hypervisor iperf can show more, it's the new version with multithreading | 16:31 |
| sean-k-mooney | well iperf is not really good for doing real testing | 16:31 |
| sean-k-mooney | you can use -P to have it use multiple cores | 16:31 |
| sean-k-mooney | but if you want to test this properly you either need a hardware traffic generator or something like trex | 16:32 |
| vsaienko | yes, there are scenarios with p1 - that's -P 1, and p30 - that's -P 30 | 16:32 |
| sean-k-mooney | https://trex-tgn.cisco.com/ | 16:32 |
| sean-k-mooney | right, but for 1400 byte packets you should be able to hit 10G even with 1 thread | 16:33 |
| sean-k-mooney | if the hosts are otherwise idle, with 1400 byte packets even standard kernel ovs should be able to hit 10G | 16:34 |
| sean-k-mooney | so to me the performance of these hosts/vms is very much below what i would expect | 16:34 |
| sean-k-mooney | vsaienko: what version of openstack are you using | 16:36 |
| vsaienko | but why does it provide 6+gbps with sriov then | 16:36 |
| vsaienko | its epoxy | 16:36 |
| vsaienko | 3.x openvswitch | 16:36 |
| sean-k-mooney | so 6GBPS is also too low for sriov | 16:36 |
| vsaienko | maybe indeed its a limitation of iperf | 16:37 |
| sean-k-mooney | so if you're using epoxy you should have https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L107-L126 | 16:37 |
| sean-k-mooney | but can you check in ovs what the qos policy on the ovs port is set to | 16:37 |
| sean-k-mooney | and confirm its linux-noop | 16:37 |
| vsaienko | qos plugin is disabled on my env | 16:37 |
| sean-k-mooney | that is not what im asking | 16:38 |
| vsaienko | but thanks for pointing this out, I haven't seen this option | 16:38 |
| sean-k-mooney | in newer versions of systemd or the kernel the default qdisc was changed to fq_codel | 16:38 |
| sean-k-mooney | now we added this option to make sure that that was disabled on the tap devices | 16:38 |
| vsaienko | interesting | 16:39 |
| sean-k-mooney | because if we don't it results in a massive performance hit | 16:39 |
| vsaienko | let me check this | 16:39 |
| vsaienko | it should be default | 16:39 |
| sean-k-mooney | yep | 16:39 |
| sean-k-mooney | so we need to check that 1) the policy in ovs set by os-vif is linux-noop | 16:39 |
| sean-k-mooney | but we should also check that the representor netdev added to ovs does not have a qos policy applied with tc | 16:40 |
| vsaienko | it's not explicitly set, so the default is picked | 16:40 |
| sean-k-mooney | right, so i'm asking you to confirm this because this was never tested for vdpa or hardware offloaded ovs | 16:41 |
| sean-k-mooney | it should get applied but it may not be | 16:41 |
| sean-k-mooney | and if it's not being set in the ovs db then it could reduce the performance | 16:42 |
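As a small illustration of those two checks (the device and port names are assumptions taken from the discussion, adjust for the real environment): inspect the qdisc tc shows on the representor netdev, and the ovs Port record whose qos column should reference the linux-noop policy os-vif is expected to set:

```python
import subprocess


def check_offload_port(dev, ovs_port):
    # an "fq_codel" root qdisc here would explain the slowdown
    print(subprocess.check_output(['tc', 'qdisc', 'show', 'dev', dev], text=True))
    # the Port record's 'qos' column should point at the linux-noop QoS set by os-vif
    print(subprocess.check_output(['ovs-vsctl', 'list', 'port', ovs_port], text=True))


# eth1 / enp33s0f1npf1vf1 is assumed to be the representor added to ovs
check_offload_port('eth1', 'eth1')
```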
| vsaienko | https://paste.openstack.org/show/bZeQE1UxDhnNWIpJmiCb/ | 16:43 |
| sean-k-mooney | yep, that's the problem | 16:44 |
| sean-k-mooney | it's using fq_codel | 16:44 |
| sean-k-mooney | create /etc/sysctl.d/99_qdisc.conf and add net.core.default_qdisc = pfifo | 16:44 |
| sean-k-mooney | then apply it and try doing it again with a new vm | 16:45 |
| vsaienko | ok thanks, let me try | 16:45 |
| sean-k-mooney | well that might work but lets confirm something | 16:46 |
| sean-k-mooney | what is eth1 in this case | 16:46 |
| sean-k-mooney | is it the representor netdev for the vdpa device | 16:46 |
| sean-k-mooney | i expect it is, based on the altname | 16:47 |
| sean-k-mooney | enp33s0f1npf1vf1 | 16:47 |
| sean-k-mooney | if that is the case and the performance improves when you set your default qdisc to pfifo | 16:47 |
| sean-k-mooney | then the issue you are hitting is that the os-vif change is not properly handling this edge case | 16:48 |
| sean-k-mooney | it was intended to fix https://bugs.launchpad.net/os-vif/zed/+bug/2017868 for standard kernel ovs | 16:49 |
| sean-k-mooney | vsaienko: if you're using iperf to test you should also ensure that you have disabled fq_codel in the guest. https://trex-tgn.cisco.com/ is a lot more work to set up but it would eliminate any worry that the traffic generator is the bottleneck. maybe bookmark that for future reading | 16:54 |
| vsaienko | ack let me try this | 16:57 |
| vsaienko | thanks | 16:58 |
| vsaienko | sean-k-mooney: pfifo improves situation https://jumpshare.com/s/GkgTEynhPdHlvWptjih5 | 17:29 |
| gmaan | gibi: RE default value change of config options (966016): yes it is fine as long as we are adding it in the upgrade release notes. oslo.config has a way to respect both old and new names when a config name is changed, but for a default value change, yes, upgrade notes are the right thing. | 17:31 |
| sean-k-mooney | vsaienko: ack, it will likely improve the other results too if you end up testing them | 17:33 |
| sean-k-mooney | we know it can limit performance to <10% from some reports | 17:33 |
| sean-k-mooney | different backends are affected more or less | 17:34 |
| sean-k-mooney | os-vif obviously has a blind spot for hardware offloaded ports that we should fix | 17:34 |
| sean-k-mooney | but the root of the issue was the change of behavior in ovs and systemd | 17:34 |
| sean-k-mooney | the default qdisc was changed about 3-4 years ago; it used to be pfifo by default | 17:35 |
| sean-k-mooney | ovs also changed to not ignoring/removing the qdisc on ports added to it by default | 17:35 |
| gibi | gmaan: thanks! I added the upgrade reno | 17:36 |
| sean-k-mooney | vsaienko: https://wiki.archlinux.org/title/Advanced_traffic_control#CoDel_and_Fair_Queueing_CoDel | 17:37 |
| sean-k-mooney | that is some of the history on this change; it happened in systemd 217 | 17:38 |
| opendevreview | Nicolai Ruckel proposed openstack/nova master: Preserve UEFI NVRAM variable store https://review.opendev.org/c/openstack/nova/+/959682 | 17:41 |
| gmaan | anyone face/know the issue of not seeing the 'compute.<hostname>' queue in a devstack env? I am listing all queues (sudo rabbitmqctl list_queues) and it shows all queues except the compute service queues (oslo.messaging creates three queues per compute service: 1. 'compute' 2. 'compute.<hostname>' 3. a compute fanout with uuids) | 17:44 |
| gmaan | it lists all other queues, for example scheduler, conductor, cinder queues | 17:45 |
| opendevreview | Nicolai Ruckel proposed openstack/nova master: Preserve UEFI NVRAM variable store https://review.opendev.org/c/openstack/nova/+/959682 | 17:47 |
| nicolairuckel | sean-k-mooney, I found a few more places with code that wasn't necessary anymore | 17:48 |
| nicolairuckel | I think it should be a lot cleaner now | 17:48 |
| sean-k-mooney | yep im +1 on your patch after skimming it | 17:48 |
| sean-k-mooney | ill upgrade to +2 after i have time to test it and read over it again | 17:48 |
| nicolairuckel | perfect :) | 17:48 |
| sean-k-mooney | gmaan: so it should be "compute.{CONF.host}" but if the queue is not there it normally is a networking or auth issue, i believe it can also happen if we exit because of, say, the missing compute node id file | 17:50 |
| sean-k-mooney | i.e. if the agent stops running and the queue gets deleted | 17:51 |
| gmaan | compute service is up, instances are booting fine so no issue on that side | 17:51 |
| sean-k-mooney | are you looking at the correct exchange? | 17:51 |
| gmaan | if you have devstack up and running, can you check if you see the compute queues? | 17:52 |
| gmaan | yes, nova, and i even checked all exchanges | 17:52 |
| sean-k-mooney | sure i have one | 17:52 |
| gmaan | before i boot a new VM and install devstack i just want to confirm if it's something on my machine | 17:52 |
| gmaan | thanks | 17:52 |
| sean-k-mooney | i have one on a host internally i can add you ssh key to it and you can ssh in and take a look if you want | 17:53 |
| gmaan | it's fine, if you can just check if there is any queue named 'compute*' or paste the output of sudo rabbitmqctl list_queues | 17:54 |
| sean-k-mooney | https://termbin.com/rzms | 17:55 |
| sean-k-mooney | im not seeing any | 17:56 |
| gmaan | yeah, same issue, no compute queue there | 17:56 |
| sean-k-mooney | ah | 17:57 |
| sean-k-mooney | so we use vhosts in devstack | 17:57 |
| gmaan | instances booting successfully means there's no issue in queue creation, but somehow it is not listed by the rabbitmqctl command | 17:57 |
| sean-k-mooney | and the computes are in nova_cell1 | 17:57 |
| gmaan | yeah vhosts we use | 17:58 |
| sean-k-mooney | https://termbin.com/01tb | 17:58 |
| sean-k-mooney | that's the output from "rabbitmqctl list_queues --vhost nova_cell1 | nc termbin.com 9999" | 17:58 |
| sean-k-mooney | compute.hibernal01 | 17:58 |
| sean-k-mooney | is the compute queue | 17:58 |
| gmaan | ah yeah, it is listed now. thanks sean-k-mooney ++ | 17:59 |
| sean-k-mooney | no worries. | 18:00 |
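A small illustrative helper (hypothetical, just wrapping the rabbitmqctl commands used above) that lists the compute queues per vhost, since in devstack the cell1 computes live under the nova_cell1 vhost rather than the default one:

```python
import subprocess


def compute_queues_by_vhost():
    vhosts = subprocess.check_output(
        ['sudo', 'rabbitmqctl', 'list_vhosts', '--quiet'], text=True).split()
    for vhost in vhosts:
        out = subprocess.check_output(
            ['sudo', 'rabbitmqctl', 'list_queues', '--vhost', vhost,
             '--quiet', 'name'], text=True)
        queues = [q for q in out.splitlines() if q.startswith('compute')]
        print(vhost, queues)


compute_queues_by_vhost()
```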
| -opendevstatus- NOTICE: The OpenDev team will be restarting Gerrit at approximately 2130 UTC in order to pick up the latest 3.10 bugfix release. | 20:32 | |
| *** haleyb is now known as haleyb|out | 22:56 | |