17:14:56 <mmichelson> #startmeeting ovn-community-development-discussion
17:14:57 <openstack> Meeting started Thu May 14 17:14:56 2020 UTC and is due to finish in 60 minutes. The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:14:58 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:15:00 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
17:15:34 <mmichelson> Just as a reminder, we're coming up on hard freeze. Thanks to everyone for mentioning which patches they want included in 20.06
17:15:46 <mmichelson> I'm planning to spend time today and tomorrow to help with the review effort on those patches
17:16:02 <mmichelson> I plan to create the 20.06 branch on Monday
17:16:59 <mmichelson> As far as my activity goes, After doing the GROUP_MOD message split patch, I took on some of the smaller issues that have been reported as of late. Thanks to everyone who has helped review those patches
17:17:08 <mmichelson> And that's all from me.
17:17:17 <blp> I have a quick update.
17:17:41 <blp> I rebased the OVN DDlog code against master earlier this week. All the tests pass.
17:17:49 <blp> ryzhyk has been working on performance.
17:18:22 <blp> That's all I have for the moment, unless there are questions.
17:18:39 <ryzhyk> Yes, I am making progress on performance, but we need scale tests.
17:18:48 <mmichelson> blp, ryzhyk awesome!
17:19:26 <ryzhyk> So far I've been using Han's old scale test log, but it is getting increasingly irrelevant with all the changes to OVN since it was created.
17:19:41 <ryzhyk> (that's it from me)
17:20:18 <numans> ryzhyk, If you want explore this one - https://github.com/dceara/ovn-heater
17:20:18 <numans> for the scale testing.
17:20:52 <ryzhyk> numans: thanks!
17:21:02 <numans> But we are planning to run a scale test with the ddlog changes. We need to figure out a way to compile ovn-northd-ddlog in the container images.
Once that is done, it should be straightforward.
17:21:39 <numans> I can go real quick.
17:21:44 <blp> Yeah, we'll do some preliminary work and then we can figure out how to do that.
17:22:13 <numans> Ok. sounds good.
17:22:37 <numans> I did some reviews this week. And I spent much of the time refactoring/reworking on patch 1 and 2 of my I-P patch series
17:22:46 <numans> Thanks to dceara for the reviews.
17:23:02 <numans> I'll continue to do that and planning to submit v6 by tomorrow.
17:23:13 <numans> I appreciate more reviewers joining in :).
17:23:19 <numans> That's it from me.
17:24:14 <dceara> hi all
17:25:09 <dceara> numans, ryzhyk I can hack ovn-heater to compile ovn-northd-ddlog for the scale test container images. I just need to know what branches to use
17:25:33 <zhouhan> numans: sorry that I didn't get time to review your I-P patches last week. I will resume this week.
17:25:51 <numans> zhouhan, thanks.
17:26:04 <blp> dceara: We're in a little bit of a transition at the moment, we'll get back to you on that.
17:26:11 <dceara> blp, sure
17:27:38 <zhouhan> can someone pin the link to the meeting logs?
17:28:26 <mmichelson> zhouhan, I don't understand what you mean by "pin"
17:28:35 <flaviof> zhouhan: http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2020/
17:28:46 <numans> flaviof, you forgot to use the #link :)
17:28:51 <flaviof> LOL
17:29:10 <zhouhan> mmichelson: I meant, pin in this IRC channel, like the "FAQ: http://docs.openvswitch.org/en/latest/faq/"
17:29:23 <flaviof> there is a handy link to that dir in ovn.org too
17:29:24 <zhouhan> thanks flaviof
17:29:29 <mmichelson> ah ok
17:30:14 <zhouhan> May I go next?
17:30:33 <mmichelson> #topic Open vSwitch, a Linux Foundation Collaborative Project || FAQ: http://docs.openvswitch.org/en/latest/faq/ || Hyper-V meeting Tues 10:00 Pacific || OVN meeting Thurs 10:15 am US Pacific || Use ovs-discuss@openvswitch.org for questions if you don't get an answer here.
|| OVN weekly meeting logs can be found at: http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/
17:30:50 <mmichelson> oh crap it's /topic isn't it
17:31:00 <mmichelson> And I don't have permission
17:31:02 <mmichelson> zhouhan, go ahead
17:31:06 <zhouhan> Firstly I have a question regarding the OVS FAQ on the compatibility
17:31:23 <blp> I can change the topic.
17:31:29 <zhouhan> 2.11.x 3.10 to 4.18
17:31:29 <zhouhan> 2.12.x 3.10 to 5.0
17:31:29 <zhouhan> 2.14.x 3.10 to 5.5
17:31:41 <zhouhan> It didn't mention 2.13, why is that?
17:31:56 <zhouhan> blp: do you know?
17:32:10 <flaviof> blp++
17:32:25 <blp> zhouhan: Probably just overlooked.
17:33:10 <zhouhan> We tried 2.13 compiling with 5.4, it has a message: configure: error: Linux kernel in /lib/modules/5.4.0-31.generic.x86_64/build is version 5.4.0, but version newer than 5.0.x is not supported (please refer to the FAQ for advice)
17:33:27 <zhouhan> So it seems not an overlook, but on purpose ...
17:33:58 <zhouhan> It's confusing though
17:34:15 <zhouhan> I continued debugging the problem of: deferred action limit reached, drop recirc action
17:34:49 <blp> zhouhan: It looks like 2.13.x supports the same versions as 2.12.x.
17:35:06 <zhouhan> ok, thanks blp
17:35:45 <zhouhan> I think I made some progress on the endless recirc problem. The issue was that there is slowpath required for the actions while there is also a group action which requires dp_hash + recirc
17:36:21 <blp> OK, I sent a patch to update the FAQ.
17:36:39 <blp> zhouhan: Oh that's a little awkward. Do you have a lead on a fix?
17:36:43 <zhouhan> Whenever this combination comes, it tries to execute dp_hash in userspace, and only do recirc in kernel
17:37:45 <zhouhan> However, the userspace hash generated is not carried for injecting the packet to datapath, so after recirc back, the upcall doesn't have dp_hash value, again.
17:38:13 <blp> Oh. Have you figured out why we don't pass the dp_hash back?
17:38:14 <zhouhan> So when it hits the same group action, it generates recirc and dp_hash actions again
17:38:17 <blp> Is that the easy fix?
17:38:59 <zhouhan> At this time, the recirc_id generated is the same as the older one because all the metadata in state is the same, which caused the loop
17:39:19 <zhouhan> blp: I haven't got time on the fix yet.
17:39:28 <zhouhan> blp: I think there are two options
17:39:30 <blp> Userspace does know how to put dp_hash into a flow, see odp_flow_key_from_flow__().
17:39:49 <zhouhan> thanks for the pointer!
17:40:06 <blp> It will do so if it detects recirc support in the datapath. Perhaps it's not being detected properly?
17:40:24 <blp> I think we log whether that feature is detected as supported. Check the log, if that's the problem then we should fix the detection logic.
17:40:24 <zhouhan> I think this is one option. The other option is always do dp_hash in datapath instead of "trying to help" in userspace
17:40:49 <blp> I think there is some reason why we do that, although I don't recall what it was.
17:41:32 <zhouhan> I wonder if dp_hash anyway requires recirc in datapath, why would it help by doing dp_hash in userspace
17:41:54 * numans says bye and disappears.
17:41:57 <blp> What's the reason that it gets slow-pathed to begin with?
17:41:58 <zhouhan> The other question is why is there slowpath required in the first place
17:42:02 * blp waves at numans
17:42:10 <zhouhan> blp: exactly
17:42:22 <blp> "trace" should explain the reason for slow-pathing.
17:43:06 <zhouhan> blp: the scenario is ping LRP's IP. The LR is replying ICMP by simply setting fields, and I can't tell why is slowpath needed
17:43:35 <blp> Some fields aren't supported for setting in the datapath, that could be the reason.
17:44:02 <blp> But ofproto/trace should say. For example, if it's because of fields that the datapath can't set, it should say something about unsupported actions.
17:44:15 <imaximets> blp, zhouhan: execute.hash that passed back to datapath could only be set from the upcall->hash, i.e. the hash that received from the datapath during upcall. Userspace never passes dp_hash that calculated by userspace to datapath, it only returns same hash that was calculated by datapath itself before upcall.
17:44:32 <zhouhan> blp: ofproto/trace only tells slowpath needed, but didn't tell which action requires that.
17:45:17 <blp> OK, it should be possible to figure it out. Do you have the trace handy?
17:45:23 <zhouhan> imaximets: is it possible to pass it back (i.e. is it a small fix?)
17:45:58 <zhouhan> Final flow: icmp,reg11=0xe,reg12=0x18,reg14=0xa,reg15=0x11,tun_id=0xa0011ff0002,tun_src=10.172.66.12,tun_dst=10.78.211.43,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_erspan_ver=0,tun_flags=csum|key,metadata=0xff0002,in_port=457,vlan_tci=0x0000,dl_src=aa:aa:bb:00:01:02,dl_dst=aa:aa:bb:00:03:01,nw_src=10.227.183.232,nw_dst=10.9.0.1,nw_tos=0,nw_ecn=0,nw_ttl=53,icmp_type=8,icmp_code=0
17:45:58 <zhouhan> Megaflow: recirc_id=0,eth,icmp,tun_id=0xa0011ff0002,tun_src=10.172.66.12,tun_dst=10.78.211.43,tun_tos=0,tun_flags=-df+csum+key,in_port=457,vlan_tci=0x0000/0x1000,dl_src=aa:aa:bb:00:01:02,dl_dst=aa:aa:bb:00:03:01,nw_src=10.227.183.232,nw_dst=10.9.0.1,nw_ttl=53,nw_frag=no,icmp_type=0x8/0xff,icmp_code=0x0/0xff
17:45:58 <zhouhan> Datapath actions: ct_clear,ct_clear,ct_clear,set(eth(src=aa:aa:aa:00:01:01,dst=aa:aa:aa:00:00:01)),set(ipv4(src=10.9.0.1,dst=10.227.183.232,ttl=254)),set(icmp(type=0,code=0)),hash(l4(0)),recirc(0x3)
17:46:00 <zhouhan> This flow is handled by the userspace slow path because it:
17:46:02 <zhouhan> - Uses action(s) not supported by datapath.
17:46:08 <blp> It's not a good idea to pass a userspace-calculated hash back to the kernel because the kernel would calculate a different value.
17:46:11 <zhouhan> blp: This is the last part of trace
17:46:26 <imaximets> zhouhan, I'm not sure.
It might be possible to check if we have no upcall->hash, but have dp_hash and pass dp_hash to execute.hash instead, but I'm not sure.
17:46:59 <zhouhan> blp: imaximets: Then do you think it is a better idea to always do dp_hash in datapath?
17:47:31 <imaximets> zhouhan, I think, yes, it's better than 'datapath hash' is calculated by datapath.
17:47:47 <zhouhan> (sorry this might have taken too long, in case someone else want to update)
17:48:07 <blp> It's the set actions for ICMP that are doing it. The datapath doesn't know how to change ICMP.
17:48:13 <imaximets> s/than/when/
17:49:02 <zhouhan> blp: is it the ICMP fields setting require slowpath?
17:49:08 <blp> zhouhan: yes
17:49:14 <zhouhan> blp: ok, thanks!
17:49:44 <zhouhan> blp: it would be better if trace can just point this out :)
17:49:56 <blp> zhouhan: yes
17:50:22 <zhouhan> other than this, I was involved in some discussions and also trying to fix some bugs in ovn.
17:51:25 <zhouhan> One of the discussion was about ARP flows exploding in LRs. I think I can work out the configurably disable static ARP resolve in LR, which would solve the issue for ovn-k8s.
17:51:38 <zhouhan> That's my update :)
17:52:00 <_lore_> zhouhan: regarding gateway flow issue, IIRC this chunk was to distribute non DVR traffic
17:52:43 <zhouhan> _lore_: I got your point, but why was different priority needed?
17:53:13 <_lore_> I need to get back to it since I can't recall the details now
17:53:26 <zhouhan> _lore_: ok, thanks!
17:53:44 <zhouhan> _lore_: It seems an optimization, right?
17:53:49 <_lore_> nope
17:54:14 <_lore_> let's say you have FIP 192.168.1.1
17:54:33 <_lore_> this is to distribute traffic for 192.168.1.0/24 IIRC
17:54:47 <_lore_> but I will check
17:54:49 <zhouhan> _lore_: ok, I was thinking we could revert it, if it is an optimization and if we couldn't solve the route priority problem before the release
17:55:11 <_lore_> what is the issue you are facing?
17:55:47 <zhouhan> _lore_: If you see my example, the /16 route is overriding the /24 route, which is wrong
17:56:00 <_lore_> ack
17:56:19 <_lore_> I will look into it
17:56:26 <zhouhan> thanks _lore_
17:56:33 <_lore_> yw :)
17:57:55 <_lore_> zhouhan: do you have a unit-test for it?
17:59:18 <zhouhan> _lore_: here is what I did in a sandbox:
17:59:23 <zhouhan> 989 ovn-nbctl lr-add lr1
17:59:23 <zhouhan> 990 ovn-nbctl lrp-add lr1 lrp1 aa:aa:aa:aa:aa:01 192.168.0.1/24
17:59:23 <zhouhan> 991 ovn-nbctl lrp-add lr1 lrp2 aa:aa:aa:aa:aa:02 192.168.100.1/24
17:59:25 <zhouhan> 992 ovn-nbctl lr-route-add lr1 10.0.0.0/24 192.168.0.2
17:59:27 <zhouhan> 993 ovn-nbctl lr-route-add lr1 10.0.0.0/16 192.168.100.2
17:59:29 <zhouhan> 994 ovn-sbctl lflow-list
17:59:31 <zhouhan> 995 ovn-nbctl --help | grep gateway
17:59:33 <zhouhan> 996 ovn-sbctl show
17:59:35 <zhouhan> 997 ovn-nbctl lrp-set-gateway-chassis lrp1 chassis-1
17:59:37 <zhouhan> 998 ovn-sbctl lflow-list
17:59:46 <_lore_> ok, thx
18:00:46 <_lore_> I think we should add some unit tests for it
18:02:13 <blp> _lore_: please!
18:02:40 <_lore_> will do :)
18:05:06 <mmichelson> Anyone else want to take a turn at the mic?
18:06:02 <flaviof> bye all!
18:09:29 <blp> bye!
18:10:03 <dceara> bye!
18:11:20 <mmichelson> #endmeeting
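
Editor's note on the route priority issue raised by zhouhan: the expected behavior is longest prefix match, where the /24 route should win over the /16 route for any destination that both cover. The sketch below only illustrates that expectation using the two static routes from the sandbox commands. It is not how ovn-northd computes logical flow priorities, and the function name `longest_prefix_match` is a hypothetical helper chosen for this example.

```python
import ipaddress


def longest_prefix_match(routes, dst):
    """Return (prefix, nexthop) for the most specific route covering dst.

    routes is a list of (prefix, nexthop) string pairs. Returns None when
    no route covers the destination. Hypothetical helper, not OVN code.
    """
    addr = ipaddress.ip_address(dst)
    matching = []
    for prefix, nexthop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matching.append((net, nexthop))
    if not matching:
        return None
    # The route with the largest prefix length is the most specific one.
    net, nexthop = max(matching, key=lambda item: item[0].prefixlen)
    return str(net), nexthop


# The two static routes added with lr-route-add in the sandbox session.
routes = [
    ("10.0.0.0/24", "192.168.0.2"),
    ("10.0.0.0/16", "192.168.100.2"),
]

print(longest_prefix_match(routes, "10.0.0.5"))  # covered by both routes
print(longest_prefix_match(routes, "10.0.1.5"))  # covered only by the /16
```

A unit test for the reported bug would presumably assert the first case: traffic to 10.0.0.5 should use next hop 192.168.0.2 even after lrp1 is given a gateway chassis.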