17:21:03 <numans> #startmeeting ovn-community-development-discussion
17:21:04 <openstack> Meeting started Thu Apr 23 17:21:03 2020 UTC and is due to finish in 60 minutes. The chair is numans. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:21:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:21:07 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
17:21:16 <numans> Hello
17:21:26 <numans> Let's start the meeting.
17:21:29 <zhouhan> hi
17:21:37 <numans> zhouhan, Hi
17:21:47 <numans> mmichelson is not joining today.
17:21:54 <numans> Who wants to start?
17:22:21 <numans> I can go real quick.
17:22:40 <dceara> Hi
17:22:42 <numans> Last week I submitted the v3 of the I-P patches.
17:22:45 <numans> dceara, Hi
17:23:10 <numans> Then I started working on the OVN load balancer hashing.
17:23:34 <numans> I submitted a patch to enable OpenFlow 1.5, since we cannot set the selection_method unless we use OpenFlow 1.5.
17:24:02 <numans> I'm still working on providing the option for the CMS to choose the hash fields - like ip_src, ip_dst, tp_src etc. -
17:24:14 <numans> which will be used for the hashing.
17:24:37 <numans> I'm working on another issue. Recently I submitted a patch to address an issue in the tcp_reset action.
17:24:55 <numans> And to support this, I added logical flows to bypass conntrack for TCP RST packets.
17:25:24 <numans> And this is causing the conntrack entries to stay in established state even after the client/server closes the connection by sending a TCP RST packet.
17:25:28 <numans> I'm working on it.
17:25:31 <numans> That
17:25:38 <numans> That's it for me.
17:25:53 <numans> zhouhan, dceara It will be great if you can take a look at the I-P patches whenever you can.
17:25:54 <flaviof> Nice, numans! 2 follow-up questions, if I may.
17:26:00 <numans> flaviof, sure.
17:26:07 <dceara> numans, ack
17:26:15 <_lore_> hi all
17:26:15 <zhouhan> numans: thanks a lot. I will review them
17:26:24 <numans> zhouhan, thanks.
17:26:26 <flaviof> 1) does the hashing for the lb require a specific version of the kernel? or is anything that can do OpenFlow 1.5 enough?
17:26:46 <numans> flaviof, No. It doesn't depend on the kernel version.
17:26:57 <numans> OVS supports 2 hashing methods - dp_hash and hash.
17:27:08 <numans> dp_hash is calculated by the datapath.
17:27:30 <numans> For the latter, OVS calculates the hash.
17:27:48 <flaviof> numans: ah. ack. I heard from Maciej about it today. Interesting find!
17:27:54 <flaviof> 2) can you tell me if the conntrack issue is a regression or has it always been there?
17:27:55 <zhouhan> numans: dp_hash should still ensure that the same 5-tuple is hashed to the same value, right?
17:28:09 <numans> zhouhan, that's not the case in my testing.
17:28:52 <numans> zhouhan, for a given 5-tuple, I see that OVS is not choosing the same bucket all the time.
17:29:04 <numans> flaviof, I don't think it's a regression.
17:29:32 <zhouhan> numans: that's strange. From what I observed in the past, it always seems to be the same bucket. Maybe I can do more tests and confirm.
17:29:48 <numans> zhouhan, that would be great if you could confirm
17:29:49 <flaviof> zhouhan: maybe you are using OF1.5 ?
17:30:07 <zhouhan> numans: usually I use ping (ICMP) to test. Does that impact the result?
17:30:08 <numans> flaviof, starting from OVS 2.10, the default selection method changed from hash to dp_hash in OVS.
17:30:16 <numans> zhouhan, I used ncat
17:30:29 <numans> and specified the source port.
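(For reference, a minimal sketch of the ncat test described above; the destination address and ports are illustrative, not the exact values used in the meeting:
    ncat --source-port 40000 172.16.0.100 80
With the source port fixed, every connection uses the same 5-tuple, so a stable hash should pick the same select-group bucket each time; the per-bucket counters can be checked with "ovs-ofctl -O OpenFlow15 dump-group-stats br-int".)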
17:31:09 <numans> flaviof, OVN was always selecting dp_hash, but since we are not using OpenFlow 1.5, that never reached OVS, and it uses the default if no selection_method is set.
17:31:33 <flaviof> ack, understood. ty
17:31:34 <zhouhan> numans: do you know by any chance what the dp_hash algorithm is? In the code comments it mentioned the Webster method, but I never heard of it.
17:31:36 <numans> zhouhan, I'd suggest using ncat with the src port specified in your testing
17:31:46 <numans> zhouhan, it is calculated by the kernel
17:32:00 <numans> zhouhan, and from what I understand it uses skb_get_hash() for that
17:32:11 * numans barely knows the datapath and can be wrong.
17:32:45 <zhouhan> ok, thanks!
17:32:49 <imaximets> numans, zhouhan: IIRC, dp_hash is likely an RSS hash if available.
17:33:21 <numans> zhouhan, from the code I saw, it uses Webster to allocate the hashes to buckets.
17:33:22 <numans> imaximets, ok
17:33:37 <imaximets> there might be a difference while making the upcall, because vswitchd will calculate the 5-tuple hash by itself if the RSS hash is not present.
17:34:10 <zhouhan> numans: any links to "Webster" would be helpful. Google just gives me the dictionary :(
17:34:43 <numans> zhouhan, :). Even I'm not aware of it. I just saw the comments. But sure, I'll share if I come across anything.
17:35:26 <zhouhan> imaximets: the RSS hash should still make sure the same flow is hashed to the same bucket, right? So I still don't understand how the same flow could end up in a different bucket.
17:35:57 <imaximets> zhouhan, yes, RSS should ensure that.
17:36:43 <numans> imaximets, zhouhan, I'm pretty certain on that. I added some prints, and the value of dp_hash was always different.
17:36:55 <imaximets> the issue might be if you're calculating dp_hash in userspace for the first packet and using RSS in the datapath for subsequent ones. That is the only case I can think of.
17:37:38 <numans> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-xlate.c#L4604
17:37:50 <numans> I put a print here and the dp_hash was different.
17:37:51 <numans> ok.
17:38:05 <imaximets> numans, if it's always different, this might be a bug in the kernel. Maybe you're reading incorrect memory, or your hashing algorithm uses more fields than the 5-tuple.
17:38:29 <numans> imaximets, OVN doesn't set the selection_method so the default is used
17:38:57 <numans> imaximets, actually OVN sets it, but it never gets encoded when OVN sends the group_mod message
17:39:11 <numans> I think we can continue further on the ML.
17:39:27 <imaximets> numans, ok.
17:39:34 <numans> zhouhan, I've replied to Maciej's email. If you can take a look at that and reply after your testing, that would be great.
17:39:42 <numans> imaximets, zhouhan thanks for the discussion.
17:39:50 <numans> I'm done. If someone wants to go next.
17:40:10 <zhouhan> numans: sure, will do
17:40:17 <_lore_> can I go next?
17:40:22 <numans> _lore_, sure.
17:40:25 <flaviof> #link https://github.com/openvswitch/ovs/commit/2e3fd24c7c440f87d7a24fbfce1474237de7e1cf pick_dp_hash_select_group ref on hash
17:40:36 <_lore_> this week I worked on an issue related to OVN QoS metering
17:41:18 <_lore_> in particular, if you create 2 meters with the same values for rate and burst on the same hv, they will be mapped to the same kernel meter and they will share the bandwidth
17:41:35 <_lore_> I am wondering if it is an intended behaviour or not
17:41:39 <_lore_> any idea?
17:42:54 <_lore_> maybe not :)
17:43:10 <flaviof> sorry _lore_ i don't know
17:43:15 <numans> _lore_, not sure on that.
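(For reference, a minimal sketch of the scenario _lore_ describes; the meter names, rate, and burst values are illustrative:
    ovn-nbctl meter-add meter-a drop 1000 kbps 1000
    ovn-nbctl meter-add meter-b drop 1000 kbps 1000
Since both meters carry identical rate/burst values, per the behaviour described above they end up backed by a single kernel meter on a given hypervisor and share its bandwidth; the installed OpenFlow meters can be inspected with "ovs-ofctl -O OpenFlow13 dump-meters br-int".)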
17:43:29 <_lore_> actually I posted an RFC patch to make the meters unique, I tested it and it works fine
17:43:37 <numans> _lore_, but I think we can fix that if it is causing incorrect meter allocation
17:43:55 <_lore_> it has been tested even by the OSP folks and they are fine with it
17:44:08 <_lore_> so I would send it as a normal patch
17:44:15 <numans> _lore_, sounds good to me.
17:44:19 <_lore_> and in case we have some regression, take care of it then
17:44:32 <_lore_> ack
17:44:48 <_lore_> moreover I posted some trivial IPv6 PD fixes
17:45:20 <_lore_> zhouhan: I read your reply about my ovn-scale-test PR but I did not get it exactly
17:45:28 <_lore_> what do you mean?
17:46:03 <zhouhan> _lore_: Oh, I mean, the port groups and ACLs are better configured than hard-coded in the implementation
17:46:37 <_lore_> ah ok
17:46:39 <zhouhan> _lore_: for ovn-scale-test, it should be able to test different port-group and ACL settings without updating the code every time.
17:47:10 <_lore_> so you mean to generalize it, adding the possibility to read the configuration from a JSON file
17:47:25 <_lore_> instead of hard-coding the ACLs and so on
17:47:30 <_lore_> right?
17:48:01 <zhouhan> _lore_: I also posted an issue in ovn-k8s to discuss the reason why multiple default groups are used instead of one. I think in ovn-scale-test we can test the scalability difference.
17:48:08 <zhouhan> yes
17:48:15 <zhouhan> that's right
17:48:49 <zhouhan> Then we can change the JSON file to test different scenarios and compare the results
17:49:18 <_lore_> ack, I will look at how to generalize it
17:49:30 <_lore_> but the concept is ok for you, right?
17:49:50 <zhouhan> _lore_: sorry, what concept?
17:50:33 <_lore_> I mean to implement OpenShift network policy
17:51:15 <zhouhan> _lore_: yes, of course
17:51:33 <_lore_> ack fine
17:51:44 <_lore_> that's all from my side
17:51:53 <numans> Ok. Thanks.
17:51:56 <numans> Who wants to go next?
17:52:11 <zhouhan> I can go quickly
17:52:23 <numans> sure
17:53:22 <zhouhan> We observed the same problem dceara is fixing - the ovsdb missing updates - and spent a lot of time debugging, until I recalled what dceara had reported. So, thanks!
17:53:40 <zhouhan> I am reviewing dceara's v3 patch
17:53:47 <dceara> zhouhan, no worries :) Does the patch fix the issue for you?
17:54:27 <dceara> zhouhan, there's a v4 (addressing Ilya's comments): https://patchwork.ozlabs.org/project/openvswitch/list/?series=172109
17:54:29 <zhouhan> dceara: it is in production, so no chance to apply the patch. We just worked around it by restarting all impacted ovn-controllers
17:54:38 <dceara> zhouhan, ack
17:56:16 <zhouhan> dceara: however, I am concerned even with a fix, because in our case 1/3 of the HVs had the issue, which may be due to a problem in one of the 3 nodes in the raft cluster. If all of the HVs start to do the clear and resync at the same time, it may cause very high load on the server.
17:56:30 <zhouhan> I will think more about it.
17:57:05 <zhouhan> Other than this, I was simply following up on some of the discussions and reviews.
17:57:06 <dceara> zhouhan, you mean 1/3 of the nodes had the issue at the same time?
17:57:55 <zhouhan> 1/3 of the nodes were missing the same flows and showing the same warning logs at the same time.
17:58:40 <dceara> zhouhan, oh, I see, we only saw it occasionally
17:59:58 <zhouhan> It could be that, when there is a change to the SB, e.g. creating a new DP, it causes all HVs to change the condition, and at the same time one of the SB servers was disconnected, thus causing some of the flow updates in the following SB transaction to go missing.
18:00:43 <imaximets> dceara, zhouhan: one possibility that comes to mind is that we could store the 'last_id' of the previous successful cond_change and use it instead of 0 on re-connection if there was an in-flight cond_change.
18:01:10 <zhouhan> imaximets: yeah, great idea.
18:01:15 <dceara> imaximets, that might work, yes
18:01:28 <zhouhan> well, I am done with my update
18:01:46 <flaviof> may I go next?
18:01:46 <dceara> imaximets, do you mind replying to the ML with the suggestion? I'll try it out tomorrow
18:01:58 <imaximets> dceara, ok.
18:02:03 <dceara> imaximets, thanks
18:02:22 <numans> flaviof, sure
18:02:37 <flaviof> numans: thanks. I spent some time taking a closer look at Ankur's port-range changes in OVN. I was a little
18:02:37 <flaviof> confused about the usage of the word external, thinking it was meant for the external IP, but I
18:02:52 <flaviof> was wrong. It really means the source port range visible to the 'external' side of the connection.
18:02:52 <flaviof> And that may be the external_ip or the logical_ip, depending on the NAT type (dnat vs snat).
18:03:09 <flaviof> More details on that are in the ML:
18:03:09 <flaviof> #link https://mail.openvswitch.org/pipermail/ovs-discuss/2020-April/049959.html questions on port-range
18:03:09 <flaviof> Anyway, I was able to test it some and verified that the NAT rules are indeed populated properly
18:03:27 <flaviof> all the way into conntrack. Tested with ncat(s) ;)
18:03:35 <numans> cool
18:04:02 <flaviof> Also, I tried out Numan's fix to make OVN test 85 pass consistently. No
18:04:02 <flaviof> surprise it worked great. ;)
18:04:15 <flaviof> In the process, I saw OVN test 78 failing and decided to look into it.
18:04:15 <flaviof> Then, dceara told me that test 76 needed love too, and I was able to
18:04:15 <flaviof> reproduce the issue and make it better.
18:04:26 <flaviof> With these changes I get it to pass every time on my system,
18:04:27 <flaviof> but would love to have folks here find out if this is not just me. ;)
18:04:48 <flaviof> This is basically the command I use to test the changes:
18:04:57 <flaviof> CNT=0 ; while [ $? -eq 0 ]; do sleep 3 ; echo $(date +"%D %T %Z") -- cnt is $CNT ; \
18:04:57 <flaviof> CNT=$((${CNT}+1)) ; make check TESTSUITEFLAGS="76 78 85" ; done
18:05:14 <flaviof> #link https://patchwork.ozlabs.org/project/openvswitch/patch/20200417145737.1769111-1-numans@ovn.org/ Fix for OVN test 85
18:05:14 <flaviof> #link https://patchwork.ozlabs.org/project/openvswitch/patch/20200417212501.23757-1-flavio@flaviof.com/ Fix for OVN test 78
18:05:14 <flaviof> #link https://patchwork.ozlabs.org/project/openvswitch/patch/20200423123731.29123-1-flavio@flaviof.com/ Fix for OVN test 76
18:05:24 <flaviof> that is it fro me
18:05:28 <flaviof> *from
18:05:30 <numans> flaviof, Thanks. I'll take a look tomorrow. I applied one of your patches and tested the wrong case.
18:05:43 <flaviof> lol. all good.
18:05:45 <numans> applied as in locally :)
18:05:59 <flaviof> timing things are tricky
18:05:59 <numans> Who wants to go next?
18:06:27 <numans> flaviof, yeah. And there are a few tests which fail 100% of the time on one of the servers I have access to.
18:06:34 <numans> I need to dig further.
18:06:40 <flaviof> awesome!
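(For reference, a rough sketch of the port-range setup flaviof describes, using the generic ovn-nbctl database commands; the router name, addresses, and range are illustrative, and it assumes the range is stored in the NB NAT table's external_port_range column:
    ovn-nbctl lr-nat-add lr0 snat 172.16.0.10 10.0.0.0/24
    nat_uuid=$(ovn-nbctl --bare --columns=_uuid find nat external_ip=172.16.0.10)
    ovn-nbctl set nat "$nat_uuid" external_port_range="1024-65535"
After that, the conntrack entries created for translated connections should show source ports picked from that range.)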
18:06:52 <numans> Anyway, if someone wants to go next...
18:08:09 <dceara> I'd just like to bring up these OVN 2.12 backports: https://patchwork.ozlabs.org/project/openvswitch/list/?series=168987 It would be nice to have them backported as we need them downstream.
18:08:25 <dceara> #link https://patchwork.ozlabs.org/project/openvswitch/list/?series=168987
18:08:39 <dceara> flaviof, that's the way to share links, right? :)
18:08:44 <dceara> that's it on my side
18:08:58 <numans> dceara, yes.
18:09:05 <flaviof> dceara: you got it. whatever comment you want to store, put it after the url
18:09:18 <dceara> flaviof, ah, now I see, ok, next time I'll know
18:11:03 <numans> Who wants to go next?
18:11:49 <aginwala> nothing much from my side, but I spent time with Han on debugging the flow-miss issue. Since we have interconnection enabled now between two AZs, random pod/VM connectivity failures were reported by customers when reaching workloads to/from az1/az2. Hence, the fix by dceara helped us recall it, and the audit results were surprising, with 1/3 of the HVs missing the
18:11:50 <aginwala> updates. Following the ML for fixes by you guys.
18:13:25 <flaviof> aginwala: cool news on the interconnection you guys are doing! Great to hear from you
18:14:28 <aginwala> yo!
18:15:43 <numans> cool.
18:16:14 <numans> I think it's time to end the meeting.
18:16:20 <zhouhan> numans: In case you didn't know yet, I confirmed that we also had the disconnection between RAFT nodes due to the probe timeout you reported before, when the servers are overloaded, even though the election timer had not expired yet. And NVIDIA folks also reported that and sent a patch to disable the probe #link https://patchwork.ozlabs.org/project/openvswitch/patch/20200331002104.26230-1-zhewang@nvidia.com/
18:16:20 * flaviof cues up Space Bazooka :) http://dig.ccmixter.org/files/Kirkoid/43005
18:16:48 <numans> zhouhan, I saw that patch.
18:16:50 <numans> Ok.
18:17:11 <numans> zhouhan, thanks for the update.
18:17:18 <zhouhan> np
18:17:26 <numans> I guess we can end the meeting?
18:18:02 <numans> Ok. Bye everyone.
18:18:08 <flaviof> bye all
18:18:13 <numans> #endmeeting