17:21:03 <numans> #startmeeting ovn-community-development-discussion
17:21:04 <openstack> Meeting started Thu Apr 23 17:21:03 2020 UTC and is due to finish in 60 minutes. The chair is numans. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:21:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:21:07 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
17:21:16 <numans> Hello
17:21:26 <numans> Let's start the meeting.
17:21:29 <zhouhan> hi
17:21:37 <numans> zhouhan, Hi
17:21:47 <numans> mmichelson is not joining today.
17:21:54 <numans> Who wants to start?
17:22:21 <numans> I can go real quick.
17:22:40 <dceara> Hi
17:22:42 <numans> Last week I submitted the v3 of the I-P patches.
17:22:45 <numans> dceara, Hi
17:23:10 <numans> Then I started working on the OVN load balancer hashing.
17:23:34 <numans> I submitted a patch to enable OpenFlow 1.5, since we cannot set the selection_method unless we use OpenFlow 1.5.
17:24:02 <numans> I'm still working on providing the option for the CMS to choose the hash fields - like ip_src, ip_dst, tp_src etc. -
17:24:14 <numans> which will be used for the hashing.
17:24:37 <numans> I'm working on another issue. Recently I submitted a patch to address an issue in the tcp_reset action.
17:24:55 <numans> And to support this, I added logical flows to bypass conntrack for TCP RST packets.
17:25:24 <numans> And this is causing the conntrack entries to stay in established state even after the client/server closes the connection by sending a TCP RST packet.
17:25:28 <numans> I'm working on it.
17:25:31 <numans> That
17:25:38 <numans> That's it for me.
17:25:53 <numans> zhouhan, dceara It will be great if you can take a look at the I-P patches whenever you can.
17:25:54 <flaviof> Nice, numans! 2 follow-up questions, if I may.
17:26:00 <numans> flaviof, sure.
17:26:07 <dceara> numans, ack
17:26:15 <_lore_> hi all
17:26:15 <zhouhan> numans: thanks a lot. I will review them
17:26:24 <numans> zhouhan, thanks.
17:26:26 <flaviof> 1) does the hashing for the lb require a specific version of the kernel? or is anything that can do OpenFlow 1.5 enough?
17:26:46 <numans> flaviof, No. It doesn't depend on the kernel version.
17:26:57 <numans> OVS supports 2 hashing methods - dp_hash and hash.
17:27:08 <numans> dp_hash is calculated by the datapath.
17:27:30 <numans> For the latter, OVS calculates the hash.
17:27:48 <flaviof> numans: ah. ack. I heard from Maciej about it today. Interesting find!
17:27:54 <flaviof> 2) can you tell me if the conntrack issue is a regression or has it always been there?
17:27:55 <zhouhan> numans: dp_hash should still ensure that the same 5-tuple is hashed to the same value, right?
17:28:09 <numans> zhouhan, that's not the case in my testing.
17:28:52 <numans> zhouhan, for a given 5-tuple, I see that OVS is not choosing the same bucket all the time.
17:29:04 <numans> flaviof, I don't think it's a regression.
17:29:32 <zhouhan> numans: that's strange. From what I observed in the past, it always seems to be the same bucket. Maybe I can do more tests and confirm.
17:29:48 <numans> zhouhan, that would be great if you could confirm
17:29:49 <flaviof> zhouhan: maybe you are using OF1.5 ?
17:30:07 <zhouhan> numans: usually I use ping (ICMP) to test. Does that impact the result?
17:30:08 <numans> flaviof, starting from OVS 2.10, the default selection method changed from hash to dp_hash in OVS.
17:30:16 <numans> zhouhan, I used ncat
17:30:29 <numans> and specified the source port.
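(For reference, a minimal sketch of the ncat test described above; the destination address and ports are illustrative, not the exact values used in the meeting:
    ncat --source-port 40000 172.16.0.100 80
With the source port fixed, every connection uses the same 5-tuple, so a stable hash should pick the same select-group bucket each time; the per-bucket counters can be checked with "ovs-ofctl -O OpenFlow15 dump-group-stats br-int".)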
17:31:09 <numans> flaviof, OVN was always selecting dp_hash, but since we are not using OpenFlow 1.5, that never reached OVS, and it uses the default if no selection_method is set.
17:31:33 <flaviof> ack, understood. ty
17:31:34 <zhouhan> numans: do you know by any chance what the dp_hash algorithm is? In the code comments it mentioned the Webster method, but I never heard of it.
17:31:36 <numans> zhouhan, I'd suggest using ncat with the src port specified in your testing
17:31:46 <numans> zhouhan, it is calculated by the kernel
17:32:00 <numans> zhouhan, and from what I understand it uses skb_get_hash() for that
17:32:11 * numans barely knows the datapath and can be wrong.
17:32:45 <zhouhan> ok, thanks!
17:32:49 <imaximets> numans, zhouhan: IIRC, dp_hash is likely an RSS hash if available.
17:33:21 <numans> zhouhan, from the code I saw, it uses Webster to allocate the hashes to buckets.
17:33:22 <numans> imaximets, ok
17:33:37 <imaximets> there might be a difference while making the upcall, because vswitchd will calculate the 5-tuple hash by itself if the RSS hash is not present.
17:34:10 <zhouhan> numans: any links to "Webster" would be helpful. Google just gives me the dictionary :(
17:34:43 <numans> zhouhan, :). Even I'm not aware of it. I just saw the comments. But sure, I'll share if I come across anything.
17:35:26 <zhouhan> imaximets: the RSS hash should still make sure the same flow is hashed to the same bucket, right? So I still don't understand how the same flow could end up in a different bucket.
17:35:57 <imaximets> zhouhan, yes, RSS should ensure that.
17:36:43 <numans> imaximets, zhouhan, I'm pretty certain on that. I added some prints, and the value of dp_hash was always different.
17:36:55 <imaximets> the issue might be if you're calculating dp_hash in userspace for the first packet and using RSS in the datapath for subsequent ones. That is the only case I can think of.
17:37:38 <numans> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-xlate.c#L4604
17:37:50 <numans> I put a print here and the dp_hash was different.
17:37:51 <numans> ok.
17:38:05 <imaximets> numans, if it's always different, this might be a bug in the kernel. Maybe you're reading incorrect memory, or your hashing algorithm uses more fields than the 5-tuple.
17:38:29 <numans> imaximets, OVN doesn't set the selection_method so the default is used
17:38:57 <numans> imaximets, actually OVN sets it, but it never gets encoded when OVN sends the group_mod message
17:39:11 <numans> I think we can continue further on the ML.
17:39:27 <imaximets> numans, ok.
17:39:34 <numans> zhouhan, I've replied to Maciej's email. If you can take a look at that and reply after your testing, that would be great.
17:39:42 <numans> imaximets, zhouhan thanks for the discussion.
17:39:50 <numans> I'm done. If someone wants to go next.
17:40:10 <zhouhan> numans: sure, will do
17:40:17 <_lore_> can I go next?
17:40:22 <numans> _lore_, sure.
17:40:25 <flaviof> #link https://github.com/openvswitch/ovs/commit/2e3fd24c7c440f87d7a24fbfce1474237de7e1cf pick_dp_hash_select_group ref on hash
17:40:36 <_lore_> this week I worked on an issue related to OVN QoS metering
17:41:18 <_lore_> in particular, if you create 2 meters with the same values for rate and burst on the same hv, they will be mapped to the same kernel meter and they will share the bandwidth
17:41:35 <_lore_> I am wondering if it is an intended behaviour or not
17:41:39 <_lore_> any idea?
17:42:54 <_lore_> maybe not :)
17:43:10 <flaviof> sorry _lore_ i don't know
17:43:15 <numans> _lore_, not sure on that.
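(For reference, a minimal sketch of the scenario _lore_ describes; the meter names, rate, and burst values are illustrative:
    ovn-nbctl meter-add meter-a drop 1000 kbps 1000
    ovn-nbctl meter-add meter-b drop 1000 kbps 1000
Since both meters carry identical rate/burst values, per the behaviour described above they end up backed by a single kernel meter on a given hypervisor and share its bandwidth; the installed OpenFlow meters can be inspected with "ovs-ofctl -O OpenFlow13 dump-meters br-int".)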
17:43:29 <_lore_> actually I posted an RFC patch to make the meters unique, I tested it and it works fine
17:43:37 <numans> _lore_, but I think we can fix that if it is causing incorrect meter allocation
17:43:55 <_lore_> it has been tested even by the OSP folks and they are fine with it
17:44:08 <_lore_> so I would send it as a normal patch
17:44:15 <numans> _lore_, sounds good to me.
17:44:19 <_lore_> and in case we have some regression, take care of it then
17:44:32 <_lore_> ack
17:44:48 <_lore_> moreover I posted some trivial IPv6 PD fixes
17:45:20 <_lore_> zhouhan: I read your reply about my ovn-scale-test PR but I did not get it exactly
17:45:28 <_lore_> what do you mean?
17:46:03 <zhouhan> _lore_: Oh, I mean, the port groups and ACLs are better configured than hard-coded in the implementation
17:46:37 <_lore_> ah ok
17:46:39 <zhouhan> _lore_: for ovn-scale-test, it should be able to test different port-group and ACL settings without updating the code every time.
17:47:10 <_lore_> so you mean to generalize it, adding the possibility to read the configuration from a JSON file
17:47:25 <_lore_> instead of hard-coding the ACLs and so on
17:47:30 <_lore_> right?
17:48:01 <zhouhan> _lore_: I also posted an issue in ovn-k8s to discuss the reason why multiple default groups are used instead of one. I think in ovn-scale-test we can test the scalability difference.
17:48:08 <zhouhan> yes
17:48:15 <zhouhan> that's right
17:48:49 <zhouhan> Then we can change the JSON file to test different scenarios and compare the results
17:49:18 <_lore_> ack, I will look at how to generalize it
17:49:30 <_lore_> but the concept is ok for you, right?
17:49:50 <zhouhan> _lore_: sorry, what concept?
17:50:33 <_lore_> I mean to implement OpenShift network policy
17:51:15 <zhouhan> _lore_: yes, of course
17:51:33 <_lore_> ack fine
17:51:44 <_lore_> that's all from my side
17:51:53 <numans> Ok. Thanks.
17:51:56 <numans> Who wants to go next?
17:52:11 <zhouhan> I can go quickly
17:52:23 <numans> sure
17:53:22 <zhouhan> We observed the same problem dceara is fixing - the ovsdb missing updates - and spent a lot of time debugging, until I recalled what dceara had reported. So, thanks!
17:53:40 <zhouhan> I am reviewing dceara's v3 patch
17:53:47 <dceara> zhouhan, no worries :) Does the patch fix the issue for you?
17:54:27 <dceara> zhouhan, there's a v4 (addressing Ilya's comments): https://patchwork.ozlabs.org/project/openvswitch/list/?series=172109
17:54:29 <zhouhan> dceara: it is in production, so no chance to apply the patch. We just worked around it by restarting all impacted ovn-controllers
17:54:38 <dceara> zhouhan, ack
17:56:16 <zhouhan> dceara: however, I am concerned even with a fix, because in our case 1/3 of the HVs had the issue, which may be due to a problem in one of the 3 nodes in the raft cluster. If all of the HVs start to do the clear and resync at the same time, it may cause very high load on the server.
17:56:30 <zhouhan> I will think more about it.
17:57:05 <zhouhan> Other than this, I was simply following up on some of the discussions and reviews.
17:57:06 <dceara> zhouhan, you mean 1/3 of the nodes had the issue at the same time?
17:57:55 <zhouhan> 1/3 of the nodes were missing the same flows and showing the same warning logs at the same time.
17:58:40 <dceara> zhouhan, oh, I see, we only saw it occasionally
17:59:58 <zhouhan> It could be that, when there is a change to the SB, e.g. creating a new DP, it causes all HVs to change the condition, and at the same time one of the SB servers was disconnected, thus causing some of the flow updates in the following SB transaction to go missing.
18:00:43 <imaximets> dceara, zhouhan: one possibility that comes to mind is that we could store the 'last_id' of the previous successful cond_change and use it instead of 0 on re-connection if there was an in-flight cond_change.
18:01:10 <zhouhan> imaximets: yeah, great idea.
18:01:15 <dceara> imaximets, that might work, yes
18:01:28 <zhouhan> well, I am done with my update
18:01:46 <flaviof> may I go next?
18:01:46 <dceara> imaximets, do you mind replying to the ML with the suggestion? I'll try it out tomorrow
18:01:58 <imaximets> dceara, ok.
18:02:03 <dceara> imaximets, thanks
18:02:22 <numans> flaviof, sure
18:02:37 <flaviof> numans: thanks. I spent some time taking a closer look at Ankur's port-range changes in OVN. I was a little
18:02:37 <flaviof> confused about the usage of the word external, thinking it was meant for the external IP, but I
18:02:52 <flaviof> was wrong. It really means the source port range visible to the 'external' side of the connection.
18:02:52 <flaviof> And that may be the external_ip or the logical_ip, depending on the NAT type (dnat vs snat).
18:03:09 <flaviof> More details on that are in the ML:
18:03:09 <flaviof> #link https://mail.openvswitch.org/pipermail/ovs-discuss/2020-April/049959.html questions on port-range
18:03:09 <flaviof> Anyway, I was able to test it some and verified that the NAT rules are indeed populated properly
18:03:27 <flaviof> all the way into conntrack. Tested with ncat(s) ;)
18:03:35 <numans> cool
18:04:02 <flaviof> Also, I tried out Numan's fix to make OVN test 85 pass consistently. No
18:04:02 <flaviof> surprise it worked great. ;)
18:04:15 <flaviof> In the process, I saw OVN test 78 failing and decided to look into it.
18:04:15 <flaviof> Then, dceara told me that test 76 needed love too, and I was able to
18:04:15 <flaviof> reproduce the issue and make it better.
18:04:26 <flaviof> With these changes I get it to pass every time on my system,
18:04:27 <flaviof> but would love to have folks here find out if this is not just me. ;)
18:04:48 <flaviof> This is basically the command I use to test the changes:
18:04:57 <flaviof> CNT=0 ; while [ $? -eq 0 ]; do sleep 3 ; echo $(date +"%D %T %Z") -- cnt is $CNT ; \
18:04:57 <flaviof> CNT=$((${CNT}+1)) ; make check TESTSUITEFLAGS="76 78 85" ; done
18:05:14 <flaviof> #link https://patchwork.ozlabs.org/project/openvswitch/patch/20200417145737.1769111-1-numans@ovn.org/ Fix for OVN test 85
18:05:14 <flaviof> #link https://patchwork.ozlabs.org/project/openvswitch/patch/20200417212501.23757-1-flavio@flaviof.com/ Fix for OVN test 78
18:05:14 <flaviof> #link https://patchwork.ozlabs.org/project/openvswitch/patch/20200423123731.29123-1-flavio@flaviof.com/ Fix for OVN test 76
18:05:24 <flaviof> that is it fro me
18:05:28 <flaviof> *from
18:05:30 <numans> flaviof, Thanks. I'll take a look tomorrow. I applied one of your patches and tested the wrong case.
18:05:43 <flaviof> lol. all good.
18:05:45 <numans> applied as in locally :)
18:05:59 <flaviof> timing things are tricky
18:05:59 <numans> Who wants to go next?
18:06:27 <numans> flaviof, yeah. And there are a few tests which fail 100% of the time on one of the servers I have access to.
18:06:34 <numans> I need to dig further.
18:06:40 <flaviof> awesome!
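(For reference, a rough sketch of the port-range setup flaviof describes, using the generic ovn-nbctl database commands; the router name, addresses, and range are illustrative, and it assumes the range is stored in the NB NAT table's external_port_range column:
    ovn-nbctl lr-nat-add lr0 snat 172.16.0.10 10.0.0.0/24
    nat_uuid=$(ovn-nbctl --bare --columns=_uuid find nat external_ip=172.16.0.10)
    ovn-nbctl set nat "$nat_uuid" external_port_range="1024-65535"
After that, the conntrack entries created for translated connections should show source ports picked from that range.)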
18:06:52 <numans> Anyway, if someone wants to go next...
18:08:09 <dceara> I'd just like to bring up these OVN 2.12 backports: https://patchwork.ozlabs.org/project/openvswitch/list/?series=168987 It would be nice to have them backported as we need them downstream.
18:08:25 <dceara> #link https://patchwork.ozlabs.org/project/openvswitch/list/?series=168987
18:08:39 <dceara> flaviof, that's the way to share links, right? :)
18:08:44 <dceara> that's it on my side
18:08:58 <numans> dceara, yes.
18:09:05 <flaviof> dceara: you got it. whatever comment you want to store, put it after the url
18:09:18 <dceara> flaviof, ah, now I see, ok, next time I'll know
18:11:03 <numans> Who wants to go next?
18:11:49 <aginwala> nothing much from my side, but I spent time with Han on debugging the flow-miss issue. Since we have interconnection enabled now between two AZs, random pod/VM connectivity failures were reported by customers when reaching workloads to/from az1/az2. Hence, the fix by dceara helped us recall it, and the audit results were surprising, with 1/3 of the HVs missing the
18:11:50 <aginwala> updates. Following the ML for fixes by you guys.
18:13:25 <flaviof> aginwala: cool news on the interconnection you guys are doing! Great to hear from you
18:14:28 <aginwala> yo!
18:15:43 <numans> cool.
18:16:14 <numans> I think it's time to end the meeting.
18:16:20 <zhouhan> numans: In case you didn't know yet, I confirmed that we also had the disconnection between RAFT nodes due to the probe timeout you reported before, when the servers are overloaded, even though the election timer had not expired yet. And NVIDIA folks also reported that and sent a patch to disable the probe #link https://patchwork.ozlabs.org/project/openvswitch/patch/20200331002104.26230-1-zhewang@nvidia.com/
18:16:20 * flaviof cues up Space Bazooka :) http://dig.ccmixter.org/files/Kirkoid/43005
18:16:48 <numans> zhouhan, I saw that patch.
18:16:50 <numans> Ok.
18:17:11 <numans> zhouhan, thanks for the update.
18:17:18 <zhouhan> np
18:17:26 <numans> I guess we can end the meeting?
18:18:02 <numans> Ok. Bye everyone.
18:18:08 <flaviof> bye all
18:18:13 <numans> #endmeeting