17:18:55 <imaximets> #startmeeting ovn_community_development_discussion
17:18:56 <openstack> Meeting started Thu Aug 27 17:18:55 2020 UTC and is due to finish in 60 minutes.  The chair is imaximets. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:18:57 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:18:59 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
17:19:25 <imaximets> I do not have much to say.  Who wants to go first?
17:19:56 <dceara> I can start
17:20:18 <imaximets> dceara, ok.
17:20:36 <dceara> thanks zhouhan for reviewing the conntrack bypass patch, I'll probably not have time for a v3 until next week though.
17:21:01 <dceara> except for that I wanted to schedule a run on our scale setup with zhouhan's I-P patches
17:21:23 <dceara> hopefully I can manage to do that tomorrow.
17:21:26 <zhouhan> thanks, dceara
17:21:53 <dceara> And I sent a couple of bug fix patches today, one of which should probably be backported all the way back to 20.03.
17:22:08 <dceara> That's it on my side, thanks!
17:22:59 <imaximets> dceara, thanks!
17:23:06 <zhouhan> I can go next
17:23:12 <imaximets> zhouhan, sure.
17:23:37 <zhouhan> I sent the series of incremental processing for flow installation: #link https://patchwork.ozlabs.org/project/openvswitch/list/?series=197009
17:24:24 <zhouhan> The CPU cost is reduced by around 40% at a scale of 1200 HVs with 12K ports
17:24:58 <zhouhan> It also solves a bug when conjunction combination is used.
17:25:16 <zhouhan> (that may need back port as well)
17:25:42 <zhouhan> I also did more scale testing for 3k HVs with 30k ports.
17:26:36 <dceara> zhouhan: regarding the conjunction bug (I didn't look at the patches yet), would it be possible to move it earlier in the series to make backporting easier?
17:27:10 <zhouhan> It ran successfully. However, the current ovn-nbctl --wait=hv mechanism is not accurate for measuring the end-to-end latency, because the updates of nb_cfg from all HVs actually contribute most of the cost.
17:28:07 <zhouhan> dceara: I think the earlier patches are required by the bug fix. (The bug fix is actually a big part of the series)
17:28:40 <dceara> zhouhan: ack, thanks, i'll try to have a closer look too.
17:29:40 <zhouhan> To measure the latency more accurately, I think I need to improve the nb_cfg mechanism to include a timestamp field. I will work on it.
17:30:21 <zhouhan> But overall, by manually checking the latency, it seems a port binding can finish within 4-5 seconds at that scale.
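(For reference, the --wait=hv measurement discussed here boils down to timing a synchronous round trip through the NB DB and every hypervisor. A minimal sketch in Python, assuming ovn-nbctl is on PATH and configured to reach the NB DB:)

    import subprocess
    import time

    # "--wait=hv" makes ovn-nbctl block until every hypervisor reports
    # having caught up with the current nb_cfg sequence number; "sync"
    # makes no change of its own, so this times a bare round trip.
    start = time.monotonic()
    subprocess.run(["ovn-nbctl", "--wait=hv", "sync"], check=True)
    print("end-to-end sync latency: %.2fs" % (time.monotonic() - start))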
17:31:19 <zhouhan> In addition, I did some code reviews. imaximets: could you take a look at this one as well?
17:31:21 <zhouhan> https://patchwork.ozlabs.org/project/openvswitch/patch/20200813205259.5036-1-zhewang@nvidia.com/
17:31:42 <zhouhan> That's it from me
17:32:20 <dceara> zhouhan: in our tests we wait until the port can ping its gateway (or an external host) and we see it taking >10sec in some cases. I didn't try with your patches yet though.
17:33:07 <imaximets> zhouhan, yeah, I looked at this patch and I'm wondering whether it's possible to fix the issues from inside the idl/jsonrpc layer, without requiring the CMS to call special functions.
17:33:28 <zhouhan> dceara: do you ping from all the VMs? I guess that action itself may incur a lot of overhead.
17:34:07 <dceara> zhouhan: only from the new fake vm (netns) until it is successful
17:34:21 <zhouhan> imaximets: that would be better, if it can be supported.
17:35:04 <zhouhan> dceara: but there will be 30k of them? And we need to make sure the slowest one can ping ...
17:36:06 <zhouhan> dceara: or do you just ping from a random VM and assume most of the VMs got the flow installed at similar latency?
17:36:11 <dceara> zhouhan: in our tests we don't advance to create the next port until the current one can ping its own gateway.
17:37:32 <dceara> zhouhan: We also don't batch port add operations, in order to see the worst-case latency for flow installation.
17:37:37 <zhouhan> dceara: Oh, I think that's a different scenario. I am testing the case where the whole scale is already built up, then I create and bind a new port and see how long it takes for this new change to be processed on all the HVs (meaning the new port can reach all other ports)
17:38:31 <dceara> zhouhan: I see, ok, I can try to set our scenario up in a similar way too, thanks.
17:39:05 <imaximets> zhouhan, I do not know yet how to make re-balancing of connections work from inside the idl; I will likely reply on the ML with some ideas a bit later, if I have any.
17:39:23 <zhouhan> dceara: to make sure *all* HVs have processed the new change, I am utilizing the --wait=hv feature. Now I realize that this mechanism itself was a bottleneck (even after solving the flooding problem).
17:40:05 <dceara> zhouhan: ack, we decided to go for the ping approach exactly to avoid --wait=hv
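(For context, the ping-based check dceara describes amounts to a retry loop like the sketch below; the namespace name and gateway address are made-up placeholders:)

    import subprocess
    import time

    NETNS = "fake-vm-0"    # hypothetical fake-VM network namespace
    GATEWAY = "10.0.0.1"   # hypothetical gateway address

    # Retry a single ping from inside the namespace until it succeeds;
    # the elapsed time approximates the flow-installation latency for
    # the newly bound port.
    start = time.monotonic()
    while subprocess.run(["ip", "netns", "exec", NETNS,
                          "ping", "-c", "1", "-W", "1", GATEWAY],
                         stdout=subprocess.DEVNULL).returncode != 0:
        time.sleep(0.1)
    print("port reachable after %.2fs" % (time.monotonic() - start))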
17:41:05 <zhouhan> dceara: So I am thinking about posting a timestamp from each ovn-controller while reporting the nb_cfg number it processed, so that nbctl can rely on the timestamps to calculate the time spent by the slowest HV
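(A sketch of the computation zhouhan proposes, assuming each chassis reported a timestamp alongside its nb_cfg; no such field exists at this point, so the data shapes below are placeholders:)

    # Given the nb_cfg value being waited on, the timestamp of the
    # triggering change, and hypothetical per-chassis reports of
    # (nb_cfg, report_timestamp), latency is set by the slowest HV.
    def slowest_hv_latency(target_nb_cfg, trigger_ts, chassis_reports):
        done = [ts for cfg, ts in chassis_reports.values()
                if cfg >= target_nb_cfg]
        if len(done) < len(chassis_reports):
            return None  # some HV has not caught up yet
        return max(done) - trigger_ts

    reports = {"hv-1": (42, 100.8), "hv-2": (42, 104.6)}  # made-up data
    print(slowest_hv_latency(42, 100.0, reports))  # -> 4.6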
17:42:04 <zhouhan> imaximets: ok, thanks! But do you think that could be a follow-up improvement, independent of the command provided by that patch?
17:42:51 <zhouhan> (of course, with that improvement, the current patch provided won't be as useful any more)
17:43:23 <imaximets> zhouhan, in general, I'd like to avoid introducing new commands if possible, especially if we can fix the issue in a general way.
17:44:45 <imaximets> zhouhan, how are these commands supposed to be used?  Will the CMS just re-distribute all the clients by itself, or will it nominate only part of them for re-connection?
17:45:00 <zhouhan> imaximets: agree in general. But I feel this command does provide some value for operational needs.
17:46:32 <zhouhan> imaximets: I think the typical case is when a failover happened and the node recovered; the newly recovered node has no connections, so the operator can use the command to instruct some of the HVs to connect back to the recovered node.
17:47:44 <zhouhan> imaximets: but it may also be useful if someone wants to adjust (fine-tune) the load of different servers by moving clients from one server to another.
17:48:28 <imaximets> zhouhan, I see. Let me think a little bit.  I will reply on ML, or just apply the patch if there will be no clever ideas from my side. :)
17:48:46 <zhouhan> thanks imaximets
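(The two designs weighed above differ mainly in who triggers the reconnect. With the Python OVS bindings, a CMS-driven nudge could look roughly like this sketch; the schema path and remote address are assumptions:)

    import ovs.db.idl

    # Typical install location of the OVN southbound schema.
    SCHEMA = "/usr/share/ovn/ovn-sb.ovsschema"

    helper = ovs.db.idl.SchemaHelper(SCHEMA)
    helper.register_table("Chassis")

    # 6642 is the conventional OVN southbound DB port.
    idl = ovs.db.idl.Idl("tcp:127.0.0.1:6642", helper)

    # A command like the one under review would let an operator ask a
    # client to drop its session and reconnect (e.g. to land on a newly
    # recovered cluster member); imaximets' alternative is for the
    # idl/jsonrpc layer to decide this internally instead.
    idl.force_reconnect()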
17:49:13 <dceara> zhouhan: I think there were some similar discussions on the ML at some point about adding a timestamp to the sync mechanism. We can probably continue the chassis.nb_cfg discussion there.
17:49:43 <zhouhan> dceara: sure
17:50:14 <zhouhan> dceara: maybe I will try a POC first
17:50:21 <dceara> zhouhan: cool
17:53:51 <imaximets> OK.  Does anyone else want to share some updates?
17:55:33 <imaximets> So, I think we can call it a day now.
17:56:31 <imaximets> #endmeeting