17:18:55 <imaximets> #startmeeting ovn_community_development_discussion
17:18:56 <openstack> Meeting started Thu Aug 27 17:18:55 2020 UTC and is due to finish in 60 minutes. The chair is imaximets. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:18:57 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:18:59 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
17:19:25 <imaximets> I do not have much to say. Who wants to go first?
17:19:56 <dceara> I can start
17:20:18 <imaximets> dceara, ok.
17:20:36 <dceara> thanks zhouhan for reviewing the conntrack bypass patch, I'll probably not have time for a v3 until next week though.
17:21:01 <dceara> except for that I wanted to schedule a run on our scale setup with zhouhan
17:21:09 <dceara> 's I-P patches
17:21:23 <dceara> hopefully I can manage to do that tomorrow.
17:21:26 <zhouhan> dceara: thanks dceara
17:21:53 <dceara> And I sent a couple of bug fix patches today. One of which should probably be backported all the way until 20.03.
17:22:08 <dceara> That's it on my side, thanks!
17:22:59 <imaximets> dceara, thanks!
17:23:06 <zhouhan> I can go next
17:23:12 <imaximets> zhouhan, sure.
17:23:37 <zhouhan> I sent the series of incremental processing for flow installation: #link https://patchwork.ozlabs.org/project/openvswitch/list/?series=197009
17:24:24 <zhouhan> The CPU cost reduced around 40% for a scale of 1200 HV with 12K ports
17:24:58 <zhouhan> It also solves a bug when conjunction combination is used.
17:25:16 <zhouhan> (that may need backport as well)
17:25:42 <zhouhan> I also did more scale test for 3k HV with 30k ports.
17:26:36 <dceara> zhouhan: regarding the conjunction bug (I didn't look at the patches yet), but would it be possible to move it earlier in the series to make backporting easier?
17:27:10 <zhouhan> It ran successfully. However, the current ovn-nbctl --wait=hv mechanism is not accurate for measuring the end to end latency, because the updates of nb_cfg from all HVs actually contribute the most cost.
17:28:07 <zhouhan> dceara: I think the earlier patches are required by the bug fix. (The bug fix is actually a big part of the series)
17:28:40 <dceara> zhouhan: ack, thanks, i'll try to have a closer look too.
17:29:40 <zhouhan> To measure the latency more accurately, I think I need to improve the nb_cfg mechanism, to include a timestamp field. I will work on it.
17:30:21 <zhouhan> But overall, by manually checking the latency, it seems a port binding can finish within 4 - 5 sec at that scale.
17:31:19 <zhouhan> In addition, I did some code reviews. imaximets: could you take a look at this one as well?
17:31:21 <zhouhan> https://patchwork.ozlabs.org/project/openvswitch/patch/20200813205259.5036-1-zhewang@nvidia.com/
17:31:42 <zhouhan> That's it from me
17:32:20 <dceara> zhouhan: in our tests we wait until the port can ping its gateway (or an external host) and we see it taking >10sec in some cases. I didn't try with your patches yet though.
17:33:07 <imaximets> zhouhan, yeah, I looked at this patch and I'm thinking if it's possible to fix the issues from the inside of idl/jsonrpc, without requirement for CMS to call special functions.
17:33:28 <zhouhan> dceara: do you ping from all the VMs? I guess that action itself may take a lot of overhead.
17:34:07 <dceara> zhouhan: only from the new fake vm (netns) until it is successful
17:34:21 <zhouhan> imaximets: that would be better, if it can be supported.
17:35:04 <zhouhan> dceara: but there will be 30k of them? And we need to make sure the slowest one can ping ...
17:36:06 <zhouhan> dceara: or do you just ping from a random VM and assume most of the VMs got the flow installed at similar latency?
17:36:11 <dceara> zhouhan: in our tests we don't advance to create the next port until the current one can ping its own gateway.
17:37:32 <dceara> zhouhan: We also don't batch port add operations. This is in order to try to see the worst case scenario latency for flow installation.
17:37:37 <zhouhan> dceara: Oh, I think that's a different scenario. I am testing when the whole scale is built up, then create and bind a new port, and see how long it takes for this new change to get processed in all the HVs (meaning the new port can reach all other ports)
17:38:31 <dceara> zhouhan: I see, ok, I can set try to set our scenario in a similar way too, thanks.
17:38:56 <dceara> s/can set try/can try/
17:39:05 <imaximets> zhouhan, I do not know yet how to make re-balancing of connections work from the inside of idl, I will likely reply to ML with some ideas a bit later, if any.
17:39:23 <zhouhan> dceara: to make sure *all* HVs have processed the new change, I am utilizing the --wait=hv feature. Now I realized that this mechanism itself was a bottleneck (even after solving the flooding problem).
17:40:05 <dceara> zhouhan: ack, we decided to go for the ping approach exactly to avoid --wait=hv
17:41:05 <zhouhan> dceara: So I am thinking about posting a timestamp from each ovn-controller while reporting the nb_cfg number it processed, so the nbctl can finally rely on the timestamp to calculate the time spent for the slowest HV
17:42:04 <zhouhan> imaximets: ok, thanks! But do you think that could be a follow-up improvement, independent of the command provided by that patch?
17:42:51 <zhouhan> (of course, with that improvement, the current patch provided won't be as useful any more)
17:43:23 <imaximets> zhouhan, in general, I'd like to avoid introduction of new commands if possible, especially if we can fix the issue in general.
17:44:45 <imaximets> zhouhan, how are these commands supposed to be used? Will CMS just re-distribute all the clients by itself, or will it nominate only part of them for re-connection?
17:45:00 <zhouhan> imaximets: agree in general. But I feel this command does provide some value for operational need.
17:45:48 <Ankur1> Hi
17:46:32 <zhouhan> imaximets: I think the typical case is when failover happened and the node recovered, the newly recovered node has no connections. So operator can use the command to instruct some of the HVs to connect back to the recovered node.
17:47:07 <Ankur1> hi
17:47:16 <Ankur171> ab
17:47:44 <zhouhan> imaximets: but it may also be useful if someone wants to adjust (fine tune) the load of different servers but moving clients from one server to another.
17:48:28 <imaximets> zhouhan, I see. Let me think a little bit. I will reply on ML, or just apply the patch if there will be no clever ideas from my side. :)
17:48:33 <zhouhan> s/but moving/by moving
17:48:46 <zhouhan> thanks imaximets
17:49:13 <dceara> zhouhan: I think there were some similar discussions on the ML at some point about adding a timestamp to the sync mechanism. We can probably continue the chassis.nb_cfg discussion there.
17:49:43 <zhouhan> dceara: sure
17:50:14 <zhouhan> dceara: maybe I will try a POC first
17:50:21 <dceara> zhouhan: cool
17:53:51 <imaximets> OK. Anyone else wants to share some updates?
17:55:33 <imaximets> So, I think, we could call it now.
17:56:31 <imaximets> #endmeeting
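
Note on the timestamp idea raised at 17:41:05: a minimal sketch of how the slowest-HV latency could be derived if each ovn-controller reported a timestamp together with the nb_cfg number it processed. This is not the OVN implementation. The `ChassisReport` type, its field names, and `slowest_hv_latency_ms` are hypothetical names made up for illustration, and the sketch assumes the clocks of all nodes are synchronized.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ChassisReport:
    """Hypothetical per-HV report: the nb_cfg it has processed and when."""
    chassis: str
    nb_cfg: int
    timestamp_ms: int  # assumed to come from a clock synchronized across nodes


def slowest_hv_latency_ms(
    start_ms: int,
    target_nb_cfg: int,
    reports: Iterable[ChassisReport],
) -> Optional[int]:
    """Return the time the slowest HV needed to reach target_nb_cfg.

    Returns None if there are no reports or if any HV has not caught up yet.
    If an HV already reports a newer nb_cfg, its timestamp belongs to that
    later change, so the result is only an upper bound in that case.
    """
    latest: Optional[int] = None
    for report in reports:
        if report.nb_cfg < target_nb_cfg:
            return None  # at least one HV is still behind
        if latest is None or report.timestamp_ms > latest:
            latest = report.timestamp_ms
    return None if latest is None else latest - start_ms


if __name__ == "__main__":
    sample = [
        ChassisReport("hv1", 5, 1200),
        ChassisReport("hv2", 5, 4500),
    ]
    print(slowest_hv_latency_ms(1000, 5, sample))
```

Because the latency is taken from the timestamp each HV records itself, the time spent collecting the nb_cfg updates from all HVs does not enter the measurement, which is the inaccuracy of --wait=hv described at 17:27:10.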