17:16:16 <mmichelson> #startmeeting ovn-community-development-discussion
17:16:16 <openstack> Meeting started Thu May 28 17:16:16 2020 UTC and is due to finish in 60 minutes.  The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:16:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:16:19 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
17:16:21 <mmichelson> How's everyone?
17:16:30 <zhouhan> Hi
17:16:31 <numans> Good
17:16:33 <numans> Hi
17:16:40 <numans> panda|off, Hi
17:16:50 <panda|off> numans: hello.
17:17:35 <flaviof> hi all
17:17:37 <mmichelson> Cool. I got an ack for branch creation patches, so I'm planning to get the branch created. Once we have an ACK on numans's I-P patch series then we can backport it to the branch. From there, we can take some time to test/bug fix and release 20.06.1 in early June
17:17:44 <_lore_> hi all
17:17:59 <mmichelson> s/20.06.1/20.06.0/
17:18:14 <numans> sounds good
17:18:16 <mmichelson> What are people's opinions about making a 20.03.1 release?
17:18:27 <mmichelson> There are some important bug fixes in there.
17:18:31 <zhouhan> sounds good to me
17:19:15 <numans> I agree with that. There are bug fixes which went into 20.03 branch
17:19:17 <dceara> ++ for 20.03.1
17:19:20 <numans> *many*
17:19:24 <mmichelson> OK, cool.
17:19:43 <mmichelson> Aside from release-related stuff, I don't have much to talk about so that's it for me.
17:19:52 <numans> I can go real quick
17:20:10 <numans> I've been working on addressing the I-P patches.
17:20:20 <numans> I submitted v9 a while back.
17:20:29 <numans> zhouhan and dceara FYI
17:20:40 <numans> And thanks a lot for showing patience with me during the reviews.
17:21:00 <numans> I also did some reviews a bit today.
17:21:01 <zhouhan> numans: sure, will take a look asap
17:21:04 <dceara> numans, ack, sorry, I'm quite slow in reviewing them, I only focused on the first patches of the previous versions of the series
17:21:05 <numans> thanks.
17:21:16 <numans> dceara, no worries.
17:21:25 <numans> That's it from me.
17:21:38 <numans> Unless there are any questions.
17:21:44 <zhouhan> numans: it is really big change and great improvement. thanks numans!
17:21:59 <numans> Thanks.
17:22:14 <numans> zhouhan, Got any chance to test these patches with your scale setup ?
17:22:30 <dceara> numans, zhouhan I had a question about the tracked data but I can share it on the ML if that's preferable
17:22:30 <numans> If so, I hope all went fine. If not, no worries :)
17:22:56 <zhouhan> numans: no, we are still working on setting up a new scale env since the old one was decommed
17:23:04 <numans> Ok.
17:23:31 <numans> dceara, I'm fine anything. If you want we can discuss here
17:24:04 <numans> or if it would consume more time, we can take it in ML too.
17:24:06 <dceara> numans, i was just wondering if we really need a new field in the I-P engine nodes for "changed data". Isn't it enough to use the existing "data" field
17:24:30 <dceara> numans, but we can discuss the implications on the ML, it's probably better :)
17:24:36 <numans> dceara, ack.
17:24:51 <numans> or at the end when everyone is done.
17:24:58 <dceara> numans, ack
17:25:15 <numans> That's it from me. If some one wants to go next.
17:25:18 <panda|off> numans: I'm new, so I would be curious to know how did you sart proposing this big topic. If there's a change overview document somewhere. Sorry for the trivial question, you don't need to answer me now.
17:26:01 <numans> panda|off, You mean if we have like feature proposal thing similar to what openstack does ?
17:26:07 <numans> before working on a feature ?
17:26:12 <panda|off> numans: yes
17:26:43 <numans> We usually do everything in the mailing list. And if someone wants to work on a feature/refactoring, he/she can chose to discuss before
17:27:00 <zhouhan> I can go next
17:27:10 <numans> or submit the patches as RFC to discuss further in ML
17:27:12 <numans> zhouhan, sure.
17:27:42 * zhouhan sorry my screen was frozen and didn't see discussion going on. I can wait ...
17:27:44 <panda|off> numans: thanks
17:28:09 <numans> panda|off, let me know if any further questions.
17:28:19 <numans> If not, zhouhan all yours.
17:28:23 <flaviof> panda|off: ml info: https://mail.openvswitch.org/mailman/listinfo
17:29:14 <zhouhan> ok, I was mostly on code reviews and discussions.
17:29:14 <panda|off> numans: flaviof ok, thanks, I subcribed, but only recently so proably I missed the conversation. No other questions for now
17:30:01 <zhouhan> I am also waiting for review of the OVS patch: #link https://patchwork.ozlabs.org/project/openvswitch/patch/1589527067-91901-1-git-send-email-hzhou@ovn.org/
17:30:26 <zhouhan> Hope blp or Ilya could take a look there.
17:30:43 <zhouhan> That's all
17:31:13 <imaximets> zhouhan, I'll take a look, but most likely on next week.
17:31:33 <zhouhan> imaximets: thanks a lot. Glad you are here :)
17:32:17 <dceara> I can go next if that's ok
17:32:28 <zhouhan> dceara: if we have time, we can discuss the flow explosion problem later
17:32:39 <dceara> zhouhan, ah, exactly what I wanted to mention :)
17:33:28 <dceara> I ported your RFC v2 patch downstream for dalvarez to test in their scaled OSP setup. To see how big of an impact it is on openstack
17:34:17 <zhouhan> dceara: that patch alone doesn't solve the problem because the ARP requests are broadcasted and learned by other routers
17:34:48 <dceara> zhouhan, ack, the question was more regarding the main culprit: i.e., high number of logical flows and/or mac_bindings
17:35:34 <dceara> except for the flow explosion problem, I did a few reviews this week and sent a new version of the monitor_cond_change series (thanks zhouhan for acks!)
17:35:40 <dceara> that's it on my side
17:37:14 <flaviof> dceara: do you see any value/possibility in scaling better if the mac binding table was a per chassis thing?
17:37:22 <zhouhan> dceara: well, move lflow to mac-bindings should have some improvement, but it is still O(N^2) problem. But I think that RFC patch together with the proposed learn_from_arp_request=false approach should solve the problem completely.
17:38:42 <numans> flaviof, the idea of the mac_Bidning table is so that other ovn-controllers can learn and program.
17:38:48 <dceara> zhouhan, I agree. Did you already start looking at how we could implement learn_from_arp_request?
17:39:18 <zhouhan> dceara: yes, I am working on it
17:39:24 <dceara> zhouhan, nice!
17:39:52 <flaviof> numans: ack. but onlt the gw chassis that own the fip is the one that needs it? I may be missing something here, so my apologies if I'm somking something with this idea.
17:39:58 <flaviof> *only
17:40:01 <numans> I'm way behind on this issue due to PTO :)
17:40:17 <numans> flaviof, if we have per chassis, we can as well just remove mac_binding table and maintain the learnt mac bindings in-memory.
17:40:26 <numans> flaviof, thats fine :)
17:41:30 <dceara> zhouhan, back to flow explosion, to close the loop, for the self originated ARP packets (i.e., src.mac == router-mac) we keep the same behavior: flood in the L2 domain, right?
17:42:31 <zhouhan> flaviof: the mac-binding is equivalent to the dynamic ARP cache table in a physical router. Since OVN routers are distributed (except the gateway routers), it should be in SB DB so that all chassis with a leg in the LR can use it. I am not sure how to make it per-chassis.
17:43:22 <zhouhan> dceara: yes I think we can keep it unchanged, unless we really see a scale issue in the dataplane for the ARP messages.
17:44:12 <dceara> zhouhan, sounds good to me
17:44:58 <flaviof> zhouhan: ack. I was thinking along the lines of a sb db table that only gets populated on the chassis that need it, instead of all the gateway chassis.
17:45:30 <flaviof> so, thinking specifically about the gw routers
17:47:16 <dceara> flaviof, I think that could work too, but has the disadvantage that all/a lot of chassis might need to learn the same ARP in specific scenarios.
17:48:03 <numans> dceara, which could be ok, since its a onetime thing.
17:48:11 <numans> s/a/an
17:48:19 <flaviof> ack. thanks for entertaining the idea. I have lots to learn before talking more nonsense
17:48:20 <zhouhan> flaviof: in this case, for the gw routers, all the gateway chassis need (they thought) 99% of the entries (if there are 100 GRs). The proposal of learn_from_arp_request=false is to let the admin tell the GRs that they don't need them.
17:49:47 <dceara> numans, until we add mac_binding timeout support. If that will ever be needed.
17:49:54 <flaviof> zhouhan: ah, ack. I think we are on the same page then. just about how we could be smart about distributing the table so not all the entries need to be in a single gw somehow.
17:50:30 <flaviof> dceara: ack
17:51:04 <numans> dceara, at the end we can discuss about the tracking data if you're fine :)
17:51:17 <dceara> numans, sure, sounds good to me.
17:51:26 <_lore_> can I go next? very fast
17:52:07 <_lore_> this week I mainly this week I finalized the work for DVR ARP issue upstream and downstream
17:52:46 <_lore_> :s/this week I mainly//
17:53:03 <_lore_> thx to zhouhan, dceara and numans for the support
17:53:21 <zhouhan> _lore_: thank you
17:53:31 <_lore_> now I am switching to finalize the work to run perf on ovn-scale-test
17:53:38 <dceara> _lore_, np :)
17:54:04 <_lore_> I need to add the capability to stop it a given iteration otherwise the output file will be huge
17:54:21 <_lore_> that's all from my side
17:55:01 * zhouhan behind the reviews on ovn-scale-test
17:55:11 <_lore_> zhouhan: I have not sent it yet :)
17:55:57 <mmichelson> Does anyone else have anything to share?
17:56:21 <imaximets> I could
17:56:55 <imaximets> I worked with the ovsdb memory consumption issue for a last couple of weeks.
17:57:35 <imaximets> It turned out to be (at least partially) due to huge jsnonrpc backlogs on raft connections
17:57:53 <imaximets> 2 patches prepared.  Thanks zhouhan for reviews!
17:58:03 <zhouhan> imaximets: np :)
17:58:06 <imaximets> Both applied today on master.
17:58:33 <imaximets> And there was also one small memory leak while reading db.  Applied and backported too.
17:59:22 <imaximets> I also applied today the patch from zhouhan that backports hash passing via upcall.
17:59:44 <imaximets> for the out-of-tree kernel module
17:59:55 <imaximets> That's it.
18:00:03 <zhouhan> thanks a lot imaximets
18:00:16 <imaximets> zhouhan, np.
18:01:04 <imaximets> And I have one small fix for the "timeout" filed in "wait" command
18:01:24 <imaximets> https://patchwork.ozlabs.org/project/openvswitch/patch/20200525170702.761482-1-i.maximets@ovn.org/
18:01:53 <imaximets> whould be nice to have some reviews.  It blocks Cirrus CI on FreeBSD.  clang build failure.
18:02:03 <imaximets> Now that's it. :)
18:02:39 <numans> imaximets, I saw the same issue with my fedora 32 and clang
18:02:48 <numans> imaximets, I'll take a look tomorrow.
18:02:57 <imaximets> numans, thanks!
18:03:32 <numans> dceara, you want to ? if no is going next ?
18:04:34 <dceara> numans, we can also discuss after the meeting officially ends. I'm not sure what the best way is. I'm fine either way.
18:04:37 <numans> dceara, regarding the tracked data, can you plz take a  look at the patch 5 of the series and see if it can be done without the tracked data ?
18:04:48 <zhouhan> imaximets: looks like a simple fix, but since numans encountered the same issue I will let him try and ack :)
18:05:17 <numans> zhouhan, although its a while I've looked into the ovsdb code :).
18:05:28 <numans> I'll give test it out tomorrow
18:05:53 <dceara> numans, I'm not suggesting to not track data. I'm just asking, by looking at patch 4, what is the difference (from an I-P engine node perspective) between "data" and "tracked_data"?
18:06:09 <numans> dceara, tracked data gets cleared after every run
18:06:13 <dceara> numans, seems to me like the only difference is that tracked data needs to be cleared
18:06:29 <numans> dceara, I tried to model just like how we have IDL track for loops.
18:06:33 <dceara> numans, so then isn't it easier to just add a clear_data handler that can does that?
18:06:41 <dceara> s/can does/can do
18:07:16 <numans> dceara, we could probably do, but do you see any issues ? design concerns with this approach ?
18:07:41 <numans> I think this is clear. It also has track clear handler.
18:07:54 <numans> But I'm definitely biased :)
18:08:22 <dceara> numans, the only issue I had with it was that looking at the APIs we expose for tracked_data they do exactly the same thing as the ones we expose for "data". I was wondering if that would be confusing for people trying to understand the I-P engine.
18:08:50 <mmichelson> dceara, to offer my perspective, yes :)
18:08:54 <dceara> numans, but I don't have a really strong opinion about it.
18:09:05 <mmichelson> I had to look real close to understand the meaning of "tracked_data"
18:09:17 <numans> mmichelson, Ok.
18:09:29 <mmichelson> And it turns out it just is cleared on every run. Not sure why that's called "tracked"
18:10:13 <numans> mmichelson, I thought of the name from this macro for example - SBREC_PORT_BINDING_TABLE_FOR_EACH_TRACKED
18:10:20 <dceara> mmichelson, numans, it's quite clear when you look at how it's used in follow up patches but just looking at the inc-proc-eng.[ch] I was a bit confused
18:10:41 <numans> mmichelson, if you see, the idl code, it gets cleared after every idl_run
18:10:57 <zhouhan> I kind of agree with dceara. The cleanup in every engine iteration is useful, and we can abstract it just as cleanup. We can document it and point out it is particularly useful for cleanup tracked changes when necessary.
18:11:22 <numans> dceara, may be it needs more comments ?
18:11:24 <numans> Ok.
18:11:50 <numans> dceara, zhouhan mmichelson I'm fine removing the tracked_data, which means we store the same in the engine data ?
18:11:53 <dceara> numans, I'll share my thoughts on the ML and we can decide there. As I said, I just wanted to make sure I understand the differences between the two.
18:12:13 <numans> Ok.
18:12:18 <dceara> numans, btw, great work tackling the I-P enhancement. That's a LOT of work
18:12:37 <zhouhan> numans: the problem in I-P is that there is no way to abstract tracking because each node has its own ways of data orgnization with freedom, so the nodes depending on them have to explain (convert) the data anyway.
18:12:57 <numans> Thanks.
18:13:58 <numans> zhouhan, Ok. Right now the flowoutput handler for runtime data checks if there is tracked data. If we remove this tracked_data, then it has to see the runtime data->is_tracked for example.
18:14:18 <zhouhan> numans: but I didn't see it particularly harmful so I didn't comment on it earlier. It is merely add more burden to the I-P engine to maintain. But not a big deal.
18:14:32 <numans> My intention was to model around the idl tracking.
18:15:28 <numans> zhouhan, Ok. We can discuss it in ML, and happy to remove it if it doesn't add much value/or complicate the code.
18:15:55 <dceara> numans, zhouhan thanks for the time on this. It's more clear for me now.
18:16:16 <numans> You're welcome and thanks for brining it up here.
18:16:34 <mmichelson> I'm going to end the meeting here, but don't let that stop you from continuing to talk here if you wish.
18:16:35 <numans> I think it is also time
18:16:37 <mmichelson> #endmeeting