18:15:50 <mmichelson> #startmeeting ovn_community_development_discussion 18:15:51 <openstack> Meeting started Thu Nov 12 18:15:50 2020 UTC and is due to finish in 60 minutes. The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot. 18:15:53 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 18:15:55 <openstack> The meeting name has been set to 'ovn_community_development_discussion' 18:16:03 <flaviof> hi everybody! 18:16:11 <numans_> Hi 18:16:24 <_lore_> hi all 18:16:28 <mdgray> Hi 18:16:52 <mmichelson> I guess I can go ahead and get things going 18:17:34 <mmichelson> First off, I sent an email last week announcing that tomorrow (13 November) is the soft freeze for OVN 20.12. So if you have any patches you want included in 20.12, please do your best to get it submitted to the mailing list by tomorrow. 18:18:07 <mmichelson> As far as work is concerned, I got my SNAT CT zone selection patch submitted yesterday. I submitted a v2 today that addresses 0-day robot's problems. 18:18:36 <mmichelson> I'll be doing reviews today and tomorrow to try to reduce the review load after soft freeze 18:18:49 <mmichelson> And that's all from me, I suppose. 18:19:07 <flaviof> mmichelson: you fixed the robot or did the robot complained about something you had to fix? 18:19:25 <mmichelson> flaviof, the robot complained about something I had to fix. 18:19:37 <blp> hi 18:19:49 <mmichelson> flaviof, there were some minor checkpatch problems, but there was also a complaint about losing const-ness in a function call I was making 18:19:59 <mmichelson> It wasn't reported by gcc locally, so that was strange. But it's fixed in v2 18:20:04 <flaviof> mmichelson: ack. ++ 18:20:27 <blp> Must be different compiler versions. 18:21:11 <mmichelson> blp, that's what I assume 18:22:01 <blp> OK, I'll go next then. 18:22:04 <flaviof> mmichelson blp are we planning to have ovn-nrthd-ddlog in ovn 20.12? 18:22:19 <mdgray> mmichelson: I had a similar issue a while back. Passed in travis as well - compiler versions for sure. 18:22:40 <blp> I am continuing to iterate on the ddlog series. I'd love to have it in 20.12. I'd assume that it would be OK after the soft freeze because it's been cooking for some time, but I'd accept a contrary ruling of course. 18:24:26 <numans> blp, If I may, there are a couple of issues I found with my ddlog testing. 18:24:33 <mmichelson> blp, FYI we did some scale testing with DDLog. We encountered some issues after a while. We think it has something to do with probe intervals. I'll let numans expand on it 18:24:46 <blp> numans: Please explain 18:25:24 <numans> 1. There is compilation issue with the travis CI. If you could please take a look. Looks like make dist has few issues. (not a big issue :)) - https://travis-ci.com/github/numansiddique/ovn/jobs/434778314 18:26:07 <numans> 2. Here are the results with ddlog with 500 fake nodes - https://imgur.com/JJpsMhv vs no ddlog - https://imgur.com/gmbhfoa 18:26:28 <numans> blp, In the logs I see that ovn-northd-ddlog is sending probe interval every 1 second, even if I disable probe interval - ovn-nbctl set NB_Global . options:northd_probe_interval=0 18:26:41 <numans> And I think as scale increases, ovn-northd-ddlog is dropping the connection to sb db due to 1 second probe interval. 18:26:48 <numans> I think its happening because of this - https://github.com/blp/ovs-reviews/blob/ddlog9/northd/ovn-northd-ddlog.c#L893 18:28:08 <numans> I'm planning to hack the probe interval thing and run again to rule out if the issue is because of constant reconnections to the sb db. 18:28:48 <blp> Ah. OK. Thanks for those reports. I'll fix them for v6. 18:29:12 <blp> Probably simple to fix. 18:29:48 <numans> blp, In the northd-ddlog logs I observed this - 2020-11-10T13:51:32.186Z|01351|timeval|WARN|Unreasonably long 335427ms poll interval (214ms user, 1ms system) 18:30:00 <numans> blp, not sure if it is related to the probe interval. 18:30:03 <numans> blp, welcome. 18:30:31 <flaviof> numans: nice graphs! Is the probe interval on the non-ddlog version test set to 0? 18:31:06 <numans> flaviof, I think with ddlog, the north_probe_interval configured on the db is not taking effect. 18:31:07 <blp> Wow, that's a really long main loop. The difference between elapsed and user+system is suspicious though. 18:32:28 <numans> blp, Also with really huge SB DB, we are hitting this issue - https://github.com/ovn-org/ovn/commit/1e59feea933610b28fd4442243162ce35595cfee with ddlog too 18:32:37 <numans> blp, again related to probe interval. 18:32:48 <numans> You may want to consider the same for ddlog too. 18:33:17 <numans> other than the ddlog testing, I worked on my hairpin patches 18:33:21 <numans> and submitted v4. 18:33:27 <numans> I also did some code reviews. 18:33:31 <numans> That's it from me. 18:33:56 <blp> numans: Thanks for pointing that issue out. I'll look at fixing it too. 18:34:04 <numans> blp, thanks. 18:35:54 <numans> If some one wants to go next. 18:35:57 <_lore_> can I go next? 18:36:01 <_lore_> pretty fast 18:36:11 <numans> sure 18:36:21 <mmichelson> go for it 18:36:26 <_lore_> this week I posted a native implementation for bfd in ovn 18:36:33 <_lore_> (actually today) 18:36:51 <_lore_> it is not complete yet, but I wanted to post it to collect feedbacks 18:37:28 <_lore_> we decided to implement it in ovn since the ovs one seems not so suitable to our needs, we want to enable it on ovn entities, like logical router ports 18:37:50 <blp> Makes sense. 18:38:01 <_lore_> moreover we would enstablish a connection between a logical router port and multiple external entities 18:38:12 <_lore_> not belonging to ovn "domain" 18:38:14 <blp> Does it make sense to try to share code between the implementations? 18:38:23 <blp> (I don't have a guess at that.) 18:38:32 <_lore_> I looked at ovs implementation 18:38:52 <_lore_> and it relies on ovs stuff (netdev, interface, ecc.) 18:38:57 <_lore_> so it is not so easy 18:39:17 <_lore_> maybe we can do as follow-up series 18:40:14 <_lore_> I implemented basic functionalities 18:40:25 <_lore_> tx/rx, state machine 18:40:33 <blp> I think that any code sharing would have to be factoring out a common library and using it both places. I don't have any idea whether that would be productive. 18:41:00 <_lore_> yes, I agree 18:41:21 <_lore_> when the code is more or less ready let's see how much we can share 18:41:27 <_lore_> agree? 18:42:02 <blp> yes 18:42:06 <_lore_> ok, cool 18:42:14 <_lore_> so I will continue to work on my implementation 18:42:21 <_lore_> that's all from my side, thx 18:42:53 <flaviof> _lore_ native in ovn meaning the bfd packets are generated/handled by the ovn-controller process? 18:43:01 <_lore_> yes 18:43:06 <_lore_> flaviof: correct 18:43:11 <flaviof> _lore_ cool beans. 18:43:21 <blp> How big is this likely going to need to scale? Hundreds or thousands of packets/sec might use a lot of CPU%. 18:43:41 <_lore_> blp: I do not think so 18:43:55 <_lore_> but even in ovs packets are sent in userspace 18:44:11 <_lore_> as upcall 18:44:59 <blp> Yes, of course. This takes one more hop across IPC, which could be an additional bottleneck. 18:45:56 <_lore_> I agree 18:46:29 <_lore_> let's sync with osp/osh folks on it 18:46:57 <_lore_> I think the main use-case is lb health-check 18:47:06 <_lore_> for ecmp routing 18:47:48 <blp> OK. Some network virtualization systems end up running BFD (etc.) in a full mesh across all chassis. That is expensive. 18:48:02 <blp> If it's just for a limited number of connections, it will scale better. 18:48:27 <_lore_> I do not have any exact number now 18:48:41 <_lore_> I will sync with osh folks and get back on it 18:48:57 <blp> What is osh? 18:49:17 <_lore_> open-shift 18:49:19 <_lore_> :) 18:49:32 <flaviof> g-osh! ;) 18:50:01 <blp> Ah. I did not know the abbrev. 18:50:14 <blp> And osp? 18:50:45 <imaximets> blp, OpenStack Platform, IIUC. 18:50:50 <numans> openstack 18:50:52 <_lore_> yes 18:50:59 <blp> Thanks. 18:51:02 <_lore_> sorry for the confusion :( 18:51:12 <blp> Anyone else? 18:51:19 <blp> I have my homework list, I think. 18:51:26 <imaximets> I have a quick update. 18:51:53 <imaximets> There is a recurring issue with the size of sb db. 18:52:15 <imaximets> So, this week I tried to prepare PoC patches to combine lflows. 18:52:45 <imaximets> There are lots of lflows that differs only in logical datapath column, so they could be combined to save space. 18:53:27 <imaximets> by introducing Logical_Datapath_Group table or something similar. 18:54:08 <imaximets> northd changes was kind of straightforward, but I found it really hard to make changes in ovn-controller. 18:54:10 <blp> Oh, interesting. Or the logical_datapath column could be made a set: empty -> all datapaths, otherwise all the listed datapaths. 18:54:25 <imaximets> blp, yep. 18:54:33 <blp> ovn-controller makes me sad right now. 18:54:37 <blp> It's complicated. 18:55:01 <imaximets> the issue with ovn-controller is that it works directly with database without any abstraction, so it's hard to modify schema in general. 18:55:36 <imaximets> the second big issue is that almost all parts of lflow processing depends on fact that lflow to datapath is 1:1 relation. 18:56:03 <blp> eventually it might make sense to apply ddlog to ovn-controller, but it does not solve any problems immediately 18:56:23 <imaximets> dceara said that he could help reducing dependencies from the logical datapath, at least we could eliminate dependency in match expression parsing code. 18:56:58 <imaximets> So, I'll wait for him to do that before continuing my efforts. 18:57:49 <blp> I guess another option would be to add a match on datapath to the expression logic. 18:58:08 <blp> And then allow the logical_datapath field to be empty. 18:58:52 <blp> A match on a port would have a prerequisite on the logical datapath that contains the port, I guess. 18:59:15 <imaximets> but will this reduce number of lflows in sb db? 18:59:41 <numans> blp, there are many lflows like match(1), action=next; which imaximets wants to reduce is what I understand. 18:59:58 <numans> this is just one example. there could be many 19:00:13 <imaximets> numans, there are lots of different flows. :) 19:00:16 <blp> Sure it would. That kind of flow would naturally work on any logical datapath. No prerequisites would be generated, since it doesn't match on a port. 19:00:29 <zhouhan> blp: I am thinking the other way around. I think putting everything in match condition as a string makes it hard for optimization in ovn-controller because we can't get any metadata before parsing the logical flow 19:00:58 <blp> Oh, i guess the other issue is that it's hard to just get the flows a chassis needs. Currently that's pretty easy. 19:01:08 <zhouhan> blp: yes 19:01:19 <numans> that's a good point. 19:01:43 <numans> blp, in the case of large scale deployments we have noticed that disabling conditional monitoring helps. 19:01:44 <imaximets> this will hit performance of ovn-controller. 19:02:04 <blp> numans: Is that because of CPU% on the dbserver? 19:02:08 <numans> otherwise ovsdb-server takes lot of cpu cycles to handle conditional monitoring 19:02:11 <numans> blp, yes 19:02:14 <zhouhan> blp: even today, because port exists only in match, we lose some chance to use that to filter out the lflows that doesn't need to be parsed by the chassis 19:02:31 <blp> OK. 19:03:23 <imaximets> Anyway ovn-controller needs some refactoring and untangling. 19:04:06 <mmichelson> that it does 19:04:10 <imaximets> Once this done I'll be able to continue with lflow combining. 19:05:02 <imaximets> In any way of implementation it should reduce size of sb db by factor of number of logical switches in some cases. 19:05:42 <imaximets> That's it from my side. 19:06:06 <mmichelson> Anybody else? 19:06:08 <flaviof> may I? 19:06:47 * flaviof thinking yes.... ;) 19:06:51 <flaviof> As mentioned in the ovs-dev ML, I'm wrapping up the "fair meters" changes, now using a bool column in the NB Meters table. TY blp and dceara for all the support! 19:07:06 <flaviof> #link https://mail.openvswitch.org/pipermail/ovs-dev/2020-November/377277.html the current plan on implementing fair meters 19:07:06 <flaviof> #link https://patchwork.ozlabs.org/project/ovn/patch/20201107213913.GC2753733@ovn.org/ blp generously explaining fair meters changes for ovn-northd-ddlog 19:07:15 <flaviof> I should be done soon, but want to wait for ddlog's merges first. 19:07:29 <flaviof> Besides ACL logs, I see only one other potential user of Meters table in NB: 19:07:29 <flaviof> #link https://github.com/ovn-org/ovn-kubernetes/blob/061d543f363d48d2a4f0fc1df95fea2c6129e3cf/go-controller/pkg/ovn/ovn.go#L435 controller_event 19:07:42 <blp> flaviof: Do you want any support for me at the moment on the fair meters changes? 19:07:43 <flaviof> There is nothing in northd that deal with that, so this is looking easier than I thought. I hope I'm not wrong. ;) 19:07:44 <flaviof> That is all from me. 19:08:08 <flaviof> blp maybe. But let me try it first. I'll let you know! 19:09:01 <numans> blp, one question on the ddlog, I notice that northd-ddlog is using update method instead of update2/update3. is that intentional ? Just curious. 19:09:30 <numans> i mean the jsonrpc updates from the ovsdb-server. 19:09:39 <blp> numans: someone else wrote that code and I haven't revisited it. I'll take a look. 19:09:56 <numans> blp, ack. thanks. 19:11:06 <blp> I already got in the coreutils doc update that Dumitru pointed out was needed. https://lists.gnu.org/archive/html/coreutils/2020-11/msg00014.html 19:11:56 <imaximets> blp, side question, there is a concern that ddlog doesn't use ovsdb-idl, but re-implements it. Maybe it's better to use ovsdb-idl instead? That might be easier to maintain. 19:12:42 <blp> The reason that ddlog doesn't use the idl is that it needs the deltas, which the idl hides. We could extend the idl to expose the deltas; it's a valid option. 19:12:51 <blp> If you like, I can explore that path. 19:13:20 <numans> blp, that could be useful to ovn-controller as well. 19:13:22 <imaximets> it would be great. At least to check. 19:13:34 <blp> I'll explore. 19:13:36 <zhouhan> blp: for deltas do you mean something different from change-tracking? 19:13:53 <imaximets> blp, thanks. 19:14:08 <zhouhan> blp: or improvements on the IDL change-tracking? 19:14:18 <blp> Changes and deltas are the same. I don't know whether the idl change-tracking code exposes all the information I need. That will be one of the things I check. 19:14:39 <numans> I think idl change tracking doesn't expose what column changed. 19:14:54 <zhouhan> blp: ok. IDL change-tracking currently doesn't provide old data, if a row is modified. 19:15:13 <zhouhan> numans: yes, that's another one 19:16:00 <numans> python ovs idl provides a callback for any updates. and the callback function can figure out what columns changed. 19:16:19 <blp> Old data is essential, because ddlog needs to be passed the row that was deleted and the row that was added. 19:16:34 <blp> Maybe the API can be adapted. I will see. 19:17:23 <zhouhan> one more thing, For set/map columns, it would be better to get just what's changed, because there can be huge data in a column, such as address-set, static-route, etc. 19:17:46 * numans says bye. 19:17:49 <blp> DDlog isn't prepared to consume anything other than full row updates at the moment. 19:18:10 <zhouhan> blp: ok, I guess that might be needed 19:18:29 <blp> If there's a bottleneck, then there are various ways we could deal with it. 19:18:48 <blp> For example, we could break a table down into multiple relations. 19:18:57 <blp> I don't think that's a problem yet though. 19:19:31 <zhouhan> for scenarios like: add a member in a group for policy enforcement - we don't want all the related flows got recomputed 19:19:47 <zhouhan> if the group is huge, e.g. thousands of members 19:21:08 <zhouhan> or, in a deployment we could have a single router, with ten thousands of routes. If any of the routes change, it is just one row change in the DB, but the row contains all the routes. 19:21:49 <blp> It's likely that in that particular situation what's changing is a northbound group membership. That would normally just translate to a single southbound group membership change. If it actually affected thousands of flows, that's a problem in the flow table design. 19:21:58 <mmichelson> zhouhan, that sounds like it could be solved by blp's idea of breaking into multiple relations 19:22:26 <blp> If there is indeed a problem here, then we will come up with a way to systematically break up the tables into multiple relations. 19:22:33 <zhouhan> mmichelson: oh, yes, if that's discussed already 19:22:33 <mmichelson> Actually, no it wouldn't. It just would shift where the burden is some. 19:22:40 <mmichelson> Hmm 19:23:07 <blp> If a change to nb will cause a large change to sb, then there's no way to avoid computation at least linear in the change to the sb. 19:23:21 <blp> Otherwise, it can probably be avoided. 19:23:52 <blp> I have a meeting with my boss in 5 minutes. 19:24:05 <blp> Well, probably more like 15, he's usually late. 19:24:09 <zhouhan> If we wont to solve this by schema change, there will be many places to change ... 19:24:43 <blp> The "breaking into multiple relations" appraoch I am talking about would not require any change to the schemas, only to ovn-northd-ddlog. 19:25:01 <zhouhan> blp: oh, I misunderstood then 19:25:05 <blp> (schemas? schemata?) 19:25:21 <mmichelson> yeah I misunderstood as well 19:25:35 <blp> But I don't want to start down that path until we know there is a problem 19:25:47 <blp> and it might make more sense to look at ovn-controller first 19:27:25 <zhouhan> blp: if the "breaking into multiple relations" is in ddlog, then still we will need to solve the problem of observing what's really changed in the huge column 19:27:54 <flaviof> zhouhan I thought you were referring to columns that have a long list of values. Like ports in a port_group? Ins't that something what could be turned into a separate table? Forgive me if I'm not understanding it right. 19:28:07 <blp> That's not a problem, it's just a matter of writing code. 19:28:13 <blp> we're good at writing code 19:28:21 <flaviof> blp++ 19:28:23 <zhouhan> blp: sure :) 19:28:52 <zhouhan> it's the change-tracking enhancement I was talking about 19:29:08 <blp> flaviof: That's what I was talking about. 19:29:12 <zhouhan> flaviof: yes that's another approach, but not what blp mentioned 19:29:19 <zhouhan> :D 19:29:23 <flaviof> ack. TY 19:29:25 <blp> OK folks, I need to run. Talk to you next week! 19:29:37 <zhouhan> thanks, bye 19:29:41 <flaviof> bye all! 19:29:46 <imaximets> bye 19:29:57 <mmichelson> bye! 19:30:02 <mmichelson> #endmeeting