18:15:23 #startmeeting ovn_community_development_meeting
18:15:23 Meeting started Thu Mar 11 18:15:23 2021 UTC and is due to finish in 60 minutes. The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:15:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:15:26 The meeting name has been set to 'ovn_community_development_meeting'
18:15:33 Ah. I guess I never noticed that because I don't use vim much.
18:15:39 BLASPHEMY
18:15:54 Anyways, hi everyone!
18:16:06 (I can use it effectively, it's just not my preference.)
18:16:14 hi everyone!
18:16:49 I was hoping to start the meeting off by getting an update about what (if any) OVS patches we might be waiting on before we can release 21.03.0
18:16:49 <_lore_> hi all
18:16:54 hi
18:17:56 Are there any patches we're waiting on before we can release? If not, then I'd like to release tomorrow.
18:18:09 mmichelson: There's the IDL bug I'm trying to fix (v2 here: https://patchwork.ozlabs.org/project/openvswitch/list/?series=231872&state=*)
18:19:02 mmichelson: I have v3 almost ready, I've been struggling a bit today with an OVN test that seems to be failing more often with my IDL changes. I'm still not sure if it's a flaky test or not.
18:19:14 dceara: Which test?
18:19:32 blp: 139: ovn -- 4 HV, 1 LS, 1 LR, packet test with HA distributed router gateway port -- ovn-northd FAILED (ovs-macros.at:253
18:20:11 That one doesn't ring a bell for me, so maybe I haven't seen a lot of problems with it.
18:20:27 I've spent a lot of time trying to figure out whether some tests are flaky, so some of them are familiar.
18:20:47 Ugh I wish we didn't need end-to-end tests so badly, they're so hard to make reliable.
18:21:42 blp: I think I saw this test failing in the past but with my change it fails more often. So I'd like to make sure it's not another IDL bug that I'm introducing/uncovering.
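
One way to put a number on "fails more often" is to run the same test repeatedly on both branches and compare failure rates. The following is a minimal sketch, not something used in the meeting: the helper name `failure_rate` is hypothetical, and the command you pass in is up to you.

```python
import subprocess
import sys


def failure_rate(cmd, runs):
    """Run `cmd` (a list of arguments) `runs` times.

    Returns the fraction of runs that exited with a non-zero status.
    """
    failures = 0
    for attempt in range(1, runs + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
            # Report each failure as it happens so long runs show progress.
            print(f"run {attempt}/{runs} failed", file=sys.stderr)
    return failures / runs
```

For the test discussed here, an assumed invocation would be `failure_rate(["make", "check", "TESTSUITEFLAGS=139"], 20)`, run once with the IDL change applied and once without. A clearly higher rate with the change suggests a real regression rather than pre-existing flakiness.
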
18:21:50 Unfortunately, I can't with any confidence say that dceara's issue is a flaky test. It's also possible due to weirdness in some OVN component that things process in a slightly different order sometimes and cause a problem.
18:22:07 mmichelson: That's a possibility too.
18:22:23 * mmichelson has flashbacks to a CT zone issue from the past
18:22:41 hello
18:22:51 For ovn-northd-ddlog, I often compare the southbound flow table dump against ovn-northd.
18:22:58 mmichelson, I'd say we should go ahead and release tomorrow.
18:23:15 We can probably update the ovs submodule commit once the patch is merged.
18:23:15 For your change, you might be able to compare dumps of sbflows or something else, with and without your change.
18:23:35 * numans oops. sorry for jumping in.
18:23:41 blp: Yes, will do.
18:23:42 Many of the tests now dump southbound flows to a file 'sbflows' to make this easier.
18:24:57 blp: The main problem though is that the test fails in github CI. On my machine it mostly passes.
18:24:58 What I found in the CT zone issue I referred to was that sometimes I'd have multiple ovn-nbctl commands that would get handled by ovn-northd as one operation, and other times they'd get handled as two separate transactions. Then this would result in ovn-controller processing it all either as one change or as two separate changes. And depending on which happened, we'd have different behavior. And yes, if you guessed it was an incremental processing issue, you would be correct.
18:25:24 dceara: That makes it harder.
18:25:31 :)
18:25:46 dceara: I often see tests fail when I run them with high parallelism, e.g. TESTSUITEFLAGS=-j12 on a 6-core laptop.
18:26:00 dceara: If you don't already try that, it's worth trying.
18:26:17 blp: Same here, i've been trying with -j20 on a dev server I'm using. Not so much luck.
18:26:30 (At some point I might switch to developing on my 3990x box and then I'll use -j128.)
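
The suggestion above to compare 'sbflows' dumps with and without a change can be sketched as an order-insensitive diff. This is only an illustration: it assumes each dump holds one flow per line, and the function names are hypothetical.

```python
from collections import Counter


def flow_multiset(lines):
    """Count each non-blank flow line, ignoring surrounding whitespace."""
    return Counter(line.strip() for line in lines if line.strip())


def diff_flows(before_lines, after_lines):
    """Compare two flow dumps without regard to line order.

    Returns (removed, added): flows present only in the first dump and
    flows present only in the second, each as a sorted list.
    """
    before = flow_multiset(before_lines)
    after = flow_multiset(after_lines)
    removed = sorted((before - after).elements())
    added = sorted((after - before).elements())
    return removed, added
```

In practice the two arguments would be open file objects for dumps saved from a run without the change and a run with it. Two empty lists mean the southbound flows are the same up to ordering, which points toward a timing issue rather than a difference in the generated flows.
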
18:26:42 mmichelson, if we're going to release tomorrow then we should move submodule now to ac09cbfcb70a ("ovsdb-cs: Fix use-after-free for the request id.") before the release.
18:28:06 imaximets, ack
18:29:33 Anyways, aside from release talk, I didn't have anything else I wanted to share. I can update the ovs submodule and put that up for review after this meeting
18:29:38 Is there more to discuss on this topic or shall we move on?
18:29:49 And after that, I can post the release commits as well
18:29:54 So that should allow for us to release tomorrow.
18:30:01 blp, I think that's it, if you'd like to go next.
18:30:01 +1
18:30:05 Cool.
18:30:09 mmichelson, I'll send a patch for submodule shortly.
18:30:21 I posted a series of improvements to ovn-northd-ddlog last week.
18:30:31 Numan pointed out some issues that I should have noticed, but had not.
18:30:46 I have posted a number of small fixes that can apply separately from that series.
18:30:54 And I will also post a revision of the series itself soon.
18:31:10 But, more excitingly, Leonid spent some time optimizing the ddlog code
18:31:18 and he gave me a tarball of the changes last night
18:31:28 which I will look at and probably post (maybe add to my series?).
18:31:42 They are pretty small changes and he says they obtain more than 3x performance improvement
18:31:49 over the improvement I had gained in my series.
18:32:12 That's impressive!
18:32:25 He also says that with them the cost of each step in the benchmark that Numan provided seems to go to O(1) rather than increasing with each iteration.
18:32:28 that's cool.
18:32:43 Once we get that in, I would really appreciate it if people would throw more benchmark challenges at me.
18:32:54 I think that I understand the optimization principles that Leonid used.
18:32:57 I'm going to apply them myself
18:33:04 up for the challenge :)
18:33:06 and I'm going to try to write them up for others to understand as well.
18:33:17 numans: awesome
18:33:35 blp, just so I'm clear, are these changes from Leonid to DDLog the language or to ovn-northd-ddlog?
18:33:45 They are mainly to ovn-northd-ddlog.
18:34:08 Got it
18:34:09 Leonid did add a small feature to the ovsdb2ddlog program we use for generating .dl files.
18:34:27 which is here if you want to look at it: https://github.com/vmware/differential-datalog/pull/934
18:35:24 I can also report that we've hired a couple of people to work on ddlog at VMware.
18:35:46 which is a good indication that it will be maintained.
18:35:52 and enhanced
18:36:02 I am done with my report but I'm happy to answer more questions if anyone has them.
18:36:33 (I'm hoping that one of the new hires will write a ddlog formatter like "indent" for C or "rustfmt" for Rust.)
18:37:12 blp, I had reported a few memory leaks.
18:37:22 I'm not sure if you've addressed them in your series.
18:37:22 No questions from me. I'm really happy to see the improvements being made to DDLog
18:37:24 I posted patches to fix those, I think.
18:37:31 cool then.
18:37:39 I haven't looked into the patches.
18:38:24 I remember dceara mentioned that in his testing northd-ddlog took huge ram like 75gb or so. I may be wrong, dceara can update if he is still here.
18:38:30 numans: Yes, the leak fixes are specifically:
18:38:31 https://mail.openvswitch.org/pipermail/ovs-dev/2021-March/381117.html
18:38:31 https://mail.openvswitch.org/pipermail/ovs-dev/2021-March/381119.html
18:38:41 numans: They should be easy to review if you have a few minutes to look at them.
18:38:52 sure. I'll take a look. thanks.
18:39:14 numans: northd-ddlog does take more RAM. I think that the optimization patches (and leak fixes) should help.
18:39:40 ok. The patches look straightforward.
18:39:43 blp: It's very nice to see the ddlog related activity! I did give ovn-northd-ddlog a try on one of the large NB databases extracted from one of our scale tests.
I hope the 75gb was due to the memleaks you fixed :)
18:40:21 75 GB is a lot.
18:40:44 blp: I also had a small bug report that I didn't get a chance to report on the mailing list yet: it seems that ovn-northd-ddlog clears NB_Global.options:use_logical_dp_groups after the first run, effectively disabling the feature.
18:41:00 dceara: I'll look at that. Should be an easy fix.
18:41:14 blp: Yep, looked relatively straightforward indeed.
18:42:08 blp: Do you know what's the expected memory consumption without leak? How many times more than regular northd?
18:43:25 zhouhan: I don't have an estimate for that yet. We can target memory use like we target speed, by dumping a memory profile and looking for excessive consumption then making ddlog code adjustments.
18:44:39 blp: ok, just want to have a rough idea. I remember last time (1 - 2 years ago) when I was testing it, it was about 10x of regular northd
18:45:00 In other software we've built with ddlog, I think it was more like 2x or 3x after we did a little work to bring it down.
18:45:21 blp: 2x - 3x is much better now :)
18:45:24 It is more or less unavoidable that an incremental version of anything would take more memory than a nonincremental version.
18:45:47 But if it uses so much memory that it's unusable, obviously that's not a good tradeoff.
18:46:31 blp: this is reasonable. Just want to know the worst case. On central node I think 2x - 3x memory is not an issue at all. I am only thinking about the future of ovn-controller using ddlog :)
18:46:54 That is a good point. Memory is much more critical for ovn-controller.
18:47:33 I used to joke that NVP (back at Nicira circa 2011) required infinite memory.
18:47:51 And that its "nlog" language should be called "n exponential".
18:48:01 e^nlog
18:48:03 (Optimization helped.)
18:48:06 I've observed ovn-controller taking up like 9gb on a scaled env and vswitchd takes around ~2.5gb.
18:48:20 Holy crap that's a lot already.
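
The memory tradeoff mentioned above (an incremental version keeps more state than a non-incremental one) can be seen in a toy join. This is a sketch for illustration only: it is not how ddlog or ovn-northd is implemented, and the `ports` and `bindings` inputs are invented names, not OVN schema.

```python
def join_full(ports, bindings):
    """Non-incremental: rebuild every (port, chassis) pair from scratch.

    Nothing survives between calls, but each call costs time
    proportional to the whole input.
    """
    chassis_by_port = dict(bindings)
    return {(p, chassis_by_port[p]) for p in ports if p in chassis_by_port}


class IncrementalJoin:
    """Incremental: each change is applied in O(1) time.

    The price is that both inputs and the output stay resident in memory.
    """

    def __init__(self):
        self.ports = set()
        self.chassis_by_port = {}
        self.output = set()

    def add_port(self, port):
        self.ports.add(port)
        if port in self.chassis_by_port:
            self.output.add((port, self.chassis_by_port[port]))

    def add_binding(self, port, chassis):
        # Drop the stale pair first if the port was bound elsewhere.
        old = self.chassis_by_port.get(port)
        if old is not None:
            self.output.discard((port, old))
        self.chassis_by_port[port] = chassis
        if port in self.ports:
            self.output.add((port, chassis))
```

Both versions produce the same result. The incremental one trades the retained `ports`, `chassis_by_port`, and `output` structures for constant-time updates, which is why some memory overhead relative to a recompute-everything design is hard to avoid.
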
18:48:34 numans: With the lflow cache enabled?
18:48:41 dceara, yes.
18:48:42 I'm embarrassed, how did my baby become a monster?
18:48:42 sounds like a proof that it is doing I-P :D
18:49:28 Does anyone else want to share?
18:49:45 mmichelson, blp: https://patchwork.ozlabs.org/project/ovn/patch/20210311183719.2517358-1-i.maximets@ovn.org/
18:50:41 imaximets, that should be an easy ack
18:50:43 I can go real quick. I was busy most of this week working on a couple of crashes seen in ovn-controller
18:50:58 All thanks to me and the runtime data I-P handling :)
18:51:17 I submitted the patch for review - https://patchwork.ozlabs.org/project/ovn/patch/20210311124757.2997057-1-numans@ovn.org/
18:51:48 zhouhan, I couldn't reply to your questions on the ct.inv drop patch. I'll get to that next week. thanks for the comments and review so far.
18:52:51 numans: np. I also reviewed your I-P refactor, will review the ofctrl refactor RFC in the next couple of days.
18:52:52 I'm also inclined to submit v2 making it a config option so that regular users don't miss out on this, in case there are scenarios where invalid pkts need to be dropped.
18:53:00 zhouhan, thanks.
18:53:52 imaximets: Thank you Ilya. I should have sent that earlier. acked.
18:54:25 that's it from me.
18:54:44 hello, can I go next? Had a question regarding upgrade
18:55:49 sure.
18:55:55 https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050988.html
18:56:14 Is this upgrade supported? I seem to be hitting a backward compatibility issue
18:57:00 we are basically trying to move from v2.11 to ovn-20.09. If the controller is upgraded first, the chassis don't show up
18:57:37 karthikc, I saw that email. Sorry I couldn't reply.
18:57:40 karthikc, Someone can correct me if I'm wrong, but I think the upgrade order assumes that you also will update the central components too. It sounds like you've upgraded ovn-controller but not ovn-northd
18:58:21 that's correct.
18:58:23 mmichelson: The upgrade doc says to upgrade ovn-controller first.
18:58:35 that's correct. ovn-northd is yet to be upgraded. But there is a transient state where chassis don't show up
18:58:41 karthikc, is your concern that during this time, the traffic is broken ?
18:58:46 yes
18:59:07 karthikc, recently we added a feature to pin ovn-controller to a specific version of ovn-northd
18:59:28 if ovn-controller sees this mismatch, it does nothing until ovn-northd is also upgraded.
18:59:41 karthikc, I'll reply about that in the ML.
18:59:50 I'm not sure if that commit is 20.09 or 20.12
18:59:57 neat, that will be helpful, thanks
19:00:01 I think it was 20.12
19:00:08 But I might be wrong
19:00:19 mmichelson, you may be right.
19:00:33 karthikc, maybe you can consider upgrading to 20.12
19:00:52 ok
19:01:12 hi, can i go real quick next?
19:01:21 The other option is to use the --restart option when restarting ovn-controller so that ovn-controller doesn't "clean up" when you stop it. That way when it starts back up, it should be in the same configuration as when it was running previously. You at least shouldn't have interrupted traffic if you do that. But you also won't be able to make changes.
19:01:36 dhathri, go for it
19:02:10 just wanted to request for review on the multiple gateway router support patch (https://mail.openvswitch.org/pipermail/ovs-dev/2021-March/380979.html)
19:02:13 looks like we need to be careful when we add new columns which ovn-controller needs to update.
19:02:45 dhathri, sure. Meanwhile could you also plan for another version with ddlog support ?
19:03:16 dhathri, I'll take a look next week.
19:03:56 sure, thanks.. ddlog will take more time since there is a learning curve. Wanted to get the c patch reviewed while i am working on the ddlog changes
19:04:19 I'm in another meeting now, so I'll talk to everyone next week.
19:04:24 bye blp
19:05:02 Does anyone else have a report?
19:06:01 I'll take the lack of response to mean "no"
19:06:11 Bye everyone
19:06:15 bye!
19:06:16 Bye
19:06:20 bye
19:06:28 bye
19:06:39 #endmeeting