17:16:35 <mmichelson> #startmeeting ovn_community_development_discussion 17:16:36 <openstack> Meeting started Thu Oct 22 17:16:35 2020 UTC and is due to finish in 60 minutes. The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot. 17:16:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 17:16:39 <openstack> The meeting name has been set to 'ovn_community_development_discussion' 17:17:25 <numans> Hello 17:17:46 <mmichelson> As a quick note at the beginning, we're coming up on that awkward time of year where the clocks start getting set back. Europe sets their clocks back an hour this weekend. North America waits one more week before doing it. And I'm not sure about the rest of the world. 17:18:04 <mmichelson> The email that goes out should take that into account when announcing the meeting for next week. 17:18:10 <blp> I made it! 17:18:14 <mmichelson> blp, hi! 17:18:19 <numans> blp, hI 17:18:22 <imaximets> hi 17:18:34 <numans> sorry my caps were on :) 17:19:09 <mmichelson> Anyway, with time change stuff out of the way, we can get this thing started. 17:19:29 <mmichelson> Biggest things to report are that I'm giving Anton's newest patches a review and I'm reworking the unit test patch series based on Ilya's feedback 17:19:37 <_lore_> hi all 17:19:58 <mmichelson> And that's all I have to report. I also plan to review Numan's loopback optimization patches sometime real soon 17:22:13 <numans> I can go real quick next 17:22:43 <numans> I worked on the load balancer hairpin lflow improvement patches and submitted for review yesterday 17:22:57 <numans> thanks mmichelson for planning to take a look 17:23:06 <numans> I did some code reviews today. 17:23:56 <numans> one of the OVS patch broke the OVN compilation. I submitted a patch for that - https://patchwork.ozlabs.org/project/ovn/patch/20201022155339.300989-1-numans@ovn.org/ 17:24:12 <blp> I have a brief report when there's an opportunity. 17:24:14 <numans> I think we need to discuss about this on how to fix it in OVN. 17:24:23 <numans> That's it from myside. 17:24:48 <numans> blp, go ahead. 17:24:59 <imaximets> numans, IMHO, we need to support building OVS with all supported versions of OVS. at least starting from 2.13 if possible. 17:25:01 <blp> Two things. 17:25:15 <mmichelson> imaximets, agreed 17:25:43 <blp> First, we got 20 submissions for ovscon (not 17 as I originally thought). I think we're on track to announce the accepted talks by the end of the month. 17:26:38 <blp> Second, I'm starting to push ddlog patches toward OVN. It's a lot of patches, so I'm doing them in batches. I sent out the first batch last night. They should be easy to review because they are only documentation improvements. 17:26:47 <blp> The documentation improvements relate to the current state of OVN. They are not ddlog specific. 17:27:16 <blp> I'd appreciate it if people could take a look at them and pass along their feedback. Once those are in, I'll prepare and send the next batch, which will actually include some code changes. 17:28:00 <mmichelson> blp, numan acked the doc patches and I pushed them to master earlier today 17:28:05 <mmichelson> blp, so we at least got that part done :) 17:28:05 <blp> oh! That was fast. 17:28:20 <blp> I'll prepare the second batch today then. Honestly I was expecting it to take longer. 17:28:27 <blp> But it's fantastic that it was quick. 17:28:36 <blp> That's all from me. 17:28:38 <mmichelson> Doc patches are pretty easy to review :) 17:29:03 <blp> Well, if they're correct, anyhow. I hope that was reviewed :-) 17:29:23 <blp> Some of these were me figuring things out from code and commit messages, and who knows whether I misunderstood something. 17:29:49 <mmichelson> blp, it all looked correct to me 17:29:53 <blp> yay! 17:30:40 <imaximets> blp, now it's documented and we'll change the code to correspond. :D 17:31:44 <numans> blp, shall we hold on merging any patches to OVN mainly the ones which change ovn-northd ? until your ddlog patches are reviewed and merged ? 17:32:06 <blp> numans: If you can afford to do that, it would be great! 17:32:30 <numans> I'm fine with it 17:32:34 <blp> I've been struggling to keep up. 17:32:48 <numans> others, does that sounds reasonable ? 17:33:14 <numans> blp, I understand. with many changes, you have to rework again. And that could be frustrating. 17:34:26 <blp> Someone else wants to speak? 17:35:02 <imaximets> I have a small update. 17:35:16 <zhouhan> numans: maybe holding on new features, but allowing bug fixes. 17:35:29 <blp> zhouhan: +1 on that 17:35:44 <zhouhan> imaximets: sorry, please go ahead 17:36:29 <imaximets> I'm swamped into memory consumption issues of ovsdb-server. Technically, one thing that bothers the most is that ovsdb-server doesn't free memory back to system if we're cleaning up the database. 17:37:01 <blp> imaximets: You mean ovsdb-server calls free() but that doesn't return memory to the kernel? 17:37:08 <imaximets> It seems like reusing most of this memory later if we're performing same tests, ubt not always. 17:37:22 <blp> (That's a very old C FAQ.) 17:37:25 <imaximets> blp, no, it seems like not calling free() on some memory. 17:37:40 <blp> imaximets: Oh! That's more of an issue. 17:37:49 <imaximets> but I'm not sure right now. 17:38:20 <imaximets> blp, right now I have an instance of ovsdb-server that consumes 13GB with empty database. 17:38:21 <blp> I think ovsdb-server has some features for memory statistics. 17:38:40 <blp> At least, ovs-vswitchd does, I don't recall whether we put the same thing into ovsdb-server. 17:38:42 <blp> 13GB! 17:38:43 <blp> wow 17:38:46 <imaximets> blp, these are not enough. I'm extending them. 17:38:52 <blp> great! 17:39:22 <imaximets> blp, I suppose, these are raft or ovsdb transaction logs. 17:39:55 <blp> Probably. The raft transaction log stays in memory until it gets snapshotted. You could force a snapshot and see what happens. 17:40:48 <imaximets> blp, I did. Does not help. So, I'm digging different places and trying to get more statistics right now. 17:41:05 <blp> The C heap is so opaque. 17:41:44 <imaximets> the test is to create one port group and add few thousands of acls one by one. This generates huge trafic since each transaction is a few hundreds of kilobytes. 17:42:07 <imaximets> investigating. 17:42:12 <zhouhan> imaximets: does 13GB reflect the peak usage (data or raft log) in your test? If you start new tests with the empty DB does it increase before the data/log size get large enough? 17:43:32 <imaximets> zhouhan, memory consumption continuously grows while adding new acls to port group and stays at the peak forever. Even after removin all acls and the port group. and after db compaction. 17:44:42 <zhouhan> imaximets: yes, but the 13GB doesn't grow until you have enough new data that exceeds the previous peak, right? 17:45:57 <imaximets> zhouhan, it depends. I'm not sure. In most cases it doesn't grow if I'm starting the test again. But there are cases where it grows higher. 17:46:49 <zhouhan> ok. thanks! I am trying to understand is it something like memory leak which is critical for production, or just something to be optimized, but it seems not quite clear yet. Will see what you find later. 17:46:54 <imaximets> I'll continue my investigation and will send something on a mail list, likely. 17:47:14 <zhouhan> thanks imaximets 17:47:35 <imaximets> And I'll be on PTO next week. :) 17:47:46 <imaximets> That's it form me. 17:47:57 <zhouhan> imaximets: no worries, we still have enough memory 17:48:45 <zhouhan> (I've nothing to report except some reviews) 17:49:00 <numans> zhouhan, sounds good to me accept just bug fixes. 17:50:14 <blp> Anyone else? 17:50:52 <imaximets> maybe one more thing 17:51:27 <imaximets> The issue that OVN depends on non-exported sources of OVS that leads to OVN build failures. 17:51:45 <imaximets> i.e. non-public OVS parts. 17:51:49 <blp> Yes, that's nasty. 17:52:04 <blp> There are multiple possible solutions. All of them require work. 17:52:47 <imaximets> I just think tht we need a policy for this kind of things, to not discuss each time 17:53:07 <numans> right now OVN compilation is broken after this commit - 91fc374a9c5a("Eliminate use of term "slave" in bond, LACP, and bundle contexts.") in OVS 17:53:47 <imaximets> mmichelson seems to agree that we need to support OVN build with all stable OVS versions starting from 2.13, if possible. 17:54:04 <numans> mmichelson During the split we did discuss about this right ? 17:54:38 <numans> there are two things, one is run time compatibility and other compilation. 17:55:20 <numans> I'm not sure if we want to support compilation of OVN with older versions of OVS as it could be hard to maintain that 17:56:18 <imaximets> But this is hard to handle, because OVN releases are more frequent. 17:56:31 <numans> I think if some one is compiling OVN from master, he/she can pick up the latest OVS. 17:56:39 <numans> just for compilation. 17:56:48 <blp> OVN shouldn't use non-public bits of OVS. Either OVS should export the bits it needs, or OVN should copy-paste the bits it uses, I think. 17:56:55 <mmichelson> numans, imaximets Thinking about it again, the most important thing is for OVN to be able to run against older 2.13 and newer versions of OVS. When it comes to compilation, it's not nearly as important to be compatible with older versions 17:57:15 <numans> mmichelson, that's what I think too. 17:57:38 <imaximets> mmichelson, but building with unstable OVS sources doesn't seem like a good idea. 17:58:19 <imaximets> or we need to say that OVN builds with latest stable release and not support building with master. 17:58:33 <imaximets> * latest stabel OVS release 17:59:12 <mmichelson> imaximets, But then we run into the issue that may happen where a change we make in OVN requires an OVS change as well. I guess we can go with Anton's method for handling that right now. 17:59:54 <mmichelson> At least that will work for new functionality. 18:00:27 <blp> OK, I'm switching to a different meeting now. Thanks so much for the reviews, especially! I'll post another series later today. 18:00:27 <mmichelson> If you're not sure what I mean, I can link Anton's paralellization patch. Just a sec. 18:00:32 <numans> I think for compilation we should not complicate things or add #ifdef code which checks the ovs version and compile conditionally. 18:00:46 <imaximets> mmichelson, numans: is conditional compilation that bad? 18:00:47 <numans> blp, Bye 18:00:49 <mmichelson> See the bottom of: https://patchwork.ozlabs.org/project/ovn/patch/20201014162714.5978-1-anton.ivanov@cambridgegreys.com/ 18:01:43 <numans> imaximets, I think in a long run it would complicate the code. 18:01:43 <zhouhan> In fact it seems even runtime compatibility is tricky. In redhat it may be fine but for ubuntu since deb package uses dynamic .so libs, would this be concerned? 18:02:28 <mmichelson> Yes that can be a problem. 18:03:01 <numans> I think if some one wants to compile OVN from sources, he/she can definitely pull up latest OVS sources too. 18:03:17 <numans> for production anyway, a user is expected to consume OVN from distributions. 18:03:39 <numans> and stable versions of OVN. 18:04:05 <mmichelson> Correct. We just need to ensure that we don't break debian packages. 18:04:45 <imaximets> numans, how can I find out which version of OVS is compatible with specific OVN commit? 18:05:01 <numans> In the case of fedora which I maintain, if compilation breaks due to OVS, I usually sync to the latest OVS or backport such patches. 18:05:27 <zhouhan> Now that we release OVN more frequently than OVS, does it mean the mid-cycle release of OVN is regarded unstable (because it may depend on unreleased OVS?) 18:05:28 <numans> imaximets, we discussed that during split. During a release, OVN would mention the commit of OVS. 18:06:16 <imaximets> numans, so how can I bisect issues if my OVN bisect will always require different versions of OVS and I do not even know which exactly? 18:06:40 <numans> imaximets, you mean for production deployments ? 18:06:59 <imaximets> numans, doesn't matter. 18:07:02 <numans> imaximets, when you compile OVN, you need to provide the OVS sources right ? 18:07:51 <numans> as a developer if you're compiling OVN, you would probably know which OVS version of code base is used for compilation. 18:08:09 <zhouhan> imaximets: I guess now bisect means both OVN and OVS (it is not a bisect anymore). That's the case when I debugged the northd performance regression caused by an OVSDB IDL change. 18:08:17 <imaximets> numans, right. Assuming I have a bug and I'm truing to identify the commit that introduced it. I'm using git bisect, but I can not automate it and I need to find out correct OVS commit to build with each time. 18:09:20 <imaximets> zhouhan, at least you were able to use OVS master with all your OVN versions higher that 2.12. 18:09:29 <numans> imaximets, I don't think we will have a straightforward answer until we either use a openvswitch lib or copy the ovs bits to OVN 18:09:31 <mmichelson> We've been doing the separate OVN builds for nearly a year now. I think the war stories you have imaximets can factor into how we manage the project going forward 18:09:56 <numans> as suggested by Ben 18:10:05 <mmichelson> It may be that we need to pin our builds to a specific version of OVS, in order to stabilize things and make debugging easier. 18:10:36 <mmichelson> numans, which OVS bits would we copy to OVN? 18:10:39 <numans> imaximets, in the case of RHEL and fedora we would be knowing which OVS version/tar ball is used. 18:11:10 <numans> mmichelson, this is what Ben said a while back - <blp> OVN shouldn't use non-public bits of OVS. Either OVS should export the bits it needs, or OVN should copy-paste the bits it uses, I think. 18:11:52 <mmichelson> numans, OK, it probably makes sense to make the "non-public" pieces public then. In my view, there's no much reason why hmap is public but smap is not. 18:11:56 <mmichelson> As an example 18:12:28 <mmichelson> Assuming that "public" refers to headers in include/ and "non-public" refers to headers in lib/ 18:13:17 <numans> mmichelson, we also use lib/packets.h too. 18:13:28 <numans> all that needs to be moved to include/ of ovs. 18:13:45 <imaximets> numans, mmichelson: making things public will require versioning anyway and OVN will have to detect version of OVS public libraries. 18:15:12 <imaximets> and stick to specific versions or support multiple with conditional compilation. 18:15:23 <zhouhan> Or, if OVN depends on a new feature of OVS which is not released, we could create a branch for the stable OVS release and backport the new features to the stable OVS branch. This way we can avoid using unstable OVS as much as possible. 18:16:18 <mmichelson> imaximets, I agree that making things public still requires versioning. You have to crawl before you can walk or run, though. So step one would be to make the library we need to consume. Step 2 would be to version this API. 18:16:34 <numans> The big question is how we want to handle the compilation issue now ? I submitted a patch here - https://patchwork.ozlabs.org/project/ovn/patch/20201022155339.300989-1-numans@ovn.org/ 18:17:06 <numans> if we accept the submitted patch, then we release ovn 20.12 we need to consume OVS from unstable branch for "compilation". 18:17:08 <mmichelson> numans, the current attitude with OVN is to do exactly what you did. Compile against the latest OVS, even if it's unstable. 18:17:30 <mmichelson> numans, but going forward, we probably need to come up with a more structured way of doing things for our own sanity 18:17:44 <imaximets> numans, mmichelson, zhouhan: what about git submodule? Was this option discussed during the split? 18:18:14 <imaximets> I mean submodules are fixed to specific commit hash. 18:19:33 <zhouhan> imaximets: that was discussed, but I remember it was dropped because of other drawbacks 18:20:00 <zhouhan> maybe mmichelson numans remember more details 18:20:31 <numans> And the agreement at that time was to compile OVN master from OVS master and when OVN is released we document the OVS commit to use to. 18:21:17 <numans> I think until now we didn't face a compilation issue so far in which released OVN depends on OVS master. 18:21:30 <imaximets> submodules are not pretty, but it's less evil if we're sticking to OVS master for compilation anyway. At least we will know against which version we're building exactly. 18:21:31 <numans> or unreleased OVS 18:22:09 <zhouhan> imaximets: Would OVS stable branch with backporting work? It solves the compile problem (and also the debian runtime problem, probably), but has lower risk than using OVS master 18:22:26 <mmichelson> yeah we went the submodule way (actually I think we used subtree, but it's very similar) as a test, but ended up not going that way because it took us away from the long-term goal of being able to just install an ovs-devel package on the system and compile against that. 18:24:54 <numans> I think its better if we work towards having a ovs-devel package and then OVN using it for compilation. 18:25:18 <numans> I got to sign off now. 18:25:20 <numans> Bye 18:25:43 <mmichelson> OK 18:26:12 <mmichelson> I think we can continue this discussion over email. For the record, I like zhouhan's idea of having a clone of a stable OVS branch that we can backport fixes/necessary features to. 18:26:38 <mmichelson> Including that branch as a submodule or otherwise requiring it for compilation would be the next step 18:27:08 <numans> mmichelson, may be we can also think of cloning ovs repo in the ovn-org git and use it for compilation. 18:27:12 <numans> Just a thought. 18:27:21 <mmichelson> numans, yeah that's a good place for the backport to live 18:27:29 <zhouhan> yes 18:27:51 <zhouhan> We can continue discussing in ML 18:28:06 <numans> sounds good. 18:28:06 <numans> Bye 18:28:08 <zhouhan> I need to drop, too. Bye folks 18:28:10 <mmichelson> We've gone pretty long today for this meeting, so I'm going to end it. 18:28:14 <mmichelson> #endmeeting