14:00:10 <mlavalle> #startmeeting neutron_drivers
14:00:11 <openstack> Meeting started Fri Jul 26 14:00:10 2019 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 <openstack> The meeting name has been set to 'neutron_drivers'
14:00:17 <slaweq> hi
14:00:23 <mlavalle> hey
14:00:25 <yamamoto> hi
14:00:34 <amotoki> hi
14:00:43 <ralonsoh> hi
14:00:58 <liuyulong> hello
14:01:11 <mlavalle> haleyb|away is off on vacation, so we are good to go
14:01:19 <mlavalle> #topic RFEs
14:02:11 <mlavalle> Good to see yamamoto has recovered from his surgery. \o/
14:02:22 <amotoki> \o/
14:02:24 <yamamoto> thank you
14:02:26 <njohnston> o/
14:02:54 <mlavalle> This is the RFE to be discussed today: https://bugs.launchpad.net/neutron/+bug/1836834
14:02:55 <openstack> Launchpad bug 1836834 in neutron "[RFE] introduce distributed locks to ipam" [Wishlist,Confirmed] - Assigned to qinhaizhong (qinhaizhong)
14:03:20 <mlavalle> and its associated spec: https://review.opendev.org/#/c/657221/
14:06:07 <brinzhang> this spec already has a +2, maybe it needs others to propose changes or W+1
14:06:11 <njohnston> I will say that the spec is lacking a bit of detail on the basic operation of this proposal. For example, we are implementing distributed locks, but what precisely is getting locked? If a process fails to release a lock, how is that sensed and remedied?
14:06:18 <slaweq> for me it looks good as a separate driver
14:07:24 <njohnston> Is the lock on a per-subnet basis or is it for all IPAM IP allocation? Is there a fallback mode if the backend tooz points to does not respond in a timely manner?
14:07:57 <brinzhang> I think these implementations will be reflected in the code logic; this is a new driver that you can configure to use.
14:09:30 <njohnston> brinzhang: I understand, and there are many details that will be determined at implementation. But the spec says "We will fix IP allocation conflicts by using locks" without saying *how* using locks fixes IP allocation conflicts.
14:10:14 <liuyulong> brinzhang, you may give us a brief summary of the implementation. : ) njohnston's concern is worth clarifying.
14:10:28 <brinzhang> Based on subnet-id + ip
14:10:29 <ralonsoh> I agree with njohnston. We can be relaxed with the spec description, but something more detailed could be defined in the document
14:11:00 <amotoki> njohnston: good point. the spec should clarify a basic approach and how the problem can be solved.
14:11:10 <brinzhang> njohnston: qinhaizhong will be in attendance, please wait.
14:12:56 <mlavalle> the other question is, how much experience does the team proposing this spec have with tooz? I mean, have you solved previous scale-up problems using it?
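For context on the approach being discussed (a distributed lock keyed on subnet-id + ip), a minimal sketch using the tooz coordination API might look like the following. This is not code from the proposed spec or driver; the backend URL, member id, lock-name scheme, and allocate_in_db() helper are all hypothetical.

```python
# Minimal sketch (not the proposed driver): guarding an IP allocation with a
# tooz distributed lock keyed on subnet_id + ip, as described in the discussion.
# Backend URL, member id, and allocate_in_db() are assumptions for illustration.
from tooz import coordination

coordinator = coordination.get_coordinator(
    'etcd3+http://127.0.0.1:2379',   # any tooz backend (redis, etcd, ...)
    b'neutron-server-1')             # unique member id for this process
coordinator.start(start_heart=True)


def allocate_ip(subnet_id, ip, allocate_in_db):
    # One lock per (subnet, ip) pair, so concurrent workers trying to
    # allocate the same address serialize instead of conflicting.
    lock = coordinator.get_lock(('ipam-%s-%s' % (subnet_id, ip)).encode())
    with lock:
        return allocate_in_db(subnet_id, ip)
```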
14:13:25 <slaweq> I was thinking about this spec more as an overall description of the problem, and wanted to see the implementation details as a PoC and to check if it will really improve IPAM, but I can also wait for a more detailed spec
14:14:46 <slaweq> mlavalle: I don't have much experience with tooz, but I know that when lucasgomes implemented the "hash ring" mechanism in networking-ovn using tooz, it sped up the process of creating trunk subports by about 10 times or something like that, so it may help with such problems for sure
14:15:21 <njohnston> I don't think there has to be a deep deep description, just a few sentences laying out the approach so it's transparent to the community why this is a superior approach
14:15:22 <liuyulong> Moreover, this will introduce a new lock store? What if the lock DB is down? No port can be created successfully?
14:15:50 <mlavalle> I am not doubting tooz. I just want evidence that it has a chance to help indeed
14:16:29 <mlavalle> and the ovn experience you mention is a good data point. thanks for mentioning it
14:16:34 <ralonsoh> liuyulong, I don't think this relies on the same DB, but I'm not sure
14:16:43 <slaweq> mlavalle: I understand, and IMO the best evidence would be if we had a PoC and could compare it with the current driver :)
14:17:06 <ralonsoh> slaweq, agree
14:17:12 <brinzhang> slaweq: I agree.
14:17:37 <liuyulong> ralonsoh, tooz can use various store drivers, like redis, mysql and so on.
14:17:39 <ralonsoh> (and a more detailed description in the spec)
14:17:51 <njohnston> If the authors wanted to work out the details in a PoC and then update the spec with a concise description of the approach once they have worked it out, I think that would allay my concerns about transparency
14:17:54 <ralonsoh> liuyulong, I know, just guessing
14:18:40 <slaweq> njohnston++
14:19:03 <amotoki> njohnston: agree
14:20:05 <brinzhang> qinhaizhong01: njohnston: I was thinking about this spec more like some overall description of problem and wanted to see implementation details as PoC and to check if it will really improve IPAM, but I can also wait for some more detailed spec
14:20:05 <brinzhang> mlavalle: I don't have much experience with tooz but I know that when lucasgomes implemented some "hash ring" mechanism in networking-ovn using tooz, it speed up process of creating trunk subports about 10 times or something like that, so it may help with such problems for sure
14:21:13 <mlavalle> Performance and scalability improvement is very important for Neutron
14:21:24 <qinhaizhong01> Based on the original _generate_ips algorithm, this method will be reimplemented. The IPs calculated by _generate_ips will use "ip+subnet_id" as the key for the distributed lock.
14:22:00 <mlavalle> So I like this proposal in principle, even in an exploratory fashion
14:22:10 <liuyulong> Introducing such centralized components always increases the deployment difficulty of operation and maintenance. (I'm not saying this is not acceptable, just some thoughts.)
14:22:35 <amotoki> I would like to see why and how a distributed lock addresses the problem in the spec.
14:23:04 <mlavalle> with that in mind and seeing the feedback from the team, this is what we propose:
14:23:23 <mlavalle> 1) We approve the RFE today with the understanding that
14:24:02 <mlavalle> 2) A more detailed spec will be proposed with the feedback from the team today
14:24:46 <mlavalle> 3) We will see the code as a PoC. We will use the experience of the PoC to feed back on the spec if needed
14:24:58 <njohnston> +1
14:25:05 <mlavalle> 4) The code will come with Rally tests, so we can measure the improvement
14:26:17 <mlavalle> IMO, we should welcome experimenting
14:26:25 <mlavalle> what do others think?
14:26:34 <liuyulong> +1
14:26:45 <amotoki> it totally makes sense to me
14:26:48 <slaweq> +!
14:26:50 <yamamoto> +1
14:26:51 <slaweq> +1
14:26:54 <njohnston> I really like that approach
14:26:55 <brinzhang> +1
14:27:04 <ralonsoh> +1
14:28:35 <mlavalle> brinzhang, qinhaizhong01: let me be clear. we are heading towards the final milestone of the cycle, and we are taking this proposal as a PoC. So the chances of this merging in Train are rather slim. ok?
14:30:31 <mlavalle> I'll take the silence as acquiescence :-)
14:31:55 <mlavalle> I'll approve the RFE at the end of the meeting, adding the four points ^^^^ in the comments section
14:32:06 <mlavalle> Let's move on then
14:32:29 <mlavalle> Next one we have today is https://bugs.launchpad.net/neutron/+bug/1837847
14:32:30 <openstack> Launchpad bug 1837847 in neutron "[RFE] neutron-vpnaas OpenVPN driver" [Undecided,New]
14:33:07 <mlavalle> I am not sure we should discuss it today. I don't understand all the details in this proposal
14:33:47 <mlavalle> I am bringing it up today with the hope that amotoki and yamamoto, who know more about VPN, could comment on it and help us triage it
14:33:57 <brinzhang> mlavalle: what is confusing you?
14:34:03 <qinhaizhong01> What details?
14:34:31 <mlavalle> I am talking now about another RFE, not yours, brinzhang and qinhaizhong01
14:36:27 <amotoki> I haven't looked at the OpenVPN driver RFE...
14:36:29 <slaweq> I'm not a vpnaas expert so it's hard for me to talk about it
14:36:58 <njohnston> When he says his use case is to have broadcast/multicast communication with the instances, does that mean the VPN IP needs to be within the same L2 domain?
14:37:49 <mlavalle> good question to ask in the RFE
14:37:56 <njohnston> If so, then I don't see how going through Neutron IPAM can be avoided, whether it's in a pre-reservation capacity or on-demand.
14:38:13 * njohnston posts the question
14:38:38 <mlavalle> Thanks!
14:39:14 <amotoki> njohnston is much faster than me.
14:39:21 <liuyulong> Looks like it will be involved with DHCP? A spec with some detail is needed also.
14:39:49 <amotoki> At a glance, I cannot understand that point and am thinking....
14:40:13 <amotoki> liuyulong: perhaps we need to understand what the actual problem is first, before a spec.
14:41:45 <liuyulong> A spec always has the section "Problem Description" : )
14:41:48 <mlavalle> ok, let's post questions in the RFE and see if we can move it forward
14:42:23 <mlavalle> anything else we should discuss today?
14:42:40 <liuyulong> I have one
14:43:01 <liuyulong> https://bugs.launchpad.net/neutron/+bug/1817881
14:43:02 <openstack> Launchpad bug 1817881 in neutron "[RFE] L3 IPs monitor/metering via current QoS functionality (tc filters)" [Wishlist,In progress] - Assigned to LIU Yulong (dragon889)
14:43:07 <liuyulong> We discussed this once in a drivers meeting.
14:43:18 <liuyulong> But we have no result.
14:43:42 <liuyulong> Basically we have a consensus on adding a new L3 agent extension.
14:43:55 <liuyulong> But it is not approved yet.
14:44:51 <slaweq> one question: is it still a proposal only for FIPs with QoS set?
14:45:48 <liuyulong> slaweq, it relies on that, the tc statistics.
14:46:11 <slaweq> can't we use tc statistics without QoS enabled for the FIP?
14:46:51 <slaweq> it doesn't look like something user friendly: "do You want metering? You need to set QoS on the FIP" :)
14:46:52 <liuyulong> I'm not sure, but I can, once tc filters accept 0 for rate and burst.
14:47:09 <liuyulong> But for now, we may need a very large value for it as a default.
14:47:39 <slaweq> if we want to have metering enabled, it should be done for all FIPs/routers handled by the L3 agent
14:47:46 <ralonsoh> slaweq, we need to measure the possible performance impact
14:48:03 <slaweq> and it should be independent of the QoS feature
14:48:12 <ralonsoh> if we are adding a TC filter on an interface without needing it, this can slow it down
14:48:18 <liuyulong> Bandwidth limitation and metering of public IPs is a basic rule.
14:49:00 <amotoki> liuyulong: do you mean all deployments should have bandwidth limitation and metering of public IPs?
14:49:04 <mlavalle> but we only need the TC filter if the operator configures metering, right?
14:49:18 <mlavalle> in this scenario I mean
14:49:23 <ralonsoh> mlavalle, right now yes
14:49:42 <ralonsoh> sorry, we have TC filters if we ask for QoS
14:49:51 <liuyulong> FYI, the previous discussion: http://eavesdrop.openstack.org/meetings/neutron_drivers/2019/neutron_drivers.2019-03-29-14.00.log.html
14:50:49 <slaweq> mlavalle: yes, and IMO it should be: "You want metering - You should enable metering in the config file", and not "You want metering - You should enable QoS for the FIP with some custom, very high value for the bw limit" :)
14:51:11 <liuyulong> mlavalle, yes, a tc filter rule is enough. It has accurate statistics data.
14:51:55 <mlavalle> so will slaweq's point be addressed?
14:52:40 <liuyulong> amotoki, no, it is not a compulsory requirement; I mean it's more like a deployment consensus.
14:53:41 <liuyulong> So, what value will be considered large enough for the filter default?
14:53:54 <liuyulong> 10Gbps?
14:53:56 <liuyulong> 40Gbps?
14:54:43 <ralonsoh> ConnectX-6 HCAs with 200Gbps
14:54:53 <ralonsoh> from the Mellanox website
14:55:19 <liuyulong> Haha, it seems that there will always be higher values.
14:58:06 <slaweq> there is a spec for this, I will try to review it in the next few days, will that be ok liuyulong?
14:58:35 <liuyulong> slaweq, OK, let me paste it here.
14:58:47 <liuyulong> #link https://review.opendev.org/#/c/658511/
14:58:54 <slaweq> liuyulong: thx
14:59:00 <liuyulong> Title "L3 agent self-service metering"
14:59:05 <mlavalle> ok, let's all review the spec and we'll start with this RFE next week
14:59:20 <slaweq> mlavalle++
14:59:40 <mlavalle> Have a nice weekend
14:59:48 <mlavalle> #endmeeting
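The metering idea discussed above relies on the byte/packet counters that tc already keeps for filters with a police action. Purely as an illustration (not from the spec under review), a minimal sketch of reading those counters from a router namespace might look like the following; the device name, the qdisc parent, and the exact "Sent ... bytes ... pkt" output format are assumptions about the tc CLI rather than details from the proposal.

```python
# Rough sketch: read the byte/packet counters that tc reports for policed
# filters on a router gateway/FIP interface.  Device name, namespace, qdisc
# parent, and the "Sent N bytes M pkt" output format are assumptions.
import re
import subprocess


def get_fip_counters(device, namespace=None):
    # '-s' asks tc to include statistics; 'parent ffff:' targets the ingress
    # qdisc, which is one place such filters might be attached (assumption).
    cmd = ['tc', '-s', 'filter', 'show', 'dev', device, 'parent', 'ffff:']
    if namespace:
        cmd = ['ip', 'netns', 'exec', namespace] + cmd
    output = subprocess.check_output(cmd, encoding='utf-8')
    # Statistics lines typically look like:
    #   Sent 12345 bytes 67 pkt (dropped 0, overlimits 0 ...)
    match = re.search(r'Sent (\d+) bytes (\d+) pkt', output)
    if not match:
        return None
    return {'bytes': int(match.group(1)), 'packets': int(match.group(2))}
```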