14:00:10 <mlavalle> #startmeeting neutron_drivers
14:00:11 <openstack> Meeting started Fri Jul 26 14:00:10 2019 UTC and is due to finish in 60 minutes.  The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 <openstack> The meeting name has been set to 'neutron_drivers'
14:00:17 <slaweq> hi
14:00:23 <mlavalle> hey
14:00:25 <yamamoto> hi
14:00:34 <amotoki> hi
14:00:43 <ralonsoh> hi
14:00:58 <liuyulong> hello
14:01:11 <mlavalle> haleyb|away: is off on vacation, so we are good to go
14:01:19 <mlavalle> #topic RFEs
14:02:11 <mlavalle> Good to see yamamoto is recovered from his surgery. \o/
14:02:22 <amotoki> \o/
14:02:24 <yamamoto> thank you
14:02:26 <njohnston> o/
14:02:54 <mlavalle> This is the RFE to be discussed today: https://bugs.launchpad.net/neutron/+bug/1836834
14:02:55 <openstack> Launchpad bug 1836834 in neutron "[RFE] introduce distributed locks to ipam" [Wishlist,Confirmed] - Assigned to qinhaizhong (qinhaizhong)
14:03:20 <mlavalle> and its associated spec: https://review.opendev.org/#/c/657221/
14:06:07 <brinzhang> this spec already has a +2, maybe it needs others to propose changes or a W+1
14:06:11 <njohnston> I will say that the spec is lacking a bit of detail on the basic operation of this proposal.  For example, we are implementing distributed locks, but what precisely is getting locked?  If a process fails to release a lock, how is that sensed and remedied?
14:06:18 <slaweq> for me it looks good as separate driver
14:07:24 <njohnston> Is the lock on a per-subnet basis or is it for all IPAM IP allocation?  Is there a fallback mode if the backend tooz points to does not respond in a timely manner?
14:07:57 <brinzhang> I think these implementation details will be reflected in the code logic; this is a new driver that you can configure to use.
14:09:30 <njohnston> brinzhang: I understand, and there are many details that will be determined at implementation.  But the spec says "We will fix IP allocation conflicts by using locks" without saying *how* using locks fixes IP allocation conflicts.
14:10:14 <liuyulong> brinzhang, you may give us some brief summary about the implementation. : ) njohnston's concern is worth clarifying.
14:10:28 <brinzhang> Based on subnet-id + ip
14:10:29 <ralonsoh> I agree with njohnston . We can be relaxed with the spec description, but something more detailed could be defined in the document
14:11:00 <amotoki> njohnston: good point. the spec should clarify a basic approach and how the problem can be solved.
14:11:10 <brinzhang> njohnston: qinhaizhong will be in attendance, please wait.
14:12:56 <mlavalle> the other question is, how much experience does the team proposing this spec have with tooz? I mean, have you solved previous scale up problems using it?
14:13:25 <slaweq> I was thinking about this spec more like some overall description of problem and wanted to see implementation details as PoC and to check if it will really improve IPAM, but I can also wait for some more detailed spec
14:14:46 <slaweq> mlavalle: I don't have much experience with tooz, but I know that when lucasgomes implemented some "hash ring" mechanism in networking-ovn using tooz, it sped up the process of creating trunk subports about 10 times or something like that, so it may help with such problems for sure
14:15:21 <njohnston> I don't think there has to be a deep deep description, just a few sentences laying out the approach so it's transparent to the community why this is a superior approach
14:15:22 <liuyulong> Moreover, this will introduce a new lock store? What if the lock DB is down? No port can be created successfully?
14:15:50 <mlavalle> I am not doubting tooz. I just want evidence that it has chances to help indeed
14:16:29 <mlavalle> and the ovn experience you mention is a good data point. thanks for mentioning it
14:16:34 <ralonsoh> liuyulong, I don't think this relies on the same DB, but I'm not sure
14:16:43 <slaweq> mlavalle: I understand, and IMO best evidence would be if we would have PoC and could compare it with current driver :)
14:17:06 <ralonsoh> slaweq, agree
14:17:12 <brinzhang> slaweq: I agree.
14:17:37 <liuyulong> ralonsoh, tooz can use various store drivers, like redis, mysql and so on.
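    (For context only, a minimal sketch of how a tooz coordinator is obtained for different backends; the backend URLs and member id below are illustrative assumptions, not values from the spec:)

        # Minimal sketch: creating a tooz coordinator against different backends.
        from tooz import coordination

        # Redis backend (assumed host/port):
        coord = coordination.get_coordinator('redis://127.0.0.1:6379',
                                             b'neutron-ipam-worker-1')
        # A MySQL backend could be used instead, e.g.:
        # coord = coordination.get_coordinator(
        #     'mysql://user:pass@127.0.0.1:3306/locks', b'neutron-ipam-worker-1')

        coord.start()
        try:
            with coord.get_lock(b'some-resource'):
                pass  # critical section protected by the distributed lock
        finally:
            coord.stop()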
14:17:39 <ralonsoh> (and a more detailed description in the spec)
14:17:51 <njohnston> If the authors wanted to work out the details in a POC and then update the spec with a concise description of the approach once they have worked it out, I think that would allay my concerns about transparency
14:17:54 <ralonsoh> liuyulong, I know, just guessing
14:18:40 <slaweq> njohnston++
14:19:03 <amotoki> njohnston: agree
14:21:13 <mlavalle> Performance and scalability improvement is very important for Neutron
14:21:24 <qinhaizhong01> We will reimplement this method based on the original _generate_ips algorithm. Each IP calculated by _generate_ips will use "ip+subnet_id" as the key for the distributed lock.
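    (A rough sketch of what that could look like; the helper name, the coordinator handle and the lock-key format are assumptions drawn from the discussion above, not the proposed implementation:)

        # Rough sketch of locking an IP allocation, assuming a tooz
        # coordinator ("coord") has already been started elsewhere.
        def allocate_ip_with_lock(coord, subnet_id, candidate_ip, do_allocate):
            # One distributed lock per (ip, subnet) pair, so two workers
            # cannot hand out the same address in the same subnet at once.
            lock_name = ('%s-%s' % (candidate_ip, subnet_id)).encode('utf-8')
            with coord.get_lock(lock_name):
                # do_allocate() would re-check availability and persist the
                # allocation while the lock is held.
                return do_allocate(candidate_ip)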
14:22:00 <mlavalle> So I like in principle this proposal, even in an exploratory fashion
14:22:10 <liuyulong> Introducing such centralized components always increases the deployment, operation and maintenance difficulties. (I'm not saying this is not acceptable, just some thoughts.)
14:22:35 <amotoki> I would like to see why and how a distributed lock addresses the problem in the spec.
14:23:04 <mlavalle> with that in mind and seeing the feedback from the team, this is what we propose:
14:23:23 <mlavalle> 1) We approve the RFE today with the understanding that
14:24:02 <mlavalle> 2) A more detailed spec will be proposed with the feedback from the team today
14:24:46 <mlavalle> 3) We will see the code as a PoC. We will use the experience of the PoC to feedback on the spec if needed
14:24:58 <njohnston> +1
14:25:05 <mlavalle> 4) The code will come with Rally tests, so we can measure improvement
14:26:17 <mlavalle> IMO, we should welcome experimenting
14:26:25 <mlavalle> what do others think?
14:26:34 <liuyulong> +1
14:26:45 <amotoki> it totally makes sense to me
14:26:48 <slaweq> +1
14:26:50 <yamamoto> +1
14:26:54 <njohnston> I really like that approach
14:26:55 <brinzhang> +1
14:27:04 <ralonsoh> +1
14:28:35 <mlavalle> brinzhang, qinhaizhong01: let me be clear. we are heading towards the final milestone of the cycle. And we are taking this proposal as a PoC. So the chances of this merging in Train are rather slim. ok?
14:30:31 <mlavalle> I'll take the silence as acquiescence :-)
14:31:55 <mlavalle> I'll approve the RFE at the end of the meeting, adding the four points ^^^^ in the comments section
14:32:06 <mlavalle> Let's move on then
14:32:29 <mlavalle> Next one we have today is https://bugs.launchpad.net/neutron/+bug/1837847
14:32:30 <openstack> Launchpad bug 1837847 in neutron "[RFE] neutron-vpnaas OpenVPN driver" [Undecided,New]
14:33:07 <mlavalle> I am not sure we should discuss it today. I don't understand all the details in this proposal
14:33:47 <mlavalle> I am bringing it up today with the hope that amotoki and yamamoto who know more about VPN could comment on it and help us triage it
14:33:57 <brinzhang> mlavalle: what is confusing you?
14:34:03 <qinhaizhong01> What details?
14:34:31 <mlavalle> I am talking now about another RFE, not yours brinzhang and qinhaizhong01
14:36:27 <amotoki> I haven't looked at the OpenVPN driver RFE...
14:36:29 <slaweq> I'm not vpnaas expert so it's hard for me to talk about it
14:36:58 <njohnston> When he says his use case is to have broadcast/multicast communication with the instances, does that mean the vpn IP needs to be within the same L2 domain?
14:37:49 <mlavalle> good question to ask in the RFE
14:37:56 <njohnston> If so, then I don't see how going through Neutron IPAM can be avoided, whether it's in a pre-reservation capacity or on-demand.
14:38:13 * njohnston posts the question
14:38:38 <mlavalle> Thanks!
14:39:14 <amotoki> njohnston is much faster than me.
14:39:21 <liuyulong> Looks like it will involve DHCP? A spec with some detail is needed also.
14:39:49 <amotoki> At a glance, I cannot understand that point and am thinking....
14:40:13 <amotoki> liuyulong: perhaps we need to understand what is the actual problem first before a spec.
14:41:45 <liuyulong> A spec always has the section "Problem Description" : )
14:41:48 <mlavalle> ok, let's post questions in the RFE and see if we can move it forward
14:42:23 <mlavalle> anything else we should discuss today?
14:42:40 <liuyulong> I have one
14:43:01 <liuyulong> https://bugs.launchpad.net/neutron/+bug/1817881
14:43:02 <openstack> Launchpad bug 1817881 in neutron " [RFE] L3 IPs monitor/metering via current QoS functionality (tc filters)" [Wishlist,In progress] - Assigned to LIU Yulong (dragon889)
14:43:07 <liuyulong> We discussed this once in drivers meeting.
14:43:18 <liuyulong> But we have no result.
14:43:42 <liuyulong> Basically we have a consensus on adding a new L3 agent extension.
14:43:55 <liuyulong> But it is not approved yet.
14:44:51 <slaweq> one question: is it still a proposal only for FIPs with QoS set?
14:45:48 <liuyulong> slaweq, it relies on that, the tc statistics.
14:46:11 <slaweq> can't we use tc statistics without QoS enabled for FIP?
14:46:51 <slaweq> it doesn't look like something user friendly: "if you want metering, you need to set QoS on FIP" :)
14:46:52 <liuyulong> I'm not sure, but we could, once tc filters accept 0 for rate and burst.
14:47:09 <liuyulong> But now, we may need a very large value for it as default.
14:47:39 <slaweq> if we want to have metering enabled, it should be done for all FIPs/routers handled by L3 agent
14:47:46 <ralonsoh> slaweq, we need to measure the possible performance impact
14:48:03 <slaweq> and should be independent of QoS feature
14:48:12 <ralonsoh> if we add a TC filter on an interface without needing it, this can slow it down
14:48:18 <liuyulong> Bandwidth limitation and metering of public IP is a basic rule.
14:49:00 <amotoki> liuyulong: do you mean all deployments should have bandwidth limitation and metering of public IPs?
14:49:04 <mlavalle> but we only need the TC filter if the operator configures metering, right?
14:49:18 <mlavalle> in this scenario I mean
14:49:23 <ralonsoh> mlavalle, right now yes
14:49:42 <ralonsoh> sorry, we have TC filters if we ask for QoS
14:49:51 <liuyulong> FYI, the discuss once: http://eavesdrop.openstack.org/meetings/neutron_drivers/2019/neutron_drivers.2019-03-29-14.00.log.html
14:50:49 <slaweq> mlavalle: yes, and IMO it should be like that: "You want metering - You should enable metering in config file" and not "You want metering - You should enable QoS for FIP with some custom, very high value for bw limit" :)
14:51:11 <liuyulong> mlavalle, yes, a tc filter rule is enough. It has accurate statistics data.
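    (As an illustration only, per-filter counters like this can be read with "tc -s"; the qg- device name and the loose parsing below are assumptions, not necessarily what the proposed l3 agent extension would do:)

        # Illustration: reading per-filter byte/packet counters via "tc -s".
        import subprocess

        def get_tc_filter_stats(device='qg-xxxxxxxx-xx'):
            out = subprocess.check_output(
                ['tc', '-s', 'filter', 'show', 'dev', device],
                universal_newlines=True)
            # Lines such as "Sent <bytes> bytes <pkts> pkt ..." follow each
            # filter that carries statistics (e.g. a police action).
            stats = []
            for line in out.splitlines():
                line = line.strip()
                if line.startswith('Sent '):
                    parts = line.split()
                    stats.append({'bytes': int(parts[1]),
                                  'packets': int(parts[3])})
            return stats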
14:51:55 <mlavalle> so will slaweq's point be addressed?
14:52:40 <liuyulong> amotoki, no, it is not a compulsory requirement, I mean it's more like a deployment consensus.
14:53:41 <liuyulong> So, what value will be considered as very large for the filter default?
14:53:54 <liuyulong> 10Gbps?
14:53:56 <liuyulong> 40Gbps?
14:54:43 <ralonsoh> ConnectX-6 HCAs with 200Gbps
14:54:53 <ralonsoh> from the Mellanox website
14:55:19 <liuyulong> Haha, it seems that there will always be higher values.
14:58:06 <slaweq> there is a spec for this, I will try to review it in the next few days, will that be ok liuyulong?
14:58:35 <liuyulong> slaweq, OK, let me paste it here.
14:58:47 <liuyulong> #link https://review.opendev.org/#/c/658511/
14:58:54 <slaweq> liuyulong: thx
14:59:00 <liuyulong> Title "L3 agent self-service metering"
14:59:05 <mlavalle> ok, let's all review the spec and we start with this RFE next week
14:59:20 <slaweq> mlavalle++
14:59:40 <mlavalle> Have a nice weekend
14:59:48 <mlavalle> #endmeeting