14:00:33 <slaweq> #startmeeting neutron_drivers
14:00:34 <openstack> Meeting started Fri Apr 17 14:00:33 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:35 <njohnston> o/
14:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:38 <openstack> The meeting name has been set to 'neutron_drivers'
14:00:40 <slaweq> hi
14:00:41 <mlavalle> o/
14:00:51 <haleyb> hi
14:01:29 <amotoki> hi
14:01:30 <ralonsoh> hi
14:01:31 <slaweq> let's wait a few more minutes for yamamoto and amotoki
14:01:34 <slaweq> ok, hi amotoki :)
14:01:39 <yamamoto> hi
14:01:47 <slaweq> hi yamamoto
14:01:52 <slaweq> ok, so we can start
14:01:54 <slaweq> #topic RFEs
14:02:00 <slaweq> we have 3 RFEs for today
14:02:07 <slaweq> https://bugs.launchpad.net/neutron/+bug/1869129
14:02:07 <openstack> Launchpad bug 1869129 in neutron "neutron accepts CIDR in security groups that are invalid in ovn" [Undecided,In progress] - Assigned to Jake Yip (waipengyip)
14:02:28 <slaweq> we discussed it last time but we didn't reach any agreement on it
14:03:53 <slaweq> my summary of our last discussion is in comment #10, and there are some new comments from the owner in comment #11
14:05:04 <njohnston> Yeah, I would like to drop my proposal to add an api extension around /0 netmasks - if someone asks for it that's fine but I'd rather not solve a problem that doesn't really exist.
14:06:10 <slaweq> I was thinking about it too and I'm not sure if we should change the behaviour of the API with a config option (extension)
14:06:55 <amotoki> I missed the last meeting. IMHO it looks better to continue accepting a non-network address with a netmask as a CIDR, so as not to break existing tools, and basically it should be handled in the driver layer.
14:08:10 <mlavalle> njohnston: +1
14:08:32 <njohnston> yeah, I think the idea that the OVN driver code would silently translate the addresses OVN can't handle into addresses it can makes the most sense from a compatibility perspective
14:08:33 <slaweq> amotoki: I also lean towards this option: fix it on the drivers' side, without changing the existing API
14:09:02 <slaweq> and maybe some doc update to explain exactly what the implications of using such cidrs are
14:09:23 <amotoki> another idea/compromise which comes to mind is that we can convert such a CIDR into a correct network address when it is stored in the DB. it might affect existing tools less, perhaps.
14:09:58 <slaweq> amotoki: it's what haleyb proposed last time
14:10:39 <ralonsoh> I don't agree with "hiding" a possible mistake in the user input
14:10:57 <ralonsoh> if we consider the current input is correct, then we should store it as-is in the DB
14:11:11 <haleyb> i am flexible, we know we have to at least fix the driver; i was wondering whether not changing the current code would lead to duplicate SG rules?
14:12:09 <amotoki> regarding a check for duplicate SG rules, I think we should improve the check.
14:12:48 <amotoki> Duplicated rules are not what we expect
14:12:54 <haleyb> ralonsoh: i don't think the input is incorrect, it's in a grey area, it's just that it won't be used as given, that was why i'd prefer changing it
14:13:06 <njohnston> amotoki: My main concern is that if a user creates a security group rule for 1.2.3.4/24, then when they later list the sg rules it should still say 1.2.3.4/24, which means that we have to store it in the DB the way the user submitted it
14:13:28 <ralonsoh> exactly
14:13:28 <njohnston> otherwise a user can't be sure the rule they created is actually in the DB
14:13:46 <slaweq> njohnston: I agree with that
14:13:50 <haleyb> njohnston: right, that is the only issue, guess no one is using IPv6 enough to notice my change :)
14:13:58 <amotoki> njohnston: ralonsoh: I understand the concern.
14:14:12 <ralonsoh> if 1.2.3.4/24 and 1.2.3.10/24 are duplicate rules in the backend, we need to handle that there, in the driver
14:14:17 <slaweq> maybe we could add a new attribute to the rule, like "cidr", and store the "fixed" cidr there
14:14:19 <amotoki> if the neutron API returns a different response, it would be confusing.
14:14:31 <slaweq> so e.g. for 1.2.3.4/24, in this new field we would store 1.2.3.0/24
14:14:35 <ralonsoh> either we keep the current API or we modify it, adding a rule check
14:14:41 <ralonsoh> but nothing in the middle
14:14:52 <amotoki> /24 means we only care about the first 24 bits
14:14:59 <slaweq> this new field would be read-only
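For illustration, a minimal sketch of the normalization being discussed, using Python's standard ipaddress module (the helper name is hypothetical, not actual Neutron code): 1.2.3.4/24 becomes 1.2.3.0/24, and 1.2.3.10/24 normalizes to the same network, which is why a backend can see such rules as duplicates.

    import ipaddress

    def normalize_cidr(cidr):
        """Return the network-address form of a CIDR, e.g. 1.2.3.4/24 -> 1.2.3.0/24."""
        # strict=False accepts host bits being set and zeroes them out
        # instead of raising ValueError.
        return str(ipaddress.ip_network(cidr, strict=False))

    print(normalize_cidr("1.2.3.4/24"))      # 1.2.3.0/24
    print(normalize_cidr("1.2.3.10/24"))     # 1.2.3.0/24 -- same effective rule
    print(normalize_cidr("2001:db8::1/64"))  # 2001:db8::/64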
14:16:10 <njohnston> I think the idea of a 'normalized cidr' is a good one
14:16:23 <amotoki> +1
14:16:33 <ralonsoh> but you rejected this idea last week!
14:16:50 <mlavalle> that's ok, people can change opinions
14:17:01 <mlavalle> I do all the time
14:17:09 <ralonsoh> then cgoncalves can reply
14:17:11 * njohnston is very good at changing opinions except when he isn't
14:17:29 <slaweq> ralonsoh: sorry, but I probably missed it somewhere last week
14:17:55 <njohnston> I like the idea of a net new column because then we have uniform behavior for all drivers
14:18:03 <njohnston> so compatibility is maintained
14:18:04 <slaweq> ralonsoh: but I really don't remember a proposal to add a new field to the SG rule
14:18:05 <haleyb> so then the drivers would get the rule with this normalized cidr, correct?
14:18:11 <cgoncalves> my view from the last time this was discussed still stands. I am just a single voice and a Neutron user, not a developer
14:18:21 <ralonsoh> we agreed to keep the API as is and fix the problem we have in OVN
14:18:22 <ralonsoh> that's all
14:18:30 <cgoncalves> ralonsoh, +1
14:18:33 <mlavalle> even
14:18:35 <ralonsoh> adding a "rule checker" will change the API
14:18:46 <mlavalle> ptls are allowed to change opinions
14:18:53 <slaweq> ralonsoh: I agree not to change the existing API, but adding a new field shouldn't be a big problem
14:18:54 <njohnston> yes, this would not change the API except to add an extra field returned
14:18:55 <ralonsoh> or "hiding" the input rule, normalizing it
14:19:11 <amotoki> the new field proposed is almost a duplicate of the existing cidr field, but it clarifies what neutron actually recognizes.
14:19:44 <mlavalle> makes the actual behavior visible
14:19:53 <slaweq> amotoki: yes, it's something like being "more verbose" to the user - you set 1.2.3.4/24 and we keep that, but effectively we will use 1.2.3.0/24
14:20:12 <ralonsoh> we can do this on the fly, we don't need to store anything
14:20:23 <njohnston> and that's the thing - if we are going to normalize things it would be good not to make a secret of it.  that will also make it very obvious to anyone who is looking that a rule resolves to 0.0.0.0/0 or whatever
14:20:29 <haleyb> right, when the driver asks we can convert
14:21:04 <slaweq> yes, we can, and we are doing it indirectly e.g. in iptables driver now (iptables is doing it for us IIUC)
14:21:20 <slaweq> but with this new field we could be more verbose for users
14:22:01 <mlavalle> it's not necessarily predictable behavior, but it becomes visible
14:25:21 <slaweq> so here are possible options for this bug:
14:25:27 <cgoncalves> to keep in mind: unless the OVN bug is addressed in its driver (i.e. without relying on the normalization from neutron), the OVN bug will continue to exist in stable branches given that the normalization patch wouldn't be backportable
14:25:57 <slaweq> 1. fix it only on ovn driver, and silently convert cidr from rule to be good for ovn,
14:26:34 <slaweq> 2. add a new api field and use it e.g. in the ovn driver - and in fact other drivers like iptables or ovs could probably use this new field too
14:27:02 <slaweq> I think that the other possibilities have already been rejected, right?
14:27:15 <cgoncalves> slaweq, is the OVN bug present in stable branches?
14:27:34 <slaweq> cgoncalves: I'm not sure, but probably yes
14:27:54 <njohnston> slaweq: yes I think that is what we have narrowed it down to
14:27:55 <cgoncalves> slaweq, ok. if it is, option 2 won't help for stable branches
14:28:27 <cgoncalves> may I politely ask why someone doesn't just fix the OVN driver and be done with it?
14:28:29 <yamamoto> was "3. no change in api/db, fix ovn" rejected?
14:28:29 <slaweq> cgoncalves: for stable branches we can fix it with silent conversion on backend side
14:28:56 <cgoncalves> yamamoto, +1000000
14:29:13 <slaweq> yamamoto: it's point 1) from my summary
14:29:45 <haleyb> i think 1 is two parts - 1a) fix the OVN driver (required), and 1b) change the API/RPC to give the normalized CIDR to drivers
14:29:47 <yamamoto> "silently convert" in ovn?
14:30:20 <haleyb> 1a can be backported easily
14:30:23 <slaweq> ok, so I think I wasn't clear, by silently convert I meant conversion on the ovn driver's side to apply what it can accept
14:30:39 <slaweq> sorry for not being clear
14:30:48 <slaweq> my 1) is exactly what yamamoto proposed
14:30:55 <yamamoto> ok
14:30:56 <haleyb> slaweq: +1 on that, which is what the iptables driver does for a few things
14:31:07 <cgoncalves> slaweq, ok, I agree with you. it is the same yamamoto suggested
14:31:20 <njohnston> I don't know if the openvswitch developers will be willing to fix OVN to accept what they may consider to be "incorrect" CIDRs, but even if they do, we don't know how long it will take for that code to be released or which versions customers may be using.
14:31:52 <njohnston> So that is why we are working around the OVN behavior in the neutron OVN driver by doing a conversion.  The only question is: should we be transparent to the user about what is happening?
14:32:32 <slaweq> njohnston: yes, so IMO we can do it in 2 steps:
14:32:39 <mlavalle> transparency is always the best policy IMO
14:32:48 <slaweq> 1. fix in ovn driver only - and backport this fix to stable branches
14:33:04 <slaweq> 2. add api extension to add this normalized cidr field to SG rule
14:33:12 <njohnston> +100
14:33:18 <slaweq> that will not be backported of course
14:33:20 <mlavalle> in fact transparency should always be baked into our policies
14:33:31 <amotoki> totally agree
14:34:28 <amotoki> the 2nd step would also give us a clear understanding of how drivers should behave.
14:34:56 <amotoki> * to driver developers
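A rough sketch of what a security group rule response could look like with the proposed read-only field (shown as a Python dict; the field name normalized_cidr is only a placeholder here, not an agreed name): the user-supplied remote_ip_prefix is stored and returned untouched, and the extension only exposes what the drivers would effectively apply.

    rule = {
        "direction": "ingress",
        "ethertype": "IPv4",
        "protocol": "tcp",
        "port_range_min": 22,
        "port_range_max": 22,
        "remote_ip_prefix": "1.2.3.4/24",  # exactly what the user submitted
        "normalized_cidr": "1.2.3.0/24",   # hypothetical read-only field, derived server-side
    }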
14:35:28 <slaweq> so let's vote on whether that 2-step solution would be good to go with :)
14:35:37 <njohnston> +1
14:35:46 <amotoki> +1
14:35:47 <mlavalle> I am good
14:36:19 <haleyb> +1
14:37:28 <slaweq> yamamoto: any thoughts?
14:38:25 <yamamoto> ok for me, but i'm afraid the new field could confuse users even more. after all, the problem, if any, belongs in the docs, IMO.
14:38:47 <cgoncalves> yamamoto, +1
14:39:37 <slaweq> ok, so I will sum up what we agreed here in the bug and mark it as an approved RFE
14:39:47 <slaweq> thx for great discussion
14:40:01 <slaweq> next one
14:40:03 <slaweq> https://bugs.launchpad.net/neutron/+bug/1870319
14:40:03 <openstack> Launchpad bug 1870319 in neutron "[RFE] Network cascade deletion API call" [Wishlist,Triaged] - Assigned to Slawek Kaplonski (slaweq)
14:40:26 <slaweq> this was in fact proposed by dulek from Kuryr team and we talked about it in Shanghai IIRC
14:42:22 <amotoki> IIRC the main problem is the number of API calls and most of them are related to port deletion.
14:42:36 <slaweq> I will probably need to write a spec describing all the possible resources which should be deleted together with such a force network delete
14:42:51 <slaweq> amotoki: yes, that's the problem for Kuryr basically
14:43:12 <amotoki> roughly speaking, there seem to be two ways: (1) cascade network deletion (2) bulk port delete
14:43:14 <slaweq> as they need to make many API calls to clean up
14:43:25 <amotoki> (2) would reduce most API calls
14:44:24 <amotoki> of course the cascade network deletion would be more friendly to API consumers, but it might be more complicated than bulk port deletion.
14:44:30 <slaweq> seems that You are right, I can ask dulek if bulk port deletion would solve their problem
14:46:10 <njohnston> ++
14:46:43 <slaweq> I pinged dulek, maybe he will join us soon
14:47:05 <amotoki> both options have the problem of what response code should be returned when we hit a partial deletion failure.
14:47:37 <slaweq> amotoki: I think that nova supports bulk vm deletion, no?
14:47:56 <amotoki> I am not sure at the moment
14:48:32 <amotoki> it looks like "nova" CLI supports it but the API does not.
14:48:47 <slaweq> ahh, ok
14:49:08 <slaweq> hi dulek :)
14:49:10 <dulek> o/
14:49:20 <slaweq> we are talking about https://bugs.launchpad.net/neutron/+bug/1870319
14:49:20 <openstack> Launchpad bug 1870319 in neutron "[RFE] Network cascade deletion API call" [Wishlist,Triaged] - Assigned to Slawek Kaplonski (slaweq)
14:49:59 <dulek> Right! So what's the problem? Need any info about the use case we have in Kuryr?
14:49:59 <slaweq> and we would like to know if bulk port deletion would solve Kuryr's biggest issue related to this cleanup
14:50:30 <dulek> Ha, the answer is most likely not what you wanted to hear - not really.
14:50:34 <dulek> Or partially.
14:51:12 <dulek> This is because when we're cleaning up a network in Kuryr, ports are subports of a trunk.
14:51:27 <dulek> So we cannot delete them before removing them from trunk.
14:51:59 <dulek> So bulk delete would improve stuff a bit, but we'd still get multiple calls here.
14:52:08 <dulek> If I'm not mistaken we need to remove them from a trunk one-by-one?
14:52:52 <amotoki> IIUC we need to delete subport one-by-one now
14:52:57 <njohnston> any bulk port deletion solution would potentially have to account for some of the ports being trunk ports anyway
14:53:20 <amotoki> I see. even if we implement bulk port delete, it should be cascade port deletion
14:53:41 <dulek> njohnston: True, but currently deleting a port that's a subport will fail.
14:54:12 <dulek> So if bulk delete worked the same way, we would still need to do n calls, where n is the number of ports.
14:56:18 <amotoki> In my understanding, the main problem is the number of API calls and most of them are related to port deletion.
14:56:51 <amotoki> if the bulk port deletion is supported, including cascading port deletion (of subports), it would reduce most of the API calls.
14:57:29 <dulek> That is true.
14:57:32 <njohnston> what if we added an option to port deletion that would allow the deletion behavior to include removing the port from the trunk?  Then the bulk port deletion could be called with that option supplied for all ports.
14:57:57 <amotoki> While we haven't investigated which is simpler, the cascade network delete or the bulk port delete with cascade delete, can either of them address your problem?
14:58:06 <dulek> Basically it doesn't matter too much if we're doing 3 calls or 1 call, but it does if we're doing n + x calls, where n is the number of ports.
14:58:18 <amotoki> of course, I understand the cascade network delete would be better for you.
14:59:04 <dulek> amotoki: Sure, 1 call is better than 3 (bulk port, subnet, network), but either way it's a huge improvement over what we have now.
14:59:13 <njohnston> if we did the bulk port delete with trunk deletion added, then the only remaining steps would be detaching the subnet from the router and removing the network
14:59:51 <dulek> Right, forgot about the router operation. Anyway it's still a constant number of ops, not linear.
15:00:01 <amotoki> njohnston: i think so
15:00:01 <njohnston> yep
15:00:17 <slaweq> njohnston: true, so probably if we had bulk port deletion we would be close to cascade network deletion as well
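To make the call-count argument concrete, here is a rough sketch of the cleanup sequence being discussed, written against Neutron's v2.0 REST paths (the 'neutron' session object and the helper function are hypothetical; the point is only that, without a bulk or cascade delete, the per-port work scales with n while the tail is a constant x):

    # Hypothetical helper: 'neutron' stands for an authenticated REST session whose
    # put()/delete() calls map onto Neutron's v2.0 API paths.
    def cleanup_network(neutron, trunk, ports, router, subnet, network):
        # Per-port work (scales with n): detach each subport, then delete the port.
        for port in ports:
            neutron.put(f"/v2.0/trunks/{trunk['id']}/remove_subports",
                        json={"sub_ports": [{"port_id": port["id"]}]})
            neutron.delete(f"/v2.0/ports/{port['id']}")
        # Constant tail (x calls): router interface, subnet, network.
        neutron.put(f"/v2.0/routers/{router['id']}/remove_router_interface",
                    json={"subnet_id": subnet["id"]})
        neutron.delete(f"/v2.0/subnets/{subnet['id']}")
        neutron.delete(f"/v2.0/networks/{network['id']}")

A bulk port delete that also detaches subports, or a cascade network delete, would collapse the loop into a constant number of calls, which is what makes either option attractive for Kuryr.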
15:00:49 <slaweq> ok, we are over time now
15:00:57 <slaweq> we will get back to this one
15:01:01 <slaweq> thx for attendance
15:01:04 <mlavalle> o/
15:01:06 <amotoki> o/
15:01:07 <ralonsoh> bye
15:01:08 <slaweq> and have a great weekend all :)
15:01:10 <slaweq> o/
15:01:12 <slaweq> #endmeeting