14:00:33 <slaweq> #startmeeting neutron_drivers
14:00:34 <openstack> Meeting started Fri Apr 17 14:00:33 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:35 <njohnston> o/
14:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:38 <openstack> The meeting name has been set to 'neutron_drivers'
14:00:40 <slaweq> hi
14:00:41 <mlavalle> o/
14:00:51 <haleyb> hi
14:01:29 <amotoki> hi
14:01:30 <ralonsoh> hi
14:01:31 <slaweq> let's wait a few more minutes for yamamoto and amotoki
14:01:34 <slaweq> ok, hi amotoki :)
14:01:39 <yamamoto> hi
14:01:47 <slaweq> hi yamamoto
14:01:52 <slaweq> ok, so we can start
14:01:54 <slaweq> #topic RFEs
14:02:00 <slaweq> we have 3 RFEs for today
14:02:07 <slaweq> https://bugs.launchpad.net/neutron/+bug/1869129
14:02:07 <openstack> Launchpad bug 1869129 in neutron "neutron accepts CIDR in security groups that are invalid in ovn" [Undecided,In progress] - Assigned to Jake Yip (waipengyip)
14:02:28 <slaweq> we discussed it last time but we didn't reach any agreement on it
14:03:53 <slaweq> my summary of our last discussion is in comment #10, and there are some new comments from the owner in comment #11
14:05:04 <njohnston> Yeah, I would like to drop my proposal to add an api extension around /0 netmasks - if someone asks for it that's fine, but I'd rather not solve a problem that doesn't really exist.
14:06:10 <slaweq> I was thinking about it too and I'm not sure if we should change the behaviour of the API with a config option (extension)
14:06:55 <amotoki> I missed the last meeting. IMHO it looks better to continue to accept a non-network address with a netmask as a CIDR so as not to break existing tools, and basically it should be handled in the driver layer.
14:08:10 <mlavalle> njohnston: +1
14:08:32 <njohnston> yeah, I think the idea that the OVN driver code would silently translate the addresses OVN can't handle into addresses it can makes the most sense from a compatibility perspective
14:08:33 <slaweq> amotoki: I also tend to choose this option and fix it on the drivers' side, without changes to the existing API
14:09:02 <slaweq> and maybe some doc update to explain exactly what the implications of using such cidrs are
14:09:23 <amotoki> another idea/compromise which comes to mind is that we can convert such a CIDR into a correct network address when it is stored in the DB. it might affect existing tools less, perhaps.
14:09:58 <slaweq> amotoki: that's what haleyb proposed last time
14:10:39 <ralonsoh> I don't agree with "hiding" a possible mistake in the user's input
14:10:57 <ralonsoh> if we consider the current input is correct, then we should store it as-is in the DB
14:11:11 <haleyb> i am flexible, we know we have to at least fix the driver; i was wondering whether not changing the current code would lead to duplicate SG rules?
14:12:09 <amotoki> regarding a check for duplicate SG rules, I think we should improve the check.
14:12:48 <amotoki> Duplicated rules are not what we expect
14:12:54 <haleyb> ralonsoh: i don't think the input is incorrect, it's in a grey area, it's just that it won't be used as given, that was why i'd prefer changing it
14:13:06 <njohnston> amotoki: My main concern is that if a user creates a security group rule for 1.2.3.4/24, when they later list the sg rules it should still say 1.2.3.4/24, which means that we have to store it in the DB the way the user submitted it
14:13:28 <ralonsoh> exactly
14:13:28 <njohnston> otherwise a user can't be sure the rule they created is actually in the DB
14:13:46 <slaweq> njohnston: I agree with that
14:13:50 <haleyb> njohnston: right, that is the only issue, guess no one is using IPv6 enough to notice my change :)
14:13:58 <amotoki> njohnston: ralonsoh: I understand the concern.
14:14:12 <ralonsoh> if 1.2.3.4/24 and 1.2.3.10/24 is a duplicate rule in the backend, we need to handle this in there, in the driver
14:14:17 <slaweq> maybe we could add a new attribute to the rule, like "cidr", and store the "fixed" cidr there
14:14:19 <amotoki> if the neutron API returns a different response, it would be confusing.
14:14:31 <slaweq> so e.g. for 1.2.3.4/24 we would store 1.2.3.0/24 in this new field
14:14:35 <ralonsoh> either we keep the current API or we modify it, adding a rule check
14:14:41 <ralonsoh> but nothing in the middle
14:14:52 <amotoki> /24 means we only care about the first 24 bits
14:14:59 <slaweq> this new field would be read-only
14:16:10 <njohnston> I think the idea of a 'normalized cidr' is a good one
14:16:23 <amotoki> +1
14:16:33 <ralonsoh> but you rejected this idea last week!
14:16:50 <mlavalle> that's ok, people can change opinions
14:17:01 <mlavalle> I do all the time
14:17:09 <ralonsoh> then cgoncalves can reply
14:17:11 * njohnston is very good at changing opinions except when he isn't
14:17:29 <slaweq> ralonsoh: sorry, but I probably missed it somewhere last week
14:17:55 <njohnston> I like the idea of a new column because then we have uniform behavior for all drivers
14:18:03 <njohnston> so compatibility is maintained
14:18:04 <slaweq> ralonsoh: but I really don't remember a proposal to add a new field to the SG rule
14:18:05 <haleyb> so then the drivers would get the rule with this normalized cidr, correct?
14:18:11 <cgoncalves> my view stands from the last time this was discussed. I am just a single voice and a Neutron user, not a developer
14:18:21 <ralonsoh> we agreed to keep the API as is and fix the problem we have in OVN
14:18:22 <ralonsoh> that's all
14:18:30 <cgoncalves> ralonsoh, +1
14:18:33 <mlavalle> even
14:18:35 <ralonsoh> adding a "rule checker" will change the API
14:18:46 <mlavalle> ptls are allowed to change opinions
14:18:53 <slaweq> ralonsoh: I agree not to change the existing API, but adding a new field shouldn't be a big problem
14:18:54 <njohnston> yes, this would not change the API except to add an extra field to the response
14:18:55 <ralonsoh> or "hiding" the input rule, normalizing it
14:19:11 <amotoki> the new field proposed almost duplicates the existing cidr field, but it clarifies what neutron actually recognizes.
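For reference, the normalization being discussed can be illustrated with the Python standard library. This is only a minimal sketch (Neutron itself uses netaddr, and the helper name below is hypothetical, not an existing Neutron function); it shows both the 1.2.3.4/24 to 1.2.3.0/24 conversion and why two different inputs can collapse into the same backend rule:

```python
import ipaddress

def normalized_cidr(cidr):
    # strict=False accepts host bits set in the address (e.g. 1.2.3.4/24)
    # and returns the real network address for the given prefix length.
    return str(ipaddress.ip_network(cidr, strict=False))

print(normalized_cidr("1.2.3.4/24"))      # 1.2.3.0/24
print(normalized_cidr("1.2.3.10/24"))     # 1.2.3.0/24 - same network, hence
                                          # the duplicate-rule concern above
print(normalized_cidr("2001:db8::1/64"))  # 2001:db8::/64
```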
14:19:44 <mlavalle> makes the actual behavior visible
14:19:53 <slaweq> amotoki: yes, it's a way to be "more verbose" to the user - you set 1.2.3.4/24 and we know about it, but we will effectively use 1.2.3.0/24
14:20:12 <ralonsoh> we can do this on the fly, we don't need to store anything
14:20:23 <njohnston> and that's the thing - if we are going to normalize things it would be good not to make a secret of it. that will also make it very obvious to anyone who is looking that a rule resolves to 0.0.0.0/0 or whatever
14:20:29 <haleyb> right, when the driver asks we can convert
14:21:04 <slaweq> yes, we can, and we are doing it indirectly e.g. in the iptables driver now (iptables is doing it for us IIUC)
14:21:20 <slaweq> but with this new field we could be more verbose for users
14:22:01 <mlavalle> it's not necessarily predictable behavior, but it becomes visible
14:25:21 <slaweq> so here are the possible options for this bug:
14:25:27 <cgoncalves> to keep in mind: unless the OVN bug is addressed in its driver (i.e. without relying on the normalization from neutron), the OVN bug will continue to exist in stable branches, given that the normalization patch wouldn't be backportable
14:25:57 <slaweq> 1. fix it only in the ovn driver, and silently convert the cidr from the rule into one that is acceptable for ovn,
14:26:34 <slaweq> 2. add a new api field and use it e.g. in the ovn driver - and in fact we can probably use this new field in other drivers like iptables or ovs too
14:27:02 <slaweq> I think that the other possibilities are already rejected, right?
14:27:15 <cgoncalves> slaweq, is the OVN bug present in stable branches?
14:27:34 <slaweq> cgoncalves: I'm not sure, but probably yes
14:27:54 <njohnston> slaweq: yes I think that is what we have narrowed it down to
14:27:55 <cgoncalves> slaweq, ok. if it is, option 2 won't help for stable branches
14:28:27 <cgoncalves> may I politely ask why someone doesn't just fix the OVN driver and be done with it?
14:28:29 <yamamoto> was "3. no change in api/db, fix ovn" rejected?
14:28:29 <slaweq> cgoncalves: for stable branches we can fix it with silent conversion on the backend side
14:28:56 <cgoncalves> yamamoto, +1000000
14:29:13 <slaweq> yamamoto: it's point 1) from my summary
14:29:45 <haleyb> i think 1 is two parts - 1a) fix the OVN driver (required), and 1b) change the API/RPC to give the normalized CIDR to drivers
14:29:47 <yamamoto> "silently convert" in ovn?
14:30:20 <haleyb> 1a can be backported easily
14:30:23 <slaweq> ok, so I think I wasn't clear, by silently convert I meant conversion on the ovn driver's side to apply what it can accept
14:30:39 <slaweq> sorry for not being clear
14:30:48 <slaweq> my 1) is exactly what yamamoto proposed
14:30:55 <yamamoto> ok
14:30:56 <haleyb> slaweq: +1 on that, which is what the iptables driver does for a few things
14:31:07 <cgoncalves> slaweq, ok, I agree with you. it is the same thing yamamoto suggested
14:31:20 <njohnston> I don't know if the openvswitch developers will be willing to fix OVN to accept what they may consider to be "incorrect" CIDRs, but if they do then we also don't know how long until that code is released and what customers may be using it.
14:31:52 <njohnston> So that is why we are working around the OVN behavior in the neutron OVN driver by doing a conversion. The only question is: should we be transparent to the user about what is happening?
14:32:32 <slaweq> njohnston: yes, so IMO we can do it in 2 steps:
14:32:39 <mlavalle> transparency is always the best policy IMO
14:32:48 <slaweq> 1. fix it in the ovn driver only - and backport this fix to stable branches
14:33:04 <slaweq> 2. add an api extension to add this normalized cidr field to the SG rule
14:33:12 <njohnston> +100
14:33:18 <slaweq> that will not be backported of course
14:33:20 <mlavalle> in fact transparency should always be baked into our policies
14:33:31 <amotoki> totally agree
14:34:28 <amotoki> the 2nd step would also give us a clear understanding of how drivers should behave.
14:34:56 <amotoki> * to driver developers
14:35:28 <slaweq> so let's vote on whether that 2-step solution would be good to go with :)
14:35:37 <njohnston> +1
14:35:46 <amotoki> +1
14:35:47 <mlavalle> I am good
14:36:19 <haleyb> +1
14:37:28 <slaweq> yamamoto: any thoughts?
14:38:25 <yamamoto> ok for me, but i'm afraid the new field can confuse users even more. after all, the problem, if any, belongs to the docs, IMO.
14:38:47 <cgoncalves> yamamoto, +1
14:39:37 <slaweq> ok, so I will sum up in the bug what we agreed here and I will mark it as an approved RFE
14:39:47 <slaweq> thx for the great discussion
14:40:01 <slaweq> next one
14:40:03 <slaweq> https://bugs.launchpad.net/neutron/+bug/1870319
14:40:03 <openstack> Launchpad bug 1870319 in neutron "[RFE] Network cascade deletion API call" [Wishlist,Triaged] - Assigned to Slawek Kaplonski (slaweq)
14:40:26 <slaweq> this was in fact proposed by dulek from the Kuryr team and we talked about it in Shanghai IIRC
14:42:22 <amotoki> IIRC the main problem is the number of API calls and most of them are related to port deletion.
14:42:36 <slaweq> I will probably need to write a spec describing all the possible resources which should be deleted together with such a force network delete
14:42:51 <slaweq> amotoki: yes, that's the problem for Kuryr basically
14:43:12 <amotoki> roughly speaking, there seem to be two ways: (1) cascade network deletion (2) bulk port delete
14:43:14 <slaweq> as they need to make many API calls to clean up
14:43:25 <amotoki> (2) would remove most of the API calls
14:44:24 <amotoki> of course the cascade network deletion would be more friendly to API consumers, but it might be more complicated than bulk port deletion.
14:44:30 <slaweq> seems that you are right, I can ask dulek if bulk port deletion would solve their problem
14:46:10 <njohnston> ++
14:46:43 <slaweq> I pinged dulek, maybe he will join us soon
14:47:05 <amotoki> both options have the problem of what response code should be returned when we hit a partial deletion failure.
14:47:37 <slaweq> amotoki: I think that nova supports bulk vm deletion, no?
14:47:56 <amotoki> I am not sure at the moment
14:48:32 <amotoki> it looks like the "nova" CLI supports it but the API does not.
14:48:47 <slaweq> ahh, ok
14:49:08 <slaweq> hi dulek :)
14:49:10 <dulek> o/
14:49:20 <slaweq> we are talking about https://bugs.launchpad.net/neutron/+bug/1870319
14:49:20 <openstack> Launchpad bug 1870319 in neutron "[RFE] Network cascade deletion API call" [Wishlist,Triaged] - Assigned to Slawek Kaplonski (slaweq)
14:49:59 <dulek> Right! So what's the problem? Need any info about the use case we have in Kuryr?
14:49:59 <slaweq> and we would like to know if bulk port deletion would solve Kuryr's biggest issue related to this cleanup
14:50:30 <dulek> Ha, the answer is most likely not what you'd want to hear - not really.
14:50:34 <dulek> Or partially.
14:51:12 <dulek> This is because when we're cleaning up a network in Kuryr, the ports are subports of a trunk.
14:51:27 <dulek> So we cannot delete them before removing them from the trunk.
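For context, here is a rough sketch of the per-network cleanup dulek is describing, written against openstacksdk (the cloud name, NET_ID and ROUTER_ID are placeholders; error handling, pagination and special-cased ports such as DHCP ports are omitted, so this is illustrative rather than Kuryr's actual code). The point is the call count: a trunk update plus a port delete for every port, i.e. linear in the number of ports, plus a constant number of calls at the end.

```python
import openstack

conn = openstack.connect(cloud="mycloud")    # placeholder clouds.yaml entry
NET_ID, ROUTER_ID = "NET_ID", "ROUTER_ID"    # placeholder resource ids

# Detach the network's subnets from the router first (constant number of calls).
for subnet in conn.network.subnets(network_id=NET_ID):
    conn.network.remove_interface_from_router(ROUTER_ID, subnet_id=subnet.id)

# Map subport id -> owning trunk, so each subport can be detached before delete.
subport_to_trunk = {sp["port_id"]: trunk
                    for trunk in conn.network.trunks()
                    for sp in trunk.sub_ports}

# The linear part: one trunk update plus one delete call per remaining port.
for port in conn.network.ports(network_id=NET_ID):
    trunk = subport_to_trunk.get(port.id)
    if trunk is not None:
        # Subports cannot be deleted directly; they must be detached first.
        conn.network.delete_trunk_subports(trunk, [{"port_id": port.id}])
    conn.network.delete_port(port)

conn.network.delete_network(NET_ID)          # subnets go away with the network
```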
14:51:59 <dulek> So bulk delete would improve stuff a bit, but we'd still get multiple calls here.
14:52:08 <dulek> If I'm not mistaken we need to remove them from a trunk one-by-one?
14:52:52 <amotoki> IIUC we need to delete subports one-by-one now
14:52:57 <njohnston> any bulk port deletion solution would potentially have to account for some of the ports being trunk ports anyway
14:53:20 <amotoki> I see. even if we implement bulk port delete, it should be a cascade port deletion
14:53:41 <dulek> njohnston: True, but currently a delete on a port that's a subport will fail.
14:54:12 <dulek> So if bulk-delete worked the same way, we would still need to do n calls, where n is the number of ports.
14:56:18 <amotoki> In my understanding, the main problem is the number of API calls and most of them are related to port deletion.
14:56:51 <amotoki> if bulk port deletion is supported, including cascading port deletion (of subports), it would remove most of the API calls.
14:57:29 <dulek> That is true.
14:57:32 <njohnston> what if we added an option to port deletion that would optionally allow the deletion behavior to include removing the port from the trunk? Then bulk port deletion could be called with that option supplied for all ports.
14:57:57 <amotoki> While we haven't investigated which is simpler, the cascade network delete or the bulk port delete with cascade delete, can either of them address your problem?
14:58:06 <dulek> Basically it doesn't matter too much whether we're doing 3 calls or 1 call; it does matter if we're doing n + x calls, where n is the number of ports.
14:58:18 <amotoki> of course, I understand the cascade network delete would be better for you.
14:59:04 <dulek> amotoki: Sure, 1 call is better than 3 (bulk port, subnet, network), but even 3 is a huge improvement over what we have now.
14:59:13 <njohnston> if we did the bulk port delete with trunk removal added, then the only remaining steps would be detaching the subnet from the router and removing the network
14:59:51 <dulek> Right, I forgot about the router operation. Anyway it's still a constant number of ops, not linear.
15:00:01 <amotoki> njohnston: i think so
15:00:01 <njohnston> yep
15:00:17 <slaweq> njohnston: true, so probably if we had bulk port deletion we would be close to cascade network deletion as well
15:00:49 <slaweq> ok, we are over time now
15:00:57 <slaweq> we will get back to this one
15:01:01 <slaweq> thx for attending
15:01:04 <mlavalle> o/
15:01:06 <amotoki> o/
15:01:07 <ralonsoh> bye
15:01:08 <slaweq> and have a great weekend all :)
15:01:10 <slaweq> o/
15:01:12 <slaweq> #endmeeting