#openstack-meeting log

13:59:59 <liuyulong> #startmeeting neutron_l3
14:00:00 <openstack> Meeting started Wed Nov 20 13:59:59 2019 UTC and is due to finish in 60 minutes.  The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:03 <openstack> The meeting name has been set to 'neutron_l3'
14:00:06 <liuyulong> #chair liuyulong_
14:00:07 <openstack> Current chairs: liuyulong liuyulong_
14:01:41 <liuyulong> #topic Announcements
14:02:32 <liuyulong> Let's recall yesterday's announcements
14:02:39 <liuyulong> #link http://eavesdrop.openstack.org/meetings/networking/2019/networking.2019-11-19-14.00.log.html#l-10
14:02:59 <liuyulong> Then no more from me.
14:05:06 <liuyulong> #topic Bugs
14:06:42 <liuyulong> No bug deputy email received this week, so let's directly search the bug list.
14:09:13 <liuyulong> First one
14:09:18 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1852777
14:09:18 <openstack> Launchpad bug 1852777 in neutron "Neutron allows to create two subnets with same CIDR in a network through heat" [High,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
14:09:51 <ralonsoh> hi yes
14:09:56 <ralonsoh> I'm still testing this
14:09:56 <liuyulong> A distributed lock should be introduced for this, IMO
14:10:04 <ralonsoh> no, it is not working
14:10:16 <liuyulong> A local file or memory lock does not work across multiple physical hosts.
14:10:27 <ralonsoh> even with the threading lock I have both subnets created
14:10:46 <ralonsoh> distributed lock?
14:10:54 <ralonsoh> sorry can you point me to this?
14:11:00 <liuyulong> I mean we should use tooz.
14:11:06 <ralonsoh> ah ok
14:11:11 <slaweq> hi, sorry for being late
14:11:30 <ralonsoh> but this request will be done to a single server only
14:11:37 <ralonsoh> or am I wrong?
14:11:52 <ralonsoh> so a distributed lock won't be necessary here
14:12:13 <liuyulong> Creating two subnets with the same CIDR takes two different API calls.
14:12:22 <slaweq> ralonsoh: I think that each create-subnet request can go to a different host running a neutron-api process
14:12:24 <slaweq> no?
14:12:27 <liuyulong> So it will spread to different hosts.
14:12:34 <ralonsoh> ok
14:13:05 <ralonsoh> btw, this is going to add an extra delay in subnet creation
14:13:27 <ralonsoh> just a heads-up to everybody complaining about the time consumption in neutron API
14:13:32 <liuyulong> BTW, a UNIQUE CONSTRAINT is also worth adding.
14:13:48 <ralonsoh> where?
14:14:07 <ralonsoh> because two different cidrs can overlap being different
14:14:11 <ralonsoh> just different masks
14:14:22 <ralonsoh> 10.0.0.0/24 and 10.0.0.0/25
14:14:25 <liuyulong> Yes, the distributed lock will make the API workers from different hosts run serially.
14:14:28 <slaweq> exactly, constraint on db level will not help here
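The point made above, that 10.0.0.0/24 and 10.0.0.0/25 are different strings yet overlapping ranges, can be illustrated with a minimal sketch. It uses the Python standard library `ipaddress` module rather than `netaddr`, and the helper name `overlaps_existing` is hypothetical; it is not taken from the Neutron code base.

```python
import ipaddress


def overlaps_existing(new_cidr, existing_cidrs):
    """Return True if new_cidr overlaps any CIDR already on the network."""
    new_net = ipaddress.ip_network(new_cidr)
    return any(
        new_net.overlaps(ipaddress.ip_network(cidr))
        for cidr in existing_cidrs
    )


existing = ["10.0.0.0/24"]

# A uniqueness check on the CIDR string sees two different values...
same_string = "10.0.0.0/25" in existing

# ...while a range comparison shows that the two subnets overlap.
overlap = overlaps_existing("10.0.0.0/25", existing)

print(same_string, overlap)  # False True
```

A string-level unique constraint would therefore only catch the case where both requests carry exactly the same CIDR.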
14:14:45 <ralonsoh> (I tried to do something with IPSets)
14:14:53 <ralonsoh> by adding a new table
14:14:58 <ralonsoh> and adding a register per network
14:15:10 <ralonsoh> containing the IPSets of the CIDRs
14:15:15 <liuyulong> ralonsoh, slaweq, but at least it can cover one case.
14:15:56 <ralonsoh> liuyulong, hmmm I don't think this is enough
14:16:01 <liuyulong> #chair slaweq
14:16:02 <openstack> Current chairs: liuyulong liuyulong_ slaweq
14:16:11 <liuyulong> ^ just in case
14:17:05 <liuyulong> ralonsoh, yes, I'm not saying it will cover all cases.
14:17:06 <slaweq> a unique constraint can at least fix the problem when 2 workers try to create exactly the same subnet
14:17:20 <slaweq> and as it's probably easy to add, it makes sense to me
14:17:27 <slaweq> but this will not solve the problem for sure
14:17:38 <ralonsoh> ok
14:17:46 <ralonsoh> I'll add a partial-bug patch for this
14:18:23 <slaweq> ralonsoh++
14:18:39 <slaweq> liuyulong++ for the idea about unique constraint too
14:18:49 <liuyulong> I have another bad idea based on such a unique constraint, : )
14:19:32 <liuyulong> Store each IP of the CIDR and add a unique constraint on IP and network_id, hahaha
14:19:44 <ralonsoh> not all the IPs
14:19:48 <ralonsoh> but the IPSet
14:19:52 <ralonsoh> as I proposed before
14:20:13 <ralonsoh> one IPset per network, in one single register
14:20:40 <ralonsoh> this will force, any time we want to update the network subnets, to update this single register
14:21:11 <ralonsoh> this will use the DB to enforce the logic
14:21:37 <liuyulong> Does the database have such an IPSet data type?
14:21:55 <ralonsoh> no, json.dumps(netaddr.IPSet())
14:22:02 <ralonsoh> and then json.loads()
14:22:03 <slaweq> ralonsoh: how will one IPSet work for a network if I have e.g. 2 subnets, 1.0.0.0/24 and 2.0.0.0/24?
14:22:30 <ralonsoh> one sec
14:22:52 <liuyulong> ralonsoh, can a JSON list be used for a unique constraint?
14:22:59 <ralonsoh> >>> n10=netaddr.IPNetwork('1.0.0.0/24')
14:22:59 <ralonsoh> >>> n11=netaddr.IPNetwork('2.0.0.0/24')
14:22:59 <ralonsoh> >>> ips=netaddr.IPSet([n10,n11])
14:22:59 <ralonsoh> >>> ips
14:22:59 <ralonsoh> IPSet(['1.0.0.0/24', '2.0.0.0/24'])
14:23:00 <slaweq> and also You will still need to have logic in python to validate this IPSet each time e.g. a new subnet is added
14:23:08 <ralonsoh> slaweq, yes
14:23:23 <slaweq> so it still won't be atomic
14:23:25 <slaweq> right?
14:23:33 <ralonsoh> but the point is you can have only one writer context to one DB register
14:23:59 <ralonsoh> as I said, this is not easy and I'm trying to find the way
14:24:00 <slaweq> ahh, ok
14:24:09 <slaweq> so this would be locked by one api worker
14:24:17 <ralonsoh> it should
14:24:23 <slaweq> and other would need to wait to read from it, correct?
14:24:32 <ralonsoh> (of course, this will break the DB normal forms)
14:24:40 <ralonsoh> exactly, it should wait
14:25:11 <slaweq> then this may work
14:25:33 <liuyulong> +1, makes sense
14:26:29 <slaweq> but one more thing
14:26:31 <liuyulong> It looks like a distributed lock implemented by neutron itself for each network during create subnet.
14:26:43 <slaweq> do You want to store in db list(ips)? or what exactly?
14:27:03 <ralonsoh> store str(IPSet)
14:27:17 <ralonsoh> this is way shorter than the IP list
14:27:35 <slaweq> TypeError: Object of type IPSet is not JSON serializable
14:27:47 <slaweq> I have such error when I'm trying to do this
14:27:57 <ralonsoh> I know, we need to create a serializer
14:28:18 <slaweq> but ok, we can even store a list of cidrs there, and then create the IPSet object on the fly during the validation
14:28:19 <ralonsoh> this could be done just with the ranges list
14:28:26 <ralonsoh> e.g.: ['1.0.0.0/24', '2.0.0.0/24']
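The serializer idea above (store the merged ranges as a list of CIDR strings and rebuild the set object when validating) can be sketched as a JSON round trip. This is only an illustration using the standard library `ipaddress` module instead of `netaddr.IPSet`; the function names `dump_cidrs` and `load_cidrs` are hypothetical.

```python
import ipaddress
import json


def dump_cidrs(cidrs):
    """Serialize CIDRs as a JSON list of merged, sorted range strings."""
    nets = [ipaddress.ip_network(cidr) for cidr in cidrs]
    merged = ipaddress.collapse_addresses(nets)
    return json.dumps([str(net) for net in merged])


def load_cidrs(blob):
    """Rebuild network objects from the stored JSON list."""
    return [ipaddress.ip_network(cidr) for cidr in json.loads(blob)]


blob = dump_cidrs(["2.0.0.0/24", "1.0.0.0/24"])
print(blob)  # ["1.0.0.0/24", "2.0.0.0/24"]
print(load_cidrs(blob))
```

Plain strings are JSON serializable, so the TypeError reported above does not arise with this representation.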
14:28:31 <liuyulong> This is my understanding: one API worker tries to add 'ip_set' to the 'new table', which should have a unique constraint on the network; another worker that tries to write to this table will hit a unique constraint error.
14:29:01 <ralonsoh> slaweq, but we should not store the CIDR list
14:29:15 <ralonsoh> slaweq, we already have this information from the DB
14:29:26 <ralonsoh> this is bad DB design
14:29:49 <ralonsoh> liuyulong, yes, that's the point
14:29:53 <liuyulong> So after the first creation is done, the other worker will retry writing to this table, and go through the IPAM check again.
14:29:54 <ralonsoh> to use the DB as a lock
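The "DB as a lock" idea discussed above can be sketched with an in-memory SQLite database: a per-network row protected by a unique key, where the second writer gets a constraint error and has to retry. This is a minimal sketch only; the table name `network_ipsets`, its columns, and the helper `try_claim` are hypothetical, and Neutron itself uses a different database layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE network_ipsets ("
    " network_id TEXT PRIMARY KEY,"  # one register per network
    " cidrs TEXT NOT NULL)"
)


def try_claim(conn, network_id, cidrs_json):
    """Insert the per-network row; return False if a row already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO network_ipsets (network_id, cidrs) VALUES (?, ?)",
                (network_id, cidrs_json),
            )
        return True
    except sqlite3.IntegrityError:
        # Another writer created the register first; the caller would
        # re-read it and run the overlap check again before retrying.
        return False


first = try_claim(conn, "net-1", '["10.0.0.0/24"]')
second = try_claim(conn, "net-1", '["10.0.0.0/25"]')
print(first, second)  # True False
```

The losing writer never stores its CIDR, so the overlap validation is repeated against the data the winner committed.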
14:30:16 <slaweq> ralonsoh: sure, I was thinking about list as You mentioned: ['1.0.0.0/24', '2.0.0.0/24']
14:30:44 <ralonsoh> slaweq, yes another example
14:31:00 <ralonsoh> >>> n11=netaddr.IPNetwork('1.0.1.0/24')
14:31:00 <ralonsoh> >>> ips=netaddr.IPSet([n10,n11])
14:31:00 <ralonsoh> >>> ips
14:31:00 <ralonsoh> IPSet(['1.0.0.0/23'])
14:31:10 <ralonsoh> n10=netaddr.IPNetwork('1.0.0.0/24')
14:31:30 <ralonsoh> one network range for two cidrs
14:32:54 <slaweq> yes, that makes sense
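The two REPL examples above (disjoint subnets stay separate, adjacent subnets merge into one range) can be reproduced without `netaddr`. The sketch below uses `ipaddress.collapse_addresses` from the standard library as a stand-in for `netaddr.IPSet`; the helper name `merged_ranges` is hypothetical.

```python
import ipaddress


def merged_ranges(cidrs):
    """Merge CIDRs into the smallest list of covering networks."""
    nets = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [str(net) for net in ipaddress.collapse_addresses(nets)]


# Disjoint subnets remain two separate ranges.
print(merged_ranges(["1.0.0.0/24", "2.0.0.0/24"]))  # ['1.0.0.0/24', '2.0.0.0/24']

# Adjacent subnets collapse into a single range.
print(merged_ranges(["1.0.0.0/24", "1.0.1.0/24"]))  # ['1.0.0.0/23']
```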
14:35:03 <liuyulong> If neutron is willing to introduce tooz, subnet creation can also take a lock on the network only; basically the logic is the same.
14:35:56 <liuyulong> OK, we have good ideas here, thanks.
14:36:06 <liuyulong> Next one
14:36:10 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1852760
14:36:10 <openstack> Launchpad bug 1852760 in neutron "When running 'openstack floating ip list' on undercloud, client cannot handle NotFoundException" [Low,Invalid] - Assigned to Nate Johnston (nate-johnston)
14:36:58 <njohnston_> yes, I moved that to storyboard as it's a client issue https://storyboard.openstack.org/#!/story/2006863
14:37:28 <liuyulong> It is a client error. We need user friendly outputs for it. Right?
14:37:51 <njohnston_> correct.  neutron api is returning the correct thing.
14:38:21 <liuyulong> And the Neutron API response should also add the resource type in the message, IMO.
14:38:31 <liuyulong> Now it is just "The resource could not be found"
14:38:53 <njohnston_> that would be nice, but not absolutely required
14:39:34 <liuyulong> Yes, user should remember the resource type they are just trying to find.
14:39:41 <njohnston_> indeed
14:39:54 <liuyulong> Next
14:39:57 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1852680
14:39:58 <openstack> Launchpad bug 1852680 in neutron "floatingip can not access after associate to instance" [Undecided,Incomplete]
14:40:32 <liuyulong> I strongly suspect the security group rules for the VM were not set correctly.
14:41:13 <liuyulong> Since there is no more information attached to this now, let's leave it as Incomplete.
14:42:22 <slaweq> I agree
14:42:24 <liuyulong> Next
14:42:28 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1852504
14:42:28 <openstack> Launchpad bug 1852504 in neutron "DHCP reserved ports that were unscheduled are advertised as DNS servers" [Medium,In progress] - Assigned to Mithil Arun (arun-mithil)
14:43:58 <liuyulong> Alright, another DHCP bug, this was seen in our cloud.
14:44:38 <liuyulong> It is mainly because of the auto_schedule mechanism of DHCP.
14:45:24 <slaweq> the issue here is that we are not removing reserved_dhcp_ports but leaving them unbound, right?
14:46:18 <liuyulong> We have no fix for this, but as a workaround, I suggest disabling the auto_schedule of the DHCP and increasing dhcp_agents_per_network to 3 or more. That way, it can cover most failure cases.
14:46:44 <liuyulong> slaweq, yes
14:46:59 <slaweq> maybe we should remove such ports?
14:47:18 <liuyulong> slaweq, but you can see that the bug description has more than 2 ACTIVE DHCP ports...
14:48:02 <slaweq> liuyulong: yes, but how is it related?
14:48:08 <slaweq> active ports are ok, right?
14:49:00 <liuyulong> I have no idea because we have used the distributed DHCP based on the openflow and ovs local controller.
14:49:25 <liuyulong> Which I proposed during the PTG, : )
14:49:54 <slaweq> yes, I remember that one :)
14:50:37 <liuyulong> Anyway, it has a fix there: https://review.opendev.org/694859
14:50:50 <liuyulong> We can test that.
14:51:03 <slaweq> yes, let's review this
14:51:57 <liuyulong> Next: https://bugs.launchpad.net/neutron/+bug/1852468
14:51:57 <openstack> Launchpad bug 1852468 in neutron "network router:external value is non-boolean (Internal) which causes server create failure" [Undecided,Invalid]
14:52:00 <liuyulong> It is invalid now.
14:52:24 <liuyulong> And next:
14:52:27 <liuyulong> https://bugs.launchpad.net/neutron/+bug/1852447
14:52:27 <openstack> Launchpad bug 1852447 in neutron "FWaaS: adding a router port to fwg and removing it leaves the fwg active" [Medium,Triaged]
14:52:49 <liuyulong> Who is the new daddy of this project now? haha
14:53:23 <slaweq> there is no new daddy for fwaas (yet)
14:54:21 <liuyulong> Alright, time is running out. Let's move on.
14:54:32 <liuyulong> #topic On demand agenda
14:55:02 <liuyulong> I have one update of IPv6.
14:55:38 <liuyulong> We finally moved back to dhcpv6-stateful for both address and other options, with a prefix length of 64.
14:56:48 <liuyulong> Everything works fine for now, the instance image does not change for the IPv6 and the NetworkManager also works fine.
14:59:05 * haleyb completely forgot about the time change for this meeting, sorry :(  updated...
14:59:07 <liuyulong> Windows has a very magic behavior: when you add an IPv6 address to a port (that previously had IPv4 only), the Windows NIC will automatically configure the IPv6 address. But on Linux, the user has to ifdown/ifup the network interface to get the IPv6 address via DHCP.
14:59:26 <liuyulong> OK, let's end here.
14:59:29 <liuyulong> #endmeeting