14:00:53 <liuyulong> #startmeeting neutron_l3
14:00:57 <openstack> Meeting started Wed Aug  7 14:00:53 2019 UTC and is due to finish in 60 minutes.  The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:58 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:00 <openstack> The meeting name has been set to 'neutron_l3'
14:01:04 <haleyb> hi
14:01:10 <liuyulong> #chair haleyb
14:01:11 <openstack> Current chairs: haleyb liuyulong
14:01:13 <liuyulong> hi
14:01:13 <ralonsoh> hi
14:01:29 <liuyulong> #topic Announcements
14:03:17 <liuyulong> Unfortunately, the port forwarding topic was not accepted by the Summit Team.
14:04:01 <liuyulong> So mlavalle and I will not present this at the Summit. : )
14:04:45 <liuyulong> Any other announcements?
14:05:33 <liuyulong> OK, let's move on.
14:05:41 <liuyulong> #topic Bugs
14:07:40 <haleyb> there were some new bugs filed last week in the l3 space
14:07:51 <liuyulong_> Bad network connection...
14:08:01 <liuyulong_> #link https://wiki.openstack.org/wiki/Network/Meetings#Bug_deputy
14:08:04 <haleyb> https://bugs.launchpad.net/neutron/+bug/1838699
14:08:05 <openstack> Launchpad bug 1838699 in neutron "Removing a subnet from DVR router also removes DVR MAC flows for other router on network" [High,Confirmed]
14:08:51 <haleyb> slaweq confirmed it, but it will need an owner
14:08:58 <ralonsoh> My comment yesterday was not registered...
14:09:00 <ralonsoh> in this bug
14:09:09 <liuyulong_> #link https://bugs.launchpad.net/neutron/+bug/1838697
14:09:10 <openstack> Launchpad bug 1838697 in neutron "DVR Mac conversion rules are only added for the first router a network is attached to" [Undecided,Incomplete]
14:09:14 <liuyulong_> this is also related.
14:09:33 <ralonsoh> if you have more than one DVR, the match flows will be the same
14:09:50 <ralonsoh> that means, when you delete the flows, you'll delete all of them
14:09:53 <haleyb> liuyulong_: ack, makes sense, I think they were both filed by the same person
14:10:05 <ralonsoh> (I'll write this comment again in the bug)
14:10:24 <haleyb> ralonsoh: thanks
14:10:46 <liuyulong_> ralonsoh, so it is designed like that, it is a feature?
14:10:58 <ralonsoh> I don't think so
14:11:11 <ralonsoh> that means we can't have more than one DVR per host
14:11:17 <ralonsoh> but I need confirmation
14:11:23 <ralonsoh> I'll write the comment again in the bug
14:12:19 <liuyulong_> OK, thank you. I read some code earlier: the ovs-agent will query the related flows by the port's subnet.
14:12:56 <liuyulong_> If there is more than one DVR port in the same subnet, it will indeed delete the flows for all of them at once.
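The behaviour described above can be sketched with a toy flow table. This is a minimal illustration, not the ovs-agent code: the FlowTable class, the dvr_mac_match helper, and the choice of (subnet CIDR, DVR host MAC) as the match fields are all assumptions made for the example. It only shows why two routers that install a flow with an identical match end up sharing one entry, so that deleting by match on behalf of one router also removes it for the other.

```python
# Toy model only: NOT the actual ovs-agent implementation.
# FlowTable, dvr_mac_match and the match fields are hypothetical.


class FlowTable:
    """A flow table keyed by match, with no notion of which router owns a flow."""

    def __init__(self):
        self.flows = {}  # match tuple -> action string

    def add_flow(self, match, action):
        # A second add with an identical match overwrites the first entry
        # instead of creating a separate one.
        self.flows[match] = action

    def delete_flows(self, match):
        # Deletion is by match only, so it cannot spare another router's flow.
        return self.flows.pop(match, None) is not None


def dvr_mac_match(subnet_cidr, dvr_host_mac):
    """Build the (assumed) match: it depends on the subnet, not on the router."""
    return (subnet_cidr, dvr_host_mac)


table = FlowTable()
match = dvr_mac_match("10.0.0.0/24", "fa:16:3f:00:00:01")

table.add_flow(match, "rewrite-src-mac")  # installed for router-1
table.add_flow(match, "rewrite-src-mac")  # installed for router-2, same match
flows_before = len(table.flows)

# Removing the subnet from router-1 deletes by match...
table.delete_flows(match)
# ...and router-2 is left without its flow as well.
flows_left_for_router2 = len(table.flows)
```

If this model is close to the real behaviour, a fix would need some per-router ownership of the flow (for example a reference count), but that is a guess and not something confirmed in this meeting.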
14:15:05 <liuyulong_> haleyb, I have bad network connection now, please take over the meeting chair.
14:15:13 <haleyb> ack
14:15:32 <haleyb> next bug, https://bugs.launchpad.net/neutron/+bug/1838793
14:15:33 <openstack> Launchpad bug 1838793 in neutron ""KeepalivedManagerTestCase" tests failing during namespace deletion" [High,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
14:15:46 <haleyb> https://review.opendev.org/#/c/674820/ was created - thanks ralonsoh
14:16:09 <ralonsoh> I need to check the CI again
14:16:51 <haleyb> ralonsoh: i've added myself to review so will look at next update
14:16:58 <ralonsoh> thanks!
14:18:11 <haleyb> next bug, https://bugs.launchpad.net/neutron/+bug/1838403
14:18:12 <openstack> Launchpad bug 1838403 in neutron "Asymmetric floating IP notifications" [Medium,New]
14:19:10 <haleyb> i had triaged this last week and couldn't reproduce part of it.  see now it was on queens, so perhaps part was fixed
14:20:11 <haleyb> still needs an owner to track down the other possible issues with notifications, if no one wants it i can take a look
14:20:20 <liuyulong_> How to "delete a router that still has fip"?
14:20:54 <haleyb> liuyulong_: right, it didn't work for me on master, but can't imagine it works on queens either
14:21:48 <haleyb> liuyulong_: the part we need to investigate is what happens when a VM is destroyed - is the floating IP in one of the messages?  his trace showed it wasn't
14:22:52 <liuyulong_> If the VM port is deleted, l3_db will catch the port_delete notification and release the FIP.
14:24:20 <haleyb> liuyulong_: yes, but is that sending a notification?
14:25:49 * haleyb wonders how fast liuyulong_'s modem is :)  (if anyone remembers what a modem is)
14:26:10 <ralonsoh> 28.8 bauds per sec
14:26:20 <haleyb> zoom zoom
14:26:34 <liuyulong_> Subscription is more accurate.
14:27:11 <haleyb> liuyulong_: oh, maybe they didn't subscribe to all the events?
14:27:52 <liuyulong_> https://github.com/openstack/neutron/blob/master/neutron/db/l3_db.py#L1848
14:28:04 <liuyulong_> This is the line.
14:30:04 <haleyb> liuyulong_: there is nothing there regarding floating IP though, maybe i'm mis-understanding
14:30:30 <liuyulong_> https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/plugin.py#L1941-L1943
14:31:11 <haleyb> liuyulong_: if i'm listening for FLOATINGIP events shouldn't i get one when a port with an associated floating IP is deleted?
14:32:58 <haleyb> either way, please add a comment to the bug so maybe the submitter can track things down
14:33:30 <liuyulong_> I have no details about the FIP events now, but in my experience, the floating IP will eventually get disassociated.
14:33:37 <haleyb> there was one more new bug
14:33:39 <haleyb> https://bugs.launchpad.net/neutron/+bug/1839004
14:33:40 <openstack> Launchpad bug 1839004 in neutron "Rocky DVR-SNAT seems missing entries for conntrack marking" [Undecided,Incomplete]
14:34:30 <haleyb> don't know if tidwellr is here, but this looked like maybe a mis-configuration
14:34:48 * tidwellr is lurking
14:35:48 <haleyb> tidwellr: hi, and this involved dynamic-routing too, any thoughts based on last update?
14:36:33 <tidwellr> there's still something to look into in that bug, supposedly there was an address scope mismatch and yet the API was reporting that it was finding a next-hop for the tenant subnet
14:36:43 <tidwellr> that doesn't seem right
14:37:57 <haleyb> tidwellr: it could be correct if snat was enabled though i think, but he had it disabled
14:40:16 <tidwellr> you can see that the uplink subnet is in the null address scope
14:40:57 <tidwellr> then, he shows neutron-dynamic-routing returning a next-hop for his tenant subnet
14:41:23 <tidwellr> that should only happen if the inside and outside subnets are both in the same address scope
14:41:25 <tidwellr> so
14:41:48 <tidwellr> either the info in the bug report is inaccurate, or there is still a bug
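The rule tidwellr states above can be written as a small predicate. This is a sketch under stated assumptions, not the neutron-dynamic-routing implementation: the function name and parameters are hypothetical, address scopes are represented as plain IDs with None standing for the null scope, and treating a null scope as never matching is an assumption for the example.

```python
from typing import Optional

# Sketch only: NOT the neutron-dynamic-routing code.
# should_advertise_tenant_subnet and its parameters are hypothetical.


def should_advertise_tenant_subnet(
    tenant_scope_id: Optional[str],
    uplink_scope_id: Optional[str],
) -> bool:
    """Return True only if the inside and outside subnets share an address scope.

    Assumption: None models the null address scope, and a null scope on
    either side never counts as a match.
    """
    if tenant_scope_id is None or uplink_scope_id is None:
        return False
    return tenant_scope_id == uplink_scope_id


# The situation described in the bug: uplink subnet in the null address scope.
reported_case = should_advertise_tenant_subnet("scope-a", None)
# Both subnets in the same address scope.
matching_case = should_advertise_tenant_subnet("scope-a", "scope-a")
```

Under this reading, a next-hop being returned for the tenant subnet in the reported case would indicate either an inaccurate report or a bug, which is the conclusion reached above.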
14:42:30 <haleyb> so is that a bug in neutron or dynamic-routing?
14:42:50 <tidwellr> assuming the steps in the bug report are accurate, yes
14:43:42 <tidwellr> not sure how that would slip through the tests though, I would have to really look closely at that
14:43:48 <haleyb> yes a bug in dynamic-routing?
14:44:05 <tidwellr> yes
14:44:36 <tidwellr> he ran "openstack bgp speaker list advertised routes" and it returned routes it shouldn't have had
14:44:58 <haleyb> tidwellr: should we re-assign?  as the scoping issue looks like user error
14:46:02 <tidwellr> I'll leave a comment, then try to reproduce this myself. It seems pretty straightforward given his instructions in the bug report
14:46:25 <haleyb> tidwellr: thanks
14:46:54 <haleyb> liuyulong_: i didn't have any other new bugs, did you have old ones you wanted to talk about?  or anyone else?
14:47:20 <liuyulong_> Yes, I have
14:48:05 <liuyulong_> For the fix: https://review.opendev.org/#/c/673557/ and the https://bugs.launchpad.net/neutron/+bug/1834308
14:48:06 <openstack> Launchpad bug 1834308 in neutron "[DVR][DB] too many slow query during agent restart" [Medium,In progress] - Assigned to LIU Yulong (dragon889)
14:49:49 <liuyulong_> It is well tested locally. It does not break DVR functions. But I still hope to see more test results from our community.
14:50:27 <liuyulong_> The next is: https://bugs.launchpad.net/neutron/+bug/1828494
14:50:28 <openstack> Launchpad bug 1828494 in neutron "[RFE][L3] l3-agent should have its capacity" [Wishlist,In progress] - Assigned to LIU Yulong (dragon889)
14:51:02 <liuyulong_> I have one question, how many routers do you guys think a network node can host? 100? 200? 1000?
14:51:32 <ralonsoh> no idea
14:51:41 <tidwellr> when I ran network nodes in production, we capped it at 250
14:51:41 <haleyb> liuyulong_: there is no easy answer for that
14:52:02 <haleyb> i think the limiting factor always seems to be how long it takes to restart the agents
14:52:13 <tidwellr> there are a lot of unique factors that go into that number we came up with though
14:52:53 <tidwellr> and when I say we capped it, I mean we would add network nodes and rebalance routers to spread the load
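The "cap, then add nodes and rebalance" approach tidwellr describes can be sketched as simple arithmetic. This is an illustration only: the function names are hypothetical, the cap of 250 is the figure quoted above, the router counts are made-up inputs, and an even spread is an assumed rebalancing policy rather than what any Neutron scheduler does.

```python
import math

# Illustration only: hypothetical helpers, not a Neutron scheduler.


def nodes_needed(total_routers: int, cap_per_node: int) -> int:
    """Smallest number of network nodes that keeps every node at or under the cap."""
    return math.ceil(total_routers / cap_per_node)


def rebalance(total_routers: int, node_count: int) -> list:
    """Spread routers as evenly as possible across the nodes (assumed policy)."""
    base, extra = divmod(total_routers, node_count)
    return [base + 1 if i < extra else base for i in range(node_count)]


CAP = 250  # per-node cap quoted in the meeting

# Made-up example: 1000 routers fit exactly on 4 nodes at the cap.
nodes_at_1000 = nodes_needed(1000, CAP)
# One more router exceeds the cap, so a node is added and load is respread.
nodes_at_1001 = nodes_needed(1001, CAP)
spread = rebalance(1001, nodes_at_1001)
```

As noted in the discussion, the right cap depends on many deployment-specific factors, so the numbers here only show the mechanics.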
14:53:15 <liuyulong_> I have one result: when the router count reaches 300+, the ovs-agent will never restart successfully.
14:54:10 <tidwellr> that's consistent with what I've observed (anecdotally)
14:54:43 <liuyulong_> My env is 17 physical hosts for dvr_snat nodes, with 2700+ routers, DHCP disabled.
14:54:58 <liuyulong_> Every ovs-agent will host about 1700+ ports!
14:55:30 <liuyulong_> Yes, I've tested 400+ ports for an ovs-agent once, it took about 40+ mins to restart.
14:57:04 <liuyulong_> The ovs-agent seems to get stuck easily in many code paths....
14:58:48 <haleyb> that is too long of course, should it be on the performance sub-team's list ?
15:00:05 <liuyulong_> It should be, that makes sense
15:01:02 <haleyb> liuyulong_: we're at time
15:01:17 <liuyulong_> OK
15:01:22 <liuyulong_> Let's end here.
15:01:33 <ralonsoh> bye
15:01:40 <liuyulong_> This nick does not have that right.
15:01:51 <liuyulong_> haleyb, please end our meeting, thank you.
15:02:00 <haleyb> #endmeeting