14:00:53 <liuyulong> #startmeeting neutron_l3
14:00:57 <openstack> Meeting started Wed Aug 7 14:00:53 2019 UTC and is due to finish in 60 minutes. The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:58 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:00 <openstack> The meeting name has been set to 'neutron_l3'
14:01:04 <haleyb> hi
14:01:10 <liuyulong> #chair haleyb
14:01:11 <openstack> Current chairs: haleyb liuyulong
14:01:13 <liuyulong> hi
14:01:13 <ralonsoh> hi
14:01:29 <liuyulong> #topic Announcements
14:03:17 <liuyulong> Unfortunately, the port forwarding topic was not accepted by the Summit Team.
14:04:01 <liuyulong> So mlavalle and I will not share this in the Summit. : )
14:04:45 <liuyulong> Any other announcements?
14:05:33 <liuyulong> OK, let's move on.
14:05:41 <liuyulong> #topic Bugs
14:07:40 <haleyb> there were some new bugs filed last week in the l3 space
14:07:51 <liuyulong_> Bad network connection...
14:08:01 <liuyulong_> #link https://wiki.openstack.org/wiki/Network/Meetings#Bug_deputy
14:08:04 <haleyb> https://bugs.launchpad.net/neutron/+bug/1838699
14:08:05 <openstack> Launchpad bug 1838699 in neutron "Removing a subnet from DVR router also removes DVR MAC flows for other router on network" [High,Confirmed]
14:08:51 <haleyb> slaweq confirmed it, but it will need an owner
14:08:58 <ralonsoh> My comment yesterday was not registered...
14:09:00 <ralonsoh> in this bug
14:09:09 <liuyulong_> #link https://bugs.launchpad.net/neutron/+bug/1838697
14:09:10 <openstack> Launchpad bug 1838697 in neutron "DVR Mac conversion rules are only added for the first router a network is attached to" [Undecided,Incomplete]
14:09:14 <liuyulong_> this is also related.
14:09:33 <ralonsoh> if you have more than one DVR, the match flows will be the same
14:09:50 <ralonsoh> that means, when you deleted the flows, you'll delete all of them
14:09:53 <haleyb> liuyulong_: ack, makes sense think they were both filed by same person
14:10:05 <ralonsoh> (I'l write this comment again in the bug)
14:10:24 <haleyb> ralonsoh: thanks
14:10:46 <liuyulong_> ralonsoh, so it is designed like that, it is a feature?
14:10:58 <ralonsoh> I don't think so
14:11:11 <ralonsoh> that means we can't have more than one DVR per host
14:11:17 <ralonsoh> but I need confirmation
14:11:23 <ralonsoh> I'll write the comment again in the bug
14:12:19 <liuyulong_> OK, thank you, I read some code ealier, the ovs-agent will query the related flow by ports subnet.
14:12:56 <liuyulong_> If more than one dvr ports in same subnet, it will indeed delete once for all.
14:15:05 <liuyulong_> haleyb, I have bad network connection now, please take over the meeting chair.
14:15:13 <haleyb> ack
14:15:32 <haleyb> next bug, https://bugs.launchpad.net/neutron/+bug/1838793
14:15:33 <openstack> Launchpad bug 1838793 in neutron ""KeepalivedManagerTestCase" tests failing during namespace deletion" [High,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
14:15:46 <haleyb> https://review.opendev.org/#/c/674820/ was created - thanks ralonsoh
14:16:09 <ralonsoh> I need to check the CI again
14:16:51 <haleyb> ralonsoh: i've added myself to review so will look at next update
14:16:58 <ralonsoh> thanks!
14:18:11 <haleyb> next bug, https://bugs.launchpad.net/neutron/+bug/1838403
14:18:12 <openstack> Launchpad bug 1838403 in neutron "Asymmetric floating IP notifications" [Medium,New]
14:19:10 <haleyb> i had triaged this last week and couldn't reproduce part of it. see now it was on queens, so perhaps part was fixed
14:20:11 <haleyb> still needs owner to track down the other possible issued with notifications, if noone wants it i can take a look
14:20:20 <liuyulong_> How to "delete a router that still has fip"?
14:20:54 <haleyb> liuyulong_: right, it didn't work for me on master, but can't imagine it works on queens either
14:21:48 <haleyb> liuyulong_: the part we need to investigate is what happens with a VM is destroyed - is the floating IP in one of the messages? his trace showed it wasn't
14:22:52 <liuyulong_> If the VM port is delete, l3_db will catch port_delete notification and release the FIP.
14:24:20 <haleyb> liuyulong_: yes, but is that sending a notification?
14:25:49 * haleyb wonders how fast liuyulong_'s modem is :) (if anyone remembers what a modem is)
14:26:10 <ralonsoh> 28.8 bauds per sec
14:26:20 <haleyb> zoom zoom
14:26:34 <liuyulong_> Subscrition is more accurate.
14:26:47 <liuyulong_> Subscription
14:27:11 <haleyb> liuyulong_: oh, maybe they didn't subscribe to all the events?
14:27:52 <liuyulong_> https://github.com/openstack/neutron/blob/master/neutron/db/l3_db.py#L1848
14:28:04 <liuyulong_> This line it is.
14:30:04 <haleyb> liuyulong_: there is nothing there regarding floating IP though, maybe i'm mis-understanding
14:30:30 <liuyulong_> https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/plugin.py#L1941-L1943
14:31:11 <haleyb> liuyulong_: if i'm listening for FLOATINGIP events shouldn't i get one when a port with an associated floating IP is deleted?
14:32:58 <haleyb> either way, please add a comment to the bug so maybe the submitter can track things down
14:33:30 <liuyulong_> I have no details about the FIP events now, but according to my experiences, the floating IP will finally get disassociated.
14:33:37 <haleyb> there was one more new bug
14:33:39 <haleyb> https://bugs.launchpad.net/neutron/+bug/1839004
14:33:40 <openstack> Launchpad bug 1839004 in neutron "Rocky DVR-SNAT seems missing entries for conntrack marking" [Undecided,Incomplete]
14:34:30 <haleyb> don't know if tidwellr is here, but this looked like maybe a mis-configuration
14:34:48 * tidwellr is lurking
14:35:48 <haleyb> tidwellr: hi, and this involved dynamic-routing too, any thoughts based on last update?
14:36:33 <tidwellr> there's still something to look into in that bug, supposedly there was an address scope mismatch and yet the API was reporting that it was finding a next-hop for the tenant subnet
14:36:43 <tidwellr> that doesn't seem right
14:37:57 <haleyb> tidwellr: it could be correct if snat was enabled though i think, but he had it disabled
14:40:16 <tidwellr> you can see that the uplink subnet is in the null address scope
14:40:57 <tidwellr> then, he shows neutron-dynamic-routing returning a next-hop for his tenant subnet
14:41:23 <tidwellr> that should only happen if the inside and outside subnets are both in the same address scope
14:41:25 <tidwellr> so
14:41:48 <tidwellr> either the info in the bug report is inaccurate, or there is still a bug
14:42:30 <haleyb> so is that a bug in neutron or dynamic-routing?
14:42:50 <tidwellr> assuming the steps in the bug report are accurate, yes
14:43:42 <tidwellr> not sure how that would slip through the tests though, I would have to really look closely at that
14:43:48 <haleyb> yes a bug in dynamic-routing?
14:44:05 <tidwellr> yes
14:44:36 <tidwellr> he ran "openstack bgp speaker list advertised routes" and it returned routes it shouldn't have had
14:44:58 <haleyb> tidwellr: should we re-assign? as the scoping issue looks like user error
14:46:02 <tidwellr> I'll leave a comment, then try to reproduce this myself. It seems pretty straight forward given his instructions in the bug report
14:46:25 <haleyb> tidwellr: thanks
14:46:54 <haleyb> liuyulong_: i didn't have any other new bugs, did you have old ones you wanted to talk about? or anyone else?
14:47:20 <liuyulong_> Yes, I have
14:48:05 <liuyulong_> For the fix: https://review.opendev.org/#/c/673557/ and the https://bugs.launchpad.net/neutron/+bug/1834308
14:48:06 <openstack> Launchpad bug 1834308 in neutron "[DVR][DB] too many slow query during agent restart" [Medium,In progress] - Assigned to LIU Yulong (dragon889)
14:49:49 <liuyulong_> It is well tested locally. It does not break DVR functions. But I still hope to see if more test result can come from our community.
14:50:27 <liuyulong_> The next is: https://bugs.launchpad.net/neutron/+bug/1828494
14:50:28 <openstack> Launchpad bug 1828494 in neutron "[RFE][L3] l3-agent should have its capacity" [Wishlist,In progress] - Assigned to LIU Yulong (dragon889)
14:51:02 <liuyulong_> I have one question, how many router do you guys think a network node can host? 100? 200? 1000?
14:51:32 <ralonsoh> no idea
14:51:41 <tidwellr> when I ran network nodes in production, we capped it at 250
14:51:41 <haleyb> liuyulong_: there is no easy answer for that
14:52:02 <haleyb> i think the limiting factor always seems to be how long it takes to restart the agents
14:52:13 <tidwellr> there are a lot of unique factors that go into that number we came up with though
14:52:53 <tidwellr> and when I say we capped it, I mean we would add network nodes and rebalance routers to spread the load
14:53:15 <liuyulong_> I have one result, when the router reach 300+, the ovs-agent will never restart successfully.
14:54:10 <tidwellr> that's consistent with what I've observed (anecdotally)
14:54:43 <liuyulong_> My env is 17 physical hosts for dvr_snat nodes, with 2700+ router, disable DHCP.
14:54:58 <liuyulong_> Every ovs-agent will host about 1700+ ports!
14:55:30 <liuyulong_> Yes, I've tested 400+ ports for a ovs-agent once, it is about 40+ mins to restart.
14:57:04 <liuyulong_> Ovs-agent seems can be easily stuck in many code path....
14:58:48 <haleyb> that is too long of course, should it be on the performance sub-team's list ?
15:00:05 <liuyulong_> Should be, make sense
15:01:02 <haleyb> liuyulong_: we're at time
15:01:17 <liuyulong_> OK
15:01:22 <liuyulong_> Let's end here.
15:01:33 <ralonsoh> bye
15:01:40 <liuyulong_> This nick does not have that right.
15:01:51 <liuyulong_> haleyb, please end our meeting, thank you.
15:02:00 <haleyb> #endmeeting
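
Note on bugs 1838699 and 1838697: the behaviour ralonsoh and liuyulong_ describe (identical match flows for several DVR routers, so a delete removes them for all) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Neutron or Open vSwitch code: the flow table is a plain dict, and the names `dvr_mac_match`, `add_dvr_flow`, and `remove_dvr_flow` and the `"dvr_mac"` key are hypothetical. The one assumption carried over from the discussion is that the match is derived from the port's subnet and carries no router identity.

```python
# Hypothetical model, not Neutron or OVS code: a flow table keyed only by match fields.

def dvr_mac_match(subnet_cidr):
    """Build an illustrative match key; note that it carries no router identity."""
    return ("dvr_mac", subnet_cidr)


def add_dvr_flow(table, subnet_cidr, router_id):
    """Install a flow for a router port; an identical match is installed only once."""
    match = dvr_mac_match(subnet_cidr)
    if match in table:
        return False          # flow already present, nothing added for this router
    table[match] = router_id  # remember who installed it, for illustration only
    return True


def remove_dvr_flow(table, subnet_cidr):
    """Delete by match, which removes the single shared flow for every router."""
    return table.pop(dvr_mac_match(subnet_cidr), None) is not None


flow_table = {}
added_a = add_dvr_flow(flow_table, "10.0.0.0/24", "router-a")
added_b = add_dvr_flow(flow_table, "10.0.0.0/24", "router-b")

# router-a detaches from the subnet; router-b is still attached.
removed = remove_dvr_flow(flow_table, "10.0.0.0/24")
router_b_still_has_flow = dvr_mac_match("10.0.0.0/24") in flow_table
```

In this model the second router adds nothing (the symptom in 1838697), and removing the subnet from the first router leaves the second without its flow (the symptom in 1838699). Whether the real agent behaves this way still needs the confirmation ralonsoh asked for in the meeting.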