14:01:12 <liuyulong> #startmeeting neutron_l3
14:01:13 <openstack> Meeting started Wed Jan 8 14:01:12 2020 UTC and is due to finish in 60 minutes. The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:16 <openstack> The meeting name has been set to 'neutron_l3'
14:01:47 <liuyulong> #chair liuyulong_
14:01:48 <openstack> Current chairs: liuyulong liuyulong_
14:02:09 <liuyulong> Happy new year everyone!
14:03:26 <liuyulong> #topic Announcements
14:03:51 <liuyulong> #link https://launchpad.net/neutron/+milestone/ussuri-2
14:04:27 <liuyulong> Expected: 2020-02-12
14:05:46 <liuyulong> There will be about 10 days of holidays for Chinese New Year this month.
14:06:56 <liuyulong> Some people may not be online, so time is running out...
14:07:14 <haleyb> hi
14:07:30 <liuyulong> hi
14:08:09 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1858419
14:08:09 <openstack> Launchpad bug 1858419 in neutron "Docs needed for tunables at large scale" [Undecided,Confirmed]
14:08:38 <liuyulong> Slawek asked me something in an email about this large scale cloud.
14:08:56 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1858419/comments/1
14:09:11 <liuyulong> Allow me to say something here.
14:09:22 <liuyulong> This could be a really long story.
14:10:00 <liuyulong> Config option tuning offers a lot of choices.
14:10:42 <liuyulong> But neutron itself still has some architectural defects which may not be resolved by configuration.
14:10:44 <slaweq> hi
14:10:48 <slaweq> sorry for being late
14:11:27 <liuyulong> As you may see in comment #1, we did some local work on neutron itself.
14:11:51 <slaweq> liuyulong: I know that we can't solve everything by config options
14:11:52 <liuyulong> (Some of them were discussed during the Shanghai PTG.)
14:12:53 <slaweq> but it's rather more about identifying options which are crucial for large scale, and adding a note for some options, e.g. "setting this to a high value may have an impact at large scale because it will put a huge load on rabbitmq" (it's just an example for a non-existing option :))
14:14:04 <liuyulong> Yes, we can start that way.
14:14:40 <liuyulong> Anyway, I will share some of the config tuning running in our cloud deployment.
14:15:09 <slaweq> liuyulong: thx a lot
14:15:52 <liuyulong> OK, let's move on.
14:15:55 <liuyulong> #topic Bugs
14:16:09 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011831.html
14:16:38 <liuyulong> And this one, I guess:
14:16:42 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011766.html
14:17:20 <liuyulong> Maybe also this:
14:17:22 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011751.html
14:17:34 <liuyulong> OK, first one:
14:17:49 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1858086
14:17:49 <openstack> Launchpad bug 1858086 in neutron "qrouter's local link route cannot be restored " [Medium,Confirmed]
14:18:16 <liuyulong> This looks like a gap in the API's user input validation.
14:19:02 <liuyulong> We should not allow users to add a route whose destination CIDR overlaps the subnet.
14:19:45 <liuyulong> There are too many potential risks for DVR related traffic.
14:19:50 <haleyb> yes, i thought i was reading that wrong, but how can you add a route to a local subnet via a non-local IP?
14:21:42 <liuyulong> Is it the router route-add action?
14:22:19 <liuyulong> Not the subnet static route, right?
14:24:42 <slaweq> it's "extra-route" but I'm not sure what action is called on the server side for it
14:24:56 <slaweq> on the client's side you do "neutron router-update --extra-route"
14:26:38 <liuyulong> Yes, "openstack router set --route destination=<subnet>,gateway=<ip-address>"
14:28:17 <liuyulong> Such an overlap should not be allowed.
14:30:44 <liuyulong> This is obvious: when you add an IP address to your host, the system will add a default on-link route for it.
14:32:19 <liuyulong> That means "this subnet is directly accessible"; changing it does not make any sense in most scenarios.
14:33:04 <liuyulong> But by the way, the bug reporter said neutron does not recover that route automatically.
14:33:11 <haleyb> i would tend to agree, actually surprised it didn't throw an exception when adding it
14:33:52 <liuyulong> That can be another view of the bug, since neutron does not handle such an on-link route in the qrouter namespace when it is directly accessible.
14:35:28 <liuyulong> So, I think it's OK to reject it at the very beginning, at the API level.
14:35:48 <slaweq> sounds good to me
14:35:58 <liuyulong> OK, next one.
14:36:01 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1857422
14:36:01 <openstack> Launchpad bug 1857422 in neutron "neutron-keepalived-state-change and keeplived cannot be cleanup for those routers which is deleted during l3-agent died" [Undecided,New]
14:38:19 <liuyulong> Firstly, because the L3 agent is dead, the "delete RPC" will not be processed; this could be a reason why the processes remained.
14:40:06 <haleyb> if i'm remembering correctly, the l3-agent should clean up the namespace(s) at the end of its sync, but is it just not cleaning keepalived stuff because it didn't know that the associated router was ha?
14:40:37 <liuyulong> But we did encounter similar phenomena in our own deployment when the L3 agent was alive. The "neutron-keepalived-state-change" and "radvd" processes sometimes remained when routers were deleted.
14:42:14 <haleyb> is this the same thing?
14:42:29 <liuyulong> haleyb, I'm not sure, maybe the user's L3 agent was just dead for too long to re-process the delete RPC.
14:43:29 <liuyulong> haleyb, no, just some similar phenomena.
14:43:34 <haleyb> right, if for example it didn't get the RPC, that's when the resources get orphaned?
14:44:40 <liuyulong> Yes, according to the "reproduction steps" in the bug description.
14:46:44 <haleyb> i guess it seems like a valid bug
14:47:32 <liuyulong> If we need to cover this situation, the L3 agent may need a persistent cache to distinguish which routers were deleted during the downtime, and then start the delete procedure for the stale routers.
14:50:02 <liuyulong> And I still have a question: will the router namespace, metadata-proxy and radvd processes remain too? Or just neutron-keepalived-state-change and keepalived?
14:50:55 <haleyb> at the end of sync, the l3-agent should have cleaned the router namespace
14:51:05 <haleyb> initial sync at startup, that is
14:53:50 <liuyulong> And +1 to Miguel's comment: if this is not seen in a production environment, then it is contrived. : ) https://bugs.launchpad.net/neutron/+bug/1857422/comments/2
14:53:50 <openstack> Launchpad bug 1857422 in neutron "neutron-keepalived-state-change and keeplived cannot be cleanup for those routers which is deleted during l3-agent died" [Undecided,New]
14:54:51 <liuyulong> Last one:
14:54:54 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1856839
14:54:54 <openstack> Launchpad bug 1856839 in neutron "[L3] router processing time increase if there are large set ports" [Medium,In progress] - Assigned to LIU Yulong (dragon889)
14:55:13 <liuyulong> Code is here: https://review.opendev.org/701077
14:55:29 <liuyulong> It is an optimization for large scale cloud.
: )
14:57:14 <slaweq> I would also like to ask you to review https://review.opendev.org/#/c/700011/ if you have some time
14:58:16 <liuyulong> We are running out of time; maybe you can leave comments in Gerrit.
14:59:18 <liuyulong> Alright, let's end here.
14:59:26 <liuyulong> #endmeeting
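Editor's note: the "annotated tunables" idea from bug 1858419 could look roughly like the fragment below. The option names (`api_workers`, `rpc_workers`, `agent_down_time`, `rpc_response_timeout`, `report_interval`) are real neutron/oslo options, but the values here are placeholders for illustration, not recommendations.

```ini
[DEFAULT]
# Number of API worker processes; raising this helps throughput at
# large scale but multiplies memory use and DB connections.
api_workers = 8

# Number of RPC worker processes; higher values increase the load on
# rabbitmq, which is exactly the kind of note slaweq proposed adding.
rpc_workers = 8

# Seconds before the server considers an agent dead. Too low a value
# can cause agents to flap on busy, large-scale clouds.
agent_down_time = 120

# oslo.messaging RPC timeout; routers with very many ports may need
# a larger value to avoid spurious timeouts.
rpc_response_timeout = 120

[agent]
# How often agents report state; should be well under agent_down_time.
report_interval = 30
```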
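Editor's note: the API-level validation discussed for bug 1858086 (rejecting an extra route whose destination overlaps a directly connected subnet) can be sketched with Python's `ipaddress` module. `route_overlaps_subnet` is a hypothetical helper for illustration, not Neutron's actual server code.

```python
import ipaddress


def route_overlaps_subnet(destination: str, subnet_cidr: str) -> bool:
    """Return True if an extra-route destination CIDR overlaps a
    router's directly connected subnet, i.e. the case the meeting
    agreed should be rejected at the API."""
    dest = ipaddress.ip_network(destination, strict=False)
    subnet = ipaddress.ip_network(subnet_cidr, strict=False)
    return dest.overlaps(subnet)


# A destination covering the connected subnet would be rejected:
print(route_overlaps_subnet("192.168.1.0/24", "192.168.1.0/24"))  # True
# A genuinely external destination would be allowed:
print(route_overlaps_subnet("10.0.0.0/8", "192.168.1.0/24"))      # False
```

A real check would run per router interface subnet before persisting the route, raising a 400-style API error on overlap.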