14:01:12 #startmeeting neutron_l3
14:01:13 Meeting started Wed Jan 8 14:01:12 2020 UTC and is due to finish in 60 minutes. The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:16 The meeting name has been set to 'neutron_l3'
14:01:47 #chair liuyulong_
14:01:48 Current chairs: liuyulong liuyulong_
14:02:09 Happy new year everyone!
14:03:26 #topic Announcements
14:03:51 #link https://launchpad.net/neutron/+milestone/ussuri-2
14:04:27 Expected: 2020-02-12
14:05:46 There will be about 10 days of holidays for Chinese New Year this month.
14:06:56 Some people may not be online, so time is running out...
14:07:14 hi
14:07:30 hi
14:08:09 #link https://bugs.launchpad.net/neutron/+bug/1858419
14:08:09 Launchpad bug 1858419 in neutron "Docs needed for tunables at large scale" [Undecided,Confirmed]
14:08:38 Slawek asked me something by mail about this large scale cloud.
14:08:56 #link https://bugs.launchpad.net/neutron/+bug/1858419/comments/1
14:09:11 Allow me to say something here
14:09:22 This could be a really long story.
14:10:00 Config option tuning may have a lot of choices.
14:10:42 But neutron itself still has some architecture defects, which may not be resolved by configuration.
14:10:44 hi
14:10:48 sorry for being late
14:11:27 As you may see in comment #1, we did some local work on neutron itself.
14:11:51 liuyulong: I know that we can't solve everything by config options
14:11:52 (Some of it was discussed during the Shanghai PTG.)
14:12:53 but it's rather more about identifying options which are crucial for large scale and adding a note for some options, e.g. "setting this to a high/low value may have an impact at large scale because it will put a huge load on rabbitmq" (it's just an example for a non-existing option now :))
14:14:04 Yes, we can start that way.
14:14:40 Anyway, I will share some of the config tuning running in our cloud deployment.
14:15:09 liuyulong: thx a lot
14:15:52 OK, let's move on.
14:15:55 #topic Bugs
14:16:09 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011831.html
14:16:38 And this I guess:
14:16:42 #link http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011766.html
14:17:20 Maybe also this:
14:17:22 #link http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011751.html
14:17:34 OK, first one:
14:17:49 #link https://bugs.launchpad.net/neutron/+bug/1858086
14:17:49 Launchpad bug 1858086 in neutron "qrouter's local link route cannot be restored " [Medium,Confirmed]
14:18:16 This looks like a gap in the API's user input check.
14:19:02 We should not allow users to add a route destination CIDR which overlaps the subnet.
14:19:45 There are too many potential risks for DVR related traffic.
14:19:50 yes, i thought i was reading that wrong but how can you add a route to a local subnet via a non-local IP ?
14:21:42 Is it the router route-add action?
14:22:19 Not the subnet static route, right?
14:24:42 it's "extra-route" but I'm not sure what action is called on server side for it
14:24:56 on client's side you do "neutron router-update --extra-route"
14:26:38 Yes, "openstack router set --route destination=<subnet>,gateway=<ip-address>"
14:28:17 Such overlap should not be allowed.
14:30:44 This is obvious: when you add an IP address to your host, the system will add a default on-link route for it.
14:32:19 That means "this subnet is directly accessible"; changing it does not make sense in most scenarios.
14:33:04 But by the way, the bug reporter said neutron does not recover that route automatically.
14:33:11 i would tend to agree, actually surprised it didn't throw an exception when adding it
14:33:52 This can be another view of the bug, since neutron does not handle such on-link routes in the qrouter namespace when the subnet is directly accessible.
14:35:28 So, I think it's OK to reject it at the very beginning, in the API.
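A minimal sketch of the kind of overlap check discussed above, using only Python's standard `ipaddress` module. This is not neutron's actual validation code: the function name `route_overlaps_subnet` and the sample CIDRs are hypothetical, and where such a check would hook into the API is left open.

```python
import ipaddress


def route_overlaps_subnet(destination, subnet_cidrs):
    """Return True if a route destination overlaps any directly attached subnet.

    Hypothetical helper for illustration; not part of neutron.
    """
    dest = ipaddress.ip_network(destination, strict=False)
    for cidr in subnet_cidrs:
        subnet = ipaddress.ip_network(cidr, strict=False)
        # IPv4 and IPv6 networks never overlap, so compare same-version only.
        if dest.version == subnet.version and dest.overlaps(subnet):
            return True
    return False


# Assumed example: the router has one interface on 192.168.1.0/24.
attached = ["192.168.1.0/24"]
same_subnet = route_overlaps_subnet("192.168.1.0/24", attached)   # overlaps
unrelated = route_overlaps_subnet("10.0.0.0/24", attached)        # does not overlap
```

Taken literally, "overlaps" also matches a very broad destination such as 0.0.0.0/0, so a real check would probably need to decide whether only equal or more specific destinations should be rejected.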
14:35:48 sounds good to me
14:35:58 OK, next one.
14:36:01 #link https://bugs.launchpad.net/neutron/+bug/1857422
14:36:01 Launchpad bug 1857422 in neutron "neutron-keepalived-state-change and keeplived cannot be cleanup for those routers which is deleted during l3-agent died" [Undecided,New]
14:38:19 Firstly, because the L3-agent is dead, the "delete RPC" will not be processed; this could be a reason why the processes remained.
14:40:06 if i'm remembering correctly, the l3-agent should clean up the namespace(s) at the end of its sync, but is it just not cleaning keepalived stuff because it didn't know that the associated router was ha ?
14:40:37 But we did encounter similar phenomena in our own deployment when the L3 agent is alive. The "neutron-keepalived-state-change" and "radvd" processes sometimes remain after routers were deleted.
14:42:14 is this the same thing?
14:42:29 haleyb, I'm not sure, maybe the user's L3-agent was just dead for too long to re-process the delete RPC.
14:43:29 haleyb, no, just some similar phenomena.
14:43:34 right, if for example it didn't get the RPC, that's when the resources get orphaned?
14:44:40 Yes, according to the "reproduction steps" in the bug description.
14:46:44 i guess it seems like a valid bug
14:47:32 If we need to cover this situation, the L3-agent may need a persistent cache to distinguish which routers were deleted during the downtime, and then start the delete procedure for the stale routers.
14:50:02 And I still have questions: will the router namespace, meta-proxy and radvd process remain too? Or just neutron-keepalived-state-change and keepalived?
14:50:55 at the end of sync, the l3-agent should have cleaned the router namespace
14:51:05 initial sync at startup that is
14:53:50 And +1 to Miguel's comment, if this is not seen in the production environment, then it is contrived. : ) https://bugs.launchpad.net/neutron/+bug/1857422/comments/2
14:53:50 Launchpad bug 1857422 in neutron "neutron-keepalived-state-change and keeplived cannot be cleanup for those routers which is deleted during l3-agent died" [Undecided,New]
14:54:51 Last one:
14:54:54 #link https://bugs.launchpad.net/neutron/+bug/1856839
14:54:54 Launchpad bug 1856839 in neutron "[L3] router processing time increase if there are large set ports" [Medium,In progress] - Assigned to LIU Yulong (dragon889)
14:55:13 Code is here: https://review.opendev.org/701077
14:55:29 It is an optimization for large scale cloud. : )
14:57:14 I would also like to ask you to review https://review.opendev.org/#/c/700011/ if you have some time
14:58:16 We are running out of time, maybe you can leave comments in Gerrit.
14:59:18 Alright, let's end here.
14:59:26 #endmeeting