14:00:47 <liuyulong> #startmeeting neutron_l3
14:00:48 <openstack> Meeting started Wed Sep 23 14:00:47 2020 UTC and is due to finish in 60 minutes. The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:49 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:51 <openstack> The meeting name has been set to 'neutron_l3'
14:01:23 <slaweq> hi
14:02:33 <liuyulong> hi
14:03:50 <liuyulong> No announcements from me today, so maybe we can directly goto the Bug section to cut the meeting time.
14:04:44 <ralonsoh> hi
14:04:49 <liuyulong> OK, no objection, : )
14:04:59 <liuyulong> #topic Bugs
14:05:06 <liuyulong> ralonsoh, hi
14:05:16 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017254.html
14:05:23 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017432.html
14:05:41 <liuyulong> These are the bug lists from our deputy.
14:05:58 <liuyulong> First one
14:05:59 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1895950
14:06:02 <openstack> Launchpad bug 1895950 in neutron "keepalived can't perform failover if the l3 agent is down" [Medium,Won't fix]
14:06:57 <slaweq> I don't understand why You marked it as won't fix
14:07:05 <liuyulong> I replied to this last week, the L3 agent should be alive during the HA router state change.
14:07:15 <slaweq> IMO if we will move bringing interfaces to be up to neutron-state-change-monitor process it should works
14:07:22 <liuyulong> After the patch https://review.opendev.org/#/c/707406/
14:07:56 <slaweq> but that is regression introduced by this patch
14:07:59 <slaweq> isn't it?
14:08:23 <slaweq> small but IMHO still regression
14:09:02 <liuyulong> Because the running state-agent process can not do that work if you do not re-spawn it.
14:09:33 <slaweq> isn't it respawned if You restart L3 agent?
14:10:49 <ralonsoh> no
14:10:56 <liuyulong> I'm not sure, but from my experience, the state change process will run as it is.
14:11:08 <ralonsoh> if the keepalived-state-change process is running, is not rebooted
14:11:28 <ralonsoh> but if reload_cfg if enabled, then we'll send SIGHUP
14:12:01 <ralonsoh> (reload_cfg is false when restarting l3 agent)
14:12:08 <ralonsoh> so no, we don't restart it
14:12:10 <slaweq> ok, so maybe we can add bringing interfaces to be up/down to the state-change process and keep it in l3 agent for 1 cycle
14:12:16 <liuyulong> It reloads the config options, not the python process.
14:12:17 <slaweq> later remove it from the l3 agent
14:12:26 <slaweq> or maybe 2 cycles
14:12:34 <slaweq> and add e.g. release note about that
14:12:36 <slaweq> idk
14:13:03 <ralonsoh> one question: if the l3 agent is down, how this host will become master?
14:13:13 <slaweq> keepalived can still be running
14:13:17 <slaweq> and it can failover
14:13:30 <slaweq> but l3 agent will not bring interfaces up on new master node
14:13:39 <ralonsoh> yeah, that was my question
14:13:41 <liuyulong> The DB state updating still needs L3 agent alive.
14:13:45 <ralonsoh> ^
14:13:58 <slaweq> I know that
14:14:19 <slaweq> but still IMO would be better to have working dataplane even in case when L3 agent is down for some reason
14:14:32 <liuyulong> Actually L3 agent must run during HA router failover, it is designed by this. (not me, but it is) : )
14:15:04 <slaweq> liuyulong: before Your patch even?
14:15:55 <liuyulong> No, I mean HA state change workflow has something related to L3 agent. It needs L3 agent to do some work.
14:16:33 <liuyulong> Not the gateway, but something like RA, DB state, config state and so on.
14:17:28 <liuyulong> But, it's fine to add the gateway UP action to the state-change process.
14:17:32 <liuyulong> I'm fine with it.
14:17:32 <slaweq> ok, lets keep this bug as won't fix for now
14:18:01 <slaweq> and maybe check/update docs to be clear about that there
14:21:44 <liuyulong> Sorry, bad connection
14:21:56 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1894843
14:21:59 <openstack> Launchpad bug 1894843 in neutron "[dvr_snat] Router update deletes rfp interface from qrouter even when VM port is present on this host" [Medium,New]
14:23:04 <liuyulong> I have no idea why set "dvr_snat" on every hypervisor? Should it be "dvr"?
14:24:09 <ralonsoh> dvr_snat should be only on network controllers
14:24:12 <slaweq> we are using dvr_snat e.g. in our gates
14:24:26 <liuyulong> L3 agent in "dvr_snat" with mixed compute service does not work fine from my personal experiences.
14:24:26 <slaweq> and that possible can cause some failures in dvr multinode jobs maybe
14:24:34 <slaweq> (idk for sure but just guessing)
14:25:41 <liuyulong> IMO, this should be documented well, users should not deploy their cloud like this.
14:26:47 <liuyulong> IMO, there are no much agent mode check for "dvr_snat" during the router processing.
14:27:45 <liuyulong> We have consensus that the "dvr_snat" is for those centralized network node (functions) which can not be distributed.
14:29:21 <liuyulong> So, my advice for this bug/user is to change the config options.
14:29:30 <ralonsoh> agree
14:30:04 <liuyulong> The final cloud deployment should be in two scenario:
14:30:47 <liuyulong> 1. their compute nodes have ability to external network (internet), so the compute node set the L3 agent mode to "dvr".
14:31:26 <liuyulong> 2. compute node can not reach the Internet, set the agent mode to "dvr_no_external"
14:32:14 <liuyulong> 3. centralized network nodes should be run dedicated physical hosts, and the L3 agent mode is "dvr_snat".
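The three deployment cases listed above can be sketched as a role-to-mode mapping. This is a minimal illustration, not anything shown in the meeting: the role names and the helper l3_agent_ini() are hypothetical, and the sketch assumes the mode is set through the agent_mode option in the [DEFAULT] section of l3_agent.ini. Only the mode values "dvr", "dvr_no_external" and "dvr_snat" come from the discussion.

```python
from enum import Enum


class NodeRole(Enum):
    """Hypothetical node roles, named only for this illustration."""
    COMPUTE_WITH_EXTERNAL = "compute_with_external"  # compute node that can reach the external network
    COMPUTE_NO_EXTERNAL = "compute_no_external"      # compute node without external connectivity
    NETWORK = "network"                              # dedicated centralized network node


# Modes as recommended in the discussion above.
AGENT_MODE_BY_ROLE = {
    NodeRole.COMPUTE_WITH_EXTERNAL: "dvr",
    NodeRole.COMPUTE_NO_EXTERNAL: "dvr_no_external",
    NodeRole.NETWORK: "dvr_snat",
}


def l3_agent_ini(role):
    """Return an l3_agent.ini fragment for the given role.

    Assumes the option is agent_mode under [DEFAULT].
    """
    mode = AGENT_MODE_BY_ROLE[role]
    return "[DEFAULT]\nagent_mode = {}\n".format(mode)


if __name__ == "__main__":
    for role in NodeRole:
        print("# {}".format(role.value))
        print(l3_agent_ini(role))
```

Under this mapping a hypervisor never gets "dvr_snat", which is the configuration change suggested for bug 1894843.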
14:35:01 <liuyulong> OK, no more bugs from me
14:36:36 <liuyulong> OK, let's move on
14:36:37 <liuyulong> #topic On demand agenda
14:38:23 <ralonsoh> nothing from me
14:38:24 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1895972
14:38:35 <openstack> liuyulong: Error: Could not gather data from Launchpad for bug #1895972 (https://launchpad.net/bugs/1895972). The error has been logged
14:39:04 <liuyulong> Another gap is filling... Congrats!
14:39:44 <ralonsoh> this feature is ongoing but yes!
14:41:56 <liuyulong> There are C works, so it is one example of fullstack development process for OVN feature.
14:42:26 <liuyulong> Python works are not started.
14:42:44 <liuyulong> #link https://review.opendev.org/#/c/738551/
14:43:04 <liuyulong> slaweq, hi, I've replied the comments.
14:43:43 <liuyulong> I've tested it from my local devstack environment for a while.
14:43:52 <ralonsoh> and what is happening with https://review.opendev.org/#/c/731446
14:43:53 <ralonsoh> ?
14:44:07 <ralonsoh> superseded by yours, I think so
14:44:15 <liuyulong> I cannot say I covered every cases, but those I noticed and experienced.
14:44:51 <liuyulong> ralonsoh, yep, it has 2 closes bugs.
14:46:54 <slaweq> liuyulong: ok, I will check that
14:47:33 <liuyulong> But with some deep thinking, after these flows refactor or rediect (some works else), IMO the entire flow structure may have a chance to redesign in someday.
14:48:00 <liuyulong> It could be a long story. Just forget it. : )
14:48:09 <ZhuXiaoYu> Oh, I wonder why https://review.opendev.org/#/c/731446 is not approved too
14:50:10 <ZhuXiaoYu> would you give an explanation?
14:50:39 <ZhuXiaoYu> I will tell Li YaJie later
14:51:14 <liuyulong> Please take look at the inline comments in gerrit, and the meeting LOG here. : )
14:51:31 <liuyulong> OK, no more talks from me now.
14:51:42 <liuyulong> I will left 1 or 2 mins here.
14:52:45 <ZhuXiaoYu> https://review.opendev.org/#/c/743661/
14:52:58 <ZhuXiaoYu> my patch for ecmp
14:53:52 <ZhuXiaoYu> I really hope it can be 'merged'
14:55:37 <liuyulong> It's feature freeze now, IMO, it should be moved to next dev cycle.
14:55:58 <liuyulong> Wait...
14:57:07 <ZhuXiaoYu> ..so when is the next dev cycle?
14:57:16 <liuyulong> #link http://eavesdrop.openstack.org/meetings/networking/2020/networking.2020-09-15-14.00.log.html#l-13
14:57:21 <liuyulong> #link https://launchpad.net/neutron/+milestone/victoria-3
14:57:59 <liuyulong> If this was not in the V-3 list, it will not be merged for now.
14:58:17 <liuyulong> Sorry, I cannot open the launchpad.net for now.
14:58:18 <slaweq> ZhuXiaoYu: yes, we are in the RC-1 week now
14:58:38 <slaweq> so we can merge this patch after rc-1 will be released and we will have stable/victoria branch created already
14:59:25 <ZhuXiaoYu> got it, really thx for tell me that, it's helpful
14:59:34 <liuyulong> I will start another round review of the spec https://review.opendev.org/#/c/729532 this week.
15:00:11 <liuyulong> Time is up.
15:00:15 <liuyulong> Thank you guys.
15:00:16 <liuyulong> Bye
15:00:21 <ZhuXiaoYu> Bye
15:00:25 <slaweq> thx
15:00:26 <liuyulong> #endmeeting