14:00:47 <liuyulong> #startmeeting neutron_l3
14:00:48 <openstack> Meeting started Wed Sep 23 14:00:47 2020 UTC and is due to finish in 60 minutes.  The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:49 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:51 <openstack> The meeting name has been set to 'neutron_l3'
14:01:23 <slaweq> hi
14:02:33 <liuyulong> hi
14:03:50 <liuyulong> No announcements from me today, so maybe we can go directly to the Bugs section to keep the meeting short.
14:04:44 <ralonsoh> hi
14:04:49 <liuyulong> OK, no objection, : )
14:04:59 <liuyulong> #topic Bugs
14:05:06 <liuyulong> ralonsoh, hi
14:05:16 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017254.html
14:05:23 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017432.html
14:05:41 <liuyulong> These are the bug lists from our deputy.
14:05:58 <liuyulong> First one
14:05:59 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1895950
14:06:02 <openstack> Launchpad bug 1895950 in neutron "keepalived can't perform failover if the l3 agent is down" [Medium,Won't fix]
14:06:57 <slaweq> I don't understand why You marked it as won't fix
14:07:05 <liuyulong> I replied to this last week, the L3 agent should be alive during the HA router state change.
14:07:15 <slaweq> IMO if we move bringing the interfaces up into the neutron-state-change-monitor process, it should work
14:07:22 <liuyulong> After the patch https://review.opendev.org/#/c/707406/
14:07:56 <slaweq> but that is regression introduced by this patch
14:07:59 <slaweq> isn't it?
14:08:23 <slaweq> small but IMHO still regression
14:09:02 <liuyulong> Because the already-running state-change process cannot do that work unless you re-spawn it.
14:09:33 <slaweq> isn't it respawned if You restart L3 agent?
14:10:49 <ralonsoh> no
14:10:56 <liuyulong> I'm not sure, but from my experience, the state change process will run as it is.
14:11:08 <ralonsoh> if the keepalived-state-change process is running, it is not restarted
14:11:28 <ralonsoh> but if reload_cfg is enabled, then we'll send SIGHUP
14:12:01 <ralonsoh> (reload_cfg is false when restarting l3 agent)
14:12:08 <ralonsoh> so no, we don't restart it
14:12:10 <slaweq> ok, so maybe we can add bringing interfaces to be up/down to the state-change process and keep it in l3 agent for 1 cycle
14:12:16 <liuyulong> It reloads the config options, not the python process.
14:12:17 <slaweq> later remove it from the l3 agent
14:12:26 <slaweq> or maybe 2 cycles
14:12:34 <slaweq> and add e.g. release note about that
14:12:36 <slaweq> idk
14:13:03 <ralonsoh> one question: if the l3 agent is down, how will this host become master?
14:13:13 <slaweq> keepalived can still be running
14:13:17 <slaweq> and it can failover
14:13:30 <slaweq> but l3 agent will not bring interfaces up on new master node
14:13:39 <ralonsoh> yeah, that was my question
14:13:41 <liuyulong> The DB state updating still needs L3 agent alive.
14:13:45 <ralonsoh> ^
14:13:58 <slaweq> I know that
14:14:19 <slaweq> but IMO it would still be better to have a working dataplane even when the L3 agent is down for some reason
14:14:32 <liuyulong> Actually the L3 agent must be running during HA router failover, it is designed that way. (not by me, but it is) : )
14:15:04 <slaweq> liuyulong: before Your patch even?
14:15:55 <liuyulong> No, I mean the HA state change workflow involves the L3 agent. It needs the L3 agent to do some of the work.
14:16:33 <liuyulong> Not the gateway, but something like RA, DB state, config state and so on.
14:17:28 <liuyulong> But, it's fine to add the gateway UP action to the state-change process.
14:17:32 <liuyulong> I'm fine with it.
14:17:32 <slaweq> ok, lets keep this bug as won't fix for now
14:18:01 <slaweq> and maybe check/update the docs to make that clear
14:21:44 <liuyulong> Sorry, bad connection
14:21:56 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1894843
14:21:59 <openstack> Launchpad bug 1894843 in neutron "[dvr_snat] Router update deletes rfp interface from qrouter even when VM port is present on this host" [Medium,New]
14:23:04 <liuyulong> I have no idea why "dvr_snat" is set on every hypervisor. Should it be "dvr"?
14:24:09 <ralonsoh> dvr_snat should be only on network controllers
14:24:12 <slaweq> we are using dvr_snat e.g. in our gates
14:24:26 <liuyulong> In my personal experience, an L3 agent in "dvr_snat" mode on a host that also runs the compute service does not work well.
14:24:26 <slaweq> and that can possibly cause some failures in dvr multinode jobs
14:24:34 <slaweq> (idk for sure but just guessing)
14:25:41 <liuyulong> IMO, this should be well documented; users should not deploy their cloud like this.
14:26:47 <liuyulong> IMO, there are not many agent mode checks for "dvr_snat" during router processing.
14:27:45 <liuyulong> We have consensus that "dvr_snat" is for the centralized network nodes (functions) which cannot be distributed.
14:29:21 <liuyulong> So, my advice for this bug/user is to change the config options.
14:29:30 <ralonsoh> agree
14:30:04 <liuyulong> The final cloud deployment should look like this:
14:30:47 <liuyulong> 1. if the compute nodes can reach the external network (Internet), set the L3 agent mode on the compute nodes to "dvr".
14:31:26 <liuyulong> 2. if the compute nodes cannot reach the Internet, set the agent mode to "dvr_no_external".
14:32:14 <liuyulong> 3. centralized network nodes should run on dedicated physical hosts, with the L3 agent mode set to "dvr_snat".
14:35:01 <liuyulong> OK, no more bugs from me
14:36:36 <liuyulong> OK, let's move on
14:36:37 <liuyulong> #topic On demand agenda
14:38:23 <ralonsoh> nothing from me
14:38:24 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1895972
14:38:35 <openstack> liuyulong: Error: Could not gather data from Launchpad for bug #1895972 (https://launchpad.net/bugs/1895972). The error has been logged
14:39:04 <liuyulong> Another gap is being filled... Congrats!
14:39:44 <ralonsoh> this feature is ongoing but yes!
14:41:56 <liuyulong> There is C work involved, so it is an example of a full-stack development process for an OVN feature.
14:42:26 <liuyulong> The Python work has not started yet.
14:42:44 <liuyulong> #link https://review.opendev.org/#/c/738551/
14:43:04 <liuyulong> slaweq, hi, I've replied to the comments.
14:43:43 <liuyulong> I've tested it from my local devstack environment for a while.
14:43:52 <ralonsoh> and what is happening with https://review.opendev.org/#/c/731446
14:43:53 <ralonsoh> ?
14:44:07 <ralonsoh> superseded by yours, I think so
14:44:15 <liuyulong> I cannot say I covered every case, only those I noticed and experienced.
14:44:51 <liuyulong> ralonsoh, yep, it closes 2 bugs.
14:46:54 <slaweq> liuyulong: ok, I will check that
14:47:33 <liuyulong> But thinking about it more deeply, after these flow refactors or redirects (and some other work), IMO the entire flow structure may get a chance to be redesigned someday.
14:48:00 <liuyulong> It could be a long story. Just forget it. : )
14:48:09 <ZhuXiaoYu> Oh, I wonder why  https://review.opendev.org/#/c/731446 is not approved too
14:50:10 <ZhuXiaoYu> would you give an explanation?
14:50:39 <ZhuXiaoYu> I will tell Li YaJie later
14:51:14 <liuyulong> Please take a look at the inline comments in gerrit, and the meeting log here. : )
14:51:31 <liuyulong> OK, no more talks from me now.
14:51:42 <liuyulong> I will leave 1 or 2 mins here.
14:52:45 <ZhuXiaoYu> https://review.opendev.org/#/c/743661/
14:52:58 <ZhuXiaoYu> my patch for ecmp
14:53:52 <ZhuXiaoYu> I really hope it can be 'merged'
14:55:37 <liuyulong> It's feature freeze now, IMO, it should be moved to next dev cycle.
14:55:58 <liuyulong> Wait...
14:57:07 <ZhuXiaoYu> ..so when is the next dev cycle?
14:57:16 <liuyulong> #link http://eavesdrop.openstack.org/meetings/networking/2020/networking.2020-09-15-14.00.log.html#l-13
14:57:21 <liuyulong> #link https://launchpad.net/neutron/+milestone/victoria-3
14:57:59 <liuyulong> If this was not in the V-3 list, it will not be merged for now.
14:58:17 <liuyulong> Sorry, I cannot open launchpad.net right now.
14:58:18 <slaweq> ZhuXiaoYu: yes, we are in the RC-1 week now
14:58:38 <slaweq> so we can merge this patch after rc-1 is released and the stable/victoria branch has been created
14:59:25 <ZhuXiaoYu> got it, thx a lot for telling me that, it's helpful
14:59:34 <liuyulong> I will start another review round of the spec https://review.opendev.org/#/c/729532 this week.
15:00:11 <liuyulong> Time is up.
15:00:15 <liuyulong> Thank you guys.
15:00:16 <liuyulong> Bye
15:00:21 <ZhuXiaoYu> Bye
15:00:25 <slaweq> thx
15:00:26 <liuyulong> #endmeeting