14:00:14 <liuyulong> #startmeeting neutron_l3
14:00:15 <openstack> Meeting started Wed Mar 11 14:00:14 2020 UTC and is due to finish in 60 minutes.  The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:18 <openstack> The meeting name has been set to 'neutron_l3'
14:00:24 <liuyulong> #topic Announcements
14:00:25 <ralonsoh> hi
14:00:33 <liuyulong> ralonsoh, hi
14:01:11 <liuyulong> #link http://eavesdrop.openstack.org/meetings/networking/2020/networking.2020-03-10-14.00.log.html#l-10
14:01:45 <slaweq> hi
14:02:02 <liuyulong> I guess we can skip this topic today, everything was discussed yesterday.
14:02:20 <liuyulong> slaweq, hi
14:02:32 <liuyulong> OK, let's move on
14:02:36 <liuyulong> #topic Bugs
14:02:50 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-March/013177.html
14:03:09 <liuyulong> Bence was our bug deputy last week.
14:03:29 <liuyulong> So the first one:
14:03:44 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1866336
14:03:46 <openstack> Launchpad bug 1866336 in neutron "Binding of floating ip agent gateway port and agent_id isn't removed " [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
14:03:55 <liuyulong> slaweq is working on this now.
14:04:17 <liuyulong> Code looks good to me, just one small concern about a DB query. : )
14:04:41 <liuyulong> I have left a comment on the gerrit:
14:05:06 <ralonsoh> I agree with your comment, but I'm still reviewing the patch
14:05:08 <liuyulong> #link https://review.opendev.org/#/c/711623/3/neutron/db/l3_dvr_db.py@371
14:05:09 <slaweq> liuyulong: thx, I will check it after today's meetings
14:05:48 <liuyulong> slaweq, yw
14:05:52 <liuyulong> OK, next one
14:06:08 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1866560
14:06:09 <openstack> Launchpad bug 1866560 in neutron "FIP Port forwarding description API extension don't work" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
14:06:29 <liuyulong> Also assigned to slaweq
14:07:13 <slaweq> yes, this one requires new neutron-lib version
14:07:14 <liuyulong> One small thing is we have 2 bps for this.
14:07:54 <ralonsoh> the correct one related to this RFE is https://blueprints.launchpad.net/neutron/+spec/portforwarding-description
14:08:50 <liuyulong> So since we are still in the development cycle, I'd say we should add this bp link to the commit message to track all the code.
14:09:13 <slaweq> liuyulong: ok, I will add a link to the BP to this patch
14:10:20 <liuyulong> slaweq, sure, both links are fine with me, you can choose whichever one you want. : )
14:10:44 <slaweq> I'm using https://blueprints.launchpad.net/neutron/+spec/fip-pf-description to track this feature
14:10:53 <slaweq> actually I wasn't aware of the other one :)
14:12:01 <liuyulong> Next one, we discussed it last week.
14:12:16 <liuyulong> during the LP bug scanning
14:12:18 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1865891
14:12:20 <openstack> Launchpad bug 1865891 in neutron "Race condition during removal of subnet from the router and removal of subnet" [Medium,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
14:13:04 <liuyulong> ralonsoh has left a potential solution
14:13:08 <slaweq> it's exactly like ralonsoh described
14:13:21 <ralonsoh> I tested this possible solution and it works
14:13:33 <ralonsoh> I like to use the DB as a lock
14:13:47 <slaweq> ralonsoh: do You have patch for that already?
14:13:50 <ralonsoh> but, of course, this will add some extra time in the process
14:13:56 <ralonsoh> slaweq, I don't but I can propose it
14:14:08 <ralonsoh> do you want me to do this?
14:14:21 <slaweq> ralonsoh: if You can, that would be great
14:14:28 <ralonsoh> perfect
14:14:35 <slaweq> I did the debugging but I haven't proposed any patch yet
14:15:04 <slaweq> and btw. it's IMO even a Low rather than a Medium bug
14:15:25 <slaweq> as it is very unlikely to happen in a real-world scenario
14:15:26 <liuyulong> ralonsoh, so every port creation will have to check this DB lock?
14:15:52 <ralonsoh> liuyulong, yes, the transaction will block any other one modifying/deleting this record
14:16:04 <slaweq> ralonsoh: there is one thing which bothers me with this
14:16:13 <slaweq> is this only related to router's ports?
14:16:27 <slaweq> or can it be the same when we are creating "regular" ports
14:16:46 <slaweq> (I didn't try to reproduce it with port-create command)
14:17:01 <liuyulong> yes, if it is every port creation, nova may not be happy when booting large sets of VMs.
14:17:03 <ralonsoh> let me check.... but this will work for any port requesting an IP from IPAM
14:18:02 <ralonsoh> yes, that will work for any port creation
14:18:10 <ralonsoh> the subnets will be locked during this process
14:18:58 <slaweq> ralonsoh: if You can push some PS with fix for it, that would be great
14:19:11 <ralonsoh> slaweq, sure, tomorrow
14:19:11 <slaweq> we can then maybe compare its performance impact on rally job
14:19:25 <slaweq> and see how much cost it has for us :)
14:19:32 <slaweq> what do You think?
14:19:43 <ralonsoh> rally is the best tool for this
14:20:14 <liuyulong> OK, but again, we should find the balance between fixing that and warning the users not to do that.
14:20:23 <liuyulong> : )
14:20:32 <ralonsoh> well, that's difficult
14:20:41 <ralonsoh> you can always send a batch of commands
14:20:52 <ralonsoh> of course, these would have to be executed at the same time
14:21:07 <ralonsoh> but the MUST in our code is reliability
14:22:02 <slaweq> ralonsoh++
14:22:37 <liuyulong> Makes sense
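
The "DB as a lock" idea discussed for bug 1865891 can be sketched as follows. This is only a minimal illustration, not the Neutron patch: the `subnets` and `allocations` tables and the helper names are hypothetical, and SQLite's `BEGIN IMMEDIATE` stands in for the row lock (for example `SELECT ... FOR UPDATE`) that a MySQL or PostgreSQL backend would more likely use. The point is that the existence check and the write happen inside one locked transaction, so a subnet cannot be deleted between the check and the allocation.

```python
import sqlite3
from contextlib import contextmanager


def open_db(path=":memory:"):
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT statements below are the only transaction boundaries.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE subnets (id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE allocations (subnet_id TEXT, ip TEXT)")
    return conn


@contextmanager
def write_locked(conn):
    # BEGIN IMMEDIATE takes the database write lock before any read,
    # so a concurrent writer has to wait until this transaction ends.
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def allocate_ip(conn, subnet_id, ip):
    """Allocate an IP only if the subnet still exists (hypothetical helper)."""
    with write_locked(conn):
        exists = conn.execute(
            "SELECT 1 FROM subnets WHERE id = ?", (subnet_id,)).fetchone()
        if exists is None:
            return False
        conn.execute("INSERT INTO allocations VALUES (?, ?)", (subnet_id, ip))
        return True


def delete_subnet(conn, subnet_id):
    """Delete a subnet only if nothing is allocated from it (hypothetical helper)."""
    with write_locked(conn):
        in_use = conn.execute(
            "SELECT 1 FROM allocations WHERE subnet_id = ?",
            (subnet_id,)).fetchone()
        if in_use is not None:
            return False
        conn.execute("DELETE FROM subnets WHERE id = ?", (subnet_id,))
        return True
```

Because every allocation and every deletion takes the same lock, the two operations are serialized. That is the extra time per port creation raised in the discussion, which the rally job comparison is meant to measure.
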
14:22:48 <liuyulong> OK, next one:
14:22:49 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1866077
14:22:50 <openstack> Launchpad bug 1866077 in neutron "[L3][IPv6] IPv6 traffic with DVR in compute host" [Undecided,Incomplete]
14:23:00 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1774463
14:23:00 <openstack> Launchpad bug 1774463 in neutron "RFE: Add support for IPv6 on DVR Routers for the Fast-path exit " [High,In progress] - Assigned to LIU Yulong (dragon889)
14:23:18 <liuyulong> It is related to 1774463, as haleyb|away pointed out.
14:24:44 <liuyulong> I filed bug/1866077 because we found that IPv6 in DVR mode is missing some address scope marks, which blocks the traffic from the FIP namespace.
14:26:00 <liuyulong> We have a fix for that, and the interesting thing is that Swami’s patch also does this:
14:26:03 <liuyulong> #link https://review.opendev.org/#/c/662111/33/neutron/agent/l3/dvr_local_router.py@628
14:26:32 <liuyulong> So, I may want to take his patch and continue the work.
14:26:43 <slaweq> liuyulong: that's a very old patch :)
14:27:03 <liuyulong> Yes, I've rebased that on the master branch.
14:27:47 <liuyulong> It is a really nice feature for IPv6 with DVR.
14:28:12 <liuyulong> IPv6 traffic can be distributed if users want to.
14:29:30 <liuyulong> And IPv6 could be the future, maybe someday IPv4 support will be dropped like py27. : )
14:29:49 <slaweq> liuyulong: You wish :P
14:30:24 <liuyulong> OK, last one
14:30:25 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1866445
14:30:26 <openstack> Launchpad bug 1732067 in OpenStack Security Notes "duplicate for #1866445 openvswitch firewall flows cause flooding on integration bridge" [Undecided,New]
14:30:42 <liuyulong> It is more like an L2 bug.
14:30:58 <liuyulong> But the reporter left something related to the L3 router.
14:30:59 <liuyulong> https://bugs.launchpad.net/neutron/+bug/1866445/comments/3
14:32:49 <liuyulong> I don't have the full picture of this bug. We will need more information about it.
14:33:07 <ralonsoh> I need to check why Yi is making that claim about the MAC learning
14:37:37 <liuyulong> OK, that's all bugs from me.
14:37:46 <slaweq> I want to talk again about one
14:38:02 <slaweq> https://bugs.launchpad.net/neutron/+bug/1859832
14:38:04 <openstack> Launchpad bug 1859832 in neutron "L3 HA connectivity to GW port can be broken after reboot of backup node" [Medium,In progress] - Assigned to LIU Yulong (dragon889)
14:38:04 <slaweq> :)
14:38:26 <slaweq> liuyulong: I was again looking at Your patch and testing it
14:39:08 <slaweq> and I still have worries that it may introduce new race conditions and regressions during the failover
14:39:43 <slaweq> I wrote it in my reply to Your comment https://review.opendev.org/#/c/707406/9/neutron/agent/l3/keepalived_state_change.py@98
14:40:54 <liuyulong> The first one has a config option for keepalived, 'vrrp_garp_interval'.
14:41:10 <liuyulong> We talked about that last week.
14:41:48 <slaweq> yes, but it's the interval between garp packets sent by keepalived, right?
14:41:54 <liuyulong> But I'm not going to add it to this patch, it will be a follow-up one.
14:42:25 <slaweq> I still don't see how it may solve potential race condition between sending garps and setting interface to be up
14:44:48 <liuyulong> GARPs are sent 3 times for an HA router.
14:45:21 <slaweq> yes, but as I said in the reply to Your comment: first one will always fail
14:46:00 <slaweq> second one is after 10 seconds (so 10 seconds of datapath break, right?) and You have no guarantee that interfaces will be already up then
14:46:18 <slaweq> and third one should be IMO removed from neutron-keepalived-state-change code :)
14:48:52 <liuyulong> The link up action for a linux device is not time-consuming, so for the first one, setting vrrp_garp_interval=1 could be enough.
14:49:14 <liuyulong> Keepalived will send GARP 5 times when it hits the state change.
14:49:47 <slaweq> and what about longer time of failover?
14:49:48 <liuyulong> So the first one fails because the link is not up, then it waits one second and sends the next one.
14:50:04 <slaweq> ralonsoh: what do You think?
14:50:34 <ralonsoh> I still think that we should not rely on the garp
14:50:58 <ralonsoh> and using slaweq's patch but implementing liuyulong's linuxinterface change
14:51:02 <liuyulong> One more thing about this: the outside world has ARP.
14:51:16 <ralonsoh> we can have an easy architecture and we handle all backends
14:52:18 <slaweq> ralonsoh: now I'm a bit lost, I don't understand why we would need my patch with part of the liuyulong's change
14:52:35 <ralonsoh> https://review.opendev.org/#/c/707406/9/neutron/agent/linux/interface.py
14:52:46 <liuyulong> For the 3rd GARP, IMO, removing it is not so good for neutron. But we could move it to the L3 agent state change code path, not the state change daemon.
14:52:57 <ralonsoh> you should implement this change in your patch, slaweq
14:53:12 <slaweq> ralonsoh: but why?
14:53:34 <ralonsoh> to handle all possible interfaces
14:53:39 <ralonsoh> ovs, lb, etc
14:53:48 <ralonsoh> this is the main problem you see in your patch
14:54:46 <slaweq> ralonsoh: ok, I think I need to talk offline with You about it later :)
14:54:51 <ralonsoh> sure
14:55:14 <slaweq> I don't want to steal all the remaining time so please continue with other topics now
14:58:14 <liuyulong> OK, we could continue the code discussion on the patch set.
14:58:30 <liuyulong> Actually we don't have much time left now.
14:58:34 <slaweq> liuyulong: yes :)
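
The GARP timing argument from the bug 1859832 discussion can be put into numbers with a small model. This is a rough sketch under stated assumptions, not keepalived's actual behaviour: it assumes GARPs are sent at fixed intervals starting at the state change, and that a GARP only has an effect if the link is already up when it is sent. The function name and parameters are hypothetical. The values 5 (repeat count) and 1 second (`vrrp_garp_interval`) are the ones mentioned in the meeting.

```python
from typing import Optional


def first_effective_garp(link_up_delay: float,
                         interval: float,
                         count: int) -> Optional[float]:
    """Return the send time of the first GARP that goes out once the link is up.

    link_up_delay: seconds after the state change until the link is up.
    interval: seconds between consecutive GARPs.
    count: how many GARPs are sent in total.
    Returns None if every GARP is sent while the link is still down.
    """
    for i in range(count):
        sent_at = i * interval
        if sent_at >= link_up_delay:
            return sent_at
    return None


# Link comes up 0.3 s after the state change, GARPs every 1 s, 5 in total:
# the first GARP is lost, the second one (at 1.0 s) is the first effective one.
print(first_effective_garp(0.3, 1.0, 5))

# If the link needs longer than the whole GARP burst, none of them helps,
# which is the longer failover case raised in the discussion.
print(first_effective_garp(10.0, 1.0, 5))
```

In this simplified model the datapath break is bounded by roughly one interval only when the link comes up within the burst. Otherwise recovery depends on something other than these GARPs.
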
14:58:40 <liuyulong> #topic On demand agenda
14:59:20 <liuyulong> Alright, let's end here.
14:59:27 <liuyulong> Thank you guys.
14:59:32 <liuyulong> Bye
14:59:37 <liuyulong> #endmeeting