14:02:14 <liuyulong> #startmeeting neutron_l3
14:02:27 <liuyulong> Long time no see. :)
14:02:50 <slaweq> hi
14:03:26 <liuyulong> OK, let's start.
14:04:04 <liuyulong> #topic Bugs
14:04:17 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1913621
14:04:19 <openstack> Launchpad bug 1913646 in neutron "duplicate for #1913621 DVR router ARP traffic broken for networks containing multiple subnets" [Medium,Confirmed] - Assigned to LIU Yulong (dragon889)
14:04:21 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1913646
14:04:22 <openstack> Launchpad bug 1913646 in neutron "DVR router ARP traffic broken for networks containing multiple subnets" [Medium,Confirmed] - Assigned to LIU Yulong (dragon889)
14:04:47 <liuyulong> This was fixed by changing the destination MAC of the ARP reply to the router gateway.
14:05:12 <liuyulong> But the bug reporter said that 1913646 is a bit different.
14:05:38 <liuyulong> Sorry, it's 1913621
14:06:12 <liuyulong> The main issue of bug/1913621 is why the permanent ARP entry was not added.
14:06:48 <lajoskatona> Hi
14:07:10 <liuyulong> Hi, lajoskatona
14:07:19 <liuyulong> So I just removed the duplicate mark.
14:07:43 <liuyulong> I agree with that point; the main problem of 1913621 may lie in the DVR-related code.
14:10:22 <liuyulong> I will revisit bug 1913621 and try to reproduce it.
14:10:38 <liuyulong> next
14:10:39 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1916022
14:10:40 <openstack> Launchpad bug 1916022 in neutron "L3HA Race condition during startup of the agent may cause inconsistent router's states" [Low,In progress] - Assigned to Slawek Kaplonski (slaweq)
14:11:11 <liuyulong> #link https://review.opendev.org/c/openstack/neutron/+/776423
14:11:14 <liuyulong> The patch is here.
14:11:39 <liuyulong> I've read the code, but have not tested it yet.
14:12:27 <liuyulong> A "race condition" bug is sometimes not so easy to test, IMO.
14:13:22 <slaweq> yes we found it with tobiko tests
14:14:59 <slaweq> those tobiko tests can be very useful for us in some cases :)
14:15:41 <liuyulong> maybe run the tobiko job several times to verify the fix
14:16:00 <slaweq> liuyulong: I did
14:16:02 <slaweq> https://review.opendev.org/c/openstack/neutron/+/776284/5
14:16:11 <slaweq> this is "test patch" which runs tobiko jobs
14:16:19 <slaweq> and it passed many times already
14:16:31 <slaweq> so for me it's clearly solving the issue
14:16:54 <liuyulong> Cool
14:17:12 <liuyulong> Yep, Great work!
14:17:23 <slaweq> thx
14:18:30 <liuyulong> Just posted my +2
14:18:39 <slaweq> thx
14:18:40 <liuyulong> OK, next one
14:18:43 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1916024
14:18:45 <openstack> Launchpad bug 1916024 in neutron "HA router master instance in error state because qg-xx interface is down" [High,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
14:19:23 <liuyulong> #link https://review.opendev.org/c/openstack/neutron/+/776427
14:19:34 <liuyulong> The fix ^
14:20:32 <liuyulong> The fix is simple: it just uses the usual workaround, "retry".
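A minimal sketch of the generic "retry" workaround mentioned above, assuming a hypothetical InterfaceNotReady error and a hypothetical set_link_up helper. Neither name comes from the patch under review, and this is not the code in that change; it only illustrates retrying an operation that may fail while the port is briefly absent.

```python
import functools
import time


class InterfaceNotReady(Exception):
    """Hypothetical error: the device is not visible yet."""


def retry(times=3, delay=0.0, exceptions=(Exception,)):
    """Call the wrapped function up to `times` times before giving up."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise  # out of attempts, surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = []  # records every attempt, for illustration only


@retry(times=3, delay=0.0, exceptions=(InterfaceNotReady,))
def set_link_up(device):
    """Hypothetical stand-in for an "ip link set <dev> up" call."""
    calls.append(device)
    if len(calls) < 3:
        # Simulate the device being missing on the first two attempts.
        raise InterfaceNotReady(device)
    return "up"
```

A retry like this hides the race rather than removing it, which is why the root cause is still worth chasing.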
14:20:49 <liuyulong> #link https://review.opendev.org/c/openstack/neutron/+/776427/4//COMMIT_MSG
14:20:56 <liuyulong> I have left some comment here.
14:21:14 <liuyulong> #link http://paste.openstack.org/show/802779/
14:21:55 <liuyulong> There are some actions " DelPortCommand(port=qg-3e872c7f-68" and "AddPortCommand(bridge=br-int, port=qg-3e872c7f-68"
14:22:30 <liuyulong> I don't know if these actions can be the root cause, but they are really close to the behavior we found.
14:23:53 <liuyulong> While ovsdbapp is doing the "delete and add" work, the privsep daemon is trying to run the "ip link" related command.
14:28:30 <liuyulong> So, maybe we can refactor that replace_port method in a more graceful way: do not delete the port, but clear its attributes and reset them.
14:28:49 <liuyulong> Just some thoughts; I'm not sure if it really works.
14:29:03 <liuyulong> OK, no more bugs from me then.
14:29:11 <liuyulong> Any updates?
14:31:10 <liuyulong> OK, let's move on
14:31:19 <liuyulong> #topic distributed_dhcp
14:31:39 <liuyulong> I have uploaded all code.
14:31:41 <liuyulong> #link https://review.opendev.org/q/topic:%22bp%252Fdistributed-dhcp-for-ml2-ovs%22+(status:open%20OR%20status:merged)
14:32:10 <liuyulong> #link https://review.opendev.org/c/openstack/neutron/+/776568
14:32:34 <liuyulong> The fullstack test case passes locally in my devstack environment.
14:32:51 <liuyulong> I'm not quite sure why it is failing upstream.
14:33:07 <liuyulong> One issue may be the DHCP client version.
14:35:19 <liuyulong> Since we use a namespace as the fake VM, the DHCP client should be the one from "Linux ubuntu-focal-ovh-gra1-0023152345 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux"
14:35:54 <liuyulong> Maybe I should run a devstack deployment on Ubuntu to verify this case.
14:37:16 <liuyulong> Another concern may be the protocol coverage of the DHCPv4/v6 responder.
14:38:21 <liuyulong> All these options and replies are based on the example of the dnsmasq DHCP instance from the DHCP agent.
14:39:00 <liuyulong> We use Wireshark to verify and compare all related options from distributed_dhcp and dnsmasq.
14:39:14 <liuyulong> Maybe it's not enough.
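The Wireshark comparison described above could also be scripted once the options from each reply are written down. Below is a rough sketch under stated assumptions: the helper name diff_dhcp_options and all sample values are made up for illustration, and the replies are assumed to be already decoded into plain dicts keyed by DHCPv4 option code (1 subnet mask, 3 router, 6 DNS servers, 51 lease time, 53 message type, 54 server identifier).

```python
def diff_dhcp_options(reference, candidate):
    """Compare two {option_code: value} dicts from decoded DHCP replies."""
    ref_codes, cand_codes = set(reference), set(candidate)
    return {
        # Options the reference reply carries but the candidate lacks.
        "missing": sorted(ref_codes - cand_codes),
        # Options only the candidate reply carries.
        "extra": sorted(cand_codes - ref_codes),
        # Options present in both replies but with different values.
        "changed": {
            code: (reference[code], candidate[code])
            for code in sorted(ref_codes & cand_codes)
            if reference[code] != candidate[code]
        },
    }


# Made-up sample data, not taken from any real capture.
dnsmasq_reply = {1: "255.255.255.0", 3: "10.0.0.1", 6: "10.0.0.2",
                 51: 86400, 53: 5, 54: "10.0.0.2"}
responder_reply = {1: "255.255.255.0", 3: "10.0.0.1",
                   51: 600, 53: 5, 54: "10.0.0.1"}

report = diff_dhcp_options(dnsmasq_reply, responder_reply)
```

With this sample data the report lists option 6 as missing and options 51 and 54 as changed, which is the kind of gap a manual capture comparison can overlook.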
14:40:18 <liuyulong> So, any comments, problems, testing or issues you have about these DHCPv4/v6 responders are welcome.
14:40:38 <haleyb> i wonder if looking at things in an OVN environment would help?  at least the flows?
14:40:54 <liuyulong> This is the main part of the distributed DHCP.
14:41:24 <liuyulong> haleyb, I did that; the result seems negative.
14:41:47 <liuyulong> The flows from OVN and OVS are totally different.
14:42:17 <liuyulong> Actually, in the implementation here for the OVS agent, we only add one flow for DHCP requests.
14:42:28 <liuyulong> "submit to controller, aka ovs-agent"
14:42:56 <liuyulong> OVN's flow carries some userdata, which is then sent up to ovn-controller.
14:43:43 <haleyb> ack, just thinking out loud
14:43:50 <liuyulong> ovn-controller can read that userdata, but ovs-agent with the ryu app does not.
14:45:33 <liuyulong> The last thing about this bp is the config option...
14:45:41 <liuyulong> "disable_traditional_dhcp" or "enable_traditional_dhcp"
14:46:36 <liuyulong> I still think that we should not add a config option to "enable neutron's default behavior by default".
14:46:39 <lajoskatona> I would say to make the "legacy" dhcp the default anyway, otherwise I am fine with either
14:47:34 <liuyulong> The original purpose and main aim is to "disable" something.
14:49:27 <liuyulong> lajoskatona, yes, we will make sure of that.
14:49:27 <haleyb> liuyulong: yes, it seems a little backwards, but to me it looks similar to other config options we've added where the default is True
14:50:33 <haleyb> for example, we've done that when we wanted to add a new thing then backport the config option, not that this is the same case
14:51:00 <haleyb> i.e. set the option to what the current default is - enabling dhcp-agent
14:51:27 <haleyb> i don't think the "votes" agreed on either direction in the review
14:53:09 <liuyulong> This option should not be backported to stable branches. :)
14:53:45 <liuyulong> Maybe someday this "disable_traditional_dhcp = True" will be the default value.
14:55:12 <liuyulong> OK, time is running out.
14:55:20 <haleyb> or the enable default is False :)  maybe i should ask someone that works on tripleo what they think, since they're maybe more in-tune with exposing config options to customers
14:56:05 <liuyulong> haleyb, great, thank you.
14:56:10 <liuyulong> We can continue the discussion on the gerrit.
14:56:23 <haleyb> sure
14:56:44 <liuyulong> #topic On demand agenda
14:56:50 <liuyulong> I have one update here:
14:57:19 <liuyulong> The spec for "elastic snat" https://review.opendev.org/c/openstack/neutron-specs/+/770540
14:57:59 <liuyulong> Reviews are welcome.
14:59:45 <liuyulong> OK, thank you guys
14:59:51 <liuyulong> Let's end here.
14:59:53 <liuyulong> Bye
14:59:59 <lajoskatona> Bye
15:00:05 <liuyulong> #endmeeting