15:00:38 <mlavalle> #startmeeting neutron_l3
15:00:53 <mlavalle> Hi Swami
15:00:55 <tidwellr_> hi
15:01:11 <mlavalle> hey tidwellr_
15:01:24 <haleyb> hi
15:01:47 <slaweq> hi
15:02:04 <Swami> hi
15:02:10 <davidsha> Hi
15:02:50 <mlavalle> ok, let's get started
15:02:57 <mlavalle> #topic Announcements
15:03:42 <mlavalle> First one is a reminder that our next milestone is Stein-1, October 22 - 26
15:04:45 <mlavalle> Second announcement is to highlight, for those who might not have seen it, that I sent to the ML a summary of the PTG:
15:04:54 <mlavalle> #link http://lists.openstack.org/pipermail/openstack-dev/2018-September/135032.html
15:05:10 <davidsha> Thank you!
15:05:10 <mlavalle> I hope I captured the discussion accurately
15:05:56 <mlavalle> Please ping me or respond to the message if anything needs to be amended
15:07:10 <mlavalle> Finally, we are within 45 days of the Summit in Berlin. I know because my employer allows us to make travel arrangements once we are inside that window
15:07:24 <mlavalle> so maybe some of you have to start making reservations as well
15:08:04 <mlavalle> Hope to see many of you in Berlin
15:08:13 <mlavalle> Any other announcements?
15:08:43 <mlavalle> ok, let's move on
15:08:54 <mlavalle> #topic Bugs
15:09:34 <mlavalle> Swami: please go ahead
15:09:37 <Swami> Sure
15:09:43 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1792493
15:09:43 <openstack> Launchpad bug 1792493 in neutron "DVR and floating IPs broken in latest" [High,Triaged]
15:10:02 <mlavalle> ahh nice that you picked that one up
15:10:09 <mlavalle> I was going to ask about it
15:10:27 <Swami> This bug is not very clear as to what is causing this problem. Either the configuration of the external bridge or something else.
15:10:36 <Swami> mlavalle: do you have the history behind this bug
15:10:57 <mlavalle> no, I was triaging bugs in preparation for this meeting and I just looked at it
15:10:58 <Swami> mlavalle: is it reproducible
15:11:28 <mlavalle> I don't know. I am working on another bug right now. This bug would be the next one I try
15:11:44 <Swami> mlavalle: one of the comments in there mentions the unique MAC address, and I think the bug reporter is confusing the host DVR MAC with the router MACs.
15:11:46 <mlavalle> unless someone yanks it forcibly out of my hands, of course
15:12:09 <Swami> Also he had mentioned configuring the provider network with the right bridge might have solved his problems. I will give it a try.
15:12:25 <Swami> I am not sure what he mentioned about teamed vlan interfaces.
15:12:34 <mlavalle> ok, I will assume you are looking at that one then
15:12:53 <Swami> mlavalle: Sure I will test it.
15:13:32 <Swami> The next one in the list is #link https://bugs.launchpad.net/neutron/+bug/1794305
15:13:32 <openstack> Launchpad bug 1794305 in neutron "[dvr_no_external][ha] centralized fip show up in the backup snat-namespace on the restore host (restaring or down/up)" [Medium,In progress] - Assigned to LIU Yulong (dragon889)
15:13:46 <Swami> There is a patch up for review on this bug.
15:14:08 <Swami> #link https://review.openstack.org/605358
15:14:39 <Swami> Patch needs review. The fix is pretty trivial.
15:14:47 <Swami> The next one in the list is
15:14:52 <mlavalle> does it look good to you?
15:14:56 <mlavalle> I mean the fix
15:15:33 <Swami> mlavalle: yes
15:15:45 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1793529
15:15:45 <openstack> Launchpad bug 1793529 in neutron "[dvr][ha][dataplane down] router_gateway port binding host goes wrong after the 'master' host down/up" [Undecided,New]
15:17:02 <Swami> This one needs triaging.
15:17:35 <Swami> It requires an HA setup with 4 nodes to test it. If anyone has an HA setup handy, they can test it.
15:18:02 <mlavalle> I have 3 nodes setup, but I can rapidly add a fourth one
15:18:13 <Swami> mlavalle: ok can you check this out.
15:18:34 <mlavalle> right now I have allinone/controller, compute1 and network
15:18:44 <mlavalle> what do I have to add? another compute?
15:18:59 <haleyb> liuyulong has also been good at proposing patches, sometimes before we can even triage
15:19:05 <Swami> mlavalle: you need two network nodes for HA.
15:19:29 <mlavalle> that I already have, because my allinone is also network node
15:19:31 <haleyb> and compute needs to be in mode dvr_no_external
15:19:42 <liuyulong> 3 nodes is enough for this bug.
15:19:45 <Swami> haleyb: yes, I saw that. I think he is currently extensively testing dvr_no_external with DVR HA and finding all these issues.
15:19:55 <Swami> liuyulong: there you go.
15:20:01 <haleyb> liuyulong: thanks for all the bugs and fixes
15:20:17 <mlavalle> liuyulong: ack, I'll try to test it
15:20:26 <Swami> liuyulong: So two network nodes and one compute are all that is required?
15:20:35 <liuyulong> bug 1793529 may involve L2.
15:20:36 <openstack> bug 1793529 in neutron "[dvr][ha][dataplane down] router_gateway port binding host goes wrong after the 'master' host down/up" [Undecided,New] https://launchpad.net/bugs/1793529
15:20:59 <liuyulong> Swami, yes
15:21:04 <mlavalle> liuyulong: if I get stuck I'll ask you questions in the bug
15:21:20 <Swami> liuyulong: thanks
15:21:45 <liuyulong> Swami, np
15:21:46 <Swami> The next one in the list is #link https://bugs.launchpad.net/neutron/+bug/1786272
15:21:46 <openstack> Launchpad bug 1786272 in neutron "Connection between two virtual routers does not work with DVR" [Medium,In progress] - Assigned to Brian Haley (brian-haley)
15:21:53 <liuyulong> mlavalle, OK
15:22:20 <Swami> #link https://review.openstack.org/597567 - patch is up for review.
15:22:32 <Swami> haleyb: slaweq: I have a question on this patch.
15:22:33 <haleyb> slaweq has been working on that, i had an update to help deal with a possible race condition
15:22:45 <Swami> I added a comment on what is still remaining in this patch.
15:23:13 <slaweq> I didn't have time today to look into it yet
15:23:25 <slaweq> I also had to remove my dvr env where I was testing it
15:23:32 <mlavalle> and liuyulong left some comments last night.... well afternoon his time ;-)
15:23:41 <Swami> haleyb: Deleting a port with connected routers seems to be complicated.  With the existing logic, if we have more than one subnet and more than two routers connected together deleting is an issue.
15:24:22 <Swami> We may have to retain all the router namespaces in the agent for all the connected routers, until the last Service port or VM from the compute related to all these routers is gone.
15:24:58 <Swami> haleyb: slaweq: what is your opinion on that.
15:25:47 <haleyb> Swami: to avoid a race condition?
15:26:02 <Swami> haleyb: no it is a different topic.
15:26:41 <Swami> haleyb: the current 'get-routers_to_remove' function in the l3_dvrscheduler_db.py  has to be refined for the statement that I made above.
15:27:19 <slaweq> TBH I don't know this code that well so I don't know :/
15:27:35 <Swami> I fixed a part of it, but still when we have multiple subnets and more than two connected routers we may have some problem, where it might delete the router namespace or the connected routers' namespace and so the network will be broken.
15:27:38 <haleyb> Swami: yes, that seems ok, i would have to look at the code again
15:27:38 <slaweq> I will need to take a look at the code and check then
15:28:08 <Swami> slaweq: haleyb: I need to spend another day or two to refine that function a bit.
15:28:16 <Swami> slaweq: I will update you on my findings.
15:28:23 <slaweq> Swami: ok, thx a lot
15:28:41 <Swami> slaweq: haleyb: Meanwhile, if you can work on the race condition in the agent that liuyulong pointed out, that would be fine with me.
15:28:55 <slaweq> k
15:29:01 <haleyb> Swami: i had a first pass, will look at the comments
15:29:23 <Swami> haleyb: thanks
15:29:33 <Swami> Let us move on.
15:29:37 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1774459
15:29:39 <openstack> Launchpad bug 1774459 in neutron "Update permanent ARP entries for allowed_address_pair IPs in DVR Routers" [High,Confirmed]
15:30:11 <Swami> #link https://review.openstack.org/601336
15:30:33 <Swami> mlavalle: haleyb: slaweq: The patch is still work in progress.
15:31:01 <Swami> This is based on the discussion we had at the PTG to send the packet to the ryu controller for processing with GARP.
15:31:08 <mlavalle> yeap
15:31:15 <Swami> I have some code to handle the packet_in handler.
15:31:45 <Swami> mlavalle: I had some questions on my approach, maybe one of you can check it out and let me know your early comments on it.
15:32:03 <Swami> That would be great.
15:32:06 <mlavalle> are the questions in the patch?
15:32:21 <Swami> Yes.
15:32:29 <mlavalle> ack. will look at it
15:32:30 <Swami> Is yamamoto here?
15:32:53 <mlavalle> well, he is connected, but he might be asleep
15:33:14 <mlavalle> but chances are that he will be in tomorrow's drivers meeting
15:33:26 <mlavalle> that's at 7am your time
15:33:26 <Swami> mlavalle: ok no problem, since he was the original author of the native openflow controller, I thought he could also add his comments.
15:33:48 <mlavalle> oh, I'll ask him to look at the patch tomorrow
15:33:52 <Swami> mlavalle: ok, if I am not there you can update him on this discussion.
15:34:31 <Swami> Ok Let us move on to next.
15:34:35 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1793527
15:34:35 <openstack> Launchpad bug 1793527 in neutron "[dvr_no_external][ha][dataplane down]centralized floating IP nat rules not install in every HA node" [Undecided,In progress] - Assigned to LIU Yulong (dragon889)
15:34:44 <Swami> there is also a patch for this bug.
15:35:03 <Swami> #link https://review.openstack.org/604094
15:35:19 <Swami> Reviews are ongoing on this patch, so nothing much to discuss.
15:35:31 <Swami> But I might have a question to liuyulong on this patch
15:35:36 <Swami> liuyulong: are you there
15:35:40 <liuyulong> yes
15:36:21 <Swami> So when you update the floatingip nat rules into the SNAT namespace of the HA routers, are you still checking if the floatingip is intended for the centralized or for the distributed case?
15:36:30 <Swami> that one was not clear to me.
15:37:25 <liuyulong> Yes, the check is needed, since we can't install policy route rules and iptables rules at the same time.
15:38:01 <Swami> liuyulong: Ok I will test it out and give my feedback.
15:38:06 <Swami> liuyulong: thanks
15:38:26 <liuyulong> Swami, np, : )
15:38:28 <Swami> mlavalle: that's all I had for today. Back to you.
15:38:57 <mlavalle> Thanks
15:39:08 <mlavalle> I have some other bugs
15:39:37 <Swami> mlavalle: sure go ahead
15:39:41 <mlavalle> https://bugs.launchpad.net/neutron/+bug/1791989
15:39:41 <openstack> Launchpad bug 1791989 in neutron "grenade-dvr-multinode job fails" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:39:50 <mlavalle> any updates on this one, slaweq?
15:40:02 <slaweq> mlavalle: still no
15:40:13 <mlavalle> ok, cool
15:40:24 <slaweq> haleyb said on Tuesday that he will add some extra logging there
15:40:36 <slaweq> but I don't know if he has tested anything already
15:40:46 <haleyb> slaweq: yes, i'll get to that, haven't added it yet
15:40:55 <slaweq> ok :)
15:41:03 <mlavalle> thanks for the update :-)
15:41:12 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1789434
15:41:12 <openstack> Launchpad bug 1789434 in neutron "neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA failing 100% times" [High,Confirmed] - Assigned to Manjeet Singh Bhatia (manjeet-s-bhatia)
15:41:27 <mlavalle> I don't see manjeets around
15:41:40 <mlavalle> I'll ping him later today
15:42:25 <mlavalle> I see slaweq is marking the related tests as unstable
15:43:03 <slaweq> this patch with marking tests as unstable is merged already I think
15:43:11 <mlavalle> yeap
15:43:17 <mlavalle> just making the point
15:43:29 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1787919
15:43:29 <openstack> Launchpad bug 1787919 in neutron "Upgrade router to L3 HA broke IPv6" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
15:43:41 <mlavalle> I am currently working on this one
15:43:57 <mlavalle> I was talking to the submitter in channel while we were in this meeting
15:44:11 <mlavalle> I clarified a few things
15:44:18 <mlavalle> and I'll continue working on it
15:44:51 <mlavalle> The last one I want to bring up is https://bugs.launchpad.net/neutron/+bug/1792901
15:44:51 <openstack> Launchpad bug 1792901 in neutron "subnet pool can not delete prefixes" [Medium,New]
15:45:54 <mlavalle> This is not a bug. It might be an RFE. But before going that route, I would like to ask tidwellr_ if he can leave a comment as to why it is currently this way
15:46:13 <mlavalle> since he is the author of https://review.openstack.org/#/c/148698
15:46:39 <tidwellr_> I'll take a look, there was a good reason
15:46:43 <mlavalle> and the commit message explicitly mentions "Prefixes cannot be removed from the pool once added"
15:46:46 <tidwellr_> I don't remember what it was :)
15:47:08 <mlavalle> yeah, I was struggling with it last night. That's why I am asking for help
15:47:43 <mlavalle> I was hoping to find a comment where the exception is raised
15:48:18 <mlavalle> any other bugs we should discuss today?
15:48:58 <slaweq> I reported one today
15:49:00 <slaweq> https://bugs.launchpad.net/neutron/+bug/1794809
15:49:00 <openstack> Launchpad bug 1794809 in neutron "Gateway ports are down after reboot of control plane nodes" [Medium,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:49:07 <slaweq> so I want to mention it here :)
15:49:47 <mlavalle> thanks
15:49:54 <mlavalle> ok, let's move on
15:50:07 <mlavalle> #topic On demand agenda
15:50:14 <slaweq> it looks like when the control plane is e.g. booting up, it may happen that the L3 agent is already up and tries to bind gateway ports while the L2 agent is still down, so the ports end up as binding_failed
15:50:40 <mlavalle> Are there any other topics to discuss today?
15:51:46 <mlavalle> ok, thanks for attending! Have a great weekend
15:51:48 <liuyulong> slaweq, IMO, the root cause is similar to bug 1793529.
15:51:48 <openstack> bug 1793529 in neutron "[dvr][ha][dataplane down] router_gateway port binding host goes wrong after the 'master' host down/up" [Undecided,New] https://launchpad.net/bugs/1793529
15:52:44 <davidsha> Thanks!
15:53:16 <liuyulong> slaweq, I have a clue: if the router host is powered on, the gateway port may be plugged again, then the L2 agent may bind it to this host.
15:53:50 <mlavalle> should we mark one of those bugs as duplicate?
15:53:53 <slaweq> liuyulong: I'm not sure, I clearly saw in the logs that the port failed to bind and was marked as binding_failed, and it happens after the host was rebooted
15:54:44 <liuyulong> slaweq, if the port is not binding_failed, you may see that the gateway port binding host is back to this rebooted host.
15:56:01 <liuyulong> mlavalle, I'm not sure, we need to find the root cause first. : )
15:56:07 <slaweq> but the bug which I reported is related to the problem when the L3 agent is up and L2 is down, the port is then binding_failed and the gateway is not working fine
15:56:31 <slaweq> liuyulong: I think that in my case I know root cause and I'm not sure if those are duplicates in fact
15:56:34 <mlavalle> ok, let's investigate further and discuss in channel and / or the bugs
15:56:41 <slaweq> but maybe I'm wrong
15:56:58 <mlavalle> let's carry on for the time being
15:57:10 <mlavalle> let's not mark anything as duplicate
15:57:16 <liuyulong> OK
15:57:34 <mlavalle> see you next week
15:57:37 <mlavalle> #endmeeting