15:00:38 #startmeeting neutron_l3
15:00:39 Meeting started Thu Sep 27 15:00:38 2018 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:40 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:43 The meeting name has been set to 'neutron_l3'
15:00:53 Hi Swami
15:00:55 hi
15:01:11 hey tidwellr_
15:01:24 hi
15:01:32 is it getting cold in Minneapolis yet?
15:01:47 hi
15:01:54 yep, it's kind of depressing :)
15:02:02 This morning, for the first time I had to wear a sweater to walk my dog in Austin
15:02:04 hi
15:02:10 Hi
15:02:32 mlavalle: I see 3 foot snow drifts in my future
15:02:44 LOL
15:02:50 ok, let's get started
15:02:57 #topic Announcements
15:03:42 First one is a reminder that our next milestone is Stein-1, October 22 - 26
15:04:45 Second announcement is to highlight, for those who might not have seen it, that I sent to the ML a summary of the PTG:
15:04:54 #link http://lists.openstack.org/pipermail/openstack-dev/2018-September/135032.html
15:05:10 Thank you!
15:05:10 I hope I captured the discussion accurately
15:05:56 Please ping me or respond to the message if anything needs to be amended
15:07:10 Finally, we are within 45 days of the Summit in Berlin. I know because my employer allows us to make travel arrangements once we are inside that window
15:07:24 so maybe some of you have to start making reservations as well
15:08:04 Hope to see many of you in Berlin
15:08:13 Any other announcements?
15:08:43 ok, let's move on
15:08:54 #topic Bugs
15:09:34 Swami: please go ahead
15:09:37 Sure
15:09:43 #link https://bugs.launchpad.net/neutron/+bug/1792493
15:09:43 Launchpad bug 1792493 in neutron "DVR and floating IPs broken in latest 7.0.0.0rc1?" [High,Triaged]
15:10:02 ahh nice that you picked that one up
15:10:09 I was going to ask about it
15:10:27 This bug is not very clear as to what is causing this problem.
Either the configuration of the external bridge or something else.
15:10:36 mlavalle: do you have the history behind this bug?
15:10:57 no, I was triaging bugs in preparation for this meeting and I just looked at it
15:10:58 mlavalle: is it reproducible?
15:11:28 I don't know. I am working on another bug right now. This bug would be the next one I try
15:11:44 mlavalle: one of the comments in there talks about the unique MAC address, and I think the bug reporter is confusing the host DVR MAC with the router MACs.
15:11:46 unless someone yanks it forcibly out of my hands, of course
15:12:09 Also he had mentioned that configuring the provider network with the right bridge might have solved his problems. I will give it a try.
15:12:25 I am not sure about what he mentioned regarding teamed vlan interfaces.
15:12:34 ok, I will assume you are looking at that one then
15:12:53 mlavalle: Sure I will test it.
15:13:32 The next one in the list is #link https://bugs.launchpad.net/neutron/+bug/1794305
15:13:32 Launchpad bug 1794305 in neutron "[dvr_no_external][ha] centralized fip show up in the backup snat-namespace on the restore host (restaring or down/up)" [Medium,In progress] - Assigned to LIU Yulong (dragon889)
15:13:46 There was a patch up for review on this bug.
15:14:08 #link https://review.openstack.org/605358
15:14:39 Patch needs review. The fix is pretty trivial.
15:14:47 The next one in the list is
15:14:52 does it look good to you?
15:14:56 I mean the fix
15:15:33 mlavalle: yes
15:15:45 #link https://bugs.launchpad.net/neutron/+bug/1793529
15:15:45 Launchpad bug 1793529 in neutron "[dvr][ha][dataplane down] router_gateway port binding host goes wrong after the 'master' host down/up" [Undecided,New]
15:17:02 This one needs triaging.
15:17:35 It requires an HA setup and a 4 node setup to test it. If anyone has a handy HA setup they can test it.
15:18:02 I have 3 nodes set up, but I can rapidly add a fourth one
15:18:13 mlavalle: ok can you check this out.
15:18:34 right now I have allinone/controller, compute1 and network
15:18:44 what do I have to add? another compute?
15:18:59 liuyulong has also been good at proposing patches, sometimes before we can even triage
15:19:05 mlavalle: you need two network nodes for HA.
15:19:29 that I already have, because my allinone is also a network node
15:19:31 and compute needs to be in mode dvr_no_external
15:19:42 3 nodes is enough for this bug.
15:19:45 haleyb: yes, I did see. I think he is currently extensively testing dvr_no_external with DVR HA and finding all these issues.
15:19:55 liuyulong: there you go.
15:20:01 liuyulong: thanks for all the bugs and fixes
15:20:17 liuyulong: ack, I'll try to test it
15:20:26 liuyulong: So two network nodes and one compute is all that is required?
15:20:35 bug 1793529 may involve l2.
15:20:36 bug 1793529 in neutron "[dvr][ha][dataplane down] router_gateway port binding host goes wrong after the 'master' host down/up" [Undecided,New] https://launchpad.net/bugs/1793529
15:20:59 Swami, yes
15:21:04 liuyulong: if I get stuck I'll ask you questions in the bug
15:21:20 liuyulong: thanks
15:21:45 Swami, np
15:21:46 The next one in the list is #link https://bugs.launchpad.net/neutron/+bug/1786272
15:21:46 Launchpad bug 1786272 in neutron "Connection between two virtual routers does not work with DVR" [Medium,In progress] - Assigned to Brian Haley (brian-haley)
15:21:53 mlavalle, OK
15:22:20 #link https://review.openstack.org/597567 - patch is up for review.
15:22:32 haleyb: slaweq: I have a question on this patch.
15:22:33 slaweq has been working on that, i had an update to help deal with a possible race condition
15:22:45 I added a comment on what is still remaining in this patch.
15:23:13 I didn't have time today to look into it yet
15:23:25 I also had to remove my dvr env where I was testing it
15:23:32 and liuyulong left some comments last night....
well afternoon his time ;-)
15:23:41 haleyb: Deleting a port with connected routers seems to be complicated. With the existing logic, if we have more than one subnet and more than two routers connected together, deleting is an issue.
15:24:22 We may have to retain all the router namespaces in the agent for all the connected routers, until the last Service port or VM from the compute related to all these routers is gone.
15:24:58 haleyb: slaweq: what is your opinion on that?
15:25:47 Swami: to avoid a race condition?
15:26:02 haleyb: no it is a different topic.
15:26:41 haleyb: the current 'get-routers_to_remove' function in the l3_dvrscheduler_db.py has to be refined for the statement that I made above.
15:27:19 TBH I don't know this code that well so I don't know :/
15:27:35 I fixed a part of it, but still when we have multiple subnets and more than two connected routers we may have some problem, where it might delete the router namespace or the connected routers' namespace and so the network will be broken.
15:27:38 Swami: yes, that seems ok, i would have to look at the code again
15:27:38 I will need to take a look at the code and check then
15:28:08 slaweq: haleyb: I need to spend another day or two to refine that function a bit.
15:28:16 slaweq: I will update you on my findings.
15:28:23 Swami: ok, thx a lot
15:28:41 slaweq: haleyb: Meanwhile if you can work on the race condition on the agent that liuyulong pointed out, that would be ok with me.
15:28:55 k
15:29:01 Swami: i had a first pass, will look at the comments
15:29:23 haleyb: thanks
15:29:33 Let us move on.
15:29:37 #link https://bugs.launchpad.net/neutron/+bug/1774459
15:29:39 Launchpad bug 1774459 in neutron "Update permanent ARP entries for allowed_address_pair IPs in DVR Routers" [High,Confirmed]
15:30:11 #link https://review.openstack.org/601336
15:30:33 mlavalle: haleyb: slaweq: The patch is still work in progress.
15:31:01 This is based on the discussion we had at the PTG to send the packet to the ryu controller for processing with GARP.
15:31:08 yeap
15:31:15 I have some code to handle the packet_in handler.
15:31:45 mlavalle: I had some questions on my approach, maybe if any one of you can check it out and let me know your early comments on it.
15:32:03 That would be great.
15:32:06 are the questions in the patch?
15:32:21 Yes.
15:32:29 ack. will look at it
15:32:30 Is yamamoto here?
15:32:53 well, he is connected, but he might be asleep
15:33:14 but chances are that he will be in tomorrow's drivers meeting
15:33:26 that's at 7am your time
15:33:26 mlavalle: ok no problem, since he was the original author of the native openflow controller, I thought he could also add his comments.
15:33:48 oh, I'll ask him to look at the patch tomorrow
15:33:52 mlavalle: ok, if I am not there you can update him on this discussion.
15:34:31 Ok Let us move on to next.
15:34:35 #link https://bugs.launchpad.net/neutron/+bug/1793527
15:34:35 Launchpad bug 1793527 in neutron "[dvr_no_external][ha][dataplane down]centralized floating IP nat rules not install in every HA node" [Undecided,In progress] - Assigned to LIU Yulong (dragon889)
15:34:44 there is also a patch for this bug.
15:35:03 #link https://review.openstack.org/604094
15:35:19 Reviews are ongoing on this patch, so nothing much to discuss.
15:35:31 But I might have a question for liuyulong on this patch
15:35:36 liuyulong: are you there?
15:35:40 yes
15:36:21 So when you update the floatingip nat rules into the SNAT namespace of the HA routers, are you still checking if the floatingip is intended for the centralized or for the distributed?
15:36:30 that one was not clear to me.
15:37:25 Yes, the check is needed, since we can't install policy route rules and iptables rules at the same time.
15:38:01 liuyulong: Ok I will test it out and give my feedback.
15:38:06 liuyulong: thanks
15:38:26 Swami, np, : )
15:38:28 mlavalle: that's all I had for today.
Back to you.
15:38:57 Thanks
15:39:08 I have some other bugs
15:39:37 mlavalle: sure go ahead
15:39:41 https://bugs.launchpad.net/neutron/+bug/1791989
15:39:41 Launchpad bug 1791989 in neutron "grenade-dvr-multinode job fails" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:39:50 any updates on this one, slaweq?
15:40:02 mlavalle: still no
15:40:13 ok, cool
15:40:24 haleyb said on Tuesday that he will add some extra logging there
15:40:36 but I don't know if he tested something already
15:40:46 slaweq: yes, i'll get to that, haven't added it yet
15:40:55 ok :)
15:41:03 thanks for the update :-)
15:41:12 Next one is https://bugs.launchpad.net/neutron/+bug/1789434
15:41:12 Launchpad bug 1789434 in neutron "neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA failing 100% times" [High,Confirmed] - Assigned to Manjeet Singh Bhatia (manjeet-s-bhatia)
15:41:27 I don't see manjeets around
15:41:40 I'll ping him later today
15:42:25 I see slaweq is marking the related tests as unstable
15:43:03 this patch with marking tests as unstable is merged already I think
15:43:11 yeap
15:43:17 just making the point
15:43:29 Next one is https://bugs.launchpad.net/neutron/+bug/1787919
15:43:29 Launchpad bug 1787919 in neutron "Upgrade router to L3 HA broke IPv6" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
15:43:41 I am currently working on this one
15:43:57 I was talking to the submitter in channel while we were in this meeting
15:44:11 I clarified a few things
15:44:18 and I'll continue working on it
15:44:51 The last one I want to bring up is https://bugs.launchpad.net/neutron/+bug/1792901
15:44:51 Launchpad bug 1792901 in neutron "subnet pool can not delete prefixes" [Medium,New]
15:45:54 This is not a bug. It might be a RFE.
But before going that route, I would like to ask tidwellr_ if he can leave a comment as to why it is currently this way
15:46:13 since he is the author of https://review.openstack.org/#/c/148698
15:46:39 I'll take a look, there was a good reason
15:46:43 and the commit message explicitly mentions "Prefixes cannot be removed from the pool once added"
15:46:46 I don't remember what it was :)
15:47:08 yeah, I was struggling with it last night. That's why I am asking for help
15:47:43 I was hoping to find a comment where the exception is raised
15:48:18 any other bugs we should discuss today?
15:48:58 I reported one today
15:49:00 https://bugs.launchpad.net/neutron/+bug/1794809
15:49:00 Launchpad bug 1794809 in neutron "Gateway ports are down after reboot of control plane nodes" [Medium,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:49:07 so I want to mention it here :)
15:49:47 thanks
15:49:54 ok, let's move on
15:50:07 #topic On demand agenda
15:50:14 it looks like when the control plane is e.g. booting up it may happen that the L3 agent is already up and tries to bind gateway ports but the L2 agent is still down and ports are binding_failed
15:50:40 Are there any other topics to discuss today?
15:51:46 ok, thanks for attending! Have a great weekend
15:51:48 slaweq, IMO, the root cause is similar to bug 1793529.
15:51:48 bug 1793529 in neutron "[dvr][ha][dataplane down] router_gateway port binding host goes wrong after the 'master' host down/up" [Undecided,New] https://launchpad.net/bugs/1793529
15:52:44 Thanks!
15:53:16 slaweq, I have some clue, if the router host is powered on, the gateway port may plug again, then l2 agent may bind it to this host.
15:53:50 should we mark one of those bugs as duplicate?
15:53:53 liuyulong: I'm not sure, I clearly saw in logs that the port failed to bind and was marked with binding failed and it happens after the host was rebooted
15:54:44 slaweq, if the port is not binding failed, you may see the gateway port binding host is back to this rebooted host.
15:56:01 mlavalle, I'm not sure, we need to find the root cause first. : )
15:56:07 but the bug which I reported is related to the problem when L3 agent is up and L2 is down, port is then binding_failed and gateway is not working fine
15:56:31 liuyulong: I think that in my case I know the root cause and I'm not sure if those are duplicates in fact
15:56:34 ok, let's investigate further and discuss in channel and / or the bugs
15:56:41 but maybe I'm wrong
15:56:58 let's carry on for the time being
15:57:10 let's not mark anything as duplicate
15:57:16 OK
15:57:34 see you next week
15:57:37 #endmeeting