15:02:52 #startmeeting neutron_l3
15:02:53 Meeting started Thu Sep 21 15:02:52 2017 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:56 hi
15:02:57 The meeting name has been set to 'neutron_l3'
15:03:06 #chair Swami
15:03:07 Current chairs: Swami haleyb
15:04:16 #topic Announcements
15:04:56 Hope people had a productive PTG and have recovered by now
15:05:16 mlavalle will not be able to attend today's meeting, since he has a conflict
15:05:20 haleyb: sure
15:06:33 haleyb: I hope mlavalle might have sent out a report of the PTG update
15:07:04 i don't remember seeing it yet, i'll look again
15:07:38 haleyb: I thought mlavalle mentioned that he would send it out in a day or two. Sorry, I have not seen it either.
15:07:41 I guess the one thing we got out of the PTG was more L3 bugs, at least in all these corner cases of router migration and such
15:08:42 I had no other announcements, might as well move to bugs
15:08:48 #topic Bugs
15:09:14 haleyb: thanks, let us go over the dvr bugs
15:09:36 haleyb: yes agreed
15:10:19 #link https://bugs.launchpad.net/neutron/+bug/1718585
15:10:21 Launchpad bug 1718585 in neutron "set floatingip status to DOWN during creation" [Undecided,Opinion] - Assigned to venkata anil (anil-venkata)
15:10:43 This bug has been filed by anil-venkata
15:11:13 It seems that he is asking to change the behavior of floatingip status report.
15:11:14 i thought we set status to ERROR by default, only ACTIVE if succeeded
15:11:46 haleyb: That was my opinion, but I need to recheck and I have not paid attention to the status.
15:12:18 haleyb: But in the case of migration of a floatingIP, or of a router associated with a floatingip, keeping a new status for the floatingip would be tedious.
15:13:11 We always say that when a VM migrates or a floatingIP migrates there should not be any down time, so why do we need to change the state?
15:14:09 I agree that during the initial floatingIP setup there should be some state that determines if it is ready to be consumed or not
15:14:12 yeah, we should only change on target host if it failed, otherwise the state could be flaky if the new host updated before the old host
15:14:46 since one could be tearing down while other is building
15:14:49 haleyb: but the floatingip state is not tied to host.
15:15:42 but the agent reports the state, that's what i'm getting at
15:16:24 haleyb: There are currently three floatingIP states and I did see an additional state that is defined in the agent as 'NOCHANGE'
15:17:56 haleyb: maybe we can see what makes sense to handle all these timing issues with floatingip.
15:18:04 More discussion needed on this.
15:18:31 The next one in the list is
15:18:34 #link https://bugs.launchpad.net/neutron/+bug/1718345
15:18:35 Launchpad bug 1718345 in neutron "ml2_distributed_port_bindings not cleared after migration from DVR" [Undecided,New]
15:19:27 I have to check the code path to see why the ml2 port binding is not being cleared when the router is migrated.
15:20:02 The port binding is actually done when ensure_port_binding is called.
15:20:17 But we need to see if the router migration takes a similar path or not.
15:20:25 looks like a bug, since we found all these other cases with router ports i'm not surprised to find something else
15:20:48 The original design was to move the legacy to dvr and not to move the dvr to legacy. So there may be some corner cases here.
15:21:08 which we have not addressed.
15:21:27 agreed, we've just never noticed since no one typically does this
15:21:30 I will triage this and see what is missing in here.
15:22:00 great, thanks
15:22:15 The next in the list is
15:22:18 #link https://bugs.launchpad.net/neutron/+bug/1717302
15:22:19 Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [Undecided,New]
15:23:30 I posted a patch to address this issue
15:23:45 #link https://review.openstack.org/#/c/505324/
15:24:11 but still I was seeing a couple of tests failing for east-west communication.
15:24:18 Can you add a closes-bug to that next update? to tie it to the bug?
15:24:45 haleyb: This is not directly related to this bug, this was a migration patch that I included to test the case here.
15:25:01 haleyb: But still I am seeing east-west with fip failing.
15:25:07 haleyb: not sure what is causing this.
15:25:35 haleyb: The log trace still shows that an IP cannot be assigned to qg- interface and does not exist.
15:26:07 Swami: i will look at the dvr scheduler part of that change again, think that's where i didn't look as closely
15:26:17 but the qg- error is still strange
15:26:34 Also the unknown factor here is, in the logs, I can see that it is trying to ARP on an IP for qg- interface in the qrouter namespace. I am not sure if this is log noise or something related to the failure.
15:27:07 hi
15:27:18 reedip_: hi
15:27:25 reedip_: hi
15:27:42 haleyb: we will keep debugging this issue with the patch.
15:27:56 just joining in, please continue, I will put up my point in an open discussion if possible :)
15:28:10 reedip_: we are discussing the dvr bugs.
15:28:22 reedip_: If you have one related to the FWaaS, you can post it now
15:28:58 Swami : yep, but I think it's more related to the FWaaS than DVR actually.
15:29:36 ok I will go ahead and bring in that bug for discussion
15:29:39 #link https://bugs.launchpad.net/neutron/+bug/1716401
15:29:40 Launchpad bug 1716401 in neutron "FWaaS: Ip tables rules do not get updated in case of distributed virtual routers (DVR)" [Undecided,New] - Assigned to Reedip (reedip-banerjee)
15:29:51 oh heheheh :)
15:29:55 yeah that's the one :)
15:30:01 reedip_: can you discuss it now
15:30:35 I understand that FWaaS still hasn't looped in the HA part of a router, so it's a separate discussion :)
15:31:00 reedip_: I think I had read through the bug description and added my comments. I also wanted to talk to sridhar during the PTG, but he was busy.
15:31:18 reedip_: let us keep the ha part apart and work with non-ha first.
15:31:33 Swami : well we can discuss it now :)
15:32:23 reedip_: go ahead
15:32:51 reedip_: if you can point me in the bug, where your code is handling the router_update scenario, then I can check it out.
15:32:57 Swami : as far as I know, the DVR code included the DVR and DVR_SNAT option for deployment in the Compute and the Controller node.
15:33:15 I do not know much about DVR, so just starting to look at it from the bug's point of view
15:33:34 but FWaaS has not considered DVR_SNAT as of now
15:33:57 reedip_: I am sure when we originally designed this both were considered.
15:34:40 reedip_: in the case of floatingip either residing in dvr_snat node or in dvr node, the rules were configured on the 'rfp' port of the router namespace and cleared when floatingip was removed.
15:35:20 reedip_: this is only for north-south. In the case of dvr_snat you additionally need to set up the rules in the qg- interface on the snat_namespace.
15:35:34 Swami : ok ...
15:36:29 reedip_: The only behavior change that we introduced in dvr is creating the fipnamespace along with the gateway create on external network.
15:36:32 Swami : I need to read the arch for DVR, so maybe I can put this issue up later ?
I will look into the points that you have mentioned
15:36:59 reedip_: ok, ping me on the IRC channel or add your comments on the bug description and we can take it from there.
15:37:06 reedip_: hope this helps.
15:37:38 Sure Swami. By the way, any doc wherein I can find some DVR info ?
15:37:49 I mean the arch for DVR ?
15:38:08 reedip_: All our docs were in Google Docs. Ping me offline and I can point you to the dvr docs.
15:38:27 The next bug in the list is
15:38:30 #link https://bugs.launchpad.net/neutron/+bug/1707003
15:38:32 Launchpad bug 1707003 in neutron "gate-tempest-dsvm-neutron-dvr-ha-multinode-full-ubuntu-xenial-nv job has a very high failure rate" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
15:38:32 ping you offline ?
15:38:34 ok :)
15:39:17 haleyb: any update on the grafana with this issue? Are we still seeing the grenade failures?
15:39:48 refreshing page now...
15:39:56 haleyb: sorry wrong post.
15:39:59 #link https://bugs.launchpad.net/neutron/+bug/1713927
15:40:00 Launchpad bug 1713927 in neutron "gate-grenade-dsvm-neutron-dvr-multinode-ubuntu-xenial fails constantly" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:40:05 I meant this bug.
15:40:36 grafana still has the job just under 20%, let me look at other bug
15:41:32 oh, that bug :) I just updated my fip host patch this morning, had to go through the logic yet again
15:42:28 haleyb: For the grafana issue the patch that you are working on is not required. Since we already have the agent side fix, that should have brought it down.
15:42:29 https://review.openstack.org/#/c/500143/ is the patch
15:42:56 right, it's just the last remaining known issue to fix and backport
15:42:59 haleyb: thanks for the patch link.
15:43:20 haleyb: The one thing I have seen in the grenade failure is something unrelated like volume failures etc.
15:43:44 i will look at some of the failures to see if they are related, i've seen the "can't ssh" failures randomly, and there is another patch for that
15:44:35 haleyb: ok sounds good.
15:44:38 Let us move on
15:44:41 #link https://bugs.launchpad.net/neutron/+bug/1717597
15:44:42 Launchpad bug 1717597 in neutron "Bad arping call in DVR centralized floating IP code" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:44:42 https://review.openstack.org/#/c/500384/ was the patch
15:45:01 haleyb: ok thanks
15:45:07 i think that patch just got approved - for arping
15:45:26 #link https://review.openstack.org/#/c/504252/
15:45:29 link to the patch.
15:45:36 haleyb: yes it should merge today.
15:46:02 then i'll cherry-pick
15:46:25 haleyb: thanks
15:46:28 #link https://bugs.launchpad.net/neutron/+bug/1716829
15:46:29 Launchpad bug 1716829 in neutron "Centralized floatingips not configured right with DVR and HA" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:46:49 #link https://review.openstack.org/#/c/503530/
15:46:56 This needs a workflow
15:48:04 I think this can even merge before the fip['host'] patch, so there are not really many dependencies here.
15:48:04 the parent is the fip host patch, so it can't merge yet
15:48:19 haleyb: ok
15:49:02 strange how my comments appear in the latest +2, blame that on gerrit
15:50:08 haleyb: sure
15:50:27 haleyb: I think we have discussed almost all the bugs.
15:50:55 haleyb: There are still some gate failure bugs, which are DB related. StaleDataError and failed to bind a port.
15:51:13 #link https://bugs.launchpad.net/neutron/+bug/1716321
15:51:14 Launchpad bug 1716321 in neutron "StaleDataError: UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched." [Undecided,New]
15:51:40 I think we fixed this one, but we need to make sure it is not occurring again.
15:51:45 Swami: related to bugs, there was a dvr change to master recently that i think needed a backport to pike, but i can't find the master review now
15:52:04 haleyb: do you remember what the patch was about?
15:52:35 haleyb: Is that the DVR - HA migration with router device owner change?
15:53:13 link? i'll eventually find it but remember adding a comment in it the other day
15:54:46 never mind, that patch has already been cherry-picked.
15:55:22 i have a tab open somewhere, i'll ping you later
15:55:46 #link https://review.openstack.org/#/c/494376/ is this patch
15:56:36 I just cherry-picked this one, while I type.
15:56:57 i can't remember, i'll find it eventually :)
15:57:18 haleyb: ok I will go through my committed changes and cherry-pick if any
15:57:23 https://bugs.launchpad.net/neutron/+bug/1718369 was the only other recent L3 bug, and I marked it as needing more info since the release wasn't given
15:57:24 Launchpad bug 1718369 in neutron "DBDeadlock occurs when delete router_gateway" [Undecided,Incomplete]
15:58:28 ok i will keep an eye on it.
15:58:32 we're about out of time
15:58:37 #topic Open Discussion
15:58:39 I think we have come to the top of the hour
15:58:48 any topics for the last minute?
15:58:57 I hope we need to resolve the tempest issue.
15:59:23 haleyb: can you check with anil_venkata if he can run the tempest test in house to see the failure on vrrp?
15:59:38 Swami: which bug?
16:00:26 time is up, ping me on #neutron channel
16:00:30 #endmeeting