15:02:52 <haleyb> #startmeeting neutron_l3
15:02:53 <openstack> Meeting started Thu Sep 21 15:02:52 2017 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:56 <Swami> hi
15:02:57 <openstack> The meeting name has been set to 'neutron_l3'
15:03:06 <haleyb> #chair Swami
15:03:07 <openstack> Current chairs: Swami haleyb
15:04:16 <haleyb> #topic Announcements
15:04:56 <haleyb> Hope people had a productive PTG and have recovered by now
15:05:16 <Swami> mlavalle will not be able to attend today's meeting, since he has a conflict
15:05:20 <Swami> haleyb: sure
15:06:33 <Swami> haleyb: I hope mlavalle might have sent out a report of the PTG update
15:07:04 <haleyb> i don't remember seeing it yet, i'll look again
15:07:38 <Swami> haleyb: I thought mlavalle mentioned that he would send it out in a day or two. Sorry, I have not seen it either.
15:07:41 <haleyb> I guess the one thing we got out of the PTG was more L3 bugs, at least in all these corner cases of router migration and such
15:08:42 <haleyb> I had no other announcements, might as well move to bugs
15:08:48 <haleyb> #topic Bugs
15:09:14 <Swami> haleyb: thanks, let us go over the dvr bugs
15:09:36 <Swami> haleyb: yes agreed
15:10:19 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1718585
15:10:21 <openstack> Launchpad bug 1718585 in neutron "set floatingip status to DOWN during creation" [Undecided,Opinion] - Assigned to venkata anil (anil-venkata)
15:10:43 <Swami> This bug has been filed by anil-venkata
15:11:13 <Swami> It seems that he is asking to change the behavior of floatingip status report.
15:11:14 <haleyb> i thought we set status to ERROR by default, only ACTIVE if succeeded
15:11:46 <Swami> haleyb: That was my opinion, but I need to recheck and I have not paid attention to the status.
15:12:18 <Swami> haleyb: But in the case of migration of floatingIP or router associated with a floatingip keeping a new status for floatingip would be tedious.
15:13:11 <Swami> We always say that when a VM migrates or a floatingIP migrates there should not be any down time, then why do we need to change the state.
15:14:09 <Swami> I agree that during the initial floatingIP setup there should be some state that determines if it is ready to be consumed or not
15:14:12 <haleyb> yeah, we should only change on target host if it failed, otherwise the state could be flakey if new updated before old host
15:14:46 <haleyb> since one could be tearing down while other is building
15:14:49 <Swami> haleyb: but the floatingip state is not tied to host.
15:15:42 <haleyb> but the agent reports the state, that's what i'm getting at
15:16:24 <Swami> haleyb: There are currently three floatingIP states and I did see an additional state that is defined in the agent as 'NOCHANGE'
15:17:56 <Swami> haleyb: maybe we can see what makes sense to handle all these timing issues with floatingip.
15:18:04 <Swami> More discussion needed on this.
15:18:31 <Swami> The next one in the list is
15:18:34 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1718345
15:18:35 <openstack> Launchpad bug 1718345 in neutron "ml2_distributed_port_bindings not cleared after migration from DVR" [Undecided,New]
15:19:27 <Swami> I have to check the code path to see why the ml2 port binding is not being cleared when the router is migrated.
15:20:02 <Swami> The port binding is actually done when ensure_port_binding is called.
15:20:17 <Swami> But we need to see if the router migration takes a similar path or not.
15:20:25 <haleyb> looks like a bug, since we found all these other cases with router ports i'm not surprised to find something else
15:20:48 <Swami> The original design was to move the legacy to dvr and not to move the dvr to legacy. So there may be some corner cases here.
15:21:08 <Swami> which we have not addressed.
15:21:27 <haleyb> agreed, we've just never noticed since no one typically does this
15:21:30 <Swami> I will triage this and see what is missing in here.
15:22:00 <haleyb> great, thanks
15:22:15 <Swami> The next in the list is
15:22:18 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1717302
15:22:19 <openstack> Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [Undecided,New]
15:23:30 <Swami> I posted a patch to address this issue
15:23:45 <Swami> #link https://review.openstack.org/#/c/505324/
15:24:11 <Swami> but still I was seeing a couple of tests failing for east-west communication.
15:24:18 <haleyb> Can you add a closes-bug to that next update? to tie it to the bug?
15:24:45 <Swami> haleyb: This is not directly related to this bug, this was a migration patch that I included to test the case here.
15:25:01 <Swami> haleyb: But still I am seeing east-west with fip failing.
15:25:07 <Swami> haleyb: not sure what is causing this.
15:25:35 <Swami> haleyb: The log trace still shows that an IP cannot be assigned to qg- interface and does not exist.
15:26:07 <haleyb> Swami: i will look at the dvr scheduler part of that change again, think that's where i didn't look as close
15:26:17 <haleyb> but the qg- error is still strange
15:26:34 <Swami> Also the unknown factor here is, in the logs, I can see that it is trying to ARP on an IP for qg- interface in the qrouter namespace. I am not sure if this is log noise or something related to the failure.
15:27:07 <reedip_> hi
15:27:18 <haleyb> reedip_: hi
15:27:25 <Swami> reedip_: hi
15:27:42 <Swami> haleyb: we will keep debugging this issue with the patch.
15:27:56 <reedip_> just joining in, please continue, I will put up my point in an open discussion if possible :)
15:28:10 <Swami> reedip_: we are discussing the dvr bugs.
15:28:22 <Swami> reedip_: If you have one related to the FWaaS, you can post it now
15:28:58 <reedip_> Swami: yep, but I think it's more related to the FWaaS than DVR actually.
15:29:36 <Swami> ok I will go ahead and bring in that bug for discussion
15:29:39 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1716401
15:29:40 <openstack> Launchpad bug 1716401 in neutron "FWaaS: Ip tables rules do not get updated in case of distributed virtual routers (DVR)" [Undecided,New] - Assigned to Reedip (reedip-banerjee)
15:29:51 <reedip_> oh heheheh :)
15:29:55 <reedip_> yeah that's the one :)
15:30:01 <Swami> reedip_: can you discuss it now
15:30:35 <reedip_> I understand that FWaaS still hasn't looped in the HA part of a router, so it's a separate discussion :)
15:31:00 <Swami> reedip_: I think I had read through the bug description and added my comments. I also wanted to talk to sridhar during the PTG, but he was busy.
15:31:18 <Swami> reedip_: let us keep the ha part apart and work with non-ha first.
15:31:33 <reedip_> Swami: well we can discuss it now :)
15:32:23 <Swami> reedip_: go ahead
15:32:51 <Swami> reedip_: if you can point me in the bug, where your code is handling the router_update scenario, then I can check it out.
15:32:57 <reedip_> Swami: as far as I know, the DVR code included the DVR and DVR_SNAT option for deployment in the Compute and the Controller node.
15:33:15 <reedip_> I do not know much about DVR, so just starting to look at it from the bug's point of view
15:33:34 <reedip_> but FWaaS has not considered DVR_SNAT as of now
15:33:57 <Swami> reedip_: I am sure when we originally designed this both were considered.
15:34:40 <Swami> reedip_: in the case of floatingip either residing in dvr_snat node or in dvr node, the rules were configured on the 'rfp' port of the router namespace and cleared when floatingip was removed.
15:35:20 <Swami> reedip_: this is only for north-south.
In the case of dvr_snat you additionally need to set up the rules in the qg- interface on the snat_namespace.
15:35:34 <reedip_> Swami: ok ...
15:36:29 <Swami> reedip_: The only behavior change that we introduced in dvr is creating the fipnamespace along with the gateway create on external network.
15:36:32 <reedip_> Swami: I need to read the arch for DVR, so maybe I can put this issue up later? I will look into the points that you have mentioned
15:36:59 <Swami> reedip_: ok, ping me on the IRC channel or add your comments on the bug description and we can take it from there.
15:37:06 <Swami> reedip_: hope this helps.
15:37:38 <reedip_> Sure Swami. By the way any doc where I can find some DVR info?
15:37:49 <reedip_> I mean the arch for DVR?
15:38:08 <Swami> reedip_: All our docs were in the google. Ping me offline and I can point you to the dvr docs.
15:38:27 <Swami> The next bug in the list is
15:38:30 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1707003
15:38:32 <openstack> Launchpad bug 1707003 in neutron "gate-tempest-dsvm-neutron-dvr-ha-multinode-full-ubuntu-xenial-nv job has a very high failure rate" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
15:38:32 <reedip_> ping you offline?
15:38:34 <reedip_> ok :)
15:39:17 <Swami> haleyb: any update on the grafana with this issue. Are we still seeing the grenade failures
15:39:48 <haleyb> refreshing page now...
15:39:56 <Swami> haleyb: sorry wrong post.
15:39:59 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1713927
15:40:00 <openstack> Launchpad bug 1713927 in neutron "gate-grenade-dsvm-neutron-dvr-multinode-ubuntu-xenial fails constantly" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:40:05 <Swami> I meant this bug.
15:40:36 <haleyb> grafana still has the job just under 20%, let me look at other bug
15:41:32 <haleyb> oh, that bug :) I just updated my fip host patch this morning, had to go through the logic yet again
15:42:28 <Swami> haleyb: For the grafana issue the patch that you are working on is not required. Since we already have the agent side fix, that should have brought it down.
15:42:29 <haleyb> https://review.openstack.org/#/c/500143/ is the patch
15:42:56 <haleyb> right, it's just the last remaining known issue to fix and backport
15:42:59 <Swami> haleyb: thanks for the patch link.
15:43:20 <Swami> haleyb: The one thing I have seen in the grenade failure is something unrelated like volume failures etc.,
15:43:44 <haleyb> i will look at some of the failures to see if they are related, i've seen the "can't ssh" failures randomly, and there is another patch for that
15:44:35 <Swami> haleyb: ok sounds good.
15:44:38 <Swami> Let us move on
15:44:41 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1717597
15:44:42 <openstack> Launchpad bug 1717597 in neutron "Bad arping call in DVR centralized floating IP code" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:44:42 <haleyb> https://review.openstack.org/#/c/500384/ was the patch
15:45:01 <Swami> haleyb: ok thanks
15:45:07 <haleyb> i think that patch just got approved - for arping
15:45:26 <Swami> #link https://review.openstack.org/#/c/504252/
15:45:29 <Swami> link to the patch.
15:45:36 <Swami> haleyb: yes it should merge today.
15:46:02 <haleyb> then i'll cherry-pick
15:46:25 <Swami> haleyb: thanks
15:46:28 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1716829
15:46:29 <openstack> Launchpad bug 1716829 in neutron "Centralized floatingips not configured right with DVR and HA" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:46:49 <Swami> #link https://review.openstack.org/#/c/503530/
15:46:56 <Swami> This needs a workflow
15:48:04 <Swami> I think this can even merge before the fip['host'] patch, so there are not really many dependencies here.
15:48:04 <haleyb> the parent is the fip host patch, so it can't merge yet
15:48:19 <Swami> haleyb: ok
15:49:02 <haleyb> strange how my comments appear in the latest +2, blame that on gerrit
15:50:08 <Swami> haleyb: sure
15:50:27 <Swami> haleyb: I think we have discussed almost all the bugs.
15:50:55 <Swami> haleyb: There are still some gate failure bugs, which are DB related. StaleDataError and failed to bind a port.
15:51:13 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1716321
15:51:14 <openstack> Launchpad bug 1716321 in neutron "StaleDataError: UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched." [Undecided,New]
15:51:40 <Swami> I think we fixed this one, but we need to make sure it is not occurring again.
15:51:45 <haleyb> Swami: related to bugs, there was a dvr change to master recently that i think needed a backport to pike, but i can't find the master review now
15:52:04 <Swami> haleyb: do you remember what the patch was about
15:52:35 <Swami> haleyb: Is that the DVR - HA migration with router device owner change.
15:53:13 <haleyb> link? i'll eventually find it but remember adding a comment in it the other day
15:54:46 <Swami> never mind that patch has already been cherry picked.
15:55:22 <haleyb> i have a tab open somewhere, i'll ping you later
15:55:46 <Swami> #link https://review.openstack.org/#/c/494376/ is this patch
15:56:36 <Swami> I just cherry-picked this one, while I type.
15:56:57 <haleyb> i can't remember, i'll find it eventually :)
15:57:18 <Swami> haleyb: ok I will go through my committed changes and cherry-pick if any
15:57:23 <haleyb> https://bugs.launchpad.net/neutron/+bug/1718369 was the only other recent L3 bug, and I marked as need more info since the release wasn't given
15:57:24 <openstack> Launchpad bug 1718369 in neutron "DBDeadlock occurs when delete router_gateway" [Undecided,Incomplete]
15:58:28 <Swami> ok i will keep an eye on it.
15:58:32 <haleyb> we're about out of time
15:58:37 <haleyb> #topic Open Discussion
15:58:39 <Swami> I think we have come to the top of the hour
15:58:48 <haleyb> any topics for the last minute?
15:58:57 <Swami> I hope we need to resolve the tempest issue.
15:59:23 <Swami> haleyb: can you check with anil_venkata if he can run it in house the tempest test to see the failure on vrrp.
15:59:38 <haleyb> Swami: which bug?
16:00:26 <haleyb> time is up, ping me on #neutron channel
16:00:30 <haleyb> #endmeeting