15:00:21 #startmeeting neutron_l3
15:00:27 Meeting started Thu May 31 15:00:21 2018 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:31 The meeting name has been set to 'neutron_l3'
15:00:35 hi
15:00:38 Hi there!
15:00:38 o/
15:01:10 #topic Announcements
15:01:52 Just a reminder that the Rocky-2 milestone will be next week
15:02:15 Any other announcements from the team?
15:03:01 ok then, let's move on
15:03:05 #topic Bugs
15:03:37 First one for today is https://bugs.launchpad.net/neutron/+bug/1766701
15:03:38 Launchpad bug 1766701 in neutron "Trunk Tests are failing often in dvr-multinode scenario job" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
15:03:51 I've been working on this one
15:05:26 At least one of the failures is due to a change introduced recently to an interface command
15:05:52 I have a tentative fix proposed to it: https://review.openstack.org/#/c/571043
15:06:08 I managed so far to change the nature of the error
15:06:22 but still need to do some additional work on it
15:06:35 any comments?
15:07:13 nice catch on the }
15:07:39 well, I still need to really fix it
15:07:45 did you want to merge that change regardless?
15:08:08 or keep looking and do it all at once?
15:09:17 I want to look at it further
15:09:33 I'll remove the WIP later
15:09:42 ack
15:10:43 Next one is https://bugs.launchpad.net/neutron/+bug/1756301
15:10:45 Launchpad bug 1756301 in neutron "Tempest DVR HA multimode tests fails due to no FIP connectivity" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
15:11:02 I spent some time with it earlier today
15:11:36 I can see the failures in several cases.
15:12:00 I am digging right now in http://logs.openstack.org/93/565593/4/check/neutron-tempest-dvr-ha-multinode-full/1c1b813/job-output.txt.gz
15:12:51 Looking at https://bugs.launchpad.net/neutron/+bug/1717302
15:12:53 Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
15:13:06 don't you think we can consolidate this two bugs?
15:13:11 these^^^^
15:13:33 probably as they are both dvr ha multinode
15:14:07 and it's essentially the same report about floating ips
15:15:39 The latter one has more debugging notes. So I will keep this one, assign it to me and mark the other one duplicated
15:15:43 makes sense?
15:15:46 +1
15:16:34 hi
15:16:43 Done
15:16:49 hey Swami_
15:16:54 hi
15:17:48 do you have any bugs to report Swami_?
15:18:07 mlavalle: yes I do have one, that needs some attention.
15:18:11 Let me pull up the bug
15:19:30 #link https://bugs.launchpad.net/neutron/+bug/1773999
15:19:32 Launchpad bug 1773999 in neutron "Allowed Address Pairs doesn’t work after neutron-port update" [Undecided,In progress] - Assigned to Boris (boris-maeck)
15:20:04 We might have to revert one of the patches that we pushed in to address this issue.
15:20:07 #link https://review.openstack.org/#/c/550676/
15:20:27 mlavalle: I need to have a conversation on this issue.
15:21:03 you mean a high bandwidth conversation?
15:21:07 or here?
15:21:37 Changing the ARP entry NUD, does not help for this bug, since if we have a temporary state in the ARP table and when traffic is routed between the subnets, the ARP entry goes back to incomplete.
15:21:58 mlavalle: probably here, but if it takes longer then we can move to the channel.
15:22:09 let's carry on here
15:22:23 that way haleyb and njohnston can chime in
15:22:48 mlavalle: The optimal way to address this issue is only by polling on the GARP message for a given IP in the router namespace and then adding a permanent entry for the allowed_address_pair.
15:23:10 But this would have a hit in performance.
15:23:34 yeah
15:23:58 Since it would have a hit in performance, should we make it configurable, so that people who want to use this feature can turn it on or off?
15:24:41 This all boils down to the design constraint of DVR not being able to forward the ARP request outside of the given node.
15:25:09 what is meant by 'polling'? i need to look at the bug
15:25:51 mlavalle: haleyb: I meant we need to sniff all the packets that are coming to the router-namespace and filter them by the GARP type and the IP address registered to filter for.
15:25:59 haleyb: That is what I meant by polling.
15:26:18 haleyb: Is there any other way to address this issue?
15:27:46 mlavalle: Do you think we need to file an RFE for this change request?
15:28:20 that is probably a good idea
15:28:49 so we can iterate on the problem and the possible solution
15:28:53 mlavalle: ok let me go ahead and file an RFE to discuss further on this approach.
15:29:12 should we do the revert right away?
15:29:22 i don't know how we'd easily watch for gARPs. i guess i'd like to know how the router is trying to confirm the ARP and failing
15:30:32 haleyb: Yes right now, the router gets this GARP message and the ARP table is updated as temporary, since permanent entries are not changeable.
15:31:52 haleyb: So when we have a VM residing on a different subnet trying to ping this vrrp ip, the arp message from the VM reaches the router, the router sees the ARP table but since the entry is temporary it is trying to confirm by re-arping to the IP to confirm the MAC. Since we don't allow the ARP traffic to pass out, the ARP entry becomes invalid after a few seconds.
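Editor's note: the "sniff and filter by GARP type and IP" idea described above can be sketched roughly as follows. This is only an illustration under assumptions, not the approach neutron implements or the content of the RFE. The names `parse_garp`, `build_garp` and `watched_ips` are hypothetical, and the sketch only classifies a raw Ethernet frame; capturing frames inside the router namespace and installing the resulting neighbour entry are left out.

```python
import socket
import struct

ETH_P_ARP = 0x0806
# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"


def parse_garp(frame, watched_ips):
    """Return (ip, mac) if frame is a gratuitous ARP for a watched IP, else None.

    Hypothetical helper: a gratuitous ARP is treated here as an ARP request
    or reply whose sender and target protocol addresses are equal.
    """
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, oper, sha, spa, _tha, tpa = struct.unpack(
        ARP_FORMAT, frame[14:42]
    )
    # Only Ethernet/IPv4 ARP is considered.
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    if oper not in (1, 2) or spa != tpa:
        return None
    ip = socket.inet_ntoa(spa)
    if ip not in watched_ips:
        return None
    mac = ":".join(f"{b:02x}" for b in sha)
    return ip, mac


def build_garp(ip, mac):
    """Build a broadcast gratuitous ARP request frame (for illustration only)."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    eth = b"\xff" * 6 + mac_bytes + struct.pack("!H", ETH_P_ARP)
    arp = struct.pack(
        ARP_FORMAT, 1, 0x0800, 6, 4, 1, mac_bytes, ip_bytes, b"\x00" * 6, ip_bytes
    )
    return eth + arp
```

A caller would pass the set of allowed_address_pair IPs as `watched_ips` and, on a match, could install a permanent neighbour entry for the returned pair (for example with `ip neigh replace ... nud permanent`). Inspecting every frame this way is the performance cost raised in the discussion.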
15:32:30 Swami_: i guess writing it up is the first step, then maybe we can think of ways to fix it
15:32:43 haleyb: ok, I will write it up.
15:33:02 this is the status after merging https://review.openstack.org/#/c/550676/
15:33:04 haleyb: mlavalle: I will create an RFE and then add in my thoughts on how to approach this.
15:33:05 right?
15:33:49 mlavalle: yes you are right.
15:33:56 ok
15:34:08 yeah a write up will help
15:34:23 so that change merged in master/queens/pike :(
15:34:25 mlavalle: ok thanks, will do.
15:34:31 haleyb: yes.
15:35:13 how about the revert?
15:35:52 mlavalle: A revert should remove the existing change.
15:36:12 oh I know, but is that a more desirable position?
15:36:42 mlavalle: But the allowed_address_pair issue will still persist and the long term solution would be the new approach that I mentioned.
15:37:13 ok, so it seems that we don't need to revert right away
15:37:23 am I correct?
15:38:10 mlavalle: The reason someone had asked to revert is that previously it was at least able to pass traffic if a customer manually did a port update, since a port update will push in a new arp table update if the MAC had changed, and this patch broke it.
15:39:14 so reverting leaves us in a better position then
15:39:14 mlavalle: So let us first revert it and think about the new proposal.
15:39:22 mlavalle: yes
15:39:33 yeah, that is what I was trying to get at
15:39:47 ok with you haleyb?
15:40:09 sure
15:40:25 Swami_: will you propose the revert?
15:40:44 mlavalle: Yes there is already a patch. https://review.openstack.org/#/c/570941/
15:40:52 Thanks
15:41:14 mlavalle: haleyb: Another question about RFE.
15:41:37 not from me, I will wait for the RFE
15:41:51 mlavalle: just wonder if revert should be done from original change so context is all there
15:41:58 mlavalle: haleyb: I will also file an RFE to make the FloatingIP path to IPv6, since we don't have IPv6 support right now for the fast path exit.
15:42:24 haleyb: that is a good point.
let's preserve history
15:42:29 Swami_: ok
15:42:44 haleyb: ok, I will revert from the original patch. I will push in one.
15:43:16 Swami_: that is this person's first contribution
15:43:28 mlavalle: Yes
15:43:41 let me leave a comment in his patch proposing that he does the revert from the original patch
15:43:52 I want to be welcoming ;-)
15:44:02 makes sense?
15:44:04 mlavalle: Sure, nice.
15:44:10 and they can revert the stable ones the same way
15:44:32 yeah, he might be interested in making a contribution
15:44:49 mlavalle: he is a customer that I am working with.
15:46:05 ok done. I left a comment there
15:46:14 mlavalle: thanks
15:47:09 mlavalle: haleyb: I will also be filing an RFE for the IPv6 support for the FloatingIP namespace for DVR routers, since we don't have radpd and link local address in IPv6 yet.
15:47:38 Swami_: ack
15:47:52 mlavalle: ok thanks
15:48:14 I also have a request for Swami_ and / or haleyb. Would you help triaging https://bugs.launchpad.net/neutron/+bug/1773286?
15:48:15 Launchpad bug 1773286 in neutron "In some specific case with dvr mode I found the l2pop flows is incomplete." [Undecided,New]
15:48:19 Swami_: ack, and there might be some old patches around to help make that work
15:48:43 mlavalle: yes, I think I know this issue.
15:48:49 mlavalle: yes, saw that and he replied with the bad code, haven't looked yet
15:49:22 mlavalle: The line of code pointed to in this bug was bypassed, since we were seeing an error log in the jenkins.
15:49:55 ok, cool. Please leave some comments in the bug
15:49:58 mlavalle: But if it is eating up the l2pop, we can live with the error log.
15:50:19 mlavalle: will do.
15:50:41 The other petition that we have is to see if anyone has seen the following:
15:50:45 http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/subnode-2/screen-q-l3.txt.gz?level=ERROR
15:50:48 haleyb: can you please point me to the old patches for the radpd for the fip namespace.
15:51:33 This came up in the last CI meeting
15:51:58 Swami_: i'll find them, don't think it was with ra proxy (i think that's what you meant)
15:52:13 haleyb: thanks
15:52:36 mlavalle: Let me check the code path to see why we are seeing this gateway route update error.
15:53:01 if you find something relevant, please file a bug
15:53:11 mlavalle: sure will do.
15:53:21 Thank you!
15:53:44 Any other bugs we should discuss today?
15:54:47 I take that as no
15:54:56 #topic Open Agenda
15:55:09 anything we should discuss today?
15:55:30 mlavalle: I don't have any.
15:56:07 haleyb: You missed the good food and drinks at the summit
15:56:10 haleyb: do you know that Swami_ and Ryan Tidwell attended the Summit?
15:56:40 darn, hope everyone had a good time
15:56:45 also Mark McClain
15:57:11 haleyb: rest assured you were missed ;-0
15:57:25 any more sea plane rides?
15:57:40 mlavalle: I did a seaplane ride.
15:57:53 haleyb: it was fun getting on a seaplane ride.
15:58:07 I didn't do the sea plane this time around, but I went ziplining in the mountains
15:58:13 5 zip lines
15:58:13 mlavalle: haleyb: It was a 1 hour ride.
15:58:21 mlavalle: That would be fun
15:58:33 you guys have all the fun :)
15:58:34 the longest one is almost a third of a mile long
15:58:37 mlavalle: Did you do it on Grouse Mountain?
15:58:50 Exactly, Grouse Mountain
15:59:17 ok, time is up
15:59:22 Nice talking to you
15:59:27 #endmeeting