15:00:08 #startmeeting neutron_dvr
15:00:10 in all seriousness, I'll be a bit distracted during the front end of the call
15:00:13 Meeting started Wed Nov 11 15:00:08 2015 UTC and is due to finish in 60 minutes. The chair is Swami. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:17 The meeting name has been set to 'neutron_dvr'
15:00:34 #topic announcement
15:00:49 Do we have anything to share with the team?
15:00:59 just for the minutes, link to the agenda?
15:01:07 The gate was broken yesterday; is it stable now?
15:01:13 regXboi: sure thanks
15:01:35 The agenda for today's meeting can be seen at the following link
15:01:48 #link https://wiki.openstack.org/wiki/Meetings/Neutron-DVR agenda link for today's meeting
15:01:49 #link https://wiki.openstack.org/wiki/Meetings/Neutron-DVR
15:01:54 heh - sorry about that
15:02:03 regXboi: you are faster than me
15:02:11 too much IRC meeting
15:02:24 haleyb will be joining late today, so I will be running this meeting today.
15:02:34 Now let us move on
15:03:05 #topic Bugs
15:03:53 regXboi: Thanks, you have added a list of bugs for today's discussion; let me know if I miss anything.
15:03:58 The first on the list is
15:04:02 #link https://bugs.launchpad.net/neutron/+bug/1372141
15:04:02 Launchpad bug 1372141 in neutron "StaleDataError while updating ml2_dvr_port_bindings" [Medium,Confirmed]
15:04:04 will do
15:04:29 regXboi: You mentioned that this has not been seen in the gate for the last seven days.
15:04:41 yes - let me double check that as of today
15:05:17 regXboi: we should probably wait for another week and then close if not seen
15:06:12 The next bug in the list is
15:06:19 yes - I see 37 hits in logstash and none of them are from DVR - and I'm good waiting another week
15:06:34 ok let's move on.
15:06:46 that is good news that it is not seen against DVR.
15:06:52 #link https://bugs.launchpad.net/neutron/+bug/1456624
15:06:52 Launchpad bug 1456624 in neutron "DVR Connection to external network lost when associating a floating IP" [Medium,In progress] - Assigned to venkata anil (anil-venkata)
15:07:11 regXboi: you had a note in there to check whether this is really a bug.
15:07:26 My problem is that I'm not convinced this is a bug - I think this is expected behavior
15:08:12 I think I agree. I've been thinking about it and haven't changed my mind.
15:08:28 The bug states that while pinging through the SNAT, when a FIP is applied the ping hangs and then resumes if we re-ping. I am not sure either. If this is the same behavior as the centralized router, then we can close it. But if the centralized router behaves differently, then we should pay attention to it.
15:09:04 carl_baldwin: so what is your thought right now
15:09:48 It does behave differently. But, the fix is very painful.
15:10:18 Maybe we could propose a fix which makes centralized behave the same. ;)
15:10:39 carl_baldwin: nice! :-)
15:10:41 carl_baldwin: +1
15:10:42 um
15:10:50 I'm not sure how that would work
15:10:55 but that now sounds like an rfe
15:11:29 Just a conntrack kill of existing traffic
15:11:31 ok let us tag it as opinion.
15:11:47 * regXboi has to think about that idea
15:11:57 * regXboi notes "that idea" = "conntrack kill"
15:12:08 carl_baldwin: I think for now we can document this behavior so that we have captured it and then move on and focus on other issues.
15:12:12 I still don't like the proposed cure. Very complicated
15:12:33 I'd say let's go opinion and revisit it in a bit
15:12:35 Swami: +1
15:12:57 Is everyone ok with this proposal? If so, we can move on.
15:13:00 regXboi: +1 too
15:13:29 ok, moving on to the next one.
15:13:35 #link https://bugs.launchpad.net/neutron/+bug/1458541
15:13:35 Launchpad bug 1458541 in neutron "Decomposite DVR router compute node and network node functionallity to two classes" [Medium,In progress] - Assigned to Ryan Moats (rmoats)
15:14:18 I hope it says that we will wait until the binding refactoring. So nothing more to add on this item.
15:14:20 this is on hold for the rebinding
15:14:22 yes
15:14:27 regXboi: do you want to add anything to it?
15:14:40 nope that's up to date
15:14:54 moving on to the next one.
15:14:59 #link https://bugs.launchpad.net/neutron/+bug/1510796
15:14:59 Launchpad bug 1510796 in neutron "Function sync_routers always call _get_dvr_sync_data in ha scenario" [Low,In progress] - Assigned to ZongKai LI (lzklibj)
15:15:20 There is already a patch set for this bug.
15:15:34 #link https://review.openstack.org/#/c/239908/
15:15:40 yes there is
15:16:19 Ok this might require more reviews
15:16:49 Anything else to add to this bug?
15:17:19 nope
15:17:25 regXboi: ok
15:17:30 #link https://bugs.launchpad.net/neutron/+bug/1513678
15:17:30 Launchpad bug 1513678 in neutron "At scale router scheduling takes a long time with DVR routers with multiple compute nodes hosting thousands of VMs" [High,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:18:03 I had two patches (WIP) on this bug.
15:18:46 But I got review comments from carl_baldwin and obondarev_ that one of the patches may be addressed after the scheduler refactor.
15:19:11 Swami: your point about backporting it sounds reasonable to me
15:19:31 so I’m ok with both
15:19:53 obondarev_: meaning addressing this bug on liberty and kilo rather than in master.
15:20:24 Swami: we need to merge it in master first before backporting I guess
15:20:32 obondarev_: +1
15:20:39 yes, that's the process
15:20:42 obondarev_: ok, so you are good moving forward with these two patches?
15:20:55 I'm good with both once they're in shape.
15:20:57 Swami: yep
15:21:01 ok, no worries, then let us move on. Thanks
15:21:17 next one
15:21:21 #link https://bugs.launchpad.net/neutron/+bug/1513574
15:21:21 Launchpad bug 1513574 in neutron "firewall rules on DVR FIP fails to work for ingress traffic" [Undecided,Confirmed]
15:21:44 This one was filed recently and the bug is confirmed.
15:22:03 I saw the spec for FWaaS 2.0, which I agree should address this
15:22:40 So we will not fix this bug right now; we will wait for FWaaS 2.0, and once the rules are applied to ports, this should solve the problem.
15:23:03 I will update sc68cal on this in the FWaaS meeting.
15:23:31 #link https://bugs.launchpad.net/neutron/+bug/1362242
15:23:31 Launchpad bug 1362242 in neutron "bridge_mappings isn't bound to any segment warning from l2pop" [Low,Confirmed] - Assigned to Irena Berezovsky (irenab)
15:24:21 I have not seen any patch for this bug and it has been there for a while. Since the priority is low, I have not paid attention to this bug.
15:25:02 yes, I'm trying to clean up some of the lower priority cruft today - I'd like to see if somebody is willing to pick this up
15:25:35 We need to see if this is still seen in master. This bug was reported a long time ago.
15:25:50 I can look at this a bit. I honestly don't remember filing it.
15:25:58 oh - that I can update - yes it is - I looked yesterday
15:26:00 carl_baldwin: thanks
15:26:18 Ok, let us move on.
15:26:30 #link https://bugs.launchpad.net/neutron/+bug/1447227
15:26:30 Launchpad bug 1447227 in neutron "Connecting two or more distributed routers to a subnet doesn't work properly" [Low,In progress] - Assigned to ZongKai LI (lzklibj)
15:27:26 I don't think we have a patch for this bug either.
15:28:00 Sorry, there was a patch and it was abandoned a while back
15:28:03 #link https://review.openstack.org/#/c/191671/
15:28:24 auto-abandoned
15:28:27 I am not sure if ZongKai Li is still pursuing this patch.
15:28:43 swami: can you follow up to see?
15:28:48 that was the AI I wanted
15:28:58 Ok I will check with ZongKai Li on this.
15:29:01 note: me updated https://bugs.launchpad.net/neutron/+bug/1362242 with logstash information
15:29:01 Launchpad bug 1362242 in neutron "bridge_mappings isn't bound to any segment warning from l2pop" [Low,Confirmed] - Assigned to Irena Berezovsky (irenab)
15:29:08 ok let us move on.
15:29:37 we are halfway through, we need to move on
15:29:55 #link https://bugs.launchpad.net/neutron/+bug/1445255
15:29:55 Launchpad bug 1445255 in neutron "DVR FloatingIP to unbound port does not work" [Low,In progress]
15:30:23 Yes, there is a patch on this bug, but it is inactive or abandoned.
15:30:45 well, there is a reference from obondarev to a merged patch, so I was wondering if this is still an issue
15:30:46 I thought it was fixed a while ago
15:31:06 What exactly is an unbound port here?
15:31:17 This issue might have been fixed, since we do have a way to address a floatingip on a port once it is bound to a host.
15:31:40 nm, answered my own question by reading the bug.
15:31:48 can somebody take the AI of verifying if this still exists?
15:31:50 carl_baldwin: When a port does not have a host_binding and that port is associated with a FIP.
15:32:27 I think we addressed this bug by supporting late-binding to the FIP. This should be solved by now
15:32:37 I will update the bug.
15:32:41 thx
15:32:51 #link https://bugs.launchpad.net/neutron/+bug/1452458
15:32:51 Launchpad bug 1452458 in neutron "Server returns error 500 when setting a DVR router as a gateway on a network with no subnet, and then adding an interface" [Low,In progress] - Assigned to Nikhil AP (niks3089)
15:33:28 I have not seen any patch on this bug yet.
15:33:56 no, the question is to find out if the assignee is working on one
15:34:21 I did not see any activity.
15:34:30 If not we will find a new owner.
15:34:49 I will ask either adolfo or ritesh to take a look at it.
15:34:52 Let us move on.
15:35:00 I can take the AI of verifying that it still exists
15:35:15 #action regXboi to verify https://bugs.launchpad.net/neutron/+bug/1452458
15:35:15 Launchpad bug 1452458 in neutron "Server returns error 500 when setting a DVR router as a gateway on a network with no subnet, and then adding an interface" [Low,In progress] - Assigned to Nikhil AP (niks3089)
15:35:22 regXboi: thanks
15:35:30 The next one in the list is
15:35:33 #link https://bugs.launchpad.net/neutron/+bug/1463831
15:35:33 Launchpad bug 1463831 in neutron "neutron DVR poor performance" [Undecided,Incomplete] - Assigned to Adolfo Duarte (adolfo-duarte)
15:35:55 I did not see adolfo in here.
15:36:10 we can follow up with him in channel
15:36:20 Ok, I will check with him and update the bug.
15:36:24 I'm here
15:36:45 fitoduarte: hi
15:36:54 this one is caused by the router staying very very quite:)
15:37:06 quiet?
15:37:25 It seems that you are the owner of this bug. Is it still seen, and is it a valid bug?
15:37:35 basically Open vSwitch ages out the MAC
15:38:19 I'm thinking we don't want that to ever happen, do we?
15:38:25 it is seen, not sure how much impact it has. very corner case
15:38:45 no we don't
15:38:46 but is that behavior different than the cvr in dvr
15:39:22 it is, in that you can come up with a rare case which only happens for dvr.
15:39:35 fitoduarte: ok adolfo, the bug currently says incomplete; if you can update it with the right flag, that would work out.
15:39:47 will do
15:39:53 ok, thanks, we will move on
15:39:53 and I'm thinking Low/Wishlist as it *is* a corner case
15:40:07 regXboi: agree
15:40:20 The next one in the list
15:40:24 #link https://bugs.launchpad.net/neutron/+bug/1496201
15:40:24 Launchpad bug 1496201 in neutron "DVR: router namespace can't be deleted if bulk delete VMs" [Medium,New] - Assigned to Kasey Alusi (kasey-alusi)
15:40:24 * haleyb wanders in
15:40:44 I have not triaged this one, but there is an owner already assigned.
15:40:52 haleyb: we're still walking through the old/crufty bugs that had accumulated at the bottom of the list
15:41:10 regXboi: thanks, trying to read s/b
15:41:15 yes, this needs triage/verification
15:41:20 I will ping the owner and see if he has any patch set on it.
15:41:24 thx
15:41:27 I will triage this.
15:41:41 #link https://bugs.launchpad.net/neutron/+bug/1499045
15:41:41 Launchpad bug 1499045 in neutron "get_snat_port_for_internal_port called twice when an interface is added or removed by the l3 agent in the case of DVR routers." [Low,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:42:04 should we tag this low hanging fruit?
15:42:09 it seems simple enough
15:42:16 This is part of refactor one. I think carl_baldwin was not happy with this one. So let us close this bug.
15:42:27 ok that works for me too :)
15:42:39 regXboi: agreed
15:42:51 #link https://bugs.launchpad.net/neutron/+bug/1505571
15:42:51 Launchpad bug 1505571 in neutron "FIP disassociation takes longer in non DVR test scenario" [Undecided,Incomplete] - Assigned to Sonu (sonu-sudhakaran)
15:43:07 There are a couple of Kilo-related bugs.
15:43:28 Last week we discussed the state of the Kilo-related bugs; do we have any handle on it?
15:43:38 yes - and since we've decided to support kilo, these need review
15:43:41 are we going to fix them, or only security-related bugs?
15:43:54 regXboi: Ok thanks
15:44:09 Since we only have 15 more minutes let us get into the other topics
15:44:22 so, folks - please look at the patchsets for the last two bugs in today's list
15:44:44 #topic Gate_failures
15:44:58 regXboi: Any updates on this
15:45:07 the gate took a bad hit this week due to other issues
15:45:17 so we can't really say anything
15:45:27 both dvr and non-dvr gates were broken
15:45:36 oh yeah, multinode got hammered
15:45:47 I had a general question on the failures.
15:45:49 so it was pretty much a lost week :(
15:46:10 The "SSHTimeout" failure, I was just doing some research on this. This failure is seen on both cvr and dvr
15:46:20 But it occurs more frequently with DVR.
15:46:36 oy
15:46:40 We have talked about this earlier on how to attack this issue with tempest.
15:47:12 As a team, what should we do to address this issue in the gate/tempest tests to provide a more reasonable error?
15:47:53 so I've been trying to replicate the race condition locally and I've not hit a success case *yet*
15:48:12 haleyb: you mentioned last week that we can get hold of the tempest cores to discuss it.
15:48:29 regXboi: yes that is tough
15:48:31 until I see an example, I'm loath to fire blindly
15:48:51 regXboi: "success" as in see a failure?
15:48:55 Swami: was this about somehow stopping things during failure to try and debug?
15:48:56 we need to invite the tempest folks to the dvr meeting to discuss a strategy.
15:49:10 carl_baldwin: yes in this case "success" is seeing a test fail on SSHTimeout
15:49:23 haleyb: yes, that is one thing; the other one is how to improve the error report from the tempest test.
15:50:19 carl_baldwin: haleyb: can we get some tempest cores' attention on this issue?
15:50:24 Swami: is the bug # in the meeting notes?
15:50:52 haleyb: There is no specific bug for this, but it is related to general gate failures for DVR.
15:51:38 Should it have a bug? Might be good to have something to center a discussion with them around.
15:52:07 carl_baldwin: ok we can add a bug to track DVR gate failures. I will file one today.
15:52:11 yes, we should create a bug with the output of the failure
15:52:23 will do.
15:52:42 anything else on the gate failures that the team wanted to discuss? If not we will move on
15:52:58 #topic Performance_scalability
15:53:07 obondarev_: are you still there?
15:53:30 yep
15:53:35 any update
15:53:38 I guess next step in scalability improvement should be binding refactoring
15:54:04 didn’t have a chance to get scale lab for my tests this week
15:54:33 should grab it next week and do some verification
15:54:34 I've started an etherpad page: https://etherpad.openstack.org/p/hyper-scale
15:54:40 obondarev_: ok thanks
15:54:50 regXboi: yes thanks for the link
15:56:10 obondarev_: Thanks for the update.
15:56:22 we do have 5 more minutes.
15:56:41 Swami: your patches for scheduling optimization should help scalability as well
15:56:51 obondarev_: yes that would.
15:57:11 #topic open_discussion
15:57:15 Swami: let's add those as links to the agenda so that people can review
15:57:46 regXboi: Yes I added the bug in there but not the patches, will add it to the scalability section.
15:57:53 thx
15:58:23 The DVR HA server side patch requires one more core review. #link https://bugs.launchpad.net/neutron/+bug/1365473/
15:58:23 Launchpad bug 1365473 in neutron "Unable to create a router that's both HA and distributed" [High,In progress] - Assigned to Adolfo Duarte (adolfo-duarte)
15:58:45 I will ping amuller to see if he can bless it.
15:59:21 Thanks everyone for joining the meeting.
15:59:27 Meet you all next week.
15:59:30 I think we are done - hopefully bugs won't eat us out of house and home next week
15:59:43 #endmeeting