15:00:21 #startmeeting neutron_dvr
15:00:22 Meeting started Wed Jan 13 15:00:21 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:25 The meeting name has been set to 'neutron_dvr'
15:00:29 o/
15:00:33 o/
15:00:33 #chair Swami
15:00:34 Current chairs: Swami haleyb
15:00:58 #topic Announcements
15:01:14 Mitaka 2 is next week according to http://docs.openstack.org/releases/schedules/mitaka.html
15:01:52 so let's not slow down :)
15:02:02 :)
15:02:46 We have a lot of bugs, and I'd like to try and lessen that, so (copying from the meeting wiki...)
15:02:47 I was about to ask how are we as DVR team affected by Mitaka 2 :)
15:03:16 well, it's more of a "time will sneak up on us" comment
15:03:31 especially with something like the migration work
15:03:36 yeah
15:03:54 which migration?
15:04:01 Yes that is one of the high's that we need to resolve and since it has some dependency with nova that would be critical.
15:04:03 live migration you mean?
15:04:09 the nova migration with dvr swami is working on
15:04:15 I see
15:04:17 obondarev: yes
15:04:43 ok, sorry for interruption
15:05:00 np, it's good that we're clear
15:05:08 obondarev: did you get a chance to review the discussion on this item in the launchpad bug #1456073
15:05:42 Swami: I saw the discussion, just didn't have time yet to express my opinion
15:06:02 I still have some doubts on it
15:06:28 obondarev: please provide your thoughts and valuable input
15:06:41 Swami: will do
15:06:50 We as a team should be in agreement on the neutron side changes before we get the nova team involved.
15:07:02 Swami: right
15:07:03 obondarev: this was based on your suggestion in the mailing list.
15:07:11 I'm trying to get caught up on the DVR code so that I can actually make a useful contribution. Helping out with DVR is pretty much my main task.
15:07:19 (for this cycle)
15:07:38 otherwiseguy: good to hear that, thanks
15:07:42 otherwiseguy: welcome and thanks for volunteering.
15:08:25 Swami: so do you want responses on the mailing list or the bug? i'd assume the ML
15:08:30 haleyb: did you get a chance to take a look at the proposal on the live migration.
15:09:05 Swami: i have only gone through it quickly, i will take another look today
15:09:05 Swami: that patch with neutron changes that I saw seemed an overkill to me
15:09:09 haleyb: I think from the neutron team we can use the bug as our discussion thread and also update the ML with our findings so that Nova team gets notified. That was my idea.
15:09:19 Swami: that one that is changing portbinding extension
15:10:28 obondarev: yes I thought that would be the way to go. If you feel that it is an overkill then please provide me your thoughts. Also I did see that there was a patch on l2pop that required to know the migration state and portbinding would be the place where we can track it.
15:11:27 yeah, this needs more thinking, and estimation if it's worth it.. I mean
15:11:31 obondarev: without the host information, we cannot create the floatingip and routers ahead of time. We need the host binding information based on the current design
15:12:08 I mean if the only problem is connection lost to floating IP during live migration.. maybe we can document and live with it
15:12:51 I'm not sure however about the importance of the problem
15:12:57 obondarev: that is one part of the problem which is related to this bug. But in the overall picture in order to make live migration and dvr work smoother we need this information.
15:13:44 any other bugs related to dvr and migration?
15:13:55 just to clarify
15:13:58 obondarev: we are also working on moving the dvr to the ovsvapp and we have seen some issue with live migration and the vmware drs.
15:14:17 well, let's come up with a plan first, that way we can prioritize it, since it might be "N" before it's all done
15:14:23 ^^ regarding live migration
15:14:51 haleyb: I agree with it.
15:15:11 Swami: ovsvapp seems N-ish unless it's just a bug
15:16:31 Let's move on to the Bugs, and I'll just make some comments before we dig into it
15:16:38 #topic Bugs
15:16:39 haleyb: got it.
15:17:05 regarding bugs, this week we just got one old bug reopened.
15:17:06 We have a lot, and i want to make sure they are prioritized and making progress
15:17:26 #link https://bugs.launchpad.net/neutron/+bug/1512199
15:17:28 Launchpad bug 1512199 in neutron "change vm fixed ips will cause unable to communicate to vm in other network " [Medium,In progress] - Assigned to John Schwarz (jschwarz)
15:17:28 When going through them I'd like to try to get feedback on whether they are a MUST FIX, SHOULD FIX, or GOOD TO HAVE (gets at having priority set correctly)
15:17:38 * haleyb was still on his soapbox
15:17:59 If it needs a rebase, do it soon after Jenkins notifies you
15:18:06 If you just need reviews, ping people on irc or post a comment in the review
15:18:12 So, in a nutshell, let's try and get these merged
15:18:19 This bug was originally reported to be seen in kilo branch, but now seen also in the master branch. I suspect that this is due to regression.
15:18:33 There is a patch available for this bug.
15:18:47 link please?
15:19:02 #link https://review.openstack.org/#/c/263772/
15:19:08 thanks
15:19:28 obondarev: this is the patch that had a comment from carl_baldwin to refactor the arp_delete and arp_add function.
15:19:35 seems some action from the owner is needed
15:19:40 So I introduced the refactor patch so that it can be used.
15:19:48 Swami: yeah, I see
15:19:59 #link https://review.openstack.org/#/c/264356/
15:20:56 john schwarz is the owner of the patch, i will ping him to see if he still wanted to proceed, or else I will fix it.
15:21:20 Swami: a bit concerned with code duplication, I need to review again to see how we can avoid it
15:21:50 Swami: seems in John's patch there was no duplication
15:22:05 obondarev: ok I saw your review comment, but I thought that there was not much of code duplication there, maybe add it in your comments.
15:22:34 Swami: I will
15:22:52 ok let us move on to the next one.
15:22:55 #link https://bugs.launchpad.net/neutron/+bug/1462154
15:22:56 Launchpad bug 1462154 in neutron "With DVR Pings to floating IPs replied with fixed-ips" [High,In progress] - Assigned to ZongKai LI (lzklibj)
15:23:20 Swami: also I think it's a good idea to fix bug first (no refactoring) to make it easier to backport
15:23:25 * carl_baldwin stuck in meetings today, will read logs soon
15:23:26 This bug has been in review for a while. I think carl_baldwin has been reviewing it from day one.
15:23:57 obondarev: I know while we are working on reviewing the patches we do keep suggesting the refactor options.
15:24:11 i think stephen-ma knows a lot of the background on that as well
15:24:24 I am also reviewing the bugfixes for 1462154.
15:24:30 haleyb: yes, he was the one to work on that patch.
15:24:42 stephen-ma: how close is this to getting resolved?
15:24:52 Is the bugfix being held up because of the rootwrap filter change?
15:25:29 stephen-ma: I am not aware of it, but you should know better for this bug.
15:26:17 I will review his latest patch submission today.
15:26:59 stephen-ma: thanks.
15:27:13 let us move on to the next one.
15:27:32 #link https://bugs.launchpad.net/neutron/+bug/1522824
15:27:33 Launchpad bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] - Assigned to Oleg Bondarev (obondarev)
15:28:06 got some comments on that one half an hour ago, need to work on them
15:28:25 #link https://review.openstack.org/#/c/253569/
15:28:39 obondarev: yes I did see some comments from rossella
15:29:13 Let us move on to the next one.
15:29:16 #link https://bugs.launchpad.net/neutron/+bug/1445255
15:29:17 Launchpad bug 1445255 in neutron "DVR FloatingIP to unbound port does not work" [Low,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:30:01 This patch is ready for review. It would be great if I could get more reviews on the patch shown below. #link https://review.openstack.org/#/c/254439/
15:30:39 This bug is specific to how dvr handles the unbound "allowed_address_pairs" with floatingip.
15:30:49 bug title is a bit confusing as such a bug was fixed a while ago, shouldn't we rename it to state the problem more clearly?
15:31:44 obondarev: yes I thought about it, will change the bug title to reflect the patch.
15:31:55 Swami: thanks
15:32:25 #link https://bugs.launchpad.net/neutron/+bug/1510796
15:32:27 Launchpad bug 1510796 in neutron "Function sync_routers always call _get_dvr_sync_data in ha scenario" [Low,In progress] - Assigned to ZongKai LI (lzklibj)
15:32:45 here is the link to the patch. #link https://review.openstack.org/#/c/239908/
15:32:50 This patch needs review.
15:34:00 #link https://bugs.launchpad.net/neutron/+bug/1526175
15:34:01 Launchpad bug 1526175 in neutron "ha router schedule to dvr agent in compute node" [Medium,In progress] - Assigned to zhang sheng (langyxxl)
15:34:16 There is a patch for this bug and needs review.
15:34:30 #link https://review.openstack.org/#/c/265499/
15:35:28 The next one in the list is
15:35:31 #link https://bugs.launchpad.net/neutron/+bug/1505575
15:35:32 Launchpad bug 1505575 in neutron "Fatal memory consumption by neutron-server with DVR at scale" [High,Fix released] - Assigned to Oleg Bondarev (obondarev)
15:35:45 This bug has been under review for a while
15:35:55 * haleyb sees his review backlog is at its maximum
15:35:59 patch #link https://review.openstack.org/#/c/234067/
15:36:00 this bug is no longer valid, no memory consumption
15:36:14 the review was retargeted to another bug
15:36:21 obondarev: so are you planning to abandon this patch and close this bug.
15:36:30 Swami: no
15:36:41 it now closes another bug
15:36:42 what was that bug number, can you provide me
15:36:57 https://bugs.launchpad.net/neutron/+bug/1516260
15:36:58 Launchpad bug 1516260 in neutron "L3 agent sync_routers timeouts may cause cluster to fall down" [High,In progress] - Assigned to Oleg Bondarev (obondarev)
15:37:18 obondarev: ok I will update the wiki with the right one.
15:37:34 obondarev: i see the paginate change is looking good, related to that, https://review.openstack.org/#/c/234067/ is that the main focus now?
15:37:37 Swami: thanks
15:38:24 haleyb: sorry, related to what?
15:38:52 related to the bug you listed, i hadn't noticed the review link swami posted
15:39:15 haleyb: ah, got it
15:39:26 so yeah, that is the patch
15:39:37 obondarev: and did the dhcp change that was similar adopt the same method? i have lost track of it
15:40:03 haleyb: same for me.. need to see it again
15:40:10 The next one is the very old DVR HA bug. The server side patch has not merged yet.
15:40:22 #link https://review.openstack.org/#/c/143169/
15:40:25 obondarev: ok, thanks, that was over the holiday break when i was out
15:40:53 I had a conversation on it with amuller
15:40:54 adolfo is on vacation this week, so once he returns he might address the last review comment and merge conflict on this patch.
15:41:32 haleyb: that's all I had for bugs today.
15:41:39 the problem is that this patch now is dealing with all the dvr scheduling complexity, which will be removed by dvr scheduling refactoring
15:42:12 so probably it's better to base it on top of refactoring patches
15:42:15 obondarev: yes we fix and refactor something, there needs to be a change in this patch and that's the reason it is taking a long time.
15:42:59 obondarev: agreed, I will ask adolfo to do it.
15:43:21 haleyb: no more major bugs to discuss and I will hand it over to you.
15:43:35 so which refactoring patches have to merge first for that? most in our bug list?
15:44:16 haleyb: at this point most of the refactoring bugs have merged. But we should merge this patch first before we merge any other scheduler refactor patches.
15:45:13 Swami: I feel the opposite way
15:45:43 these are refactoring patches https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:bp/improve-dvr-l3-agent-binding
15:45:58 obondarev: The reason I say is then this server side code should be changed and it will take a while, we have been waiting for this fix to go in from October.
15:46:02 which are eliminating dvr scheduler complexity
15:47:08 Swami: that's why I'm not insisting, it's just my opinion, once the ha + dvr patch is well tested and addresses all comments it can be merged
15:47:25 obondarev: yes I agree with it. It need not wait.
15:47:44 obondarev: But I understand that the ha complexity might be reduced when we add it on top of yours.
15:47:45 and I'll have to update that code as part of refactoring, but I'm ok with that
15:48:12 agreed, it's been ongoing since 2014 and every little change causes a rebase
15:48:25 haleyb: agreed.
15:48:30 although PS66 isn't a record :)
15:49:02 haleyb: But the issue here is agent side code landed 3 months back and the server side is still under review.
15:49:26 haleyb: let us move on
15:49:33 Swami: true, hopefully adolfo can update quickly... moving on
15:49:46 #topic Gate failures
15:50:11 this has kind-of fallen off the radar
15:50:34 haleyb: I don't see any new failures in the gate this week.
15:50:35 there was a failure affecting both dvr and non-dvr multinode jobs
15:50:44 fixed recently
15:50:48 obondarev: that has come down
15:51:04 does anyone know the bug?
15:51:05 obondarev: I see the graph trending downhill.
15:51:35 obondarev: I don't know the exact bug that caused the hump.
15:51:35 yeah it was fixed yesterday, I'm not sure was it some nova fix or smth else
15:51:40 Have we seen many Tempest "SSHTimeout" failures?
15:52:59 Swami: can you please post a logstash query that you're using?
15:53:10 haleyb: obondarev: Just an update on the debug patch that I had for the SSHtimeout issue, I still see when I run it against the tempest, there are more inconsistent results for the ping test to pass. But when I run in my own setup it seems to pass. So I am not sure what in the tempest test is causing the pings to be inconsistent.
15:53:31 obondarev: was this for the SSHtimeout
15:53:41 Swami: yeah
15:55:58 obondarev: build_name:"gate-tempest-dsvm-neutron-dvr" AND build_status:"FAILURE" AND message:"SSHTimeout" AND project:"openstack/neutron"
15:56:22 Swami: cool, thanks!
15:56:28 Swami: is there more debug info we can add?
15:57:07 haleyb: what is see as a strange behavior is it does get two ping packets and then it loses the rest of the packets or complains about unreachable.
15:57:21 haleyb: I am thinking through it.
15:57:36 s/what is see/what I see
15:57:39 ok, feel free to ping me for help
15:57:46 haleyb: will do.
15:57:50 #topic Performance
15:58:03 obondarev: you had already posted https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:bp/improve-dvr-l3-agent-binding
15:58:09 yep
15:58:23 Is the only other thing the pagination? at least for now?
15:58:55 sorry?
15:59:03 * haleyb finds the link
15:59:56 haleyb: what do you mean by the only other thing?
16:00:22 obondarev: just that i didn't see the router pagination review in that link i think
16:00:26 (we can go to neutron channel)
16:00:39 yeah, it's the end
16:00:42 haleyb: ah, the link is only about bp
16:00:54 #endmeeting
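
Note on the logstash query posted at 15:55:58: it is a set of field:"value" terms joined with AND. A minimal sketch of assembling the same string programmatically follows, assuming that plain Lucene-style syntax is all that is needed. The helper name build_query is hypothetical and not part of any OpenStack or logstash tooling; the field names and values are copied from the log.

```python
def build_query(terms):
    """Join (field, value) pairs into a Lucene-style query string.

    Hypothetical helper for illustration only. It does not escape
    quotes inside values, so values must not contain '"'.
    """
    return " AND ".join(f'{field}:"{value}"' for field, value in terms)


# Field names and values copied from the query posted at 15:55:58.
SSH_TIMEOUT_TERMS = [
    ("build_name", "gate-tempest-dsvm-neutron-dvr"),
    ("build_status", "FAILURE"),
    ("message", "SSHTimeout"),
    ("project", "openstack/neutron"),
]

query = build_query(SSH_TIMEOUT_TERMS)
print(query)
```

Changing the build_name value would be the way to point the same query at a different gate job, for example a non-DVR multinode job for comparison.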