15:00:21 <haleyb> #startmeeting neutron_dvr
15:00:22 <openstack> Meeting started Wed Jan 13 15:00:21 2016 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:25 <openstack> The meeting name has been set to 'neutron_dvr'
15:00:29 <dasm> o/
15:00:33 <njohnston> o/
15:00:33 <haleyb> #chair Swami
15:00:34 <openstack> Current chairs: Swami haleyb
15:00:58 <haleyb> #topic Announcements
15:01:14 <haleyb> Mitaka 2 is next week according to http://docs.openstack.org/releases/schedules/mitaka.html
15:01:52 <haleyb> so let's not slow down :)
15:02:02 <obondarev> :)
15:02:46 <haleyb> We have a lot of bugs, and I'd like to try and lessen that, so (copying from the meeting wiki...)
15:02:47 <obondarev> I was about to ask how are we as DVR team affected by Mitaka 2 :)
15:03:16 <haleyb> well, it's more of a "time will sneak up on us" comment
15:03:31 <haleyb> especially with something like the migration work
15:03:36 <obondarev> yeah
15:03:54 <obondarev> which migration?
15:04:01 <Swami> Yes, that is one of the high-priority items that we need to resolve, and since it has a dependency on nova, it is critical.
15:04:03 <obondarev> live migration you mean?
15:04:09 <haleyb> the nova migration with dvr swami is working on
15:04:15 <obondarev> I see
15:04:17 <Swami> obondarev: yes
15:04:43 <obondarev> ok, sorry for interruption
15:05:00 <haleyb> np, it's good that we're clear
15:05:08 <Swami> obondarev: did you get a chance to review the discussion on this item in Launchpad bug #1456073?
15:05:42 <obondarev> Swami: I saw the discussion, just didn't have time yet to express my opinion
15:06:02 <obondarev> I still have some doubts on it
15:06:28 <Swami> obondarev: please provide your thoughts and valuable input
15:06:41 <obondarev> Swami: will do
15:06:50 <Swami> We as a team should be in agreement on the neutron-side changes before we get the nova team involved.
15:07:02 <obondarev> Swami: right
15:07:03 <Swami> obondarev: this was based on your suggestion in the mailing list.
15:07:11 <otherwiseguy> I'm trying to get caught up on the DVR code so that I can actually make a useful contribution. Helping out with DVR is pretty much my main task.
15:07:19 <otherwiseguy> (for this cycle)
15:07:38 <obondarev> otherwiseguy: good to hear that, thanks
15:07:42 <Swami> otherwiseguy: welcome and thanks for volunteering.
15:08:25 <haleyb> Swami: so do you want responses on the mailing list or the bug?  i'd assume the ML
15:08:30 <Swami> haleyb: did you get a chance to take a look at the proposal on the live migration.
15:09:05 <haleyb> Swami: i have only gone through it quickly, i will take another look today
15:09:05 <obondarev> Swami: that patch with neutron changes that I saw seemed an overkill to me
15:09:09 <Swami> haleyb: I think from the neutron team we can use the bug as our discussion thread and also update the ML with our findings so that Nova team gets notified. That was my idea.
15:09:19 <obondarev> Swami: that one that is changing portbinding extension
15:10:28 <Swami> obondarev: yes, I thought that would be the way to go. If you feel that it is overkill then please provide me your thoughts. Also, I did see that there was a patch on l2pop that needed to know the migration state, and portbinding would be the place where we can track it.
15:11:27 <obondarev> yeah, this needs more thinking, and estimation if it's worth it.. I mean
15:11:31 <Swami> obondarev: without the host information, we cannot create the floatingip and routers ahead of time. We need the host binding information based on the current design
15:12:08 <obondarev> I mean if the only problem is connection lost to floating IP during live migration.. maybe we can document and live with it
15:12:51 <obondarev> I'm not sure however about the importance of the problem
15:12:57 <Swami> obondarev: that is one part of the problem, which is related to this bug. But in the overall picture, in order to make live migration and dvr work more smoothly, we need this information.
15:13:44 <obondarev> any other bugs related to dvr and migration?
15:13:55 <obondarev> just to clarify
15:13:58 <Swami> obondarev: we are also working on moving the dvr to the ovsvapp and we have seen some issues with live migration and the vmware drs.
15:14:17 <haleyb> well, let's come up with a plan first, that way we can prioritize it, since it might be "N" before it's all done
15:14:23 <haleyb> ^^ regarding live migration
15:14:51 <Swami> haleyb: I agree with it.
15:15:11 <haleyb> Swami: ovsvapp seems N-ish unless it's just a bug
15:16:31 <haleyb> Let's move on to Bugs, and I'll just make some comments before we dig into it
15:16:38 <haleyb> #topic Bugs
15:16:39 <Swami> haleyb: got it.
15:17:05 <Swami> regarding bugs, this week we just got one old bug reopened.
15:17:06 <haleyb> We have a lot, and i want to make sure they are prioritized and making progress
15:17:26 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1512199
15:17:28 <openstack> Launchpad bug 1512199 in neutron "change vm fixed ips will cause unable to communicate to vm in other network " [Medium,In progress] - Assigned to John Schwarz (jschwarz)
15:17:28 <haleyb> When going through them I'd like to try to get feedback on whether they are a MUST FIX, SHOULD FIX, or GOOD TO HAVE (gets at having priority set correctly)
15:17:38 * haleyb was still on his soapbox
15:17:59 <haleyb> If it needs a rebase, do it soon after Jenkins notifies you
15:18:06 <haleyb> If you just need reviews, ping people on irc or post a comment in the review
15:18:12 <haleyb> So, in a nutshell, let's try and get these merged
15:18:19 <Swami> This bug was originally reported as seen in the kilo branch, but is now also seen in the master branch. I suspect that this is due to a regression.
15:18:33 <Swami> There is a patch available for this bug.
15:18:47 <obondarev> link please?
15:19:02 <Swami> #link https://review.openstack.org/#/c/263772/
15:19:08 <obondarev> thanks
15:19:28 <Swami> obondarev: this is the patch that had a comment from carl_baldwin to refactor the arp_delete and arp_add function.
15:19:35 <obondarev> seems some action from the owner is needed
15:19:40 <Swami> So I introduced the refactor patch so that it can be used.
15:19:48 <obondarev> Swami: yeah, I see
15:19:59 <Swami> #link https://review.openstack.org/#/c/264356/
15:20:56 <Swami> john schwarz is the owner of the patch, i will ping him to see if he still wanted to proceed, or else I will fix it.
15:21:20 <obondarev> Swami: a bit concerned with code duplication, I need to review again to see how we can avoid it
15:21:50 <obondarev> Swami: seems in John's patch there was no duplication
15:22:05 <Swami> obondarev: ok, I saw your review comment, but I thought that there was not much code duplication there; maybe add your comments.
15:22:34 <obondarev> Swami: I will
15:22:52 <Swami> ok let us move on to the next one.
15:22:55 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1462154
15:22:56 <openstack> Launchpad bug 1462154 in neutron "With DVR Pings to floating IPs replied with fixed-ips" [High,In progress] - Assigned to ZongKai LI (lzklibj)
15:23:20 <obondarev> Swami: also I think it's a good idea to fix bug first (no refactoring) to make it easier to backport
15:23:25 * carl_baldwin stuck in meetings today, will read logs soon
15:23:26 <Swami> This bug has been in review for a while. I think carl_baldwin has been reviewing it from day one.
15:23:57 <Swami> obondarev: I know that while we are reviewing the patches we keep suggesting refactor options.
15:24:11 <haleyb> i think stephen-ma knows a lot of the background on that as well
15:24:24 <stephen-ma> I am also reviewing the bugfixes for 1462154.
15:24:30 <Swami> haleyb: yes, he was the one to work on that patch.
15:24:42 <Swami> stephen-ma: how close is this to getting resolved?
15:24:52 <stephen-ma> Is the bugfix being held up because of the rootwrap filter change?
15:25:29 <Swami> stephen-ma: I am not aware of it, but you should know better than me for this bug.
15:26:17 <stephen-ma> I will review his latest patch submission today.
15:26:59 <Swami> stephen-ma: thanks.
15:27:13 <Swami> let us move on to the next one.
15:27:32 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1522824
15:27:33 <openstack> Launchpad bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] - Assigned to Oleg Bondarev (obondarev)
15:28:06 <obondarev> got some comments on that one half an hour ago, need to work on them
15:28:25 <obondarev> #link https://review.openstack.org/#/c/253569/
15:28:39 <Swami> obondarev: yes I did see some comments from rossella
15:29:13 <Swami> Let us move on to the next one.
15:29:16 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1445255
15:29:17 <openstack> Launchpad bug 1445255 in neutron "DVR FloatingIP to unbound port does not work" [Low,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:30:01 <Swami> This patch is ready for review. It would be great if I could get more reviews on the patch shown below. #link https://review.openstack.org/#/c/254439/
15:30:39 <Swami> This bug is specific to how dvr handles the unbound "allowed_address_pairs" with floatingip.
15:30:49 <obondarev> bug title is a bit confusing as such a bug was fixed a while ago, shouldn't we rename it to state the problem more clearly?
15:31:44 <Swami> obondarev: yes I thought about it, will change the bug title to reflect the patch.
15:31:55 <obondarev> Swami: thanks
15:32:25 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1510796
15:32:27 <openstack> Launchpad bug 1510796 in neutron "Function sync_routers always call _get_dvr_sync_data in ha scenario" [Low,In progress] - Assigned to ZongKai LI (lzklibj)
15:32:45 <Swami> here is the link to the patch. #link https://review.openstack.org/#/c/239908/
15:32:50 <Swami> This patch needs review.
15:34:00 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1526175
15:34:01 <openstack> Launchpad bug 1526175 in neutron "ha router schedule to dvr agent in compute node" [Medium,In progress] - Assigned to zhang sheng (langyxxl)
15:34:16 <Swami> There is a patch for this bug and needs review.
15:34:30 <Swami> #link https://review.openstack.org/#/c/265499/
15:35:28 <Swami> The next one in the list is
15:35:31 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1505575
15:35:32 <openstack> Launchpad bug 1505575 in neutron "Fatal memory consumption by neutron-server with DVR at scale" [High,Fix released] - Assigned to Oleg Bondarev (obondarev)
15:35:45 <Swami> This bug has been under review for a while
15:35:55 * haleyb sees his review backlog is at its maximum
15:35:59 <Swami> patch #link https://review.openstack.org/#/c/234067/
15:36:00 <obondarev> this bug is no longer valid, no memory consumption
15:36:14 <obondarev> the review was retargeted to another bug
15:36:21 <Swami> obondarev: so are you planning to abandon this patch and close this bug.
15:36:30 <obondarev> Swami: no
15:36:41 <obondarev> it now closes another bug
15:36:42 <Swami> what was that bug number, can you provide me
15:36:57 <obondarev> https://bugs.launchpad.net/neutron/+bug/1516260
15:36:58 <openstack> Launchpad bug 1516260 in neutron "L3 agent sync_routers timeouts may cause cluster to fall down" [High,In progress] - Assigned to Oleg Bondarev (obondarev)
15:37:18 <Swami> obondarev: ok I will update the wiki with the right one.
15:37:34 <haleyb> obondarev: i see the paginate change is looking good, related to that, https://review.openstack.org/#/c/234067/ is that the main focus now?
15:37:37 <obondarev> Swami: thanks
15:38:24 <obondarev> haleyb: sorry, related to what?
15:38:52 <haleyb> related to the bug you listed, i hadn't noticed the review link swami posted
15:39:15 <obondarev> haleyb: ah, got it
15:39:26 <obondarev> so yeah, that is the patch
15:39:37 <haleyb> obondarev: and did the dhcp change that was similar adopt the same method?  i have lost track of it
15:40:03 <obondarev> haleyb: same for me.. need to see it again
15:40:10 <Swami> The next one is the very old DVR HA bug. The server side patch has not merged yet.
15:40:22 <Swami> #link https://review.openstack.org/#/c/143169/
15:40:25 <haleyb> obondarev: ok, thanks, that was over the holiday break when i was out
15:40:53 <obondarev> I had a conversation on it with amuller
15:40:54 <Swami> adolfo is on vacation this week, so once he returns he might address the last review comment and merge conflict on this patch.
15:41:32 <Swami> haleyb: that's all I had for bugs today.
15:41:39 <obondarev> the problem is that this patch now is dealing with all the dvr scheduling complexity, which will be removed by dvr scheduling refactoring
15:42:12 <obondarev> so probably it's better to base it on top of refactoring patches
15:42:15 <Swami> obondarev: yes, whenever we fix and refactor something, there needs to be a change in this patch, and that's the reason it is taking a long time.
15:42:59 <Swami> obondarev: agreed, I will ask adolfo to do it.
15:43:21 <Swami> haleyb: no more major bugs to discuss and I will hand it over to you.
15:43:35 <haleyb> so which refactoring patches have to merge first for that?  most in our bug list?
15:44:16 <Swami> haleyb: at this point most of the refactoring bugs have merged. But we should merge this patch first before we merge any other scheduler refactor patches.
15:45:13 <obondarev> Swami: I feel the opposite way
15:45:43 <obondarev> these are refactoring patches https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:bp/improve-dvr-l3-agent-binding
15:45:58 <Swami> obondarev: The reason I say that is that otherwise this server-side code would have to be changed, and that will take a while; we have been waiting for this fix to go in since October.
15:46:02 <obondarev> which are eliminating dvr scheduler complexity
15:47:08 <obondarev> Swami: that's why I'm not insisting, it's just my opinion, once the ha + dvr patch is well tested and addresses all comments it can be merged
15:47:25 <Swami> obondarev: yes I agree with it. It need not wait.
15:47:44 <Swami> obondarev: But I understand that the ha complexity might be reduced when we add it on top of yours.
15:47:45 <obondarev> and I'll have to update that code as part of refactoring, but I'm ok with that
15:48:12 <haleyb> agreed, it's been ongoing since 2014 and every little change causes a rebase
15:48:25 <Swami> haleyb: agreed.
15:48:30 <haleyb> although PS66 isn't a record :)
15:49:02 <Swami> haleyb: But the issue here is agent side code landed 3 months back and the server side is still under review.
15:49:26 <Swami> haleyb: let us move on
15:49:33 <haleyb> Swami: true, hopefully adolfo can update quickly... moving on
15:49:46 <haleyb> #topic Gate failures
15:50:11 <haleyb> this has kind-of fallen off the radar
15:50:34 <Swami> haleyb: I don't see any new failures in the gate this week.
15:50:35 <obondarev> there was a failure affecting both dvr and non-dvr multinode jobs
15:50:44 <obondarev> fixed recently
15:50:48 <Swami> obondarev: that has come down
15:51:04 <obondarev> does anyone know the bug?
15:51:05 <Swami> obondarev: I see the graph trending downhill.
15:51:35 <Swami> obondarev: I don't know the exact bug that caused the hump.
15:51:35 <obondarev> yeah it was fixed yesterday, I'm not sure whether it was some nova fix or something else
15:51:40 <haleyb> Have we seen many Tempest "SSHTimeout" failures?
15:52:59 <obondarev> Swami: can you please post a logstash query that you're using?
15:53:10 <Swami> haleyb: obondarev: Just an update on the debug patch that I had for the SSHTimeout issue: when I run it against tempest, I still see inconsistent results for the ping test. But when I run it in my own setup it seems to pass. So I am not sure what in the tempest test is causing the pings to be inconsistent.
15:53:31 <Swami> obondarev: was this for the SSHtimeout
15:53:41 <obondarev> Swami: yeah
15:55:58 <Swami> obondarev: build_name:"gate-tempest-dsvm-neutron-dvr" AND build_status:"FAILURE" AND message:"SSHTimeout" AND project:"openstack/neutron"
15:56:22 <obondarev> Swami: cool, thanks!
15:56:28 <haleyb> Swami: is there more debug info we can add?
15:57:07 <Swami> haleyb: what is see as a strange behavior is that it does get two ping packets and then it loses the rest of the packets or complains about unreachable.
15:57:21 <Swami> haleyb: I am thinking through it.
15:57:36 <Swami> s/what is see/what I see
15:57:39 <haleyb> ok, feel free to ping me for help
15:57:46 <Swami> haleyb: will do.
15:57:50 <haleyb> #topic Performance
15:58:03 <haleyb> obondarev: you had already posted https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:bp/improve-dvr-l3-agent-binding
15:58:09 <obondarev> yep
15:58:23 <haleyb> Is the only other thing the pagination?  at least for now?
15:58:55 <obondarev> sorry?
15:59:03 * haleyb finds the link
15:59:56 <obondarev> haleyb: what do you mean by the only other thing?
16:00:22 <haleyb> obondarev: just that i didn't see the router pagination review in that link i think
16:00:26 <obondarev> (we can go to neutron channel)
16:00:39 <haleyb> yeah, it's the end
16:00:42 <obondarev> haleyb: ah, the link is only about bp
16:00:54 <haleyb> #endmeeting