15:01:18 <haleyb> #startmeeting neutron_dvr
15:01:19 <openstack> Meeting started Wed Jan 27 15:01:18 2016 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:23 <openstack> The meeting name has been set to 'neutron_dvr'
15:01:26 <haleyb> #chair Swami
15:01:31 <openstack> Current chairs: Swami haleyb
15:02:10 * haleyb wonders if it will just be Swami and himself
15:02:26 <Swami> haleyb: are you also participating in the nova midcycle meetup this week
15:02:42 <haleyb> Swami: no
15:02:47 <Swami> haleyb: today it should be short
15:02:59 <haleyb> #topic Announcements
15:03:29 <haleyb> https://etherpad.openstack.org/p/neutron-mitaka-midcycle announced
15:04:02 <haleyb> DVR items were on agenda - is anyone planning on attending?  I see obondarev and myself as tentative
15:04:19 <Swami> haleyb: I will plan to attend.
15:04:48 <obondarev> hi
15:04:50 <Swami> haleyb: will you be there.
15:04:56 <Swami> obondarev: hi
15:05:00 <obondarev> I'm planning
15:05:19 <obondarev> working on logistics
15:06:09 <haleyb> Swami: i'm trying, have to get approval, and i'm on vacation the week before, Mexico to Minnesota will be a shock :)
15:06:23 <Swami> good to know that we have an agenda in place for DVR in the midcycle, haleyb: thanks for adding it to the agenda.
15:06:37 <haleyb> thank obondarev
15:06:49 <Swami> haleyb: same here, I need to get approval as well, with all changes happening.
15:07:27 <haleyb> anyways, let's keep in touch in case one or more need to drop out of going
15:07:41 <Swami> haleyb: ok makes sense
15:08:02 <haleyb> we can talk over irc instead of getting cold :)
15:08:19 <haleyb> #topic Bugs
15:08:24 <Swami> haleyb: yep
15:08:37 <haleyb> unless you want to talk live migration first
15:08:40 <Swami> There was one bug that was filed recently for this week.
15:09:06 <Swami> haleyb: no we can proceed with bugs and then come back to live migration, since we have already spoken about live migration.
15:09:36 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1538163
15:09:38 <openstack> Launchpad bug 1538163 in neutron "DVR: race in dvr serviceable port deletion" [Undecided,In progress] - Assigned to Oleg Bondarev (obondarev)
15:09:57 <Swami> obondarev: I think you have filed this bug.
15:10:07 <obondarev> Swami: yep
15:10:19 <haleyb> https://review.openstack.org/#/c/272634/ is out for review
15:10:19 <Swami> obondarev: you also have a patch for review.
15:10:22 <obondarev> it popped up during dvr refactoring work
15:10:37 <obondarev> I just noticed that there is a potential race
15:10:45 <Swami> obondarev: yes I just had a first pass review at it yesterday.
15:10:45 <obondarev> the patch is https://review.openstack.org/#/c/272634/
15:11:08 <obondarev> Swami: thanks, I've addressed your comments
15:11:17 <Swami> obondarev: yes I did notice today.
15:11:56 <Swami> let us move on to the next one.
15:12:01 <obondarev> so please review and we can move on
15:12:03 <obondarev> yeah
15:12:08 <Swami> The next high in the list is the HA and DVR
15:12:12 <haleyb> Swami: i had another new bug
15:12:40 <Swami> haleyb: have you filed.
15:12:42 <haleyb> but i'll wait until the gate failures part
15:12:54 <Swami> haleyb: ok no problem.
15:13:00 <Swami> #link https://review.openstack.org/#/c/143169/
15:13:58 <Swami> Again the DVR SNAT HA patch is also failing jenkins with a couple of L3-HA tests, I have asked adolfo to take a look at it.
15:14:35 <Swami> otherwise it should be good, and I already saw that carl_baldwin added a +2.
15:15:12 <haleyb> Is it me or is the title of that change "HA or DVR" instead of  "HA for DVR" ?
15:15:56 <Swami> haleyb: has the title changed, I did not notice.
15:16:04 <obondarev> haleyb: haha, good catch!
15:16:52 <haleyb> changed in PS66, doh, but not as important as finding why the test fails
15:17:25 <Swami> haleyb: might have been a wrong key hit when trying to amend the patch.
15:17:36 <Swami> haleyb: yes good catch.
15:17:43 <obondarev> how might that happen, funny)
15:18:21 <Swami> obondarev: sometimes things happen.
15:18:42 <Swami> The next one in the list is
15:18:45 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1445255
15:18:46 <openstack> Launchpad bug 1445255 in neutron "DVR FloatingIP to unbound allowed_address_pairs does not work" [Low,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:18:47 <obondarev> Swami: indeed
15:19:33 <Swami> This one has a patch as well and in review for a while.
15:19:49 <Swami> https://review.openstack.org/#/c/254439/
15:20:26 <Swami> I think carl_baldwin is closely following this patch.
15:21:30 <Swami> The next one in the list is
15:21:33 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1522824
15:21:34 <openstack> Launchpad bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] - Assigned to shihanzhang (shihanzhang)
15:22:05 <Swami> #link  https://review.openstack.org/#/c/215467 - This patch is almost ready
15:23:09 <Swami> The next one in the list is
15:23:15 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1445255
15:23:16 <openstack> Launchpad bug 1445255 in neutron "DVR FloatingIP to unbound allowed_address_pairs does not work" [Low,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:23:31 <Swami> #link https://review.openstack.org/#/c/254439/
15:23:44 <Swami> haleyb: i have addressed review comments from you on this patch.
15:24:06 <haleyb> yes, thanks, i'll take a look
15:25:23 <Swami> I think that's all I had for bugs this week.
15:25:47 <Swami> Are there any other bugs that need attention at this time?
15:26:00 <Swami> Of course we have the live migration bug.
15:26:33 <haleyb> There's at least two OVS/DVR bugs, let's jump to gate failures
15:26:43 <Swami> haleyb: obondarev: I did see that there were some comments added to the nova midcycle etherpad on the live migration work.
15:27:07 <Swami> So let us wait for their feedback and proceed on it.
15:27:08 <obondarev> can you share the libk please?
15:27:17 <obondarev> link*
15:27:23 <Swami> #link https://etherpad.openstack.org/p/mitaka-nova-midcycle
15:27:51 <Swami> haleyb: yes let us move on to the gate failures, that's all for the bugs.
15:27:54 <haleyb> #topic Gate Failures
15:28:16 <haleyb> I filed https://bugs.launchpad.net/bugs/1538387 yesterday after staring at it for a while
15:28:18 <openstack> Launchpad bug 1538387 in neutron "fdb_chg_ip_tun throwing exception because fdb_entries not in correct format" [High,In progress] - Assigned to Kevin Benton (kevinbenton)
15:28:54 <haleyb> Kevin sent out https://review.openstack.org/272986 this morning
15:29:09 <haleyb> Seems this l2pop issue has been there for over a year
15:29:45 <haleyb> I basically started looking through the logs for exceptions, since we shouldn't have any
15:29:54 <Swami> haleyb: is this bug only seen in DVR environments.
15:30:24 <Swami> haleyb: did you find any exceptions in the logs.
15:30:28 <haleyb> I only saw it in the dvr-multinode job, but it could be other places
15:30:36 <haleyb> I also found "DVR: Unable to retrieve subnet information for subnet_id"
15:30:44 <haleyb> looks like get_subnet_for_dvr() call in _bind_centralized_snat_port_on_dvr_subnet() needs to check if subnet_info is not {}
15:31:01 <haleyb> throwing a KeyError
15:31:15 <haleyb> Was going to file that today
15:31:21 <Swami> haleyb: was this introduced by the recent refactor on this function.
15:31:46 <Swami> obondarev: I think you made some changes to this get_subnet_for_dvr recently.
15:31:47 <obondarev> haleyb: this was filed I guess
15:31:49 <haleyb> I don't think so - one caller of that function checks the return value, one does not
15:31:57 <obondarev> let me find the link
15:33:14 <obondarev> it is something that I reviewed recently..
15:33:53 <haleyb> Both the multinode and dvr-multinode jobs are pretty bad, over 50%, but it's the migration issue, at least the volume migration is the one test failing
15:34:16 <Swami> haleyb: is the volume migration issue also seen on single node jobs.
15:34:31 <obondarev> https://review.openstack.org/#/c/272025/
15:35:23 <haleyb> obondarev: thanks.  that check should probably be up near the call, i'll add that
15:35:43 <Swami> obondarev: is this patch related to the issue that haleyb was seeing in the logs related to subnet_info.
15:35:57 <obondarev> Swami: I guess so
15:36:06 <haleyb> i had even added myself but not made the connection
15:36:15 <haleyb> Swami: yes, same issue
15:36:44 <obondarev> what I noticed recently is that from time to time dvr multinode job fails with ~20 failed tests
15:36:58 <obondarev> not very often but still
15:37:00 <Swami> haleyb: in a single node there should not be such migration failures, but was it introduced or triggered by any other patch that merged.
15:37:18 <obondarev> usually it either passes or fails with 1 failed test
15:37:30 <Swami> haleyb: do you have the logstash filter to filter out the failures for this particular failure.
15:37:42 <haleyb> obondarev: same failure reason?  I guess if you see it again we should look closer
15:38:07 <obondarev> haleyb: not sure about failure reason
15:38:37 <haleyb> Swami: no, but there are very few failures (ERROR) in the logs these days once we fix the two I mentioned
15:39:02 <obondarev> one of examples http://logs.openstack.org/55/272555/2/check/gate-tempest-dsvm-neutron-dvr-multinode-full/e6f9faa/console.html
15:39:18 <obondarev> 17 failed tests
15:39:54 <obondarev> it seems like smth went wrong and tests start to fail
15:40:12 <obondarev> didn't have a chance to look closer yet
15:41:25 <Swami> obondarev: seems to be about the SSHTimeout issue.
15:41:33 <haleyb> i know a lot of mtu patches have been merging as well, both in neutron and the gate, which might help as well
15:41:37 <haleyb> https://review.openstack.org/#/q/topic:multinode-neutron-mtu
15:42:14 <obondarev> cool, hope it'll help
15:42:35 <Swami> haleyb: obondarev: btw the patch that I added for debugging the SSHTimeout issue is not quite working right because of a timing issue. I suspect by the time I try to ping the metadata is not in place.
15:43:24 <Swami> haleyb: obondarev: is there a test to validate if the metadata is properly received by the VM.
15:44:23 <obondarev> I think there are plenty in tempest
15:45:04 <obondarev> any that boots a vm and checks connectivity
15:46:20 <Swami> obondarev: we normally check it from external connectivity, but that does not tell us if it is a metadata issue or not.
15:46:37 <obondarev> ah
15:46:56 <obondarev> you mean which are checking specifically for metadata
15:47:08 <Swami> obondarev: yes.
15:47:16 <obondarev> can't remember
15:47:36 <haleyb> Swami: i don't think there is a test, but we do have the VM console log
15:47:45 <obondarev> right
15:48:02 <Swami> haleyb: yes, it is only through the vm console log we will be able to identify.
15:48:37 <Swami> haleyb: obondarev: is there a way to figure out from the router namespace that the metadata request was processed for the particular vm.
15:49:15 <haleyb> I don't know if the proxy logs are there, but the metadata server log is i believe
15:49:31 <Swami> haleyb: ok
15:50:36 <haleyb> http://logs.openstack.org/55/272555/2/check/gate-tempest-dsvm-neutron-dvr-multinode-full/e6f9faa/console.html#_2016-01-26_18_25_22_075
15:50:58 <haleyb> i don't know metadata well enough to know if that's a complete failure
15:51:51 <Swami> haleyb: so that seems that metadata might be another victim for the SSHtimeout issue.
15:52:16 <haleyb> Swami: yes, assuming these VMs are getting ssh keys that way
15:52:47 <obondarev> and one example is https://bugs.launchpad.net/neutron/+bug/1522824
15:52:48 <openstack> Launchpad bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] - Assigned to shihanzhang (shihanzhang)
15:52:51 <Swami> haleyb: yes I think in the tempest run that's how they pass the keys.
15:53:07 <obondarev> haleyb: Swami: exactly
15:54:15 <Swami> agreed.
15:54:39 <haleyb> that failure is different from the one i linked, seems metadata worked in my case, but only instance-id was returned
15:55:38 <haleyb> 5 minutes left, let's move onto last topic
15:55:46 <haleyb> #topic Performance/Scalability
15:55:54 <haleyb> obondarev: one patch left? :)
15:55:59 <obondarev> haleyb: right
15:56:02 <obondarev> the main one
15:56:34 <obondarev> it'll require some rebase work once HA for DVR merges
15:56:54 <obondarev> or it'll merge first :)
15:57:07 <haleyb> i borked the HA DVR patch updating the commit message, so it will take an hour or more to clear
15:58:14 <Swami> haleyb: no problem, anyway we have a couple of tests that were failing, let me check with adolfo on this.
15:58:21 <Swami> before we close.
15:58:36 <haleyb> he's getting a free recheck
15:58:57 <haleyb> #topic Open discussion
15:59:03 <Swami> haleyb: we were (obondarev and myself) planning to add in a general session talk on DVR improvements for Mitaka.
15:59:16 <Swami> haleyb: would you like to be part of that discussion.
15:59:35 <haleyb> Swami: sure, i was going to ping you offline about it, but works for me
15:59:56 <haleyb> let me know how i can help with abstract
16:00:16 <Swami> haleyb: ok I will loop you in the discussion. We have an abstract in google doc.
16:00:21 <obondarev> haleyb: https://docs.google.com/document/d/1WCjq0FL1NxSistA7nfceeXaf3gmVzDCvJm0Jmwt80Dw/edit
16:00:31 <Swami> obondarev: thanks for the link
16:00:38 <haleyb> thanks, and we need to close out for the ML2ers
16:00:40 <haleyb> #endmeeting