15:01:18 <haleyb> #startmeeting neutron_dvr
15:01:19 <openstack> Meeting started Wed Jan 27 15:01:18 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:23 <openstack> The meeting name has been set to 'neutron_dvr'
15:01:26 <haleyb> #chair Swami
15:01:31 <openstack> Current chairs: Swami haleyb
15:02:10 * haleyb wonders if it will just be Swami and himself
15:02:26 <Swami> haleyb: are you also participating in the nova midcycle meetup this week
15:02:42 <haleyb> Swami: no
15:02:47 <Swami> haleyb: today it should be short
15:02:59 <haleyb> #topic Announcements
15:03:29 <haleyb> https://etherpad.openstack.org/p/neutron-mitaka-midcycle announced
15:04:02 <haleyb> DVR items were on agenda - is anyone planning on attending? I see obondarev and myself as tentative
15:04:19 <Swami> haleyb: I will plan to attend.
15:04:48 <obondarev> hi
15:04:50 <Swami> haleyb: will you be there.
15:04:56 <Swami> obondarev: hi
15:05:00 <obondarev> I'm planning
15:05:19 <obondarev> working on logistics
15:06:09 <haleyb> Swami: i'm trying, have to get approval, and i'm on vacation the week before, Mexico to Minnesota will be a shock :)
15:06:23 <Swami> good to know that we have an agenda in place for DVR in the midcycle, haleyb: thanks for adding it to the agenda.
15:06:37 <haleyb> thank obondarev
15:06:49 <Swami> haleyb: same here, I need to get approval as well, with all changes happening.
15:07:27 <haleyb> anyways, let's keep in touch in case one or more need to drop out of going
15:07:41 <Swami> haleyb: ok makes sense
15:08:02 <haleyb> we can talk over irc instead of getting cold :)
15:08:19 <haleyb> #topic Bugs
15:08:24 <Swami> haleyb: yep
15:08:37 <haleyb> unless you want to talk live migration first
15:08:40 <Swami> There was one bug that was filed recently for this week.
15:09:06 <Swami> haleyb: no we can proceed with bugs and then come back to live migration, since we have already spoken about live migration.
15:09:36 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1538163
15:09:38 <openstack> Launchpad bug 1538163 in neutron "DVR: race in dvr serviceable port deletion" [Undecided,In progress] - Assigned to Oleg Bondarev (obondarev)
15:09:57 <Swami> obondarev: I think you have filed this bug.
15:10:07 <obondarev> Swami: yep
15:10:19 <haleyb> https://review.openstack.org/#/c/272634/ is out for review
15:10:19 <Swami> obondarev: you also have a patch for review.
15:10:22 <obondarev> it popped up during dvr refactoring work
15:10:37 <obondarev> I just noticed that there is a potential race
15:10:45 <Swami> obondarev: yes I just had a first pass review at it yesterday.
15:10:45 <obondarev> the patch is https://review.openstack.org/#/c/272634/
15:11:08 <obondarev> Swami: thanks, I've addressed your comments
15:11:17 <Swami> obondarev: yes I did notice today.
15:11:56 <Swami> let us move on to the next one.
15:12:01 <obondarev> so please review and we can move on
15:12:03 <obondarev> yeah
15:12:08 <Swami> The next high in the list is the HA and DVR
15:12:12 <haleyb> Swami: i had another new bug
15:12:40 <Swami> haleyb: have you filed.
15:12:42 <haleyb> but i'll wait until the gate failures part
15:12:54 <Swami> haleyb: ok no problem.
15:13:00 <Swami> #link https://review.openstack.org/#/c/143169/
15:13:58 <Swami> Again the DVR SNAT HA patch is also failing jenkins with a couple of L3-HA tests, I have asked adolfo to take a look at it.
15:14:35 <Swami> otherwise it should be good and I already saw carl_baldwin added a +2.
15:15:12 <haleyb> Is it me or is the title of that change "HA or DVR" instead of "HA for DVR" ?
15:15:56 <Swami> haleyb: has the title changed, I did not notice.
15:16:04 <obondarev> haleyb: haha, good catch!
15:16:52 <haleyb> changed in PS66, doh, but not as important as finding why the test fails
15:17:25 <Swami> haleyb: might have been a wrong key hit when trying to amend the patch.
15:17:36 <Swami> haleyb: yes good catch.
15:17:43 <obondarev> how might that happen, funny)
15:18:21 <Swami> obondarev: sometimes things happen.
15:18:42 <Swami> The next one in the list is
15:18:45 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1445255
15:18:46 <openstack> Launchpad bug 1445255 in neutron "DVR FloatingIP to unbound allowed_address_pairs does not work" [Low,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:18:47 <obondarev> Swami: indeed
15:19:33 <Swami> This one has a patch as well and has been in review for a while.
15:19:49 <Swami> https://review.openstack.org/#/c/254439/
15:20:26 <Swami> I think carl_baldwin is closely following this patch.
15:21:30 <Swami> The next one in the list is
15:21:33 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1522824
15:21:34 <openstack> Launchpad bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] - Assigned to shihanzhang (shihanzhang)
15:22:05 <Swami> #link https://review.openstack.org/#/c/215467 - This patch is almost ready
15:23:09 <Swami> The next one in the list is
15:23:15 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1445255
15:23:16 <openstack> Launchpad bug 1445255 in neutron "DVR FloatingIP to unbound allowed_address_pairs does not work" [Low,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:23:31 <Swami> #link https://review.openstack.org/#/c/254439/
15:23:44 <Swami> haleyb: i have addressed review comments from you on this patch.
15:24:06 <haleyb> yes, thanks, i'll take a look
15:25:23 <Swami> I think that's all I had for bugs this week.
15:25:47 <Swami> Are there any other bugs that need attention at this time.
15:26:00 <Swami> Of course we have the live migration bug.
15:26:33 <haleyb> There's at least two OVS/DVR bugs, let's jump to gate failures
15:26:43 <Swami> haleyb: obondarev: I did see that there were some comments added to the nova midcycle etherpad on the live migration work.
15:27:07 <Swami> So let us wait for their feedback and proceed on it.
15:27:08 <obondarev> can you share the libk please?
15:27:17 <obondarev> link*
15:27:23 <Swami> #link https://etherpad.openstack.org/p/mitaka-nova-midcycle
15:27:51 <Swami> haleyb: yes let us move on to the gate failures, that's all for the bugs.
15:27:54 <haleyb> #topic Gate Failures
15:28:16 <haleyb> I filed https://bugs.launchpad.net/bugs/1538387 yesterday after staring at it for a while
15:28:18 <openstack> Launchpad bug 1538387 in neutron "fdb_chg_ip_tun throwing exception because fdb_entries not in correct format" [High,In progress] - Assigned to Kevin Benton (kevinbenton)
15:28:54 <haleyb> Kevin sent out https://review.openstack.org/272986 this morning
15:29:09 <haleyb> Seems this l2pop issue has been there for over a year
15:29:45 <haleyb> I basically started looking through the logs for exceptions, since we shouldn't have any
15:29:54 <Swami> haleyb: is this bug only seen in DVR environments.
15:30:24 <Swami> haleyb: did you find any exceptions in the logs.
15:30:28 <haleyb> I only saw it in the dvr-multinode job, but it could be other places
15:30:36 <haleyb> I also found "DVR: Unable to retrieve subnet information for subnet_id"
15:30:44 <haleyb> looks like get_subnet_for_dvr() call in _bind_centralized_snat_port_on_dvr_subnet() needs to check if subnet_info is not {}
15:31:01 <haleyb> throwing a KeyError
15:31:15 <haleyb> Was going to file that today
15:31:21 <Swami> haleyb: was this introduced by the recent refactor on this function.
15:31:46 <Swami> obondarev: I think you made some changes to this get_subnet_for_dvr recently.
15:31:47 <obondarev> haleyb: this was filed I guess
15:31:49 <haleyb> I don't think so - one caller of that function checks the return value, one does not
15:31:57 <obondarev> let me find the link
15:33:14 <obondarev> it is something that I reviewed recently..
15:33:53 <haleyb> Both the multinode and dvr-multinode jobs are pretty bad, over 50%, but it's the migration issue, at least the volume migration is the one test failing
15:34:16 <Swami> haleyb: is the volume migration issue also seen on single node jobs.
15:34:31 <obondarev> https://review.openstack.org/#/c/272025/
15:35:23 <haleyb> obondarev: thanks. that check should probably be up near the call, i'll add that
15:35:43 <Swami> obondarev: is this patch related to the issue that haleyb was seeing in the logs related to subnet_info.
15:35:57 <obondarev> Swami: I guess so
15:36:06 <haleyb> i had even added myself but not made the connection
15:36:15 <haleyb> Swami: yes, same issue
15:36:44 <obondarev> what I noticed recently is that from time to time dvr multinode job fails with ~20 failed tests
15:36:58 <obondarev> not very often but still
15:37:00 <Swami> haleyb: in a single node there should not be such migration failures, but was it introduced or triggered by any other patch that merged.
15:37:18 <obondarev> usually it either passes or fails with 1 failed test
15:37:30 <Swami> haleyb: do you have the logstash filter to filter out the failures for this particular failure.
15:37:42 <haleyb> obondarev: same failure reason? I guess if you see it again we should look closer
15:38:07 <obondarev> haleyb: not sure about failure reason
15:38:37 <haleyb> Swami: no, but there are very few failures (ERROR) in the logs these days once we fix the two I mentioned
15:39:02 <obondarev> one of examples http://logs.openstack.org/55/272555/2/check/gate-tempest-dsvm-neutron-dvr-multinode-full/e6f9faa/console.html
15:39:18 <obondarev> 17 failed tests
15:39:54 <obondarev> it seems like smth went wrong and tests start to fail
15:40:12 <obondarev> didn't have a chance to look closer yet
15:41:25 <Swami> obondarev: seems to state about the SSHtimeout issue.
15:41:33 <haleyb> i know a lot of mtu patches have been merging as well, both in neutron and the gate, which might help as well
15:41:37 <haleyb> https://review.openstack.org/#/q/topic:multinode-neutron-mtu
15:42:14 <obondarev> cool, hope it'll help
15:42:35 <Swami> haleyb: obondarev: btw the patch that I added for debugging the SSHtimeout issue is not quite working right because of timing issue. I suspect by the time I try to ping the metadata is not in place.
15:43:24 <Swami> haleyb: obondarev: is there a test to validate if the metadata is properly received by the VM.
15:44:23 <obondarev> I think there are plenty in tempest
15:45:04 <obondarev> any that boots a vm and checks connectivity
15:46:20 <Swami> obondarev: we normally check it from external connectivity, but that does not tell us if it is a metadata issue or not.
15:46:37 <obondarev> ah
15:46:56 <obondarev> you mean which are checking specifically for metadata
15:47:08 <Swami> obondarev: yes.
15:47:16 <obondarev> can't remember
15:47:36 <haleyb> Swami: i don't think there is a test, but we do have the VM console log
15:47:45 <obondarev> right
15:48:02 <Swami> haleyb: yes, it is only through the vm console log we will be able to identify.
15:48:37 <Swami> haleyb: obondarev: is there a way to figure out from the router namespace that the metadata request was processed for the particular vm.
15:49:15 <haleyb> I don't know if the proxy logs are there, but the metadata server log is i believe
15:49:31 <Swami> haleyb: ok
15:50:36 <haleyb> http://logs.openstack.org/55/272555/2/check/gate-tempest-dsvm-neutron-dvr-multinode-full/e6f9faa/console.html#_2016-01-26_18_25_22_075
15:50:58 <haleyb> i don't know metadata well enough to know if that's a complete failure
15:51:51 <Swami> haleyb: so that seems that metadata might be another victim for the SSHtimeout issue.
15:52:16 <haleyb> Swami: yes, assuming these VMs are getting ssh keys that way
15:52:47 <obondarev> and one example is https://bugs.launchpad.net/neutron/+bug/1522824
15:52:48 <openstack> Launchpad bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] - Assigned to shihanzhang (shihanzhang)
15:52:51 <Swami> haleyb: yes I think in the tempest run that's how they pass the keys.
15:53:07 <obondarev> haleyb: Swami: exactly
15:54:15 <Swami> agreed.
15:54:39 <haleyb> that failure is different from the one i linked, seems metadata worked in my case, but only instance-id was returned
15:55:38 <haleyb> 5 minutes left, let's move onto last topic
15:55:46 <haleyb> #topic Performance/Scalability
15:55:54 <haleyb> obondarev: one patch left? :)
15:55:59 <obondarev> haleyb: right
15:56:02 <obondarev> the main one
15:56:34 <obondarev> it'll require some rebase work once HA for DVR merges
15:56:54 <obondarev> or it'll merge first :)
15:57:07 <haleyb> i borked the HA DVR patch updating the commit message, so it will take an hour or more to clear
15:58:14 <Swami> haleyb: no problem, anyway we have a couple of tests that were failing, let me check with adolfo on this.
15:58:21 <Swami> before we close.
15:58:36 <haleyb> he's getting a free recheck
15:58:57 <haleyb> #topic Open discussion
15:59:03 <Swami> haleyb: we were (obondarev and myself) planning to add in a general session talk on DVR improvements for Mitaka.
15:59:16 <Swami> haleyb: would you like to be part of that discussion.
15:59:35 <haleyb> Swami: sure, i was going to ping you offline about it, but works for me
15:59:56 <haleyb> let me know how i can help with abstract
16:00:16 <Swami> haleyb: ok I will loop you in the discussion. We have an abstract in google doc.
16:00:21 <obondarev> haleyb: https://docs.google.com/document/d/1WCjq0FL1NxSistA7nfceeXaf3gmVzDCvJm0Jmwt80Dw/edit
16:00:31 <Swami> obondarev: thanks for the link
16:00:38 <haleyb> thanks, and we need to close out for the ML2ers
16:00:40 <haleyb> #endmeeting
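
The guard haleyb describes at 15:30:44 (check that subnet_info is not {} before using it, so a failed lookup does not raise a KeyError) can be sketched roughly as below. This is only an illustration of the pattern, not Neutron code: fetch_subnet_info, bind_snat_port, the known_subnets mapping and the gateway_mac key are hypothetical stand-ins, and the real get_subnet_for_dvr() call and its caller may look quite different.

```python
import logging

LOG = logging.getLogger(__name__)


def fetch_subnet_info(subnet_id, known_subnets):
    """Hypothetical stand-in for the subnet lookup.

    Assumption: like the call discussed in the meeting, it returns an
    empty dict when the subnet cannot be retrieved.
    """
    return known_subnets.get(subnet_id, {})


def bind_snat_port(port, known_subnets):
    """Hypothetical caller that needs a field from the subnet info."""
    subnet_id = port["fixed_ips"][0]["subnet_id"]
    subnet_info = fetch_subnet_info(subnet_id, known_subnets)

    # Check right next to the call: an empty result means the lookup
    # failed, so log it and bail out instead of indexing into {}.
    if not subnet_info:
        LOG.warning(
            "DVR: Unable to retrieve subnet information for subnet_id %s",
            subnet_id,
        )
        return None

    # Only reached when the lookup succeeded (key name is illustrative).
    return subnet_info["gateway_mac"]
```

Keeping the emptiness check directly after the lookup follows haleyb's 15:35:23 remark that the check "should probably be up near the call".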