15:04:07 <haleyb> #startmeeting neutron_dvr
15:04:08 <openstack> Meeting started Wed Sep 21 15:04:07 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:04:09 <Swami> cool
15:04:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:04:13 <openstack> The meeting name has been set to 'neutron_dvr'
15:04:26 <haleyb> #chair Swami
15:04:26 <openstack> Current chairs: Swami haleyb
15:04:33 <haleyb> #topic Announcements
15:05:07 <haleyb> The sky is blue, the grass is green... and RC1 was cut
15:05:26 <Swami> haleyb: no rain so far.
15:06:09 <haleyb> not much here either
15:06:35 <Swami> haleyb: :-)
15:07:11 <haleyb> Any critical fixes that need to get into newton should be raised to our leader, i am but a minion
15:07:39 <Swami> haleyb: sure
15:08:23 <haleyb> So master is open, and things will start merging, but we're still trying to focus on high priority bug fixes
15:08:47 <haleyb> with that...
15:08:47 <Swami> haleyb: good
15:08:50 <haleyb> #topic Bugs
15:08:58 <Swami> haleyb: thanks
15:09:07 <Swami> this week we had one new bug reported.
15:09:26 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1625333
15:09:27 <openstack> Launchpad bug 1625333 in neutron "Booting VM with a Floating IP and pinging it via that takes a long time with errors in L3-Agent logs when using DVR" [Undecided,New]
15:09:43 <Swami> haleyb: I think you have already replied to this bug.
15:10:01 <Swami> Is this related to the non_local_binding setting.
15:10:42 <Swami> We see the 'GARP' fails from the FIP namespace.
15:10:58 <haleyb> Swami: yes, it seems to be, but that should have been set, so a reproducer is needed
15:11:02 <Swami> So that's the reason for the delay in response.
15:11:48 <Swami> haleyb: Ya there are redundant settings for the non_local_bind to be set. I don't see any error message for it to set.
15:12:04 <Swami> So something else might be causing this issue.
15:13:03 <haleyb> We'd need to know kernel version, or get on a system exhibiting the problem, or see the full logs for the l3-agent as i don't see them there
15:13:36 <Swami> haleyb: yes you are right. I don't think we are seeing such error logs in the gate.
15:14:15 <haleyb> i'll add another comment as i don't think it's reasonable to ask that we load, configure and run this new rally test
15:14:49 <Swami> haleyb: makes sense
15:15:50 <Swami> That's all for the new bugs, and let us go over the old bugs.
15:16:04 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1612192
15:16:07 <openstack> Launchpad bug 1612192 in neutron "L3 DVR: Unable to complete operation on subnet" [High,Confirmed]
15:17:06 <Swami> Do we still see this message in the gate check jobs?
15:18:46 <Swami> You have mentioned last week, that we don't see this error for the last 6 days, if so should we still have the severity as High.
15:19:19 <haleyb> sorry, gott call
15:19:43 <Swami> haleyb: no problem, go ahead
15:20:39 <Swami> The next one in the list is
15:20:53 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1612804
15:20:54 <openstack> Launchpad bug 1612804 in neutron "test_shelve_instance fails with sshtimeout" [High,Confirmed]
15:21:22 <Swami> This bug is also under the watch list.
15:21:58 <haleyb> i'm back
15:22:06 <Swami> haleyb: great
15:22:46 <haleyb> i don't think we've seen that in weeks now, i'll run a logstash and update
15:22:52 <Swami> haleyb: both the bugs mentioned above can be under the watch list and let us see if this has been triggered by something else other than dvr
15:23:43 <haleyb> of course i jinxed myself and see them now
15:24:17 <haleyb> but i need to do some work on it, they might not be in the gate
15:24:24 <Swami> haleyb: do you mean these errors are still seen upstream
15:25:25 <haleyb> the logstash link from the bug shows errors
15:25:49 <Swami> haleyb: ok
15:25:57 <haleyb> but i need to filter by build_queue:gate
15:26:44 <Swami> The most vulnerable tests for dvr are the 'shelve_instance', 'volume_boot_pattern' and dhcpv6_statefull and dhcpv6_stateless.
15:27:16 <haleyb> there have been 7 in the gate the past 7 days
15:27:49 <Swami> haleyb: ok, and is it only seen for the dvr and not for the non-dvr case.
15:28:57 <haleyb> yes, i think just dvr
15:29:08 <Swami> haleyb: ok
15:29:18 <haleyb> actually one is ovn
15:29:30 <Swami> haleyb: thanks
15:29:36 <Swami> The next in the list is
15:29:39 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1606741
15:29:40 <openstack> Launchpad bug 1606741 in neutron "Metadata service for instances is unavailable when the l3-agent on the compute host is dvr_snat mode" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:29:58 <haleyb> why do i own that :)
15:30:05 <Swami> haleyb: we discussed about this bug. Are we still considering this as a bug and need to fix.
15:30:29 <Swami> Not sure, someone assigned your name to the bug recently. May be the recent bug deputy would have done it.
15:30:46 <haleyb> i think i updated the review, jenkins did it
15:31:09 <Swami> It might be a minor fix where it checks for the agent type and takes necessary action.
15:31:37 <Swami> We might not be configuring it for the dvr_snat agent.
15:31:58 <haleyb> the patch needs an update at the least based on the comments
15:32:57 <Swami> haleyb: yes let us wait for the update.
15:34:35 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1571676
15:34:36 <openstack> Launchpad bug 1571676 in neutron "After binding a floating IP to VM, the static route can't work in DVR." [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:34:58 <Swami> Patch needs review #link https://review.openstack.org/#/c/308068/
15:36:16 <haleyb> i need to re-review, but think it's close
15:36:34 <Swami> haleyb: yes it should be close.
15:36:40 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1506567
15:36:41 <openstack> Launchpad bug 1506567 in neutron "No information from Neutron Metering agent" [Undecided,Confirmed]
15:37:21 <Swami> I need to work on it. I started to triage it but got distracted.
15:37:56 <Swami> That's all I had for bugs this week.
15:38:11 <haleyb> i think it has similarities to the ipv6 PD issue/fix ritesh had done, but we need a generic way for an agent to ask what namespace to work on
15:38:27 <haleyb> thanks for all the updates Swami
15:38:39 <Swami> haleyb: yes I agree.
15:39:22 <haleyb> jschwarz: any HA bugs to add? you fixed them all, right? :)
15:40:22 <Swami> haleyb: I think there is one ha functional test issue still
15:40:24 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1580648
15:40:25 <openstack> Launchpad bug 1580648 in neutron "Two HA routers in master state during functional test" [Undecided,Opinion]
15:40:31 <jschwarz> haleyb, I hope so :)
15:40:33 <jschwarz> haleyb, I was on PTO for the last week, but I haven't received any mails about new bugs, etc
15:40:49 <Swami> jschwarz:^^
15:41:08 <jschwarz> Swami, I'll add this one to my todo list and will have a good response by next week hopefully
15:41:33 <jschwarz> alas, I haven't got anything to add yet
15:41:56 <haleyb> jschwarz: should i assign it to you?
15:42:28 <jschwarz> haleyb, lets assign it to me with the "John should decide if this is a bug or not" and I'll work on it
15:42:55 <jschwarz> haleyb, opinions seem to be conflicted on that one so that seems like the best in-the-middle approach
15:44:19 <haleyb> jschwarz: it's all yours now :)
15:44:24 <jschwarz> haleyb, yay! :)
15:45:26 <haleyb> #topic Gate failures
15:46:38 <haleyb> the failures had been up and down, but now it seems the tempest dvr gate job is at 10% failure
15:46:49 <haleyb> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=5&fullscreen
15:47:47 <Swami> haleyb: that is not good.
15:47:53 <haleyb> this started yesterday, but actually now i see it's falling this morning, could be directly related to the regular job
15:49:13 <haleyb> i will watch it today and look for failures, but sometimes we are just the victim
15:49:23 <Swami> haleyb: as always
15:50:03 <haleyb> #topic Open Discussion
15:50:21 <jschwarz> So I talked a few weeks back about adding a new DVR+HA gate
15:50:39 <jschwarz> Unfortunately I just lack the capacity to deal with this altogether and I've left it untouched since
15:50:48 <Swami> jschwarz: yes
15:51:07 <jschwarz> I'd be happy if some generous contributor help me with this one
15:51:32 <haleyb> jschwarz: is it all a yaml change in the infra jobs?
15:51:37 <Swami> jschwarz: I think we should first get the infra permission to run three node for multinode job.
15:52:03 <Swami> jschwarz: where two node can run dvr_snat ha, and the remaining can run the dvr agent type.
15:52:16 <jschwarz> haleyb, I have no idea of how to work infra stuff
15:52:25 <jschwarz> Swami, agreed
15:52:33 <haleyb> i would think we could add an experimental job that is 3 nodes
15:52:48 <jschwarz> this seems like mostly a biocratic task that I don't have the capacity to do atm
15:53:11 <haleyb> jschwarz: hmm, i have done very little there myself, but should be similar to the DVR job
15:53:26 * haleyb googles biocratic :)
15:53:39 * jschwarz 's spelling is bad ;-)
15:53:47 <Swami> haleyb: need to add config details for the ha.
15:54:17 <jschwarz> haleyb, so if you can find the exact code that hosts the DVR jobs, I can probably do some copy-pasting and propose on that, and get the discussion going on that patch instead
15:54:26 <haleyb> Swami: right, and possibly packages?
15:54:44 <Swami> jschwarz: I can point you to the patch in infra that I added for making the multinode job.
15:54:46 <haleyb> jschwarz: i'll look, but some of that infra code is kryptonite
15:54:56 <jschwarz> Swami, awesome
15:55:01 * haleyb takes a step back and lets Swami help
15:55:15 <jschwarz> ok, so lets co-operate on this one and see how it goes
15:55:24 <jschwarz> I'll do my best to put some cycles into this
15:55:30 <Swami> jschwarz: I will send you the link to the patch and we can follow up on this.
15:55:48 <jschwarz> Swami, much appreciated
15:56:15 <jschwarz> In other news, I have plans to submit some patches that refactors the l3 scheduler in the coming weeks
15:56:39 <jschwarz> me and kevinbenton worked on a DB change that got in for N, so refactors can also be backported to N now, which is good for us
15:56:39 <Swami> jschwarz: will look for it.
15:56:49 <haleyb> +1
15:56:51 <jschwarz> I'll make sure to add you guys as reviewers
15:57:12 <jschwarz> seems like haleyb likes to ask for patches to review ;-)
15:57:53 <haleyb> that's the only way i can fall asleep at night now, like reading a book
15:58:18 <Swami> haleyb: great medicine to sleep. :-)
15:58:40 <haleyb> i had one other bug that i must have forgot to add
15:58:42 <haleyb> https://bugs.launchpad.net/neutron/+bug/1624515
15:58:43 <openstack> Launchpad bug 1624515 in neutron "DVR: SNAT port not found in the list error in check jobs" [Medium,In progress] - Assigned to Brian Haley (brian-haley)
15:59:02 <Swami> haleyb: yes
15:59:11 <haleyb> Swami: oleg said he's fine assuming i add a test, so i'll fix that up
15:59:20 <Swami> haleyb: ok thanks,
15:59:56 <haleyb> we need to write a DVR book to hand out at the summit one day...
16:00:05 <jschwarz> haleyb, sounds like a plan
16:00:11 <haleyb> thanks everyone, top of hour
16:00:14 * jschwarz is working on a DVR lecture for some folks atm
16:00:19 <haleyb> #endmeeting