15:04:07 #startmeeting neutron_dvr
15:04:08 Meeting started Wed Sep 21 15:04:07 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:04:09 cool
15:04:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:04:13 The meeting name has been set to 'neutron_dvr'
15:04:26 #chair Swami
15:04:26 Current chairs: Swami haleyb
15:04:33 #topic Announcements
15:05:07 The sky is blue, the grass is green... and RC1 was cut
15:05:26 haleyb: no rain so far.
15:06:09 not much here either
15:06:35 haleyb: :-)
15:07:11 Any critical fixes that need to get into newton should be raised to our leader, i am but a minion
15:07:39 haleyb: sure
15:08:23 So master is open, and things will start merging, but we're still trying to focus on high priority bug fixes
15:08:47 with that...
15:08:47 haleyb: good
15:08:50 #topic Bugs
15:08:58 haleyb: thanks
15:09:07 this week we had one new bug reported.
15:09:26 #link https://bugs.launchpad.net/neutron/+bug/1625333
15:09:27 Launchpad bug 1625333 in neutron "Booting VM with a Floating IP and pinging it via that takes a long time with errors in L3-Agent logs when using DVR" [Undecided,New]
15:09:43 haleyb: I think you have already replied to this bug.
15:10:01 Is this related to the non_local_binding setting.
15:10:42 We see the 'GARP' fails from the FIP namespace.
15:10:58 Swami: yes, it seems to be, but that should have been set, so a reproducer is needed
15:11:02 So that's the reason for the delay in response.
15:11:48 haleyb: Ya there are redundant settings for the non_local_bind to be set. I don't see any error message for it to set.
15:12:04 So something else might be causing this issue.
15:13:03 We'd need to know kernel version, or get on a system exhibiting the problem, or see the full logs for the l3-agent as i don't see them there
15:13:36 haleyb: yes you are right. I don't think we are seeing such error logs in the gate.
15:14:15 i'll add another comment as i don't think it's reasonable to ask that we load, configure and run this new rally test
15:14:49 haleyb: makes sense
15:15:50 That's all for the new bugs, so let us go over the old bugs.
15:16:04 #link https://bugs.launchpad.net/neutron/+bug/1612192
15:16:07 Launchpad bug 1612192 in neutron "L3 DVR: Unable to complete operation on subnet" [High,Confirmed]
15:17:06 Do we still see this message in the gate check jobs?
15:18:46 You mentioned last week that we had not seen this error for 6 days; if so, should we still keep the severity as High?
15:19:19 sorry, got a call
15:19:43 haleyb: no problem, go ahead
15:20:39 The next one in the list is
15:20:53 #link https://bugs.launchpad.net/neutron/+bug/1612804
15:20:54 Launchpad bug 1612804 in neutron "test_shelve_instance fails with sshtimeout" [High,Confirmed]
15:21:22 This bug is also on the watch list.
15:21:58 i'm back
15:22:06 haleyb: great
15:22:46 i don't think we've seen that in weeks now, i'll run a logstash and update
15:22:52 haleyb: both the bugs mentioned above can stay on the watch list, and let us see if this has been triggered by something other than dvr
15:23:43 of course i jinxed myself and see them now
15:24:17 but i need to do some work on it, they might not be in the gate
15:24:24 haleyb: do you mean these errors are still seen upstream?
15:25:25 the logstash link from the bug shows errors
15:25:49 haleyb: ok
15:25:57 but i need to filter by build_queue:gate
15:26:44 The most vulnerable tests for dvr are the 'shelve_instance', 'volume_boot_pattern' and dhcpv6_statefull and dhcpv6_stateless.
15:27:16 there have been 7 in the gate the past 7 days
15:27:49 haleyb: ok, and is it only seen for the dvr and not for the non-dvr case?
15:28:57 yes, i think just dvr
15:29:08 haleyb: ok
15:29:18 actually one is ovn
15:29:30 haleyb: thanks
15:29:36 The next in the list is
15:29:39 #link https://bugs.launchpad.net/neutron/+bug/1606741
15:29:40 Launchpad bug 1606741 in neutron "Metadata service for instances is unavailable when the l3-agent on the compute host is dvr_snat mode" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:29:58 why do i own that :)
15:30:05 haleyb: we discussed this bug. Are we still considering this a bug that needs a fix?
15:30:29 Not sure, someone assigned your name to the bug recently. Maybe the recent bug deputy did it.
15:30:46 i think i updated the review, jenkins did it
15:31:09 It might be a minor fix where it checks for the agent type and takes necessary action.
15:31:37 We might not be configuring it for the dvr_snat agent.
15:31:58 the patch needs an update at the least based on the comments
15:32:57 haleyb: yes let us wait for the update.
15:34:35 #link https://bugs.launchpad.net/neutron/+bug/1571676
15:34:36 Launchpad bug 1571676 in neutron "After binding a floating IP to VM, the static route can't work in DVR." [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:34:58 Patch needs review #link https://review.openstack.org/#/c/308068/
15:36:16 i need to re-review, but think it's close
15:36:34 haleyb: yes it should be close.
15:36:40 #link https://bugs.launchpad.net/neutron/+bug/1506567
15:36:41 Launchpad bug 1506567 in neutron "No information from Neutron Metering agent" [Undecided,Confirmed]
15:37:21 I need to work on it. I started to triage it but got distracted.
15:37:56 That's all I had for bugs this week.
15:38:11 i think it has similarities to the ipv6 PD issue/fix ritesh had done, but we need a generic way for an agent to ask what namespace to work on
15:38:27 thanks for all the updates Swami
15:38:39 haleyb: yes I agree.
15:39:22 jschwarz: any HA bugs to add? you fixed them all, right? :)
15:40:22 haleyb: I think there is one ha functional test issue still
15:40:24 #link https://bugs.launchpad.net/neutron/+bug/1580648
15:40:25 Launchpad bug 1580648 in neutron "Two HA routers in master state during functional test" [Undecided,Opinion]
15:40:31 haleyb, I hope so :)
15:40:33 haleyb, I was on PTO for the last week, but I haven't received any mails about new bugs, etc.
15:40:49 jschwarz:^^
15:41:08 Swami, I'll add this one to my todo list and will have a good response by next week hopefully
15:41:33 alas, I haven't got anything to add yet
15:41:56 jschwarz: should i assign it to you?
15:42:28 haleyb, let's assign it to me with the "John should decide if this is a bug or not" and I'll work on it
15:42:55 haleyb, opinions seem to be conflicted on that one so that seems like the best in-the-middle approach
15:44:19 jschwarz: it's all yours now :)
15:44:24 haleyb, yay! :)
15:45:26 #topic Gate failures
15:46:38 the failures had been up and down, but now it seems the tempest dvr gate job is at 10% failure
15:46:49 http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=5&fullscreen
15:47:47 haleyb: that is not good.
15:47:53 this started yesterday, but actually now i see it's falling this morning, could be directly related to the regular job
15:49:13 i will watch it today and look for failures, but sometimes we are just the victim
15:49:23 haleyb: as always
15:50:03 #topic Open Discussion
15:50:21 So I talked a few weeks back about adding a new DVR+HA gate
15:50:39 Unfortunately I just lack the capacity to deal with this altogether and I've left it untouched since
15:50:48 jschwarz: yes
15:51:07 I'd be happy if some generous contributor could help me with this one
15:51:32 jschwarz: is it all a yaml change in the infra jobs?
15:51:37 jschwarz: I think we should first get the infra permission to run three nodes for the multinode job.
15:52:03 jschwarz: where two nodes can run dvr_snat ha, and the remaining one can run the dvr agent type.
15:52:16 haleyb, I have no idea of how to work infra stuff
15:52:25 Swami, agreed
15:52:33 i would think we could add an experimental job that is 3 nodes
15:52:48 this seems like mostly a biocratic task that I don't have the capacity to do atm
15:53:11 jschwarz: hmm, i have done very little there myself, but should be similar to the DVR job
15:53:26 * haleyb googles biocratic :)
15:53:39 * jschwarz 's spelling is bad ;-)
15:53:47 haleyb: need to add config details for the ha.
15:54:17 haleyb, so if you can find the exact code that hosts the DVR jobs, I can probably do some copy-pasting and propose on that, and get the discussion going on that patch instead
15:54:26 Swami: right, and possibly packages?
15:54:44 jschwarz: I can point you to the patch in infra that I added for making the multinode job.
15:54:46 jschwarz: i'll look, but some of that infra code is kryptonite
15:54:56 Swami, awesome
15:55:01 * haleyb takes a step back and lets Swami help
15:55:15 ok, so let's co-operate on this one and see how it goes
15:55:24 I'll do my best to put some cycles into this
15:55:30 jschwarz: I will send you the link to the patch and we can follow up on this.
15:55:48 Swami, much appreciated
15:56:15 In other news, I have plans to submit some patches that refactor the l3 scheduler in the coming weeks
15:56:39 me and kevinbenton worked on a DB change that got in for N, so refactors can also be backported to N now, which is good for us
15:56:39 jschwarz: will look for it.
15:56:49 +1
15:56:51 I'll make sure to add you guys as reviewers
15:57:12 seems like haleyb likes to ask for patches to review ;-)
15:57:53 that's the only way i can fall asleep at night now, like reading a book
15:58:18 haleyb: great medicine to sleep. :-)
15:58:40 i had one other bug that i must have forgotten to add
15:58:42 https://bugs.launchpad.net/neutron/+bug/1624515
15:58:43 Launchpad bug 1624515 in neutron "DVR: SNAT port not found in the list error in check jobs" [Medium,In progress] - Assigned to Brian Haley (brian-haley)
15:59:02 haleyb: yes
15:59:11 Swami: oleg said he's fine assuming i add a test, so i'll fix that up
15:59:20 haleyb: ok thanks,
15:59:56 we need to write a DVR book to hand out at the summit one day...
16:00:05 haleyb, sounds like a plan
16:00:11 thanks everyone, top of hour
16:00:14 * jschwarz is working on a DVR lecture for some folks atm
16:00:19 #endmeeting
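Note on the logstash check mentioned at 15:25:57 and 15:27:16: the idea of narrowing hits to build_queue:gate and counting them over the past 7 days could be sketched roughly as below. This is only an illustration, not the query that was actually run. The record layout (a dict with "build_queue" and "timestamp" keys), the function name count_gate_hits and the sample data are all assumptions made for the example; only the build_queue field name and the 7-day window come from the log.

```python
from datetime import datetime, timedelta, timezone


def count_gate_hits(hits, now, days=7):
    """Count hits from the gate queue seen within the last `days` days.

    `hits` is a hypothetical list of dicts with "build_queue" and
    "timestamp" keys; real logstash results may be shaped differently.
    """
    cutoff = now - timedelta(days=days)
    return sum(
        1
        for hit in hits
        if hit.get("build_queue") == "gate" and cutoff <= hit["timestamp"] <= now
    )


# Made-up sample records, for illustration only.
now = datetime(2016, 9, 21, 15, 27, tzinfo=timezone.utc)
sample_hits = [
    {"build_queue": "gate", "timestamp": now - timedelta(days=1)},
    {"build_queue": "check", "timestamp": now - timedelta(days=2)},
    {"build_queue": "gate", "timestamp": now - timedelta(days=10)},
]
gate_count = count_gate_hits(sample_hits, now)
```

With the sample above, only the first record is counted: the second is from the check queue and the third falls outside the 7-day window.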