15:01:24 <haleyb> #startmeeting neutron_dvr
15:01:26 <openstack> Meeting started Wed Aug 31 15:01:24 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:29 <openstack> The meeting name has been set to 'neutron_dvr'
15:01:36 <haleyb> #chair Swami
15:01:37 <openstack> Current chairs: Swami haleyb
15:02:00 <haleyb> #topic Announcements
15:02:22 <haleyb> N-3 is closing this week
15:02:43 <haleyb> any new features that haven't merged will need to be granted a FFE
15:02:45 <Swami> almost there
15:03:11 <haleyb> We still will be able to fix bugs, especially if they help the check/gate job stability
15:03:14 <Swami> haleyb: what do you think about the Fast path exit, do we need a FFE or can we move it to the next release and backport it.
15:04:04 <haleyb> Swami: i know we don't usually backport new features, but there doesn't seem to be a config change required
15:04:21 <haleyb> how close do you think it is?
15:04:23 <Swami> it can still be considered as a bug
15:04:32 <Swami> It is pretty good.
15:04:52 <Swami> I have addressed a couple of review comments yesterday as well on the route rules.
15:06:09 <Swami> haleyb: if you get some time today you can review the rule patch for the fast path exit and we can keep it ready, if it works out we can merge.
15:06:37 <haleyb> I will have to look at the reviews again, then we can move forward once we're happy
15:06:56 <Swami> haleyb: ok, makes sense.
15:07:07 <Swami> at this point, we don't need to hurry.
15:07:07 <haleyb> #topic Bugs
15:07:19 <haleyb> right, i'd rather get it right
15:07:42 <Swami> haleyb: we don't have new bugs filed this week. So we pretty much have to address all the existing bugs.
15:08:20 * jschwarz is also here, but in a different meeting
15:08:25 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1612192
15:08:25 <openstack> Launchpad bug 1612192 in neutron "L3 DVR: Unable to complete operation on subnet" [Critical,Confirmed]
15:08:25 <haleyb> No new bugs? must be a record
15:09:19 <Swami> haleyb: Did you get a chance to triage this.
15:09:51 <Swami> I think this is seen in the gate. Are we still seeing this in the gate after the gate stabilization.
15:09:58 <haleyb> Swami: no, been mostly looking at the dvr multinode job failures
15:10:17 <haleyb> i don't know if we're still seeing it, need to ask logstash
15:10:33 <Swami> Both the criticals that we have were seen in the gate.
15:11:09 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1612804
15:11:09 <openstack> Launchpad bug 1612804 in neutron "test_shelve_instance fails with sshtimeout" [Critical,Confirmed]
15:11:53 <Swami> This was also reported as seen in the gate.
15:12:52 <haleyb> regarding the first, there have been 8 occurrences in the past two weeks, all on one day
15:12:55 * jschwarz is back now
15:13:13 <Swami> haleyb: so will it be due to the gate issues or a bad patch.
15:13:16 <Swami> jschwarz: hi
15:13:30 <haleyb> Swami: actually 8 in past 30 days
15:13:57 <haleyb> all happened in 24 hours on 8/23-24
15:14:08 <Swami> haleyb: so let us monitor it for another week and see what happens.
15:14:11 <haleyb> i will look and close if necessary
15:15:22 <haleyb> the second happens more often, but i don't know if it's neutron, have not dug
15:15:43 <Swami> haleyb: ok thanks.
15:15:44 <haleyb> i'll guess and say it's an issue saving to storage
15:16:17 <Swami> haleyb: you may be right.
15:16:40 <Swami> haleyb: we have seen the shelve_instance fail earlier as well.
15:17:28 <haleyb> Swami: hmm, some show a failure to get DHCP
15:17:51 <Swami> haleyb: so is there two different symptoms for the same failure.
15:17:53 <haleyb> and that's in the non-dvr test
15:18:47 <haleyb> Swami: any time we can't ssh to an instance it's neutron's fault :(
15:19:01 <Swami> haleyb: yeah! you got it.
15:19:22 <haleyb> s/neutron/dvr
15:20:02 <Swami> The next set of bugs are related to DVR+HA+L3
15:20:14 <Swami> we have discussed about these bugs earlier.
15:20:23 <jschwarz> haleyb, we're seeing a problem where in certain cases, with HA-only routers, instances lose connectivity
15:20:29 <jschwarz> haleyb, that's still DVR's fault ;-)
15:20:46 <haleyb> jschwarz: let me know when patches are up :)
15:20:53 <jschwarz> haleyb, :D
15:21:26 <haleyb> jschwarz: are there any HA patches that need attention?
15:21:27 <Swami> jschwarz: mostly the L3+HA+DVR combination is generating lot more bugs.
15:21:59 <Swami> jschwarz: any update on the L3+HA+DVR bugs that you are working on.
15:22:05 <jschwarz> yes, sorry
15:22:12 <jschwarz> I came unprepared :)
15:22:17 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1597461
15:22:17 <openstack> Launchpad bug 1597461 in neutron "L3 HA: 2 masters after reboot of controller" [High,Fix released] - Assigned to John Schwarz (jschwarz)
15:22:21 <Swami> jschwarz: no problem
15:22:39 <jschwarz> Swami, the patch for that merged a few days ago
15:22:54 <Swami> jschwarz: so can we close this bug, is it all fixed.
15:23:13 <jschwarz> Swami, yes, with a side note that a couple of other repos are complaining it broke some of their tests
15:23:30 <Swami> jschwarz: that's not good.
15:23:53 <jschwarz> Swami, one was already taken care of, the other one I'm not sure if it has (it was from networking-odl)
15:24:03 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1602320
15:24:03 <openstack> Launchpad bug 1602320 in neutron "ha + distributed router: keepalived process kill vrrp child process" [Undecided,In progress] - Assigned to Dongcan Ye (hellochosen)
15:24:40 <jschwarz> Swami, https://review.openstack.org/#/c/357458/9/neutron/services/l3_router/l3_router_plugin.py@77
15:25:38 <Swami> jschwarz: so are you planning to rollback or fix it with a different patch
15:25:57 <jschwarz> Swami, I see a bug fix here: https://review.openstack.org/#/c/363175/
15:26:10 <jschwarz> Swami, I think it's best to keep it separated and not rollback
15:26:21 <jschwarz> Swami, I will work with Isaku to provide a good fix for this ASAP
15:26:30 <Swami> jschwarz: ok thanks.
15:27:00 <Swami> jschwarz: do you have any input on the keepalived bug that I posted above.
15:27:19 <jschwarz> Swami, nope :<
15:27:40 <jschwarz> Swami, it looks like this patch has seen no action for 2 weeks now
15:27:51 <jschwarz> I'll ping Dongcan Ye to see if this can be moved forward
15:28:04 <Swami> jschwarz: thanks
15:28:36 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1602614
15:28:36 <openstack> Launchpad bug 1602614 in neutron "DVR + L3 HA loss during failover is higher that it is expected" [High,In progress] - Assigned to venkata anil (anil-venkata)
15:29:09 <jschwarz> Swami, the fix is https://review.openstack.org/#/c/255237/ - the long l2pop patch by anilvenkata
15:29:33 <jschwarz> it's being reviewed by Carl, Assaf and Kevin and since it's a complicated fix it's taking a while
15:29:34 <Swami> jschwarz: I think that is still under review.
15:29:43 <anilvenkata> yes
15:29:44 <jschwarz> hopefully this will get in N's RC
15:29:44 <Swami> jschwarz: ok makes sense.
15:29:56 <anilvenkata> getting different suggestions :)
15:30:03 <Swami> jschwarz: ok
15:30:19 <Swami> anilvenkata: thanks
15:30:32 <anilvenkata> thanks Swami jschwarz haleyb
15:31:27 <jschwarz> Swami, we also have https://bugs.launchpad.net/neutron/+bug/1607381
15:31:27 <openstack> Launchpad bug 1607381 in neutron "HA router in l3 dvr_snat/legacy agent has no ha_port" [Undecided,In progress] - Assigned to LIU Yulong (dragon889)
15:31:27 <carl_baldwin> I did not get to that review yesterday. I hope to get to it today.
15:31:37 <carl_baldwin> anilvenkata: jschwarz: ^
15:31:43 <jschwarz> carl_baldwin, ack, thanks :)
15:31:55 <anilvenkata> carl_baldwin, thanks Carl
15:31:57 <jschwarz> the bug I linked to is being dealt with in https://review.openstack.org/#/c/265672/
15:32:04 <Swami> anilvenkata: can we close this bug or is it still valid #link https://bugs.launchpad.net/neutron/+bug/1595043
15:32:04 <openstack> Launchpad bug 1595043 in neutron "Make DVR portbinding implementation useful for HA ports" [Medium,In progress] - Assigned to venkata anil (anil-venkata)
15:32:22 <anilvenkata> Swami, we will keep it
15:32:31 <anilvenkata> Swami, we have other issues with HA
15:32:37 <Swami> anilvenkata: ok.
15:32:44 <anilvenkata> Swami, thanks Swami
15:33:19 <jschwarz> going back, https://review.openstack.org/#/c/265672/ just needs reviews IMO - it looks to be ready
15:33:57 <Swami> jschwarz: thanks
15:34:07 <Swami> The next one is
15:34:12 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1593354
15:34:12 <openstack> Launchpad bug 1593354 in neutron "SNAT HA failed because of missing nat rule in snat namespace iptable" [Undecided,New]
15:34:42 <jschwarz> I have not seen this one before (and it wasn't even triaged)
15:34:54 <Swami> jschwarz: yes
15:35:12 <Swami> haleyb: did you get a chance to triage this bug.
15:35:43 <jschwarz> anilvenkata, ^ is this something that your patch deals with?
15:35:54 <haleyb> Swami: no
15:36:10 <anilvenkata> jschwarz, no for bug 1593354
15:36:10 <openstack> bug 1593354 in neutron "SNAT HA failed because of missing nat rule in snat namespace iptable" [Undecided,New] https://launchpad.net/bugs/1593354
15:36:38 <Swami> haleyb: thanks,
15:36:45 <jschwarz> Swami, I'll have a pass through it tomorrow and triage this, unless haleyb beats me to it
15:38:06 <Swami> jschwarz: thanks
15:38:42 <Swami> That's all I had for the bugs this week.
15:39:11 <haleyb> any other bugs to talk about from anyone?
15:39:53 * jschwarz claps his hands to show they are empty
15:40:28 <haleyb> you need to clap again and make all the bugs go away :)
15:40:38 * jschwarz claps again
15:40:48 <Swami> jschwarz: thanks
15:40:50 * jschwarz magically fails to clap as his hands miss each other
15:41:08 <haleyb> #topic Gate failures
15:41:47 <haleyb> the dvr and dvr-multinode (and regular multinode) jobs still have higher failure rates
15:42:31 <haleyb> i am continuing to triage this when i have time, filing bugs as i go
15:42:31 <carl_baldwin> Any insight in to the high multi-node rate?
15:43:36 <haleyb> carl_baldwin: DHCP is failing, but have not determined the timeline to see if it's the agent not starting
15:44:29 <carl_baldwin> haleyb: Are you all alone in looking in to this?
15:44:33 <haleyb> kevinbenton and armax have been cleaning-up the dhcp issues i've been finding, but none actually fix the bug
15:44:51 <armax> haleyb: dvr is not happy still
15:45:32 <haleyb> armax: yes, i continue to look
15:46:55 <haleyb> and i do see the neutron-dvr job is about 12% failure in the gate today
15:47:10 <Swami> haleyb: is this for the single node
15:47:27 <haleyb> yes, http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=5&fullscreen
15:47:46 <haleyb> gate-tempest-dsvm-neutron-dvr gate job
15:48:12 <Swami> haleyb: thanks
15:48:41 <haleyb> but the failure rate is falling, don't know if it was related to the dhcp patch that merged last night
15:49:08 <Swami> haleyb: will this also fix the multinode issue with respect to dhcp.
15:49:40 <haleyb> Swami: don't think so, dhcp could fail for many reasons as we know...
15:50:30 <haleyb> carl_baldwin: i will look into the gate dvr failure, and yes i'm all alone doing it at the moment, but will reach out to people
15:50:31 <Swami> haleyb: yes.
15:51:02 <carl_baldwin> haleyb: let me know, I might be able to help some.
15:51:37 <haleyb> i need to find a patch failing first
15:52:12 <Swami> haleyb: is there a particular test that fails with respect to dhcp or is it random.
15:52:38 <Swami> Is it the instance that is not able to get the dhcp and so the ssh fails.
15:52:54 <haleyb> Swami: TestNetworkBasicOps is usually the one
15:53:44 <Swami> haleyb: thanks
15:53:51 <haleyb> right, console shows dhcp failing and no IP on eth0. i started looking on the server and dhcp agent and found some issues, but it's a work in progress
15:55:10 <Swami> haleyb: thanks
15:55:18 <haleyb> #topic Open discussion
15:55:37 <haleyb> Swami: only note on wiki is live migration patch
15:56:01 <Swami> haleyb: what is that. I don't get it.
15:56:36 <haleyb> https://review.openstack.org/#/c/275073/
15:56:49 <haleyb> nova patch for live migration w/neutron
15:57:15 <Swami> haleyb: yes got it. We are still waiting for a +2 on it.
15:57:34 <haleyb> even the experimental job was happy
15:57:49 <Swami> I think john ran the experimental job, so let us wait and see.
15:58:18 <Swami> #link https://review.openstack.org/#/c/353788/
15:58:31 <Swami> haleyb: I need your blessing on this patch.
15:58:57 <haleyb> Swami: yes, saw that, will look
15:59:21 <Swami> haleyb: thanks
15:59:36 <haleyb> we're at end of hour, thanks for all the hard work everyone!
15:59:38 <Swami> There is one backport patch as well for /stable/liberty
15:59:45 <Swami> we will take it offline.
15:59:47 <haleyb> #endmeeting