15:00:34 <haleyb> #startmeeting neutron_dvr
15:00:34 <openstack> Meeting started Wed Aug 10 15:00:34 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:38 <openstack> The meeting name has been set to 'neutron_dvr'
15:00:52 <haleyb> #chair Swami Swami_
15:00:52 <openstack> Warning: Nick not in channel: Swami
15:00:54 <openstack> Current chairs: Swami Swami_ haleyb
15:01:27 <haleyb> #topic Announcements
15:02:03 <haleyb> midcycle is next week, https://etherpad.openstack.org/p/newton-neutron-midcycle
15:02:49 <Swami_> we are already there at the mid-cycle for newton.
15:02:54 <Swami_> time runs fast
15:02:57 <haleyb> doesn't look like any of the normal participants here will be there
15:03:13 <haleyb> it's actually late being in N-3
15:03:26 <Swami_> haleyb: what do you think should be our priority for the mid-cylce.
15:03:32 <Swami_> Cleaning up the bug log.
15:04:08 <Swami_> Probably we should clean up all the 'ha' related bugs.
15:04:35 <haleyb> Swami_: yes, we need to get some of the bugs closed. Tracking down the multinode failures as well, i saw a trace today on one
15:04:57 <Swami_> haleyb: anything interesting on the multi-node failures.
15:06:10 <haleyb> Swami_: well, just a failure that i hadn't seen before, can talk about it in bugs or open discussion
15:06:16 <haleyb> #topic Bugs
15:06:32 <Swami_> haleyb: ok thanks
15:06:50 <Swami_> This week we had this gate failure bug
15:07:11 <Swami_> #link https://bugs.launchpad.net/neutron/+bug/1609540
15:07:11 <openstack> Launchpad bug 1609540 in neutron "Deleting csnat port fails due to no fixed ips" [Critical,In progress] - Assigned to Carl Baldwin (carl-baldwin)
15:07:40 <Swami_> A patch has been proposed as a work around and I think still we have not fixed the root issue why the fixed_ips are none.
15:07:53 <Swami_> #link https://review.openstack.org/350783
15:08:13 <haleyb> that patch merged
15:08:23 <Swami_> haleyb: Yes it merged.
15:09:04 <Swami_> The next one high in the list is #link https://bugs.launchpad.net/neutron/+bug/1597461
15:09:04 <openstack> Swami_: Error: Could not gather data from Launchpad for bug #1597461 (https://launchpad.net/bugs/1597461). The error has been logged
15:09:38 <Swami_> #link https://bugs.launchpad.net/neutron/+bug/1597461
15:09:45 <Swami_> reposting the link
15:09:48 <haleyb> yes, that is easy to reproduce
15:09:59 <Swami_> haleyb: did you find out the root cause.
15:10:37 <haleyb> jschwarz: ^^ can i drag you in here to talk about this, don't know if you had time yet
15:11:02 <haleyb> Swami_: i do not have a root cause yet
15:11:16 <Swami_> haleyb: ok, no problem.
15:12:32 <Swami_> The next one in the list is
15:12:37 <Swami_> #link https://bugs.launchpad.net/neutron/+bug/1606741
15:12:37 <openstack> Launchpad bug 1606741 in neutron "Metadata service for instances is unavailable when the l3-agent on the compute host is dvr_snat mode" [High,New] - Assigned to Zhixin Li (lizhixin)
15:13:07 <Swami_> This bug has a patch and I did see that you have reviewed this patch already.
15:13:12 <Swami_> Here is the patch link
15:13:26 <Swami_> #link https://review.openstack.org/352686
15:13:59 <haleyb> yes, that seems fixable, i had posted comments yesterday
15:14:37 <Swami_> I did see that the changes made in this patch is related to /l3/ha, so does this problem persist only when you have dvr_snat and ha enabled or irrespective of ha, it happens.
15:15:21 <haleyb> i think you need ha to hit that code
15:15:45 <Swami_> haleyb: Ok, then probably the bug description should be changed.
15:16:13 <Swami_> haleyb: Yes that patch seemed to be a simple fix.
15:16:30 <Swami_> haleyb: hopefully we should see a revision quick.
15:17:02 <Swami_> The next one is #link https://bugs.launchpad.net/neutron/+bug/1595043
15:17:02 <openstack> Launchpad bug 1595043 in neutron "Make DVR portbinding implementation useful for HA ports" [Medium,In progress] - Assigned to venkata anil (anil-venkata)
15:17:02 <haleyb> hope so
15:17:18 <Swami_> I think anilvenkata had a new patch.
15:17:22 <anilvenkata> Swami_, yes
15:17:30 <Swami_> #link https://review.openstack.org/324302
15:17:52 <anilvenkata> Swami_, I have abandon this patch
15:18:04 <Swami_> anilvenkata: thanks for considering the backport options and abandoning the old ones.
15:18:32 <anilvenkata> Swami_, need reviewers for my l2pop ha patch
15:18:35 <Swami_> anilvenkata: I hope this patch will not have any issues with backport.
15:18:40 <haleyb> https://review.openstack.org/#/c/255237is new patch
15:19:00 <haleyb> https://review.openstack.org/#/c/255237
15:19:09 <anilvenkata> Swami_, haleyb https://review.openstack.org/#/c/255237 yes this patch is there for a long time
15:19:21 <anilvenkata> Swami_, haleyb need reviewers for this patch
15:19:24 <Swami_> anilvenkata: yes got it.
15:19:53 <anilvenkata> Swami_, haleyb this patch also solves https://bugs.launchpad.net/neutron/+bug/1602614
15:19:53 <openstack> Launchpad bug 1602614 in neutron "DVR + L3 HA loss during failover is higher that it is expected" [Undecided,In progress] - Assigned to venkata anil (anil-venkata)
15:19:53 <Swami_> anilvenkata: will review it.
15:20:15 <anilvenkata> Swami_, haleyb thanks
15:20:56 <Swami_> anilvenkata: That was my next bug to discuss. Since you have already posted it here, it saves my time.
15:21:19 <anilvenkata> yes, that patch solves this bug also
15:21:39 <Swami_> There is another bug related to ha and vrrp.
15:21:43 <Swami_> #link https://bugs.launchpad.net/neutron/+bug/1602320
15:21:43 <openstack> Launchpad bug 1602320 in neutron "ha + distributed router: keepalived process kill vrrp child process" [Undecided,In progress] - Assigned to Dongcan Ye (hellochosen)
15:22:25 <Swami_> This has not been triaged yet and I did see jschwarz comment in there, that it is expected behavior, but we need to close the loop on this.
15:22:58 <haleyb> https://review.openstack.org/#/c/342730/ was sent out a couple of weeks ago
15:24:32 <Swami_> haleyb: thanks for the link
15:24:48 <haleyb> Swami_: i'll update the meeting wiki afterwards
15:24:53 <Swami_> ok.
15:25:23 <Swami_> #link https://bugs.launchpad.net/neutron/+bug/1596473
15:25:23 <openstack> Launchpad bug 1596473 in neutron "Packet loss with DVR and IPv6" [Undecided,Incomplete]
15:26:26 <Swami_> haleyb: I think this is incomplete, may be there is nothing to discuss ehre.
15:26:30 <Swami_> s/ehre/here
15:27:03 <haleyb> Right, submitter has not responded, and there's only so many things we can try and reproduce
15:27:17 <haleyb> i will close and hopefully get their at tention
15:27:31 <haleyb> or at least poke them again
15:27:31 <Swami_> haleyb: ok
15:27:40 <Swami_> The next one in the list is
15:27:43 <Swami_> #link https://bugs.launchpad.net/neutron/+bug/1506567
15:27:44 <openstack> Launchpad bug 1506567 in neutron "No information from Neutron Metering agent" [Undecided,New]
15:28:56 <Swami_> It seems there is a workaround posted there, may be we should look into it.
15:29:01 <Swami_> #link https://bugs.launchpad.net/neutron/+bug/1506567/comments/5
15:29:01 <openstack> Launchpad bug 1506567 in neutron "No information from Neutron Metering agent" [Undecided,New]
15:29:26 <haleyb> I think we talked about this last week too. It's a known issue that some of the agents don't know what namespace and/or interface to use when on a DVR compute node
15:29:38 <haleyb> RA has the same issue
15:30:03 <Swami_> haleyb: yes I remember talking about it.
15:30:33 <Swami_> #link https://bugs.launchpad.net/neutron/+bug/1599287
15:30:33 <openstack> Swami_: Error: Could not gather data from Launchpad for bug #1599287 (https://launchpad.net/bugs/1599287). The error has been logged
15:30:44 <Swami_> There is patch under review
15:30:47 <Swami_> #link https://review.openstack.org/337855
15:32:06 <Swami_> obondarev has some comments on this patch.
15:32:19 <haleyb> yes, but it is getting close
15:32:23 <Swami_> I will take a look at it and respond to his comments.
15:33:05 <Swami_> haleyb: obondarev's comment rings a bell, I need to check one more case, before I respond to his comments.
15:33:38 <Swami_> I will recheck it today and will repost a patch or will respond.
15:34:00 <haleyb> sounds good
15:34:08 <Swami_> One the fast-path-exit RFE patch, I do have the agent patch in good shape.
15:34:14 <Swami_> haleyb: can you take a look at it.
15:34:33 <Swami_> #link https://review.openstack.org/#/c/283757/
15:34:47 <haleyb> i'll take a look
15:34:47 <Swami_> This would also help the service_type networks
15:35:09 <Swami_> This creates the fip namespace on all nodes, irrespective of the fip.
15:35:52 <Swami_> I think that's all I had for the bugs this week.
15:36:15 <haleyb> anyone else have bugs to discuss ?
15:37:00 <haleyb> #topic Gate failures
15:37:50 <haleyb> So the gate has been a mess overall, not exactly dvr's fault
15:37:52 <Swami_> haleyb: Is it getting better.
15:38:48 <haleyb> the dvr just started spiking again, about 5% failure now
15:38:54 <haleyb> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=5&fullscreen
15:39:25 <Swami_> looking at the graph
15:39:31 <haleyb> that just started earlier today, don't know what the issue is
15:40:20 <haleyb> The check queue has gotten better, but still showing increases - http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=8&fullscreen
15:40:26 <Swami_> haleyb: ok
15:40:38 <haleyb> of course that assumes every patch is perfect since a bug in a patch reflects in that
15:40:39 <Swami_> haleyb: this is going to be a never ending story.
15:40:54 <haleyb> groundhog day
15:41:35 <haleyb> http://logs.openstack.org/51/337851/19/check/gate-tempest-dsvm-neutron-dvr-multinode-full/c944b3d/logs/screen-q-dhcp.txt.gz#_2016-08-10_08_43_58_552 is something i noticed today in one of my patches, seems interesting
15:41:40 <Swami_> haleyb: agreed.
15:41:59 <haleyb> multinode dvr test, one VM failed dhcp, but it was due to agent not starting
15:42:30 <Swami_> haleyb: that is good.
15:43:00 <haleyb> if good is bad :)
15:43:24 <haleyb> it seems we should be able to debug it from the log (i hope)
15:43:51 <Swami_> haleyb: sure if it is obvious.
15:44:03 <haleyb> it never is, but i will look and see
15:44:38 <Swami_> haleyb: ping
15:45:10 <haleyb> sorry, had to talk in that other meeting, but failed
15:46:35 <Swami_> haleyb: yes I realized
15:46:55 <haleyb> i had nothing more on the gate today
15:47:09 <Swami_> haleyb: thanks
15:47:12 <haleyb> #topic Stable backports
15:47:45 <Swami_> #link https://review.openstack.org/#/c/351923/
15:47:51 <haleyb> nothing in particular for stable, just keep doing backports
15:48:02 <Swami_> #link https://review.openstack.org/#/c/351947/
15:48:06 <haleyb> I already +2'd that :)
15:48:17 <haleyb> any other stable backports that need attention
15:48:26 <Swami_> haleyb: I need another +2 for these patches. Can you ping ihar.
15:49:00 <Swami_> ok.
15:49:21 <Swami_> I need to backport this to liberty.
15:49:25 <Swami_> #link https://review.openstack.org/#/c/348372/6
15:49:49 <Swami_> but we have a dependency on #link https://review.openstack.org/#/c/351923/
15:51:30 <haleyb> https://review.openstack.org/#/c/351947/1 first, then that, but yes, those need to go back
15:51:52 <Swami_> haleyb: yes
15:52:13 <haleyb> any others
15:52:29 <Swami_> haleyb: that's it.
15:52:51 <haleyb> #topic Open Discussion
15:53:13 <haleyb> Ok, let the tomatoes fly! :)
15:53:54 <Swami_> haleyb: I might need some help/guidance from you on creating the iproute chains for the floatingip namespace for fast path exit.
15:54:19 <haleyb> iptables ?
15:54:20 <Swami_> This might also help for the floatingip namespace static routes for nexthop.
15:54:54 <Swami_> Basically we have to add static routes for every tenant owned cidr in the fipnamespace.
15:55:14 <Swami_> We should figure out what is the best way to do this without affecting what he have today.
15:56:35 <haleyb> ok, i can help with that
15:56:40 <Swami_> I do have a patch right now that adds the static route, I will try to polish it a bit and will pull you in for review and you can provide your feedback.
15:56:56 <Swami_> #link https://review.openstack.org/#/c/297468/
15:58:02 <haleyb> i'll take a look
15:58:02 <Swami_> haleyb: I wanted to have it working before the mid-cycle so that we can churn it out. But will see where it goes.
15:58:56 <Swami_> That's all I had for today.
15:59:27 <haleyb> Swami_: ok. i know you won't be there, but https://etherpad.openstack.org/p/newton-neutron-midcycle-workitems had a list of things to discuss at midcycle if you want to add it, maybe irc discussion
15:59:52 <Swami_> haleyb: sure will add it to the list.
16:00:07 <haleyb> we are out of time, keep fixing those bugs! :)
16:00:10 <haleyb> #endmeeting