15:00:36 <mlavalle> #startmeeting neutron_l3
15:00:37 <openstack> Meeting started Thu Oct 11 15:00:36 2018 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:40 <openstack> The meeting name has been set to 'neutron_l3'
15:00:49 <Swami> hi
15:00:54 <mlavalle> hi Swami
15:01:03 <haleyb> hi
15:01:09 <liuyulong> hello
15:01:18 <mlavalle> hi haleyb
15:01:24 <mlavalle> welcome liuyulong
15:01:42 <tidwellr> hi
15:02:20 <mlavalle> #topic Announcements
15:02:39 <mlavalle> Stein-1 is one week and a half away
15:03:00 <mlavalle> The week of October 22 - 26
15:03:51 <mlavalle> and the Summit in Berlin is not that far in the future, November 13 - 15
15:04:02 <mlavalle> anybody here attending?
15:05:16 <haleyb> not me
15:05:21 <mlavalle> I take that as a no
15:05:28 <mlavalle> so let's move on
15:05:34 <mlavalle> #topic Bugs
15:05:43 <mlavalle> Swami: please go ahead
15:05:58 <Swami> mlavalle: thanks
15:06:26 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1796491
15:06:26 <openstack> Launchpad bug 1796491 in neutron "DVR Floating IP setup in the SNAT namespace of the network node and also in the qrouter namespace in the compute node" [Undecided,New]
15:07:42 <Swami> I am currently looking into this bug. I was not able to reproduce this issue and also the code is in place to remove the fip cidr from the snat namespace after the migration. I have updated the bug report with my findings and let me see in the meantime if I could reproduce it. I suspect that there may be some timing issue or a race that might cause this issue.
15:07:54 <Swami> The problem is reported in Pike and Queens.
15:08:12 <Swami> Have you guys seen this in the master?
15:09:17 <haleyb> i haven't
15:09:39 <Swami> haleyb: ok thanks, I will see if I can reproduce it in the master, if not I will check on the Pike.
15:09:55 <Swami> The next one in the list is.
15:09:59 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1794991
15:09:59 <openstack> Launchpad bug 1794991 in neutron "Inconsistent flows with DVR l2pop VxLAN on br-tun" [Undecided,New]
15:10:36 <mlavalle> that reporter was involved in the previous bug as well
15:10:51 <Swami> It seems the l2pop vxlan flows are not getting populated properly in a multinode scenario. So they are having issues with DHCP connection and also VM to VM connection.
15:11:31 <mlavalle> also Pike
15:11:48 <Swami> Again they say it is not consistently reproducible but happens. They also mentioned that there are no rpc-timeouts seen.
15:12:17 <Swami> I have seen such issue when there is an rpc_timeout where l2pop tries to fetch all the fdb_entries.
15:13:06 <Swami> Also I have requested for their ovs_vswitchd logs to see if the vxlan interfaces were created properly. But let us see.
15:13:44 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1786272
15:13:44 <openstack> Launchpad bug 1786272 in neutron "Connection between two virtual routers does not work with DVR" [Medium,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:13:58 <slaweq> hi, sorry for being late
15:14:28 <Swami> Patch is in review: #link https://review.openstack.org/#/c/597567/
15:14:55 <Swami> This patch is almost complete and I did see slaweq was addressing some last issues with the agent handling multiple routers.
15:15:21 <slaweq> yes, please take a look if that makes sense for You :)
15:15:45 <Swami> So nothing else to discuss in this bug. Ok will take a look at it again today. Also this bug brought out another bug with HA-DVR.
15:16:07 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1797037
15:16:07 <openstack> Launchpad bug 1797037 in neutron "Extra routes configured on routers are not set in the router namespace and snat namespace with DVR-HA routers" [Medium,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:16:41 <mlavalle> Swami: you already pushed a fix for that one, right?
15:17:04 <Swami> The extra routes are not configured in the router namespace when DVR routers are configured. I did check that the routes are only configured in the 'vrrp_conf' and the snat_namespace when HA is configured.
15:17:11 <Swami> Yes, I have a patch up for review.
15:17:33 <Swami> #link https://review.openstack.org/#/c/609273/
15:17:41 <Swami> It needs another +2.
15:18:03 <mlavalle> I looked at it last night
15:18:11 <slaweq> I can only say that I tested it together with this patch for 2 routers connected together and it worked fine :)
15:18:18 <mlavalle> and had the same question as haleyb: https://review.openstack.org/#/c/609273/3/neutron/agent/l3/dvr_edge_ha_router.py@123
15:19:21 <Swami> mlavalle: Yes it seems that because of the inheritance from different classes we have to override certain functions to call it properly.
15:19:32 <mlavalle> I even played with a unit test of DvrEdgeHaRouter
15:19:38 <Swami> mlavalle: Let me comment it in the patch.
15:19:42 <haleyb> yes, i don't quite understand that super(), but it's just a python thing...
15:20:14 <mlavalle> in my experimenting with the unit tests, the object seems to have the correct method
15:20:31 <mlavalle> update_routing_table
15:20:37 <Swami> mlavalle: ok
15:21:08 <mlavalle> it should come from DvrEdgeRouter, right?
15:21:23 <Swami> mlavalle: Yes it should come from DvrEdgeRouter
15:21:37 <haleyb> yes, HaRouter doesn't have it
15:21:44 <mlavalle> I can tell you that in the unit tests, it is getting the method from there
15:22:09 <mlavalle> but maybe in a deployment, an override is happening somehow
15:22:32 <Swami> mlavalle: Let me check the 'DvrEdgeHaRouter' and see whether the 'update_routing_table' override is required or not. I will update you on this.
15:22:36 <mlavalle> I was planning to add some debug calls to the code in master in my deployment
15:22:58 <mlavalle> and let it run and see if we are hitting the correct method
15:23:04 <mlavalle> would that be helpful?
15:23:06 <Swami> mlavalle: I already have some debug calls added, I will test it again.
15:23:20 <mlavalle> ok, then I'll let you test
15:23:33 <mlavalle> I was only trying to help
15:23:40 <mlavalle> because it seems odd
15:23:54 <Swami> mlavalle: sure,
15:24:06 <mlavalle> and if we have an inheritance problem, it might be worth it finding out why
15:24:23 <Swami> mlavalle: yes, you are right.
15:24:39 <Swami> The next one is #link https://bugs.launchpad.net/neutron/+bug/1774459
15:24:39 <openstack> Launchpad bug 1774459 in neutron "Update permanent ARP entries for allowed_address_pair IPs in DVR Routers" [High,Confirmed] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:24:57 * mlavalle was going to leave a comment last night in the patch after dinner. but was tired and figured could be discussed with Swami during today's meeting
15:25:20 <Swami> mlavalle: no problem.
15:25:22 <Swami> #link https://review.openstack.org/601336
15:25:46 <mlavalle> ahh nice!
15:26:09 <Swami> Here is the patch. I have a question on the Ryu controller for adding in the IN_Packet handler. I left a question in there, if anyone can take a look and comment on it, I can proceed with my work.
15:26:39 <mlavalle> I'll take a look later
15:26:39 <Swami> I had a question related to 'registration' of the in_packet handler call.
15:26:43 <Swami> mlavalle: thanks
15:26:59 <mlavalle> if I can't help, I'll ping ajo
15:27:45 <Swami> mlavalle: Sure thanks that would help.
15:27:50 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1795222
15:27:50 <openstack> Launchpad bug 1795222 in neutron "[l3] router agent side gateway IP not changed if directly change IP address" [Medium,In progress] - Assigned to LIU Yulong (dragon889)
15:28:08 <Swami> #link https://review.openstack.org/606876
15:28:13 <Swami> There is a patch up for review.
15:29:10 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1785227
15:29:10 <openstack> Launchpad bug 1785227 in neutron "Router port: no dataplane update on change" [Medium,Confirmed]
15:29:39 <Swami> I think these two bugs are kind of related. But I need to recheck once again before marking it as duplicate. I will check it out today.
15:30:10 * mlavalle will review the patch^^^^ today
15:30:18 <Swami> mlavalle: that's all I had for bugs today. Back to you.
15:30:29 <slaweq> I have one more thing
15:30:38 <slaweq> there is bug https://bugs.launchpad.net/neutron/+bug/1796703
15:30:38 <openstack> Launchpad bug 1796703 in neutron "HA router interfaces in standby state" [Undecided,New]
15:30:56 <slaweq> which I think should be checked by some L3 experts
15:31:07 <slaweq> so please take a look at this one if You can :)
15:31:28 <liuyulong> Pike again
15:31:48 <Swami> slaweq: ok I will take a look at it.
15:31:53 <slaweq> thx Swami
15:32:27 <mlavalle> Next bug I have is https://bugs.launchpad.net/neutron/+bug/1789434
15:32:27 <openstack> Launchpad bug 1789434 in neutron "neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA failing 100% times" [High,Confirmed] - Assigned to Manjeet Singh Bhatia (manjeet-s-bhatia)
15:32:45 <mlavalle> This one was assigned to manjeets.... any progress?
15:33:04 <manjeets> mlavalle, I've sent update email.
15:33:13 <manjeets> my finding is it even exists in rocky
15:33:48 <manjeets> now looking into dvr_sync and sync_ha_state to compare if ha case is missing any notification for port updates
15:34:51 <mlavalle> thanks for the update :-)
15:35:30 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1791989
15:35:30 <openstack> Launchpad bug 1791989 in neutron "grenade-dvr-multinode job fails" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:35:59 <mlavalle> slaweq found the cause for this one
15:36:03 <mlavalle> \o/
15:36:20 <mlavalle> and we have a patch to make the job voting again: https://review.openstack.org/#/c/609437/
15:36:48 <slaweq> yes :) in patch description there is explained what fixed that
15:37:10 <mlavalle> haleyb: take a look when you have some time^^^^
15:37:18 <haleyb> and i had created a related patch as well to add permanent ARP entries for the veth pair IPs, just an optimization, https://review.openstack.org/#/c/607685/
15:37:50 <haleyb> will look, see zuul is now happy
15:38:08 <mlavalle> haleyb: the patch you pushed is good for review?
15:39:00 <haleyb> mlavalle: yes, should be good. was just something we noticed during debug
15:39:51 <mlavalle> cool, I'll take a look later today
15:40:01 <Swami> haleyb: is that permanent arp entry there a requirement?
15:40:02 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1787919
15:40:02 <openstack> Launchpad bug 1787919 in neutron "Upgrade router to L3 HA broke IPv6" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
15:40:14 <mlavalle> #undo
15:40:15 <openstack> Removing item from minutes: #link https://bugs.launchpad.net/neutron/+bug/1785227
15:41:16 <haleyb> Swami: no, we had just seen STALE arp entries, and since we know the IP/MAC just felt we could add the corresponding entries
15:41:19 <Swami> haleyb: My question is it failing just because of the missing ARP entries and is the ARP entries failing.
15:41:48 <Swami> haleyb: Ok thanks.
15:41:51 <haleyb> it wound-up being unrelated, we thought it was initially.
15:42:12 <mlavalle> ok next one is https://bugs.launchpad.net/neutron/+bug/1787919
15:42:12 <openstack> Launchpad bug 1787919 in neutron "Upgrade router to L3 HA broke IPv6" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
15:42:33 <mlavalle> For this one I have the environment to reproduce it
15:42:50 <mlavalle> I have also setup similar conditions as the ones reported in the bug
15:42:56 <mlavalle> debugging at this moment
15:43:34 <mlavalle> and that's all I have for today
15:43:42 <mlavalle> #topic Open Agenda
15:43:51 <mlavalle> anything we should discuss today?
15:44:00 <xubozhang> hi
15:44:11 <xubozhang> i got some errors in unit testing
15:44:11 <mlavalle> hi xubozhang
15:44:23 <xubozhang> not sure how to debug them
15:44:36 <Swami> mlavalle: haleyb: I had one question.
15:44:50 <mlavalle> xubozhang: have the patch url handy?
15:45:11 <xubozhang> the patch still has too many issues
15:45:35 <xubozhang> i got some assertions failed
15:45:55 <mlavalle> xubozhang: do you know how to run the unit tests with debugger enabled?
15:46:05 <xubozhang> nope
15:46:28 <mlavalle> give me the url of the patch and I'll leave you a comment with instructions there
15:46:36 <xubozhang> i tried tox -v -e py35
15:47:26 <mlavalle> Swami: go ahead
15:47:48 <xubozhang> https://review.openstack.org/#/c/528336/
15:47:59 <mlavalle> xubozhang: ack. will leave a comment there
15:48:05 <Swami> haleyb: i posted a question in your IRC room. If the external_process monitor is monitoring a process, how do we kill that monitoring action. In this case ip_monitor, without rebooting the l3_agent.
15:48:05 <xubozhang> thanks!
15:49:05 <Swami> Somehow in one of our setup the PID of the ip_monitor is removed and also the snat_namespace is removed. But the external_process monitor constantly tried to restart the ip_monitor.
15:49:15 <haleyb> Swami: i hadn't seen it, sometimes irc doesn't create a chat tab and i miss those things
15:49:36 <Swami> haleyb: no problem. I posted in neutron channel on Thursday the same question.
15:49:42 * mlavalle thought haleyb was ignoring his comments about the Red Sox
15:50:00 <haleyb> "Is there a way to stop the External Process monitor from monitoring a service with rebooting the l3-agent. In this case it is the ip-monitor process running in snat-namespace for HA."
15:50:07 <haleyb> Swami: that was it, right?
15:50:12 <Swami> haleyb: yes
15:50:31 <Swami> haleyb: there is a typo there, without rebooting the l3-agent.
15:51:18 <haleyb> ack, let me copy into the chat
15:51:45 <haleyb> mlavalle: and i wouldn't ignore Red Sox banter either :)
15:51:52 <Swami> You can reproduce this when you try to delete the snat-namespace and PID of ip_monitor when HA is configured. The logs are filled with ip_monitor trying to restart.
15:52:27 <Swami> haleyb: you can check it and let me know.
15:52:29 <Swami> Thanks
15:52:37 <Swami> mlavalle: that's all I had.
15:52:38 <haleyb> Swami: we should file a bug too
15:52:56 * mlavalle feels relieved haleyb is not ignoring him
15:53:23 <haleyb> moving from office to home always messes with my irc :(
15:53:57 <mlavalle> ok team, thanks for attending
15:54:01 <tidwellr> I don't want to take much time, but wanted to give a quick update on DVR-aware neutron-dynamic-routing and point folks at https://review.openstack.org/#/c/581098/
15:54:03 <Swami> haleyb: Sure I will file a bug.
15:54:14 <mlavalle> tidwellr: go ahead
15:54:31 <mlavalle> tidwellr: asking for reviews?
15:55:03 <tidwellr> yes, particularly with the advent of multiple port bindings
15:55:17 <mlavalle> ok added to my pile
15:55:21 <tidwellr> not sure that is being handled correctly
15:55:33 <tidwellr> so your review would be helpful
15:56:14 <mlavalle> I should also mention that tidwellr has volunteered to add the ability to remove ip ranges from subnet pools
15:56:51 <mlavalle> so that revert patch in openstack client should probably be stopped
15:57:05 <tidwellr> well, maybe not
15:57:07 <Swami> tidwellr: will take a look.
15:57:27 <tidwellr> I'm envisioning some API changes that the client will need to be aware of
15:57:50 <mlavalle> so changes will probably still be needed in the client
15:57:55 <tidwellr> yes
15:58:07 <mlavalle> let's just hold off on the revert
15:58:11 <mlavalle> for the time being
15:58:48 <mlavalle> thanks for the update tidwellr !
15:59:06 <mlavalle> have a great weekend y'all!
15:59:13 <mlavalle> #endmeeting