15:00:36 #startmeeting neutron_l3
15:00:37 Meeting started Thu Oct 11 15:00:36 2018 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:38 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:40 The meeting name has been set to 'neutron_l3'
15:00:49 hi
15:00:54 hi Swami
15:01:03 hi
15:01:09 hello
15:01:18 hi haleyb
15:01:24 welcome liuyulong
15:01:42 hi
15:02:20 #topic Announcements
15:02:39 Stein-1 is one week and a half away
15:03:00 The week of October 22 - 26
15:03:51 and the Summit in Berlin is not that far in the future, November 13 - 15
15:04:02 anybody here attending?
15:05:16 not me
15:05:21 I take that as a no
15:05:28 so let's move on
15:05:34 #topic Bugs
15:05:43 Swami: please go ahead
15:05:58 mlavalle: thanks
15:06:26 #link https://bugs.launchpad.net/neutron/+bug/1796491
15:06:26 Launchpad bug 1796491 in neutron "DVR Floating IP setup in the SNAT namespace of the network node and also in the qrouter namespace in the compute node" [Undecided,New]
15:07:42 I am currently looking into this bug. I was not able to reproduce this issue, and the code to remove the fip cidr from the snat namespace after the migration is in place. I have updated the bug report with my findings, and in the meantime I will see if I can reproduce it. I suspect that there may be some timing issue or a race that might cause this issue.
15:07:54 The problem is reported in Pike and Queens.
15:08:12 Have you guys seen this in master?
15:09:17 i haven't
15:09:39 haleyb: ok thanks, I will see if I can reproduce it in master; if not, I will check on Pike.
15:09:55 The next one in the list is:
15:09:59 #link https://bugs.launchpad.net/neutron/+bug/1794991
15:09:59 Launchpad bug 1794991 in neutron "Inconsistent flows with DVR l2pop VxLAN on br-tun" [Undecided,New]
15:10:36 that reporter was involved in the previous bug as well
15:10:51 It seems the l2pop vxlan flows are not getting populated properly in a multinode scenario, so they are having issues with DHCP connections and also VM to VM connections.
15:11:31 also Pike
15:11:48 Again they say it is not consistently reproducible, but it happens. They also mentioned that no rpc timeouts are seen.
15:12:17 I have seen such an issue when there is an rpc_timeout while l2pop tries to fetch all the fdb_entries.
15:13:06 Also I have requested their ovs-vswitchd logs to see if the vxlan interfaces were created properly. But let us see.
15:13:44 #link https://bugs.launchpad.net/neutron/+bug/1786272
15:13:44 Launchpad bug 1786272 in neutron "Connection between two virtual routers does not work with DVR" [Medium,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:13:58 hi, sorry for being late
15:14:28 Patch is in review: #link https://review.openstack.org/#/c/597567/
15:14:55 This patch is almost complete, and I did see slaweq was addressing some last issues with the agent handling multiple routers.
15:15:21 yes, please take a look if that makes sense for You :)
15:15:45 So nothing else to discuss in this bug. Ok, I will take a look at it again today. Also this bug brought out another bug with HA-DVR.
15:16:07 #link https://bugs.launchpad.net/neutron/+bug/1797037
15:16:07 Launchpad bug 1797037 in neutron "Extra routes configured on routers are not set in the router namespace and snat namespace with DVR-HA routers" [Medium,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:16:41 Swami: you already pushed a fix for that one, right?
15:17:04 The extra routes are not configured in the router namespace when DVR routers are configured.
I did check that the routes are only configured in the 'vrrp_conf' and the snat_namespace when HA is configured.
15:17:11 Yes, I have a patch up for review.
15:17:33 #link https://review.openstack.org/#/c/609273/
15:17:41 It needs another +2.
15:18:03 I looked at it last night
15:18:11 I can only say that I tested it together with this patch for 2 routers connected together and it worked fine :)
15:18:18 and had the same question as haleyb: https://review.openstack.org/#/c/609273/3/neutron/agent/l3/dvr_edge_ha_router.py@123
15:19:21 mlavalle: Yes, it seems that because of the inheritance from different classes we have to override certain functions to call them properly.
15:19:32 I even played with a unit test of DvrEdgeHaRouter
15:19:38 mlavalle: Let me comment on it in the patch.
15:19:42 yes, i don't quite understand that super(), but it's just a python thing...
15:20:14 in my experimenting with the unit tests, the object seems to have the correct method
15:20:31 update_routing_table
15:20:37 mlavalle: ok
15:21:08 it should come from DvrEdgeRouter, right?
15:21:23 mlavalle: Yes, it should come from DvrEdgeRouter
15:21:37 yes, HaRouter doesn't have it
15:21:44 I can tell you that in the unit tests, it is getting the method from there
15:22:09 but maybe in a deployment, an override is happening somehow
15:22:32 mlavalle: Let me check 'DvrEdgeHaRouter' to see whether the 'update_routing_table' override is required or not. I will update you on this.
15:22:36 I was planning to add some debug calls to the code in master in my deployment
15:22:58 and let it run and see if we are hitting the correct method
15:23:04 would that be helpful?
15:23:06 mlavalle: I already have some debug calls added, I will test it again.
15:23:20 ok, then I'll let you test
15:23:33 I was only trying to help
15:23:40 because it seems odd
15:23:54 mlavalle: sure,
15:24:06 and if we have an inheritance problem, it might be worth finding out why
15:24:23 mlavalle: yes, you are right.
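The update_routing_table question above comes down to Python's method resolution order (MRO) and how super() walks it. The toy classes below are hypothetical stand-ins that only borrow the class names from the discussion; they assume DvrEdgeHaRouter lists DvrEdgeRouter before HaRouter in its bases, and their bodies are not neutron code. The sketch shows which class answers the call and where super() goes next.

```python
"""Toy model of how Python's MRO resolves update_routing_table.

Class names mirror the neutron L3 agent classes discussed above, but the
bodies are hypothetical stand-ins, not neutron's implementation.
"""


class RouterInfo:
    def __init__(self):
        self.calls = []  # records which class handled the call, in order

    def update_routing_table(self, operation, route):
        self.calls.append("RouterInfo")


class HaRouter(RouterInfo):
    # Deliberately does NOT define update_routing_table.
    pass


class DvrEdgeRouter(RouterInfo):
    def update_routing_table(self, operation, route):
        self.calls.append("DvrEdgeRouter")
        # super() follows the MRO of the *instance's* class, so for a
        # DvrEdgeHaRouter instance the next stop is HaRouter, then RouterInfo.
        super().update_routing_table(operation, route)


class DvrEdgeHaRouter(DvrEdgeRouter, HaRouter):
    pass


router = DvrEdgeHaRouter()
print([cls.__name__ for cls in DvrEdgeHaRouter.__mro__])
# ['DvrEdgeHaRouter', 'DvrEdgeRouter', 'HaRouter', 'RouterInfo', 'object']
router.update_routing_table("replace", {"destination": "10.0.0.0/24",
                                        "nexthop": "192.168.1.1"})
print(router.calls)  # ['DvrEdgeRouter', 'RouterInfo']
```

If HaRouter did define update_routing_table, the super() call inside DvrEdgeRouter would reach HaRouter's version before RouterInfo's, which is the kind of surprise the review comment was probing.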
15:24:39 The next one is #link https://bugs.launchpad.net/neutron/+bug/1774459
15:24:39 Launchpad bug 1774459 in neutron "Update permanent ARP entries for allowed_address_pair IPs in DVR Routers" [High,Confirmed] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:24:57 * mlavalle was going to leave a comment in the patch last night after dinner, but was tired and figured it could be discussed with Swami during today's meeting
15:25:20 mlavalle: no problem.
15:25:22 #link https://review.openstack.org/601336
15:25:46 ahh nice!
15:26:09 Here is the patch. I have a question on the Ryu controller about adding the packet-in handler. I left a question in there; if anyone can take a look and comment on it, I can proceed with my work.
15:26:39 I'll take a look later
15:26:39 I had a question related to 'registration' of the packet-in handler call.
15:26:43 mlavalle: thanks
15:26:59 if I can't help, I'll ping ajo
15:27:45 mlavalle: Sure, thanks, that would help.
15:27:50 #link https://bugs.launchpad.net/neutron/+bug/1795222
15:27:50 Launchpad bug 1795222 in neutron "[l3] router agent side gateway IP not changed if directly change IP address" [Medium,In progress] - Assigned to LIU Yulong (dragon889)
15:28:08 #link https://review.openstack.org/606876
15:28:13 There is a patch up for review.
15:29:10 #link https://bugs.launchpad.net/neutron/+bug/1785227
15:29:10 Launchpad bug 1785227 in neutron "Router port: no dataplane update on change" [Medium,Confirmed]
15:29:39 I think these two bugs are related, but I need to recheck before marking it as a duplicate. I will check it out today.
15:30:10 * mlavalle will review the patch^^^^ today
15:30:18 mlavalle: that's all I had for bugs today. Back to you.
15:30:29 I have one more thing
15:30:38 there is bug https://bugs.launchpad.net/neutron/+bug/1796703
15:30:38 Launchpad bug 1796703 in neutron "HA router interfaces in standby state" [Undecided,New]
15:30:56 which I think should be checked by some L3 experts
15:31:07 so please take a look at this one if You can :)
15:31:28 Pike again
15:31:48 slaweq: ok I will take a look at it.
15:31:53 thx Swami
15:32:27 Next bug I have is https://bugs.launchpad.net/neutron/+bug/1789434
15:32:27 Launchpad bug 1789434 in neutron "neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA failing 100% times" [High,Confirmed] - Assigned to Manjeet Singh Bhatia (manjeet-s-bhatia)
15:32:45 This one was assigned to manjeets.... any progress?
15:33:04 mlavalle, I've sent an update email.
15:33:13 my finding is it even exists in rocky
15:33:48 now looking into dvr_sync and sync_ha_state to compare whether the ha case is missing any notification for port updates
15:34:51 thanks for the update :-)
15:35:30 Next one is https://bugs.launchpad.net/neutron/+bug/1791989
15:35:30 Launchpad bug 1791989 in neutron "grenade-dvr-multinode job fails" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:35:59 slaweq found the cause for this one
15:36:03 \o/
15:36:20 and we have a patch to make the job voting again: https://review.openstack.org/#/c/609437/
15:36:48 yes :) the patch description explains what fixed that
15:37:10 haleyb: take a look when you have some time^^^^
15:37:18 and i had created a related patch as well to add permanent ARP entries for the veth pair IPs, just an optimization, https://review.openstack.org/#/c/607685/
15:37:50 will look, see zuul is now happy
15:38:08 haleyb: the patch you pushed is good for review?
15:39:00 mlavalle: yes, should be good. was just something we noticed during debug
15:39:51 cool, I'll take a look later today
15:40:01 haleyb: is that permanent arp entry there a requirement?
15:40:02 Next one is https://bugs.launchpad.net/neutron/+bug/1787919
15:40:02 Launchpad bug 1787919 in neutron "Upgrade router to L3 HA broke IPv6" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
15:40:14 #undo
15:40:15 Removing item from minutes: #link https://bugs.launchpad.net/neutron/+bug/1785227
15:41:16 Swami: no, we had just seen STALE arp entries, and since we know the IP/MAC just felt we could add the corresponding entries
15:41:19 haleyb: My question is whether it is failing just because of the missing ARP entries, and whether the ARP entries are failing.
15:41:48 haleyb: Ok thanks.
15:41:51 it wound up being unrelated; we thought it was initially.
15:42:12 ok next one is https://bugs.launchpad.net/neutron/+bug/1787919
15:42:12 Launchpad bug 1787919 in neutron "Upgrade router to L3 HA broke IPv6" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
15:42:33 For this one I have the environment to reproduce it
15:42:50 I have also set up conditions similar to the ones reported in the bug
15:42:56 debugging at this moment
15:43:34 and that's all I have for today
15:43:42 #topic Open Agenda
15:43:51 anything we should discuss today?
15:44:00 hi
15:44:11 i got some errors in unit testing
15:44:11 hi xubozhang
15:44:23 not sure how to debug them
15:44:36 mlavalle: haleyb: I had one question.
15:44:50 xubozhang: have the patch url handy?
15:45:11 the patch still has too many issues
15:45:35 i got some failed assertions
15:45:55 xubozhang: do you know how to run the unit tests with the debugger enabled?
15:46:05 nope
15:46:28 give me the url of the patch and I'll leave you a comment with instructions there
15:46:36 i tried tox -v -e py35
15:47:26 Swami: go ahead
15:47:48 https://review.openstack.org/#/c/528336/
15:47:59 xubozhang: ack. will leave a comment there
15:48:05 haleyb: i posted a question in your IRC room. If the external_process monitor is monitoring a process (in this case ip_monitor), how do we stop that monitoring without rebooting the l3_agent?
15:48:05 thanks!
15:49:05 Somehow in one of our setups the PID of the ip_monitor is removed and the snat_namespace is also removed, but the external_process monitor constantly tries to restart the ip_monitor.
15:49:15 Swami: i hadn't seen it, sometimes irc doesn't create a chat tab and i miss those things
15:49:36 haleyb: no problem. I posted the same question in the neutron channel on thursday.
15:49:42 * mlavalle thought haleyb was ignoring his comments about the Red Sox
15:50:00 "Is there a way to stop the External Process monitor from monitoring a service with rebooting the l3-agent. In this case it is the ip-monitor process running in snat-namespace for HA."
15:50:07 Swami: that was it, right?
15:50:12 haleyb: yes
15:50:31 haleyb: there is a typo there; it should be "without rebooting the l3-agent".
15:51:18 ack, let me copy into the chat
15:51:45 mlavalle: and i wouldn't ignore Red Sox banter either :)
15:51:52 You can reproduce this by deleting the snat-namespace and the PID of ip_monitor when HA is configured. The logs fill up with ip_monitor restart attempts.
15:52:27 haleyb: you can check it and let me know.
15:52:29 Thanks
15:52:37 mlavalle: that's all I had.
15:52:38 Swami: we should file a bug too
15:52:56 * mlavalle feels relieved haleyb is not ignoring him
15:53:23 moving from office to home always messes with my irc :(
15:53:57 ok team, thanks for attending
15:54:01 I don't want to take much time, but wanted to give a quick update on DVR-aware neutron-dynamic-routing and point folks at https://review.openstack.org/#/c/581098/
15:54:03 haleyb: Sure, I will file a bug.
15:54:14 tidwellr: go ahead
15:54:31 tidwellr: asking for reviews?
15:55:03 yes, particularly with the advent of multiple port bindings
15:55:17 ok added to my pile
15:55:21 not sure that is being handled correctly
15:55:33 so your review would be helpful
15:56:14 I should also mention that tidwellr has volunteered to add the ability to remove ip ranges from subnet pools
15:56:51 so that revert patch in openstack client should probably be stopped
15:57:05 well, maybe not
15:57:07 tidwellr: will take a look.
15:57:27 I'm envisioning some API changes that the client will need to be aware of
15:57:50 so changes will probably still be needed in the client
15:57:55 yes
15:58:07 let's just hold off on the revert
15:58:11 for the time being
15:58:48 thanks for the update tidwellr !
15:59:06 have a great weekend y'all!
15:59:13 #endmeeting
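Swami's question about stopping the external process monitor without restarting the l3-agent is easier to reason about with a toy model. The sketch below is hypothetical: FakeProcess, ProcessMonitor, and the "router-1" key are illustrative names, and this is not neutron's external_process implementation. Whether that monitor offers an equivalent unregister path should be checked against its source. The sketch shows why a respawning monitor keeps retrying once the namespace is gone, and how dropping the registration, rather than restarting the agent, stops the retries.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FakeProcess:
    """Hypothetical stand-in for a monitored process such as ip_monitor."""
    name: str
    alive: bool = True
    namespace_exists: bool = True
    respawn_attempts: int = 0

    def is_active(self) -> bool:
        return self.alive

    def respawn(self) -> None:
        self.respawn_attempts += 1
        # A respawn can only succeed while its namespace still exists;
        # otherwise the process stays dead and the next check retries.
        if self.namespace_exists:
            self.alive = True


class ProcessMonitor:
    """Hypothetical minimal respawning monitor keyed by (uuid, service_name)."""

    def __init__(self) -> None:
        self._procs: Dict[Tuple[str, str], FakeProcess] = {}

    def register(self, uuid: str, service_name: str, proc: FakeProcess) -> None:
        self._procs[(uuid, service_name)] = proc

    def unregister(self, uuid: str, service_name: str) -> None:
        # Dropping the entry stops future respawn attempts; no agent restart.
        self._procs.pop((uuid, service_name), None)

    def check_all(self) -> None:
        for proc in self._procs.values():
            if not proc.is_active():
                proc.respawn()


monitor = ProcessMonitor()
ip_mon = FakeProcess("ip_monitor")
monitor.register("router-1", "ip_monitor", ip_mon)

# Simulate the report: the process and its snat namespace both disappear.
ip_mon.alive = False
ip_mon.namespace_exists = False
for _ in range(3):
    monitor.check_all()  # each pass retries and fails
print(ip_mon.respawn_attempts)  # 3

monitor.unregister("router-1", "ip_monitor")
for _ in range(3):
    monitor.check_all()  # nothing registered, so no more retries
print(ip_mon.respawn_attempts)  # still 3
```

In this model the endless log noise comes from the registration outliving the namespace, so a cleanup path that unregisters the service when its namespace is removed would be one way to avoid it.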