14:00:54 <mlavalle> #startmeeting neutron_l3
14:00:55 <openstack> Meeting started Wed May 15 14:00:54 2019 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:56 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:58 <openstack> The meeting name has been set to 'neutron_l3'
14:01:13 <tidwellr> o/
14:01:20 <haleyb> o/
14:01:41 <panda> o/ (partially - in another meeting)
14:02:39 <mlavalle> hey panda, I was wondering about you yesterday. Nice to see you here :-)
14:02:57 <mlavalle> good morning haleyb, tidwellr
14:03:07 <ralonsoh> hi
14:03:14 <mlavalle> hey ralonsoh
14:03:28 <tidwellr> mlavalle: happy wednesday!
14:05:16 <mlavalle> ok, let's get going
14:05:42 <mlavalle> #topic Announcements
14:06:11 <mlavalle> We are on our way to the T-1 milestone, June 3 - 7
14:06:19 <mlavalle> #link https://releases.openstack.org/train/schedule.html
14:07:41 <mlavalle> These are our photos from the recent PTG:
14:07:43 <njohnston> o/
14:07:46 <mlavalle> #link https://www.dropbox.com/sh/fydqjehy9h5y728/AAC1gIc5bJwwNd5JkcQ6Pqtra/Neutron?dl=0&subfolder_nav_tracking=1
14:08:28 <mlavalle> everybody handsome as usual
14:08:45 <mlavalle> especially in the one with the props
14:09:33 <mlavalle> any other announcements from the team?
14:10:08 <mlavalle> ok, let's move on
14:10:40 <mlavalle> #topic Bugs
14:11:27 <mlavalle> First, we have a critical issue https://bugs.launchpad.net/neutron/+bug/1824571
14:11:28 <openstack> Launchpad bug 1824571 in neutron "l3agent can't create router if there are multiple external networks" [Critical,Confirmed] - Assigned to Miguel Lavalle (minsel)
14:11:52 <mlavalle> it was recently promoted to critical by slaweq
14:12:07 <mlavalle> so I better hurry up with a fix for this
14:12:52 <mlavalle> I have an environment ready to reproduce the issue
14:14:26 <mlavalle> Next bug is https://bugs.launchpad.net/neutron/+bug/1774459
14:14:28 <openstack> Launchpad bug 1774459 in neutron "Update permanent ARP entries for allowed_address_pair IPs in DVR Routers" [High,Confirmed] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
14:15:33 <mlavalle> We didn't have a chance to discuss this issue in Denver
14:15:51 <mlavalle> couldn't reach Swami over google hangouts
14:16:05 <mlavalle> but it seems he is making progress: https://review.opendev.org/#/c/651905/
14:16:09 <tidwellr> before he left Denver he wanted me to just mention he needs reviews
14:17:05 <mlavalle> in the commit message he indicates it is related to the bug
14:17:13 <mlavalle> are there more patches coming?
14:17:15 <tidwellr> there's also https://review.opendev.org/#/c/616272/ that I think is also related
14:18:26 <tidwellr> and maybe this one too https://review.opendev.org/#/c/601336/
14:19:22 <mlavalle> yes, these two also indicate in the commit message that they are related
14:19:37 * mlavalle leaving a note in the bug pointing to these 2 patches ^^^^
14:21:42 <mlavalle> Last one I have today is https://bugs.launchpad.net/neutron/+bug/1823038
14:21:43 <openstack> Launchpad bug 1823038 in neutron "Neutron-keepalived-state-change fails to check initial router state" [High,Confirmed]
14:22:13 <mlavalle> which seems to have already been fixed
14:22:55 <ralonsoh> not yet
14:23:03 <ralonsoh> I'm going to propose a patch for it
14:23:16 <ralonsoh> the agent is now run under neutron-rootwrap
14:23:21 <ralonsoh> and privsep is failing
14:23:46 <ralonsoh> so I'm removing this newly added code and keeping only the privsep initialization
14:24:09 <mlavalle> can I assign it to you?
14:24:19 <ralonsoh> I'm just helping Slawek
14:24:29 <ralonsoh> he knows the status of this patch
14:24:48 <mlavalle> ah ok
14:25:06 <ralonsoh> that's all
14:25:10 <mlavalle> thanks for the update :)
14:25:39 <mlavalle> any other bugs we should discuss today?
14:26:24 <liuyulong> One more https://bugs.launchpad.net/neutron/+bug/1821912
14:26:25 <openstack> Launchpad bug 1821912 in neutron "intermittent ssh failures in various scenario tests" [High,In progress] - Assigned to LIU Yulong (dragon889)
14:26:56 <liuyulong> Seems we hit this more and more frequently
14:27:54 <mlavalle> is it a work around?
14:27:58 <liuyulong> I have two directions of repair
14:28:20 <liuyulong> one: https://review.opendev.org/#/c/659009/ wait until the floating IP is active
14:28:38 <mlavalle> is this a work around? ^^^^
14:29:27 <liuyulong> It can be one work around
14:29:42 <liuyulong> the other is that we cannot rely on the nova DB instance status
14:30:03 <liuyulong> every time the guest OS is still booting when the test case tries to log in
14:30:52 <liuyulong> So I wonder if we can ping the fixed IP first, then try to log in
14:31:36 <liuyulong> But it seems tempest does not allow that now
14:31:41 <mlavalle> so what you are saying is that we don't have an underlying connectivity / authentication problem, but rather a testing problem?
14:32:06 <liuyulong> I have tried both in https://review.opendev.org/659009, but have now reverted back to only "waiting for floating IP status"
14:33:15 <liuyulong> The recently merged patches all have a lot of "recheck"s, maybe we can raise this bug's priority too.
14:33:48 <mlavalle> critical?
14:34:57 <liuyulong> Not entirely, Slawek mentioned that nova metadata may have something wrong.
14:35:17 <mlavalle> ok, a combination of causes
14:35:47 <tidwellr> liuyulong: just thinking out loud and maybe it's crazy, but I wonder if there's a way to set a static route on the host that would allow us to reach the fixed IP
14:37:51 <liuyulong> mlavalle, Slawek and I are now aiming in different directions
14:38:29 <liuyulong> I also noticed that the l3-agent may have a really long router processing time, 40s+ in some cases.
14:38:32 <mlavalle> liuyulong: sounds good to me. I am also adding a 3rd direction: tcpdump in the namespace
14:39:13 <liuyulong> tidwellr, I'm not quite sure, but if tempest can only reach the API, the route may not work.
14:39:55 <tidwellr> yep, it's just a thought
14:40:58 <liuyulong> This is really a tough one...
14:41:05 <mlavalle> yes it is
14:41:47 <mlavalle> anything else on this bug?
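[Editor's note: the first "direction of repair" liuyulong describes above, waiting for the floating IP to become ACTIVE before the scenario test attempts SSH, amounts to a small polling loop. The sketch below illustrates the idea only; the helper name, the client object, and the timeout values are assumptions for illustration and are not the code in the linked review.]

```python
import time


def wait_for_floating_ip_active(client, fip_id, timeout=120, interval=5):
    """Poll a floating IP until Neutron reports it ACTIVE.

    ``client`` is assumed to be a tempest-style Neutron floating IPs
    client exposing show_floatingip(); names and timeouts here are
    illustrative.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        fip = client.show_floatingip(fip_id)['floatingip']
        if fip['status'] == 'ACTIVE':
            return fip
        time.sleep(interval)
    raise RuntimeError('Floating IP %s not ACTIVE after %ss' % (fip_id, timeout))

# A scenario test would call this right after associating the floating IP
# and before opening the SSH connection to the instance.
```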
14:42:01 <liuyulong> not from me
14:42:18 <mlavalle> thanks for the update :-)
14:42:23 <mlavalle> any other bugs?
14:43:19 <haleyb> mlavalle: there's one in the open agenda section, but if we're in a buggy mood we can discuss it now
14:43:30 <mlavalle> shoot
14:43:40 <haleyb> https://bugs.launchpad.net/neutron/+bug/1818824
14:43:42 <openstack> Launchpad bug 1818824 in neutron "When a fip is added to a vm with dvr, previous connections loss the connectivity" [Low,In progress] - Assigned to Gabriele Cerami (gcerami)
14:44:03 <tidwellr> I saw the chatter about this in IRC yesterday
14:44:27 <haleyb> In short, there is a difference between DVR/centralized here
14:44:52 <panda> I tried to lay out some solutions, but a behavioural decision has to be made first
14:45:31 <haleyb> if an instance is using the default snat IP and a floating IP is associated, should we be deleting the conntrack entries for the existing connections?
14:46:32 <haleyb> i'm inclined to think we should always be cleaning them, since the instance should start using the floating IP
14:46:34 <tidwellr> is there a concrete example of a workload in a VM that is affected by this?
14:47:50 <haleyb> i don't think so. it's an edge case since in order to trigger something in an instance you need a floating IP to get in first (or log in from another instance on the private network)
14:48:15 <liuyulong> haleyb, +1, yes, it should stop the previous connection to save the SNAT node bandwidth.
14:48:42 <mlavalle> liuyulong always bringing the operator perspective
14:48:47 <mlavalle> nice
14:48:51 <tidwellr> I'm inclined to agree, once the FIP is associated force all traffic to use it
14:49:56 <haleyb> liuyulong: it doesn't happen with centralized routing today, you can have a connection continue to use the snat IP until it closes. DVR "breaks" it simply because it forces everything into the fip namespace where it dies
14:51:24 <mlavalle> I agree with haleyb and tidwellr
14:51:30 <haleyb> so it seems we agree the conntrack entries should have been cleaned. i think if we make that change soon-ish we'll be able to get some feedback if we break something during the T cycle
14:51:45 <mlavalle> yes!
14:51:50 <mlavalle> the sooner the better
14:51:53 <tidwellr> +1
14:51:55 <panda> for both DVR and non-DVR scenarios?
14:52:08 <haleyb> and i don't think we documented the behavior, so we should do that too
14:52:32 <panda> in DVR currently the connections just starve, they are not closed
14:52:49 <haleyb> panda: yes, both. with dvr it's essentially cleaned by the routing change, right?
14:53:14 <haleyb> as you say, starved since the connection is broken
14:53:45 <panda> haleyb: it's not cleaned at all, the packets try to follow the new route but they just die somewhere, so the connection clears after the timeout
14:53:58 <panda> I'm trying to understand if they need to be explicitly closed instead
14:54:12 <liuyulong> Could be a bug for centralized routers, since we never test that.
14:55:09 <mlavalle> I'd say explicitly close it
14:55:16 <liuyulong> For dvr with centralized floating IPs, what's the behavior now?
14:55:16 <haleyb> panda: right, but removing the stale conntrack entries would make the connection fail quickly and not time out slowly
14:55:47 <mlavalle> good point
14:55:54 <haleyb> liuyulong: that's a good question, don't know
14:56:03 <liuyulong> previous connection may stay, IMO
14:58:15 <panda> liuyulong: and have a different behaviour for the two scenarios?
I think the idea here was to look for consistency
14:59:29 <panda> my personal preference is to try and maintain the old connection, but just because I found it a good entry point to experiment and learn the code :)
14:59:46 <haleyb> if we had a floating IP assigned and it got removed, conntrack gets cleaned up, i think we should treat the default snat IP similarly - the (dis)association event flips which is used
14:59:47 <mlavalle> we are running out of time
15:00:00 <mlavalle> I lean towards consistency of behavior
15:00:13 <mlavalle> #endmeeting
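[Editor's note: the approach the team converged on, removing stale conntrack entries for the instance's fixed IP when a floating IP is (dis)associated so existing SNAT'd connections fail fast instead of starving until they time out, can be illustrated with a minimal sketch. The helper below just shells out to the conntrack tool inside the router namespace; the function name, namespace handling, and chosen flags are assumptions for illustration and are not the eventual Neutron patch.]

```python
import subprocess


def clear_fixed_ip_conntrack(router_ns, fixed_ip):
    """Delete conntrack entries whose original source is the fixed IP.

    Sketch only: runs ``conntrack -D`` inside the qrouter namespace so
    flows still using the default SNAT IP break immediately after a
    floating IP is associated. Namespace name and error handling are
    illustrative assumptions.
    """
    cmd = ['ip', 'netns', 'exec', router_ns,
           'conntrack', '-D', '--orig-src', fixed_ip]
    # conntrack exits non-zero when no entries match, so that case is
    # not treated as a hard failure here.
    subprocess.run(cmd, check=False)

# Example (hypothetical values): clear_fixed_ip_conntrack(
#     'qrouter-<router-id>', '10.0.0.5') right after the agent processes
# the floating IP association.
```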