14:02:52 <mlavalle> #startmeeting neutron_l3
14:02:53 <openstack> Meeting started Wed Mar 6 14:02:52 2019 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:02:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:02:56 <openstack> The meeting name has been set to 'neutron_l3'
14:03:12 <haleyb> hi
14:03:15 <mlavalle> hi
14:03:24 <slaweq> hi
14:03:29 <ralonsoh> hi
14:03:33 <liuyulong> hi
14:03:38 <mlavalle> sorry for the delay. dog was not very cooperative during our walk. lots of stops to sniff
14:03:52 <slaweq> LOL
14:04:33 <njohnston_> o/
14:04:37 <davidsha> o/
14:04:38 <mlavalle> #topic Announcements
14:04:41 <haleyb> so a dog being a dog?
14:04:51 <mlavalle> more or less
14:04:58 <mlavalle> LOL
14:05:25 <mlavalle> We all know this is the week of the Stein-3 milestone
14:05:40 <mlavalle> so we are at the end of the cycle
14:06:44 <mlavalle> I see messages in my inbox about PTL non candidacies and candidacies.... so we are in PTL candidacy season
14:07:06 <mlavalle> any other announcements from the team?
14:07:37 <mlavalle> ok, let's move on
14:07:47 <mlavalle> ahhh, forgot
14:07:59 <mlavalle> #chair liuyulong
14:08:00 <openstack> Current chairs: liuyulong mlavalle
14:08:26 <mlavalle> liuyulong: just let me know when you are ready to run this meeting....
14:08:38 <mlavalle> #topic Bugs
14:09:01 <liuyulong> Lots of bugs I'm working on now, : )
14:09:14 <mlavalle> First bug for today is https://bugs.launchpad.net/neutron/+bug/1818334
14:09:15 <openstack> Launchpad bug 1818334 in neutron "Functional test test_concurrent_create_port_forwarding_update_port is failing" [Medium,Confirmed]
14:09:32 <mlavalle> we discussed this one yesterday during the CI meeting
14:09:42 <mlavalle> liuyulong: are you working on it?
14:09:58 <liuyulong> not yet
14:09:58 <mlavalle> slaweq: should we increase its priority?
14:10:31 <slaweq> mlavalle: I don't think so, this one isn't happening very often IIRC
14:10:40 <slaweq> but let me check in logstash
14:10:59 <mlavalle> yeah, I got a little more sense of urgency yesterday
14:11:39 <mlavalle> liuyulong: so you are planning to work on it. Can I assign it to you?
14:11:55 <slaweq> looks like it failed 5 times in last 7 days
14:11:59 <liuyulong> slaweq, yes, 20 times in last 30 days.
14:12:06 <liuyulong> mlavalle, OK
14:12:40 <slaweq> IMO we have much more important issues currently, but we can set it to high as it impacts our gates
14:12:51 <liuyulong> +1
14:13:11 <mlavalle> ok
14:13:45 <liuyulong> The functional and fullstack tests don't look very friendly to us.
14:13:47 <mlavalle> liuyulong: are you dragon889 in launchpad?
14:13:59 <liuyulong> mlavalle, yes
14:14:26 <mlavalle> ok assigned to you
14:14:40 <mlavalle> it would be good if you can get to it asap
14:15:26 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1818614
14:15:27 <openstack> Launchpad bug 1818614 in neutron "Various L3HA functional tests fails often" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
14:15:45 <slaweq> I started looking into this today
14:16:03 <mlavalle> slaweq: you are working on this one. Any comments?
14:16:06 <slaweq> for now I only pushed patch https://review.openstack.org/#/c/641127/
14:16:19 <slaweq> to add journal.log to the functional tests logs
14:16:46 <slaweq> what I found in the logs which I checked is that keepalived wasn't spawned for the router, like http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%20%5C%22line%20690%2C%20in%20wait_until_true%5C%22
14:16:51 <slaweq> sorry wrong link
14:17:03 <slaweq> http://logs.openstack.org/74/640874/2/check/neutron-functional/37e3040/logs/dsvm-functional-logs/neutron.tests.functional.agent.l3.extensions.test_port_forwarding_extension.TestL3AgentFipPortForwardingExtensionDVR.test_dvr_ha_router_failover_without_gw.txt.gz#_2019-03-05_23_36_44_978
14:17:06 <slaweq> this one is good
14:17:21 <slaweq> but I don't know why keepalived is not starting
14:17:36 <slaweq> of course I couldn't reproduce this issue locally :/
14:17:51 <slaweq> but I will continue working on it and will update launchpad when I find something
14:17:54 <haleyb> looks like radvd didn't start either
14:18:13 <slaweq> haleyb: but radvd isn't always necessary I think
14:18:29 <slaweq> and IIRC it is like that in "good" runs too
14:18:44 <haleyb> ah, just noticed the same error
14:20:18 <liuyulong> Has anyone noticed/checked whether there is an infra update/upgrade going on?
14:20:31 <slaweq> I was looking e.g. at the keepalived version
14:20:37 <slaweq> it's been the same for a very long time
14:20:46 <slaweq> so that is not the case here
14:20:52 <mlavalle> and it would be reflected in other kinds of tests
14:20:55 <mlavalle> wouldn't it?
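[Editor's aside: the `wait_until_true` timeout referenced in the logstash query above comes from the functional tests' generic polling helper. Below is a minimal sketch of that pattern; the real helper lives in neutron.common.utils and differs in detail, and the names, timeouts, and toy predicate here are illustrative only, not the actual neutron code:]

```python
import time


class WaitTimeout(Exception):
    """Raised when the predicate does not become true in time."""


def wait_until_true(predicate, timeout=15, sleep=1):
    # Poll the predicate until it returns True or the deadline passes.
    # This mirrors the pattern the functional tests use to wait for
    # processes such as keepalived to come up after a router is created.
    deadline = time.time() + timeout
    while not predicate():
        if time.time() > deadline:
            raise WaitTimeout("predicate still false after %s seconds" % timeout)
        time.sleep(sleep)


# Toy usage: wait on a flag instead of, say, a keepalived PID check.
state = {"spawned": False}
state["spawned"] = True
wait_until_true(lambda: state["spawned"], timeout=2)
print("spawned")
```

[In the failing runs, the predicate that checks for the keepalived process presumably never becomes true, so the helper raises and the test fails at that `wait_until_true` line.]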
14:21:00 <mlavalle> like scenario
14:21:06 <slaweq> probably
14:21:24 <slaweq> but also please note that (almost) all scenario jobs are already running on Bionic
14:21:36 <slaweq> and functional tests are still legacy jobs and running on Xenial
14:21:41 <mlavalle> ahhhh
14:21:44 <slaweq> maybe there is some difference there
14:21:56 <slaweq> I'm not sure
14:22:06 <mlavalle> yeah, there may be some difference
14:22:25 <slaweq> yep, so I will continue working on it, probably today evening and tomorrow
14:22:52 <liuyulong> The dsvm-functional instance, I mean the virtual machine for devstack, has it changed?
14:23:41 <slaweq> liuyulong: I'm not 100% sure, maybe some packages were changed
14:24:09 <slaweq> I will try to compare with a job from e.g. 2 weeks ago
14:24:17 <slaweq> but I'm not sure that this will help
14:24:47 <liuyulong> slaweq, cool, thanks
14:25:43 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1818015
14:25:44 <openstack> Launchpad bug 1818015 in neutron "VLAN manager removed external port mapping when it was still in use" [Critical,New]
14:26:04 <mlavalle> This one was classified as critical last week by the bug deputy
14:26:25 <mlavalle> but we don't see it in our jobs
14:26:48 <mlavalle> and we don't have any other reports about it
14:26:52 <mlavalle> am I right?
14:27:46 <mlavalle> submitter indicates he cannot reproduce
14:28:06 <mlavalle> so I will lower the priority to medium and respond with some questions
14:28:09 <mlavalle> makes sense?
14:28:24 <njohnston_> makes sense
14:28:58 <slaweq> yes, we should get some l3 and ovs agent logs from the time when this happened for them
14:29:07 <slaweq> maybe You can even mark it as incomplete for now?
14:29:37 <mlavalle> slaweq: yes, that's what I'll do
14:29:42 <slaweq> mlavalle++
14:29:59 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1795870
14:30:01 <openstack> Launchpad bug 1795870 in neutron "Trunk scenario test test_trunk_subport_lifecycle fails from time to time" [High,In progress] - Assigned to Miguel Lavalle (minsel)
14:30:36 <mlavalle> For this one I have two patches proposed as a fix. We know they work. This is the latest run: http://logs.openstack.org/10/636710/5/check/neutron-tempest-plugin-dvr-multinode-scenario/24e0ec4/testr_results.html.gz
14:31:54 <slaweq> mlavalle: do You have links to those patches? or should I look for them in gerrit?
14:32:04 <haleyb> https://review.openstack.org/#/c/636710/ is one
14:32:22 <haleyb> https://review.openstack.org/#/c/639375/ is the other
14:32:22 <mlavalle> https://review.openstack.org/#/c/639375
14:32:28 <mlavalle> is the other
14:32:46 <slaweq> thank You both haleyb and mlavalle :)
14:32:51 <haleyb> there are actually 4 in the bug, maybe the first two can be abandoned?
14:33:15 <mlavalle> haleyb: yes, I'll do that. the other two were really tests
14:33:51 <mlavalle> now I need to create a plausible test where:
14:33:58 <mlavalle> 1) a process is spawned
14:34:04 <slaweq> mlavalle: please check the functional tests in those patches - it looks like the failures might be related
14:34:27 <mlavalle> slaweq: I know the functional test I created didn't work
14:34:42 <slaweq> no, but some of the existing tests are failing
14:35:18 <mlavalle> I will do that
14:35:26 <slaweq> thx
14:35:45 <mlavalle> really what I am trying to get at is to ask for suggestions on the best way to test this
14:36:05 <mlavalle> we don't have tests for kill filters in our tree, do we?
14:36:30 <slaweq> nope AFAIK
14:37:14 <mlavalle> do you think a functional test is the best approach?
14:37:44 <slaweq> so do You want to spawn a process and then simply try to kill it?
14:38:11 <mlavalle> yes, but that process has to call setproctitle
14:39:08 <mlavalle> to change its name
14:39:25 <mlavalle> its command^^^^
14:39:39 <slaweq> maybe we can add/change somehow the existing L3 fullstack tests and check if, after removing a router, e.g. the keepalived processes are killed
14:39:51 <slaweq> (that's only an idea, I don't know if it's a good one)
14:40:29 <mlavalle> ok, I'll keep digging
14:40:44 <mlavalle> let's move on
14:40:58 <mlavalle> any other bugs we should discuss today?
14:41:10 <haleyb> there were two new ones
14:41:18 <haleyb> https://bugs.launchpad.net/neutron/+bug/1818805
14:41:19 <openstack> Launchpad bug 1818805 in neutron "Conntrack rules in the qrouter are not deleted when a fip is removed with dvr" [Undecided,New]
14:41:32 <slaweq> haleyb: was faster than me with them :)
14:41:36 <slaweq> thx haleyb
14:41:51 <haleyb> i have not triaged yet, but can take a look
14:42:03 <mlavalle> ok, thanks
14:42:21 <haleyb> https://bugs.launchpad.net/neutron/+bug/1818824
14:42:23 <openstack> Launchpad bug 1818824 in neutron "When a fip is added to a vm with dvr, previous connections loss the connectivity" [Undecided,New]
14:42:52 <haleyb> it's related in that it's conntrack w/DVR, so maybe there is a regression there on matching connections
14:43:08 <slaweq> this one is "interesting" because it's different behaviour for dvr and non-dvr routers
14:44:17 <slaweq> so I guess it's a bug in the dvr implementation, as existing connections shouldn't be broken IMO, but maybe it's a "feature" and the bug is in the non-DVR solution :)
14:44:47 <haleyb> "feature", yes :)
14:45:29 <slaweq> :)
14:46:06 <mlavalle> are we saying this is not a bug?
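[Editor's aside: a hedged sketch of the spawn-and-kill functional test discussed earlier in this section. A real neutron test would have the child rename itself via setproctitle and would kill it through the rootwrap kill filters; that part is omitted here, and this stripped-down standard-library version only shows the spawn/kill/verify skeleton:]

```python
import signal
import subprocess
import sys

# Spawn a long-running child process. It stands in for an agent-managed
# daemon such as keepalived; in the real test the child would also call
# setproctitle to change its command name before we try to kill it.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"])

# The child should still be running right after spawning.
assert proc.poll() is None, "child should still be running"

# Terminate it the way the agent would (SIGTERM), then reap it and
# verify it is really gone.
proc.send_signal(signal.SIGTERM)
proc.wait(timeout=5)

# A negative returncode means the process died from that signal number.
assert proc.returncode == -signal.SIGTERM
print("child killed and reaped")
```

[The fullstack variant slaweq suggests above would replace the toy child with a real router's keepalived processes and assert they disappear after the router is removed.]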
14:46:24 <slaweq> I wanted to ask You as more experienced L3 experts :)
14:46:53 <slaweq> what is the expected behaviour, because it should be the same for each implementation IMO
14:46:57 <haleyb> no, just joking, i haven't fully looked at the second, but it's a difference in behavior between centralized/dvr so probably a bug
14:47:07 <haleyb> jinx
14:47:25 <haleyb> slaweq probably doesn't get that
14:47:32 <mlavalle> LOL
14:47:37 <slaweq> I got it :)
14:47:43 <mlavalle> so will you continue looking at it?
14:48:31 <haleyb> i can look at both as i've got a dvr setup running, and the first should be easy to see i hope
14:48:49 <liuyulong> How do we transmit the 'previous connection' conntrack state from the network node to the compute node?
14:50:32 <haleyb> yeah, we can't do that. i hadn't read it completely but now see that's the difference
14:50:45 <liuyulong> Centralized floating IPs may not have such an issue. : )
14:50:56 <haleyb> i don't think connections using the default snat IP should continue once a floating IP is assigned
14:51:18 <liuyulong> I mean dvr_no_external with centralized floating IPs.
14:52:07 <haleyb> right, but then we have different outcomes depending on deployment
14:53:33 <mlavalle> ok, let's move on
14:53:42 <mlavalle> any other bugs?
14:54:22 <mlavalle> ok
14:54:29 <mlavalle> #topic On demand agenda
14:54:43 <mlavalle> I have one additional topic
14:55:34 <mlavalle> in our downstream CI (Verizonmedia) we are seeing this unit test failing frequently: https://github.com/openstack/neutron/blob/master/neutron/tests/unit/scheduler/test_dhcp_agent_scheduler.py#L524
14:56:06 <mlavalle> do any of you remember seeing this failure upstream?
14:56:27 <slaweq> nope
14:56:35 <slaweq> at least I don't remember
14:56:50 <njohnston_> seems new and different to me
14:56:51 <mlavalle> yeah me neither
14:57:10 <haleyb> not particularly, but there were some changes in the dhcp agent regarding the network list building i thought, if it's related
14:57:29 <haleyb> oh, that's the scheduler, never mind
14:57:50 <mlavalle> haleyb: any quick pointers where to look?
14:58:45 <haleyb> mlavalle: i think they were agent changes, so maybe not related to this
14:58:56 <mlavalle> ok cool. thanks
14:59:06 <mlavalle> any other topics we should discuss today?
14:59:44 <mlavalle> ok, thanks for attending
14:59:54 <mlavalle> have a nice rest of your day
14:59:57 <davidsha> o/
14:59:58 <mlavalle> #endmeeting