14:02:52 <mlavalle> #startmeeting neutron_l3
14:02:53 <openstack> Meeting started Wed Mar  6 14:02:52 2019 UTC and is due to finish in 60 minutes.  The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:02:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:02:56 <openstack> The meeting name has been set to 'neutron_l3'
14:03:12 <haleyb> hi
14:03:15 <mlavalle> hi
14:03:24 <slaweq> hi
14:03:29 <ralonsoh> hi
14:03:33 <liuyulong> hi
14:03:38 <mlavalle> sorry for the delay. dog was not very cooperative during our walk. lots of stops to sniff
14:03:52 <slaweq> LOL
14:04:33 <njohnston_> o/
14:04:37 <davidsha> o/
14:04:38 <mlavalle> #topic Announcements
14:04:41 <haleyb> so a dog being a dog?
14:04:51 <mlavalle> more or less
14:04:58 <mlavalle> LOL
14:05:25 <mlavalle> We all know this is the week of the Stein-3 milestone
14:05:40 <mlavalle> so we are at the end of the cycle
14:06:44 <mlavalle> I see messages in my inbox about PTL non candidacies and candidacies.... so we are in PTL candidacy season
14:07:06 <mlavalle> any other announcements from the team?
14:07:37 <mlavalle> ok, let's move on
14:07:47 <mlavalle> ahhh, forgot
14:07:59 <mlavalle> #chair liuyulong
14:08:00 <openstack> Current chairs: liuyulong mlavalle
14:08:26 <mlavalle> liuyulong: just let me know when you are ready to run this meeting....
14:08:38 <mlavalle> #topic Bugs
14:09:01 <liuyulong> Lots of bugs I'm working on now, : )
14:09:14 <mlavalle> First bug for today is https://bugs.launchpad.net/neutron/+bug/1818334
14:09:15 <openstack> Launchpad bug 1818334 in neutron "Functional test test_concurrent_create_port_forwarding_update_port is failing" [Medium,Confirmed]
14:09:32 <mlavalle> we discussed this one yesterday during the CI meeting
14:09:42 <mlavalle> liuyulong: are you working on it?
14:09:58 <liuyulong> not yet
14:09:58 <mlavalle> slaweq: should we increase its priority?
14:10:31 <slaweq> mlavalle: I don't think so, this one isn't happening very often IIRC
14:10:40 <slaweq> but let me check in logstash
14:10:59 <mlavalle> yeah, I got a little more sense of urgency yesterday
14:11:39 <mlavalle> liuyulong: so you are planning to work on it. Can I assign it to you?
14:11:55 <slaweq> looks like it failed 5 times in last 7 days
14:11:59 <liuyulong> slaweq, yes, 20 times last 30 days.
14:12:06 <liuyulong> mlavalle, OK
14:12:40 <slaweq> IMO we have much more important issues currently, but we can set it to high as it impacts our gates
14:12:51 <liuyulong> +1
14:13:11 <mlavalle> ok
14:13:45 <liuyulong> The functional and fullstack tests are not so friendly to us.
14:13:47 <mlavalle> liuyulong: are you dragon889 in launchpad?
14:13:59 <liuyulong> mlavalle, yes
14:14:26 <mlavalle> ok assigned to you
14:14:40 <mlavalle> it would be good if you can get to it asap
14:15:26 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1818614
14:15:27 <openstack> Launchpad bug 1818614 in neutron "Various L3HA functional tests fails often" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
14:15:45 <slaweq> I started looking into this today
14:16:03 <mlavalle> slaweq: you're working on this one. Any comments?
14:16:06 <slaweq> for now I only pushed patch https://review.openstack.org/#/c/641127/
14:16:19 <slaweq> to add journal.log in functional tests logs
14:16:46 <slaweq> what I found in logs which I checked is that keepalived wasn't spawned for router, like http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%20%5C%22line%20690%2C%20in%20wait_until_true%5C%22
14:16:51 <slaweq> sorry wrong link
14:17:03 <slaweq> http://logs.openstack.org/74/640874/2/check/neutron-functional/37e3040/logs/dsvm-functional-logs/neutron.tests.functional.agent.l3.extensions.test_port_forwarding_extension.TestL3AgentFipPortForwardingExtensionDVR.test_dvr_ha_router_failover_without_gw.txt.gz#_2019-03-05_23_36_44_978
14:17:06 <slaweq> this one is good
14:17:21 <slaweq> but I don't know why keepalived is not starting
14:17:36 <slaweq> I couldn't of course reproduce this issue locally :/
14:17:51 <slaweq> but I will continue work on it and will update launchpad when I find something
14:17:54 <haleyb> looks like radvd didn't start either
14:18:13 <slaweq> haleyb: but radvd isn't necessary always I think
14:18:29 <slaweq> and IIRC it is like that in "good" runs too
14:18:44 <haleyb> ah, just noticed same error
14:20:18 <liuyulong> Has anyone noticed/checked whether there is some infra updating/upgrading going on?
14:20:31 <slaweq> I was looking e.g. on keepalived version
14:20:37 <slaweq> it's the same for very long time
14:20:46 <slaweq> so that is not the case here
14:20:52 <mlavalle> and it would reflect in other kind of tests
14:20:55 <mlavalle> wouldn't it?
14:21:00 <mlavalle> like scenario
14:21:06 <slaweq> probably
14:21:24 <slaweq> but also please note that (almost) all scenario jobs are already running on Bionic
14:21:36 <slaweq> and functional tests are still legacy jobs and running on Xenial
14:21:41 <mlavalle> ahhhh
14:21:44 <slaweq> maybe there is some difference there
14:21:56 <slaweq> I'm not sure
14:22:06 <mlavalle> yeah, there may be some difference
14:22:25 <slaweq> yep, so I will continue work on it probably today evening and tomorrow
14:22:52 <liuyulong> Has the dsvm-functional instance, I mean the virtual machine for devstack, changed?
14:23:41 <slaweq> liuyulong: I'm not 100% sure, maybe some packages were changed
14:24:09 <slaweq> I will try to compare with job from e.g. 2 weeks ago
14:24:17 <slaweq> but I'm not sure that this will help
14:24:47 <liuyulong> slaweq, cool, thanks
14:25:43 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1818015
14:25:44 <openstack> Launchpad bug 1818015 in neutron "VLAN manager removed external port mapping when it was still in use" [Critical,New]
14:26:04 <mlavalle> This one was classified as critical last week by bug deputy
14:26:25 <mlavalle> but we don't see it in our jobs
14:26:48 <mlavalle> and we don't have any other reports about it
14:26:52 <mlavalle> am I right?
14:27:46 <mlavalle> submitter indicates he cannot reproduce
14:28:06 <mlavalle> so I will lower priority to medium and respond with some questions
14:28:09 <mlavalle> makes sense?
14:28:24 <njohnston_> makes sense
14:28:58 <slaweq> yes, we should get some l3 and ovs agent logs from the time when this happened for them
14:29:07 <slaweq> maybe even You can mark it as incomplete for now?
14:29:37 <mlavalle> slaweq: yes, that's what I'll do
14:29:42 <slaweq> mlavalle++
14:29:59 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1795870
14:30:01 <openstack> Launchpad bug 1795870 in neutron "Trunk scenario test test_trunk_subport_lifecycle fails from time to time" [High,In progress] - Assigned to Miguel Lavalle (minsel)
14:30:36 <mlavalle> For this one I have two patches proposed as fix. We know they work. This is the latest run: http://logs.openstack.org/10/636710/5/check/neutron-tempest-plugin-dvr-multinode-scenario/24e0ec4/testr_results.html.gz
14:31:54 <slaweq> mlavalle: do You have links to those patches? or should I look for them in gerrit?
14:32:04 <haleyb> https://review.openstack.org/#/c/636710/ is one
14:32:22 <haleyb> https://review.openstack.org/#/c/639375/ is other
14:32:22 <mlavalle> https://review.openstack.org/#/c/639375
14:32:28 <mlavalle> is the other
14:32:46 <slaweq> thank You both haleyb and mlavalle :)
14:32:51 <haleyb> there are actually 4 in the bug, maybe the first two can be abandoned?
14:33:15 <mlavalle> haleyb: yes, I'll do that. the other two were really tests
14:33:51 <mlavalle> now I need to create a plausible test where:
14:33:58 <mlavalle> 1) a process is spawned
14:34:04 <slaweq> mlavalle: please check functional tests in those patches - it looks that failures might be related
14:34:27 <mlavalle> slaweq: I know the functional test I created didn't work
14:34:42 <slaweq> no, but some of existing tests are failing
14:35:18 <mlavalle> I will do that
14:35:26 <slaweq> thx
14:35:45 <mlavalle> really what I am trying to get at is to ask suggestions on the best way to test this
14:36:05 <mlavalle> we don't have tests for kill filters in our tree, do we?
14:36:30 <slaweq> nope AFAIK
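[Editor's note: for context on the kill filters being discussed, neutron's rootwrap filter definitions live in files like `etc/neutron/rootwrap.d/l3.filters`. The excerpt below is illustrative and reconstructed from memory, so verify the exact entries against the tree; the general KillFilter syntax is `<filter name>: KillFilter, <run-as user>, <executable>, <permitted signals>`:]

```ini
[Filters]
# KillFilter restricts which processes an agent may signal, and with
# which signals. Entries along these lines cover the L3 agent daemons:
kill_keepalived: KillFilter, root, keepalived, -HUP, -15, -9
kill_radvd: KillFilter, root, radvd, -15, -9, -HUP
```

[The point raised in the discussion is that a process which rewrites its command line via setproctitle may no longer match the executable name the filter expects.]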
14:37:14 <mlavalle> do you think a functional test is the best approach?
14:37:44 <slaweq> so do You want to spawn process and then try simply to kill it?
14:38:11 <mlavalle> yes, but that process has to call setproctitle
14:39:08 <mlavalle> to change its name
14:39:25 <mlavalle> its command^^^^
14:39:39 <slaweq> maybe we can add/change somehow existing L3 fullstack tests and check if after removing router e.g. keepalived processes are killed
14:39:51 <slaweq> (that's only an idea, I don't know if it's good one)
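[Editor's note: the test idea being discussed could be sketched roughly as below. This is an illustrative standalone sketch, not neutron code: the child here is a plain long-running Python subprocess standing in for keepalived/radvd, and the setproctitle renaming step from the actual bug is only noted in a comment.]

```python
import signal
import subprocess
import sys
import time


def spawn_and_kill(cmd, sig=signal.SIGTERM, timeout=10.0):
    """Spawn a child process, signal it, and wait for it to exit.

    In the real neutron scenario the child also calls setproctitle()
    to rewrite its command line, and the rootwrap kill filter must
    still match the renamed process -- that step is elided here.
    """
    proc = subprocess.Popen(cmd)
    assert proc.poll() is None, "child should still be running"
    proc.send_signal(sig)
    deadline = time.monotonic() + timeout
    while proc.poll() is None and time.monotonic() < deadline:
        time.sleep(0.05)
    assert proc.poll() is not None, "child did not exit in time"
    return proc.returncode


# A long-running child standing in for an L3 agent daemon:
child = [sys.executable, "-c", "import time; time.sleep(60)"]
rc = spawn_and_kill(child)
# On POSIX, a negative return code means "terminated by that signal".
```

[A fullstack variant, as suggested above, would instead remove a router and then assert that no keepalived process for it survives.]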
14:40:29 <mlavalle> ok, I'll keep digging
14:40:44 <mlavalle> let's move on
14:40:58 <mlavalle> any other bugs we should discuss today?
14:41:10 <haleyb> there were two new ones
14:41:18 <haleyb> https://bugs.launchpad.net/neutron/+bug/1818805
14:41:19 <openstack> Launchpad bug 1818805 in neutron "Conntrack rules in the qrouter are not deleted when a fip is removed with dvr" [Undecided,New]
14:41:32 <slaweq> haleyb: was faster than me with them :)
14:41:36 <slaweq> thx haleyb
14:41:51 <haleyb> i have not triaged yet, but can take a look
14:42:03 <mlavalle> ok, thanks
14:42:21 <haleyb> https://bugs.launchpad.net/neutron/+bug/1818824
14:42:23 <openstack> Launchpad bug 1818824 in neutron "When a fip is added to a vm with dvr, previous connections loss the connectivity" [Undecided,New]
14:42:52 <haleyb> it's related in that it's conntrack w/DVR, so maybe there is a regression there on matching connections
14:43:08 <slaweq> this one is "interesting" because it's different behaviour for dvr and non-dvr routers
14:44:17 <slaweq> so I guess it's a bug in the dvr implementation, as existing connections shouldn't be broken IMO, but maybe it's a "feature" and the bug is in the non-DVR solution :)
14:44:47 <haleyb> "feature", yes :)
14:45:29 <slaweq> :)
14:46:06 <mlavalle> are we saying this is not a bug?
14:46:24 <slaweq> I wanted to ask You as more experienced L3 experts :)
14:46:53 <slaweq> what is expected behaviour because it should be the same for each implementation IMO
14:46:57 <haleyb> no, just joking, i haven't fully looked at the second, but it's a difference in behavior between centralized/dvr so probably a bug
14:47:07 <haleyb> jinx
14:47:25 <haleyb> slaweq probably doesn't get that
14:47:32 <mlavalle> LOL
14:47:37 <slaweq> I got it :)
14:47:43 <mlavalle> so will you continue looking at it?
14:48:31 <haleyb> i can look at both as i've got a dvr setup running and should be easy to see the first i hope
14:48:49 <liuyulong> How do we transmit the 'previous connection' conntrack state from the network node to the compute node?
14:50:32 <haleyb> yeah, we can't do that.  i hadn't read it completely but now see that's the difference
14:50:45 <liuyulong> Centralized floating IPs may not have such an issue. : )
14:50:56 <haleyb> i don't think connections using the default snat IP should continue once a floating IP is assigned
14:51:18 <liuyulong> I mean dvr_no_external with centralized floating IPs.
14:52:07 <haleyb> right, but then we have different outcomes depending on deployment
14:53:33 <mlavalle> ok, let's move on
14:53:42 <mlavalle> any other bugs?
14:54:22 <mlavalle> ok
14:54:29 <mlavalle> #topic On demand agenda
14:54:43 <mlavalle> I have one additional topic
14:55:34 <mlavalle> in our downstream CI (Verizonmedia) we are seeing this unit test failing frequently: https://github.com/openstack/neutron/blob/master/neutron/tests/unit/scheduler/test_dhcp_agent_scheduler.py#L524
14:56:06 <mlavalle> do any of you remember seeing this failure upstream?
14:56:27 <slaweq> nope
14:56:35 <slaweq> at least I don't remember
14:56:50 <njohnston_> seems new and different to me
14:56:51 <mlavalle> yeah me neither
14:57:10 <haleyb> not particularly, but there were some changes in the dhcp agent regarding the network list building i thought, if it's related
14:57:29 <haleyb> oh, that's the scheduler, never mind
14:57:50 <mlavalle> haleyb: any quick pointers where to look?
14:58:45 <haleyb> mlavalle: i think they were agent changes, so maybe not related to this
14:58:56 <mlavalle> ok cool. thanks
14:59:06 <mlavalle> any other topics we should discuss today?
14:59:44 <mlavalle> ok, thanks for attending
14:59:54 <mlavalle> have a nice rest of your day
14:59:57 <davidsha> o/
14:59:58 <mlavalle> #endmeeting