16:00:01 <slaweq> #startmeeting neutron_ci
16:00:01 <openstack> Meeting started Tue Sep 24 16:00:01 2019 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:05 <njohnston> o/
16:00:06 <openstack> The meeting name has been set to 'neutron_ci'
16:00:07 <slaweq> hi (again) :)
16:00:28 <ralonsoh> hi
16:00:31 <bcafarel> long time no see :)
16:00:51 <njohnston> lol
16:01:13 <slaweq> :)
16:01:17 <slaweq> ok, let's start
16:01:26 <slaweq> #topic Actions from previous meetings
16:01:30 <slaweq> first one is
16:01:40 <slaweq> mlavalle to continue investigating router migrations issue
16:02:32 <slaweq> but I think mlavalle is not here now
16:02:34 <bcafarel> (also open up Grafana dashboard in some tab: http://grafana.openstack.org/dashboard/db/neutron-failure-rate )
16:02:39 <slaweq> bcafarel: right
16:02:48 <slaweq> I forgot about it, thx for reminder
16:03:08 <slaweq> let's move on to the second action from last week
16:03:18 <slaweq> ralonsoh to check if https://review.opendev.org/#/c/679428/ should be backported to stable branches
16:03:33 <ralonsoh> no, that's not needed
16:03:43 <ralonsoh> this method is not in stable branches
16:04:17 <slaweq> ralonsoh: thx, that's good
16:04:27 <slaweq> ok, next one
16:04:29 <slaweq> ralonsoh to report bug and investigate issue with neutron_tempest_plugin.scenario.test_qos.QoSTest.test_qos_basic_and_update
16:05:08 <ralonsoh> yes, that was related to https://bugs.launchpad.net/neutron/+bug/1833721
16:05:08 <openstack> Launchpad bug 1833721 in neutron "ip_lib synchronized decorator should wrap the privileged one" [Medium,Fix released] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:05:20 <ralonsoh> #link https://review.opendev.org/#/c/683109/
16:05:51 <slaweq> really
16:06:00 <slaweq> ?
16:06:15 <ralonsoh> sorry
16:06:18 <slaweq> but why was it always causing this one test to fail? do You know?
16:06:34 <ralonsoh> yes, the problem was related to an ip_lib method timeout
16:07:14 <ralonsoh> sorry, I mixed up the bugs
16:07:21 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1844516
16:07:21 <openstack> Launchpad bug 1844516 in neutron "[neutron-tempest-plugin] SSH timeout exceptions when executing remote commands" [Medium,Fix released] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:07:36 <ralonsoh> #link https://review.opendev.org/#/c/682864/
16:07:49 <ralonsoh> that was the real problem for the qos test
16:08:11 <slaweq> ralonsoh: ok, that sounds more correct :)
16:08:26 <slaweq> so we should be fine with this issue now, thx a lot
16:08:37 <ralonsoh> during the BW test the ssh connection was raising a timeout exception
16:08:47 <ralonsoh> so this patch tries to mitigate that
16:10:18 <ralonsoh> (that's all from my side)
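For context on the mitigation discussed above, here is a minimal sketch of retrying a remote command that can hit transient SSH read timeouts, as can happen while a bandwidth test keeps the link saturated. The exec_remote callable and its arguments are hypothetical placeholders, not tempest or neutron-tempest-plugin API, and this is not necessarily what the linked patch does.

    import socket
    import time


    def run_with_retries(exec_remote, cmd, attempts=3, delay=5):
        """Run a remote command, retrying on transient SSH read timeouts.

        exec_remote is a hypothetical callable (e.g. wrapping paramiko's
        exec_command with a channel timeout) that raises socket.timeout
        when the peer is too slow to answer.
        """
        for attempt in range(1, attempts + 1):
            try:
                return exec_remote(cmd)
            except socket.timeout:
                if attempt == attempts:
                    raise
                # A saturated link (as in the QoS bandwidth test) can make a
                # single read exceed the channel timeout; back off and retry.
                time.sleep(delay)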
16:10:22 <slaweq> thx ralonsoh
16:10:48 <slaweq> ok, next one
16:10:51 <slaweq> also on ralonsoh :)
16:10:56 <slaweq> ralonsoh to report bug with "Multiple possible networks found"
16:11:07 <ralonsoh> yes one sec
16:11:40 <ralonsoh> #link https://bugs.launchpad.net/tempest/+bug/1844568
16:11:40 <openstack> Launchpad bug 1844568 in tempest "[compute] "create_test_server" if networks is undefined and more than one network is present" [Undecided,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:11:48 <ralonsoh> and the patch #link https://review.opendev.org/#/c/682964/
16:12:10 <ralonsoh> the problem is, sometimes, when a port is created without specifying the network
16:12:15 <slaweq> ahh, You even created a patch. Nice :)
16:12:26 <ralonsoh> there is more than one network, so Nova fails
16:12:50 <ralonsoh> with exception "NetworkAmbiguous"
16:12:52 <ralonsoh> that's all
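A minimal sketch of the failure mode described above: if the boot request leaves the network unspecified while more than one network is visible, Nova cannot pick one and fails with NetworkAmbiguous, so the test helper has to resolve a network explicitly. compute_client and network_client are hypothetical stand-ins, not the actual tempest clients, and this is not necessarily how the linked tempest patch fixes it.

    def create_server_with_explicit_network(compute_client, network_client,
                                            name, flavor, image,
                                            preferred_net_name=None):
        """Boot a server, always passing an explicit network.

        Leaving 'networks' out when several networks are visible makes Nova
        fail with NetworkAmbiguous; choosing one up front avoids that.
        """
        nets = network_client.list_networks()['networks']
        if preferred_net_name:
            nets = [n for n in nets if n['name'] == preferred_net_name]
        if not nets:
            raise RuntimeError('no usable network found')
        return compute_client.create_server(
            name=name, flavorRef=flavor, imageRef=image,
            networks=[{'uuid': nets[0]['id']}])  # explicit choice, no ambiguity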
16:13:11 <slaweq> thx for update ralonsoh
16:13:25 <slaweq> and the last one was
16:13:27 <slaweq> slaweq to report bug with fullstack test_bw_limit_qos_port_removed test
16:13:37 <slaweq> I reported bug today https://bugs.launchpad.net/neutron/+bug/1845176
16:13:38 <openstack> Launchpad bug 1845176 in neutron "Removing of QoS queue in neutron-ovs-agent fails due to existing references" [Medium,Confirmed]
16:13:46 <slaweq> but didn't have time to look into it more
16:14:41 <ralonsoh> maybe I can take a look at this one tomorrow
16:14:49 <slaweq> I will leave it unassigned for now, maybe someone will want to take a look
16:14:55 <ralonsoh> I'll ping you if I start looking at this one
16:14:56 <slaweq> if not I will try to check it
16:15:02 <slaweq> ralonsoh: sure, thx :)
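For background on bug 1845176: OVSDB enforces referential integrity, so a Queue row cannot be deleted while a QoS row still points at it; the reference has to be cleared (or the QoS record removed) first. Below is a rough illustration using ovs-vsctl via subprocess; the UUIDs are placeholders, and this is only a guess at the failure mechanism, not an analysis of the bug.

    import subprocess


    def delete_queue(qos_uuid, queue_uuid):
        """Remove an OVS Queue row that is still referenced by a QoS row.

        Destroying the Queue directly while QoS.queues still references it
        is rejected by ovsdb-server, so drop the reference first.
        """
        subprocess.run(
            ['ovs-vsctl', 'clear', 'qos', qos_uuid, 'queues'], check=True)
        subprocess.run(
            ['ovs-vsctl', 'destroy', 'queue', queue_uuid], check=True)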
16:15:24 <slaweq> that was all actions from last week which I had
16:15:32 <slaweq> so let's move on to the next topic
16:15:33 <slaweq> #topic Stadium projects
16:15:50 <slaweq> Python 3 migration
16:15:56 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:16:13 <slaweq> njohnston: do You have any updates on this maybe?
16:17:22 <njohnston> No, I have not seen lajoskatona or yamamoto online
16:17:36 <njohnston> I need to check on the status of bagpipe, I had thought we were done with that
16:18:33 <bcafarel> all reviews seem merged, so maybe it's just missing an etherpad update, yes
16:18:40 <slaweq> it seems so
16:18:58 <slaweq> njohnston: will You check that and update etherpad or should I do it?
16:20:04 <njohnston> I'll do it
16:20:10 <slaweq> njohnston: thx a lot
16:20:21 <njohnston> #action njohnston update python 3 stadium etherpad for bagpipe
16:20:55 <slaweq> ok, next is
16:20:57 <slaweq> tempest-plugins migration
16:21:02 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:21:33 <slaweq> for neutron-dynamic-routing patch is ready for review https://review.opendev.org/#/c/652099
16:21:43 <slaweq> please take a look at it if You have a few minutes
16:21:49 <bcafarel> thanks tidwellr for fixing my nitpicking comment :)
16:22:21 <bcafarel> also related small patch: https://review.opendev.org/#/c/683935/
16:23:19 <slaweq> thx bcafarel +2 already
16:23:33 <bcafarel> that was fast!
16:23:39 <slaweq> :)
16:23:47 <ralonsoh> +2
16:24:17 <njohnston> +2+W
16:24:21 <bcafarel> :)
16:24:28 <slaweq> ok, anything else related to stadium projects You want to discuss today?
16:25:03 <njohnston> I had a little gate fix for neutron-fwaas functional tests
16:25:04 <njohnston> https://review.opendev.org/#/c/684386/
16:25:15 <njohnston> take a look; it's blocking the PDF goal merge for that project
16:25:40 <slaweq> njohnston: https://review.opendev.org/#/c/652812/2
16:25:45 <slaweq> this one was first I think :)
16:26:05 <slaweq> and it seems to be fixing the same issue, right?
16:26:19 <njohnston> indeed!  I did not catch that in my search; we can push that one instead
16:26:37 <slaweq> it has been there since April
16:26:39 <slaweq> LOL
16:26:45 <slaweq> I found it just today
16:27:47 <slaweq> njohnston: thx for bringing it up here :)
16:28:05 <ralonsoh> (we should be more careful when changing class definitions)
16:28:35 <ralonsoh> (starting with me)
16:28:48 <slaweq> yes, we should
16:29:25 <slaweq> that's why at one of the last PTGs we agreed to add some stadium projects' jobs to the neutron check queue
16:29:37 <slaweq> but only networking-ovn and midonet were interested in that
16:29:51 <slaweq> other stadium projects didn't propose such jobs
16:30:07 <slaweq> ahh, sorry
16:30:15 <slaweq> there is also ironic job and openstacksdk now
16:30:30 <slaweq> and tripleo :)
16:31:02 <ralonsoh> lots of fun now
16:31:07 <slaweq> :)
16:31:53 <slaweq> but we will discuss the future of stadium projects in Shanghai so we can also think about their CI :)
16:32:07 <slaweq> ok, lets move on
16:32:16 <slaweq> #topic Grafana
16:32:16 <bcafarel> sounds like an interesting subtopic yes
16:32:30 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:33:30 <ralonsoh> I think we have improved the stability last week
16:34:20 <slaweq> ralonsoh: yes, but I still see quite high numbers on functional/fullstack jobs
16:34:45 <slaweq> and also grenade jobs are failing about 20% of the time this week :/
16:34:56 <ralonsoh> I'm trying to debug all those FT/fullstack ones
16:35:21 <slaweq> for those I have some examples today
16:35:50 <slaweq> also UT jobs are failing around 20-30% of the time recently
16:36:04 <slaweq> but for those I didn't find any specific issue that wasn't related to the patch it was running on
16:36:13 <slaweq> so I would not bother with that too much for now
16:37:07 <ralonsoh> slaweq, if you find something suspicious, ping me
16:37:14 <slaweq> ralonsoh: sure, thx
16:37:27 <slaweq> any other findings from grafana?
16:38:05 <slaweq> ok, let's move on then
16:38:45 <slaweq> #topic fullstack/functional
16:38:59 <slaweq> for functional tests I found a couple of issues
16:39:05 <slaweq> neutron.tests.functional.agent.linux.test_l3_tc_lib.TcLibTestCase.test_get_existing_filter_ids
16:39:11 <slaweq> https://99b34ecc9afda69f8d26-4c2619fbf66a72a5befb0d8c52d9c271.ssl.cf1.rackcdn.com/682418/7/check/neutron-functional-python27/fb1cc17/testr_results.html.gz - similar issue to the one we saw last week,
16:39:46 <ralonsoh> but I think this is a problem in the testcase library
16:39:48 <ralonsoh> in py2
16:40:17 <slaweq> but what is causing such a problem?
16:40:25 <slaweq> we have to have some trigger for it IMO
16:40:46 <ralonsoh> slaweq, I tried to debug this but not too much
16:40:53 <ralonsoh> because it is not happening in Py3
16:41:06 <slaweq> yes, I saw it twice in py2 job
16:41:29 <slaweq> maybe we can just live with it for a few more weeks and then it will be gone when we finally get rid of py2
16:42:02 <ralonsoh> that's the spirit: if a test fails, remove it!!!!
16:42:11 <slaweq> ralonsoh: LOL
16:42:12 <ralonsoh> +1
16:42:21 <slaweq> but don't tell anyone ;)
16:42:31 <bcafarel> so, use secret.review.openstack.org ?
16:42:38 <ralonsoh> hehehehe
16:42:38 <bcafarel> s/openstack/opendev/
16:42:52 <slaweq> in the past I was proposing here to use https://pypi.org/project/pytest-vw/ module
16:43:04 <slaweq> but we are not using pytest so it's hard :/
16:43:06 <slaweq> :D
16:43:42 <slaweq> ok, now seriously :)
16:43:49 <slaweq> another issue which I found
16:43:57 <slaweq> neutron.tests.functional.agent.linux.test_iptables.IptablesManagerTestCase.test_tcp_output
16:44:02 <slaweq> https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8fe/682418/7/gate/neutron-functional/8fe4fa5/testr_results.html.gz - looks like something new to me, needs to be checked,
16:44:15 <ralonsoh> I didn't find the problem there
16:44:34 <ralonsoh> IMO, the rule was never applied
16:45:06 <slaweq> ralonsoh: in this test_tcp_output ?
16:45:11 <ralonsoh> yes
16:45:23 <slaweq> so that is the problem, no?
16:45:28 <ralonsoh> yes
16:45:38 <ralonsoh> but I can't confirm this 100%
16:46:12 <slaweq> I will take a look into this more deeply this week
16:46:39 <slaweq> #action slaweq to investigate issue with neutron.tests.functional.agent.linux.test_iptables.IptablesManagerTestCase.test_tcp_output
16:46:57 <slaweq> ok, another one:
16:46:59 <slaweq> neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_spawns_conflicting_pid_base_process
16:47:06 <slaweq> https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c52/683915/2/gate/neutron-functional-python27/c529449/testr_results.html.gz
16:47:15 <ralonsoh> yes this one
16:47:26 <ralonsoh> #link https://review.opendev.org/#/c/684249/
16:47:39 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1845150
16:47:39 <openstack> Launchpad bug 1845150 in neutron "[FT] "keepalived" needs network interfaces configured as in its own config" [High,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:48:12 <slaweq> ralonsoh: are You sure this is the real reason for the failure?
16:48:25 <slaweq> I was checking the journal log from this job today
16:48:39 <ralonsoh> I think so, the keepalived process stops when eth0 does not have an IP
16:49:11 <ralonsoh> and this may cause the test to fail
16:49:29 <ralonsoh> I found that when developing https://review.opendev.org/#/c/681671/
16:50:00 <ralonsoh> the os.kill() process didn't work because the keepalived process was not there anymore
16:50:03 <slaweq> but I found many occurrences of "Cannot find an IP address to use for interface" in the journal log
16:50:11 <ralonsoh> I know
16:50:16 <slaweq> so it seems it was the same in many other, passing tests
16:51:17 <ralonsoh> the problem is that keepalived does not stop working immediately
16:51:51 <slaweq> ahh, so some race
16:51:57 <ralonsoh> I think so
16:52:07 <ralonsoh> But, of course, there could be another problem
16:52:09 <slaweq> and in most cases the test is finished before keepalived crashes
16:52:13 <ralonsoh> but IMO this patch is legit
16:52:16 <slaweq> makes sense
16:52:24 <slaweq> yes, I just +2'ed this patch
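A minimal sketch of the kind of setup the keepalived fix implies: give the interface named in keepalived.conf an address before spawning the daemon, so keepalived doesn't exit mid-test because it can't find an IP to use. It uses pyroute2 directly with a dummy interface; names and addresses are illustrative, root privileges are assumed, and this is not a copy of the actual patch.

    from pyroute2 import IPRoute


    def prepare_keepalived_interface(ifname='test-vrrp0',
                                     address='192.168.100.10', prefixlen=24):
        """Create a dummy interface with an IP so keepalived can use it."""
        ip = IPRoute()
        try:
            ip.link('add', ifname=ifname, kind='dummy')
            idx = ip.link_lookup(ifname=ifname)[0]
            ip.link('set', index=idx, state='up')
            ip.addr('add', index=idx, address=address, prefixlen=prefixlen)
        finally:
            ip.close()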
16:53:06 <slaweq> ok, and the last one from functional tests:
16:53:08 <slaweq> neutron.tests.functional.agent.linux.test_bridge_lib.FdbInterfaceTestCase.test_add_delete(no_namespace)
16:53:18 <slaweq> https://9722deaa313ebebb56dc-c08b881decb3106ff13d720dd4a26025.ssl.cf5.rackcdn.com/681846/5/check/neutron-functional-python27/c3dc48a/testr_results.html.gz
16:53:26 <ralonsoh> again with those test cases (I'm the father)
16:53:36 <slaweq> LOL
16:53:47 <ralonsoh> I pushed a patch to make the interface names random
16:53:54 <ralonsoh> this should not happen
16:54:02 <ralonsoh> I'll take a look at this tomorrow
16:54:07 <slaweq> thx
16:54:29 <slaweq> #action ralonsoh to take a look at failing neutron.tests.functional.agent.linux.test_bridge_lib.FdbInterfaceTestCase.test_add_delete(no_namespace)
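For reference, a small sketch of what "make the interface names random" can look like: a per-test name with a random suffix, kept under the Linux 15-character interface-name limit so parallel test cases don't collide. The prefix and helper name are illustrative, not the ones used in the actual patch.

    import random
    import string

    MAX_IFNAME_LEN = 15  # Linux IFNAMSIZ minus the trailing NUL


    def rand_ifname(prefix='fdbtest'):
        """Return a random interface name so parallel tests don't collide."""
        suffix_len = MAX_IFNAME_LEN - len(prefix)
        suffix = ''.join(random.choice(string.ascii_lowercase + string.digits)
                         for _ in range(suffix_len))
        return prefix + suffix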
16:54:49 <slaweq> ok, that's all for functional tests from me
16:54:59 <slaweq> anything else You have regarding functional job?
16:55:37 <slaweq> ok, I take that as a no
16:55:40 <slaweq> so fullstack
16:55:48 <slaweq> neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement
16:55:54 <slaweq> https://d3359cf0499b7fa3a209-a0ae01eb6742268974ea7eef585da77c.ssl.cf1.rackcdn.com/681846/5/check/neutron-fullstack/014ed7d/testr_results.html.gz
16:56:12 <slaweq> I think it would be good if rubasov could maybe take a look at this one
16:56:47 <slaweq> I will ask him tomorrow if he will have some time to take a look
16:57:17 <slaweq> #action slaweq to ask rubasov if he can check neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement
16:57:31 <slaweq> and the last one for today:
16:57:33 <slaweq> neutron.tests.fullstack.test_l3_agent.TestLegacyL3Agent.test_mtu_update
16:57:37 <slaweq> https://df0eb3e2e26f1607f7d8-b5f72c94f829be93029a2756be493e29.ssl.cf2.rackcdn.com/679813/2/gate/neutron-fullstack/dfbde3f/testr_results.html.gz
16:58:19 <ralonsoh> I didn't find the error there; the L3 agent is setting the MTU on the interface
16:59:01 <slaweq> I will take a deeper look at that one this week
16:59:14 <slaweq> #action slaweq to investigate neutron.tests.fullstack.test_l3_agent.TestLegacyL3Agent.test_mtu_update
16:59:22 <slaweq> we are almost out of time now
16:59:51 <slaweq> I just want to ask You for review of 2 patches https://review.opendev.org/#/c/683853/ and https://review.opendev.org/#/c/681607/ if You have some time
16:59:55 <slaweq> thx in advance :)
17:00:00 <slaweq> and we are out of time
17:00:04 <slaweq> thx for attending
17:00:07 <ralonsoh> bye
17:00:09 <slaweq> #endmeeting