16:00:01 <slaweq> #startmeeting neutron_ci
16:00:01 <openstack> Meeting started Tue Sep 24 16:00:01 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:05 <njohnston> o/
16:00:06 <openstack> The meeting name has been set to 'neutron_ci'
16:00:07 <slaweq> hi (again) :)
16:00:28 <ralonsoh> hi
16:00:31 <bcafarel> long time no see :)
16:00:51 <njohnston> lol
16:01:13 <slaweq> :)
16:01:17 <slaweq> ok, let's start
16:01:26 <slaweq> #topic Actions from previous meetings
16:01:30 <slaweq> first one is
16:01:40 <slaweq> mlavalle to continue investigating router migrations issue
16:02:32 <slaweq> but I think mlavalle is not here now
16:02:34 <bcafarel> (also open up Grafana dashboard in some tab: http://grafana.openstack.org/dashboard/db/neutron-failure-rate )
16:02:39 <slaweq> bcafarel: right
16:02:48 <slaweq> I forgot about it, thx for the reminder
16:03:08 <slaweq> let's move on to the second action from last week
16:03:18 <slaweq> ralonsoh to check if https://review.opendev.org/#/c/679428/ should be backported to stable branches
16:03:33 <ralonsoh> no, that's not needed
16:03:43 <ralonsoh> this method is not in stable branches
16:04:17 <slaweq> ralonsoh: thx, that's good
16:04:27 <slaweq> ok, next one
16:04:29 <slaweq> ralonsoh to report bug and investigate issue with neutron_tempest_plugin.scenario.test_qos.QoSTest.test_qos_basic_and_update
16:05:08 <ralonsoh> yes, that was related to https://bugs.launchpad.net/neutron/+bug/1833721
16:05:08 <openstack> Launchpad bug 1833721 in neutron "ip_lib synchronized decorator should wrap the privileged one" [Medium,Fix released] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:05:20 <ralonsoh> #link https://review.opendev.org/#/c/683109/
16:05:51 <slaweq> really
16:06:00 <slaweq> ?
16:06:15 <ralonsoh> sorry
16:06:18 <slaweq> but why was it always causing a failure of this one test? do You know?
16:06:34 <ralonsoh> yes, the problem was related to an ip_lib method timeout
16:07:14 <ralonsoh> sorry, I mixed up the bugs
16:07:21 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1844516
16:07:21 <openstack> Launchpad bug 1844516 in neutron "[neutron-tempest-plugin] SSH timeout exceptions when executing remote commands" [Medium,Fix released] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:07:36 <ralonsoh> #link https://review.opendev.org/#/c/682864/
16:07:49 <ralonsoh> that was the real problem for the qos test
16:08:11 <slaweq> ralonsoh: ok, that sounds more correct :)
16:08:26 <slaweq> so we should be fine with this issue now, thx a lot
16:08:37 <ralonsoh> during the BW test the ssh connection was raising a timeout exception
16:08:47 <ralonsoh> so this patch tries to mitigate that
16:10:18 <ralonsoh> (that's all from my side)
16:10:22 <slaweq> thx ralonsoh
16:10:48 <slaweq> ok, next one
16:10:51 <slaweq> also on ralonsoh :)
16:10:56 <slaweq> ralonsoh to report bug with "Multiple possible networks found"
16:11:07 <ralonsoh> yes one sec
16:11:40 <ralonsoh> #link https://bugs.launchpad.net/tempest/+bug/1844568
16:11:40 <openstack> Launchpad bug 1844568 in tempest "[compute] "create_test_server" if networks is undefined and more than one network is present" [Undecided,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:11:48 <ralonsoh> and the patch #link https://review.opendev.org/#/c/682964/
16:12:10 <ralonsoh> the problem is, sometimes, when a port is created without specifying the network
16:12:15 <slaweq> ahh, You even created a patch. Nice :)
16:12:26 <ralonsoh> there is more than one network, so Nova fails
16:12:50 <ralonsoh> with exception "NetworkAmbiguous"
16:12:52 <ralonsoh> that's all
16:13:11 <slaweq> thx for the update ralonsoh
16:13:25 <slaweq> and the last one was
16:13:27 <slaweq> slaweq to report bug with fullstack test_bw_limit_qos_port_removed test
16:13:37 <slaweq> I reported a bug today https://bugs.launchpad.net/neutron/+bug/1845176
16:13:38 <openstack> Launchpad bug 1845176 in neutron "Removing of QoS queue in neutron-ovs-agent fails due to existing references" [Medium,Confirmed]
16:13:46 <slaweq> but didn't have time to look into it more
16:14:41 <ralonsoh> maybe I can take a look at this one tomorrow
16:14:49 <slaweq> I will leave it unassigned for now, maybe someone will want to take a look
16:14:55 <ralonsoh> I'll ping you if I start looking at this one
16:14:56 <slaweq> if not I will try to check it
16:15:02 <slaweq> ralonsoh: sure, thx :)
16:15:24 <slaweq> those were all the actions from last week which I had
16:15:32 <slaweq> so let's move on to the next topic
16:15:33 <slaweq> #topic Stadium projects
16:15:50 <slaweq> Python 3 migration
16:15:56 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:16:13 <slaweq> njohnston: do You have any updates on this maybe?
16:17:22 <njohnston> No, I have not seen lajoskatona or yamamoto online
16:17:36 <njohnston> I need to check on the status of bagpipe, I had thought we were done with that
16:18:33 <bcafarel> all reviews seem merged, so maybe just missing an etherpad update yes
16:18:40 <slaweq> it seems so
16:18:58 <slaweq> njohnston: will You check that and update the etherpad or should I do it?
16:20:04 <njohnston> I'll do it
16:20:10 <slaweq> njohnston: thx a lot
16:20:21 <njohnston> #action njohnston update python 3 stadium etherpad for bagpipe
16:20:55 <slaweq> ok, next is
16:20:57 <slaweq> tempest-plugins migration
16:21:02 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:21:33 <slaweq> for neutron-dynamic-routing the patch is ready for review https://review.opendev.org/#/c/652099
16:21:43 <slaweq> please take a look at it if You have a few minutes
16:21:49 <bcafarel> thanks tidwellr for fixing my nitpicking comment :)
16:22:21 <bcafarel> also related small patch: https://review.opendev.org/#/c/683935/
16:23:19 <slaweq> thx bcafarel +2 already
16:23:33 <bcafarel> that was fast!
16:23:39 <slaweq> :)
16:23:47 <ralonsoh> +2
16:24:17 <njohnston> +2+W
16:24:21 <bcafarel> :)
16:24:28 <slaweq> ok, anything else related to stadium projects You want to discuss today?
16:25:03 <njohnston> I had a little gate fix for neutron-fwaas functional tests
16:25:04 <njohnston> https://review.opendev.org/#/c/684386/
16:25:15 <njohnston> take a look; it's blocking the PDF goal merge for that project
16:25:40 <slaweq> njohnston: https://review.opendev.org/#/c/652812/2
16:25:45 <slaweq> this one was first I think :)
16:26:05 <slaweq> and it seems to be fixing the same issue, right?
16:26:19 <njohnston> indeed! I did not catch that in my search; we can push that one instead
16:26:37 <slaweq> it has been there since April
16:26:39 <slaweq> LOL
16:26:45 <slaweq> I found it just today
16:27:47 <slaweq> njohnston: thx for bringing it up here :)
16:28:05 <ralonsoh> (we should be more careful when changing class definitions)
16:28:35 <ralonsoh> (me first of all)
16:28:48 <slaweq> yes, we should
16:29:25 <slaweq> that's why at one of the last PTGs we agreed to add some stadium projects' jobs to the neutron check queue
16:29:37 <slaweq> but only networking-ovn and midonet were interested in that
16:29:51 <slaweq> other stadium projects didn't propose such jobs
16:30:07 <slaweq> ahh, sorry
16:30:15 <slaweq> there are also ironic and openstacksdk jobs now
16:30:30 <slaweq> and tripleo :)
16:31:02 <ralonsoh> lots of fun now
16:31:07 <slaweq> :)
16:31:53 <slaweq> but we will discuss the future of stadium projects in Shanghai so we can also think about their CI :)
16:32:07 <slaweq> ok, let's move on
16:32:16 <slaweq> #topic Grafana
16:32:16 <bcafarel> sounds like an interesting subtopic yes
16:32:30 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:33:30 <ralonsoh> I think we have improved the stability last week
16:34:20 <slaweq> ralonsoh: yes, but I still see quite high numbers on functional/fullstack jobs
16:34:45 <slaweq> and also grenade jobs are failing about 20% this week :/
16:34:56 <ralonsoh> I'm trying to debug all those FT/fullstack ones
16:35:21 <slaweq> for those I have some examples today
16:35:50 <slaweq> also UT are failing around 20-30% recently
16:36:04 <slaweq> but for those I didn't find any specific issue not related to the patch it was running on
16:36:13 <slaweq> so I would not bother with that too much for now
16:37:07 <ralonsoh> slaweq, if you find something suspicious, ping me
16:37:14 <slaweq> ralonsoh: sure, thx
16:37:27 <slaweq> any other findings from grafana?
16:38:05 <slaweq> ok, let's move on then
16:38:45 <slaweq> #topic fullstack/functional
16:38:59 <slaweq> for functional tests I found a couple of issues
16:39:05 <slaweq> neutron.tests.functional.agent.linux.test_l3_tc_lib.TcLibTestCase.test_get_existing_filter_ids
16:39:11 <slaweq> https://99b34ecc9afda69f8d26-4c2619fbf66a72a5befb0d8c52d9c271.ssl.cf1.rackcdn.com/682418/7/check/neutron-functional-python27/fb1cc17/testr_results.html.gz - a similar issue to the one we saw last week,
16:39:46 <ralonsoh> but I think this is a problem in the testcase library
16:39:48 <ralonsoh> in py2
16:40:17 <slaweq> but what is causing such a problem?
16:40:25 <slaweq> we have to have some trigger for it IMO
16:40:46 <ralonsoh> slaweq, I tried to debug this but not too much
16:40:53 <ralonsoh> because it is not happening in Py3
16:41:06 <slaweq> yes, I saw it twice in the py2 job
16:41:29 <slaweq> maybe we can just live with it for a few more weeks and then it will be gone when we finally get rid of py2
16:42:02 <ralonsoh> that's the spirit: if a test fails, remove it!!!!
16:42:11 <slaweq> ralonsoh: LOL
16:42:12 <ralonsoh> +1
16:42:21 <slaweq> but don't tell anyone ;)
16:42:31 <bcafarel> so, use secret.review.openstack.org ?
16:42:38 <ralonsoh> hehehehe
16:42:38 <bcafarel> s/openstack/opendev/
16:42:52 <slaweq> in the past I was proposing here to use the https://pypi.org/project/pytest-vw/ module
16:43:04 <slaweq> but we are not using pytest so it's hard :/
16:43:06 <slaweq> :D
16:43:42 <slaweq> ok, now seriously :)
16:43:49 <slaweq> another issue which I found
16:43:57 <slaweq> neutron.tests.functional.agent.linux.test_iptables.IptablesManagerTestCase.test_tcp_output
16:44:02 <slaweq> https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8fe/682418/7/gate/neutron-functional/8fe4fa5/testr_results.html.gz - looks like something new to me, needs to be checked,
16:44:15 <ralonsoh> I didn't find the problem there
16:44:34 <ralonsoh> IMO, the rule was never applied
16:45:06 <slaweq> ralonsoh: in this test_tcp_output ?
16:45:11 <ralonsoh> yes
16:45:23 <slaweq> so that is the problem, no?
16:45:28 <ralonsoh> yes
16:45:38 <ralonsoh> but I can't confirm this 100%
16:46:12 <slaweq> I will take a look into this more deeply this week
16:46:39 <slaweq> #action slaweq to investigate issue with neutron.tests.functional.agent.linux.test_iptables.IptablesManagerTestCase.test_tcp_output
16:46:57 <slaweq> ok, another one:
16:46:59 <slaweq> neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_spawns_conflicting_pid_base_process
16:47:06 <slaweq> https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c52/683915/2/gate/neutron-functional-python27/c529449/testr_results.html.gz
16:47:15 <ralonsoh> yes this one
16:47:26 <ralonsoh> #link https://review.opendev.org/#/c/684249/
16:47:39 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1845150
16:47:39 <openstack> Launchpad bug 1845150 in neutron "[FT] "keepalived" needs network interfaces configured as in its own config" [High,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:48:12 <slaweq> ralonsoh: are You sure this is the real reason for the failure?
16:48:25 <slaweq> I was checking the journal log from this job today
16:48:39 <ralonsoh> I think so, the keepalived process stops when eth0 does not have an IP
16:49:11 <ralonsoh> and this may cause the test to fail
16:49:29 <ralonsoh> I found that when developing https://review.opendev.org/#/c/681671/
16:50:00 <ralonsoh> the os.kill() call didn't work because the keepalived process was not there anymore
16:50:03 <slaweq> but I found in the journal log many occurrences of "Cannot find an IP address to use for interface"
16:50:11 <ralonsoh> I know
16:50:16 <slaweq> so it seems that in many other, passing tests it was the same
16:51:17 <ralonsoh> the problem is that keepalived does not stop working immediately
16:51:51 <slaweq> ahh, so some race
16:51:57 <ralonsoh> I think so
16:52:07 <ralonsoh> But, of course, there could be another problem
16:52:09 <slaweq> and in most cases the test is finished before keepalived crashes
16:52:13 <ralonsoh> but IMO this patch is legit
16:52:16 <slaweq> makes sense
16:52:24 <slaweq> yes, I just +2'ed this patch
16:53:06 <slaweq> ok, and the last one from functional tests:
16:53:08 <slaweq> neutron.tests.functional.agent.linux.test_bridge_lib.FdbInterfaceTestCase.test_add_delete(no_namespace)
16:53:18 <slaweq> https://9722deaa313ebebb56dc-c08b881decb3106ff13d720dd4a26025.ssl.cf5.rackcdn.com/681846/5/check/neutron-functional-python27/c3dc48a/testr_results.html.gz
16:53:26 <ralonsoh> again with those test cases (I'm the father)
16:53:36 <slaweq> LOL
16:53:47 <ralonsoh> I pushed a patch to make the interface names random
16:53:54 <ralonsoh> this should not happen
16:54:02 <ralonsoh> I'll take a look at this tomorrow
16:54:07 <slaweq> thx
16:54:29 <slaweq> #action ralonsoh to take a look at failing neutron.tests.functional.agent.linux.test_bridge_lib.FdbInterfaceTestCase.test_add_delete(no_namespace)
16:54:49 <slaweq> ok, that's all for functional tests from me
16:54:59 <slaweq> anything else You have regarding the functional job?
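[Note on the randomized interface names ralonsoh mentions above: functional tests that run in parallel in the root namespace can collide if two test cases create a device with the same hard-coded name, and randomizing the name per test avoids that. The sketch below is only an illustration of that idea under stated assumptions; the helper name, prefix and usage are hypothetical, not the actual neutron patch.]

    # Hypothetical sketch, not the actual neutron change: give each test case a
    # unique interface name so concurrent functional tests sharing the root
    # namespace cannot collide on a hard-coded device name.
    import random
    import string


    def rand_ifname(prefix='test-'):
        """Return a random interface name within the 15-char Linux limit."""
        suffix_len = 15 - len(prefix)
        suffix = ''.join(random.choice(string.ascii_lowercase + string.digits)
                         for _ in range(suffix_len))
        return prefix + suffix


    # e.g. used in setUp() instead of a fixed name like 'int01':
    # self.device = rand_ifname()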
16:55:37 <slaweq> ok, I take that as no
16:55:40 <slaweq> so fullstack
16:55:48 <slaweq> neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement
16:55:54 <slaweq> https://d3359cf0499b7fa3a209-a0ae01eb6742268974ea7eef585da77c.ssl.cf1.rackcdn.com/681846/5/check/neutron-fullstack/014ed7d/testr_results.html.gz
16:56:12 <slaweq> I think that here it would be good if rubasov could take a look maybe
16:56:47 <slaweq> I will ask him tomorrow if he has some time to take a look
16:57:17 <slaweq> #action slaweq to ask rubasov if he can check neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement
16:57:31 <slaweq> and the last one for today:
16:57:33 <slaweq> neutron.tests.fullstack.test_l3_agent.TestLegacyL3Agent.test_mtu_update
16:57:37 <slaweq> https://df0eb3e2e26f1607f7d8-b5f72c94f829be93029a2756be493e29.ssl.cf2.rackcdn.com/679813/2/gate/neutron-fullstack/dfbde3f/testr_results.html.gz
16:58:19 <ralonsoh> I didn't find the error there, the L3 agent is setting the mtu on the interface
16:59:01 <slaweq> I will take a deeper look at that one this week
16:59:14 <slaweq> #action slaweq to investigate neutron.tests.fullstack.test_l3_agent.TestLegacyL3Agent.test_mtu_update
16:59:22 <slaweq> we are almost out of time now
16:59:51 <slaweq> I just want to ask You to review 2 patches https://review.opendev.org/#/c/683853/ and https://review.opendev.org/#/c/681607/ if You have some time
16:59:55 <slaweq> thx in advance :)
17:00:00 <slaweq> and we are out of time
17:00:04 <slaweq> thx for attending
17:00:07 <ralonsoh> bye
17:00:09 <slaweq> #endmeeting