15:00:28 <slaweq> #startmeeting neutron_ci
15:00:29 <openstack> Meeting started Wed Apr 22 15:00:28 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:30 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:30 <slaweq> hi
15:00:32 <openstack> The meeting name has been set to 'neutron_ci'
15:01:49 <bcafarel> o/
15:01:51 <ralonsoh> hi
15:01:54 <maciejjozefczyk> hey
15:02:05 <njohnston> o/
15:02:44 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:51 <slaweq> ok, let's start
15:02:55 <slaweq> #topic Actions from previous meetings
15:03:01 <slaweq> ralonsoh: check ovn jobs failure
15:03:17 <slaweq> fix is already merged: https://review.opendev.org/#/c/720248/
15:03:19 <slaweq> thx ralonsoh :)
15:03:24 <ralonsoh> yw
15:03:36 <slaweq> second one
15:03:38 <slaweq> slaweq: ping yamamoto about midonet gate problems
15:03:47 <slaweq> I still don't have any reply from yamamoto
15:04:09 <slaweq> I will keep trying and maybe take a look at those broken UTs if I have a few minutes
15:04:22 <slaweq> and that's all for actions from last week
15:04:40 <slaweq> #topic Stadium projects
15:04:48 <slaweq> standardize on zuul v3
15:04:50 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:05:05 <slaweq> I don't think there was any update on this last week
15:05:20 <njohnston> https://review.opendev.org/#/c/672925 is failing on the networking-odl-functional jobs. I am looking into it
15:05:57 <slaweq> njohnston: I just wanted to suggest trying https://review.opendev.org/#/c/715439/3 but I think You already did
15:06:44 <slaweq> so with this patch only midonet will still not be done, right?
15:06:47 <njohnston> Lajos did that in PS 31 I think
15:06:53 <njohnston> slaweq: Yes I believe so
15:07:03 <slaweq> that's great
15:07:18 <slaweq> thx njohnston for the update
15:07:53 <slaweq> regarding CI issues in stadium projects, we still have this midonet bug
15:08:18 <slaweq> other than that I think the stadium projects are running pretty well
15:08:46 <slaweq> do You have anything to add/ask regarding stadium?
15:08:53 <njohnston> nothing here
15:09:31 <bcafarel> small change in neutron-tempest-plugin: some jobs that were still using the all-plugin tox target now use the standard "all"
15:09:55 <bcafarel> (fixing this deprecated one was needed to make https://review.opendev.org/#/c/721277/ pass)
15:09:59 <slaweq> bcafarel: and that will help us avoid issues like the ones we had with the Stein jobs recently, right?
15:10:28 <bcafarel> yes, and hopefully it should not have any side effects :)
15:10:42 <slaweq> bcafarel++ thx
15:11:26 <slaweq> ok, and with that I think we can move to the next topic which is
15:11:27 <slaweq> #topic Stable branches
15:11:29 <slaweq> :)
15:11:35 <slaweq> Train dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:11:37 <slaweq> Stein dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:12:42 <slaweq> except for this issue with the neutron-tempest-plugin jobs on Stein, it looks ok to me
15:12:55 <bcafarel> yep, and that Stein fix is now in
15:13:54 <slaweq> yes, I saw :) thx once again for taking care of this
15:14:56 <slaweq> ok, I think we can move on to the next topic, right?
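
(For context on the tox target change bcafarel mentioned earlier: a minimal sketch of what such a zuul job definition can look like. The job and parent names below are illustrative placeholders, not the actual neutron-tempest-plugin definitions; only the tox_envlist switch from the deprecated "all-plugin" to "all" reflects the change discussed.)

    - job:
        name: neutron-tempest-plugin-example      # hypothetical job name
        parent: devstack-tempest
        vars:
          # "all-plugin" is deprecated in tempest; plugin tests now run
          # from the standard "all" tox environment instead
          tox_envlist: all
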
15:15:28 <njohnston> +1
15:15:58 <bcafarel> yes
15:16:03 <slaweq> #topic Grafana
15:17:13 <slaweq> overall I think it looks ok here too
15:17:18 <njohnston> BTW what is with the points appearing in the "Number of integrated Tempest jobs runs (Gate queue)" for the "24 hours" line? Seems like an error in the grafana config.
15:18:27 <slaweq> njohnston: are You asking about the lack of data there?
15:19:07 <njohnston> "24 hours" doesn't sound like a zuul job name
15:19:33 <njohnston> so I don't know why it would be in the data set
15:19:49 <bcafarel> hmm I don't have it, refreshing
15:20:00 <slaweq> me neither :)
15:20:06 <ralonsoh> nope
15:20:12 <njohnston> weird, I just refreshed it and it went away
15:20:14 <njohnston> that is super weird
15:20:21 <njohnston> ok, never mind :-)
15:20:22 <bcafarel> :) you scared it away
15:20:27 <slaweq> LOL
15:20:30 <ralonsoh> zuul! the source of problems and solutions
15:20:49 <slaweq> ok, so we at least solved one problem today ;P
15:21:24 <njohnston> LOL
15:21:44 <slaweq> from the other things, I saw today that we have a lot of non-voting jobs in the check queue
15:22:00 <slaweq> maybe we should think about promoting some of them to be voting?
15:22:17 <slaweq> I'm not talking about doing it now, but after we cut stable/ussuri
15:22:20 <njohnston> I was going to suggest that openstacksdk-functional-devstack-networking might be a good candidate
15:22:31 <ralonsoh> right
15:22:43 <ralonsoh> can we wait until V?
15:22:52 <slaweq> ralonsoh: yes :)
15:23:13 <njohnston> and openstack-tox-py38 is another good candidate. No reason to move quickly though, I agree it would be good to wait until V
15:23:13 <ralonsoh> +1 to the idea
15:23:16 <slaweq> in the next weeks I will prepare some data and proposals about what we can promote
15:23:23 <ralonsoh> agree
15:23:31 <bcafarel> sounds good, some of these look interesting
15:23:33 <slaweq> thx
15:23:45 <bcafarel> like neutron-tempest-with-uwsgi (just a quick pick from the list)
15:24:35 <njohnston> neutron-ovn-tempest-slow has had <10% failures for some time, it looks like
15:25:01 <slaweq> bcafarel: this one is actually already voting IIRC, we just still haven't merged https://review.opendev.org/#/c/718392/
15:25:36 <bcafarel> oh, I thought I had seen it go in, nvm then
15:25:56 <bcafarel> ok, I just need to get my eyesight checked, it was just Andreas' +2
15:26:04 <slaweq> :)
15:26:24 <slaweq> I added frickler to it today so hopefully he will check it soon
15:27:22 <slaweq> ok, I think we can continue with other topics now
15:27:24 <slaweq> #topic fullstack/functional
15:27:44 <slaweq> today I found one new example of a timeout in neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:27:49 <slaweq> https://a1c2986e7388db3a1401-541b7a48fdccc7de277eccd1d7d5bce5.ssl.cf5.rackcdn.com/718690/2/check/neutron-functional/9f51d60/testr_results.html
15:27:56 <slaweq> but this time it's during interface creation
15:28:06 <slaweq> ralonsoh: does it ring a bell for You?
15:28:34 <ralonsoh> this is during the interface creation, not the namespace
15:28:37 <slaweq> isn't that the GIL related issue which You were fixing recently?
15:28:39 <ralonsoh> but I can take a look at it
15:28:42 <ralonsoh> nope
15:29:28 <slaweq> ok, I thought that maybe it was the same/similar root cause
15:29:36 <slaweq> thx for taking care of it :)
15:29:54 <slaweq> #action ralonsoh to check timeout during interface creation in functional tests
15:30:33 <slaweq> and I also found 2 ovn related issues in functional tests:
15:30:35 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_abb/717851/2/gate/neutron-functional/abb91cb/testr_results.html
15:30:38 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e2c/717083/6/check/neutron-functional/e2c63e4/testr_results.html
15:30:58 <slaweq> maciejjozefczyk: does it ring a bell for You maybe?
15:31:26 <maciejjozefczyk> slaweq, looks like the same story that the timeout change should solve
15:31:40 <slaweq> maciejjozefczyk: do You have a link to the patch?
15:32:13 <maciejjozefczyk> #link https://review.opendev.org/#/c/717704/
15:32:37 <maciejjozefczyk> but it looks like it is still the case, or something similar
15:33:24 <slaweq> maciejjozefczyk: yes, at least one of those failures is from this week
15:33:35 <slaweq> so both probably have this timeout patch already
15:34:28 <maciejjozefczyk> I'm gonna take a look at those two
15:34:45 <slaweq> thx maciejjozefczyk
15:34:54 <maciejjozefczyk> and reopen bug: https://bugs.launchpad.net/neutron/+bug/1868110
15:34:54 <openstack> Launchpad bug 1868110 in neutron "[OVN] neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log randomly fails" [High,Fix released] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:34:59 <slaweq> #action maciejjozefczyk to take a look at ovn related functional test failures
15:35:25 <slaweq> and that's all from me regarding functional/fullstack tests
15:35:32 <slaweq> anything else You want to add?
15:35:54 <ralonsoh> no
15:37:10 <slaweq> so let's move on
15:37:12 <slaweq> #topic Tempest/Scenario
15:37:19 <slaweq> first of all
15:37:24 <slaweq> I proposed patch https://review.opendev.org/#/c/721805/ to enable l3_ha in scenario jobs
15:37:38 <slaweq> IMO we lack L3 HA coverage in our CI
15:37:48 <slaweq> and that would be an easy way to have it covered somehow
15:38:07 <slaweq> even if those are singlenode jobs, it will spawn keepalived, and all that stuff for each router
15:38:44 <ralonsoh> we'll catch "functional" problems with keepalived
15:38:50 <ralonsoh> +1
15:38:54 <slaweq> yep
15:39:15 <njohnston> very good
15:39:16 <bcafarel> sounds good, does that reduce coverage on non-l3 ha?
15:39:17 <slaweq> in fact I got this idea when I did a similar patch for the tripleo standalone job yesterday
15:39:31 <slaweq> and I found a new bug with keepalived 2.x with it immediately :)
15:39:35 <bcafarel> (not the code part I know the best)
15:39:53 <slaweq> bcafarel: we still have tempest jobs for legacy, non-ha routers
15:40:13 <bcafarel> ok then no objection at all!
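
(As a rough illustration of the kind of change being discussed — the actual content is in the review linked above — a hedged sketch of how l3_ha could be enabled in a devstack-based scenario job, assuming the devstack_local_conf mechanism of devstack zuul jobs; the job and parent names are hypothetical placeholders.)

    - job:
        name: neutron-tempest-plugin-scenario-example   # hypothetical job name
        parent: devstack-tempest
        vars:
          devstack_local_conf:
            post-config:
              # switch new routers to HA (keepalived-backed) by default
              $NEUTRON_CONF:
                DEFAULT:
                  l3_ha: true
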
15:40:22 <slaweq> but IMO we should focus more on testing L3HA as it's more widely used than legacy routers
15:40:42 <njohnston> agreed, definitely
15:41:01 <slaweq> thx for supporting this :)
15:41:27 <slaweq> from the other things related to scenario jobs, I found one issue recently
15:41:32 <slaweq> in neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle - timeout while waiting for port to be ACTIVE
15:41:37 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d8c/717851/2/check/neutron-ovn-tempest-ovs-release/d8c0282/testr_results.html
15:42:13 <slaweq> does anyone want to take a look into this?
15:42:37 <slaweq> it's in the ovn job
15:42:43 <maciejjozefczyk> slaweq, I think Jakub was checking it in d/s
15:43:21 <maciejjozefczyk> slaweq, I'll create a lp and check with Jakub
15:43:27 <slaweq> maciejjozefczyk: thx a lot
15:44:01 <slaweq> #action maciejjozefczyk to report LP regarding failing neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle in neutron-ovn-tempest-ovs-release job
15:44:21 <slaweq> ok, and that's all I have for today
15:44:29 <slaweq> periodic jobs are working fine
15:44:41 <slaweq> anything else You want to discuss today?
15:46:02 <slaweq> if not, I think we can finish a bit earlier today
15:46:07 <slaweq> thx for attending
15:46:09 <slaweq> o/
15:46:11 <bcafarel> o/
15:46:13 <slaweq> #endmeeting