15:00:11 <slaweq> #startmeeting neutron_ci
15:00:11 <openstack> Meeting started Wed Mar 18 15:00:11 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:15 <openstack> The meeting name has been set to 'neutron_ci'
15:00:16 <njohnston> o/
15:00:32 <bcafarel> o/
15:00:37 <ralonsoh> hi
15:00:51 <slaweq> ok, let's go :)
15:00:55 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:00:56 <slaweq> Please open it now :)
15:01:29 <slaweq> #topic Actions from previous meetings
15:01:41 <slaweq> maciejjozefczyk to take a look at https://bugs.launchpad.net/neutron/+bug/1865453
15:01:42 <openstack> Launchpad bug 1865453 in neutron "neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestVirtualPorts.test_virtual_port_created_before fails randomly" [High,In progress] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:03:14 <slaweq> I just pinged maciej to join here
15:03:19 <slaweq> maybe he has some update
15:04:32 <slaweq> ok, let's move on for now
15:04:36 <slaweq> maybe he will join later
15:04:47 <slaweq> ralonsoh to check "neutron_tempest_plugin.scenario.test_multicast.MulticastTestIPv4 failing often on deletion of server"
15:04:56 <ralonsoh> I tried but I didn't find anything
15:05:05 <ralonsoh> sorry, I can't find the cause of this error
15:05:07 <maciejjozefczyk> sorry, I'm here
15:05:18 <ralonsoh> (let's go back to the previous one)
15:05:19 <maciejjozefczyk> forgot about the '-3' suffix :)
15:05:48 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1865453
15:05:49 <openstack> Launchpad bug 1865453 in neutron "neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestVirtualPorts.test_virtual_port_created_before fails randomly" [High,In progress] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:05:53 <ralonsoh> maciejjozefczyk, ^^
15:06:00 <slaweq> thx ralonsoh
15:07:00 <maciejjozefczyk> ok, so. I tried to debug what's going on there
15:07:26 <maciejjozefczyk> proposed a change to retry on failed asserts
15:07:28 <maciejjozefczyk> #link https://review.opendev.org/#/c/712888/
15:08:03 <maciejjozefczyk> but anyways, even with those, we still have random failures of tests in that class, like 1 test per 20 runs of functional tests (based on the experiment from the link)
15:08:16 <maciejjozefczyk> and actually I don't really know what causes that
15:08:17 <slaweq> maciejjozefczyk++ for the hack with multiple functional jobs :)
15:08:53 <maciejjozefczyk> Does it hurt us very much? I mean in the gates?
15:09:02 <maciejjozefczyk> if so, I vote for setting those tests as unstable for now
15:09:14 <maciejjozefczyk> cause we could spend hours on debugging it...
15:09:42 <maciejjozefczyk> and a better solution would be to do the virtual port association in some more clever way, like setting a proper device_type
15:09:49 <maciejjozefczyk> in the neutron API by octavia
15:09:57 <maciejjozefczyk> that will solve those failures instantly
15:10:20 <maciejjozefczyk> that's what I discussed with Lucas
15:10:28 <slaweq> maciejjozefczyk: so I'm fine with marking those tests as unstable for now, let's make others' lives easier :)
15:10:40 <njohnston> maciejjozefczyk: That is pretty darn cool what you did in that change
15:11:02 <ralonsoh> +1 to make them unstable, for now
15:11:10 <ralonsoh> mark*
15:11:21 <maciejjozefczyk> njohnston, about multiplexing functionals?
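For reference, the "mark as unstable" approach being agreed on here usually boils down to a skip-on-failure decorator. Below is a minimal, self-contained sketch of that idea; it is only an illustration, not a copy of the neutron helper (neutron.tests.base.unstable_test) nor of the patch maciejjozefczyk later proposed:

    # Sketch of a skip-on-failure decorator: a failure in a known-flaky
    # test is turned into a skip so it keeps running without blocking
    # the gate. Neutron ships a similar helper; this is an illustration.
    import functools
    import unittest


    def unstable_test(reason):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, *args, **kwargs):
                try:
                    return func(self, *args, **kwargs)
                except Exception as exc:
                    self.skipTest("%s marked unstable (%s): %s"
                                  % (self.id(), reason, exc))
            return wrapper
        return decorator


    class TestVirtualPorts(unittest.TestCase):
        @unstable_test("bug 1865453")
        def test_virtual_port_created_before(self):
            self.fail("flaky assertion")  # reported as a skip, not a failure


    if __name__ == "__main__":
        unittest.main()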
15:11:23 <slaweq> maciejjozefczyk: will You propose a patch to mark them as unstable?
15:11:26 <maciejjozefczyk> slaweq, yes
15:11:30 <slaweq> thx a lot
15:11:35 <njohnston> maciejjozefczyk: yes
15:11:41 <maciejjozefczyk> njohnston, creds for Jakub Libosvar :)
15:11:53 <slaweq> #action maciejjozefczyk to mark TestVirtualPorts functional tests as unstable for now
15:12:31 <slaweq> ok, so let's go back to the second action item
15:12:33 <slaweq> ralonsoh to check "neutron_tempest_plugin.scenario.test_multicast.MulticastTestIPv4 failing often on deletion of server"
15:12:49 <slaweq> You said that You couldn't find the root cause of this
15:12:51 <ralonsoh> again, sorry for not finding anything
15:12:53 <slaweq> but is it still happening?
15:12:59 <slaweq> I didn't see it this week
15:13:18 <ralonsoh> maybe (maybe) this is caused by another test executed in parallel
15:13:43 <ralonsoh> yes, I didn't see this test failing in the last 10 days
15:14:02 <slaweq> so let's hope it will not hit us again :D
15:14:08 <slaweq> and forget about it for now
15:14:13 <slaweq> do You agree with that?
15:14:15 <slaweq> :)
15:14:19 <ralonsoh> I'll freeze it for now
15:14:24 <slaweq> ++
15:15:48 <slaweq> ok, last one from last week
15:15:50 <slaweq> slaweq to prepare etherpad to track progress with ipv6-only testing goal
15:15:53 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-stadium-ipv6-testing
15:16:09 <slaweq> but I didn't have time to work on any of those patches so far
15:16:16 <slaweq> if anyone wants to help, it would be great
15:16:31 <slaweq> and also I will add this etherpad to the stadium projects topic for the next weeks
15:16:41 <slaweq> to not forget about it and to track what is still to do
15:16:45 <slaweq> are You ok with that?
15:17:22 <ralonsoh> yes
15:17:43 <bcafarel> sounds good
15:17:59 <njohnston> +1
15:18:22 <slaweq> ok
15:18:41 <njohnston> I'll try to work on it as well
15:18:45 <slaweq> thx njohnston
15:18:50 <slaweq> ok, let's move on
15:18:53 <slaweq> #topic Stadium projects
15:19:03 <slaweq> standardize on zuul v3
15:19:05 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:19:10 <slaweq> I don't think there was any progress on that
15:19:37 <slaweq> but maybe I missed something
15:20:02 <njohnston> I did not see any progress either
15:20:23 <slaweq> ok, anything else regarding stadium projects and CI for today?
15:21:17 <maciejjozefczyk> yes
15:21:35 <maciejjozefczyk> In the OVN Octavia provider driver we're setting up a devstack plugin and CI (non-voting for now)
15:21:59 <maciejjozefczyk> #link https://review.opendev.org/#/c/708870/
15:22:14 <maciejjozefczyk> If you have a minute to check if that makes sense, it would be great :)
15:23:35 <slaweq> maciejjozefczyk: noted, I will check it tomorrow morning
15:24:30 <slaweq> ok, let's move on
15:24:31 <slaweq> #topic Grafana
15:24:42 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:25:20 <maciejjozefczyk> slaweq, thanks
15:26:19 <slaweq> I checked today the average number of rechecks from the last days
15:26:20 <slaweq> Average number of rechecks in the last weeks:
15:26:22 <slaweq> week 11 of 2020: 5.4
15:26:24 <slaweq> week 12 of 2020: 0.67
15:26:26 <slaweq> week 12 is the current one
15:26:30 <njohnston> wow nice
15:26:35 <slaweq> but it looks generally better than it was
15:26:57 <maciejjozefczyk> That means it's better, or we don't produce code :)?
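The recheck numbers slaweq quotes come from his own script; a rough sketch of how such an "average rechecks per merged patch" figure could be computed from the Gerrit REST API follows. The query and the counting heuristic are assumptions for illustration, not the actual script:

    # Count "recheck" comments on recently merged neutron changes via the
    # Gerrit REST API and print the average per merged change.
    import json
    import urllib.request

    GERRIT = "https://review.opendev.org"
    # Merged changes touched in the last 7 days, with their comments.
    QUERY = ("/changes/?q=project:openstack/neutron+status:merged"
             "+-age:7d&o=MESSAGES")

    with urllib.request.urlopen(GERRIT + QUERY) as resp:
        raw = resp.read().decode("utf-8")
    # Gerrit prefixes JSON responses with ")]}'" to prevent XSSI.
    changes = json.loads(raw.split("\n", 1)[1])

    # Very rough heuristic: any comment containing "recheck" counts as one.
    rechecks = sum(
        1
        for change in changes
        for message in change.get("messages", [])
        if "recheck" in message.get("message", "").lower()
    )
    if changes:
        print("average rechecks per merged change: %.2f"
              % (rechecks / len(changes)))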
15:27:04 <slaweq> and if we fix some functional test failures, it should be even better
15:27:43 <slaweq> maciejjozefczyk: my script doesn't report how many patches were merged during the week, but I think that we do still produce code
15:28:19 <maciejjozefczyk> slaweq, ok :)
15:28:56 <slaweq> looking at grafana, tempest and neutron-tempest-plugin jobs look pretty good
15:28:57 <bcafarel> I hope we still do :)
15:29:03 <slaweq> especially the voting ones
15:29:24 <slaweq> the highest failure rates are still on the functional and grenade jobs
15:30:31 <slaweq> but I know that people from nova are working on the failure which causes many grenade failures, so hopefully it will be better soon
15:31:05 <slaweq> and our functional job failures are mostly related to those ovn tests which maciejjozefczyk will mark as unstable for now
15:31:25 <slaweq> anything else You want to add regarding grafana?
15:32:31 <bcafarel> nothing here
15:32:48 <slaweq> ok, let's move on
15:32:56 <slaweq> #topic fullstack/functional
15:33:08 <slaweq> first I wanted to ask njohnston how it's going with marking security group fullstack tests as stable - https://review.opendev.org/710782
15:33:21 <slaweq> should we try to merge this?
15:34:15 <njohnston> We have had 8 successes and 2 failures due to 'broken pipe' on ssh
15:34:32 <njohnston> and 1 failure due to an apt cache issue that happened before the tests were even started
15:34:49 <slaweq> hmm, I'm afraid about these broken pipe failures
15:35:02 <slaweq> do You have a link to such a failure?
15:35:06 <njohnston> broken pipe example: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f5c/710782/3/check/neutron-fullstack/f5c09cb/testr_results.html
15:36:02 <njohnston> broken pipe 2: https://82f990e55ca70e41f871-36409e2f060733ae498eef8cd27f40f4.ssl.cf5.rackcdn.com/710782/3/check/neutron-fullstack/126cd0c/testr_results.html
15:36:16 <bcafarel> is it broken pipe, or the _StringException before?
15:36:49 <bcafarel> (_StringException reminding me of py3 fixes some time ago)
15:36:50 <slaweq> IMHO it's the broken pipe raised in netcat.test_connectivity()
15:37:01 <bcafarel> ok
15:37:06 <slaweq> so it looks to me like we still have some issue there
15:37:33 <slaweq> let's not merge it yet, I will try to take a look into that
15:37:39 <slaweq> ok?
15:37:47 <bcafarel> +1
15:37:48 <njohnston> I am of the same mind
15:38:13 <slaweq> #action slaweq to investigate fullstack SG test broken pipe failures
15:39:18 <slaweq> ok
15:39:21 <slaweq> now functional tests
15:39:36 <slaweq> I found 2 ovn-related failures this week which seem new to me
15:39:42 <slaweq> neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestVirtualPorts.test_virtual_port_delete_parents
15:39:43 <slaweq> https://1ad6f2e4d61e2bf5ff0b-20b98b64cfa6ea87451df6eaddafb782.ssl.cf5.rackcdn.com/712640/1/check/neutron-functional/17dbbf8/testr_results.html
15:39:45 <slaweq> neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:39:47 <slaweq> https://4232a5777fe8521e933a-68bc071a5cbea1ed39a41226592204b6.ssl.cf5.rackcdn.com/712474/3/check/neutron-functional/3e06670/testr_results.html
15:39:57 <slaweq> can You take a look and check if that is something You already know?
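To illustrate the "broken pipe" diagnosis above: when the listening side of a netcat-style connectivity check goes away mid-test, the client's next write fails with BrokenPipeError. The snippet below is a small self-contained demonstration of that failure mode only; it is not the neutron fullstack netcat helper:

    # A short-lived listener accepts one connection and closes it; the
    # client's later writes then blow up with BrokenPipeError (or
    # ConnectionResetError), which is what a connectivity check reports.
    import socket
    import threading
    import time


    def short_lived_server(sock):
        conn, _ = sock.accept()
        conn.close()   # the "server" side of the check disappears
        sock.close()


    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # any free port
    server.listen(1)
    port = server.getsockname()[1]
    threading.Thread(target=short_lived_server, args=(server,)).start()

    client = socket.create_connection(("127.0.0.1", port))
    try:
        for _ in range(5):
            client.sendall(b"ping\n")   # later writes hit the closed peer
            time.sleep(0.2)
        print("connectivity check passed")
    except (BrokenPipeError, ConnectionResetError) as exc:
        print("connectivity check failed:", exc)
    finally:
        client.close()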
15:40:22 <slaweq> the first one may be related to the same issue we discussed earlier with maciejjozefczyk
15:40:36 <slaweq> but the assertion failure seems different here
15:40:48 <maciejjozefczyk> the test_ovn_db_sync one is a different problem
15:41:04 <maciejjozefczyk> I can take a look
15:41:15 <slaweq> maciejjozefczyk: thx a lot
15:41:30 <slaweq> please open an LP bug for it too, ok?
15:41:43 <maciejjozefczyk> slaweq, ok
15:42:31 <slaweq> ok
15:42:48 <slaweq> #action maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:42:51 <slaweq> thx maciejjozefczyk
15:43:19 <slaweq> and one last thing regarding this topic from me
15:43:20 <maciejjozefczyk> slaweq, sure
15:44:00 <slaweq> I was looking into the grafana dashboards and I think that the fullstack-with-uwsgi job is as stable as the fullstack job
15:44:11 <slaweq> should we maybe promote it to be voting?
15:44:47 <slaweq> I don't want to do the same with the functional job for now, as it's failing much more often (the same as the non-uwsgi functional job, but both are failing too often)
15:45:06 <njohnston> +1 for making fullstack-uwsgi voting
15:45:18 <slaweq> the dashboard is here: http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1&fullscreen&panelId=22
15:45:20 <slaweq> to check
15:45:50 <bcafarel> yes it looks good compared to fullstack
15:46:49 <slaweq> ok, I will propose a patch for that
15:47:08 <slaweq> #action slaweq to change fullstack-with-uwsgi job to be voting
15:47:23 <slaweq> what about gating? should we add it to the gate queue also, or not yet?
15:47:33 <slaweq> I think that we can, but I want to know Your opinion :)
15:48:51 <njohnston> Let's give it a couple of weeks voting and make sure we don't have buyer's remorse
15:49:06 <njohnston> no reason to hit the accelerator
15:49:13 <slaweq> ok
15:49:19 <slaweq> sounds reasonable
15:49:33 <bcafarel> also checking: we do not have any uwsgi jobs in the gate for now?
15:49:39 <slaweq> bcafarel: nope
15:49:48 <slaweq> we have only non-voting ones in the check queue
15:50:52 <slaweq> ok, let's move on
15:50:54 <slaweq> #topic Tempest/Scenario
15:51:05 <slaweq> here I don't have any new issues
15:51:09 <slaweq> only one question
15:51:16 <slaweq> similar to the previous one
15:51:44 <slaweq> what do You think about making the neutron-tempest-with-uwsgi job voting too?
15:51:50 <slaweq> http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1&fullscreen&panelId=16
15:52:14 <slaweq> it also looks pretty stable
15:52:38 <slaweq> in the last 30 days it was basically following the other jobs and most of the time was below a 10% failure rate
15:52:45 <njohnston> +1 for this too
15:52:54 <bcafarel> more voting uwsgi jobs ++
15:52:59 <maciejjozefczyk> ++
15:52:59 <slaweq> :)
15:53:11 <slaweq> #action slaweq to make neutron-tempest-with-uwsgi voting too
15:53:23 <slaweq> ok, that was fast
15:53:31 <slaweq> so one last topic for today
15:53:33 <slaweq> #topic Periodic
15:53:53 <slaweq> I noticed that since a few days our neutron-ovn-tempest-ovs-master-fedora job is failing with RETRY_LIMIT
15:54:10 <slaweq> it may be something related to fedora/devstack/infra
15:54:14 <slaweq> but we should check that
15:54:19 <slaweq> is there any volunteer?
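Before promoting a job to voting, and when checking how often the periodic fedora job hits RETRY_LIMIT, the recent results can also be pulled from the Zuul API instead of grafana. A rough sketch follows; the endpoint and field names are assumptions based on the public Zuul REST API on zuul.opendev.org, so verify them before relying on the numbers:

    # Pull the latest results of a job from the OpenDev Zuul API and
    # print how often it did not succeed, including RETRY_LIMIT hits.
    import json
    import urllib.request

    JOB = "neutron-fullstack-with-uwsgi"   # or neutron-ovn-tempest-ovs-master-fedora
    URL = ("https://zuul.opendev.org/api/tenant/openstack/builds"
           "?job_name=%s&limit=100" % JOB)

    with urllib.request.urlopen(URL) as resp:
        builds = json.load(resp)

    results = [build.get("result") for build in builds]
    bad = [r for r in results if r not in ("SUCCESS", None)]
    print("%s: %d of the last %d runs did not succeed (%d RETRY_LIMIT)"
          % (JOB, len(bad), len(results), results.count("RETRY_LIMIT")))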
15:54:47 <bcafarel> I can take a look (and grab ovn folks if help is needed)
15:54:54 <slaweq> bcafarel: thx a lot
15:55:08 <slaweq> #action bcafarel to check neutron-ovn-tempest-ovs-master-fedora RETRY_LIMIT failures
15:55:22 <slaweq> ok, that's all from my side for today
15:55:34 <slaweq> is there anything else You want to talk about?
15:56:35 <maciejjozefczyk> bcafarel, ping me if you need any help with ovn
15:57:11 <bcafarel> :) maciejjozefczyk thanks, will do (if it is not a generic fedora/infra issue)
15:57:50 <slaweq> thx maciejjozefczyk :)
15:57:57 <slaweq> ok, I think we are done for today
15:58:00 <njohnston> o/
15:58:01 <slaweq> thx for attending
15:58:06 <slaweq> #endmeeting