15:00:11 <slaweq> #startmeeting neutron_ci
15:00:11 <openstack> Meeting started Wed Mar 18 15:00:11 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:15 <openstack> The meeting name has been set to 'neutron_ci'
15:00:16 <njohnston> o/
15:00:32 <bcafarel> o/
15:00:37 <ralonsoh> hi
15:00:51 <slaweq> ok, lets go :)
15:00:55 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:00:56 <slaweq> Please open now :)
15:01:29 <slaweq> #topic Actions from previous meetings
15:01:41 <slaweq> maciejjozefczyk to take a look at https://bugs.launchpad.net/neutron/+bug/1865453
15:01:42 <openstack> Launchpad bug 1865453 in neutron "neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestVirtualPorts.test_virtual_port_created_before fails randomly" [High,In progress] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:03:14 <slaweq> I just pinged maciej to join here
15:03:19 <slaweq> maybe he has some update
15:04:32 <slaweq> ok, lets move on for now
15:04:36 <slaweq> maybe he will join later
15:04:47 <slaweq> ralonsoh to check "neutron_tempest_plugin.scenario.test_multicast.MulticastTestIPv4 failing often on deletion of server"
15:04:56 <ralonsoh> I tried but I didn't find anything
15:05:05 <ralonsoh> sorry, I can't find the cause of this error
15:05:07 <maciejjozefczyk> sorry, Im here
15:05:18 <ralonsoh> (let's go back to the previous one)
15:05:19 <maciejjozefczyk> forgot about '-3' suffix:)
15:05:48 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1865453
15:05:49 <openstack> Launchpad bug 1865453 in neutron "neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestVirtualPorts.test_virtual_port_created_before fails randomly" [High,In progress] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:05:53 <ralonsoh> maciejjozefczyk, ^^
15:06:00 <slaweq> thx ralonsoh
15:07:00 <maciejjozefczyk> ok, so I tried to debug what's going on there
15:07:26 <maciejjozefczyk> proposed a change to retry on failed asserts
15:07:28 <maciejjozefczyk> #link https://review.opendev.org/#/c/712888/
15:08:03 <maciejjozefczyk> but anyway, even with those retries, we still have random failures of tests in that class, like 1 test per 20 runs of functional tests (based on the experiment from the link)
15:08:16 <maciejjozefczyk> and actually I don't really know what causes that
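The "retry on failed asserts" approach works roughly as in the sketch below; this is only an illustration of the idea, assuming tenacity is available (it is a common OpenStack dependency), not the actual content of review 712888, and the helper name and retry parameters are made up.

```python
# Rough sketch of retrying a flaky assertion before treating it as a real
# failure. This is NOT the code from review 712888; the helper name and the
# retry parameters are hypothetical.
import tenacity


def assert_eventually(assert_func, attempts=5, wait_seconds=2):
    """Re-run an assertion, retrying on AssertionError a few times."""
    retryer = tenacity.Retrying(
        retry=tenacity.retry_if_exception_type(AssertionError),
        stop=tenacity.stop_after_attempt(attempts),
        wait=tenacity.wait_fixed(wait_seconds),
        reraise=True,  # re-raise the original AssertionError if we give up
    )
    retryer(assert_func)


# Example: keep re-checking until the backend catches up, or fail for real.
# assert_eventually(lambda: self.assertEqual(expected, self._get_ports()))
```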
15:08:17 <slaweq> maciejjozefczyk++ for hack with multiple functional jobs :)
15:08:53 <maciejjozefczyk> Does it hurt us very much? I mean on the gates?
15:09:02 <maciejjozefczyk> if so, I vote for setting those tests as unstable for now
15:09:14 <maciejjozefczyk> because we could spend hours debugging it...
15:09:42 <maciejjozefczyk> and a better solution would be to do the virtual port association in some more clever way, like setting a proper device_type
15:09:49 <maciejjozefczyk> in neutron API by octavia
15:09:57 <maciejjozefczyk> that will solve those failures instantly
15:10:20 <maciejjozefczyk> thats what I discussed with Lucas
15:10:28 <slaweq> maciejjozefczyk: so I'm fine with marking those tests as unstable for now, let's make others' lives easier :)
15:10:40 <njohnston> maciejjozefczyk: That is pretty darn cool what you did in that change
15:11:02 <ralonsoh> +1 to mark them unstable, for now
15:11:21 <maciejjozefczyk> njohnston, about multiplexing functionals?
15:11:23 <slaweq> maciejjozefczyk: will You propose a patch to mark them as unstable?
15:11:26 <maciejjozefczyk> slaweq, yes
15:11:30 <slaweq> thx a lot
15:11:35 <njohnston> maciejjozefczyk: yes
15:11:41 <maciejjozefczyk> njohnston, creds for Jakub Libosvar :)
15:11:53 <slaweq> #action maciejjozefczyk to mark TestVirtualPorts functional tests as unstable for now
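For reference, marking a test as unstable in neutron typically means wrapping it with the unstable_test decorator from neutron.tests.base, roughly as in the sketch below; the class shown is only a stub, not the real OVN functional test.

```python
# Minimal sketch of marking a flaky test as unstable, assuming the
# unstable_test decorator that lives in neutron.tests.base; the class body
# here is only a stub standing in for the real OVN functional test.
from neutron.tests import base


class TestVirtualPorts(base.BaseTestCase):

    @base.unstable_test("bug 1865453")
    def test_virtual_port_created_before(self):
        # The real assertions live in the functional mech_driver tests; with
        # the decorator, a failure here is reported as a skip instead of
        # failing the whole job.
        pass
```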
15:12:31 <slaweq> ok, so let's get back to the second action item
15:12:33 <slaweq> ralonsoh to check "neutron_tempest_plugin.scenario.test_multicast.MulticastTestIPv4 failing often on deletion of server"
15:12:49 <slaweq> You said that You couldn't find the root cause of this
15:12:51 <ralonsoh> again, sorry for not finding anything
15:12:53 <slaweq> but is it still happening?
15:12:59 <slaweq> I didn't see it this week
15:13:18 <ralonsoh> maybe (maybe) this is caused by another test executed in parallel
15:13:43 <ralonsoh> yes, I didn't see this test failing in the last 10 days
15:14:02 <slaweq> so let's hope it will not hit us again :D
15:14:08 <slaweq> and forget about it for now
15:14:13 <slaweq> do You agree with that?
15:14:15 <slaweq> :)
15:14:19 <ralonsoh> I'll freeze it for now
15:14:24 <slaweq> ++
15:15:48 <slaweq> ok, last one from last week
15:15:50 <slaweq> slaweq to prepare etherpad to track progress with ipv6-only testing goal
15:15:53 <slaweq> Etherpad https://etherpad.openstack.org/p/neutron-stadium-ipv6-testing
15:16:09 <slaweq> but I didn't have time to work on any of those patches so far
15:16:16 <slaweq> if anyone wants to help, that would be great
15:16:31 <slaweq> and also I will add this etherpad to the stadium projects topic for the next weeks
15:16:41 <slaweq> to not forget about it and to track what is still to do
15:16:45 <slaweq> are You ok with that?
15:17:22 <ralonsoh> yes
15:17:43 <bcafarel> sounds good
15:17:59 <njohnston> +1
15:18:22 <slaweq> ok
15:18:41 <njohnston> I'll try to work on it as well
15:18:45 <slaweq> thx njohnston
15:18:50 <slaweq> ok, lets move on
15:18:53 <slaweq> #topic Stadium projects
15:19:03 <slaweq> standardize on zuul v3
15:19:05 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:19:10 <slaweq> I don't think there was any progress on that
15:19:37 <slaweq> but maybe I missed something
15:20:02 <njohnston> I did not see any progress either
15:20:23 <slaweq> ok, anything else regarding stadium projects and ci for today?
15:21:17 <maciejjozefczyk> yes
15:21:35 <maciejjozefczyk> In OVN octavia provider driver we're setting up devstack plugin and CI (for now non voting)
15:21:59 <maciejjozefczyk> #link https://review.opendev.org/#/c/708870/
15:22:14 <maciejjozefczyk> If you have a minute to check if that makes sense, would be great :)
15:23:35 <slaweq> maciejjozefczyk: noted, I will check it tomorrow morning
15:24:30 <slaweq> ok, lets move on
15:24:31 <slaweq> #topic Grafana
15:24:42 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:25:20 <maciejjozefczyk> slaweq, thanks
15:26:19 <slaweq> today I checked the average number of rechecks from the last days
15:26:20 <slaweq> Average number of rechecks in last weeks:
15:26:22 <slaweq> week 11 of 2020: 5.4
15:26:24 <slaweq> week 12 of 2020: 0.67
15:26:26 <slaweq> week 12 is now
15:26:30 <njohnston> wow nice
15:26:35 <slaweq> but it looks generally better than it was
15:26:57 <maciejjozefczyk> Does that mean it's better, or that we don't produce code :)?
15:27:04 <slaweq> and if we fix some of the functional test failures, it should be even better
15:27:43 <slaweq> maciejjozefczyk: my script doesn't report how many patches were merged during the week, but I think that we do still produce code
15:28:19 <maciejjozefczyk> slaweq, ok :)
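For reference, an "average rechecks per merged patch" number can be approximated by counting "recheck" comments on merged patches via the Gerrit REST API; the sketch below only illustrates that idea and is not the actual script behind the numbers above (the query string and the 7-day window are arbitrary choices for the example).

```python
# Rough illustration of computing "average rechecks per merged patch" from the
# Gerrit REST API; not the actual script used for the numbers above.
import json

import requests

GERRIT = "https://review.opendev.org"
QUERY = "project:openstack/neutron status:merged -age:7d"


def average_rechecks():
    resp = requests.get(GERRIT + "/changes/",
                        params={"q": QUERY, "o": "MESSAGES"})
    # Gerrit prefixes JSON responses with a ")]}'" line to prevent XSSI.
    changes = json.loads(resp.text.split("\n", 1)[1])
    per_patch = [
        sum("recheck" in msg.get("message", "").lower()
            for msg in change.get("messages", []))
        for change in changes
    ]
    return sum(per_patch) / len(per_patch) if per_patch else 0.0


print(average_rechecks())
```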
15:28:56 <slaweq> looking at grafana, the tempest and neutron-tempest-plugin jobs look pretty good
15:28:57 <bcafarel> I hope we still do :)
15:29:03 <slaweq> especially the voting ones
15:29:24 <slaweq> the highest failure rates are still on the functional and grenade jobs
15:30:31 <slaweq> but I know that people from nova are working on the failure which causes many grenade failures, so hopefully it will be better soon
15:31:05 <slaweq> and our functional job failures are mostly related to those ovn tests which maciejjozefczyk will mark as unstable for now
15:31:25 <slaweq> anything else You want to add regarding grafana?
15:32:31 <bcafarel> nothing here
15:32:48 <slaweq> ok, lets move on
15:32:56 <slaweq> #topic fullstack/functional
15:33:08 <slaweq> first I wanted to ask njohnston how it's going with marking the security group fullstack tests as stable - https://review.opendev.org/710782
15:33:21 <slaweq> should we try to merge this?
15:34:15 <njohnston> We have had 8 successes and 2 failures due to 'broken pipe' on ssh
15:34:32 <njohnston> and 1 failure due to an apt cache issue that was before the tests were even started
15:34:49 <slaweq> hmm, I'm afraid about these broken pipe failures
15:35:02 <slaweq> do You have link to such failure?
15:35:06 <njohnston> broken pipe example: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f5c/710782/3/check/neutron-fullstack/f5c09cb/testr_results.html
15:36:02 <njohnston> broken pipe 2: https://82f990e55ca70e41f871-36409e2f060733ae498eef8cd27f40f4.ssl.cf5.rackcdn.com/710782/3/check/neutron-fullstack/126cd0c/testr_results.html
15:36:16 <bcafarel> is it the broken pipe, or the _StringException before?
15:36:49 <bcafarel> (_StringException reminds me of some py3 fixes from a while ago)
15:36:50 <slaweq> IMHO it's the broken pipe raised in netcat.test_connectivity()
15:37:01 <bcafarel> ok
15:37:06 <slaweq> so it looks to me like we still have some issue there
15:37:33 <slaweq> let's not merge it yet, I will try to take a look into that
15:37:39 <slaweq> ok?
15:37:47 <bcafarel> +1
15:37:48 <njohnston> I am of the same mind
15:38:13 <slaweq> #action slaweq to investigate fullstack SG test broken pipe failures
15:39:18 <slaweq> ok
15:39:21 <slaweq> now functional tests
15:39:36 <slaweq> I found 2 ovn-related failures this week which seem new to me
15:39:42 <slaweq> neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestVirtualPorts.test_virtual_port_delete_parents
15:39:43 <slaweq> https://1ad6f2e4d61e2bf5ff0b-20b98b64cfa6ea87451df6eaddafb782.ssl.cf5.rackcdn.com/712640/1/check/neutron-functional/17dbbf8/testr_results.html
15:39:45 <slaweq> neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:39:47 <slaweq> https://4232a5777fe8521e933a-68bc071a5cbea1ed39a41226592204b6.ssl.cf5.rackcdn.com/712474/3/check/neutron-functional/3e06670/testr_results.html
15:39:57 <slaweq> can You take a look and check if that is something You already know?
15:40:22 <slaweq> this first one may be related to the same issue we discussed earlier with maciejjozefczyk
15:40:36 <slaweq> but assertion failure seems different here
15:40:48 <maciejjozefczyk> the test_ovn_db_sync is a different problem
15:41:04 <maciejjozefczyk> I can take a look
15:41:15 <slaweq> maciejjozefczyk: thx a lot
15:41:30 <slaweq> please open an LP bug for it too, ok?
15:41:43 <maciejjozefczyk> slaweq, ok
15:42:31 <slaweq> ok
15:42:48 <slaweq> #action maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:42:51 <slaweq> thx maciejjozefczyk
15:43:19 <slaweq> and one last thing regarding this topic from me
15:43:20 <maciejjozefczyk> slaweq, sure
15:44:00 <slaweq> I was looking into the grafana dashboards and I think that the fullstack-with-uwsgi job is as stable as the fullstack job
15:44:11 <slaweq> should we maybe promote it to voting?
15:44:47 <slaweq> I don't want to do the same with the functional job for now as it's failing much more often (the same as the non-uwsgi functional job, but both are failing too often)
15:45:06 <njohnston> +1 for making fullstack-uwsgi voting
15:45:18 <slaweq> dashboard is here: http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1&fullscreen&panelId=22
15:45:20 <slaweq> to check
15:45:50 <bcafarel> yes it looks good compared to fullstack
15:46:49 <slaweq> ok, I will propose patch for that
15:47:08 <slaweq> #action slaweq to change fullstack-with-uwsgi job to be voting
15:47:23 <slaweq> what about gating? should we add it to the gate queue also, or not yet?
15:47:33 <slaweq> I think that we can, but I want to know Your opinion :)
15:48:51 <njohnston> Let's give it a couple of weeks voting and make sure we don't have buyer's remorse
15:49:06 <njohnston> no reason to hit the accelerator
15:49:13 <slaweq> ok
15:49:19 <slaweq> sounds reasonable
15:49:33 <bcafarel> also, checking that we do not have any uwsgi jobs in the gate for now?
15:49:39 <slaweq> bcafarel: nope
15:49:48 <slaweq> we have only non-voting ones in check queue
15:50:52 <slaweq> ok, lets move on
15:50:54 <slaweq> #topic Tempest/Scenario
15:51:05 <slaweq> here I don't have any new issues
15:51:09 <slaweq> only one question
15:51:16 <slaweq> similar to the previous one
15:51:44 <slaweq> what do You think about making the neutron-tempest-with-uwsgi job voting?
15:51:50 <slaweq> http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1&fullscreen&panelId=16
15:52:14 <slaweq> it also looks pretty stable
15:52:38 <slaweq> in the last 30 days it was basically following the other jobs and most of the time it was below 10% of failures
15:52:45 <njohnston> +1 for this too
15:52:54 <bcafarel> more voting uwsgi jobs ++
15:52:59 <maciejjozefczyk> ++
15:52:59 <slaweq> :)
15:53:11 <slaweq> #action slaweq to make neutron-tempest-with-uwsgi voting too
15:53:23 <slaweq> ok, that was fast
15:53:31 <slaweq> so one last topic for today
15:53:33 <slaweq> #topic Periodic
15:53:53 <slaweq> I noticed that for the last few days our neutron-ovn-tempest-ovs-master-fedora job has been failing with RETRY_LIMIT
15:54:10 <slaweq> it may be something related to fedora/devstack/infra
15:54:14 <slaweq> but we should check that
15:54:19 <slaweq> is there any volunteer?
15:54:47 <bcafarel> I can take a look (and grab ovn folks if help needed)
15:54:54 <slaweq> bcafarel: thx a lot
15:55:08 <slaweq> #action bcafarel to check neutron-ovn-tempest-ovs-master-fedora RETRY_LIMIT failures
15:55:22 <slaweq> ok, that's all from my side for today
15:55:34 <slaweq> is there anything else You want to talk about?
15:56:35 <maciejjozefczyk> bcafarel, ping me if you need any help about ovn
15:57:11 <bcafarel> :) maciejjozefczyk thanks, will do (if it is not a generic fedora/infra issue)
15:57:50 <slaweq> thx maciejjozefczyk :)
15:57:57 <slaweq> ok, I think we are done for today
15:58:00 <njohnston> o/
15:58:01 <slaweq> thx for attending
15:58:06 <slaweq> #endmeeting