15:00:57 <slaweq> #startmeeting neutron_ci
15:00:58 <openstack> Meeting started Wed Mar 25 15:00:57 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:59 <slaweq> hi
15:01:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:02 <openstack> The meeting name has been set to 'neutron_ci'
15:01:02 <ralonsoh> hi
15:01:20 <bcafarel> hello
15:01:24 <maciejjozefczyk> hey
15:01:51 <slaweq> ok, let's start
15:01:53 <slaweq> #topic Actions from previous meetings
15:02:00 <slaweq> first one
15:02:01 <slaweq> maciejjozefczyk to mark TestVirtualPorts functional tests as unstable for now
15:02:09 <njohnston> o/
15:02:40 <maciejjozefczyk> slaweq, that's done
15:02:55 <slaweq> thx maciejjozefczyk :)
15:02:57 <maciejjozefczyk> #link https://review.opendev.org/#/c/713860/
15:03:18 <slaweq> ok, next one
15:03:20 <slaweq> slaweq to investigate fullstack SG test broken pipe failures
15:03:33 <slaweq> and unfortunately I didn't have time to check it
15:03:35 <slaweq> sorry
15:03:38 <slaweq> #action slaweq to investigate fullstack SG test broken pipe failures
15:03:43 <slaweq> I will try this week
15:04:06 <slaweq> next one
15:04:08 <slaweq> maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:05:07 <maciejjozefczyk> #link https://bugs.launchpad.net/neutron/+bug/1868110
15:05:08 <openstack> Launchpad bug 1868110 in neutron "[OVN] neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log randomly fails" [High,Confirmed] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:05:31 <maciejjozefczyk> I tried to figure it out, but so far I haven't been able to reproduce it; I also didn't spend too much time on it.
15:05:45 <maciejjozefczyk> This week I'm going to spend more time on it.
15:06:04 <slaweq> today I saw yet another similar issue https://ab00d9261534a206496f-e4dcf05f554cd2b2192f6b35230c9943.ssl.cf1.rackcdn.com/708985/8/gate/neutron-functional/2132176/testr_results.html
15:07:36 <maciejjozefczyk> slaweq, that looks like a different test in the same class
15:07:58 <maciejjozefczyk> thanks for the link
15:09:30 <slaweq> #action maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:09:42 <slaweq> next one
15:09:48 <slaweq> actually, the next 2 :)
15:09:50 <slaweq> slaweq to change fullstack-with-uwsgi job to be voting
15:09:51 <slaweq> and
15:09:55 <slaweq> slaweq to make neutron-tempest-with-uwsgi voting too
15:10:00 <slaweq> both done in the same patch
15:10:05 <slaweq> Patch https://review.opendev.org/714917
15:10:37 <bcafarel> and both passing (luckily)
15:11:16 <slaweq> yes :)
15:11:33 <slaweq> so please review it :)
15:11:43 <slaweq> and the last one from the previous week
15:11:45 <slaweq> bcafarel to check neutron-ovn-tempest-ovs-master-fedora RETRY_LIMIT failures
15:12:01 * bcafarel looks for LP link
15:12:24 <bcafarel> the root cause was that the new fedora image did not have the cache directory created; the infra++ folks quickly fixed it
15:12:32 <bcafarel> #link https://bugs.launchpad.net/devstack/+bug/1868076
15:12:33 <openstack> Launchpad bug 1868076 in devstack "Fedora jobs fail in setup-devstack-cache: find: ‘/opt/cache/files’: No such file or directory" [Undecided,Fix released]
15:12:50 <slaweq> thx bcafarel and infra-root :)
15:13:56 <slaweq> ok, let's move on
15:13:58 <slaweq> #topic Stadium projects
15:14:14 <slaweq> njohnston: any updates about the migration to zuulv3?
15:14:47 <njohnston> I think it's the same as yesterday's team meeting - midonet, odl, and one change in bagpipe
15:15:04 <njohnston> but I have been in meetings continuously up to this point so I have not had a chance to check today, sorry
15:15:18 <slaweq> ok
15:15:21 <slaweq> no problem
15:15:29 <slaweq> that isn't urgent for now :)
15:15:47 <slaweq> about IPv6-only deployments, I don't have any updates either
15:16:05 <slaweq> do You have anything else regarding stadium projects and CI for today?
15:16:31 <njohnston> nope! Not aware of any pernicious issues.
15:17:25 <slaweq> ok, so let's move on
15:17:33 <slaweq> #topic Grafana
15:17:39 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:17:47 <slaweq> (sorry that I forgot it at the beginning)
15:18:22 <slaweq> Average number of rechecks in the last weeks:
15:18:24 <slaweq> week 12 of 2020: 0.74
15:18:26 <slaweq> week 13 of 2020: 0.67
15:18:36 <slaweq> those numbers from this and the previous week look really good :)
15:19:26 <slaweq> and looking at grafana, it seems to me that we are in pretty good shape currently
15:19:38 <maciejjozefczyk> \o/
15:19:50 <slaweq> still the same issues which we had before, but fortunately mostly in non-voting jobs :)
15:21:50 <bcafarel> one step at a time, more stable voting jobs is already great!
15:22:12 <slaweq> bcafarel: indeed :)
15:22:49 <njohnston> +1
15:23:30 <slaweq> and, as our gate is working pretty well, I don't have many examples of new failures to discuss today
15:23:53 <slaweq> #topic fullstack/functional
15:24:01 <slaweq> I saw again an issue in neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:24:06 <slaweq> https://116f9c4d7ae86f7f062e-eb80c0e4aee14229f3b21e03b2a44f1c.ssl.cf2.rackcdn.com/712446/1/check/neutron-functional-with-uwsgi/cdb2795/testr_results.html
15:24:16 <slaweq> ralonsoh: maybe You will want to take a look ^^ :)
15:24:20 <ralonsoh> yes
15:24:42 <ralonsoh> but this problem is, again, in the ns deletion
15:24:43 <ralonsoh> pfffff
15:24:54 <slaweq> yep :/
15:25:06 <slaweq> IIRC it's always in ns deletion
15:25:16 <ralonsoh> ok, I'll take care of it
15:25:36 <slaweq> maybe we should add some "safe_cleanup" method in the test
15:25:49 <slaweq> catch the timeout there and retry
15:26:25 <ralonsoh> the problem is in the privsep method and the eventlet library
15:26:43 <ralonsoh> if, for any reason, we give up the GIL, the timeout will happen
15:26:56 <ralonsoh> (I'll check this later)
15:27:35 <slaweq> ok, thx
15:27:47 <slaweq> may I add an action with this for You?
15:28:45 <ralonsoh> sure
15:29:09 <slaweq> #action ralonsoh to check (again) issue with ns deletion in neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:29:40 <slaweq> ok, and that's all from me regarding functional/fullstack jobs
15:29:53 <slaweq> do You have anything else You want to discuss, related to those jobs?
15:30:08 <njohnston> the patch marking the fullstack security group tests as stable - https://review.opendev.org/#/c/710782/
15:30:53 <njohnston> we have rechecked it 5 times since I removed the iptables-hybrid scenario and the fullstack job has not failed
15:31:04 <ralonsoh> so we drop LB testing
15:31:37 <njohnston> sorry, I said iptables-hybrid but I meant LB
15:31:52 <slaweq> but should we really drop this LB scenario from it?
15:31:56 <njohnston> yes, I think it's better to test the two other scenarios than to not test any
15:32:27 <slaweq> can You maybe just comment out this scenario and add a TODO to bring it back when it is fixed?
15:32:33 <njohnston> sure thing
15:32:36 <ralonsoh> +1
15:32:54 <slaweq> thx
15:33:01 <slaweq> and I will +2 it :)
15:34:34 <slaweq> anything else related to fullstack/functional, or can we move on?
15:36:08 <njohnston> go ahead
15:36:10 <slaweq> ok, let's move on
15:36:13 <slaweq> #topic Tempest/Scenario
15:36:25 <slaweq> here I also have only 1 issue to mention
15:36:47 <slaweq> I spotted again the issue with server termination in the multicast test
15:36:49 <slaweq> https://103900b4a03cdd60217d-625a0eb0440aa527fbdb216e8991f5a6.ssl.cf5.rackcdn.com/714726/1/check/neutron-ovn-tempest-ovs-release/2b52c20/testr_results.html
15:37:54 <slaweq> ralonsoh: do You want to take a look at it or do You want me to check it this week?
15:38:18 <ralonsoh> I don't know if I'll have time this week
15:38:23 <ralonsoh> sorry
15:38:25 <bcafarel> I think I saw it pop up on a few stable backports too (at least I remember typing "recheck test_multicast_between_vms_on_same_network")
15:38:41 <slaweq> ok, I will check that one
15:38:55 <slaweq> #action slaweq to check server termination on multicast test
15:39:06 <slaweq> bcafarel: good to know that it's not only on the master branch :)
15:39:35 <slaweq> and that's all from me regarding tempest jobs
15:39:41 <slaweq> do You have anything else to discuss?
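[editor's note] For reference, below is a minimal sketch of the "safe_cleanup" idea slaweq floated under the fullstack/functional topic above: wrap the namespace cleanup call, catch the timeout, and retry a few times before failing the test. All names here (the helper, the exception class, the retry parameters, and the delete_namespace placeholder) are hypothetical and not existing Neutron code; in the real tests the timeout would come from the privsep call ralonsoh mentioned.

    import time


    class CleanupTimeout(Exception):
        """Stand-in for the timeout raised by the real cleanup call."""


    def safe_cleanup(cleanup_func, retries=3, delay=1.0):
        """Call cleanup_func, retrying on timeout before giving up."""
        for attempt in range(1, retries + 1):
            try:
                cleanup_func()
                return
            except CleanupTimeout:
                # Only re-raise once all retries are exhausted.
                if attempt == retries:
                    raise
                time.sleep(delay)


    # Hypothetical usage in a test's cleanup phase (delete_namespace is a
    # placeholder for whatever actually removes the namespace):
    # self.addCleanup(safe_cleanup, lambda: delete_namespace(ns_name))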
15:41:16 <slaweq> ok, so lets move on 15:41:26 <slaweq> one last thing from me or today 15:41:28 <slaweq> #topic Periodic 15:41:34 <slaweq> neutron-tempest-postgres-full - failing since 18.03 constantly 15:41:35 <slaweq> Errors like http://zuul.openstack.org/build/95cb0e8c4c664727a2235bf5ee8d76ca/log/controller/logs/screen-q-svc.txt?severity=4 every day 15:42:52 <slaweq> it's failing only on postgresql 15:43:02 <ralonsoh> another postgresql incompatibility? in our ORM definition 15:43:09 <slaweq> and IMO some db expert should take a look 15:43:11 <ralonsoh> that happened before 15:44:15 <slaweq> it's either one of our patches merged around 17.03 or some pgsql update 15:44:22 <slaweq> I will open LP for that 15:44:40 <slaweq> and I will check if PG versions didn't changed recently in jobs 15:45:01 <slaweq> but I don't think I will have time to check something more with this issue 15:46:17 <slaweq> #action slaweq to report LP about PGSQL periodic job failures 15:46:41 <slaweq> if You have any experience with postgresql, and some cycles, You are more than welcome to check it :) 15:47:09 <slaweq> ok, and that's all on my side for today 15:47:16 <slaweq> #topic Open discussion 15:47:27 <slaweq> do You have anything else related to our ci to discuss today? 15:47:32 <ralonsoh> no 15:48:26 <bcafarel> all good here 15:49:45 <slaweq> ok, so I will give You few minutes back :) 15:49:49 <slaweq> thx for attending 15:49:49 <bcafarel> yay 15:49:55 <bcafarel> o/ 15:49:56 <slaweq> and have a great week 15:49:58 <slaweq> o/ 15:50:01 <slaweq> #endmeeting
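[editor's note] As a side note on the "mark TestVirtualPorts functional tests as unstable" action item from the top of the meeting (https://review.opendev.org/#/c/713860/): the general pattern is a decorator that converts a failure of a known-flaky test into a skip, so the test keeps running and stays visible without blocking the gate. The sketch below is a generic illustration of that pattern only; the decorator name, the example test name, and the bug reference are placeholders and not necessarily Neutron's actual helper.

    import functools
    import unittest


    def unstable_test(reason):
        """Turn a failure of a known-flaky test into a skip, keeping the
        bug reference visible until the test is fixed for real."""
        def decorator(test_method):
            @functools.wraps(test_method)
            def wrapper(self, *args, **kwargs):
                try:
                    return test_method(self, *args, **kwargs)
                except unittest.SkipTest:
                    raise
                except Exception:
                    raise unittest.SkipTest(
                        "Test marked as unstable, skipped on failure: %s"
                        % reason)
            return wrapper
        return decorator


    # Hypothetical usage (test name and bug number are placeholders):
    # @unstable_test("bug <LP number>")
    # def test_virtual_port_something(self):
    #     ...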