15:00:19 <slaweq> #startmeeting neutron_ci
15:00:20 <openstack> Meeting started Wed Aug  5 15:00:19 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:23 <openstack> The meeting name has been set to 'neutron_ci'
15:00:36 <ralonsoh> hi
15:00:50 <ralonsoh> (lots of fun in the CI these last weeks)
15:01:09 <slaweq> hi
15:01:18 <slaweq> ralonsoh: yes :/
15:01:23 <bcafarel> o/
15:02:38 <slaweq> ok, let's start, maybe others will join
15:02:45 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:58 <slaweq> please open it and let's go on with the meeting :)
15:03:04 <slaweq> #topic Actions from previous meetings
15:03:13 <slaweq> we have only one for today
15:03:15 <slaweq> bcafarel to clean stable branches jobs from *-master jobs
15:03:42 <bcafarel> https://review.opendev.org/#/c/743795/ is up for ussuri, I will push older ones when it looks good
15:04:15 <slaweq> thx bcafarel
15:04:19 <slaweq> I will review it today
15:04:37 <slaweq> ok, so let's move on
15:04:39 <slaweq> next topic
15:04:44 <slaweq> #topic Switch to Ubuntu Focal
15:05:00 <slaweq> Etherpad: https://etherpad.opendev.org/p/neutron-victoria-switch_to_focal
15:05:16 <slaweq> I made progress with functional and fullstack jobs
15:05:22 <maciejjozefczyk> \o
15:05:28 <slaweq> patch https://review.opendev.org/#/c/734304/ should be ok with https://review.opendev.org/#/c/744500/
15:05:45 <ralonsoh> +1
15:06:14 <slaweq> but we should be aware that the linuxbridge driver can't work with ebtables-nft now
15:06:25 <slaweq> we have to use the legacy implementation to make it work
15:06:38 <slaweq> maybe some additional docs update will be needed too
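For reference, switching a host to the legacy ebtables backend is usually done through the Debian/Ubuntu alternatives mechanism; a minimal sketch in Python, assuming Focal ships the /usr/sbin/ebtables-legacy binary and an "ebtables" alternatives group (both are assumptions here, not something confirmed in the meeting):

    import subprocess

    def use_legacy_ebtables() -> None:
        # Point the "ebtables" alternative at the legacy (non-nft) binary,
        # which the linuxbridge firewall driver still expects.
        subprocess.run(
            ["update-alternatives", "--set", "ebtables",
             "/usr/sbin/ebtables-legacy"],
            check=True,  # raise if the alternative is missing
        )

    if __name__ == "__main__":
        use_legacy_ebtables()  # needs root privileges

Running this (or the equivalent update-alternatives command directly) before starting the linuxbridge agent is the kind of tweak the docs update mentioned above would cover.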
15:08:18 <ralonsoh> (are we going to deprecate LB soon?)
15:08:24 <slaweq> ralonsoh: nope
15:08:26 <ralonsoh> ok
15:08:44 <slaweq> for the other, tempest-related jobs, I will try to check them this week
15:09:23 <slaweq> that's all from me about Ubuntu 20.04
15:09:33 <slaweq> do You have anything else to add?
15:09:41 <ralonsoh> no
15:09:47 <ralonsoh> well, the wsgi jobs
15:10:01 <ralonsoh> but maybe in following sections
15:10:03 <slaweq> ralonsoh: I added Your topic to the functional/fullstack section
15:10:05 <slaweq> :)
15:10:09 <ralonsoh> thanks
15:10:15 <bcafarel> other tempest jobs looked OK in https://review.opendev.org/#/c/738163/ so we should be OK there
15:10:45 <bcafarel> (at least the voting ones)
15:11:00 <slaweq> bcafarel: that's good
15:11:02 <slaweq> thx
15:11:22 <slaweq> so now it seems we need to ask gmann about his patch https://review.opendev.org/#/c/734700/
15:11:32 <slaweq> as ours depends on it
15:12:52 <bcafarel> it will probably get in when more projects look ready to go
15:13:00 <bcafarel> hopefully neutron will be in that list soon!
15:13:17 <slaweq> ahh, ok
15:14:02 <slaweq> ok, let's move on to the next topic then
15:14:04 <slaweq> #topic Stadium projects
15:14:15 <slaweq> here we still have this one goal:
15:14:17 <slaweq> standardize on zuul v3
15:14:19 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:14:34 <slaweq> but I didn't have time to look into those old jobs yet
15:15:13 <slaweq> if anyone has any spare cycles please take a look at those too :)
15:15:34 <slaweq> and that's all from my side about stadium projects
15:15:43 <slaweq> anything else You want to add?
15:15:55 <ralonsoh> no
15:16:14 <bcafarel> no
15:16:25 <slaweq> ok, so next topic
15:16:27 <slaweq> #topic Stable branches
15:16:32 <slaweq> Ussuri dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:16:34 <slaweq> Train dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:17:05 <slaweq> I don't know about any new issues there, except the one with pip which was fixed yesterday I believe
15:18:29 <slaweq> bcafarel: do You have any new issues from stable branches ci?
15:18:44 <bcafarel> yes I sent a few rechecks this morning, so far it seems we are back in order (though most are still in queue)
15:19:16 <slaweq> ok, good that ci for stable branches works fine
15:19:27 <bcafarel> +1
15:19:53 <slaweq> ok
15:19:56 <slaweq> so next topic
15:19:58 <slaweq> #topic Grafana
15:20:02 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:21:25 <slaweq> after the fixes of the lower-constraints job and grenade jobs (thx ralonsoh) it seems to me that it's pretty ok
15:21:34 <ralonsoh> I think so
15:22:01 <gmann> slaweq: ack, in internal meeting, will look after that
15:22:04 <slaweq> the only thing which worries me is neutron-ovn-tempest-full-multinode-ovs-master failing 100% of the time for a few weeks now
15:22:19 <slaweq> gmann: np, we just talked about Ubuntu 20.04
15:23:07 <ralonsoh> slaweq, I think this is just a timeout problem
15:23:17 <slaweq> ralonsoh: it is timeout mostly
15:23:26 <slaweq> but I think it's related to many test failures
15:23:35 <slaweq> and due to that we finally hit the job's timeout
15:24:12 <ralonsoh> slaweq, I'll try to take a look this week
15:24:38 <slaweq> see for example https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6a5/738163/12/check/neutron-ovn-tempest-full-multinode-ovs-master/6a5ca86/job-output.txt
15:24:49 <slaweq> many tests are failing to ssh to the instance
15:25:32 <slaweq> so that may be some problem with the job's configuration IMO
15:25:43 <slaweq> ralonsoh: thx a lot
15:25:59 <slaweq> #action ralonsoh to check timing out neutron-ovn-tempest-full-multinode-ovs-master jobs
15:26:23 <slaweq> do You have anything else regarding grafana?
15:26:57 <ralonsoh> no
15:27:16 <slaweq> ok, so let's move on
15:27:17 <slaweq> #topic fullstack/functional
15:27:32 <slaweq> so ralonsoh has an idea to make the "uwsgi" jobs the default (neutron-fullstack-with-uwsgi, neutron-functional-with-uwsgi) and remove the non-uwsgi ones (to save CI time)
15:27:45 <ralonsoh> exactly
15:27:50 <ralonsoh> to save time in the CI
15:27:51 <slaweq> and I basically agree with that
15:28:10 <slaweq> we even discussed some time ago switching uwsgi to be the default in devstack
15:28:26 <slaweq> but there was some problem with the grenade jobs and I didn't have time to investigate it then
15:28:31 <slaweq> so it's still pending
15:29:25 <slaweq> but the other thing is that even if we switch uwsgi to be the default, IMO we will need to keep e.g. the functional job without uwsgi, as that deployment method isn't deprecated or unsupported so far
15:29:50 <slaweq> so I think we need to have jobs for both
15:29:53 <ralonsoh> ok, let's wait then
15:30:08 <slaweq> and also, so far the -uwsgi jobs are still not gating, only voting in the check queue
15:31:05 <slaweq> maybe, if we want to move forward with uwsgi, we can move the non-uwsgi functional/fullstack jobs to the periodic queue and promote the -uwsgi ones to be gating
15:31:07 <slaweq> wdyt?
15:31:20 <ralonsoh> that will save ci time
15:31:24 <ralonsoh> ok for me
15:31:54 <bcafarel> that sounds good, I suppose the -uwsgi jobs are stable?
15:32:05 <ralonsoh> pretty much
15:32:12 <slaweq> bcafarel: as stable as non-uwsgi ones
15:32:16 <ralonsoh> same as non-uwsgi
15:32:33 <bcafarel> ack, sounds good to me then!
15:32:36 <slaweq> so I will propose such patch then
15:32:59 <slaweq> #action slaweq to move non-uwsgi functional/fullstack jobs to the periodic queue and promote -uwsgi ones to be gating
15:33:24 <slaweq> and that's all regarding functional/fullstack jobs for today from me
15:33:31 <slaweq> let's move on
15:33:33 <slaweq> #topic Tempest/Scenario
15:33:54 <slaweq> here I found one new (for me) issue in neutron-ovn-tempest-slow
15:34:07 <slaweq> test tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_update_router_admin_state is failing very often
15:34:20 <slaweq> like e.g. https://2e5a48e004c25d40a540-8aab0ff599dd9fcfda722d0dee6871dd.ssl.cf1.rackcdn.com/743832/1/check/neutron-ovn-tempest-slow/268b7cf/testr_results.html
15:34:33 <slaweq> I reported bug https://bugs.launchpad.net/neutron/+bug/1890445
15:34:34 <openstack> Launchpad bug 1890445 in neutron "[ovn] Tempest test test_update_router_admin_state failing very often" [Critical,Confirmed]
15:34:46 <slaweq> maciejjozefczyk: ralonsoh are You aware of such issue maybe?
15:34:50 <ralonsoh> no
15:35:18 <maciejjozefczyk> hmm
15:35:43 <maciejjozefczyk> perhaps the same as https://bugs.launchpad.net/neutron/+bug/1885898
15:35:44 <openstack> Launchpad bug 1885898 in neutron "test connectivity through 2 routers fails in neutron-ovn-tempest-full-multinode-ovs-master job" [High,Confirmed] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:36:05 <maciejjozefczyk> I noticed that we have a race condition in updating router ports
15:36:23 <maciejjozefczyk> I was trying to fix/debug it here: https://review.opendev.org/#/c/740491/20
15:36:28 <maciejjozefczyk> for now without any luck :(
15:37:27 <slaweq> maciejjozefczyk: can You check if this is the same issue and maybe mark 1890445 as a duplicate of 1885898?
15:37:58 <maciejjozefczyk> slaweq, yes sure
15:38:06 <slaweq> and maybe also blacklist this one test until this issue is fixed, to make this job green
15:38:21 <maciejjozefczyk> ok, please assign it to me
15:38:24 <slaweq> thx maciejjozefczyk
15:38:30 <slaweq> #action maciejjozefczyk to check https://bugs.launchpad.net/neutron/+bug/1890445
15:38:31 <openstack> Launchpad bug 1890445 in neutron "[ovn] Tempest test test_update_router_admin_state failing very often" [Critical,Confirmed]
15:38:33 <maciejjozefczyk> kk
15:40:42 <slaweq> and last one thing for today
15:43:35 <bcafarel> ^ laptop issue, our PTL will hulk smash it into reboot and come back
15:43:52 <ralonsoh> hahahaha
15:44:36 <ralonsoh> so much suspense... what is this last thing?
15:45:15 <slaweq_> I'm back
15:45:25 <slaweq_> sorry
15:45:30 <ralonsoh> np
15:45:43 <slaweq> so one last thing for today is the periodic job based on fedora
15:45:49 <slaweq> which is again failing every day
15:45:57 <maciejjozefczyk> woot
15:46:00 <bcafarel> "periodic failure job" :(
15:46:00 <maciejjozefczyk> I just fixed that one :(
15:46:37 <slaweq> it's been failing since 30.07
15:47:28 <slaweq> 15 tests fail every day
15:47:30 <slaweq> like e.g. https://zuul.openstack.org/build/64b62afdb4ce4f2fa8b1b2d97bb36c39
15:47:40 <slaweq> I will open LP for that one after the meeting
15:48:16 <maciejjozefczyk> ok, thanks slaweq
15:48:23 <ralonsoh> similar problem to the ovn tempest multinode
15:48:28 <ralonsoh> ssh access timeout
15:48:33 <maciejjozefczyk> something bad recently happened in ovn I think
15:48:45 <slaweq> ralonsoh: yes, but this is a singlenode job AFAIR
15:51:52 <slaweq> in neutron logs I see errors like:
15:51:54 <slaweq> keystoneauth1.exceptions.auth_plugins.MissingAuthPlugin: An auth plugin is required to determine endpoint URL
15:52:03 <ralonsoh> this is because of the placement
15:52:04 <slaweq> but I don't know if that may be related
15:52:13 <ralonsoh> but this is not important
15:52:17 <slaweq> ok
15:52:20 <ralonsoh> if we can't update the placement, that's ok
15:52:50 <slaweq> so nothing else obviously wrong
15:52:58 <ralonsoh> no...
15:53:00 <slaweq> anyway, I will open a bug and we can check it later
15:53:21 <slaweq> that's not a top priority as it's "only" the fedora-based periodic job
15:53:30 <slaweq> ok, that's all from me for today
15:53:40 <slaweq> do You have anything else You want to discuss today?
15:53:54 <ralonsoh> no
15:54:18 <bcafarel> all good from me
15:54:36 <slaweq> ok, so thx for attending and see You online
15:54:41 <slaweq> #endmeeting