15:00:10 <slaweq> #startmeeting neutron_ci
15:00:11 <openstack> Meeting started Wed Feb 12 15:00:10 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:13 <slaweq> hi
15:00:15 <openstack> The meeting name has been set to 'neutron_ci'
15:00:16 <ralonsoh> hi
15:01:22 <slaweq> bcafarel is on PTO but maybe njohnston will join soon
15:01:27 <njohnston> I am here
15:01:31 <slaweq> hi :)
15:01:36 <njohnston> sorry, also on a videoconference with the Octavia folks
15:01:40 <slaweq> so lets start
15:01:44 <slaweq> njohnston: no problem
15:01:51 <slaweq> #topic Actions from previous meetings
15:02:02 <slaweq> slaweq to backport https://review.opendev.org/#/c/695834/ to stable branches in neutron-vpnaas
15:02:11 <slaweq> I was checking that and I had some doubts about it
15:02:31 <slaweq> finally gmann and amotoki fixed the stable/rocky branch in another way, so this wasn't needed
15:03:02 <slaweq> next one
15:03:04 <slaweq> slaweq to update grafana dashboard with missing jobs
15:03:11 <slaweq> patch https://review.opendev.org/706271
15:03:17 <slaweq> waiting for second +2 now
15:03:46 <slaweq> next one
15:03:48 <slaweq> slaweq to open LP related to fullstack placement issue
15:03:55 <slaweq> Bug reported: https://bugs.launchpad.net/neutron/+bug/1862177
15:03:56 <openstack> Launchpad bug 1862177 in neutron "Fullstack tests failing due to problem with connection to the fake placement service" [High,In progress] - Assigned to Lajos Katona (lajos-katona)
15:04:09 <slaweq> lajoskatona and rubasov are on it already
15:04:20 <slaweq> so we are in good hands :)
15:04:55 <slaweq> next one
15:04:56 <slaweq> slaweq to open LP related to "hang" neutron-server
15:05:03 <slaweq> Bug reported: https://bugs.launchpad.net/neutron/+bug/1862178
15:05:04 <openstack> Launchpad bug 1862178 in neutron "Fullstack tests failing due to "hang" neutron-server process" [High,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:05:18 <ralonsoh> two patches
15:05:27 <ralonsoh> #link https://review.opendev.org/#/c/707151/
15:05:34 <ralonsoh> #link https://review.opendev.org/#/c/707222/
15:05:52 <ralonsoh> (second one is good stuff)
15:06:04 <slaweq> and do You think this will solve this issue?
15:06:19 <ralonsoh> I've tested the second one with my dev env
15:06:31 <ralonsoh> and I see a big performance improvement
15:06:53 <ralonsoh> a) the delete operation (bulk), removing unneeded physnets
15:07:04 <ralonsoh> b) adding new VLAN tags in bulk mode
15:07:16 <ralonsoh> and of course, reducing the number of tags
15:07:43 <ralonsoh> that's all
15:08:07 <slaweq> ok, thx ralonsoh for those patches - I hope this will help with this issue in fullstack tests too
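For illustration only (this is not the code from the two reviews above, and the table and column names are invented), the pattern ralonsoh describes is replacing per-row operations with one bulk DELETE plus one multi-row INSERT, which is usually where the speed-up comes from:

    import sqlalchemy as sa

    metadata = sa.MetaData()
    # hypothetical table, just for this sketch
    vlan_allocations = sa.Table(
        "example_vlan_allocations", metadata,
        sa.Column("physical_network", sa.String(64)),
        sa.Column("vlan_id", sa.Integer),
    )

    def replace_vlan_allocations(engine, physnet, vlan_ids):
        with engine.begin() as conn:
            # one DELETE for all stale rows of this physnet instead of one per row
            conn.execute(vlan_allocations.delete().where(
                vlan_allocations.c.physical_network == physnet))
            # one multi-row (executemany) INSERT instead of N single-row INSERTs
            conn.execute(vlan_allocations.insert(),
                         [{"physical_network": physnet, "vlan_id": v}
                          for v in vlan_ids])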
15:08:24 <slaweq> ok, next one
15:08:26 <slaweq> ralonsoh  to check missing project_id issue
15:08:36 <ralonsoh> no luck sorry....
15:08:42 <slaweq> np
15:08:48 <ralonsoh> I can't find where/why/how this is happening...
15:09:47 <slaweq> should we maybe send a patch to log both the expected and the actual dict in such a case
15:10:01 <slaweq> then we would at least know in which one it's missing
15:10:33 <ralonsoh> we can see that in the response
15:11:00 <ralonsoh> the problem is to find where this project_id is filtered and deleted from this returned dict
15:11:15 <ralonsoh> and why is this happening in 1 out of 500 tests?
15:12:28 <slaweq> ok, I see now
15:12:42 <slaweq> it's filtered out from the "actual" dict then
15:13:10 <ralonsoh> exactly, but no one (SDK, client or server) should do this
15:13:27 <ralonsoh> I tried to find something related to the project_id/tenant_id migration
15:13:45 <ralonsoh> because in some places, some black magic is done to convert one into the other
15:13:55 <ralonsoh> but project_id should be always there
15:14:05 <ralonsoh> (not tenant_id, that should be removed)
15:15:24 <slaweq> but afaik tempest has got its own implementation of clients
15:15:36 <ralonsoh> I know
15:15:46 <slaweq> it doesn't use the OpenStack SDK or neutronclient
15:15:52 <slaweq> maybe there is some bug there?
15:16:03 <ralonsoh> let me check again this
15:16:17 <slaweq> ralonsoh: ok, thx
15:16:19 <ralonsoh> but this is the nth time I've tried to find this bug
15:16:40 <slaweq> or maybe we should add some debug logging of every response which is going to be sent from neutron-server
15:16:59 <slaweq> so we can then confirm whether it wasn't sent from the server or was filtered out on the client's side
15:17:21 <ralonsoh> we could
15:18:06 <slaweq> ok, so You will take another look into it this week, right?
15:18:14 <ralonsoh> sure!
15:18:19 <slaweq> thx a lot
15:19:05 <slaweq> #action ralonsoh to check again the mystery of the vanishing project_id
15:19:12 <ralonsoh> hahahaha
15:19:21 <slaweq> :)
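A minimal sketch of the extra logging slaweq suggests above (names are illustrative, not the actual tempest code): dump both dicts when the comparison fails, so it is clear whether project_id is missing from the server response or stripped on the client side.

    import logging

    LOG = logging.getLogger(__name__)

    def assert_keys_match(expected, actual, keys=("project_id",)):
        """Compare selected keys and log both dicts on any mismatch."""
        for key in keys:
            if expected.get(key) != actual.get(key):
                LOG.error("Key %s differs: expected=%s actual=%s",
                          key, expected, actual)
                raise AssertionError("Mismatch on %s" % key)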
15:19:50 <slaweq> ok, that's all actions from last week
15:20:01 <slaweq> do You have anything else to add here or can we move on?
15:20:14 <ralonsoh> no thanks
15:20:25 <slaweq> ok, so lets move on
15:20:34 <slaweq> #topic Stadium projects
15:20:43 <slaweq> standardize on zuul v3
15:20:45 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:20:50 <slaweq> there was slow progress this week
15:20:56 <njohnston> So I added a summary section at the top of that etherpad
15:21:04 <njohnston> so it's easy to find the pending updates
15:21:25 <slaweq> thx njohnston - that's really helpful
15:21:28 <njohnston> for zuulv3 we have 3 projects with pending changes, 2 that have no activity
15:21:37 <slaweq> and we are really close to finishing it
15:21:54 <njohnston> yep!
15:21:56 <njohnston> I also noted a few remaining py27 things, including the neutron-tempest-plugin change
15:22:20 <slaweq> yes, neutron-tempest-plugin is something I'm aware of
15:22:29 <slaweq> patch is ready
15:22:34 <njohnston> yep!
15:22:44 <slaweq> but I would like to first merge a few fixes and release the last version with support for py27
15:22:51 <slaweq> and then drop this support
15:24:16 <slaweq> njohnston: and I will check this midonet patch today
15:24:44 <slaweq> anything else regarding stadium projects?
15:24:56 <njohnston> nope, things are looking good.
15:26:06 <slaweq> I just commented in https://review.opendev.org/#/c/695094/
15:26:15 <slaweq> can You check it and tell me what You think about it?
15:26:43 <ralonsoh> sure
15:27:00 <njohnston> will do
15:27:34 <slaweq> thx
15:28:01 <slaweq> ok, so lets move on
15:28:03 <slaweq> #topic Grafana
15:28:08 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:30:05 <slaweq> functional jobs were recently failing pretty often
15:30:24 <slaweq> but as I checked, many of those failures were in patches related to OVN
15:32:56 <slaweq> other than that I think the graphs are pretty good
15:33:25 <ralonsoh> I think so
15:34:01 <slaweq> ok, lets talk about few issues which I found recently
15:34:04 <slaweq> #topic fullstack/functional
15:34:20 <slaweq> here I have only one thing to mention
15:34:27 <slaweq> Unauthorized commands like "ping" or "ncat":
15:34:28 <slaweq> https://2a0154cb9a3e47bde3ed-4a9629bf7847ad9c8b03c9755148c549.ssl.cf1.rackcdn.com/705660/4/check/neutron-functional/2e5030b/testr_results.html
15:34:30 <slaweq> https://656129f4adff35088518-c39e8636195a8a58924c560773952ce4.ssl.cf1.rackcdn.com/705480/3/check/neutron-functional/e49a784/testr_results.html
15:34:33 <ralonsoh> sorry for that
15:34:39 <slaweq> and also probably same issue https://ad181adc6d8db459c7ce-fbb316944f0ca23c676e132d61555672.ssl.cf1.rackcdn.com/705237/4/check/neutron-functional/ed9cfb5/testr_results.html
15:34:43 <ralonsoh> bug opened and patch submitted
15:34:50 <slaweq> I wonder why it doesn't happen all the time
15:34:57 <ralonsoh> #link https://review.opendev.org/#/c/707368/
15:35:05 <ralonsoh> yes, that was my question
15:35:17 <ralonsoh> and I have no answer
15:35:25 <ralonsoh> but this patch should solve this problem
15:35:59 <slaweq> and also in https://656129f4adff35088518-c39e8636195a8a58924c560773952ce4.ssl.cf1.rackcdn.com/705480/3/check/neutron-functional/e49a784/testr_results.html the error was because of an unauthorized "ping" command
15:36:06 <slaweq> but ping is in this rootwrap filters file
15:36:24 <ralonsoh> ??
15:37:01 <slaweq> ralonsoh: first failed test in the link above
15:37:06 <slaweq> it failed with error:
15:37:11 <ralonsoh> yes yes
15:37:17 <ralonsoh> but why??
15:37:20 <slaweq> Unauthorized command: ip netns exec test-19869f4e-e878-47fe-8cf8-90b60f6269e1 ping 192.178.0.2 -W 1 -c 3 (no filter matched)
15:37:44 <slaweq> ralonsoh: yeah, that's the question
15:37:54 <ralonsoh> oook ok
15:37:56 <ralonsoh> maybe I229e926341c5e6c8b06f59950e3ae09864d0f1f6
15:38:01 <ralonsoh> is the problem
15:38:08 <ralonsoh> let me review this patch
15:38:21 <ralonsoh> https://review.opendev.org/#/c/705065/
15:38:47 <slaweq> ok
15:38:56 <slaweq> so I will assign it as an action to You, ok?
15:39:01 <ralonsoh> ok
15:40:07 <slaweq> #action ralonsoh to check issues with unauthorized ping and ncat commands in functional tests
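For context, rootwrap filter entries for these commands look roughly like the lines below (written from the general oslo.rootwrap filter format, not copied from the functional tests' actual filters file); "no filter matched" means the full command line did not match any such entry in the configured filter files:

    [Filters]
    # allow wrapping commands in a network namespace; the wrapped command
    # is then checked against the remaining filters
    ip_exec: IpNetnsExecFilter, ip, root
    # the commands run inside the namespace need their own entries too
    ping: CommandFilter, ping, root
    ncat: CommandFilter, ncat, root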
15:41:06 <slaweq> ok, so lets talk about scenario jobs
15:41:08 <slaweq> #topic Tempest/Scenario
15:41:25 <slaweq> first, I proposed patch to increase timeout for tempest-ipv6-only job: https://review.opendev.org/707356
15:41:40 <slaweq> I will respond to haleyb's comment there in a minute
15:41:53 <haleyb> :)
15:43:02 <ralonsoh> OVN slow is taking almost 3 hours and sometimes I've seen timeouts
15:43:11 <slaweq> haleyb: answered :)
15:43:31 <slaweq> ralonsoh: yes, but those are the slow tests
15:43:48 <slaweq> tempest-slow-py3 also takes almost 3 hours
15:44:25 <slaweq> ok, lets move on quickly to other issues
15:44:33 <slaweq> a couple of times I saw ssh problems due to "socket.timeout: timed out" errors in various tests, like:
15:44:43 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9aa/703376/5/check/neutron-tempest-dvr/9aa046a/testr_results.html
15:44:45 <slaweq> https://db85fb7d3af8e26f7154-0d96b608ecbdac6c8248619c1dff0910.ssl.cf5.rackcdn.com/704833/4/check/neutron-tempest-dvr/67fd708/testr_results.html
15:44:47 <slaweq> https://19574e4665a40f62095e-6b9500683e6a67d31c1bad572acf67ba.ssl.cf1.rackcdn.com/705982/6/check/neutron-tempest-dvr/8f3fbd0/testr_results.html
15:44:49 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_37a/705452/5/check/neutron-tempest-dvr-ha-multinode-full/37a65a9/testr_results.html
15:44:51 <slaweq> mostly in neutron-tempest-dvr job
15:44:59 <slaweq> but I saw it also in multinode dvr job
15:45:15 <slaweq> I think it's a new issue and we should investigate it
15:45:19 <slaweq> any volunteers?
15:46:08 <slaweq> ok, I will report a new bug and will try to look into the logs
15:46:29 <slaweq> #action slaweq to report issue with ssh timeout on dvr jobs and check logs there
15:47:00 <slaweq> other than that, I saw one issue with nova's revert_resize, e.g. in
15:47:06 <slaweq> https://3e447a3e4caf9c486a4d-b84d447537aa444ce20bcf5414a5ef0e.ssl.cf5.rackcdn.com/707248/1/check/neutron-tempest-dvr-ha-multinode-full/057233d/testr_results.html
15:47:07 <slaweq> https://3ceba9124358a5c9851b-33ba471340a760763569a038d91ca6b5.ssl.cf2.rackcdn.com/706875/2/check/neutron-tempest-dvr-ha-multinode-full/6d77e6d/testr_results.html
15:47:13 <slaweq> so I will report a bug for nova about that
15:47:35 <slaweq> and that's all from my side regarding scenario jobs
15:47:41 <slaweq> anything else You want to add?
15:47:54 <ralonsoh> no thanks
15:47:59 <njohnston> nope
15:48:10 <slaweq> ok, thx
15:48:24 <slaweq> so (almost) last thing for today
15:48:29 <slaweq> #topic Periodic
15:48:38 <slaweq> neutron-ovn-tempest-ovs-master-fedora is failing every day
15:48:48 <slaweq> we should check this job
15:49:25 <slaweq> it's failing on deploying devstack
15:49:25 <ralonsoh> do you have the link for the periodic jobs?
15:49:26 <slaweq> https://9fc08b4308a330f341b7-ee326d1edc43244c4c522686856ef03c.ssl.cf2.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-ovs-master-fedora/07d0c6f/job-output.txt
15:49:44 <slaweq> error is like:
15:49:46 <slaweq> 2020-02-11 07:05:25.363036 | controller | + lib/infra:install_infra:32               :   virtualenv -p python3 /opt/stack/requirements/.venv
15:49:47 <slaweq> 2020-02-11 07:05:25.441404 | controller | ERROR:root:ImportError: cannot import name ensure_text
15:50:12 <ralonsoh> I can check it
15:50:17 <slaweq> I think it's some issue related to fedora 29
15:50:25 <slaweq> ralonsoh: thx
15:50:53 <slaweq> #action ralonsoh to check periodic neutron-ovn-tempest-ovs-master-fedora job's failures
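One likely (but unconfirmed) explanation for the "cannot import name ensure_text" failure above: six.ensure_text() only exists in six >= 1.12.0, so an older six on the Fedora node would raise exactly this ImportError when virtualenv (or one of its dependencies) imports it. A quick check, purely as a sketch of how to confirm this on the node:

    import six

    print(six.__version__)
    # False on six < 1.12.0, which would explain the ImportError above
    print(hasattr(six, "ensure_text"))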
15:51:10 <slaweq> other periodic jobs look fine
15:51:28 <slaweq> ok, and I have one more topic for today
15:51:32 <slaweq> #topic Open discussion
15:52:02 <slaweq> some time ago I asked on the ML about the future of lib/neutron and lib/neutron-legacy in devstack
15:52:31 <slaweq> I think haleyb said that there shouldn't be much additional work needed to finally make lib/neutron usable
15:53:10 <slaweq> so I would like to ask if there is any volunteer to work on that, or maybe we should simply remove lib/neutron and rename lib/neutron-legacy so it's not "legacy" anymore?
15:53:24 <slaweq> as now things can be confusing for users :)
15:53:30 <slaweq> any thoughts?
15:54:12 <ralonsoh> I'm ok with this but not a priority now
15:54:42 <ralonsoh> I can take it but not now
15:54:52 <slaweq> ralonsoh: sure, it's not a top priority but IMO we should at least finally decide what to do with it
15:55:09 <slaweq> as we have been "in the middle" between the 2 solutions for a few years now
15:55:21 <ralonsoh> I'm ok with finishing the migration
15:55:44 <slaweq> great
15:56:07 <slaweq> so lets keep this in mind and maybe move it slowly forward
15:56:19 <slaweq> ok, that's all from my side for today
15:56:27 <slaweq> anything else You want to talk about today?
15:56:48 <ralonsoh> no
15:57:04 * njohnston just finished the other meeting, is catching up
15:57:09 <njohnston> nothing from me
15:57:16 <slaweq> ok, thx for attending
15:57:22 <slaweq> see You online
15:57:24 <slaweq> o/
15:57:27 <slaweq> #endmeeting