15:00:10 <slaweq> #startmeeting neutron_ci
15:00:11 <openstack> Meeting started Wed Feb 12 15:00:10 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:13 <slaweq> hi
15:00:15 <openstack> The meeting name has been set to 'neutron_ci'
15:00:16 <ralonsoh> hi
15:01:22 <slaweq> bcafarel: is on pto but maybe njohnston will join soon
15:01:27 <njohnston> I am here
15:01:31 <slaweq> hi :)
15:01:36 <njohnston> sorry, also on a videoconference with the Octavia folks
15:01:40 <slaweq> so lets start
15:01:44 <slaweq> njohnston: no problem
15:01:51 <slaweq> #topic Actions from previous meetings
15:02:02 <slaweq> slaweq to backport https://review.opendev.org/#/c/695834/ to stable branches in neutron-vpnaas
15:02:11 <slaweq> I was checking that and I had some doubts about it
15:02:31 <slaweq> finally gmann and amotoki fixed the stable/rocky branch in another way so this wasn't needed
15:03:02 <slaweq> next one
15:03:04 <slaweq> slaweq to update grafana dashboard with missing jobs
15:03:11 <slaweq> patch https://review.opendev.org/706271
15:03:17 <slaweq> waiting for second +2 now
15:03:46 <slaweq> next one
15:03:48 <slaweq> slaweq to open LP related to fullstack placement issue
15:03:55 <slaweq> Bug reported: https://bugs.launchpad.net/neutron/+bug/1862177
15:03:56 <openstack> Launchpad bug 1862177 in neutron "Fullstack tests failing due to problem with connection to the fake placement service" [High,In progress] - Assigned to Lajos Katona (lajos-katona)
15:04:09 <slaweq> lajoskatona and rubasov are on it already
15:04:20 <slaweq> so we are in good hands :)
15:04:55 <slaweq> next one
15:04:56 <slaweq> slaweq to open LP related to "hang" neutron-server
15:05:03 <slaweq> Bug reported: https://bugs.launchpad.net/neutron/+bug/1862178
15:05:04 <openstack> Launchpad bug 1862178 in neutron "Fullstack tests failing due to "hang" neutron-server process" [High,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:05:18 <ralonsoh> two patches
15:05:27 <ralonsoh> #link https://review.opendev.org/#/c/707151/
15:05:34 <ralonsoh> #link https://review.opendev.org/#/c/707222/
15:05:52 <ralonsoh> (second one is good stuff)
15:06:04 <slaweq> and do You think this will solve this issue?
15:06:19 <ralonsoh> I've tested the second one with my dev env
15:06:31 <ralonsoh> and I see a big performance improvement
15:06:53 <ralonsoh> a) the delete operation (bulk), removing unneeded physnets
15:07:04 <ralonsoh> b) adding new VLAN tags in bulk mode
15:07:16 <ralonsoh> and of course, reducing the number of tags
15:07:43 <ralonsoh> that's all
15:08:07 <slaweq> ok, thx ralonsoh for those patches - I hope this will help with this issue in fullstack tests too
15:08:24 <slaweq> ok, next one
15:08:26 <slaweq> ralonsoh to check missing project_id issue
15:08:36 <ralonsoh> no luck sorry....
15:08:42 <slaweq> np
15:08:48 <ralonsoh> I can't find were/why/how this is happening...
15:08:59 <ralonsoh> *where
15:09:47 <slaweq> should we maybe simply send some patch to log both the expected and actual dict in such case
15:10:01 <slaweq> maybe we will at least know in which one it's missing
15:10:33 <ralonsoh> we can see that in the response
15:11:00 <ralonsoh> the problem is to find where this project_id is filtered and deleted from this returned dict
15:11:15 <ralonsoh> and, why is this happening 1 out of 500 tests?
15:12:28 <slaweq> ok, I see now
15:12:42 <slaweq> it's filtered out from the "actual" dict than
15:12:48 <slaweq> *then
15:13:10 <ralonsoh> exactly, but no one (SDK, client or server) should do this
15:13:27 <ralonsoh> I tried to find something related to the project_id/tenant_id migration
15:13:45 <ralonsoh> because in some places, some black magic is done to convert one into the other
15:13:55 <ralonsoh> but project_id should always be there
15:14:05 <ralonsoh> (not tenant_id, that should be removed)
15:15:24 <slaweq> but afaik tempest has got its own implementation of clients
15:15:36 <ralonsoh> I know
15:15:46 <slaweq> it doesn't use OpenStack SDK or neutronclient
15:15:52 <slaweq> maybe there is some bug there?
15:16:03 <ralonsoh> let me check this again
15:16:17 <slaweq> ralonsoh: ok, thx
15:16:19 <ralonsoh> but this is the nth time I try to find this bug
15:16:40 <slaweq> or maybe we should add some debug log of every response which is going to be sent from neutron-server
15:16:59 <slaweq> so we can then confirm if that wasn't sent from the server or was filtered out on the client's side
15:17:21 <ralonsoh> we could
15:18:06 <slaweq> ok, so You will take another look into it this week, right?
15:18:14 <ralonsoh> sure!
15:18:19 <slaweq> thx a lot
15:19:05 <slaweq> #ralonsoh to check again mystery of vanishing project_id
15:19:12 <ralonsoh> hahahaha
15:19:21 <slaweq> :)
15:19:50 <slaweq> ok, that's all the actions from last week
15:20:01 <slaweq> do You have anything else to add here or can we move on?
15:20:14 <ralonsoh> no thanks
15:20:25 <slaweq> ok, so lets move on
15:20:34 <slaweq> #topic Stadium projects
15:20:43 <slaweq> standardize on zuul v3
15:20:45 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:20:50 <slaweq> there was slow progress this week
15:20:56 <njohnston> So I added a summary section at the top of that etherpad
15:21:04 <njohnston> so it's easy to find the pending updates
15:21:25 <slaweq> thx njohnston - that's really helpful
15:21:28 <njohnston> for zuulv3 we have 3 projects with pending changes, 2 that have no activity
15:21:37 <slaweq> and we are really close to finishing it
15:21:54 <njohnston> yep!
15:21:56 <njohnston> I also noted a few remaining py27 things, including the neutron-tempest-plugin change
15:22:20 <slaweq> yes, neutron-tempest-plugin is something I'm aware of
15:22:29 <slaweq> patch is ready
15:22:34 <njohnston> yep!
15:22:44 <slaweq> but I would like to first merge a few fixes, release the last version with support for py27
15:22:51 <slaweq> and then drop this support
15:24:16 <slaweq> njohnston: and I will check this midonet patch today
15:24:44 <slaweq> anything else regarding stadium projects?
15:24:56 <njohnston> nope, things are looking good.
15:26:06 <slaweq> I just commented in https://review.opendev.org/#/c/695094/
15:26:15 <slaweq> can You check it and tell me what You think about it?
15:26:43 <ralonsoh> sure
15:27:00 <njohnston> will do
15:27:34 <slaweq> thx
15:28:01 <slaweq> ok, so lets move on
15:28:03 <slaweq> #topic Grafana
15:28:08 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:30:05 <slaweq> functional jobs were recently failing pretty often
15:30:24 <slaweq> but as I checked, many of those failures were in patches related to ovn
15:32:56 <slaweq> other than that I think that the graphs are pretty good
15:33:25 <ralonsoh> I think so
15:34:01 <slaweq> ok, lets talk about a few issues which I found recently
15:34:04 <slaweq> #topic fullstack/functional
15:34:20 <slaweq> here I have only one thing to mention
15:34:27 <slaweq> Unauthorized commands like "ping" or "ncat":
15:34:28 <slaweq> https://2a0154cb9a3e47bde3ed-4a9629bf7847ad9c8b03c9755148c549.ssl.cf1.rackcdn.com/705660/4/check/neutron-functional/2e5030b/testr_results.html
15:34:30 <slaweq> https://656129f4adff35088518-c39e8636195a8a58924c560773952ce4.ssl.cf1.rackcdn.com/705480/3/check/neutron-functional/e49a784/testr_results.html
15:34:33 <ralonsoh> sorry for that
15:34:39 <slaweq> and also probably same issue https://ad181adc6d8db459c7ce-fbb316944f0ca23c676e132d61555672.ssl.cf1.rackcdn.com/705237/4/check/neutron-functional/ed9cfb5/testr_results.html
15:34:43 <ralonsoh> bug opened and patch submitted
15:34:50 <slaweq> I wonder why it doesn't happen all the time
15:34:57 <ralonsoh> #link https://review.opendev.org/#/c/707368/
15:35:05 <ralonsoh> yes, that was my question
15:35:17 <ralonsoh> and I have no answer
15:35:25 <ralonsoh> but this patch should solve this problem
15:35:59 <slaweq> and also in https://656129f4adff35088518-c39e8636195a8a58924c560773952ce4.ssl.cf1.rackcdn.com/705480/3/check/neutron-functional/e49a784/testr_results.html the error was because of an unauthorized "ping" command
15:36:06 <slaweq> but ping is in this rootwrap filters file
15:36:24 <ralonsoh> ??
15:37:01 <slaweq> ralonsoh: first failed test in the link above
15:37:06 <slaweq> it failed with error:
15:37:11 <ralonsoh> yes yes
15:37:17 <ralonsoh> but why??
15:37:20 <slaweq> Unauthorized command: ip netns exec test-19869f4e-e878-47fe-8cf8-90b60f6269e1 ping 192.178.0.2 -W 1 -c 3 (no filter matched)
15:37:44 <slaweq> ralonsoh: yeah, that's the question
15:37:54 <ralonsoh> oook ok
15:37:56 <ralonsoh> maybe I229e926341c5e6c8b06f59950e3ae09864d0f1f6
15:38:01 <ralonsoh> is the problem
15:38:08 <ralonsoh> let me review this patch
15:38:21 <ralonsoh> https://review.opendev.org/#/c/705065/
15:38:47 <slaweq> ok
15:38:56 <slaweq> so I will assign it as an action to You, ok?
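(For reference on the discussion above: the "(no filter matched)" message is produced by oslo.rootwrap, which only allows a command if it matches an entry in one of the configured .filters files; for "ip netns exec ... <cmd>" the inner command must also match a filter. A minimal sketch of such entries, using the generic CommandFilter/IpNetnsExecFilter forms - the exact file and entries used by the neutron-functional job are an assumption here, not taken from the meeting:)

    [Filters]
    # allow "ip netns exec <namespace> <cmd>"; the inner <cmd> still needs its own filter
    ip_exec: IpNetnsExecFilter, ip, root
    # allow the commands the functional tests shell out to, run as root
    ping: CommandFilter, ping, root
    ncat: CommandFilter, ncat, root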
15:39:01 <ralonsoh> ok
15:40:07 <slaweq> #action ralonsoh to check issues with unauthorized ping and ncat commands in functional tests
15:41:06 <slaweq> ok, so lets talk about scenario jobs
15:41:08 <slaweq> #topic Tempest/Scenario
15:41:25 <slaweq> first, I proposed a patch to increase the timeout for the tempest-ipv6-only job: https://review.opendev.org/707356
15:41:40 <slaweq> I will respond to haleyb's comment there in a minute
15:41:53 <haleyb> :)
15:43:02 <ralonsoh> OVN slow is taking almost 3 hours and sometimes I've seen timeouts
15:43:11 <slaweq> haleyb: answered :)
15:43:31 <slaweq> ralonsoh: yes, but those are slow tests
15:43:48 <slaweq> tempest-slow-py3 also takes almost 3 hours
15:44:25 <slaweq> ok, lets move on quickly to other issues
15:44:33 <slaweq> I saw an ssh problem a couple of times due to a "socket.timeout: timed out" error in various tests, like:
15:44:43 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9aa/703376/5/check/neutron-tempest-dvr/9aa046a/testr_results.html
15:44:45 <slaweq> https://db85fb7d3af8e26f7154-0d96b608ecbdac6c8248619c1dff0910.ssl.cf5.rackcdn.com/704833/4/check/neutron-tempest-dvr/67fd708/testr_results.html
15:44:47 <slaweq> https://19574e4665a40f62095e-6b9500683e6a67d31c1bad572acf67ba.ssl.cf1.rackcdn.com/705982/6/check/neutron-tempest-dvr/8f3fbd0/testr_results.html
15:44:49 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_37a/705452/5/check/neutron-tempest-dvr-ha-multinode-full/37a65a9/testr_results.html
15:44:51 <slaweq> mostly in the neutron-tempest-dvr job
15:44:59 <slaweq> but I saw it also in the multinode dvr job
15:45:15 <slaweq> I think it's a new issue and we should investigate it
15:45:19 <slaweq> any volunteers?
15:46:08 <slaweq> ok, I will report a new bug and will try to look into the logs
15:46:29 <slaweq> #action slaweq to report issue with ssh timeout on dvr jobs and check logs there
15:47:00 <slaweq> other than that, I saw one issue with nova's revert_resize, e.g. in
15:47:06 <slaweq> https://3e447a3e4caf9c486a4d-b84d447537aa444ce20bcf5414a5ef0e.ssl.cf5.rackcdn.com/707248/1/check/neutron-tempest-dvr-ha-multinode-full/057233d/testr_results.html
15:47:07 <slaweq> https://3ceba9124358a5c9851b-33ba471340a760763569a038d91ca6b5.ssl.cf2.rackcdn.com/706875/2/check/neutron-tempest-dvr-ha-multinode-full/6d77e6d/testr_results.html
15:47:13 <slaweq> so I will report a bug for nova about that
15:47:35 <slaweq> and that's all from my side regarding scenario jobs
15:47:41 <slaweq> anything else You want to add?
15:47:54 <ralonsoh> no thanks
15:47:59 <njohnston> nope
15:48:10 <slaweq> ok, thx
15:48:24 <slaweq> so (almost) last thing for today
15:48:29 <slaweq> #topic Periodic
15:48:38 <slaweq> neutron-ovn-tempest-ovs-master-fedora is failing every day
15:48:48 <slaweq> we should check this job
15:49:25 <slaweq> it's failing on deploying devstack
15:49:25 <ralonsoh> do you have the link for the periodic jobs?
15:49:26 <slaweq> https://9fc08b4308a330f341b7-ee326d1edc43244c4c522686856ef03c.ssl.cf2.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-ovs-master-fedora/07d0c6f/job-output.txt
15:49:44 <slaweq> the error is like:
15:49:46 <slaweq> 2020-02-11 07:05:25.363036 | controller | + lib/infra:install_infra:32 : virtualenv -p python3 /opt/stack/requirements/.venv
15:49:47 <slaweq> 2020-02-11 07:05:25.441404 | controller | ERROR:root:ImportError: cannot import name ensure_text
15:50:12 <ralonsoh> I can check it
15:50:17 <slaweq> I think it's some issue related to fedora 29
15:50:25 <slaweq> ralonsoh: thx
15:50:53 <slaweq> #action ralonsoh to check periodic neutron-ovn-tempest-ovs-master-fedora job's failures
15:51:10 <slaweq> other periodic jobs look fine
15:51:28 <slaweq> ok, and I have one more topic for tody
15:51:31 <slaweq> *today
15:51:32 <slaweq> #topic Open discussion
15:52:02 <slaweq> some time ago I asked on the ML about the future of lib/neutron and lib/neutron-legacy in devstack
15:52:31 <slaweq> I think haleyb said that there shouldn't be much additional work needed to finally make lib/neutron usable
15:53:10 <slaweq> so I would like to ask if there is any volunteer to work on that or maybe we should simply remove lib/neutron and rename lib/neutron-legacy to not be "legacy" anymore?
15:53:24 <slaweq> as now things can be confusing for users :)
15:53:30 <slaweq> any thoughts?
15:54:12 <ralonsoh> I'm ok with this but it's not a priority now
15:54:42 <ralonsoh> I can take it but not now
15:54:52 <slaweq> ralonsoh: sure, it's not a top priority but IMO we should at least decide what to do with it finally
15:55:09 <slaweq> as we have been "in the middle" of 2 solutions for a few years now
15:55:21 <ralonsoh> I'm ok with finishing the migration
15:55:44 <slaweq> great
15:56:07 <slaweq> so lets keep this in mind and maybe move it slowly forward
15:56:19 <slaweq> ok, that's all from my side for today
15:56:27 <slaweq> anything else You want to talk about today?
15:56:48 <ralonsoh> no
15:57:04 * njohnston just finished the other meeting, is catching up
15:57:09 <njohnston> nothing from me
15:57:16 <slaweq> ok, thx for attending
15:57:22 <slaweq> see You online
15:57:24 <slaweq> o/
15:57:27 <slaweq> #endmeeting