15:00:04 <slaweq> #startmeeting neutron_ci
15:00:04 <openstack> Meeting started Wed Jan 29 15:00:04 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:06 <slaweq> hi
15:00:08 <openstack> The meeting name has been set to 'neutron_ci'
15:00:46 <njohnston> o/
15:01:01 <slaweq> welcome to the CI meeting at its new hour and in its new room
15:01:03 <slaweq> :)
15:01:20 <bcafarel> o/
15:01:45 <ralonsoh> hi
15:02:29 <slaweq> ok, let's start now
15:02:45 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:51 <slaweq> please open it now :)
15:02:58 <slaweq> #topic Actions from previous meetings
15:03:11 <slaweq> ralonsoh to increase log level for ovsdbapp in fullstack/functional jobs
15:03:20 <ralonsoh> one sec
15:03:32 <ralonsoh> #link https://review.opendev.org/#/c/703791/
15:03:47 <ralonsoh> it's still failing ...
15:04:00 <slaweq> ahh, this is the patch for that :)
15:04:01 <ralonsoh> I need to review how to properly configure the ml2 plugin in zuul
15:04:11 <slaweq> I commented on it again today
15:04:34 <slaweq> but now I think that it will not work like that
15:04:48 <slaweq> as You need to set the proper config options in the test's setUp method
15:05:17 <slaweq> probably somewhere in https://github.com/openstack/neutron/blob/master/neutron/tests/functional/base.py
15:05:22 <slaweq> for functional tests
15:05:40 <slaweq> in those jobs we are not using config files at all
15:05:48 <ralonsoh> you are right
15:05:51 <ralonsoh> not in the FTs
15:06:06 <ralonsoh> but I should configure it like this in fullstack
15:06:08 <slaweq> and in fullstack it is similar
15:06:27 <ralonsoh> ok, I'll check it later today
15:06:28 <slaweq> it's here: https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/resources/config.py
15:06:43 <ralonsoh> you are right
15:06:44 <slaweq> sorry that I didn't write it earlier
15:06:55 <ralonsoh> the agent is configured there
15:06:56 <ralonsoh> thanks!!
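[Editor's note: the point above is that Neutron's functional jobs never read config files from disk, so options have to be overridden programmatically in the test's setUp. Below is a minimal stdlib-only sketch of that pattern; `FakeConf` is a hypothetical stand-in for oslo.config's `cfg.CONF`, and the option names are illustrative only. The real override logic lives in `neutron/tests/functional/base.py`.]

```python
import unittest


class FakeConf:
    """Hypothetical stand-in for oslo.config's cfg.CONF.

    Used only to illustrate the override pattern discussed above;
    it is not the real oslo.config API surface.
    """

    def __init__(self):
        self._opts = {}

    def set_override(self, name, value, group=None):
        self._opts[(group, name)] = value

    def get(self, name, group=None):
        return self._opts[(group, name)]


CONF = FakeConf()


class Ml2FunctionalCase(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # Functional jobs never load an ml2 config file from disk, so
        # every option the test depends on must be overridden here,
        # in setUp, before the plugin under test is started.
        CONF.set_override('debug', True)
        CONF.set_override('mechanism_drivers', ['openvswitch'], group='ml2')

    def test_mech_drivers_overridden(self):
        self.assertEqual(['openvswitch'],
                         CONF.get('mechanism_drivers', group='ml2'))
```

The fullstack suite follows the same idea, except the per-agent config is generated by `neutron/tests/fullstack/resources/config.py` rather than set directly on a global CONF object.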
15:07:00 <slaweq> idk why I missed that and tried to fix Your patch in "Your way" :)
15:07:49 <slaweq> ok, the next one was:
15:07:51 <slaweq> slaweq to open bug for issue with get_dp_id in os_ken
15:08:01 <slaweq> I reported it here https://bugs.launchpad.net/neutron/+bug/1861269
15:08:01 <openstack> Launchpad bug 1861269 in neutron "Functional tests failing due to failure with getting datapath ID from ovs" [High,Confirmed]
15:08:10 <ralonsoh> slaweq, I think I have a possible solution for this
15:08:11 <slaweq> but I didn't assign it to myself
15:08:22 <ralonsoh> #link https://review.opendev.org/#/c/704397/
15:08:30 <ralonsoh> I need to justify it
15:08:47 <ralonsoh> but the point is, although we have multithreading because of os-ken
15:09:02 <bcafarel> funny, usually in this kind of review the "fix" is to add a sleep, not remove one :)
15:09:02 <ralonsoh> the ovs agent code should not give the GIL to other tasks
15:09:21 <ralonsoh> that means: do not use sleep, which will stop the thread execution
15:09:40 <ralonsoh> if other threads are not expecting the GIL then those threads won't give it back
15:10:03 <ralonsoh> I've rechecked this patch several times, no errors in FT and fullstack (related)
15:10:03 <slaweq> ralonsoh: makes sense IMO
15:10:17 <ralonsoh> I'll add a proper explanation in the patch
15:10:30 <slaweq> bcafarel: LOL, that's true, usually we need to add a sleep to "fix" something :)
15:10:34 <ralonsoh> hehehhehe
15:10:44 <slaweq> ralonsoh: please link Your patch to this bug also
15:10:48 <ralonsoh> sure
15:11:04 <slaweq> thx ralonsoh
15:11:20 <bcafarel> wow that line is old, it comes directly from "Introduce Ryu based OpenFlow implementation"
15:11:28 <ralonsoh> exactly!
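[Editor's note: ralonsoh's argument is that a `sleep()` in the agent's hot path voluntarily hands control to other green threads, which may then hold it and trigger timeouts. The stdlib-only sketch below models that with generators: each `yield` plays the role of `eventlet.sleep(0)`. This is an analogy for eventlet's cooperative scheduling, not eventlet itself and not the actual patch; all names are illustrative.]

```python
from collections import deque

log = []  # records (task_name, step) in execution order


def cooperative(name, steps):
    """A task that 'sleeps' between steps, handing control back."""
    for i in range(steps):
        log.append((name, i))
        yield  # analogue of eventlet.sleep(0): let other tasks run


def uncooperative(name, steps):
    """A task that never sleeps: it keeps control for all its steps."""
    for i in range(steps):
        log.append((name, i))
    if False:
        yield  # unreachable; only makes this function a generator


def run(tasks):
    """Round-robin scheduler: each task runs until its next yield."""
    queue = deque(tasks)
    while queue:
        gen = queue.popleft()
        try:
            next(gen)
            queue.append(gen)  # task yielded; schedule it again
        except StopIteration:
            pass  # task finished


run([cooperative('agent', 2), uncooperative('other', 2)])
# 'agent' yields after its first step, so 'other' runs all of its
# steps in between; with the yield removed, 'agent' would have kept
# control until it finished -- which is the effect the patch wants.
```

The interleaving in `log` comes entirely from the cooperative task's `yield`; the uncooperative task's steps are never split up.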
15:12:31 <slaweq> ok, next one
15:12:33 <slaweq> slaweq to try to skip cleaning up neutron resources in fullstack job
15:12:53 <slaweq> I tried it locally and it only saved about 4-5 seconds per test
15:13:03 <slaweq> so I think it's not worth doing
15:13:17 <slaweq> and I didn't send any patch
15:13:42 <ralonsoh> yeah, not too much (and maybe we could introduce new errors)
15:13:49 <slaweq> ralonsoh: exactly
15:14:06 <slaweq> so the risk of unpredictable side effects is too high IMO
15:14:46 <slaweq> ok, that was all from last week
15:14:52 <slaweq> #topic Stadium projects
15:15:17 <slaweq> as we talked about yesterday, we finished dropping py2 support in Neutron
15:15:21 <slaweq> \o/
15:15:33 <ralonsoh> fantastic
15:15:37 <slaweq> so let's use the etherpad https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop to track the migration to zuul v3
15:16:16 <slaweq> but I have one more thing about dropping py2 support
15:16:19 <slaweq> there is patch https://review.opendev.org/#/c/704257/
15:16:24 <slaweq> for neutron-tempest-plugin
15:16:45 <slaweq> it's not working properly for Rocky jobs
15:17:32 <slaweq> and I have a question: shouldn't we first tag the neutron-tempest-plugin repo and use that tag for rocky with py27
15:17:44 <slaweq> and then go with this patch to drop py27 completely?
15:17:50 <ralonsoh> exactly
15:18:02 <slaweq> or how will it work with rocky after we merge it?
15:18:14 <njohnston> that makes sense to me
15:18:16 <ralonsoh> we need to tag it first to use it in rocky tests
15:18:18 <slaweq> gmann: njohnston: am I missing something here?
15:19:30 <njohnston> I am not completely sure - gmann has been very active in this area, I believe he has a plan, but I am not sure of all the details
15:20:19 <slaweq> ok, I will ask about it in review
15:21:14 <slaweq> ok, njohnston any other updates about the zuulv3 migration?
15:22:02 <njohnston> I don't have any updates; it has been accepted as an official goal for the V cycle, so we are way ahead of schedule, but it will be good to finish up because of all the reasons
15:22:32 <njohnston> there are only 3 or 4 stadium projects that have changes left
15:22:40 <slaweq> not too many
15:22:56 <njohnston> bcafarel is working on bgpvpn https://review.opendev.org/#/c/703601/
15:22:59 <bcafarel> stadium-wise there is also amotoki's question on vpnaas failing on rocky and moving to use neutron-tempest-plugin there
15:23:38 <slaweq> bcafarel: do You have any patch with a failure example?
15:24:00 <bcafarel> for vpnaas? https://review.opendev.org/#/c/590569/
15:24:56 <bcafarel> also for bgpvpn I was wondering: there is an install job which does not run any tests (as per its name), should we migrate it or just drop it? I think other tests cover the "is it installable?" part
15:25:41 <njohnston> neutron-dynamic-routing also has bcafarel's magic touch https://review.opendev.org/#/c/703582/ ; I don't see anything zuulv3 related for networking-odl, networking-midonet, neutron-vpnaas
15:27:22 <njohnston> that's it for me
15:27:32 <slaweq> thx njohnston
15:27:41 <slaweq> speaking of this vpnaas rocky issue
15:27:59 <slaweq> am I understanding correctly that if we pinned the tempest used for rocky then it would be fine?
15:28:33 <slaweq> or should we use for the rocky branch a job defined in the neutron-tempest-plugin repo (like for master now)?
15:29:56 <njohnston> I'll defer on that to gmann
15:30:24 <slaweq> ok, I will talk with him about it
15:30:41 <slaweq> #action slaweq to talk with gmann about vpnaas jobs on rocky
15:30:53 * slaweq starts hating the rocky branch now
15:31:02 <bcafarel> :)
15:31:18 * njohnston welcomes slaweq to the club
15:31:20 <bcafarel> the "old but not old enough" branch
15:31:40 <slaweq> lol, that's true
15:31:57 <njohnston> I have had too many backports that go "train: green; stein: green; rocky: RED; queens: green"
15:31:58 <slaweq> the good news is that it's just a few more weeks until it goes EM
15:32:55 <slaweq> ok, let's move on
15:33:09 <slaweq> or do You have anything else related to stadium for today?
15:33:20 <njohnston> nope, nothing else
15:34:09 <slaweq> ok, so let's move on
15:34:11 <slaweq> #topic Grafana
15:34:20 <slaweq> #link http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1
15:34:52 <slaweq> from what I can see, we are doing much better with scenario jobs now
15:35:08 <slaweq> but our biggest problems are the fullstack/functional jobs
15:35:12 <slaweq> and the grenade jobs
15:35:57 <slaweq> and the biggest issue among those is the functional job
15:36:05 <njohnston> yep
15:37:54 <slaweq> and I think that we are missing some ovn related job on the dashboard now
15:38:05 <slaweq> I will check that and update the dashboard if needed
15:38:20 <slaweq> #action slaweq to update grafana dashboard with missing jobs
15:38:42 <bcafarel> are all the jobs in? I think I saw some reviews on functional ovn (at least)
15:39:10 <slaweq> but the ovn functional tests will be run together with our "old" functional job I think
15:39:22 <bcafarel> ah ok :)
15:39:51 <slaweq> anything else related to grafana for today?
15:40:29 <slaweq> ok, so let's move on then
15:40:40 <slaweq> #topic fullstack/functional
15:40:53 <slaweq> I have a few examples of failures in the functional job
15:41:10 <slaweq> first, again ovsdbapp command timeouts:
15:41:12 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_abb/704530/4/gate/neutron-functional/abbc532/testr_results.html
15:41:14 <slaweq> https://ce2d3847c3fd9d644a91-e099c5c03695c7198c297e75ec3f8d05.ssl.cf2.rackcdn.com/704240/3/gate/neutron-functional/ab28823/testr_results.html
15:41:21 <slaweq> but I know ralonsoh is on it already
15:41:38 <slaweq> so this is just to point to the new examples of this issue
15:42:01 <ralonsoh> yes, let's see if we can get more information with the patch uploaded
15:42:26 <ralonsoh> but those are the main problems we see in FT and fullstack
15:42:31 <ralonsoh> 1) ovsdb timeouts
15:42:40 <ralonsoh> 2) the os-ken datapath timeout
15:42:46 <ralonsoh> 3) pyroute timeouts
15:42:57 <ralonsoh> (did I say "timeout" before?)
15:43:11 <slaweq> lol
15:43:26 <slaweq> yeah, timeouts are our biggest nightmare now :/
15:43:29 <njohnston> lol
15:43:52 <slaweq> but it seems logical that removing a "sleep" from the code may solve timeouts :P
15:44:33 <slaweq> ok, next one then (this one is new for me, at least I don't remember anything like it)
15:44:38 <slaweq> failure in neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_correct_protection_allowed_address_pairs
15:44:46 <slaweq> https://4131d9f319da782ce250-f8398ccf7503ce4fb23659d29292afec.ssl.cf2.rackcdn.com/694568/16/check/neutron-functional/4fe1d73/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_correct_protection_allowed_address_pairs.txt
15:44:59 <slaweq> there are errors like 2020-01-29 09:11:10.172 22333 ERROR ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: error parsing update: ovsdb error: Modify non-existing row: ovs.db.error.Error: ovsdb error: Modify non-existing row
15:45:26 <ralonsoh> I have no idea on this one
15:45:46 <ralonsoh> how is a Linux Bridge test hitting an OVS error??
15:45:50 * njohnston looks for otherwiseguy
15:45:51 <bcafarel> some race condition in the test because of our timeout friend?
15:45:52 <slaweq> me neither, but I wonder why ovsdbapp is used in those Linuxbridge tests
15:46:00 <ralonsoh> that's the point
15:46:03 <ralonsoh> it's either LB or OVS
15:46:17 <bcafarel> oh
15:47:01 <slaweq> and here is the error in the test: https://4131d9f319da782ce250-f8398ccf7503ce4fb23659d29292afec.ssl.cf2.rackcdn.com/694568/16/check/neutron-functional/4fe1d73/testr_results.html
15:47:15 <slaweq> so it seems that it failed on preparation of the test env
15:48:18 <ralonsoh> slaweq, that seems to be an error from a previous test
15:48:29 <ralonsoh> and a blocked greenlet thread
15:48:40 <ralonsoh> maybe it's too late
15:48:54 <ralonsoh> but the use of greenthreads, IMO, was not a good option
15:49:24 <ralonsoh> (remember python does NOT have real multithreading at all)
15:49:28 <slaweq> but that's the only failed test in this job
15:49:35 <ralonsoh> I know...
15:50:56 <slaweq> let's see if we get more such issues; as nobody has seen it before, maybe it will never happen again ;)
15:51:09 * slaweq doesn't even believe that himself
15:51:19 <ralonsoh> hahahah
15:51:36 <njohnston> :-D
15:51:55 <slaweq> and I have one more, like: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_00e/701733/24/check/neutron-functional/00ede7c/testr_results.html
15:52:01 * bcafarel is not betting on it either
15:52:08 <slaweq> and this one I think I saw at least twice this week
15:52:19 <ralonsoh> no no
15:52:23 <ralonsoh> this is not a problem
15:52:35 <slaweq> no?
15:52:37 <slaweq> why?
15:52:40 <ralonsoh> that was related to an error in the OVN functional tests
15:52:46 <ralonsoh> but that's solved now
15:52:48 <ralonsoh> one sec
15:53:02 <ralonsoh> (also I pushed a DNM patch to test this)
15:53:07 <bcafarel> a sorting order issue right?
15:53:19 <slaweq> ahh, so it's related to the ovn migration, right?
15:53:36 <ralonsoh> https://review.opendev.org/#/c/701733/24..26
15:53:43 <ralonsoh> please, read the diff and you'll understand
15:53:55 <ralonsoh> --> https://review.opendev.org/#/c/701733/24..26/neutron/tests/functional/plugins/ml2/drivers/ovn/mech_driver/ovsdb/test_maintenance.py
15:55:35 <ralonsoh> BTW, that problem in FTs was tested in https://review.opendev.org/#/c/704376
15:56:11 <slaweq> so it was trying to use test_extensions.setup_extensions_middleware(sg_mgr) as the security groups api, instead of the "normal" one, right?
15:56:42 <ralonsoh> exactly
15:56:51 <ralonsoh> that was needed in networking-ovn
15:56:57 <ralonsoh> but NOT in the neutron repo
15:57:05 <ralonsoh> if you are using the base test class
15:57:09 <slaweq> ok, good that it's not "yet another new issue with functional tests" :)
15:57:16 <ralonsoh> nonono
15:57:21 <slaweq> thx ralonsoh :)
15:57:24 <ralonsoh> yw!
15:57:55 <slaweq> so maybe something similar will be needed to fix failures like https://19dc65f6cfdf56a6f70b-c96c299047b55dcdeaefef8e344ceab6.ssl.cf5.rackcdn.com/702397/7/check/neutron-functional/8ca93bc/testr_results.html
15:58:07 <slaweq> I saw it also only in ovn related patches
15:58:30 <slaweq> and it seems that there is simply no needed route loaded in neutron
15:58:37 <ralonsoh> yes, first we need the "part 1" patch
15:58:45 <ralonsoh> then we will handle "part 2"
15:58:51 <slaweq> ok
15:58:53 <slaweq> thx
15:58:59 <slaweq> so those 2 from my list are fine then
15:59:19 <slaweq> for fullstack tests I saw this failure:
15:59:21 <slaweq> https://7994d6b1b4a3fac76e83-9707ce74906f3f341f743e6035ad1064.ssl.cf5.rackcdn.com/704397/2/check/neutron-fullstack/d44b482/controller/logs/dsvm-fullstack-logs/TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement_Open-vSwitch-agent_/neutron-server--2020-01-29--00-20-32-648918_log.txt
15:59:28 <slaweq> it's an issue with the connection to the placement service
16:00:00 <slaweq> I will ask rubasov and lajoskatona tomorrow to take a look at it
16:00:05 <slaweq> maybe they can help with this
16:00:16 <slaweq> and we are out of time now
16:00:22 <slaweq> thx for the meeting guys
16:00:26 <ralonsoh> they have experience with this
16:00:29 <slaweq> see You around
16:00:29 <ralonsoh> bye!!
16:00:32 <slaweq> o/
16:00:34 <slaweq> #endmeeting