15:00:04 #startmeeting neutron_ci
15:00:04 Meeting started Wed Jan 29 15:00:04 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:06 hi
15:00:08 The meeting name has been set to 'neutron_ci'
15:00:46 o/
15:01:01 welcome to the CI meeting at the new hour and in the new room
15:01:03 :)
15:01:20 o/
15:01:45 hi
15:02:29 ok, let's start now
15:02:45 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:51 please open now :)
15:02:58 #topic Actions from previous meetings
15:03:11 ralonsoh to increase log level for ovsdbapp in fullstack/functional jobs
15:03:20 one sec
15:03:32 #link https://review.opendev.org/#/c/703791/
15:03:47 it's still failing ...
15:04:00 ahh, this is the patch for that :)
15:04:01 I need to review how to properly configure the ml2 plugin in zuul
15:04:11 I commented on it again today
15:04:34 but now I think that it will not work like that
15:04:48 as You need to set proper config options in the test's setUp method
15:05:17 probably somewhere in https://github.com/openstack/neutron/blob/master/neutron/tests/functional/base.py
15:05:22 for functional tests
15:05:40 in those jobs we are not using config files at all
15:05:48 you are right
15:05:51 not in the FTs
15:06:06 but I should configure like this in fullstack
15:06:08 and in fullstack it is similar
15:06:27 ok, I'll check it later today
15:06:28 it's here: https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/resources/config.py
15:06:43 you are right
15:06:44 sorry that I didn't write it earlier
15:06:55 the agent is configured there
15:06:56 thanks!!
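A minimal sketch of the point made at 15:04:48, that functional tests take their options from the test's setUp method rather than from a config file. Everything named here is an assumption for illustration: CONF is a plain dict standing in for a real configuration object, and the option name ovsdb_debug, the classes, and run_example are hypothetical, not Neutron code.

```python
import unittest

# Hypothetical stand-in for a global configuration object (assumption).
CONF = {"ovsdb_debug": False}


class BaseFunctionalTestCase(unittest.TestCase):
    """Hypothetical base class: options are set in setUp, not in a file."""

    def setUp(self):
        super().setUp()
        # Raise the (hypothetical) log level for every test in this class.
        self.override_option("ovsdb_debug", True)

    def override_option(self, name, value):
        # Remember the old value and restore it once the test has finished.
        old_value = CONF[name]
        CONF[name] = value
        self.addCleanup(CONF.__setitem__, name, old_value)


class ExampleTest(BaseFunctionalTestCase):
    def test_debug_is_enabled(self):
        # The override made in setUp is visible inside the test body.
        self.assertTrue(CONF["ovsdb_debug"])


def run_example():
    """Run ExampleTest and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The cleanup registered with addCleanup restores the previous value, so an override made for one test does not leak into the next one.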
15:07:00 idk why I missed that and tried to fix Your patch in "Your way" :)
15:07:49 ok, next one was:
15:07:51 slaweq to open bug for issue with get_dp_id in os_ken
15:08:01 I reported it here https://bugs.launchpad.net/neutron/+bug/1861269
15:08:01 Launchpad bug 1861269 in neutron "Functional tests failing due to failure with getting datapath ID from ovs" [High,Confirmed]
15:08:10 slaweq, I think I have a possible solution for this
15:08:11 but I didn't assign it to myself
15:08:22 #link https://review.opendev.org/#/c/704397/
15:08:30 I need to justify it
15:08:47 but the point is, although we have multithreading because of os-ken
15:09:02 funny, usually in this kind of review the "fix" is to add a sleep, not remove one :)
15:09:02 the ovs agent code should not give the GIL to other tasks
15:09:21 that means: do not use sleep, which will stop the thread execution
15:09:40 if other threads are not expecting the GIL then those threads won't return it back
15:10:03 I've rechecked this patch several times, no errors in FT and fullstack (related)
15:10:03 ralonsoh: makes sense IMO
15:10:17 I'll add a proper explanation in the patch
15:10:30 bcafarel: LOL, that's true, usually we need to add sleep to "fix" something :)
15:10:34 hehehhehe
15:10:44 ralonsoh: please link Your patch to this bug also
15:10:48 sure
15:11:04 thx ralonsoh
15:11:20 wow that line is old, it comes directly from "Introduce Ryu based OpenFlow implementation"
15:11:28 exactly!
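A toy model of the argument made between 15:09:02 and 15:09:40, that a sleep is a point where the running task hands control to other tasks. It uses plain Python generators and a hypothetical round-robin scheduler; it is not os-ken, eventlet, or OVS agent code, and every function name below is an assumption.

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks cooperatively and return the order of steps."""
    trace = []
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))  # run the task up to its next yield point
            queue.append(task)        # it yielded, so the other tasks go first
        except StopIteration:
            pass                      # the task has finished
    return trace


def lookup_with_sleep(name, attempts):
    # Each yield stands in for a sleep between retries: control is handed over.
    for i in range(attempts):
        yield f"{name}:attempt{i}"


def lookup_without_sleep(name, attempts):
    # All retries happen in one uninterrupted step, with no hand-over in between.
    done = [f"{name}:attempt{i}" for i in range(attempts)]
    yield ",".join(done)


def other_task(name, steps):
    for i in range(steps):
        yield f"{name}:step{i}"
```

In this model, run_round_robin([lookup_with_sleep("agent", 2), other_task("other", 2)]) interleaves the two tasks, while the lookup_without_sleep variant finishes all of its attempts before the other task gets its first turn.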
15:12:31 ok, next one
15:12:33 slaweq to try to skip cleaning up neutron resources in fullstack job
15:12:53 I was trying it locally and it was faster by about 4-5 seconds per test
15:13:03 so I think it's not worth doing
15:13:17 and I didn't send any patch
15:13:42 yeah, not too much (and maybe we can introduce new errors)
15:13:49 ralonsoh: exactly
15:14:06 so the risk of unpredictable side effects is too high IMO
15:14:46 ok, that was all from last week
15:14:52 #topic Stadium projects
15:15:17 as we discussed yesterday, we finished dropping py2 support in Neutron
15:15:21 \o/
15:15:33 fantastic
15:15:37 so let's use etherpad https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop to track migration to zuul v3
15:16:16 but I have one more thing about dropping py2 support
15:16:19 there is patch https://review.opendev.org/#/c/704257/
15:16:24 for neutron-tempest-plugin
15:16:45 it's not working properly for Rocky jobs
15:17:32 and I have a question: shouldn't we first make a tag of the neutron-tempest-plugin repo and use this tag for rocky with py27
15:17:44 and then go with this patch to drop py27 completely?
15:17:50 exactly
15:18:02 or how will it work with rocky after we merge it?
15:18:14 that makes sense to me
15:18:16 we need to tag it first to use it in rocky tests
15:18:18 gmann: njohnston: am I missing something here?
15:19:30 I am not completely sure - gmann has been very active in this area, I believe he has a plan, but I am not sure of all the details
15:20:19 ok, I will ask about it in review
15:21:14 ok, njohnston any other updates about zuulv3 migration?
15:22:02 I don't have any updates; it has been accepted as an official goal for the V cycle, so we are way ahead of schedule, but it will be good to finish up because of all the reasons
15:22:32 there are only 3 or 4 stadium projects that have changes left
15:22:40 not too many
15:22:56 bcafarel is working on bgpvpn https://review.opendev.org/#/c/703601/
15:22:59 stadium-wise there is also amotoki's question on vpnaas failing on rocky and moving to use neutron-tempest-plugin there
15:23:38 bcafarel: do You have any patch with a failure example?
15:24:00 for vpnaas? https://review.opendev.org/#/c/590569/
15:24:56 also for bgpvpn I was wondering: there is an install job which does not run any tests (as per its name), should we migrate it or just drop it? I think other tests cover the "is it installable?" part
15:25:41 neutron-dynamic-routing also has bcafarel's magic touch https://review.opendev.org/#/c/703582/ ; I don't see anything zuulv3 related for networking-odl, networking-midonet, neutron-vpnaas
15:27:22 that's it for me
15:27:32 thx njohnston
15:27:41 speaking about this vpnaas rocky issue
15:27:59 am I understanding correctly that if we pinned the tempest used for rocky then it would be fine?
15:28:33 or should we use, for the rocky branch, the job defined in the neutron-tempest-plugin repo (like for master now)?
15:29:56 I'll defer on that to gmann
15:30:24 ok, I will talk with him about it
15:30:41 #action slaweq to talk with gmann about vpnaas jobs on rocky
15:30:53 * slaweq starts hating rocky branch now
15:31:02 :)
15:31:18 * njohnston welcomes slaweq to the club
15:31:20 "old but not enough" branch
15:31:40 lol, that's true
15:31:57 I have had too many backports that go "train: green; stein: green; rocky: RED; queens: green"
15:31:58 good news is that it's just a few more weeks and it will be EM
15:32:55 ok, let's move on
15:33:09 or do You have anything else related to stadium for today?
15:33:20 nope, nothing else
15:34:09 ok, so let's move on
15:34:11 #topic Grafana
15:34:20 #link http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1
15:34:52 from what I can tell, we are much better with scenario jobs now
15:35:08 but our biggest problems are fullstack/functional jobs
15:35:12 and grenade jobs
15:35:57 and the biggest issue of those is the functional job
15:36:05 yep
15:37:54 and I think that we are missing some ovn related job on the dashboard now
15:38:05 I will check that and update the dashboard if needed
15:38:20 #action slaweq to update grafana dashboard with missing jobs
15:38:42 are all the jobs in? I think I saw some reviews on functional ovn (at least)
15:39:10 but functional tests will be run together with our "old" functional job I think
15:39:22 ah ok :)
15:39:51 anything else related to grafana for today?
15:40:29 ok, so let's move on then
15:40:40 #topic fullstack/functional
15:40:53 I have a few examples of failures in the functional job
15:41:10 first again ovsdbapp command timeouts:
15:41:12 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_abb/704530/4/gate/neutron-functional/abbc532/testr_results.html
15:41:14 https://ce2d3847c3fd9d644a91-e099c5c03695c7198c297e75ec3f8d05.ssl.cf2.rackcdn.com/704240/3/gate/neutron-functional/ab28823/testr_results.html
15:41:21 but I know ralonsoh is on it already
15:41:38 so it's just to point to the new examples of this issue
15:42:01 yes, let's see if we can have more information with the patch uploaded
15:42:26 but those are the main problems we see in FT and fullstack
15:42:31 1) ovsdb timeouts
15:42:40 2) the os-ken datapath timeout
15:42:46 3) pyroute timeouts
15:42:57 (did I say "timeout" before?)
15:43:11 lol
15:43:26 yeah, timeouts are our biggest nightmare now :/
15:43:29 lol
15:43:52 but it seems logical that removing "sleep" from code may solve timeouts :P
15:44:33 ok, next one then (this one is new for me, at least I don't remember anything like that)
15:44:38 failure in neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_correct_protection_allowed_address_pairs
15:44:46 https://4131d9f319da782ce250-f8398ccf7503ce4fb23659d29292afec.ssl.cf2.rackcdn.com/694568/16/check/neutron-functional/4fe1d73/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_correct_protection_allowed_address_pairs.txt
15:44:59 there are errors like 2020-01-29 09:11:10.172 22333 ERROR ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: error parsing update: ovsdb error: Modify non-existing row: ovs.db.error.Error: ovsdb error: Modify non-existing row
15:45:26 I have no idea about this one
15:45:46 how is a Linux Bridge test hitting an OVS error??
15:45:50 * njohnston looks for otherwiseguy
15:45:51 some race condition in test because of our timeout friend?
15:45:52 me neither but I wonder why ovsdbapp is used in those Linuxbridge tests
15:46:00 that's the point
15:46:03 either LB or OVS
15:46:17 oh
15:47:01 and here is the error in the test: https://4131d9f319da782ce250-f8398ccf7503ce4fb23659d29292afec.ssl.cf2.rackcdn.com/694568/16/check/neutron-functional/4fe1d73/testr_results.html
15:47:15 so it seems that it failed on preparation of the test env
15:48:18 slaweq, that seems to be an error from a previous test
15:48:29 and a blocked greenlet thread
15:48:40 maybe it's too late
15:48:54 but the use of greenthreads, IMO, was not a good option
15:49:24 (remember python does NOT have multithreading at all)
15:49:28 but that's the only failed test in this job
15:49:35 I know...
15:50:56 let's see if we will have more such issues, as nobody has seen it so far, maybe it will never happen again ;)
15:51:09 * slaweq doesn't even believe himself
15:51:19 hahahah
15:51:36 :-D
15:51:55 and I have one more, like: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_00e/701733/24/check/neutron-functional/00ede7c/testr_results.html
15:52:01 * bcafarel is not betting on it either
15:52:08 and this one I think I saw at least twice this week
15:52:19 no no
15:52:23 this is not a problem
15:52:35 no?
15:52:37 why?
15:52:40 that was related to an error in the OVN functional tests
15:52:46 but that's solved not
15:52:47 now
15:52:48 one sec
15:53:02 (also I pushed a DNM patch to test this)
15:53:07 a sorting order issue right?
15:53:19 ahh, so it's related to ovn migration, right?
15:53:36 https://review.opendev.org/#/c/701733/24..26
15:53:43 please, read the diff and you'll understand
15:53:55 --> https://review.opendev.org/#/c/701733/24..26/neutron/tests/functional/plugins/ml2/drivers/ovn/mech_driver/ovsdb/test_maintenance.py
15:55:35 BTW, that problem in FTs was tested in https://review.opendev.org/#/c/704376
15:56:11 so it was trying to use test_extensions.setup_extensions_middleware(sg_mgr) as the security groups api, instead of the "normal" one, right?
15:56:42 exactly
15:56:51 that was needed in networking-ovn
15:56:57 but NOT in neutron repo
15:57:05 if you are using the basetest class
15:57:09 ok, good that it's not "yet another new issue with functional tests" :)
15:57:16 nonono
15:57:21 thx ralonsoh :)
15:57:24 yw!
15:57:55 so maybe something similar will be needed to fix failures like https://19dc65f6cfdf56a6f70b-c96c299047b55dcdeaefef8e344ceab6.ssl.cf5.rackcdn.com/702397/7/check/neutron-functional/8ca93bc/testr_results.html
15:58:07 I saw it also only in ovn related patches
15:58:30 and it seems that the needed route is simply not loaded in neutron
15:58:37 yes, first we need "part 1" patch
15:58:45 then will handle "part 2"
15:58:51 ok
15:58:53 thx
15:58:59 so those 2 from my list are fine then
15:59:19 for fullstack tests I saw such a failure:
15:59:21 https://7994d6b1b4a3fac76e83-9707ce74906f3f341f743e6035ad1064.ssl.cf5.rackcdn.com/704397/2/check/neutron-fullstack/d44b482/controller/logs/dsvm-fullstack-logs/TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement_Open-vSwitch-agent_/neutron-server--2020-01-29--00-20-32-648918_log.txt
15:59:28 it's an issue with the connection to the placement service
16:00:00 I will ask rubasov and lajoskatona tomorrow to take a look at it
16:00:05 maybe they can help with this
16:00:16 and we are out of time now
16:00:22 thx for the meeting guys
16:00:26 they have experience on this
16:00:29 see You around
16:00:29 bye!!
16:00:32 o/
16:00:34 #endmeeting