15:00:07 #startmeeting neutron_ci
15:00:08 Meeting started Wed Jul 29 15:00:07 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:09 hi
15:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:12 The meeting name has been set to 'neutron_ci'
15:00:34 hi
15:01:55 o/
15:02:30 let's start
15:02:40 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:54 we don't have any actions from the last meeting for today
15:03:03 so let's move directly to the next topic
15:03:07 #topic Stadium projects
15:03:21 and the first item there, which is
15:03:23 standardize on zuul v3
15:03:27 Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:03:44 based on a recent mail from tosky I updated our etherpad a bit
15:03:52 so we have a few more things to do there
15:04:14 I will try to send some patches related to it this week and next
15:04:34 but if You want to help, feel free to send some patches and add links to them in the etherpad
15:05:54 oh, I recognize my comment on networking-sfc from the main etherpad :)
15:06:00 :)
15:06:28 anything else regarding stadium projects for today?
15:07:03 just a general organization question: should we add a focal migration section here? (or elsewhere in the meeting)
15:07:18 there is a separate topic for this in the agenda
15:07:33 I had started https://etherpad.opendev.org/p/neutron-victoria-switch_to_focal at some point, close in style to the zuulv3 transition one
15:08:11 thx bcafarel
15:08:15 that's great
15:09:45 I think we need to focus more on that goal
15:09:54 as we really need to do that soon IMO
15:10:32 +1, and at least for me, I keep getting sidetracked on other stuff :(
15:10:32 yep, Victoria is not so far away
15:10:58 we have ovs installed from pip now, but the functional job times out after that
15:11:29 bcafarel: ovs from pip? You mean from the repo, right?
15:11:30 o/, sorry for being late
15:12:06 python-ovs
15:12:09 python-openvswitch to be precise
15:12:14 ralonsoh: always faster :)
15:12:16 exactly
15:12:27 ahh, ok :)
15:13:07 is it this patch https://review.opendev.org/#/c/734304/ ?
15:13:28 it is
15:14:05 it's failing, probably due to some errors related to ovn: https://60633e52ebee9acccc9c-bc7527a6a4fcb9c26b6a927801c9ca9a.ssl.cf2.rackcdn.com/734304/8/check/neutron-functional/dd799fc/job-output.txt
15:14:23 there are plenty of errors like "sqlite3.OperationalError: no such table: ovn_hash_ring"
15:14:43 but not only that
15:14:49 errors with the ncat process
15:14:51 and others
15:15:25 I think someone needs to run a local VM with Ubuntu Focal and run those tests there to investigate
15:15:36 just rebased DNM https://review.opendev.org/#/c/738163/ to get fresh results on other jobs
15:16:17 yes, but this thing about the ovn_hash_ring table...
15:16:21 I need to investigate that
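As a minimal illustration of what that "no such table" failure mode means, assuming nothing about neutron's actual test fixtures: sqlite raises this error whenever a query hits a table whose schema was never created in that particular database, which points at the functional tests' DB setup rather than at the data or the query itself. The table name is taken from the log message above; the column name is only a placeholder.

```python
import sqlite3

# A fresh in-memory database: no schema has been created here yet.
conn = sqlite3.connect(":memory:")

try:
    conn.execute("SELECT node_uuid FROM ovn_hash_ring")
except sqlite3.OperationalError as exc:
    print(exc)  # -> "no such table: ovn_hash_ring", same class of error as in the job logs

# Once the table exists, the identical query runs fine, which is why this
# kind of error usually means the test database setup (schema creation /
# migrations) did not happen, not that the query or the data is wrong.
conn.execute("CREATE TABLE ovn_hash_ring (node_uuid TEXT PRIMARY KEY)")
print(conn.execute("SELECT node_uuid FROM ovn_hash_ring").fetchall())  # -> []
```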
15:16:24 and I also see issues there related to the new ebtables
15:16:47 we had something similar in RH some time ago when we moved to RHEL/CentOS 8
15:17:07 but we simply skipped those failing Linuxbridge-related tests in d/s
15:18:06 based on that, I think we really need to give high priority to this, as we have some real issues on the new OS
15:19:02 I just increased the timeout for the functional job in https://review.opendev.org/#/c/734304/
15:19:37 I hope it will finish without a timeout so we will have a cleaner look at which tests are failing there
15:20:08 and let's focus on that in the next days
15:22:38 ok, I think we can move on now
15:22:40 #topic Stable branches
15:22:47 Ussuri dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:22:49 Train dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:24:24 from what I saw in recent backports, we are back to stable CI for stable branches
15:24:51 bcafarel: that's my understanding from the look at grafana :)
15:26:14 do we have anything else related to CI of stable branches for today?
15:26:18 or can we move on?
15:26:50 nothing from me at least
15:28:27 ok, let's move on
15:28:35 #topic Grafana
15:28:40 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:30:22 first of all, today I proposed patch https://review.opendev.org/743729 with a small improvement (IMHO)
15:30:39 I proposed to move the non-voting scenario jobs to a separate graph
15:30:52 agree with this
15:31:02 as now we have 12 jobs on one graph and it's a bit hard to check quickly which job is failing often
15:31:11 and if it's a voting or non-voting job
15:32:29 other than that, I think we have had a lot of functional job failures recently
15:32:41 due to a problem in ovn
15:32:51 https://review.opendev.org/#/c/743577/
15:33:05 (most of them)
15:33:16 yes, I think the main problem is that one ralonsoh :)
15:33:52 but the second issue which happens pretty often is still the ovsdbapp timeout
15:33:55 like e.g. https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_83a/729931/1/check/neutron-fullstack-with-uwsgi/83a80df/testr_results.html
15:34:05 this one is from fullstack but I saw it also in functional tests
15:34:12 yes... I'm aware
15:34:21 and I don't know how to fix that
15:34:21 and I know that we already know about it
15:34:58 can't we maybe increase this timeout somehow in the tests?
15:35:07 to e.g. 30 seconds instead of 10?
15:35:07 it's not a problem of the timeout
15:35:11 ahh
15:35:24 but with eventlet and threads, the recurrent issue
15:35:35 ahh
15:36:21 for me this problem is like https://i.imgur.com/ofRuetq.jpg
15:36:47 LOL
15:37:36 so we can't really do anything about that?
15:38:14 still thinking about this
15:38:16 ralonsoh: and why don't we see it in e.g. scenario jobs but only in functional/fullstack?
15:38:21 or do we?
15:38:44 I think this is because the system is busier during FT/fullstack
15:39:49 is this only CPU power, or OVS internals?
15:40:10 this is just a guess
15:40:31 but if you are waiting for a message and you use a "wait" method
15:40:53 and then no other thread returns the GIL on time, you'll have a timeout
15:41:08 is the solution to increase the timeout? I don't think so
15:41:24 the solution is to find out where to make an active wait, not returning the GIL
15:41:41 (using eventlets)
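A standalone sketch of the starvation pattern described above, assuming (as in the monkey-patched functional/fullstack environment) that everything runs as cooperative eventlet greenthreads; this is not neutron or ovsdbapp code and the names are illustrative. A reply that is ready after 100 ms still cannot wake its waiter until a busy greenthread yields back to the hub, so a timeout armed around the wait can fire regardless of how large it is.

```python
import time

import eventlet
from eventlet import event

eventlet.monkey_patch()

reply = event.Event()


def responder():
    # The "answer" is ready after 100 ms...
    eventlet.sleep(0.1)
    reply.send("pong")


def greedy_worker():
    # ...but this greenthread burns ~3 s of CPU without a single
    # cooperative yield (no eventlet.sleep(0) inside the loop).
    start = time.monotonic()
    while time.monotonic() - start < 3:
        pass


eventlet.spawn(responder)
eventlet.spawn(greedy_worker)

t0 = time.monotonic()
result = reply.wait()
print(result, "after", round(time.monotonic() - t0, 1), "s")
# Prints "pong after 3.x s": the reply existed at 0.1 s, but the waiter
# could not be scheduled until the greedy greenthread yielded.  That is
# why bumping a 10 s limit to 30 s does not really fix anything: on a
# loaded CI node the non-yielding stretch can grow past any fixed value.
```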
15:45:28 Am I still connected?
15:45:33 ralonsoh: yes
15:45:43 cool
15:45:55 I just don't know what else I can add to this :)
15:46:02 hehehe me neither
15:46:23 ok, let's keep an eye on it, maybe someday someone will find a solution :)
15:47:31 let's go to the next topic for today
15:47:32 #topic Tempest/Scenario
15:47:47 I found 2 new (for me) failures in the last days
15:47:49 first:
15:47:51 neutron_tempest_plugin.api.admin.test_shared_network_extension.RBACSharedNetworksTest
15:47:56 https://bcbff5a63272be990f4a-3078f09a35fdfa0355b18572be8c3ad5.ssl.cf2.rackcdn.com/702197/5/check/neutron-tempest-plugin-api/068ce6c/testr_results.html
15:48:41 but now I think it was related to the patch on which it was run: https://review.opendev.org/#/c/702197/
15:49:12 so please ignore it
15:49:23 the second one was:
15:49:25 neutron_tempest_plugin.scenario.test_dns_integration.DNSIntegrationAdminTests
15:49:29 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_baf/740588/3/check/neutron-tempest-plugin-designate-scenario/baf5297/testr_results.html
15:50:11 did You see such failures before?
15:50:21 the first one yes, not the second one
15:50:35 (actually I saw the first one because that was my patch)
15:50:35 ralonsoh: the first one was on Your patch :P
15:50:39 hehehehe
15:51:04 the second one failed due to the error:
15:51:06 designateclient.exceptions.Conflict: Duplicate Zone
15:51:09 in the neutron server logs
15:51:13 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_baf/740588/3/check/neutron-tempest-plugin-designate-scenario/baf5297/controller/logs/screen-q-svc.txt
15:51:28 I will keep an eye on this job to see if that happens more
15:51:35 and if so, I will open an LP bug for that issue
15:52:47 and that's basically all I have for today
15:52:56 do You have anything else for today's meeting?
15:53:06 yes, a request for review
15:53:11 related to fullstack tests
15:53:19 https://review.opendev.org/#/c/738446/
15:53:48 that will reduce the problem we sometimes have with test clashes
15:53:51 sure
15:53:56 I will review it this week
15:53:58 asap
15:54:19 * bcafarel adds to the pile
15:54:35 one additional stable topic brought up in the https://review.opendev.org/#/c/742519/ comments
15:54:56 do you think jobs running against master branches of other projects are useful in stable branches?
15:55:12 neutron-tempest-with-os-ken-master / neutron-ovn-tempest-ovs-master-fedora periodic jobs specifically
15:55:16 no
15:55:24 I think we should drop such jobs in stable branches
15:55:42 +1
15:55:42 stable branches by definition aren't expected to run with other things from master
15:56:06 ok, I will send cleanup patches (fewer periodic jobs to run, infra will be happy)
15:56:48 thx bcafarel
15:57:06 #action bcafarel to clean stable branches jobs from *-master jobs
15:57:20 ^^ I had to add at least one today :P
15:57:28 all actions on me then :)
15:57:53 :)
15:58:01 ok, I think we can finish the meeting now
15:58:05 thx for attending
15:58:08 bye!
15:58:09 o/
15:58:13 bye
15:58:15 and for taking care of ci in the last 2 weeks :)
15:58:18 o/
15:58:22 #endmeeting