15:00:07 <slaweq> #startmeeting neutron_ci
15:00:08 <openstack> Meeting started Wed Jul 29 15:00:07 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:09 <slaweq> hi
15:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:12 <openstack> The meeting name has been set to 'neutron_ci'
15:00:34 <ralonsoh> hi
15:01:55 <bcafarel> o/
15:02:30 <slaweq> lets start
15:02:40 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:54 <slaweq> we don't have any actions from last meeting for today
15:03:03 <slaweq> so lets move directly to the next topic
15:03:07 <slaweq> #topic Stadium projects
15:03:21 <slaweq> and first item there which is
15:03:23 <slaweq> standardize on zuul v3
15:03:27 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:03:44 <slaweq> based on recent mail from tosky I updated our etherpad a bit
15:03:52 <slaweq> so we have a bit more things to do there
15:04:14 <slaweq> I will try to send some patches related to it this and next week
15:04:34 <slaweq> but if You want to help, feel free to send some patches and add links to them in the etherpad
15:05:54 <bcafarel> oh I recognize my comment on networking-sfc from main etherpad :)
15:06:00 <slaweq> :)
15:06:28 <slaweq> anything else regarding stadium projects for today?
15:07:03 <bcafarel> just general organization question should we add a focal migration section here? (or else in the meeting)
15:07:18 <slaweq> there is separate topic for this in the agenda
15:07:33 <bcafarel> I had started https://etherpad.opendev.org/p/neutron-victoria-switch_to_focal at some point close in style to zuulv3 transition one
15:08:11 <slaweq> thx bcafarel
15:08:15 <slaweq> that's great
15:09:45 <slaweq> I think we need to focus more on that goal
15:09:54 <slaweq> as we really need to do that soon IMO
15:10:32 <bcafarel> +1 and at least from me I keep getting sidetracked on other stuff :(
15:10:32 <tosky> yep, Victoria is not so far away
15:10:58 <bcafarel> we have ovs installed from pip now but functional job timeouts after that
15:11:29 <slaweq> bcafarel: ovs from the pip? You mean from the repo, right?
15:11:30 <lajoskatona> o/, sorry for being late
15:12:06 <ralonsoh> python-ovs
15:12:09 <bcafarel> python-openvswith to be precise
15:12:14 <bcafarel> ralonsoh: always faster :)
15:12:16 <ralonsoh> exactly
15:12:27 <slaweq> ahh, ok :)
15:13:07 <slaweq> is it this patch https://review.opendev.org/#/c/734304/ ?
15:13:28 <ralonsoh> it is
15:14:05 <slaweq> it's failing due to some errors related to ovn probably: https://60633e52ebee9acccc9c-bc7527a6a4fcb9c26b6a927801c9ca9a.ssl.cf2.rackcdn.com/734304/8/check/neutron-functional/dd799fc/job-output.txt
15:14:23 <slaweq> there is plenty of errors like " sqlite3.OperationalError: no such table: ovn_hash_ring"
15:14:43 <slaweq> but not only that
15:14:49 <slaweq> errors with ncat process
15:14:51 <slaweq> and others
15:15:25 <slaweq> I think someone needs to run locally vm with ubuntu focal and run those tests there to investigate
15:15:36 <bcafarel> just rebased DNM https://review.opendev.org/#/c/738163/ to get fresh results on other jobs
15:16:17 <ralonsoh> yes but this thing about the ovn_hash_ring table...
15:16:21 <ralonsoh> I need to investigate that
15:16:24 <slaweq> and I also see there issues related to new ebtables
15:16:47 <slaweq> we had something similar in RH some time ago when we moved to RHEL/Centos 8
15:17:07 <slaweq> but we simply skipped those failing Linuxbridge related tests in d/s
15:18:06 <slaweq> based on that I think we really need to give high priority for that as we have some real issues on this new OS
15:19:02 <slaweq> I just increased timeout for functional job in https://review.opendev.org/#/c/734304/
15:19:37 <slaweq> I hope it will finish without timeout so we will have cleaner look on what tests are failing there
15:20:08 <slaweq> and lets focus on that in next days
15:22:38 <slaweq> ok, I think we can move on now
15:22:40 <slaweq> #topic Stable branches
15:22:47 <slaweq> Ussuri dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:22:49 <slaweq> Train dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:24:24 <bcafarel> from what I saw in recent backports, we are back to stable CI for stable branches
15:24:51 <slaweq> bcafarel: that's my understanding from the look for grafana :)
15:26:14 <slaweq> do we have anything else related to CI of stable branches for today?
15:26:18 <slaweq> or can we move on?
15:26:50 <bcafarel> nothing from me at least
15:28:27 <slaweq> ok, lets move on
15:28:35 <slaweq> #topic Grafana
15:28:40 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:30:22 <slaweq> first of all, I today proposed patch https://review.opendev.org/743729 with small improvement (IMHO)
15:30:39 <slaweq> I proposed to move non-voting scenario jobs to separate graph
15:30:52 <ralonsoh> agree with this
15:31:02 <slaweq> as now we have 12 jobs on one graph and it's a bit hard to check quickly which job is failing often
15:31:11 <slaweq> and if it's voting or non-voting job
15:32:29 <slaweq> other than that I think we have a lot of functional jobs failures recently
15:32:41 <ralonsoh> due to a problem in ovn
15:32:51 <ralonsoh> https://review.opendev.org/#/c/743577/
15:33:05 <ralonsoh> (most of them)
15:33:16 <slaweq> yes, I think that main problem is that one ralonsoh :)
15:33:52 <slaweq> but second issue which happens pretty often is ovsdbapp timeout still
15:33:55 <slaweq> like e.g. https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_83a/729931/1/check/neutron-fullstack-with-uwsgi/83a80df/testr_results.html
15:34:05 <slaweq> this one is from fullstack but I saw it also in functional tests
15:34:12 <ralonsoh> yes... I'm aware
15:34:21 <ralonsoh> and I don't know how to fix that
15:34:21 <slaweq> and I know that we already know about it
15:34:58 <slaweq> can't we maybe increase this timeout somehow in tests?
15:35:07 <slaweq> to e.g. 30 seconds instead of 10?
15:35:07 <ralonsoh> not a problem of timeout
15:35:11 <slaweq> ahh
15:35:24 <ralonsoh> but with eventlet and threads, the recurrent issue
15:35:35 <slaweq> ahh
15:36:21 <ralonsoh> for me this problem is like https://i.imgur.com/ofRuetq.jpg
15:36:47 <slaweq> LOL
15:37:36 <slaweq> so we can't do anything with that really?
15:38:14 <ralonsoh> still thinking about this
15:38:16 <slaweq> ralonsoh: and why we don't see it in e.g. scenario jobs but only in functional/fullstack?
15:38:21 <slaweq> or we do?
15:38:44 <ralonsoh> I think this is because the system is more busy during FT/fullstack
15:39:49 <lajoskatona> is this only cpu power i.e., or OVS internals<
15:39:50 <lajoskatona> ?
15:40:10 <ralonsoh> this is just a guess
15:40:31 <ralonsoh> but if you are waiting for a message and you use a "wait" method
15:40:53 <ralonsoh> and then no other thread returns the GIL on time, you'll have a timeout
15:41:08 <ralonsoh> is the solution to increase the timeout? I don't think so
15:41:24 <ralonsoh> the solution is to find out where to make an active wait not returning the GIL
15:41:41 <ralonsoh> (using evenlets)
15:45:28 <ralonsoh> Am I still connected?
15:45:33 <slaweq> ralonsoh: yes
15:45:43 <ralonsoh> cool
15:45:55 <slaweq> I just don't know what else I can add to this :)
15:46:02 <ralonsoh> hehehe me neither
15:46:23 <slaweq> ok, lets keep an eye on it, maybe someday someone will find solution :)
15:47:31 <slaweq> lets go to the next topic for today
15:47:32 <slaweq> #topic Tempest/Scenario
15:47:47 <slaweq> I found 2 new (for me) failures in last days
15:47:49 <slaweq> first:
15:47:51 <slaweq> neutron_tempest_plugin.api.admin.test_shared_network_extension.RBACSharedNetworksTest
15:47:56 <slaweq> https://bcbff5a63272be990f4a-3078f09a35fdfa0355b18572be8c3ad5.ssl.cf2.rackcdn.com/702197/5/check/neutron-tempest-plugin-api/068ce6c/testr_results.html
15:48:41 <slaweq> but now I think it was related to the patch on which it was run: https://review.opendev.org/#/c/702197/
15:49:12 <slaweq> so please ignore it
15:49:23 <slaweq> second one was:
15:49:25 <slaweq> neutron_tempest_plugin.scenario.test_dns_integration.DNSIntegrationAdminTests
15:49:29 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_baf/740588/3/check/neutron-tempest-plugin-designate-scenario/baf5297/testr_results.html
15:50:11 <slaweq> did You saw such failures before?
15:50:21 <ralonsoh> the first one yes, not the second one
15:50:35 <ralonsoh> (actually I saw the first one because that was my patch)
15:50:35 <slaweq> ralonsoh: first one was on Your patch :P
15:50:39 <ralonsoh> hehehehe
15:51:04 <slaweq> second one failed due to error:
15:51:06 <slaweq> designateclient.exceptions.Conflict: Duplicate Zone
15:51:09 <slaweq> in neutron server logs
15:51:13 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_baf/740588/3/check/neutron-tempest-plugin-designate-scenario/baf5297/controller/logs/screen-q-svc.txt
15:51:28 <slaweq> I will keep an eye on this job to see if that will happen more
15:51:35 <slaweq> and if so, I will open LP for that issue
15:52:47 <slaweq> and that's basically all what I have for today
15:52:56 <slaweq> do You have anything else for today's meeting?
15:53:06 <ralonsoh> yes, a request for review
15:53:11 <ralonsoh> related to fullstack tests
15:53:19 <ralonsoh> https://review.opendev.org/#/c/738446/
15:53:48 <ralonsoh> that will reduce the problem we have sometimes with test clashes
15:53:51 <slaweq> sure
15:53:56 <slaweq> I will review it this week
15:53:58 <slaweq> asap
15:54:19 * bcafarel adds to the pile
15:54:35 <bcafarel> one additional stable topic brought in https://review.opendev.org/#/c/742519/ comments
15:54:56 <bcafarel> do you think jobs running against master branches of other projects are useful in stable branches?
15:55:12 <bcafarel> neutron-tempest-with-os-ken-master / neutron-ovn-tempest-ovs-master-fedora periodic jobs specifically
15:55:16 <slaweq> no
15:55:24 <slaweq> I think we should drop such jobs in stable branches
15:55:42 <maciejjozefczyk> +1
15:55:42 <slaweq> stable branches by definition aren't expected to run with other things from master
15:56:06 <bcafarel> ok I will send cleanup patches (less periodic jobs to run, infra will be happy)
15:56:48 <slaweq> thx bcafarel
15:57:06 <slaweq> #action bcafarel to clean stable branches jobs from *-master jobs
15:57:20 <slaweq> ^^ I had to add at least one today :P
15:57:28 <bcafarel> all actions on me then :)
15:57:53 <slaweq> :)
15:58:01 <slaweq> ok, I think we can finish the meeting now
15:58:05 <slaweq> thx for attending
15:58:08 <ralonsoh> bye!
15:58:09 <bcafarel> o/
15:58:13 <lajoskatona> bye
15:58:15 <slaweq> and for taking care of ci in last 2 weeks :)
15:58:18 <slaweq> o/
15:58:22 <slaweq> #endmeeting
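
A side note on the ovsdbapp timeout discussion (15:35-15:41): the pattern ralonsoh describes is easier to see with a concrete example. Below is a minimal, hypothetical Python sketch, not neutron or ovsdbapp code and with all names made up, showing a green thread that waits on an event with a 10 second timeout while another green thread does blocking work that never yields to the eventlet hub. The waiter times out even though the reply is only late, not lost, which is why raising the timeout to 30 seconds would not address the underlying starvation.

    # Hypothetical illustration (not neutron/ovsdbapp code) of a cooperative
    # scheduling starvation under eventlet: the waiter's 10 s timeout fires
    # because the producer never yields control back to the hub.
    import time

    import eventlet
    from eventlet import event

    reply = event.Event()

    def busy_producer():
        # Simulates CPU-bound or otherwise non-yielding work: no green sleep,
        # no green I/O, so the hub cannot run timers or wake other green
        # threads while this loop spins.
        deadline = time.monotonic() + 12
        while time.monotonic() < deadline:
            pass
        reply.send("pong")  # the reply arrives, but too late for the waiter

    def waiter():
        # Rough analogue of waiting for a reply with a 10 s timeout.
        try:
            with eventlet.Timeout(10):
                return reply.wait()
        except eventlet.Timeout:
            return "timed out"

    eventlet.spawn(busy_producer)
    print(waiter())  # expected to print "timed out"

The sketch assumes eventlet's default hub and is only meant to show the mechanism; the real fix, as said in the meeting, is finding the wait that does not yield rather than bumping the timeout.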