15:00:07 <slaweq> #startmeeting neutron_ci
15:00:08 <openstack> Meeting started Wed Jul 29 15:00:07 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:09 <slaweq> hi
15:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:12 <openstack> The meeting name has been set to 'neutron_ci'
15:00:34 <ralonsoh> hi
15:01:55 <bcafarel> o/
15:02:30 <slaweq> let's start
15:02:40 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:54 <slaweq> we don't have any actions from last meeting for today
15:03:03 <slaweq> so let's move directly to the next topic
15:03:07 <slaweq> #topic Stadium projects
15:03:21 <slaweq> and the first item there is
15:03:23 <slaweq> standardize on zuul v3
15:03:27 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:03:44 <slaweq> based on a recent mail from tosky I updated our etherpad a bit
15:03:52 <slaweq> so we have a few more things to do there
15:04:14 <slaweq> I will try to send some patches related to it this and next week
15:04:34 <slaweq> but if You want to help, feel free to send some patches and add links to them in the etherpad
15:05:54 <bcafarel> oh I recognize my comment on networking-sfc from the main etherpad :)
15:06:00 <slaweq> :)
15:06:28 <slaweq> anything else regarding stadium projects for today?
15:07:03 <bcafarel> just a general organization question: should we add a focal migration section here? (or elsewhere in the meeting)
15:07:18 <slaweq> there is a separate topic for this in the agenda
15:07:33 <bcafarel> I had started https://etherpad.opendev.org/p/neutron-victoria-switch_to_focal at some point, close in style to the zuulv3 transition one
15:08:11 <slaweq> thx bcafarel
15:08:15 <slaweq> that's great
15:09:45 <slaweq> I think we need to focus more on that goal
15:09:54 <slaweq> as we really need to do that soon IMO
15:10:32 <bcafarel> +1, and at least on my side I keep getting sidetracked on other stuff :(
15:10:32 <tosky> yep, Victoria is not so far away
15:10:58 <bcafarel> we have ovs installed from pip now but the functional job times out after that
15:11:29 <slaweq> bcafarel: ovs from pip? You mean from the repo, right?
15:11:30 <lajoskatona> o/, sorry for being late
15:12:06 <ralonsoh> python-ovs
15:12:09 <bcafarel> python-openvswitch to be precise
15:12:14 <bcafarel> ralonsoh: always faster :)
15:12:16 <ralonsoh> exactly
15:12:27 <slaweq> ahh, ok :)
15:13:07 <slaweq> is it this patch https://review.opendev.org/#/c/734304/ ?
15:13:28 <ralonsoh> it is
15:14:05 <slaweq> it's probably failing due to some errors related to ovn: https://60633e52ebee9acccc9c-bc7527a6a4fcb9c26b6a927801c9ca9a.ssl.cf2.rackcdn.com/734304/8/check/neutron-functional/dd799fc/job-output.txt
15:14:23 <slaweq> there are plenty of errors like "sqlite3.OperationalError: no such table: ovn_hash_ring"
15:14:43 <slaweq> but not only that
15:14:49 <slaweq> errors with the ncat process
15:14:51 <slaweq> and others
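A side note on the sqlite error quoted above: "no such table" simply means the table is queried before the test database schema containing it has been created, so it most likely points at the functional tests' DB setup/migration step on focal rather than at the test logic itself; that is an assumption, not a confirmed root cause. A minimal stdlib sketch of this error class, using plain sqlite3 and a made-up single-column layout, not neutron's actual test fixtures:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    try:
        # Querying a table before any schema setup has created it raises
        # exactly the error class seen in the functional job logs.
        conn.execute("SELECT * FROM ovn_hash_ring")
    except sqlite3.OperationalError as exc:
        print(exc)  # -> no such table: ovn_hash_ring

    # Once the table exists (hypothetical column, for illustration only),
    # the same query succeeds.
    conn.execute("CREATE TABLE ovn_hash_ring (node_uuid TEXT)")
    print(conn.execute("SELECT count(*) FROM ovn_hash_ring").fetchone())
    conn.close()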
15:15:25 <slaweq> I think someone needs to run a local vm with ubuntu focal and run those tests there to investigate
15:15:36 <bcafarel> just rebased DNM https://review.opendev.org/#/c/738163/ to get fresh results on other jobs
15:16:17 <ralonsoh> yes but this thing about the ovn_hash_ring table...
15:16:21 <ralonsoh> I need to investigate that
15:16:24 <slaweq> and I also see issues there related to the new ebtables
15:16:47 <slaweq> we had something similar in RH some time ago when we moved to RHEL/Centos 8
15:17:07 <slaweq> but we simply skipped those failing Linuxbridge related tests in d/s
15:18:06 <slaweq> based on that I think we really need to give high priority to this, as we have some real issues on this new OS
15:19:02 <slaweq> I just increased timeout for functional job in https://review.opendev.org/#/c/734304/
15:19:37 <slaweq> I hope it will finish without a timeout so we will have a clearer look at which tests are failing there
15:20:08 <slaweq> and let's focus on that in the next days
15:22:38 <slaweq> ok, I think we can move on now
15:22:40 <slaweq> #topic Stable branches
15:22:47 <slaweq> Ussuri dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:22:49 <slaweq> Train dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:24:24 <bcafarel> from what I saw in recent backports, we are back to stable CI for stable branches
15:24:51 <slaweq> bcafarel: that's my understanding from the look at grafana :)
15:26:14 <slaweq> do we have anything else related to CI of stable branches for today?
15:26:18 <slaweq> or can we move on?
15:26:50 <bcafarel> nothing from me at least
15:28:27 <slaweq> ok, let's move on
15:28:35 <slaweq> #topic Grafana
15:28:40 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:30:22 <slaweq> first of all, today I proposed patch https://review.opendev.org/743729 with a small improvement (IMHO)
15:30:39 <slaweq> I proposed to move non-voting scenario jobs to separate graph
15:30:52 <ralonsoh> agree with this
15:31:02 <slaweq> as now we have 12 jobs on one graph and it's a bit hard to quickly check which job is failing often
15:31:11 <slaweq> and whether it's a voting or non-voting job
15:32:29 <slaweq> other than that, I think we have had a lot of functional job failures recently
15:32:41 <ralonsoh> due to a problem in ovn
15:32:51 <ralonsoh> https://review.opendev.org/#/c/743577/
15:33:05 <ralonsoh> (most of them)
15:33:16 <slaweq> yes, I think that main problem is that one ralonsoh :)
15:33:52 <slaweq> but the second issue which still happens pretty often is the ovsdbapp timeout
15:33:55 <slaweq> like e.g. https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_83a/729931/1/check/neutron-fullstack-with-uwsgi/83a80df/testr_results.html
15:34:05 <slaweq> this one is from fullstack but I saw it also in functional tests
15:34:12 <ralonsoh> yes... I'm aware
15:34:21 <ralonsoh> and I don't know how to fix that
15:34:21 <slaweq> and I know that we already know about it
15:34:58 <slaweq> can't we maybe increase this timeout somehow in tests?
15:35:07 <slaweq> to e.g. 30 seconds instead of 10?
15:35:07 <ralonsoh> not a problem of timeout
15:35:11 <slaweq> ahh
15:35:24 <ralonsoh> but with eventlet and threads, the recurrent issue
15:35:35 <slaweq> ahh
15:36:21 <ralonsoh> for me this problem is like https://i.imgur.com/ofRuetq.jpg
15:36:47 <slaweq> LOL
15:37:36 <slaweq> so we can't do anything with that really?
15:38:14 <ralonsoh> still thinking about this
15:38:16 <slaweq> ralonsoh: and why don't we see it in e.g. scenario jobs but only in functional/fullstack?
15:38:21 <slaweq> or we do?
15:38:44 <ralonsoh> I think this is because the system is busier during FT/fullstack
15:39:49 <lajoskatona> is this only about cpu power, or OVS internals?
15:40:10 <ralonsoh> this is just a guess
15:40:31 <ralonsoh> but if you are waiting for a message and you use a "wait" method
15:40:53 <ralonsoh> and then no other thread returns the GIL on time, you'll have a timeout
15:41:08 <ralonsoh> is the solution to increase the timeout? I don't think so
15:41:24 <ralonsoh> the solution is to find out where to make an active wait that doesn't return the GIL
15:41:41 <ralonsoh> (using eventlet)
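A minimal, self-contained sketch of the failure mode ralonsoh describes above, using eventlet directly; this is not ovsdbapp or neutron code, and the 10/12 second values are made up to mirror the discussion. The reply is actually produced, but a CPU-bound greenthread that never hits an eventlet yield point keeps the hub from waking the waiter before its deadline, so the timed wait expires regardless of the reply side:

    import time

    import eventlet
    from eventlet.event import Event

    result = Event()

    def worker():
        # Has the answer, but burns CPU with no eventlet yield point first,
        # so the hub cannot schedule anything else in the meantime.
        deadline = time.time() + 12
        while time.time() < deadline:
            pass
        result.send("reply")  # arrives only after the waiter's deadline

    def waiter():
        try:
            # Roughly the shape of a wait-with-timeout: block for a result,
            # give up after 10 seconds.
            with eventlet.Timeout(10):
                print("got:", result.wait())
        except eventlet.Timeout:
            print("timed out: the hub never got control back in time")

    waiter_gt = eventlet.spawn(waiter)
    eventlet.spawn(worker)
    waiter_gt.wait()

Running this prints the timeout branch even though the reply was sent, which is why the investigation above is about where greenthreads yield rather than about the timeout value itself.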
15:45:28 <ralonsoh> Am I still connected?
15:45:33 <slaweq> ralonsoh: yes
15:45:43 <ralonsoh> cool
15:45:55 <slaweq> I just don't know what else I can add to this :)
15:46:02 <ralonsoh> hehehe me neither
15:46:23 <slaweq> ok, let's keep an eye on it, maybe someday someone will find a solution :)
15:47:31 <slaweq> let's go to the next topic for today
15:47:32 <slaweq> #topic Tempest/Scenario
15:47:47 <slaweq> I found 2 new (to me) failures in the last days
15:47:49 <slaweq> first:
15:47:51 <slaweq> neutron_tempest_plugin.api.admin.test_shared_network_extension.RBACSharedNetworksTest
15:47:56 <slaweq> https://bcbff5a63272be990f4a-3078f09a35fdfa0355b18572be8c3ad5.ssl.cf2.rackcdn.com/702197/5/check/neutron-tempest-plugin-api/068ce6c/testr_results.html
15:48:41 <slaweq> but now I think it was related to the patch on which it was run: https://review.opendev.org/#/c/702197/
15:49:12 <slaweq> so please ignore it
15:49:23 <slaweq> second one was:
15:49:25 <slaweq> neutron_tempest_plugin.scenario.test_dns_integration.DNSIntegrationAdminTests
15:49:29 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_baf/740588/3/check/neutron-tempest-plugin-designate-scenario/baf5297/testr_results.html
15:50:11 <slaweq> did You see such failures before?
15:50:21 <ralonsoh> the first one yes, not the second one
15:50:35 <ralonsoh> (actually I saw the first one because that was my patch)
15:50:35 <slaweq> ralonsoh: the first one was on Your patch :P
15:50:39 <ralonsoh> hehehehe
15:51:04 <slaweq> second one failed due to error:
15:51:06 <slaweq> designateclient.exceptions.Conflict: Duplicate Zone
15:51:09 <slaweq> in neutron server logs
15:51:13 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_baf/740588/3/check/neutron-tempest-plugin-designate-scenario/baf5297/controller/logs/screen-q-svc.txt
15:51:28 <slaweq> I will keep an eye on this job to see if that happens more often
15:51:35 <slaweq> and if so, I will open LP for that issue
15:52:47 <slaweq> and that's basically all I have for today
15:52:56 <slaweq> do You have anything else for today's meeting?
15:53:06 <ralonsoh> yes, a request for review
15:53:11 <ralonsoh> related to fullstack tests
15:53:19 <ralonsoh> https://review.opendev.org/#/c/738446/
15:53:48 <ralonsoh> that will reduce the problem we have sometimes with test clashes
15:53:51 <slaweq> sure
15:53:56 <slaweq> I will review it this week
15:53:58 <slaweq> asap
15:54:19 * bcafarel adds to the pile
15:54:35 <bcafarel> one additional stable topic brought in https://review.opendev.org/#/c/742519/ comments
15:54:56 <bcafarel> do you think jobs running against master branches of other projects are useful in stable branches?
15:55:12 <bcafarel> neutron-tempest-with-os-ken-master / neutron-ovn-tempest-ovs-master-fedora periodic jobs specifically
15:55:16 <slaweq> no
15:55:24 <slaweq> I think we should drop such jobs in stable branches
15:55:42 <maciejjozefczyk> +1
15:55:42 <slaweq> stable branches by definition aren't expected to run with other things from master
15:56:06 <bcafarel> ok I will send cleanup patches (fewer periodic jobs to run, infra will be happy)
15:56:48 <slaweq> thx bcafarel
15:57:06 <slaweq> #action bcafarel to clean stable branches jobs from *-master jobs
15:57:20 <slaweq> ^^ I had to add at least one today :P
15:57:28 <bcafarel> all actions on me then :)
15:57:53 <slaweq> :)
15:58:01 <slaweq> ok, I think we can finish the meeting now
15:58:05 <slaweq> thx for attending
15:58:08 <ralonsoh> bye!
15:58:09 <bcafarel> o/
15:58:13 <lajoskatona> bye
15:58:15 <slaweq> and for taking care of ci in the last 2 weeks :)
15:58:18 <slaweq> o/
15:58:22 <slaweq> #endmeeting