15:01:15 <slaweq_> #startmeeting neutron_ci
15:01:16 <openstack> Meeting started Wed Oct 7 15:01:15 2020 UTC and is due to finish in 60 minutes. The chair is slaweq_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:18 <slaweq_> hi
15:01:20 <openstack> The meeting name has been set to 'neutron_ci'
15:01:23 <lajoskatona> Hi
15:01:45 <bcafarel> o/
15:02:08 <ralonsoh> hi
15:02:32 <slaweq_> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:35 <slaweq_> Please open it now :)
15:03:27 <slaweq_> #topic Actions from previous meetings
15:03:35 <slaweq_> bcafarel to update our grafana dashboards for stable branches
15:04:09 <bcafarel> in progress, not sent yet (I wanted to check the jobs listed there)
15:04:18 <slaweq_> ok, thx bcafarel
15:04:24 <slaweq_> I will assign it to You for next week
15:04:30 <slaweq_> just so we remember about it
15:04:33 <slaweq_> ok?
15:04:42 <bcafarel> sounds good, also so it has reviewers if it gets forgotten
15:04:46 <slaweq_> #action bcafarel to update our grafana dashboards for stable branches
15:04:53 <slaweq_> thx a lot
15:05:05 <slaweq_> ok, next one
15:05:07 <slaweq_> ralonsoh to report a bug and check failing openstack-tox-py36-with-ovsdbapp-master periodic job
15:05:29 <ralonsoh> I sent a patch to try to solve it
15:05:31 <ralonsoh> one sec
15:05:45 <ralonsoh> (should be on the etherpad)
15:06:30 <slaweq_> I don't see it on the etherpad
15:06:35 <ralonsoh> https://review.opendev.org/#/c/755256/
15:06:57 <ralonsoh> it avoids monkey patching processutils
15:07:21 <ralonsoh> well, it uses the original current_thread and _active
15:07:35 <ralonsoh> but we'll need a new version of oslo.concurrency
15:07:51 <slaweq_> and it seems that it helped
15:08:01 <ralonsoh> at least locally
15:08:12 <ralonsoh> but I can't confirm that in the CI
15:08:13 <slaweq_> https://zuul.openstack.org/buildset/aa6cb9d44d1a49368494071338c7415e
15:08:16 <slaweq_> :)
15:08:18 <slaweq_> it helped
15:08:39 <ralonsoh> ahhh ok, this is another problem
15:08:41 <ralonsoh> sorry
15:09:01 <ralonsoh> #link https://review.opendev.org/#/c/749537/
15:09:04 <ralonsoh> this is the patch
15:09:06 <ralonsoh> sorry again
15:09:20 <slaweq_> :)
15:09:24 <slaweq_> no need to be sorry
15:09:29 <slaweq_> good that it's fixed :)
15:09:34 <slaweq_> thx ralonsoh
15:09:40 <slaweq_> and thx otherwiseguy
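For context, the pitfall discussed above works roughly like this: eventlet's monkey patching replaces the entry points in the threading module, which can confuse code such as oslo.concurrency's processutils when it runs in real OS threads. Below is a minimal sketch assuming the usual monkey-patch-at-import setup; the saved names are illustrative and this is not the actual fix from https://review.opendev.org/#/c/749537/.

```python
# Minimal sketch: capture the original threading entry points *before*
# eventlet replaces them, so code running in native OS threads (as
# oslo.concurrency's processutils helpers may) can still resolve thread
# objects correctly. Illustrative only, not the merged fix.
import threading

orig_current_thread = threading.current_thread
orig_active = threading._active  # thread-id -> Thread bookkeeping dict

import eventlet
eventlet.monkey_patch()  # from here on, threading APIs are green-thread aware

# Code created outside eventlet should use the saved originals for
# lookups, otherwise threading's bookkeeping can misbehave or hang.
main = orig_current_thread()
print(main.name)
```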
15:10:19 <slaweq_> ok, so I think we can move on to the next topics
15:10:22 <slaweq_> #topic Switch to Ubuntu Focal
15:10:29 <slaweq_> Etherpad: https://etherpad.opendev.org/p/neutron-victoria-switch_to_focal
15:10:40 <slaweq_> we still have some stadium projects to check/change
15:10:49 <slaweq_> but I didn't have time this week
15:10:57 <slaweq_> do You have any other updates on that?
15:11:03 <ralonsoh> no
15:11:50 <lajoskatona> no
15:12:38 <bcafarel> https://review.opendev.org/#/c/754068/ is longing for a second +2 for sfc :)
15:13:02 <bcafarel> otherwise the topic:migrate-to-focal list looks good for us
15:13:15 <slaweq_> bcafarel: I already gave +2 :)
15:13:21 <slaweq_> so I can't help with that one now
15:13:28 <slaweq_> ralonsoh: lajoskatona: but You can ;)
15:13:32 <ralonsoh> sure
15:13:43 <lajoskatona> done :-)
15:14:38 <slaweq_> thx
15:14:51 <bcafarel> thanks :)
15:15:03 <lajoskatona> I have a slightly related question: do we still need this: https://review.opendev.org/755721 ?
15:15:37 <slaweq_> lajoskatona: nope
15:15:45 <slaweq_> it was an issue with the pypi mirror
15:16:08 <lajoskatona> slaweq_: yeah, that's why I asked :-) I'll abandon it then
15:16:11 <slaweq_> and I think ralonsoh fixed it on devstack by capping the setuptools version
15:16:21 <ralonsoh> but that was rejected
15:16:28 <ralonsoh> the problem was in the pypi server
15:16:35 <slaweq_> ralonsoh: ahh, ok
15:16:39 <ralonsoh> admins talked to the pypi folks to solve that
15:16:45 <slaweq_> most important is that the problem is fixed now :)
15:16:49 <ralonsoh> yes
15:16:53 <slaweq_> thx ralonsoh and lajoskatona for taking care of it :)
15:18:05 <slaweq_> ok
15:18:06 <lajoskatona> no problem
15:18:11 <slaweq_> regarding standardizing on zuul v3
15:18:37 <slaweq_> we merged the networking-odl patch https://review.opendev.org/#/c/725647/
15:18:50 <slaweq_> so the last one missing is https://review.opendev.org/#/c/729591/ for neutron
15:19:30 <slaweq_> and it just failed again, at least the functional tests job: https://40f71fdb4a17c8b8e33a-40a7733116b3138073a0fe5a58665a17.ssl.cf5.rackcdn.com/729591/21/check/neutron-functional-with-uwsgi/aace04f/testr_results.html
15:19:31 <tosky> which received its fair share of rechecks
15:19:33 <slaweq_> :/
15:20:57 <ralonsoh> slaweq_, that's the other related problem I was talking about this morning
15:21:06 <ralonsoh> now we don't fail in the OVN method
15:21:17 <ralonsoh> but in the "old_method" --> L3 plugin
15:21:25 <ralonsoh> I need to check if this is related
15:21:40 <ralonsoh> I'll talk to otherwiseguy
15:21:49 <slaweq_> ralonsoh: ok
15:22:01 <tosky> please remember to vote also on the networking-odl backport for stable/victoria: https://review.opendev.org/#/c/756324/
15:22:38 <slaweq_> tosky: I already did
15:22:45 <slaweq_> I think we need bcafarel's vote also
15:23:07 <tosky> yeah, another stable core
15:23:12 <tosky> or neutron stable core
15:23:37 <bcafarel> reviewed and W+1 :)
15:24:09 <slaweq_> thx
15:24:47 <slaweq_> so I think we can move on to the next topic now
15:24:50 <slaweq_> #topic Stable branches
15:25:01 <slaweq_> Ussuri dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:25:04 <slaweq_> Train dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:25:48 <bcafarel> one thing I remember now on stable dashboards, we will also need a victoria template for neutron-tempest-plugin
15:25:57 <bcafarel> and switch neutron stable/victoria to it
15:26:13 <slaweq_> bcafarel: yes, true
15:26:18 <slaweq_> I will do this template
15:26:29 <slaweq_> thx for the reminder
15:26:42 <slaweq_> #action slaweq to make neutron-tempest-plugin victoria template
15:26:55 <bcafarel> np, I remembered when my test dashboard came up empty for them
15:29:52 <slaweq_> btw. I have one new issue in stable/train
15:29:54 <slaweq_> https://bugs.launchpad.net/neutron/+bug/1898748
15:29:55 <openstack> Launchpad bug 1898748 in neutron "[stable/train] Creation of the QoS policy takes ages" [Critical,New]
15:30:06 <slaweq_> did You see it already maybe?
15:30:14 <ralonsoh> no
15:30:26 <slaweq_> it seems that it breaks the devstack gate for stable/train :/
15:31:09 <bcafarel> I don't think I saw it either
15:31:26 <slaweq_> is there anyone who wants to check that maybe?
15:32:16 <slaweq_> if not, I will try to check that
15:32:21 <ralonsoh> I'll try to take a look at this error tomorrow
15:32:29 <slaweq_> thx ralonsoh :)
15:33:03 <slaweq_> ok, let's move on
15:33:08 <slaweq_> #topic Grafana
15:33:13 <slaweq_> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:34:47 <slaweq_> IMO the worst thing among the voting jobs is neutron-functional-with-uwsgi now
15:34:57 <slaweq_> and we have a couple of issues there
15:35:22 <slaweq_> and also most of the ovn based jobs are failing 100% of the time
15:36:33 <slaweq_> anything else You have regarding grafana in general?
15:36:43 <slaweq_> or should we move on to the specific job types?
15:37:37 <bcafarel> nothing from me
15:37:49 <slaweq_> ok, so let's move on
15:37:57 <slaweq_> #topic functional/fullstack
15:38:17 <slaweq_> I reported https://bugs.launchpad.net/neutron/+bug/1898859 today
15:38:18 <openstack> Launchpad bug 1898859 in neutron "Functional test neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_spawns_conflicting_pid_vrrp_subprocess is failing" [High,Confirmed]
15:38:33 <slaweq_> as I saw it at least twice recently
15:38:51 <slaweq_> IIRC we already saw it in the past too, but I wasn't sure if we had a bug reported for that already
15:38:59 <ralonsoh> related to the ns deletion
15:39:07 <ralonsoh> https://review.opendev.org/#/c/754938/
15:39:15 <ralonsoh> please, review ^^
15:40:32 <slaweq_> ahh, right
15:40:35 <slaweq_> now I remember :)
15:40:53 <slaweq_> so I will mark https://bugs.launchpad.net/neutron/+bug/1898859 as a duplicate of https://bugs.launchpad.net/neutron/+bug/1838793
15:40:55 <openstack> Launchpad bug 1898859 in neutron "Functional test neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_spawns_conflicting_pid_vrrp_subprocess is failing" [High,Confirmed]
15:40:55 <ralonsoh> I think you can join both LP bugs
15:40:56 <openstack> Launchpad bug 1838793 in neutron ""KeepalivedManagerTestCase" tests failing during namespace deletion" [High,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:40:58 <ralonsoh> yes
15:41:23 <slaweq_> lajoskatona: can You check that patch from ralonsoh?
15:41:36 <slaweq_> I hope it will help us a bit with this functional tests job :)
15:41:59 <lajoskatona> slaweq_: sure, I checked it in the past, so I have some background :-)
15:42:08 <slaweq_> lajoskatona: thx a lot
15:42:31 <slaweq_> and for the other issues with functional tests, I know ralonsoh told me that he will open LPs
15:42:47 <ralonsoh> the one related to the agents
15:42:56 <ralonsoh> test_agent_show
15:45:01 <slaweq_> yes, did You report it already?
15:45:23 <ralonsoh> not yet
15:45:40 <ralonsoh> I'm still investigating the error
15:45:47 <slaweq_> k
15:47:13 <slaweq_> ok, let's move on then
15:47:15 <slaweq_> #topic Tempest/Scenario
15:47:35 <slaweq_> first, I reported a bug today: https://bugs.launchpad.net/neutron/+bug/1898862
15:47:37 <openstack> Launchpad bug 1898862 in neutron "Job neutron-ovn-tempest-ovs-release-ipv6-only is failing 100% of times" [High,Confirmed]
15:48:02 <slaweq_> because neutron-ovn-tempest-ovs-release-ipv6-only is failing 100% of the time, and usually (or even always) there are 9 tests failing there
15:48:11 <slaweq_> so it's very reproducible
15:48:46 <slaweq_> I will try to ping lucasgomes or jlibosva to take a look at that one
15:49:18 <slaweq_> there is also an ovn related issue https://bugs.launchpad.net/neutron/+bug/1885900
15:49:19 <openstack> Launchpad bug 1885900 in neutron "test_trunk_subport_lifecycle is failing in ovn based jobs" [Critical,Confirmed] - Assigned to Lucas Alvares Gomes (lucasagomes)
15:49:22 <slaweq_> which I saw again today
15:50:33 <slaweq_> and we still have some random ssh authentication failures
15:50:37 <slaweq_> like e.g. https://3b00945aa0cfe70597e9-73e59f2d88a36c349deccf374592c99f.ssl.cf5.rackcdn.com/755752/3/gate/neutron-tempest-linuxbridge/4bbc7f9/testr_results.html
15:50:43 <slaweq_> or https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_807/750166/5/gate/neutron-tempest-plugin-scenario-linuxbridge/8073d8a/testr_results.html
15:51:01 <slaweq_> and in those cases there is no "pattern", like always the same tests or always the same backend
15:51:06 <slaweq_> it happens everywhere
15:51:35 <slaweq_> and I tend to think that this is the issue which ralonsoh found some time ago in our d/s ci
15:51:45 <slaweq_> with paramiko and some race condition
15:51:47 <ralonsoh> with paramiko
15:51:49 <ralonsoh> yes
15:51:59 <slaweq_> I couldn't reproduce that locally
15:52:08 <slaweq_> but some race is there IMO
15:52:28 <ralonsoh> once paramiko tries to log into a VM without the keys, even when the keys are installed later, the SSH connection is not possible
15:52:55 <slaweq_> maybe we can try to check the console log first, to see if the ssh key was already configured
15:52:59 <slaweq_> before we ssh to the instance
15:53:42 <slaweq_> if that fails for any reason (e.g. a custom guest os which doesn't log things like cirros does), we can always try ssh at the end
15:53:47 <slaweq_> as a "fallback" option
15:53:53 <slaweq_> wdyt?
15:54:20 <ralonsoh> it's worth trying
15:54:23 <slaweq_> we can maybe propose that first in neutron-tempest-plugin
15:54:27 <bcafarel> worth a try
15:54:31 <slaweq_> and if that works, then propose it to tempest too
15:54:52 <slaweq_> ok, I will give it a try
15:55:03 <ralonsoh> (I was doing the opposite: reviewing the paramiko code)
15:55:13 <slaweq_> #action slaweq to propose patch to check console log before ssh to instance
15:55:40 <slaweq_> ralonsoh: if You find an issue on paramiko's side, we can always revert the workaround from neutron-tempest-plugin :)
15:55:47 <ralonsoh> of course
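The console-log check agreed on above could look roughly like the following sketch. It assumes a tempest-style servers_client; the helper name and the cirros marker string are illustrative guesses, not actual neutron-tempest-plugin code.

```python
# Sketch: before the paramiko connection, poll the guest console log for
# a hint that SSH key injection finished; if no hint shows up (e.g. a
# custom guest OS that logs nothing), the caller still tries SSH as the
# fallback discussed above.
import time


def console_shows_ssh_ready(servers_client, server_id,
                            marker='Starting dropbear sshd',
                            timeout=300, interval=10):
    """Return True once `marker` appears in the console log, else False."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        output = servers_client.get_console_output(server_id)['output']
        if marker in output:
            return True
        time.sleep(interval)
    return False
```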
15:56:28 <slaweq_> ok, I have one more issue related to ovn jobs: https://bugs.launchpad.net/neutron/+bug/1898863
15:56:29 <openstack> Launchpad bug 1898863 in neutron "OVN based scenario jobs failing 100% of times" [Critical,Confirmed]
15:56:39 <slaweq_> did You see that before?
15:56:57 <bcafarel> on dstat??
15:57:01 <slaweq_> yes
15:57:07 <slaweq_> but I saw it only on ovn based jobs
15:57:09 <slaweq_> :/
15:57:15 <ralonsoh> no sorry, that's new to me
15:57:44 <slaweq_> ok, does anyone want to take a look at that?
15:58:03 <slaweq_> if not, then it's also fine for now, as it affects "only" non-voting jobs
15:58:44 <ralonsoh> https://bugs.launchpad.net/ubuntu/+source/dstat/+bug/1866619
15:58:46 <openstack> Launchpad bug 1866619 in dstat (Ubuntu) "OverflowError when machine suspends and resumes after a longer while" [Undecided,Confirmed]
15:58:52 <ralonsoh> DistroRelease: Ubuntu 20.04
15:59:23 <slaweq_> so we will probably need to disable dstat as a temporary workaround
15:59:28 <slaweq_> thx ralonsoh
15:59:31 <ralonsoh> yes
16:00:03 <slaweq_> ok
16:00:09 <slaweq_> we are out of time today
16:00:14 <slaweq_> thx for attending the meeting
16:00:16 <slaweq_> o/
16:00:17 <ralonsoh> bye!
16:00:19 <slaweq_> #endmeeting
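The temporary workaround slaweq mentions would presumably be to turn off the dstat service in the affected jobs, e.g. in a DevStack local.conf (or the equivalent `devstack_services: dstat: false` in the Zuul job definition); a sketch, not the actual patch:

```
[[local|localrc]]
# Disable the dstat service to avoid the Ubuntu 20.04 OverflowError
# tracked in https://bugs.launchpad.net/ubuntu/+source/dstat/+bug/1866619
disable_service dstat
```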