15:00:22 <slaweq> #startmeeting neutron_ci
15:00:26 <openstack> Meeting started Tue Jan 26 15:00:22 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:27 <slaweq> welcome again
15:00:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:30 <openstack> The meeting name has been set to 'neutron_ci'
15:00:35 <bcafarel> o/
15:00:52 <lajoskatona> o/
15:01:01 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:03 <slaweq> Please open now :)
15:01:22 <ralonsoh> hi
15:02:04 <slaweq> as we already have our regular attendees, let's start
15:02:06 <slaweq> #topic Actions from previous meetings
15:02:10 <slaweq> slaweq to move 3rd-party jobs to periodic queue
15:02:17 <slaweq> Patch https://review.opendev.org/c/openstack/neutron/+/771486
15:02:22 <slaweq> it's merged now
15:02:33 <slaweq> next one
15:02:35 <slaweq> ralonsoh to check problem with (not) deleted pid files in functional tests
15:02:54 <ralonsoh> sorry, for the next week
15:02:58 <slaweq> sure
15:03:02 <slaweq> #action ralonsoh to check problem with (not) deleted pid files in functional tests
15:03:11 <slaweq> next one
15:03:13 <slaweq> ralonsoh to check failing periodic task in functional test
15:03:27 <ralonsoh> I'm still reviewing this
15:04:00 <ralonsoh> (sorry, I'm not lazy but overloaded)
15:04:19 <slaweq> ralonsoh: I know :)
15:04:27 <slaweq> no worries
15:04:46 <slaweq> and he's gone :P
15:05:23 <slaweq> ralonsoh: do You want me to add it as an action item for You for the next meeting too?
15:05:32 <ralonsoh> sure
15:05:39 <slaweq> #action ralonsoh to check failing periodic task in functional test
15:05:42 <slaweq> thx
15:05:47 <slaweq> next one
15:05:49 <slaweq> slaweq to check missing interface in namespace in functional tests
15:05:55 <slaweq> I was checking that issue
15:06:04 <slaweq> and I have no idea why this exception was raised there. Maybe it was some strange race condition because this interface was added, deleted, and then added and deleted again from ovs.
15:06:14 <slaweq> I didn't find any other cases like that so I will not bother with it more for now.
15:07:03 <slaweq> and the last one for today
15:07:05 <slaweq> ralonsoh to check failing neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter.test_gateway_chassis_rebalance test
15:07:14 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/771489
15:07:19 <ralonsoh> but this is just to provide info
15:07:48 <ralonsoh> because when the lbp does not appear, we don't know what port we are talking about
15:08:00 <ralonsoh> now we can check that in the SB and NB logs
15:08:06 <slaweq> sure
15:08:10 <slaweq> but we have a bug reported
15:08:15 <ralonsoh> yes
15:08:16 <slaweq> and now this additional info in the logs
15:08:25 <ralonsoh> https://bugs.launchpad.net/neutron/+bug/1912369
15:08:27 <openstack> Launchpad bug 1912369 in neutron "[FT] "test_gateway_chassis_rebalance" failing because lrp is not bound" [High,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:08:31 <slaweq> so hopefully we will not lose this issue from our radar and will fix it some day
15:08:33 <slaweq> thx ralonsoh
15:09:20 <slaweq> ok, that was all of the action items for this week
15:09:23 <slaweq> #topic Stadium projects
15:09:45 <slaweq> as I mentioned in the previous meeting, the networking-midonet gate is now broken
15:10:04 <slaweq> I sent an email to Sam Morrison who is the maintainer of it
15:10:10 <slaweq> I hope he will check that soon
15:10:21 <slaweq> anything else regarding other stadium projects and ci?
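(Editor's note on the pid-file action item recorded above: a common way for a test suite to deal with pid files that were not cleaned up is to check whether the process named in the file still exists. The sketch below is a generic, stand-alone illustration of that pattern — the helper name and behavior are assumptions for this note, not neutron's actual code.)

```python
import os


def pid_file_is_stale(pid_file):
    """Return True when the pid file can be removed safely.

    Hypothetical helper: a pid file is stale when the process it names
    no longer exists, e.g. after a functional test killed its child
    process but failed to delete the file.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return True  # missing or unreadable file: treat as stale
    try:
        os.kill(pid, 0)  # signal 0 checks existence without killing
    except ProcessLookupError:
        return True  # no such process: the file is stale
    except PermissionError:
        return False  # process exists but belongs to another user
    return False
```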
15:10:41 <lajoskatona> huh,
15:11:07 <lajoskatona> I checked odl and bgpvpn before the meeting; for those I have patches to fix master first
15:11:32 <lajoskatona> I switched the l-c job to non-voting, I have had no spare time recently to fix that :-(
15:11:47 <lajoskatona> If you have time to check those, that would be great
15:12:00 <slaweq> lajoskatona: ok, please give us links to the patches
15:12:05 <slaweq> I will review them today
15:12:21 <lajoskatona> then we can move back to the stable branches to fix them as well, but hearing the news about the new pip issue/feature with no py27 support....
15:12:31 <lajoskatona> ok, I will collect them
15:13:17 <bcafarel> yes, we have a short downtime before resuming snipping out l-c on stable
15:13:39 <lajoskatona> networking-odl: https://review.opendev.org/c/openstack/networking-odl/+/769877 & bgpvpn: https://review.opendev.org/c/openstack/networking-bgpvpn/+/771219
15:14:31 <slaweq> thx lajoskatona
15:14:50 <slaweq> I actually already +2'd one of them :)
15:15:05 <lajoskatona> thanks
15:15:10 <slaweq> ok, so I think we can move on
15:15:13 <slaweq> #topic Stable branches
15:15:19 <slaweq> Victoria dashboard: https://grafana.opendev.org/d/HUCHup2Gz/neutron-failure-rate-previous-stable-release?orgId=1
15:15:21 <slaweq> Ussuri dashboard: https://grafana.opendev.org/d/smqHXphMk/neutron-failure-rate-older-stable-release?orgId=1
15:15:28 <slaweq> for victoria and ussuri I think all is good this week
15:15:38 <slaweq> and for the older ones we already know about the py27 issue
15:15:51 <slaweq> is there anything else to talk about today?
15:15:58 <bcafarel> yes, it was a good thing we pushed on merging stuff for the stable releases last week
15:16:07 <bcafarel> so the backlog is back to manageable
15:16:17 <ralonsoh> +1
15:17:09 <slaweq> I agree
15:17:52 <slaweq> ok, next topic
15:17:56 <slaweq> #topic Grafana
15:18:00 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:18:09 <slaweq> I need to update the dashboard again
15:18:18 <slaweq> as we made many changes in the jobs config recently
15:18:28 <slaweq> #action slaweq to update grafana dashboard
15:19:37 <slaweq> other than that I think that our dashboard looks pretty ok
15:19:47 <slaweq> I don't see any urgent issues there really
15:20:02 <slaweq> even the number of rechecks on patches is significantly lower recently
15:20:56 <slaweq> do You want to discuss anything related to our grafana?
15:22:14 <bcafarel> seems good-looking to me
15:22:48 <slaweq> ok, let's move on
15:22:52 <slaweq> #topic fullstack/functional
15:23:08 <slaweq> I found just a few failures which I want to show You today
15:23:12 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c2e/767922/9/gate/neutron-functional-with-uwsgi/c2e764b/testr_results.html
15:23:51 <ralonsoh> during the interface creation
15:24:47 <slaweq> ralonsoh: yes, did You see it already?
15:25:18 <ralonsoh> yes, but this is similar to the occasional timeout problems we have in the CI
15:25:30 <ralonsoh> and I don't know how to address them
15:25:40 <slaweq> is it normal that in the log from this test there is ovsdb-monitor output: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c2e/767922/9/gate/neutron-functional-with-uwsgi/c2e764b/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_correct_protection.txt
15:26:16 <slaweq> this is a linuxbridge test
15:26:33 <ralonsoh> I don't know if this is a problem in privsep and how it handles the threads waiting for a reply
15:27:39 <slaweq> so You are saying that this output logged in that test case can really be from the other test?
15:27:47 <slaweq> or did I misunderstand something?
15:27:52 <ralonsoh> not really
15:28:08 <ralonsoh> what I'm saying is that the source of the problem could be the same
15:28:13 <ralonsoh> for all those timeouts in the CI
15:28:19 <slaweq> ahh, ok
15:28:21 <slaweq> I see
15:28:22 <ralonsoh> and it could be in privsep
15:29:22 <slaweq> ok, I will report this bug in LP to have it there
15:29:34 <slaweq> and maybe someone will take a look at it someday
15:30:25 <slaweq> ok, next one
15:30:27 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_bd4/periodic/opendev.org/openstack/neutron/master/neutron-functional/bd41252/testr_results.html
15:31:00 <slaweq> #action slaweq to report functional tests timeout in LP
15:31:16 <slaweq> this second one is ovn related
15:31:22 <slaweq> ralonsoh: did You see it before?
15:31:32 <slaweq> sorry
15:31:37 <slaweq> we already spoke about it
15:31:38 <ralonsoh> yes heheheheh
15:31:49 <slaweq> there wasn't a question :)
15:32:10 <slaweq> maybe it already has Your extra logging?
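(Editor's note: the "failed to bind" assertion from this periodic OVN functional failure comes from a wait-until-true style check timing out — neutron's functional tests use a helper of that shape to poll for a condition such as the logical router port (lrp) getting bound to a chassis. The sketch below is a simplified stand-alone version of that pattern, not neutron's actual implementation; the names are illustrative.)

```python
import time


class WaitTimeout(Exception):
    """Raised when the awaited condition never became true."""


def wait_until_true(predicate, timeout=60, sleep=1, exception=None):
    # Poll the predicate until it returns True; on timeout, raise the
    # caller-supplied exception (carrying a message like
    # "lrp ... failed to bind") or a generic WaitTimeout.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(sleep)
    raise exception or WaitTimeout("condition not met in %ss" % timeout)
```

When the timeout window here is shorter than the time OVN needs to (re)bind the port under a loaded CI node, the test fails even though nothing is functionally wrong — which is why the extra logging around the bind condition matters.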
15:32:32 <ralonsoh> no, I didn't see anything with the extra logging
15:32:33 <ralonsoh> sorry
15:33:00 <slaweq> strange
15:33:05 <slaweq> Your patch was merged 20.01
15:33:14 <slaweq> and that failure is from 25.01
15:33:29 <slaweq> and it's a periodic job
15:33:37 <ralonsoh> ahh sorry
15:33:38 <slaweq> so it should have these extra logs
15:33:45 <ralonsoh> I didn't check those logs
15:34:10 <ralonsoh> let me check that tomorrow
15:34:15 <slaweq> sure
15:34:19 <slaweq> it has an error like:
15:34:21 <slaweq> AssertionError: False is not true : lrp cr-lrp-4baa4344-e12a-4fa5-bb98-ced0aff32a57 failed to bind
15:34:30 <slaweq> so Your log is added there
15:34:35 <ralonsoh> yeah
15:34:38 <slaweq> maybe it will be helpful for You
15:34:40 <slaweq> :)
15:35:12 <slaweq> I pasted a link to that failure in the LP's comment
15:35:27 <ralonsoh> (I was doing this now hehehe)
15:35:31 <slaweq> ok, that's all about functional tests
15:35:35 <slaweq> now fullstack
15:35:39 <slaweq> https://c8aca39fc5ef53efe51e-9b8733996223bc8ec92919ead98525d0.ssl.cf2.rackcdn.com/771903/4/check/neutron-fullstack-with-uwsgi/c26c740/testr_results.html
15:36:06 <slaweq> I didn't check but this looks to me like maybe an overloaded node
15:36:30 <ralonsoh> looks like it, timeouts while deploying the services
15:36:42 <slaweq> yes
15:36:48 <slaweq> so let's not bother with that for now
15:36:50 <slaweq> :)
15:36:59 <slaweq> #topic Tempest/Scenario
15:37:09 <slaweq> here I have 2 failures for You
15:37:13 <slaweq> test_metadata_routed failure:
15:37:15 <slaweq> https://1f381f2949ecc1bf5cc8-62271a11c17b6e7ed1bc2d4fe711d2ec.ssl.cf5.rackcdn.com/771947/2/check/neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid/4a84eb2/testr_results.html
15:38:02 <slaweq> Failed to connect to fe80::a9fe:a9fe port 80: No route to host
15:38:25 <slaweq> first of all, the console output is missing in case of such a failure
15:38:42 <slaweq> and then we need to understand why there wasn't a route to that IPv6 address
15:38:56 <slaweq> maybe IPv6 was disabled in the instance? Maybe it's some race condition?
15:39:22 <slaweq> rubasov: I think it was a test added by You, would You have some time to check it maybe ^^?
15:39:56 <lajoskatona> slaweq: I will check it with rubasov
15:40:02 <slaweq> lajoskatona: thx a lot
15:40:15 <slaweq> #action lajoskatona to check with rubasov test_metadata_routed failure
15:40:39 <slaweq> so the last one for today
15:40:42 <slaweq> https://54c503404da20ff97888-1e6c6bc44b22b869bdbf76d87d9eca83.ssl.cf2.rackcdn.com/771947/2/check/neutron-tempest-multinode-full-py3/03f7c14/testr_results.html
15:40:52 <slaweq> it is (again) some ssh failure in 2 tests
15:41:18 <slaweq> AuthenticationFailed exception
15:41:47 <ralonsoh> ?
15:41:52 <ralonsoh> it is an SSHTimeout
15:42:04 <ralonsoh> at least in the last link
15:42:20 <slaweq> Yes, but this SSHTimeout is due to paramiko.ssh_exception.AuthenticationException: Authentication failed.
15:42:25 <slaweq> in both tests
15:42:27 <ralonsoh> ah ok, thanks!
15:42:36 <ralonsoh> yes, sorry
15:42:40 <slaweq> and I don't see there any attempt to get the console output
15:42:54 <slaweq> so it seems to me like our old paramiko bug again
15:42:58 <slaweq> I will check it in tempest
15:43:35 <slaweq> #action slaweq to check why console output wasn't checked in the failed tests from tempest.api.compute.servers.test_create_server.ServersTestJSON
15:44:31 <slaweq> and that's all from me for today
15:44:42 <slaweq> anything else regarding our ci to discuss today?
15:44:47 <ralonsoh> no
15:45:28 <lajoskatona> no
15:45:46 <slaweq> ok, so I'm giving You 15 minutes back
15:45:50 <slaweq> thx for attending
15:45:52 <slaweq> o/
15:45:56 <slaweq> #endmeeting
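(Editor's note on the last failure discussed: tempest's ssh client retries the connection until its whole timeout window elapses, which is why an underlying paramiko AuthenticationException surfaces as an SSHTimeout in the traceback. The sketch below shows that retry shape in a self-contained way — the exception classes are stand-ins for the real paramiko/tempest ones, and the helper is illustrative, not tempest's actual code.)

```python
import time


class AuthenticationException(Exception):
    """Stand-in for paramiko.ssh_exception.AuthenticationException."""


class SSHTimeout(Exception):
    """Stand-in for tempest's SSHTimeout."""


def connect_with_retries(connect, timeout=10, sleep=0.5):
    # Retry the low-level connect until the window closes; keep the
    # last error so the final SSHTimeout shows the real root cause
    # (here: repeated authentication failures, as in the job logs).
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            return connect()
        except AuthenticationException as exc:
            last_error = exc
            time.sleep(sleep)
    raise SSHTimeout("gave up connecting; last error: %r" % last_error)
```

This is also why grabbing the instance console output on failure matters: the timeout alone does not say whether the guest never booted, never got its key injected, or simply answered with a wrong credential.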