15:00:22 <slaweq> #startmeeting neutron_ci
15:00:26 <openstack> Meeting started Tue Jan 26 15:00:22 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:27 <slaweq> welcome again
15:00:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:30 <openstack> The meeting name has been set to 'neutron_ci'
15:00:35 <bcafarel> o/
15:00:52 <lajoskatona> o/
15:01:01 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:03 <slaweq> Please open now :)
15:01:22 <ralonsoh> hi
15:02:04 <slaweq> as we already have our regular attendees, let's start
15:02:06 <slaweq> #topic Actions from previous meetings
15:02:10 <slaweq> slaweq to move 3rd-party jobs to periodic queue
15:02:17 <slaweq> Patch https://review.opendev.org/c/openstack/neutron/+/771486
15:02:22 <slaweq> it's merged now
15:02:33 <slaweq> next one
15:02:35 <slaweq> ralonsoh to check problem with (not) deleted pid files in functional tests
15:02:54 <ralonsoh> sorry, for the next week
15:02:58 <slaweq> sure
15:03:02 <slaweq> #action ralonsoh to check problem with (not) deleted pid files in functional tests
15:03:11 <slaweq> next one
15:03:13 <slaweq> ralonsoh to check failing periodic task in functional test
15:03:27 <ralonsoh> I'm still reviewing this
15:04:00 <ralonsoh> (sorry, I'm not lazy but overloaded)
15:04:19 <slaweq> ralonsoh: I know :)
15:04:27 <slaweq> no worries
15:04:46 <slaweq> and he's gone :P
15:05:23 <slaweq> ralonsoh: do You want me to add it as an action item for You for the next meeting too?
15:05:32 <ralonsoh> sure
15:05:39 <slaweq> #action ralonsoh to check failing periodic task in functional test
15:05:42 <slaweq> thx
15:05:47 <slaweq> next one
15:05:49 <slaweq> slaweq to check missing interface in namespace in functional tests
15:05:55 <slaweq> I was checking that issue
15:06:04 <slaweq> and I have no idea why this exception was raised there. Maybe it was some strange race condition because this interface was added, deleted, and then again added and deleted from ovs.
15:06:14 <slaweq> I didn't find any other cases like that, so I won't bother with it more for now.
15:07:03 <slaweq> and the last one for today
15:07:05 <slaweq> ralonsoh to check failing neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter.test_gateway_chassis_rebalance test
15:07:14 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/771489
15:07:19 <ralonsoh> but this is just to provide info
15:07:48 <ralonsoh> because when the lbp does not appear, we don't know what port we are talking about
15:08:00 <ralonsoh> now we can check that in the SB and NB logs
15:08:06 <slaweq> sure
15:08:10 <slaweq> but we have bug reported
15:08:15 <ralonsoh> yes
15:08:16 <slaweq> and now this additional info in logs
15:08:25 <ralonsoh> https://bugs.launchpad.net/neutron/+bug/1912369
15:08:27 <openstack> Launchpad bug 1912369 in neutron "[FT] "test_gateway_chassis_rebalance" failing because lrp is not bound" [High,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:08:31 <slaweq> so hopefully we won't lose track of this issue and will fix it some day
15:08:33 <slaweq> thx ralonsoh
15:09:20 <slaweq> ok, that was all action items for this week
15:09:23 <slaweq> #topic Stadium projects
15:09:45 <slaweq> as I mentioned in the previous meeting, networking-midonet gate is now broken
15:10:04 <slaweq> I sent an email to Sam Morrison, who is its maintainer
15:10:10 <slaweq> I hope he will check that soon
15:10:21 <slaweq> anything else regarding other stadium projects and ci?
15:10:41 <lajoskatona> huh,
15:11:07 <lajoskatona> I checked odl and bgpvpn before the meeting; for those I have patches to fix master first
15:11:32 <lajoskatona> I switched the l-c job to non-voting, I haven't had spare time recently to fix that :-(
15:11:47 <lajoskatona> If you have time to check those, that would be great
15:12:00 <slaweq> lajoskatona: ok, please give us links to the patches
15:12:05 <slaweq> I will review them today
15:12:21 <lajoskatona> then we can move back to stable branches to fix them as well, but hearing the news about the new pip issue/feature with no py27 support....
15:12:31 <lajoskatona> ok I collect them
15:13:17 <bcafarel> yes we have a short downtime before resuming snipping out l-c on stable
15:13:39 <lajoskatona> networking-odl: https://review.opendev.org/c/openstack/networking-odl/+/769877 & bgpvpn: https://review.opendev.org/c/openstack/networking-bgpvpn/+/771219
15:14:31 <slaweq> thx lajoskatona
15:14:50 <slaweq> I actually already +2'd one of them :)
15:15:05 <lajoskatona> thanks
15:15:10 <slaweq> ok, so I think we can move on
15:15:13 <slaweq> #topic Stable branches
15:15:19 <slaweq> Victoria dashboard: https://grafana.opendev.org/d/HUCHup2Gz/neutron-failure-rate-previous-stable-release?orgId=1
15:15:21 <slaweq> Ussuri dashboard: https://grafana.opendev.org/d/smqHXphMk/neutron-failure-rate-older-stable-release?orgId=1
15:15:28 <slaweq> for victoria and ussuri I think all is good this week
15:15:38 <slaweq> and for older we already know about py27 issue
15:15:51 <slaweq> is there anything else to talk today?
15:15:58 <bcafarel> yes, it was a good thing we pushed on merging stuff for stable releases last week
15:16:07 <bcafarel> so backlog is back to manageable
15:16:17 <ralonsoh> +1
15:17:09 <slaweq> I agree
15:17:52 <slaweq> ok, next topic
15:17:56 <slaweq> #topic Grafana
15:18:00 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:18:09 <slaweq> I need to update dashboard again
15:18:18 <slaweq> as we made many changes in jobs config recently
15:18:28 <slaweq> #action slaweq to update grafana dashboard
15:19:37 <slaweq> other than that I think that our dashboard looks pretty ok
15:19:47 <slaweq> I don't see any urgent issues there really
15:20:02 <slaweq> even the number of rechecks on patches has been significantly lower recently
15:20:56 <slaweq> do You want to discuss anything related to our grafana?
15:22:14 <bcafarel> seems good-looking to me
15:22:48 <slaweq> ok, lets move on
15:22:52 <slaweq> #topic fullstack/functional
15:23:08 <slaweq> I found just a few failures which I want to show You today
15:23:12 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c2e/767922/9/gate/neutron-functional-with-uwsgi/c2e764b/testr_results.html
15:23:51 <ralonsoh> during the interface creation
15:24:47 <slaweq> ralonsoh: yes, did You see it already?
15:25:18 <ralonsoh> yes, but this is similar to the occasional timeout problems we have in the CI
15:25:30 <ralonsoh> and I don't know how to address them
15:25:40 <slaweq> is it normal that in the log from this test there is ovsdb-monitor output: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c2e/767922/9/gate/neutron-functional-with-uwsgi/c2e764b/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_correct_protection.txt
15:26:16 <slaweq> this is linuxbridge test
15:26:33 <ralonsoh> I don't know if this is a problem in privsep and how it handles the threads waiting for a reply
15:27:39 <slaweq> so You are saying that this output logged in that test case could really be from another test?
15:27:47 <slaweq> or did I misunderstand something?
15:27:52 <ralonsoh> not really
15:28:08 <ralonsoh> what I'm saying is that the source of the problem could be the same
15:28:13 <ralonsoh> for all those timeouts in the CI
15:28:19 <slaweq> ahh, ok
15:28:21 <slaweq> I see
15:28:22 <ralonsoh> and could be in privsep
15:29:22 <slaweq> ok, I will report this bug in LP to have it there
15:29:34 <slaweq> and maybe someone will take a look at it someday
15:30:25 <slaweq> ok, next one
15:30:27 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_bd4/periodic/opendev.org/openstack/neutron/master/neutron-functional/bd41252/testr_results.html
15:31:00 <slaweq> #action slaweq to report functional tests timeout in LP
15:31:16 <slaweq> this second one is ovn related
15:31:22 <slaweq> ralonsoh: did You see it before?
15:31:32 <slaweq> sorry
15:31:37 <slaweq> we already spoke about it
15:31:38 <ralonsoh> yes heheheheh
15:31:49 <slaweq> there wasn't a question :)
15:32:10 <slaweq> maybe it already has Your extra logging?
15:32:32 <ralonsoh> no, I didn't see anything with the extra logging
15:32:33 <ralonsoh> sorry
15:33:00 <slaweq> strange
15:33:05 <slaweq> Your patch was merged on 20.01
15:33:14 <slaweq> and that failure is from 25.01
15:33:29 <slaweq> and it's periodic job
15:33:37 <ralonsoh> ahh sorry
15:33:38 <slaweq> so it should have this extra logs
15:33:45 <ralonsoh> I didn't check those logs
15:34:10 <ralonsoh> let me check that tomorrow
15:34:15 <slaweq> sure
15:34:19 <slaweq> it has error like:
15:34:21 <slaweq> AssertionError: False is not true : lrp cr-lrp-4baa4344-e12a-4fa5-bb98-ced0aff32a57 failed to bind
15:34:30 <slaweq> so there is Your log added there
15:34:35 <ralonsoh> yeah
15:34:38 <slaweq> maybe it will be helpful for You
15:34:40 <slaweq> :)
15:35:12 <slaweq> I pasted link to that failure in the LP's comment
15:35:27 <ralonsoh> (I was doing this now hehehe)
15:35:31 <slaweq> ok, that's all about functional tests
15:35:35 <slaweq> now fullstack
15:35:39 <slaweq> https://c8aca39fc5ef53efe51e-9b8733996223bc8ec92919ead98525d0.ssl.cf2.rackcdn.com/771903/4/check/neutron-fullstack-with-uwsgi/c26c740/testr_results.html
15:36:06 <slaweq> I didn't check, but this looks to me like maybe an overloaded node
15:36:30 <ralonsoh> looks like, timeouts while deploying the services
15:36:42 <slaweq> yes
15:36:48 <slaweq> so let's not bother with that for now
15:36:50 <slaweq> :)
15:36:59 <slaweq> #topic Tempest/Scenario
15:37:09 <slaweq> here I have 2 failures for You
15:37:13 <slaweq> test_metadata_routed failure:
15:37:15 <slaweq> https://1f381f2949ecc1bf5cc8-62271a11c17b6e7ed1bc2d4fe711d2ec.ssl.cf5.rackcdn.com/771947/2/check/neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid/4a84eb2/testr_results.html
15:38:02 <slaweq> Failed to connect to fe80::a9fe:a9fe port 80: No route to host
15:38:25 <slaweq> first of all, the console output is missing in case of such a failure
15:38:42 <slaweq> and then we need to understand why there wasn't route to that IPv6 address
15:38:56 <slaweq> maybe IPv6 was disabled in the instance? Maybe it's some race condition?
15:39:22 <slaweq> rubasov: I think it was a test added by You, would You have some time to check it maybe ^^?
15:39:56 <lajoskatona> slaweq: I'll check it with rubasov
15:40:02 <slaweq> lajoskatona: thx a lot
15:40:15 <slaweq> #action lajoskatona to check with rubasov test_metadata_routed failure
15:40:39 <slaweq> so last one for today
15:40:42 <slaweq> https://54c503404da20ff97888-1e6c6bc44b22b869bdbf76d87d9eca83.ssl.cf2.rackcdn.com/771947/2/check/neutron-tempest-multinode-full-py3/03f7c14/testr_results.html
15:40:52 <slaweq> it is (again) some ssh failure in 2 tests
15:41:18 <slaweq> AuthenticationFailed exception
15:41:47 <ralonsoh> ?
15:41:52 <ralonsoh> it's an SSHTimeout
15:42:04 <ralonsoh> at least the last link
15:42:20 <slaweq> Yes, but this SSHTimeout is due to paramiko.ssh_exception.AuthenticationException: Authentication failed.
15:42:25 <slaweq> in both tests
15:42:27 <ralonsoh> ah ok, thanks!
15:42:36 <ralonsoh> yes, sorry
15:42:40 <slaweq> and I don't see any attempt there to get the console output
15:42:54 <slaweq> so it seems to me like our old paramiko bug again
15:42:58 <slaweq> I will check it in tempest
15:43:35 <slaweq> #action slaweq to check why console output wasn't checked in the failed tests from tempest.api.compute.servers.test_create_server.ServersTestJSON
15:44:31 <slaweq> and that's all from me for today
15:44:42 <slaweq> anything else regarding our ci to discuss today?
15:44:47 <ralonsoh> no
15:45:28 <lajoskatona> no
15:45:46 <slaweq> ok, so I'm giving You 15 minutes back
15:45:50 <slaweq> thx for attending
15:45:52 <slaweq> o/
15:45:56 <slaweq> #endmeeting