15:00:27 <slaweq> #startmeeting neutron_ci
15:00:28 <openstack> Meeting started Wed Jun 24 15:00:27 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:30 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:31 <slaweq> hi
15:00:32 <openstack> The meeting name has been set to 'neutron_ci'
15:00:36 <ralonsoh> hi
15:01:37 <bcafarel> hello
15:01:47 <slaweq> first of all
15:01:55 <slaweq> #link  http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:05 <slaweq> please open it now to have it ready for later :)
15:02:08 <lajoskatona> o/
15:03:33 <slaweq> ok, let's start
15:03:39 <maciejjozefczyk> \o
15:03:40 <slaweq> #topic Actions from previous meetings
15:03:56 <slaweq> we have only one action from last week
15:03:58 <slaweq> slaweq to add additional logging for fullstack's firewall tests
15:04:08 <slaweq> I found the issue again in https://zuul.opendev.org/t/openstack/build/c5451e9e66fe4c14b2a09339a77fc449
15:04:10 <slaweq> After checking, it seems to me that the failure was at the beginning of the test, when connectivity with ncat was checked
15:04:12 <slaweq> I proposed patch https://review.opendev.org/#/c/737741/ for now.
15:04:14 <slaweq> After that, let's see what happens next
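(For context, a minimal sketch of the kind of extra logging discussed here, assuming a plain ncat probe; this is illustrative only, not the actual fullstack test helpers touched by the patch above:)

```python
# Hypothetical sketch: log the full ncat output when the initial
# connectivity probe fails, so failures at test start are easier to debug.
import logging
import subprocess

LOG = logging.getLogger(__name__)


def check_tcp_with_ncat(host, port, timeout=10):
    """Probe a TCP port with ncat and log the full output on failure."""
    cmd = ['ncat', '--send-only', '-w', str(timeout), host, str(port)]
    try:
        result = subprocess.run(cmd, input=b'ping', capture_output=True,
                                timeout=timeout + 5)
    except subprocess.TimeoutExpired:
        LOG.error('ncat probe to %s:%s timed out after %s seconds',
                  host, port, timeout)
        return False
    if result.returncode != 0:
        LOG.error('ncat probe to %s:%s failed: rc=%d stdout=%r stderr=%r',
                  host, port, result.returncode, result.stdout, result.stderr)
    return result.returncode == 0
```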
15:04:43 <njohnston> o/
15:05:00 <slaweq> that's all the update from me about this
15:05:24 <slaweq> #topic Stadium projects
15:05:34 <slaweq> I don't have any updates about stadium projects today
15:05:39 <slaweq> but maybe You have something
15:06:02 <lajoskatona> nothing special from me, but if you have time please look there for things to review :-)
15:06:17 <bcafarel> not from me, we had enough CI trouble in neutron itself to keep busy
15:06:21 <njohnston> nothing for me either
15:06:40 <slaweq> ok, sure lajoskatona, I will check the list of open patches in stadium projects
15:06:58 <slaweq> ok, so next topic
15:07:00 <slaweq> #topic Stable branches
15:07:05 <slaweq> Ussuri dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:07:06 <slaweq> Train dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:07:15 <slaweq> for ussuri and train it seems that it works fine now
15:07:16 <lajoskatona> slaweq: thanks
15:07:36 <slaweq> but for older releases, like rocky and queens, I think we still have the uwsgi issue
15:07:43 <slaweq> bcafarel: are You aware of it?
15:07:59 <slaweq> I got many failures e.g. on https://review.opendev.org/#/c/737703/ today
15:08:27 <bcafarel> yes, for EM branches we have a lingering issue with grenade jobs (gmann mentioned them in a recent mail update)
15:08:39 <bcafarel> https://review.opendev.org/737414 should workaround it for the time being
15:08:52 <bcafarel> once it works there, it can be backported to older branches
15:10:03 <slaweq> bcafarel: but it's only for grenade
15:10:14 <slaweq> and in my patch I saw failures in all tempest jobs too
15:10:29 <slaweq> basically everything except UT and pep8 was red
15:10:46 <bcafarel> oh sigh
15:11:20 <bcafarel> I did not check rocky thoroughly yet, still on stein :/
15:11:22 <slaweq> I see errors like "ls: cannot access 'uwsgi*': No such file or directory" in the devstack log
15:13:00 <bcafarel> ack, there was a mention that uwsgi + rocky and older is still WIP: http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015558.html
15:13:01 <slaweq> bcafarel: do You have cycles to check that this week?
15:13:28 <bcafarel> slaweq: yes, at least to get some up-to-date status on it!
15:13:36 <slaweq> thx
15:13:59 <slaweq> #action bcafarel to check gate status on rocky and queens (uwsgi problem)
15:14:36 <slaweq> ok
15:14:39 <slaweq> let's move on
15:14:41 <slaweq> #topic Grafana
15:14:48 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:15:28 <slaweq> in master branch it looks ok this week IMO
15:17:55 <njohnston> +1
15:18:17 <slaweq> so let's talk about some issues in specific jobs
15:18:19 <slaweq> #topic fullstack/functional
15:18:30 <slaweq> first functional tests
15:18:37 <slaweq> I found some db migration errors today (again), like https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ec3/736269/3/check/neutron-functional-with-uwsgi/ec30046/testr_results.html
15:18:55 <slaweq> ralonsoh: is it something You want to address with Your db migration script changes?
15:19:11 <ralonsoh> yes
15:19:19 <ralonsoh> those errors in test_walk_versions
15:19:41 <ralonsoh> and test_has_offline_migrations_all_heads_upgraded
15:19:41 <slaweq> ok
15:20:01 <slaweq> in general I think those failures are due to the slow nodes where they were run
15:20:18 <slaweq> but it will be good if we can get rid of at least some of them
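(For reference, test_walk_versions essentially steps through every alembic revision in order; a rough standalone sketch of that idea, where the alembic.ini path is an assumption and not the real neutron test harness:)

```python
# Standalone illustration of the test_walk_versions idea: upgrade through
# every alembic revision one step at a time.
from alembic import command
from alembic.config import Config
from alembic.script import ScriptDirectory


def walk_versions(ini_path='alembic.ini'):
    cfg = Config(ini_path)
    script = ScriptDirectory.from_config(cfg)
    # walk_revisions() yields newest-first, so reverse to go base -> head.
    revisions = [rev.revision for rev in script.walk_revisions()]
    for revision in reversed(revisions):
        # A broken migration script fails loudly at its own revision.
        command.upgrade(cfg, revision)


if __name__ == '__main__':
    walk_versions()
```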
15:21:27 <slaweq> I also have another issue
15:21:31 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_63a/711425/11/check/neutron-functional/63ac4ca/testr_results.html
15:21:53 <slaweq> this seems to me like something related to our privileged ip_lib or pyroute2
15:22:18 <slaweq> as there are errors about interfaces not being found in namespace None
15:22:50 <slaweq> and there is also an issue with the _check_bridge_datapath_id() method in those tests
15:22:52 <ralonsoh> well, maybe that's correct and the interface was not created
15:23:59 <slaweq> ralonsoh: can be
15:24:18 <slaweq> so at least we should probably fix the exception message IMO
15:24:19 <maciejjozefczyk> there is also this failure: test_ovsdb_monitor.TestNBDbMonitorOverTcp.test_floatingip_mac_bindings (IndexError: list index out of range)... I can take a look at that one
15:24:29 <slaweq> or not, nvm :)
15:24:35 <slaweq> thx maciejjozefczyk
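(A hedged illustration of inspecting MAC_Binding rows by hand, assuming ovn-sbctl is available on the node; the functional test itself goes through ovsdbapp, and the point is that an empty result should be handled rather than indexed blindly:)

```python
# Dump MAC_Binding rows from the OVN southbound DB with ovn-sbctl.
import json
import subprocess


def list_mac_bindings():
    out = subprocess.check_output(
        ['ovn-sbctl', '--format=json', 'list', 'MAC_Binding'])
    table = json.loads(out)
    # JSON table output carries column names in "headings", rows in "data".
    return [dict(zip(table['headings'], row)) for row in table['data']]


bindings = list_mac_bindings()
if not bindings:
    print('no MAC_Binding rows yet')  # instead of bindings[0] -> IndexError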
15:25:06 <ralonsoh> I can take a look at the bridge.get_datapath_id problem
15:25:13 <ralonsoh> if I  have some time this week
15:25:25 <slaweq> thx ralonsoh
15:25:49 <slaweq> #action maciejjozefczyk will check the test_ovsdb_monitor.TestNBDbMonitorOverTcp.test_floatingip_mac_bindings failure in https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_63a/711425/11/check/neutron-functional/63ac4ca/testr_results.html
15:26:07 <slaweq> #action ralonsoh will check get_datapath_id issues in https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_63a/711425/11/check/neutron-functional/63ac4ca/testr_results.htm
15:26:16 <slaweq> #undo
15:26:17 <openstack> Removing item from minutes: #action ralonsoh will check get_datapath_id issues in https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_63a/711425/11/check/neutron-functional/63ac4ca/testr_results.htm
15:26:18 <slaweq> #action ralonsoh will check get_datapath_id issues in https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_63a/711425/11/check/neutron-functional/63ac4ca/testr_results.html
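(For reference, the datapath id that check looks for can be read directly from OVSDB; a minimal sketch assuming ovs-vsctl is on the node, not the ovs_lib code under test:)

```python
# Read a bridge's datapath_id straight from OVSDB with ovs-vsctl.
# Neutron's agents use ovs_lib/ovsdbapp rather than shelling out like this.
import subprocess


def get_datapath_id(bridge):
    try:
        out = subprocess.check_output(
            ['ovs-vsctl', 'get', 'Bridge', bridge, 'datapath_id'],
            stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError:
        return None  # bridge does not exist
    # ovs-vsctl prints the value quoted, e.g. "00001a2b3c4d5e6f".
    value = out.decode().strip().strip('"')
    return value or None


print(get_datapath_id('br-int'))
```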
15:27:48 <slaweq> I will check these errors with non-existing devices
15:28:21 <slaweq> #action slaweq will check errors with non-existing interfaces in https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_63a/711425/11/check/neutron-functional/63ac4ca/testr_results.html
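(A quick way to verify the "interface not found in namespace" state outside the test is a direct pyroute2 lookup; a minimal sketch assuming root and known namespace/device names, not neutron's privileged ip_lib code:)

```python
# Check whether a device exists in a network namespace with plain pyroute2.
# Neutron's ip_lib wraps similar calls behind oslo.privsep; this is only a
# way to verify the state by hand on a failing node.
from pyroute2 import NetNS


def device_exists_in_namespace(nsname, devname):
    # flags=0 so the namespace is not created as a side effect of the check.
    with NetNS(nsname, flags=0) as ns:
        return bool(ns.link_lookup(ifname=devname))


# example (placeholder names): device_exists_in_namespace('qrouter-<uuid>', 'qr-xxx')
```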
15:28:47 <slaweq> ok, let's move on
15:28:49 <slaweq> #topic Tempest/Scenario
15:29:02 <slaweq> in scenario tests I found one interesting issue this week
15:29:17 <slaweq> it happened only once I think, but maybe it's worth a deeper look
15:29:27 <slaweq> https://11b28b714aaa0f2eaa01-115c1089095738e3e088969e8724f0ca.ssl.cf1.rackcdn.com/712640/9/check/neutron-tempest-plugin-scenario-openvswitch/2ad316a/testr_results.html
15:29:35 <slaweq> an error about an address already allocated in the subnet
15:29:40 <slaweq> maybe some bug in tests?
15:30:00 <ralonsoh> didn't I sent a patch for this?
15:30:10 <ralonsoh> send*
15:30:22 <slaweq> ralonsoh: I don't remember
15:30:29 <ralonsoh> (checking)
15:30:34 <slaweq> ahh, right
15:30:40 <slaweq> I think that now I remember
15:30:56 <ralonsoh> https://review.opendev.org/#/c/731267/
15:31:24 <slaweq> ok, this failure was before Your patch was merged
15:31:28 <slaweq> so we should be good now
15:31:31 <slaweq> thx ralonsoh
15:31:39 <ralonsoh> ahhh ok perfect
15:32:16 <bcafarel> ralonsoh: fixing CI issues before slaweq complains about them, nice!
15:32:22 <ralonsoh> hahaha
15:32:28 <slaweq> haha
15:32:31 <slaweq> true
15:32:35 <maciejjozefczyk> ralonsoh++ :D
15:32:51 <slaweq> ok, so one last topic for today
15:32:53 <slaweq> #topic Periodic
15:33:03 <slaweq> all jobs except one look fine there
15:33:13 <slaweq> but neutron-ovn-tempest-ovs-master-fedora is broken again
15:33:31 <maciejjozefczyk> sigh
15:33:46 <slaweq> and it's broken on compilation of ovs
15:33:50 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b03/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-ovs-master-fedora/b039158/job-output.txt
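(For anyone reproducing this locally: the job builds OVS from source with the usual autotools steps; a rough sketch driving them from Python, assuming a git checkout of ovs and a working toolchain; the CI job itself goes through devstack's OVN/OVS scripts:)

```python
# Standard OVS source build steps: boot.sh, configure, make.
import subprocess


def build_ovs(src_dir='ovs'):
    for cmd in (['./boot.sh'], ['./configure'], ['make', '-j4']):
        # Any compilation error surfaces as CalledProcessError at that step.
        subprocess.check_call(cmd, cwd=src_dir)


if __name__ == '__main__':
    build_ovs()
```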
15:34:48 <slaweq> maciejjozefczyk: can You take a look at that one?
15:35:12 <maciejjozefczyk> slaweq, yes
15:35:17 <slaweq> thx
15:35:26 <maciejjozefczyk> slaweq, I fixed that failure some time ago and it is failing again.. strange
15:35:32 <maciejjozefczyk> I'll take a closer look at that, thanks
15:35:32 <slaweq> #action maciejjozefczyk to check failing neutron-ovn-tempest-ovs-master-fedora periodic job
15:35:42 <slaweq> thx a lot
15:35:48 <slaweq> ok, that's all from me today
15:35:53 <slaweq> (that was fast ;))
15:36:09 <slaweq> anything else You want to discuss today?
15:36:21 <ralonsoh> no thanks
15:37:34 <slaweq> if not, I think I can give You back about 20 minutes today :)
15:37:36 <bcafarel> I just realized that in the "stadium projects" section we also had community goals mixed in
15:37:53 <bcafarel> short update on that: I started to fill https://etherpad.opendev.org/p/neutron-victoria-switch_to_focal
15:38:18 <bcafarel> hopefully it will be more complete by next week :)
15:38:22 <slaweq> thx bcafarel
15:38:24 <lajoskatona> bcafarel: good point,
15:38:48 <slaweq> btw, speaking about focal, it seems from https://review.opendev.org/#/c/737370/ that neutron is running fine there
15:38:54 <lajoskatona> I checked with odl but I got weird failures, and thought I'd wait a bit until the waves go down :-)
15:39:02 <slaweq> there are some problems related to cinder in tempest jobs
15:40:40 <lajoskatona> bcafarel, slaweq: for odl I had this:  https://review.opendev.org/736703
15:40:45 <bcafarel> lajoskatona: indeed, the main patches in devstack and tempest should clear out most of the issues - and most of our jobs will inherit directly
15:40:53 <lajoskatona> and it fails with nodeset not found or similar
15:41:15 <lajoskatona> bcafarel: that was my feeling as well :-)
15:41:39 <bcafarel> lajoskatona: or depending on https://review.opendev.org/734700 should help you get the nodes and job definitions
15:41:54 <slaweq> thx bcafarel and lajoskatona
15:42:32 <lajoskatona> bcafarel: thanks, I'll give that a try, though the wf -1 is a little frightening
15:42:50 <lajoskatona> but that's perhaps just some timing from gmann
15:44:07 <bcafarel> yes it is in "heavy progress" there
15:44:24 <bcafarel> though just for inheriting the nodeset it should be safe enough for testing
15:44:36 <gmann> yeah, 734700 is the right one to test. i was waiting for the gate result there and i will announce the same on the ML also.
15:44:53 <lajoskatona> gmann, bcafarel: thanks, I'll check it
15:45:05 <slaweq> gmann: in our tests in https://review.opendev.org/#/c/737370/ I saw some cinder related failures
15:45:18 <slaweq> gmann: do You know if cinder team is aware of them?
15:45:21 <gmann> bcafarel: if your job overrides the nodeset then you need to change it, otherwise 734700 can take care of the devstack base job nodeset switch to focal
15:45:25 <slaweq> or maybe I should open an LP bug for them?
15:45:59 <gmann> slaweq: current known bug is this https://bugs.launchpad.net/nova/+bug/1882521
15:45:59 <openstack> Launchpad bug 1882521 in OpenStack Compute (nova) "Failing device detachments on Focal" [Undecided,New]
15:46:13 <bcafarel> gmann: ack I am making a list of jobs that will need some action (more than just 734700)
15:46:20 <gmann> which is a volume detach issue, but i have not checked your patch failure
15:46:28 <gmann> bcafarel: +1
15:47:09 <slaweq> gmann: seems like the same one
15:47:11 <slaweq> thx
15:49:46 <slaweq> ok, I think we can finish meeting now
15:49:49 <slaweq> thx for attending
15:49:53 <maciejjozefczyk> \o
15:49:56 <slaweq> and see You o/
15:49:59 <slaweq> #endmeeting