15:01:54 <slaweq> #startmeeting neutron_ci
15:01:55 <openstack> Meeting started Wed Mar 11 15:01:54 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:56 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:58 <openstack> The meeting name has been set to 'neutron_ci'
15:02:01 <njohnston> o/
15:02:06 <slaweq> njohnston: ralonsoh bcafarel: here it should be :)
15:02:07 <ralonsoh> hello again
15:02:11 <bcafarel> o/
15:02:25 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:28 <bcafarel> ah you were in another chan? that was why I felt lonely here :)
15:02:38 <slaweq> bcafarel: sorry :)
15:02:43 <slaweq> it was my mistake
15:02:55 <slaweq> ok, lets start
15:02:57 <slaweq> #topic Actions from previous meetings
15:03:06 <slaweq> first one
15:03:08 <slaweq> slaweq to remove neutron-tempest-dvr job from grafana
15:03:13 <slaweq> patch: https://review.opendev.org/712048
15:03:44 <slaweq> I don't think there is much to say about it so lets move on to the next one
15:03:47 <slaweq> maciejjozefczyk to take a look at https://bugs.launchpad.net/neutron/+bug/1865453
15:03:48 <openstack> Launchpad bug 1865453 in neutron "neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestVirtualPorts.test_virtual_port_created_before fails randomly" [High,Confirmed] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:04:20 <maciejjozefczyk> slaweq, yes, I'm going to do it ASAP, I was on PTO
15:04:45 <slaweq> ok maciejjozefczyk, I will assign it to You for next week, ok?
15:05:11 <maciejjozefczyk> slaweq, I'm already assigned to this one
15:05:19 <slaweq> maciejjozefczyk: ok
15:05:28 <slaweq> #action maciejjozefczyk to take a look at https://bugs.launchpad.net/neutron/+bug/1865453
15:05:29 <openstack> Launchpad bug 1865453 in neutron "neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestVirtualPorts.test_virtual_port_created_before fails randomly" [High,Confirmed] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:05:41 <slaweq> maciejjozefczyk: do You think we should mark those tests as unstable temporarily?
15:06:32 <maciejjozefczyk> slaweq, let me try to take a look and spend an hour on it; if I don't find anything at first shot I'll send a patch to mark them as unstable, aight?
15:06:44 <slaweq> maciejjozefczyk: sure, sounds good
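[Editor's note: "marking as unstable" means wrapping the test so that a failure is reported as a skip while the linked bug is investigated; the test still runs and still produces logs. A minimal sketch of how such a decorator typically works; the name and message format are illustrative, not Neutron's exact implementation.]

    import functools
    import unittest

    def unstable_test(reason):
        """Report a failure of the decorated test as a skip.

        The test still runs (so logs are collected), but a known
        intermittent failure no longer blocks the gate.
        """
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, *args, **kwargs):
                try:
                    return func(self, *args, **kwargs)
                except unittest.SkipTest:
                    raise  # genuine skips pass through untouched
                except Exception as exc:
                    raise unittest.SkipTest(
                        "Test marked unstable (%s), skipped on failure: %s"
                        % (reason, exc))
            return wrapper
        return decorator

    # Usage on the test from bug 1865453 discussed above:
    # @unstable_test("bug 1865453")
    # def test_virtual_port_created_before(self): ...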
15:07:05 <slaweq> ok, next one
15:07:07 <slaweq> ralonsoh to check "neutron_tempest_plugin.scenario.test_multicast.MulticastTestIPv4 failing often on deletion of server"
15:07:19 <ralonsoh> slaweq, sorry, I didn't have time for this one
15:07:58 <slaweq> ok
15:08:09 <slaweq> will You try to check it this week?
15:08:29 <ralonsoh> yes, sure, I think I'll have time
15:08:36 <slaweq> #action ralonsoh to check "neutron_tempest_plugin.scenario.test_multicast.MulticastTestIPv4 failing often on deletion of server"
15:08:38 <slaweq> ralonsoh: thx
15:08:52 <slaweq> and the last one
15:08:54 <slaweq> slaweq to check problem with console output in scenario test
15:09:00 <slaweq> Patch https://review.opendev.org/712054
15:09:17 <slaweq> I think this should solve this issue
15:09:49 <slaweq> please review when You have some time
15:10:04 <njohnston> will do
15:10:06 <slaweq> and that's all on the list of actions from last week
15:10:08 <slaweq> thx njohnston
15:10:11 <bcafarel> "for server in servers: server = server.get("server") or server" that makes a funny line
15:10:38 <slaweq> bcafarel: yes, but it works :)
15:10:42 <maciejjozefczyk> slaweq, ;)
15:10:47 <bcafarel> true :)
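[Editor's note: the quoted line normalizes list entries that may be either a bare server dict or one wrapped under a "server" key, which is the shape the Nova API returns. A minimal sketch of the pattern, with hypothetical sample data:]

    # Some callers pass bare server dicts, others the Nova API shape
    # where the dict is wrapped under a "server" key.
    servers = [
        {"id": "abc", "status": "ACTIVE"},             # bare dict
        {"server": {"id": "def", "status": "ERROR"}},  # wrapped dict
    ]

    for server in servers:
        # .get("server") returns None for bare dicts, so "or server"
        # falls back to the original entry; wrapped dicts get unwrapped.
        server = server.get("server") or server
        print(server["id"], server["status"])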
15:11:19 <slaweq> #topic Stadium projects
15:11:27 <slaweq> standardize on zuul v3
15:11:29 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:11:58 <slaweq> regarding the zuul v3 migration, we have only 3 stadium projects left on the list
15:12:04 <slaweq> networking-bagpipe
15:12:06 <slaweq> networking-midonet
15:12:08 <slaweq> networking-odl
15:12:22 <slaweq> for bagpipe there is a patch proposed to convert the fullstack job
15:12:29 <slaweq> but it's very red currently
15:12:38 <slaweq> for midonet and odl there is nothing proposed yet
15:13:37 <slaweq> overall we are making good progress on this
15:13:56 <slaweq> anything else related to stadium projects for today?
15:14:27 <bcafarel> should we add some ipv6-only goal section here?
15:14:44 <slaweq> bcafarel: good point
15:14:52 <bcafarel> https://review.opendev.org/#/q/status:open+topic:ipv6-only-deployment-and-testing+(status:open+OR+status:merged)+(project:%255Eopenstack/neutron.*+OR+project:%255Eopenstack/networking-.*) - the query you mentioned at the neutron meeting mostly shows remaining networking-* patches
15:15:04 <slaweq> I will prepare an etherpad with a summary of what is wrong there and will add it to the agenda
15:15:39 <slaweq> #action slaweq to prepare etherpad to track progress with ipv6-only testing goal
15:15:47 <bcafarel> thanks! I tried to rebase the networking-generic-switch patch but apparently there are other failures around (and I did not check the others yet)
15:16:02 <slaweq> thx bcafarel
15:17:46 <slaweq> ok, lets move on
15:17:48 <slaweq> #topic Grafana
15:17:55 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:19:19 <slaweq> overall we still have the same problems as in the last week(s)
15:19:42 <slaweq> high failure rate of dvr scenario jobs (but those are non-voting)
15:19:52 <slaweq> and a pretty high failure rate of the functional and grenade jobs
15:20:48 <ralonsoh> do you have a list of the grenade and functional tests that are failing?
15:21:10 <slaweq> ralonsoh: for grenade jobs nope
15:21:27 <slaweq> and for functional tests, it's mostly those ovn mech driver tests
15:21:41 <slaweq> which maciejjozefczyk will check
15:21:50 <ralonsoh> you are right
15:22:04 <bcafarel> I sent quite a few "recheck grenade" on stable/train backports recently too, older branches are not affected
15:22:16 <slaweq> for the grenade job, last time I checked it was mostly an issue with some timeout on placement, and an instance in ERROR state because of that
15:22:36 <slaweq> but maybe there is some different issue now
15:22:59 <slaweq> https://zuul.opendev.org/t/openstack/build/e820bdd66eb040fa9db010de1fe821b3
15:23:09 <slaweq> here is the grenade job failure from today :)
15:24:40 <slaweq> failures with "No valid host was found"
15:24:41 <slaweq> https://5e801b59e3f992e71df8-ab7619989a2ab5c37e2a2a5061e93a1d.ssl.cf5.rackcdn.com/711887/1/check/neutron-grenade-multinode/e820bdd/logs/grenade.sh.txt
15:25:54 <ralonsoh> No valid host was found. There are not enough hosts available
15:26:00 <ralonsoh> same error always
15:26:07 <slaweq> yep
15:26:22 <slaweq> and I believe it is some issue in placement
15:27:09 <bcafarel> ralonsoh: some failures in http://paste.openstack.org/show/790544/ (I did not check them, maybe "no valid host" everywhere)
15:27:32 <ralonsoh> I'll check it
15:28:43 <slaweq> there is such error there: https://zuul.opendev.org/t/openstack/build/e820bdd66eb040fa9db010de1fe821b3/log/logs/screen-n-sch.txt#1031
15:28:48 <slaweq> I think it may be related
15:28:58 <slaweq> it's not from placement but nova
15:29:25 <slaweq> if anyone has got any cycles and can take a look, that would be great :)
15:29:54 <slaweq> ok, anything else related to grafana for today?
15:30:56 <slaweq> if not, I think we can move on to the next topic
15:31:17 <slaweq> I don't have anything about functional/fullstack jobs for today
15:31:24 <slaweq> so lets talk about scenario
15:31:31 <slaweq> #topic Tempest/Scenario jobs
15:31:58 <slaweq> as I said before, our biggest issues are with dvr related multinode jobs
15:32:14 <slaweq> so I spent some time today preparing a summary of which tests are failing there
15:32:23 <slaweq> and I have: https://etherpad.openstack.org/p/neutron-dvr-jobs-issues
15:32:36 <slaweq> most of the failures are in neutron-tempest-plugin-dvr-multinode-scenario
15:32:44 <bcafarel> that is a long list :(
15:33:06 <slaweq> bcafarel: some links are there a couple of times, as in one job more than one test failed
15:33:26 <slaweq> and I put them in each section as I wanted to check how often each of those tests is failing
15:34:00 <slaweq> so our biggest issues are:
15:34:14 <slaweq> router migrations from HA or legacy to DVR
15:34:31 <slaweq> and I believe that this is the same issue in both cases
15:34:38 <slaweq> and neutron_tempest_plugin.scenario.test_connectivity.NetworkConnectivityTest.test_connectivity_through_2_routers
15:34:51 <slaweq> and also security_groups tests
15:35:17 <slaweq> but those will hopefully be fixed when we use the new cirros 0.5.0 in our gate
15:35:39 <slaweq> as maciejjozefczyk made some improvements in the cirros image to address issues with metadata timeouts
15:35:58 <ralonsoh> I think there were some problems with cirros 0.5.0 and OVN
15:36:11 <ralonsoh> but maciejjozefczyk knows this better
15:36:33 <maciejjozefczyk> ralonsoh, I remember an issue with ip link, but it should be fixed in 0.5.1
15:36:44 <ralonsoh> ahhh ok, perfect
15:36:51 <slaweq> maciejjozefczyk: is 0.5.1 already released?
15:37:40 <maciejjozefczyk> slaweq, yes: http://download.cirros-cloud.net/0.5.1/
15:38:00 <bcafarel> maciejjozefczyk: is this new release for the failures you saw trying 0.5.0?
15:38:04 <slaweq> maciejjozefczyk: great, You already have a patch to use it in neutron, right?
15:38:05 <maciejjozefczyk> I'm gonna verify this week whether all is fine; I saw some comments from Radoslaw here: https://review.opendev.org/#/c/711492/
15:38:36 <slaweq> maciejjozefczyk: can You also send patch to neutron-tempest-plugin to use it in our jobs?
15:38:51 <slaweq> we can then check how many (if any) more tests will be green
15:38:53 <slaweq> :)
15:39:21 <maciejjozefczyk> slaweq, yes sure, but I think that setting it in global devstack is the right way? I didn't find any setting for this in our tempest configuration, afair
15:39:46 <maciejjozefczyk> or I missed something; anyways, I'm gonna take a look at it this week
15:40:03 <ralonsoh> we retrieve that from devstack
15:40:08 <slaweq> maciejjozefczyk: setting it in devstack is the best place, but we can do it for our jobs even just as a DNM patch, as I'm curious if that will really help
15:40:43 <maciejjozefczyk> slaweq, I already did it in a similar way in https://review.opendev.org/#/c/711425/
15:40:51 <maciejjozefczyk> with depends-on
15:41:20 <maciejjozefczyk> anyways, lemme check this and I'll be back with results :)
15:41:27 <slaweq> ok, thx
15:41:34 <slaweq> so my plan for that is:
15:41:49 <slaweq> 1. we will check with the new cirros if some of the tests become more stable,
15:42:08 <slaweq> 2. for the router migration tests and connectivity tests I will open a new LP bug
15:42:17 <slaweq> and we will need to check them in next weeks
15:42:38 <slaweq> is it ok for You?
15:42:55 <maciejjozefczyk> slaweq, you have my sword
15:43:07 <ralonsoh> hahahaha
15:43:14 <slaweq> :)
15:44:14 <slaweq> anything else You have related to scenario/tempest jobs?
15:44:32 <maciejjozefczyk> I'm working on some improvements for QoS tests
15:44:55 <maciejjozefczyk> We spotted some issues while using QoS tests with OVN as a backend
15:45:05 <maciejjozefczyk> #link https://bugs.launchpad.net/neutron/+bug/1866039
15:45:06 <openstack> Launchpad bug 1866039 in neutron "[OVN] QoS gives different bandwidth limit measures than ml2/ovs" [High,In progress] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:45:15 <maciejjozefczyk> It should be addressed by:
15:45:18 <maciejjozefczyk> #link https://review.opendev.org/#/c/711048/
15:45:27 <maciejjozefczyk> If you'll have a minute please take a look :) thanks!
15:45:45 <slaweq> thx maciejjozefczyk, I will review
15:46:14 <maciejjozefczyk> all from me, thanks!
15:46:44 <slaweq> anything else? or should we move on?
15:47:41 <slaweq> ok, so lets move on
15:47:52 <slaweq> I see that njohnston added something related to fullstack tests
15:47:57 <slaweq> so lets get back to this topic now
15:48:03 <slaweq> #topic fullstack/functional
15:48:11 <slaweq> njohnston: You're up :)
15:48:24 <njohnston> Yes - I have a change to mark the fullstack security group tests as stable
15:48:41 <njohnston> https://review.opendev.org/710782
15:48:49 <njohnston> It has not failed in a bit
15:49:11 <njohnston> So I wanted to let you all know, and see how many passing rechecks you think are needed before calling it stable
15:49:37 <bcafarel> so disabling concurrency on security group tests seems to be enough?
15:49:48 <njohnston> it seems to have done the trick
15:50:03 <slaweq> I don't know for sure but IMO it may be worth a try
15:50:12 <slaweq> worst case we will mark it as unstable again
15:50:17 <ralonsoh> +1
15:51:01 <slaweq> thx njohnston for bringing this up :)
15:51:01 <njohnston> that's it for me
15:51:05 <slaweq> I almost forgot about it
15:51:10 <njohnston> :-)
15:51:48 <bcafarel> I'd feel safer with 2 or 3 additional rounds of test rechecks - will see how it runs locally
15:52:05 <njohnston> bcafarel: I can definitely do that
15:52:09 <slaweq> bcafarel: sure :)
15:52:44 <njohnston> so far it has passed the security group tests 5 times, I will do 3 more
15:52:45 <bcafarel> apart from that nice to see it was "only" this secgroup issue :)
15:53:28 <slaweq> great, so I will also keep an eye on it
15:53:42 <slaweq> do You have anything else to talk about today?
15:53:53 <njohnston> nope
15:53:59 <slaweq> if not, I think I can give You a few minutes back
15:54:02 <bcafarel> yay
15:54:15 <njohnston> \o/
15:54:16 <ralonsoh> bye
15:54:17 <maciejjozefczyk> ;)
15:54:19 <slaweq> ok, so thx for attending
15:54:21 <slaweq> o/
15:54:23 <slaweq> #endmeeting