15:01:54 <slaweq> #startmeeting neutron_ci
15:01:55 <openstack> Meeting started Wed Mar 11 15:01:54 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:56 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:58 <openstack> The meeting name has been set to 'neutron_ci'
15:02:01 <njohnston> o/
15:02:06 <slaweq> njohnston: ralonsoh bcafarel: here it should be :)
15:02:07 <ralonsoh> hello again
15:02:11 <bcafarel> o/
15:02:25 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:28 <bcafarel> ah you were in another chan? that was why I felt lonely here :)
15:02:38 <slaweq> bcafarel: sorry :)
15:02:43 <slaweq> it was my mistake
15:02:55 <slaweq> ok, lets start
15:02:57 <slaweq> #topic Actions from previous meetings
15:03:06 <slaweq> first one
15:03:08 <slaweq> slaweq to remove neutron-tempest-dvr job from grafana
15:03:13 <slaweq> patch: https://review.opendev.org/712048
15:03:44 <slaweq> I don't think there is much to say about it so lets move on to the next one
15:03:47 <slaweq> maciejjozefczyk to take a look at https://bugs.launchpad.net/neutron/+bug/1865453
15:03:48 <openstack> Launchpad bug 1865453 in neutron "neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestVirtualPorts.test_virtual_port_created_before fails randomly" [High,Confirmed] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:04:20 <maciejjozefczyk> slaweq, yes, im going to do it asap, I was on pto
15:04:45 <slaweq> ok maciejjozefczyk, I will assign it to You for next week, ok?
15:05:11 <maciejjozefczyk> slaweq, im already assigned to this one
15:05:19 <slaweq> maciejjozefczyk: ok
15:05:28 <slaweq> #action maciejjozefczyk to take a look at https://bugs.launchpad.net/neutron/+bug/1865453
15:05:29 <openstack> Launchpad bug 1865453 in neutron "neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestVirtualPorts.test_virtual_port_created_before fails randomly" [High,Confirmed] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:05:41 <slaweq> maciejjozefczyk: do You think we should mark those tests as unstable temporary?
15:06:32 <maciejjozefczyk> slaweq, let me try to take a look and spend an hour on it, If I'll not find anything at first shot I'll send a patch to mark to unstable, aight?
15:06:44 <slaweq> maciejjozefczyk: sure, sounds good
15:07:05 <slaweq> ok, next one
15:07:07 <slaweq> ralonsoh to check "neutron_tempest_plugin.scenario.test_multicast.MulticastTestIPv4 failing often on deletion of server"
15:07:19 <ralonsoh> slaweq, sorry, I didn't have time for this one
15:07:58 <slaweq> ok
15:08:09 <slaweq> will You try to check it this week?
15:08:29 <ralonsoh> yes, sure, I think I'll have time
15:08:36 <slaweq> #action ralonsoh to check "neutron_tempest_plugin.scenario.test_multicast.MulticastTestIPv4 failing often on deletion of server"
15:08:38 <slaweq> ralonsoh: thx
15:08:52 <slaweq> and the last one
15:08:54 <slaweq> slaweq to check problem with console output in scenario test
15:09:00 <slaweq> Patch https://review.opendev.org/712054
15:09:17 <slaweq> I think this should solve this issue
15:09:49 <slaweq> please review when You will have some time
15:10:04 <njohnston> will do
15:10:06 <slaweq> and that's all on the list of actions from last week
15:10:08 <slaweq> thx njohnston
15:10:11 <bcafarel> "for server in servers: server = server.get("server") or server" that makes a funny line
15:10:38 <slaweq> bcafarel: yes, but it works :)
15:10:42 <maciejjozefczyk> slaweq, ;)
15:10:47 <bcafarel> true :)
15:11:19 <slaweq> #topic Stadium projects
15:11:27 <slaweq> standardize on zuul v3
15:11:29 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:11:58 <slaweq> according to zuul v3 migration, we have only 3 stadium projects left on the list
15:12:04 <slaweq> networking-bagpipe
15:12:06 <slaweq> networking-midonet
15:12:08 <slaweq> networking-odl
15:12:22 <slaweq> for bagpipe there is some patch proposed to convert fullstack job
15:12:29 <slaweq> but it's very red currently
15:12:38 <slaweq> for midonet and odl there is nothing proposed yet
15:13:37 <slaweq> in overall we are doing good progress on this
15:13:56 <slaweq> anything else related to stadium projects for today?
15:14:27 <bcafarel> should we add some ipv6-only goal section here?
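[Editor's note on the one-liner bcafarel quotes above: it normalizes server entries that may arrive either as bare server dicts or wrapped as `{"server": {...}}` by the Nova API. A standalone sketch of the idiom; the surrounding function and names are illustrative, not the actual patch code from https://review.opendev.org/712054:]

```python
def normalize_servers(servers):
    """Unwrap Nova server entries that may or may not be wrapped.

    Hypothetical helper illustrating the quoted idiom: some callers pass
    raw server dicts, others pass the {"server": {...}} envelope that the
    Nova API returns for a single-server GET.
    """
    normalized = []
    for server in servers:
        # If the dict carries a "server" key, unwrap it; otherwise
        # .get() returns None and "or server" keeps the dict as-is.
        server = server.get("server") or server
        normalized.append(server)
    return normalized
```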
15:14:44 <slaweq> bcafarel: good point
15:14:52 <bcafarel> https://review.opendev.org/#/q/status:open+topic:ipv6-only-deployment-and-testing+(status:open+OR+status:merged)+(project:%255Eopenstack/neutron.*+OR+project:%255Eopenstack/networking-.*) you mentioned on neutron meeting mostly shows networking-* remaining patches
15:15:04 <slaweq> I will prepare etherpad with summary of what is wrong there and will add it to the agenda
15:15:39 <slaweq> #action slaweq to prepare etherpad to track progress with ipv6-only testing goal
15:15:47 <bcafarel> thanks! I tried to rebase networking-generic-switch patch but apparently there are other failures around (and did not check the others yet)
15:16:02 <slaweq> thx bcafarel
15:17:46 <slaweq> ok, lets move on
15:17:48 <slaweq> #topic Grafana
15:17:55 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:19:19 <slaweq> in overall we still have the same problems as last week(s)
15:19:42 <slaweq> high failure rate of dvr scenario jobs (but those are non-voting)
15:19:52 <slaweq> and pretty high failure rate of functional job and grenade jobs
15:20:48 <ralonsoh> do you have a list of grenade and FTs tests failing?
15:21:10 <slaweq> ralonsoh: for grenade jobs nope
15:21:27 <slaweq> and for functional tests, it's mostly those ovn mech driver tests
15:21:41 <slaweq> which maciejjozefczyk will check
15:21:50 <ralonsoh> you are right
15:22:04 <bcafarel> I sent quite a few "recheck grenade" on stable/train backports recently too, older branches are not affected
15:22:16 <slaweq> for grenade job, last time when I was checking it, it was mostly issue with some timeout on placement and instance in error state due to that
15:22:36 <slaweq> but maybe there is some different issue now
15:22:59 <slaweq> https://zuul.opendev.org/t/openstack/build/e820bdd66eb040fa9db010de1fe821b3
15:23:09 <slaweq> here is the grenade job failure from today :)
15:24:40 <slaweq> failures with "No valid host was found"
15:24:41 <slaweq> https://5e801b59e3f992e71df8-ab7619989a2ab5c37e2a2a5061e93a1d.ssl.cf5.rackcdn.com/711887/1/check/neutron-grenade-multinode/e820bdd/logs/grenade.sh.txt
15:25:54 <ralonsoh> No valid host was found. There are not enough hosts available
15:26:00 <ralonsoh> same error always
15:26:07 <slaweq> yep
15:26:22 <slaweq> and I believe it would be some issue in placement
15:27:09 <bcafarel> ralonsoh: some failures in http://paste.openstack.org/show/790544/ (I did not check them, maybe "no valid host" everywhere)
15:27:32 <ralonsoh> I'll check it
15:28:43 <slaweq> there is such error there: https://zuul.opendev.org/t/openstack/build/e820bdd66eb040fa9db010de1fe821b3/log/logs/screen-n-sch.txt#1031
15:28:48 <slaweq> I think it may be related
15:28:58 <slaweq> it's not from placement but nova
15:29:25 <slaweq> if anyone has got any cycles and can take a look, that would be great :)
15:29:54 <slaweq> ok, anything else related to grafana for today?
15:30:56 <slaweq> if not, I think we can move on to the next topic
15:31:17 <slaweq> I don't have anything about functional/fullstack jobs for today
15:31:24 <slaweq> so lets talk about scenario
15:31:31 <slaweq> #topic Tempest/Scenario jobs
15:31:58 <slaweq> as I said before, our biggest issues are with dvr related multinode jobs
15:32:14 <slaweq> so I spent some time today to prepare some summary of what tests are failing there
15:32:23 <slaweq> and I have: https://etherpad.openstack.org/p/neutron-dvr-jobs-issues
15:32:36 <slaweq> most of the failures are in neutron-tempest-plugin-dvr-multinode-scenario
15:32:44 <bcafarel> that is a long list :(
15:33:06 <slaweq> bcafarel: some links are there couple of times as in one job more than one test failed
15:33:26 <slaweq> and I put it in each section as I wanted to check how much each of those tests is failing
15:34:00 <slaweq> so our biggest issues are:
15:34:14 <slaweq> router migration from something (HA or Legacy) to dvr
15:34:31 <slaweq> and I believe that this is the same issue in both cases
15:34:38 <slaweq> and neutron_tempest_plugin.scenario.test_connectivity.NetworkConnectivityTest.test_connectivity_through_2_routers
15:34:51 <slaweq> and also security_groups tests
15:35:17 <slaweq> but those may be hopefully fixed when we will use new cirros 0.5.0 in our gate
15:35:39 <slaweq> as maciejjozefczyk did some improvements in cirros image to address issues with metadata timeouts
15:35:58 <ralonsoh> I think there were some problems with cirros 5 and OVN
15:36:11 <ralonsoh> but maciejjozefczyk knows this better
15:36:33 <maciejjozefczyk> ralonsoh, I remember issue with ip link, but it should be fixed in 0.5.1
15:36:44 <ralonsoh> ahhh ok, perfect
15:36:51 <slaweq> maciejjozefczyk: is 0.5.1 already released?
15:37:40 <maciejjozefczyk> slaweq, yes: http://download.cirros-cloud.net/0.5.1/
15:38:00 <bcafarel> maciejjozefczyk: is this new new release for the failures you saw trying 0.5.0?
15:38:04 <slaweq> maciejjozefczyk: great, You have already patch to use it in neutron, right?
15:38:05 <maciejjozefczyk> im gonna verify it this week if all is fine, I saw some comments from Radoslaw here: https://review.opendev.org/#/c/711492/
15:38:36 <slaweq> maciejjozefczyk: can You also send patch to neutron-tempest-plugin to use it in our jobs?
15:38:51 <slaweq> we can then check how many (if any) tests more will be green
15:38:53 <slaweq> :)
15:39:21 <maciejjozefczyk> slaweq, yes sure, but I think that setting it in global devstack is the right way? I didnt find any setting for this in our tempest configuration, afair
15:39:46 <maciejjozefczyk> or I missed something, anyways, im gonna take a look on it this week
15:40:03 <ralonsoh> we retrieve that from devstack
15:40:08 <slaweq> maciejjozefczyk: setting it in devstack is the best place but we can do it for our jobs even just like DNM patch as I'm curius if that will really help
15:40:43 <maciejjozefczyk> slaweq, I already did it similiar way in https://review.opendev.org/#/c/711425/
15:40:51 <maciejjozefczyk> with depends-on
15:41:20 <maciejjozefczyk> anyways, lemme check this and I'll be back with results :)
15:41:27 <slaweq> ok, thx
15:41:34 <slaweq> so my plan for that is:
15:41:49 <slaweq> 1. we will check with new cirros if some of tests will be more stable,
15:42:08 <slaweq> 2. for router migrations tests and connectivity tests I will open new LP
15:42:17 <slaweq> and we will need to check them in next weeks
15:42:38 <slaweq> is it ok for You?
15:42:55 <maciejjozefczyk> slaweq, you have my sword
15:43:07 <ralonsoh> hahahaha
15:43:14 <slaweq> :)
15:44:14 <slaweq> anything else You have related to scenario/tempest jobs?
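[Editor's note on "setting it in global devstack": devstack derives the guest image from defaults in its `stackrc`, so a job can pin the cirros version from `local.conf`. A hedged fragment; the variable names follow devstack's `stackrc` conventions and may differ between branches:]

```ini
# Hypothetical devstack local.conf fragment to pin the cirros guest image.
# CIRROS_VERSION and DEFAULT_IMAGE_NAME are the stackrc knobs devstack
# reads when it uploads the default test image to glance.
[[local|localrc]]
CIRROS_VERSION=0.5.1
DEFAULT_IMAGE_NAME=cirros-0.5.1-x86_64-disk
```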
15:44:32 <maciejjozefczyk> I'm working on some improvements for QoS tests
15:44:55 <maciejjozefczyk> We spotted some issues while using QoS tests with OVN as a backend
15:45:05 <maciejjozefczyk> #link https://bugs.launchpad.net/neutron/+bug/1866039
15:45:06 <openstack> Launchpad bug 1866039 in neutron "[OVN] QoS gives different bandwidth limit measures than ml2/ovs" [High,In progress] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)
15:45:15 <maciejjozefczyk> It should be addressed by:
15:45:18 <maciejjozefczyk> #link https://review.opendev.org/#/c/711048/
15:45:27 <maciejjozefczyk> If you'll have a minute please take a look:) thanks!
15:45:45 <slaweq> thx maciejjozefczyk, I will review
15:46:14 <maciejjozefczyk> all from me, thanks!
15:46:44 <slaweq> anything else? or should we move on?
15:47:41 <slaweq> ok, so lets move on
15:47:52 <slaweq> I see that njohnston added something related to fullstack tests
15:47:57 <slaweq> so lets get back to this topic now
15:48:03 <slaweq> #topic fullstack/functional
15:48:11 <slaweq> njohnston: You're up :)
15:48:24 <njohnston> Yes - I have a change to mark the fullstack security group tests as stable
15:48:41 <njohnston> https://review.opendev.org/710782
15:48:49 <njohnston> It has not failed in a bit
15:49:11 <njohnston> So I wanted to let you all know, and see how many passing rechecks you think are needed before calling it stable
15:49:37 <bcafarel> so disabling concurrency on security group tests seems to be enough?
15:49:48 <njohnston> it seems to have done the trick
15:50:03 <slaweq> I don't know for sure but IMO it may be worth to try
15:50:12 <slaweq> worst case we will mark it as unstable again
15:50:17 <ralonsoh> +1
15:51:01 <slaweq> thx njohnston for bringing this up :)
15:51:01 <njohnston> that's it for me
15:51:05 <slaweq> I almost forgot about it
15:51:10 <njohnston> :-)
15:51:48 <bcafarel> I'd feel safer with 2 or 3 additional rounds of test rechecks - will see how it runs locally
15:52:05 <njohnston> bcafarel: I can definitely do that
15:52:09 <slaweq> bcafarel: sure :)
15:52:44 <njohnston> so far it has passed the security group tests 5 times, I will do 3 more
15:52:45 <bcafarel> apart from that nice to see it was "only" this secgroup issue :)
15:53:28 <slaweq> great, so I will also keep an eye on it
15:53:42 <slaweq> do You have anything else to talk about today?
15:53:53 <njohnston> nope
15:53:59 <slaweq> if not, I think I can give You few minutes back
15:54:02 <bcafarel> yay
15:54:15 <njohnston> \o/
15:54:16 <ralonsoh> bye
15:54:17 <maciejjozefczyk> ;)
15:54:19 <slaweq> ok, so thx for attending
15:54:21 <slaweq> o/
15:54:23 <slaweq> #endmeeting