16:00:55 #startmeeting neutron_ci
16:00:56 Meeting started Tue May 29 16:00:55 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:57 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:58 hello
16:00:59 The meeting name has been set to 'neutron_ci'
16:01:10 o/
16:01:52 hi mlavalle
16:02:00 haleyb: are You around maybe?
16:02:23 slaweq: yes, sorry, just getting off another call
16:02:33 haleyb: no problem, hi :)
16:02:40 ok, so let's start
16:02:46 #topic Actions from previous meetings
16:03:00 first one is:
16:03:00 mlavalle will check why trunk tests are failing in dvr multinode scenario
16:03:17 I took a look at it
16:03:47 The first one is neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_subport_connectivity
16:04:30 it seems that with this patch we introduced a bug: https://review.openstack.org/#/c/567875
16:05:08 We are getting a bash syntax error: http://logs.openstack.org/05/570405/3/check/neutron-tempest-plugin-dvr-multinode-scenario/bd70e59/job-output.txt.gz#_2018-05-25_19_15_56_380909
16:05:30 I have examined a couple of instances of the error and they are the same
16:06:02 good catch mlavalle
16:06:04 it seems to me the " is not needed in dhclient $IFACE.10"
16:06:20 so I will propose a patch to fix it
16:06:47 yes, good catch mlavalle :)
16:07:04 I am still investigating what the problem is with neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
16:07:20 in that case we get an ssh timeout
16:07:29 I will continue working on it
16:07:45 ok, great that we have some progress on it, thx mlavalle
16:07:58 :-)
16:08:19 next one was:
16:08:19 slaweq will continue debugging slow rally tests issue
16:08:33 again I didn't have time to investigate it
16:08:44 we had the Summit
16:08:57 so understandable
16:09:00 but as it doesn't happen often I "set low priority" to it in my queue :)
16:09:25 next one:
16:09:26 slaweq to debug failing security groups fullstack test: https://bugs.launchpad.net/neutron/+bug/1767829
16:09:27 Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:09:34 I started digging into it more today
16:09:49 but it also hasn't happened often in the gate recently
16:10:30 those are the tough ones
16:10:34 I can reproduce it locally together with patch https://review.openstack.org/#/c/470912/ - the issue looks the same so I hope that maybe this patch is somehow triggering it more often
16:10:56 if not, maybe I will at least find what is wrong with https://review.openstack.org/#/c/470912/ and help with this one :)
16:11:26 #action slaweq continue debugging fullstack security groups issue: https://bugs.launchpad.net/neutron/+bug/1767829
16:11:27 Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:11:28 slaweq: interesting. i tried to reproduce that manually and couldn't, we'll need to look at conntrack entries before/after
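(As an illustration of the before/after conntrack comparison mentioned above - a minimal sketch, assuming a hypothetical fixed IP and that the SG rule is deleted in between; not the actual reproducer:)

    # dump conntrack entries for the fixed IP used by the test "VM" (address is made up)
    sudo conntrack -L -s 20.0.0.10 > conntrack_before.txt
    # ... delete the security group rule under test here ...
    sudo conntrack -L -s 20.0.0.10 > conntrack_after.txt
    # entries that should have been cleaned up after the rule delete show up in the diff
    diff conntrack_before.txt conntrack_after.txt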
16:11:58 haleyb: I just had a failed test with Your patch locally 10 minutes before the meeting :)
16:12:08 I plan to continue work on it tomorrow morning
16:12:16 ack
16:12:37 I also added some additional logs in the conntrack module to check what happens there - maybe I will find something tomorrow
16:12:52 ok, next one is:
16:12:54 slaweq to check why some scenario tests don't log instance console log
16:13:01 I added logging of the instance console log in case the remote_connectivity check fails: https://review.openstack.org/#/c/568808/
16:13:24 it's merged so we should have the console log logged in case of such issues now
16:13:55 yeah I saw that one. Thanks!
16:14:15 and the last one was:
16:14:16 slaweq to switch neutron-tempest-plugin-designate-scenario to be voting
16:14:22 Done: https://review.openstack.org/#/c/568681/
16:14:34 the grafana dashboard is also updated
16:14:46 so we have one more voting job now :)
16:14:53 yeap
16:15:29 ok, moving on to the next topic
16:15:30 #topic Grafana
16:15:37 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:16:05 * slaweq has a storm here so the internet connection might drop unexpectedly
16:17:02 * mlavalle only sees blue skies and 95F (~35C)
16:17:09 mlavalle: LOL
16:17:25 Yesterday and this morning there was an issue with the cmd2 python module. It should now be fixed by https://review.openstack.org/#/c/570822/
16:17:52 because of this issue most jobs were failing recently
16:17:56 cool, thanks for letting us know
16:18:03 but it should be good soon (I hope)
16:18:25 it was a good thing that it was slow due to the holiday in the USA
16:18:40 yes, exactly
16:18:49 there weren't many patches in the queue yesterday
16:19:03 second thing, just FYI: Last week there was also a problem with openvswitch kernel module compilation which impacted the fullstack and ovsfw scenario jobs - already fixed by https://review.openstack.org/#/c/570085/
16:20:34 do You have anything to add/ask about grafana?
16:20:49 no, I think we need to wait for things to normalize
16:20:49 or shall we now go on to discuss the different jobs?
16:20:57 mlavalle: I agree
16:21:14 so going to the next topic then
16:21:15 #topic Fullstack
16:21:39 when I was preparing today's meeting I found two different problems with fullstack
16:21:56 one is the issue with (probably) conntrack entries which we already discussed
16:22:19 and a second one which happens more often: https://bugs.launchpad.net/neutron/+bug/1774006
16:22:20 Launchpad bug 1774006 in neutron "Fullstack security group test fails on _test_using_multiple_security_groups" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:22:37 the failure is (probably) related to the new test for multiple security groups, e.g.:
16:22:47 http://logs.openstack.org/37/558037/6/check/neutron-fullstack/85bb570/logs/testr_results.html.gz
16:22:58 I want to take a look at it soon too
16:23:15 as I have a host prepared for debugging fullstack tests now :)
16:23:42 #action slaweq will check fullstack multiple sg test failure: https://bugs.launchpad.net/neutron/+bug/1774006
16:23:43 Launchpad bug 1774006 in neutron "Fullstack security group test fails on _test_using_multiple_security_groups" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:24:14 or I can give one of You the pleasure of debugging it if You want :)
16:24:23 sorry that I didn't ask earlier ;)
16:25:49 sorry, is that related to the other SG failure? or maybe it is the failure :(
16:26:15 different one...
16:26:19 haleyb: it is a different failure than we discussed before
16:26:38 that's why I also opened a new bug for it
16:26:39 26 minutes ago :)
16:26:43 yep
16:26:59 did You already forget about it? :D
16:27:12 is it the weekend yet?
16:27:23 LOL
16:27:46 that failure looks interesting, as if it's a nested network namespace or something
16:28:19 yes, but it is not failing 100% of the time so it's probably some kind of race
16:28:49 but in fact there aren't any nested namespaces IIRC
16:29:16 there are namespaces which "simulate" vms and namespaces which "simulate" hosts with the lb agent
16:29:27 in this namespace the lb agent applies its SG rules
16:29:46 such "hosts" are connected to each other with an ovs bridge
16:30:13 so in this case LB agents are even better isolated than is done for ovs agents
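(A rough sketch of the topology described above, for reference: two "host" namespaces attached to one ovs bridge, pinging each other. The bridge, interface and address names are made up; the real fullstack tests build this through their own fixtures:)

    sudo ovs-vsctl add-br br-fullstack
    for host in host1 host2; do
        sudo ip netns add $host
        sudo ip link add ${host}-eth0 type veth peer name ${host}-ovs
        sudo ip link set ${host}-eth0 netns $host
        sudo ovs-vsctl add-port br-fullstack ${host}-ovs
        sudo ip link set ${host}-ovs up
        sudo ip netns exec $host ip link set ${host}-eth0 up
    done
    sudo ip netns exec host1 ip addr add 20.0.0.1/24 dev host1-eth0
    sudo ip netns exec host2 ip addr add 20.0.0.2/24 dev host2-eth0
    # in the tests the linuxbridge agent applies SG rules inside each "host" namespace
    sudo ip netns exec host1 ping -c 3 20.0.0.2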
16:30:53 the ping is generating it though right? that is odd
16:32:18 it isn't right
16:32:22 100% packets lost
16:33:57 where do You see this correct ping exactly?
16:34:37 i didn't see a correct ping, just assumed it was the command and that was stderr
16:35:05 * haleyb is looking at logs and sees nothing yet besides some fdb RTNETLINK failures
16:36:18 one warning in the api log: http://logs.openstack.org/37/558037/6/check/neutron-fullstack/85bb570/logs/dsvm-fullstack-logs/TestSecurityGroupsSameNetwork.test_securitygroup_linuxbridge-iptables_/neutron-server--2018-05-17--22-16-48-838393.txt.gz#_2018-05-17_22_16_53_090
16:36:22 looks unrelated though
16:36:44 yep, probably not related
16:38:09 nothing very clear in the logs at first glance IMO
16:38:25 I will try to look deeper into this one during this week
16:40:12 ok, I think we can move to the next topic then
16:40:13 #topic Scenarios
16:40:47 the first thing here I wanted to discuss is from jlibosva
16:41:07 he proposed making the openvswitch fw driver the default in devstack: https://review.openstack.org/#/c/568297/
16:41:19 and I wanted to ask what You think about it
16:41:33 as I said yesterday, I am fine with it
16:42:19 the ovsfw test seems to be stable in the check and gate queues, right?
16:42:25 from the grafana dashboard the ovsfw firewall job looks stable - it's not failing more than other jobs
16:42:46 I say let's give it a try
16:43:01 ok, so please vote on this patch mlavalle :)
16:43:14 and also if so, I have a question about our existing jobs then
16:43:23 done
16:43:46 if we change the default driver to openvswitch, I think we should change this ovsfw-scenario job into an "iptables-scenario" job
16:43:56 as ovsfw will be covered by the default one, right?
16:44:09 yeah, that's a good point
16:44:32 we need to keep the non-default one alive
16:44:38 yep
16:44:58 so is there anyone who wants to do this change, or should I assign it to myself?
16:45:19 if you want to do it, go ahead
16:45:23 ok
16:45:50 #action slaweq to propose scenario job with iptables fw driver
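(For reference, the setting in question - a hedged sketch assuming the usual OVS agent config file location; the proposed iptables scenario job would simply keep the non-default value pinned:)

    # /etc/neutron/plugins/ml2/openvswitch_agent.ini (path may differ per deployment)
    [securitygroup]
    # what devstack would start using by default after the change:
    # firewall_driver = openvswitch
    # what the proposed iptables scenario job would keep exercising:
    firewall_driver = iptables_hybrid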
16:46:20 ok, speaking about scenario jobs
16:46:46 we still have some of them with quite a high failure rate, at least they are non-voting :)
16:47:02 I checked the reasons for some of these failures
16:47:03 so:
16:47:11 neutron-tempest-multinode-full (non-voting):
16:47:27 failures in this one don't look related to neutron:
16:47:28 * Block migration failure: http://logs.openstack.org/85/570085/5/check/neutron-tempest-multinode-full/09bea41/logs/testr_results.html.gz
16:47:33 * Rebuild server: http://logs.openstack.org/32/564132/2/check/neutron-tempest-multinode-full/a5262a2/logs/testr_results.html.gz
16:47:47 I didn't find other failures in the last few days
16:48:07 job neutron-tempest-dvr-ha-multinode-full, example of failure:
16:48:08 * SSH to instance not available: http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/testr_results.html.gz
16:48:19 this one might be related to neutron
16:48:28 I think it is
16:50:22 I found such errors in the l3 agent logs: http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/subnode-2/screen-q-l3.txt.gz?level=ERROR
16:50:30 is it known to You?
16:51:31 no, not familiar with that
16:51:56 and in neutron server: http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/screen-q-svc.txt.gz?level=ERROR
16:52:12 but I don't even know if this is related to this failed test exactly
16:52:21 me either, but can guess at the call to update the route and it not existing
16:55:02 can You maybe talk about it during the L3 meeting then?
16:55:10 yes
16:55:18 maybe it will be familiar to someone from that team :)
16:55:18 we will take a look
16:55:26 thx mlavalle
16:55:40 slaweq: is there a bug filed for it?
16:55:52 no, I don't know about any
16:56:05 ok, I'll file a bug
16:56:19 thx
16:56:39 ok, going quickly to the next topic as we are almost out of time
16:56:45 #topic Rally
16:57:07 I noticed today that a few days ago the rally job name was changed in: https://review.openstack.org/#/c/558037/
16:57:19 so we haven't had stats from rally in grafana since then
16:57:42 today I sent a patch to fix that: https://review.openstack.org/#/c/570949/
16:57:50 so it should be good once it is merged
16:58:21 now moving quickly to the last topic :)
16:58:25 #topic Open discussion
16:58:42 I was asked today if we are planning to create a tag in the neutron_tempest_plugin repo
16:58:46 and if yes, then when
16:58:51 mlavalle: do You know?
16:59:05 we can do it whenever it is needed
16:59:19 how about next week with Rocky-2
16:59:25 ?
16:59:28 would be great IMO
16:59:34 thx
16:59:43 ok, will do it next week
16:59:47 thx mlavalle
16:59:48 towards the end
16:59:53 and one last thing to mention
17:00:00 Following the session in Vancouver I started switching neutron projects to stestr as the test runner, the patches have a common topic. Please review them:
17:00:00 https://review.openstack.org/#/q/status:open+branch:master+topic:switch-to-stestr
17:00:07 ok, we are out of time
17:00:10 thx
17:00:12 mlavalle: sounds good. i can't remember if i was supposed to write up something in our docs to say our cadence for tags was every release
17:00:13 Thanks!
17:00:13 #endmeeting
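(For reference, the stestr switch mentioned at the end boils down to adding a small config file per project - a hedged sketch with an assumed test path, not the exact content of those patches:)

    # .stestr.conf
    [DEFAULT]
    test_path=./neutron/tests/unit
    top_dir=./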