16:00:55 <slaweq> #startmeeting neutron_ci
16:00:56 <openstack> Meeting started Tue May 29 16:00:55 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:57 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:58 <slaweq> hello
16:00:59 <openstack> The meeting name has been set to 'neutron_ci'
16:01:10 <mlavalle> o/
16:01:52 <slaweq> hi mlavalle
16:02:00 <slaweq> haleyb: are You around maybe?
16:02:23 <haleyb> slaweq: yes, sorry, just getting off other call
16:02:33 <slaweq> haleyb: no problem, hi :)
16:02:40 <slaweq> ok, so let's start
16:02:46 <slaweq> #topic Actions from previous meetings
16:03:00 <slaweq> first one is:
16:03:00 <slaweq> mlavalle will check why trunk tests are failing in dvr multinode scenario
16:03:17 <mlavalle> I took a look at it
16:03:47 <mlavalle> The first one is neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_subport_connectivity
16:04:30 <mlavalle> it seems that with this patch we introduced a bug: https://review.openstack.org/#/c/567875
16:05:08 <mlavalle> We are getting a bash syntax error: http://logs.openstack.org/05/570405/3/check/neutron-tempest-plugin-dvr-multinode-scenario/bd70e59/job-output.txt.gz#_2018-05-25_19_15_56_380909
16:05:30 <mlavalle> I have examined a couple of instances of the error and they are the same
16:06:02 <slaweq> good catch mlavalle
16:06:04 <mlavalle> it seems to me the " is not needed in dhclient $IFACE.10"
16:06:20 <mlavalle> so I will propose a patch to fix it
16:06:47 <haleyb> yes, good catch mlavalle :)
16:07:04 <mlavalle> I am still investigating what is the problem with neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
16:07:20 <mlavalle> in that case we get an ssh timeout
16:07:29 <mlavalle> I will continue working on it
16:07:45 <slaweq> ok, great that we have some progress on it, thx mlavalle
16:07:58 <mlavalle> :-)
16:08:19 <slaweq> next one was:
16:08:19 <slaweq> slaweq will continue debugging slow rally tests issue
16:08:33 <slaweq> again I didn't have time to investigate it
16:08:44 <mlavalle> we had the Summit
16:08:57 <mlavalle> so understandable
16:09:00 <slaweq> but as it doesn't happen often I "set low priority" to it in my queue :)
16:09:25 <slaweq> next one:
16:09:26 <slaweq> slaweq to debug failing security groups fullstack test: https://bugs.launchpad.net/neutron/+bug/1767829
16:09:27 <openstack> Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:09:34 <slaweq> I started digging into it more today
16:09:49 <slaweq> but it also hasn't happened often in the gate recently
16:10:30 <mlavalle> those are the tough ones
16:10:34 <slaweq> I can reproduce it locally together with patch https://review.openstack.org/#/c/470912/ - the issue looks the same, so I hope that maybe this patch is somehow triggering it more often
16:10:56 <slaweq> if not, maybe I will at least find what is wrong with https://review.openstack.org/#/c/470912/ and help with this one :)
16:11:26 <slaweq> #action slaweq continue debugging fullstack security groups issue: https://bugs.launchpad.net/neutron/+bug/1767829
16:11:27 <openstack> Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:11:28 <haleyb> slaweq: interesting. i tried to reproduce that manually and couldn't, we'll need to look at conntrack entries before/after
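A minimal sketch of the before/after conntrack comparison mentioned above, assuming a fullstack-style "host" network namespace and conntrack-tools installed; the namespace name is a placeholder, since the fullstack tests generate their own:

    #!/bin/bash
    # Placeholder namespace name; the fullstack tests create randomly named namespaces.
    NS=test-host-ns

    # Dump conntrack entries inside the "host" namespace before deleting the SG rule.
    sudo ip netns exec "$NS" conntrack -L > /tmp/conntrack-before.txt

    # ... delete the security group rule here, then dump again and compare ...
    sudo ip netns exec "$NS" conntrack -L > /tmp/conntrack-after.txt

    diff -u /tmp/conntrack-before.txt /tmp/conntrack-after.txt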
16:11:58 <slaweq> haleyb: I just had a failed test with Your patch locally 10 minutes before the meeting :)
16:12:08 <slaweq> I plan to continue work on it tomorrow morning
16:12:16 <haleyb> ack
16:12:37 <slaweq> I also added some additional logs in the conntrack module to check what happens there - maybe I will find something tomorrow
16:12:52 <slaweq> ok, next one is:
16:12:54 <slaweq> slaweq to check why some scenario tests don't log instance console log
16:13:01 <slaweq> I added logging of the console log in case the remote_connectivity check fails: https://review.openstack.org/#/c/568808/
16:13:24 <slaweq> it's merged so we should have the console log captured in case of such issues now
16:13:55 <mlavalle> yeah I saw that one. Thanks!
16:14:15 <slaweq> and the last one was:
16:14:16 <slaweq> slaweq to switch neutron-tempest-plugin-designate-scenario to be voting
16:14:22 <slaweq> Done: https://review.openstack.org/#/c/568681/
16:14:34 <slaweq> grafana dashboard is also updated
16:14:46 <slaweq> so we have one more voting job now :)
16:14:53 <mlavalle> yeap
16:15:29 <slaweq> ok, moving on to next topic
16:15:30 <slaweq> #topic Grafana
16:15:37 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:16:05 * slaweq has a storm here so the internet connection might drop unexpectedly
16:17:02 * mlavalle only sees blue skies and 95F (~35C)
16:17:09 <slaweq> mlavalle: LOL
16:17:25 <slaweq> Yesterday and this morning there was an issue with the cmd2 python module. It should now be fixed by https://review.openstack.org/#/c/570822/
16:17:52 <slaweq> because of this issue most jobs were failing recently
16:17:56 <mlavalle> cool, thanks for letting us know
16:18:03 <slaweq> but it should be good soon (I hope)
16:18:25 <mlavalle> it was a good thing that it was slow due to the holiday in the USA
16:18:40 <slaweq> yes, exactly
16:18:49 <slaweq> there weren't many patches in the queue yesterday
16:19:03 <slaweq> second thing, just FYI: last week there was also a problem with openvswitch kernel module compilation which impacted the fullstack and ovsfw scenario jobs - already fixed by https://review.openstack.org/#/c/570085/
16:20:34 <slaweq> do You have anything to add/ask about grafana?
16:20:49 <mlavalle> no, I think we need to wait for things to normalize
16:20:49 <slaweq> or should we now go on to discuss the different jobs?
16:20:57 <slaweq> mlavalle: I agree
16:21:14 <slaweq> so going to the next topic then
16:21:15 <slaweq> #topic Fullstack
16:21:39 <slaweq> when I was preparing today's meeting I found two different problems with fullstack
16:21:56 <slaweq> one is the issue with (probably) conntrack entries which we already discussed
16:22:19 <slaweq> and a second one which happens more often: https://bugs.launchpad.net/neutron/+bug/1774006
16:22:20 <openstack> Launchpad bug 1774006 in neutron "Fullstack security group test fails on _test_using_multiple_security_groups" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:22:37 <slaweq> Failure related (probably) to the new test for multiple security groups, e.g.:
16:22:47 <slaweq> http://logs.openstack.org/37/558037/6/check/neutron-fullstack/85bb570/logs/testr_results.html.gz
16:22:58 <slaweq> I want to take a look at it soon as well
16:23:15 <slaweq> as I have a host prepared for debugging fullstack tests now :)
16:23:42 <slaweq> #action slaweq will check fullstack multiple sg test failure: https://bugs.launchpad.net/neutron/+bug/1774006
16:23:43 <openstack> Launchpad bug 1774006 in neutron "Fullstack security group test fails on _test_using_multiple_security_groups" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:24:14 <slaweq> or I can give one of You the pleasure of debugging it if You want :)
16:24:23 <slaweq> sorry that I didn't ask earlier ;)
16:25:49 <haleyb> sorry, is that related to the other SG failure? or maybe it is the failure :(
16:26:15 <haleyb> different one...
16:26:19 <slaweq> haleyb: it is a different failure than the one we discussed before
16:26:38 <slaweq> that's why I also opened a new bug for it
16:26:39 <haleyb> 26 minutes ago :)
16:26:43 <slaweq> yep
16:26:59 <slaweq> did You already forget about it? :D
16:27:12 <haleyb> is it the weekend yet?
16:27:23 <slaweq> LOL
16:27:46 <haleyb> that failure looks interesting, as if it's a nested network namespace or something
16:28:19 <slaweq> yes, but it is not failing 100% of the time so it's probably some kind of race
16:28:49 <slaweq> but in fact there aren't any nested namespaces IIRC
16:29:16 <slaweq> there are namespaces which "simulate" vms and namespaces which "simulate" hosts with the lb agent
16:29:27 <slaweq> in this namespace the lb agent applies its SG rules
16:29:46 <slaweq> such "hosts" are connected to each other with an ovs bridge
16:30:13 <slaweq> so in this case LB agents are even better isolated than ovs agents are
16:30:53 <haleyb> the ping is generating it though right? that is odd
16:32:18 <slaweq> it isn't right
16:32:22 <slaweq> 100% packets lost
16:33:57 <slaweq> where do You see this correct ping exactly?
16:34:37 <haleyb> i didn't see a correct ping, just assumed it was the command and that was stderr
16:35:05 * haleyb is looking at logs and sees nothing yet besides some fdb RTNETLINK failures
16:36:18 <haleyb> one warning in api log, http://logs.openstack.org/37/558037/6/check/neutron-fullstack/85bb570/logs/dsvm-fullstack-logs/TestSecurityGroupsSameNetwork.test_securitygroup_linuxbridge-iptables_/neutron-server--2018-05-17--22-16-48-838393.txt.gz#_2018-05-17_22_16_53_090
16:36:22 <haleyb> looks unrelated though
16:36:44 <slaweq> yep, probably not related
16:38:09 <slaweq> nothing very clear in the logs at first glance IMO
16:38:25 <slaweq> I will try to look deeper into this one during the week
16:40:12 <slaweq> ok, I think we can move to the next topic then
16:40:13 <slaweq> #topic Scenarios
16:40:47 <slaweq> the first thing here I wanted to discuss is from jlibosva
16:41:07 <slaweq> he proposed to make the openvswitch fw driver the default in devstack: https://review.openstack.org/#/c/568297/
16:41:19 <slaweq> and I wanted to ask what You think about it
16:41:33 <mlavalle> as I said yesterday, I am fine with it
16:42:19 <mlavalle> the ovsfw test seems to be stable in the check and gate queues, right?
16:42:25 <slaweq> from the grafana dashboard the ovsfw firewall job looks stable - it's not failing more than other jobs
16:42:46 <mlavalle> I say let's give it a try
16:43:01 <slaweq> ok, so please vote on this patch mlavalle :)
16:43:14 <slaweq> and also if so, I have a question about our existing jobs then
16:43:23 <mlavalle> done
16:43:46 <slaweq> if we change the default driver to openvswitch, I think we should change this ovsfw-scenario job into an "iptables-scenario" job
16:43:56 <slaweq> as ovsfw will be covered by the default one, right?
16:44:09 <mlavalle> yeah, that's a good point
16:44:32 <mlavalle> we need to keep the non-default one alive
16:44:38 <slaweq> yep
16:44:58 <slaweq> so is there anyone who wants to do this change or should I assign it to myself?
16:45:19 <mlavalle> if you want to do it, go ahead
16:45:23 <slaweq> ok
16:45:50 <slaweq> #action slaweq to propose scenario job with iptables fw driver
16:46:20 <slaweq> ok, speaking about scenario jobs
16:46:46 <slaweq> we still have some of them with quite a high failure rate, non-voting at least :)
16:47:02 <slaweq> I checked the reasons for some of those failures
16:47:03 <slaweq> so:
16:47:11 <slaweq> neutron-tempest-multinode-full (non-voting):
16:47:27 <slaweq> failures in this one don't look related to neutron:
16:47:28 <slaweq> * Block migration failure: http://logs.openstack.org/85/570085/5/check/neutron-tempest-multinode-full/09bea41/logs/testr_results.html.gz
16:47:33 <slaweq> * Rebuild server: http://logs.openstack.org/32/564132/2/check/neutron-tempest-multinode-full/a5262a2/logs/testr_results.html.gz
16:47:47 <slaweq> I didn't find other failures in the last few days
16:48:07 <slaweq> job neutron-tempest-dvr-ha-multinode-full, example of failure:
16:48:08 <slaweq> * SSH to instance not available: http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/testr_results.html.gz
16:48:19 <slaweq> this one might be related to neutron
16:48:28 <mlavalle> I think it is
16:50:22 <slaweq> I found such errors in the l3 agent logs: http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/subnode-2/screen-q-l3.txt.gz?level=ERROR
16:50:30 <slaweq> is it known to You?
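Referring back to the firewall driver discussion above, a small sketch of how to check which driver a devstack node ended up with; the config path and option values here are assumptions based on the standard ml2 layout, not taken from the meeting:

    # Show the [securitygroup] section of the agent config (assumed standard devstack path).
    grep -A 3 '^\[securitygroup\]' /etc/neutron/plugins/ml2/ml2_conf.ini
    # Expect one of:
    #   firewall_driver = openvswitch       (ovsfw, the proposed new devstack default)
    #   firewall_driver = iptables_hybrid   (the driver the planned "iptables-scenario" job would keep covering)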
16:51:31 <mlavalle> no, not familiar with that
16:51:56 <slaweq> and in neutron server: http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/screen-q-svc.txt.gz?level=ERROR
16:52:12 <slaweq> but I don't even know if this is related to this failed test exactly
16:52:21 <haleyb> me neither, but i can guess it's the call to update the route and it not existing
16:55:02 <slaweq> can You maybe talk about it during the L3 meeting then?
16:55:10 <mlavalle> yes
16:55:18 <slaweq> maybe it will be familiar to someone from that team :)
16:55:18 <mlavalle> we will take a look
16:55:26 <slaweq> thx mlavalle
16:55:40 <mlavalle> slaweq: is there a bug filed for it?
16:55:52 <slaweq> no, I don't know about any
16:56:05 <mlavalle> ok, I'll file a bug
16:56:19 <slaweq> thx
16:56:39 <slaweq> ok, going quickly to the next topic as we are almost out of time
16:56:45 <slaweq> #topic Rally
16:57:07 <slaweq> I noticed today that a few days ago the rally job name was changed in: https://review.openstack.org/#/c/558037/
16:57:19 <slaweq> so we don't have stats from rally in grafana since then
16:57:42 <slaweq> today I sent a patch to fix that: https://review.openstack.org/#/c/570949/
16:57:50 <slaweq> so it should be good when it is merged
16:58:21 <slaweq> ok, moving quickly to the last topic :)
16:58:25 <slaweq> #topic Open discussion
16:58:42 <slaweq> I was asked today if we are planning to create a tag in the neutron_tempest_plugin repo
16:58:46 <slaweq> and if yes, then when
16:58:51 <slaweq> mlavalle: do You know?
16:59:05 <mlavalle> we can do it whenever it is needed
16:59:19 <mlavalle> how about next week with Rocky-2
16:59:25 <mlavalle> ?
16:59:28 <slaweq> would be great IMO
16:59:34 <slaweq> thx
16:59:43 <mlavalle> ok, will do it next week
16:59:47 <slaweq> thx mlavalle
16:59:48 <mlavalle> towards the end
16:59:53 <slaweq> and one last thing to mention
17:00:00 <slaweq> Following the session in Vancouver I started switching neutron projects to stestr as the test runner, the patches have a common topic. Please review them:
17:00:00 <slaweq> https://review.openstack.org/#/q/status:open+branch:master+topic:switch-to-stestr
17:00:07 <slaweq> ok, we are out of time
17:00:10 <slaweq> thx
17:00:12 <haleyb> mlavalle: sounds good. i can't remember if i was supposed to write up something in our docs to say our cadence for tags was every release
17:00:13 <mlavalle> Thanks!
17:00:13 <slaweq> #endmeeting