16:00:04 <slaweq> #startmeeting neutron_ci
16:00:05 <openstack> Meeting started Tue Jun 5 16:00:04 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:09 <openstack> The meeting name has been set to 'neutron_ci'
16:00:18 <slaweq> hello on yet another meeting today :)
16:00:46 <haleyb> hi
16:01:31 <mlavalle> hi there
16:01:34 <slaweq> hi haleyb
16:01:38 <slaweq> and hi mlavalle again :)
16:01:55 <slaweq> I think we can start
16:02:01 <slaweq> #topic Actions from previous meetings
16:02:13 <slaweq> slaweq continue debugging fullstack security groups issue: https://bugs.launchpad.net/neutron/+bug/1767829
16:02:14 <openstack> Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:02:48 <slaweq> I couldn’t reproduce this one locally but it looks from logs that it might be same issue as other bug and patch https://review.openstack.org/#/c/572295/ might help for this
16:03:01 <slaweq> and it is related to the next action:
16:03:06 <slaweq> slaweq will check fullstack multiple sg test failure: https://bugs.launchpad.net/neutron/+bug/1774006
16:03:07 <openstack> Launchpad bug 1774006 in neutron "Fullstack security group test fails on _test_using_multiple_security_groups" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:03:28 <slaweq> it looks for me that both have got same reason
16:03:39 <slaweq> patch for it is proposed: https://review.openstack.org/#/c/572295/
16:03:48 <mlavalle> this last one was mentioned by boden in his deputy email, right?
16:04:13 <slaweq> yes
16:04:18 <slaweq> that's this one :)
16:04:23 <mlavalle> cool
16:04:31 <slaweq> please review it soon as fullstack is on high failure rate because of it
16:05:00 <mlavalle> ack
16:05:04 <slaweq> last action from last week was:
16:05:04 <slaweq> slaweq to propose scenario job with iptables fw driver
16:06:18 <slaweq_> sorry I was disconnected
16:06:50 <slaweq_> so last action from previous week was
16:06:54 <slaweq_> slaweq to propose scenario job with iptables fw driver
16:07:03 <slaweq_> I did patch for neutron:
16:07:04 <slaweq_> Patch for neutron https://review.openstack.org/#/c/571692/
16:07:09 <slaweq_> and to update grafana:
16:07:20 <slaweq_> Patch for grafana: https://review.openstack.org/572386
16:07:51 <mlavalle> I +2ed the Neutron patch yesterday
16:08:12 <mlavalle> I assumed haleyb might want to look at it. That's why didn't W+
16:08:18 <slaweq_> thx
16:08:31 <haleyb> mlavalle: i will look now/right after meeting
16:08:35 <slaweq_> it is still waiting for patch in devstack repo
16:08:43 <slaweq_> so no rush with this one :)
16:09:13 <slaweq_> ok, next topic then
16:09:14 <slaweq_> #topic Grafana
16:09:20 <slaweq_> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:09:33 <slaweq> #topic Grafana
16:09:36 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:10:57 <slaweq> so looking at grafana it looks that fullstack is in bad condition recently
16:11:06 <mlavalle> yeah
16:11:14 <mlavalle> I saw a timeout earlier today
16:11:25 <slaweq> what timeout?
16:11:52 <mlavalle> http://logs.openstack.org/59/572159/2/check/neutron-fullstack/630967b/job-output.txt.gz
16:13:23 <mlavalle> and also, over the weekend, several of my multiple port binding patches failed fullstack with https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/test_l3_agent.py#L325
16:14:05 <slaweq> this one which You sent might be related to patch on which it was running: http://logs.openstack.org/59/572159/2/check/neutron-fullstack/630967b/logs/testr_results.html.gz
16:14:24 <slaweq> so let's talk about fullstack now :)
16:14:29 <mlavalle> ok
16:14:31 <slaweq> #topic Fullstack
16:14:47 <mlavalle> btw, did you notice the Microsoft logo in github?
16:14:47 <slaweq> basically what I found today is that we have two main reasons of failures
16:15:13 <slaweq> Security groups failure, two bugs mentioned before reported - so should be fixed with my patch
16:15:21 * slaweq looking at github
16:15:42 <haleyb> mlavalle: we will be assimilated and all be writing win10 drivers soon :)
16:15:51 <slaweq> LOL
16:16:18 <mlavalle> oh joy, Visual Basic... yaay!
16:16:46 <slaweq> and we will have to rewrite openstack to C# :P
16:17:09 <slaweq> ok, getting back to fullstack now :)
16:17:31 <slaweq> there is also second issue often:
16:17:31 <slaweq> Issue with test_ha_router_restart_agents_no_packet_lost
16:17:41 <slaweq> Bug is reported https://bugs.launchpad.net/neutron/+bug/1775183 and I will investigate it this week
16:17:42 <openstack> Launchpad bug 1775183 in neutron "Fullstack test neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost fails often" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:17:44 <mlavalle> cool
16:17:50 <slaweq> example of failure: https://review.openstack.org/#/c/569083/
16:18:08 <slaweq> sorry, ^^ is link to patch which introduced this failing test
16:18:22 <slaweq> example of failure is in bug report
16:18:50 <slaweq> I don't know if my patch which disabled ipv6 forwarding is not working good or if test is broken
16:18:58 <slaweq> I will debug it this week
16:19:16 <slaweq> if I will not find anything for 1 or 2 days I will send patch to mark this test as unstable
16:19:18 <slaweq> ok?
16:19:21 <mlavalle> ah ok, so no need to worry bout from the pov of my patches
16:20:18 <mlavalle> bte, looking at that failure prompted me to propose this change: https://review.openstack.org/#/c/572159/
16:20:23 <mlavalle> btw^^^^
16:20:53 <slaweq> thx mlavalle :)
16:21:12 <mlavalle> because now we are restarting other agents, not only L2
16:22:02 <slaweq> but as You can see in http://logs.openstack.org/59/572159/2/check/neutron-fullstack/630967b/logs/testr_results.html.gz there is no process_name attribute there :)
16:22:22 <slaweq> so this failure is because of You :P
16:23:05 <mlavalle> hang on, let me look at something?
16:23:16 <slaweq> #action slaweq to debug failing test_ha_router_restart_agents_no_packet_lost fullstack test
16:23:23 <slaweq> mlavalle: sure
16:25:12 <mlavalle> slaweq: here https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/base.py#L91 we call the agent restart method, right?
16:25:45 <slaweq> right
16:25:55 <mlavalle> which I think is this one: https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/resources/process.py#L84
16:26:22 <mlavalle> am I wrong?
16:27:10 <slaweq> not exactly
16:27:18 <slaweq> it's OVSAgentFixture
16:27:26 <slaweq> which inherits from ServiceFixture
16:27:36 <slaweq> and this one not inherits from ProcessFixture
16:27:42 <slaweq> (don't ask me why)
16:27:44 <mlavalle> ahhh, ok
16:28:00 <slaweq> I have no idea if that is intentional of maybe simple bug
16:28:10 <mlavalle> so it is not going to have process_name attribute
16:28:29 <slaweq> yes, it's intentional
16:28:34 <slaweq> so You should do something like
16:28:43 <slaweq> agent.process_fixture.process_name
16:28:47 <slaweq> and should be good
16:28:56 <mlavalle> ok, thanks
16:28:59 <slaweq> probably :)
16:29:00 <mlavalle> will try that
16:29:07 <mlavalle> let's give it a try
16:29:16 <slaweq> ok
16:29:40 <slaweq> so that's all about fullstack tests what I have for today
16:29:46 <mlavalle> cool
16:29:47 <slaweq> anything else to add?
16:30:00 <mlavalle> not from me
16:30:08 <slaweq> ok
16:30:37 <mlavalle> so weird to see the Microsoft logo all the time
16:30:49 <slaweq> I think that scenario jobs and rally and in (surprisingly) good shape now
16:30:53 <slaweq> mlavalle: LOL :)
16:31:06 <slaweq> but where You see this logo?
16:31:21 <slaweq> I have normal github logo at top of page
16:31:23 <mlavalle> go to the top of https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/resources/process.py#L84
16:31:35 <mlavalle> every github page has it now
16:31:42 <slaweq> I have still github logo
16:32:06 <slaweq> so maybe it's not changed around all CDN's nodes which they use
16:32:18 <mlavalle> are you signed in?
16:32:24 <slaweq> yes
16:32:36 <mlavalle> then is is the CDN's
16:33:09 <slaweq> maybe :)
16:33:20 <slaweq> ok, going back to meeting ;)
16:33:39 <slaweq> rally job is on 0% of failure recently
16:33:55 <slaweq> I don't know what happened that it's like that but I'm happy with it :)
16:34:05 <mlavalle> don't jinx it
16:34:11 <mlavalle> don't event mention it
16:34:17 <mlavalle> even^^^
16:34:18 <slaweq> LOL
16:34:40 <slaweq> ok, I will not do it anymore
16:34:47 <slaweq> so what I want to talk about is:
16:34:52 <slaweq> #topic Periodic
16:35:08 <slaweq> I found that many periodic jobs were failed during last week
16:35:21 <slaweq> examples:
16:35:22 <slaweq> * openstack-tox-py27-with-oslo-master - issue http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py27-with-oslo-master/031dc64/testr_results.html.gz
16:35:30 <slaweq> * openstack-tox-py35-with-neutron-lib-master - failure from same reason as openstack-tox-py27-with-oslo-master - http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-neutron-lib-master/4f4b599/testr_results.html.gz
16:35:36 <slaweq> * openstack-tox-py35-with-oslo-master - today failure from same reason: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-oslo-master/348faa8/testr_results.html.gz
16:35:45 <slaweq> all those jobs were failing with same reason
16:36:02 <slaweq> I think that I also saw it somewhere in unit tests
16:36:17 <slaweq> and I think that we should report a bug and someone should take a look on it
16:36:25 <slaweq> did You saw it before?
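
A minimal sketch of the fixture layering discussed above, using only the names mentioned in the log (OVSAgentFixture, ServiceFixture, ProcessFixture, process_fixture, process_name, and a restart method); the class bodies, the process name string, and the way the process fixture is attached are illustrative assumptions, not the actual neutron code:

    # Illustrative sketch only -- the real classes live in
    # neutron/tests/fullstack/resources/process.py and are more involved.
    import fixtures


    class ProcessFixture(fixtures.Fixture):
        """Wraps a single agent process and knows its own process_name."""

        def __init__(self, process_name):
            super().__init__()
            self.process_name = process_name

        def restart(self):
            # In the real fixture this stops and re-spawns the agent process.
            pass


    class ServiceFixture(fixtures.Fixture):
        """Base class for agent fixtures.

        Note it does NOT inherit from ProcessFixture, so agent fixtures have
        no process_name attribute of their own.
        """


    class OVSAgentFixture(ServiceFixture):
        def _setUp(self):
            # The fixture that actually owns the process is kept as an
            # attribute; 'neutron-openvswitch-agent' is an assumed value.
            self.process_fixture = self.useFixture(
                ProcessFixture(process_name='neutron-openvswitch-agent'))


    # In a test, the process name is therefore reached indirectly:
    #     agent = self.useFixture(OVSAgentFixture())
    #     name = agent.process_fixture.process_name
    #     agent.process_fixture.restart()

With that layering, code that restarts an agent or reads its name has to go through agent.process_fixture, which matches the agent.process_fixture.process_name suggestion above.
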
16:36:44 <mlavalle> Yes' Ive seen it a couple of cases in the check queue
16:36:54 <mlavalle> one of them with py35
16:37:47 <slaweq> so that would explain why UT on grafana are also spiking to quite high values from time to time - I bet that it's probably this issue mostly
16:37:52 <mlavalle> if you file the bug, I'll bring it up on Thursday during the OVO meeting
16:38:18 <slaweq> ok, I will file a bug just after meeting and will send it to You
16:38:19 <slaweq> thx
16:38:24 <mlavalle> yeap
16:38:53 <slaweq> #action mlavalle to talk about issue on unit tests on OVO meeting
16:39:46 <slaweq> other periodic jobs' failures are not related to neutron but to some problems with volumes
16:40:01 <slaweq> so that was all what I have for today in my notes
16:40:07 <slaweq> #topic Open discussion
16:40:13 <slaweq> anything else to add/ask?
16:40:22 <mlavalle> not from me
16:40:45 <slaweq> haleyb do You have anything You want to talk?
16:41:07 <slaweq> if not then we can finish earlier today
16:41:08 <haleyb> no, don't think i have any ci issues
16:41:21 * slaweq is really fast today ;)
16:41:30 <slaweq> ok, so thx for attending
16:41:33 <slaweq> bye
16:41:36 <slaweq> #endmeeting