16:00:04 <slaweq> #startmeeting neutron_ci
16:00:05 <openstack> Meeting started Tue Jun 5 16:00:04 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:09 <openstack> The meeting name has been set to 'neutron_ci'
16:00:18 <slaweq> hello on yet another meeting today :)
16:00:46 <haleyb> hi
16:01:31 <mlavalle> hi there
16:01:34 <slaweq> hi haleyb
16:01:38 <slaweq> and hi mlavalle again :)
16:01:55 <slaweq> I think we can start
16:02:01 <slaweq> #topic Actions from previous meetings
16:02:13 <slaweq> slaweq continue debugging fullstack security groups issue: https://bugs.launchpad.net/neutron/+bug/1767829
16:02:14 <openstack> Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:02:48 <slaweq> I couldn’t reproduce this one locally but it looks from logs that it might be same issue as other bug and patch https://review.openstack.org/#/c/572295/ might help for this
16:03:01 <slaweq> and it is related to the next action:
16:03:06 <slaweq> slaweq will check fullstack multiple sg test failure: https://bugs.launchpad.net/neutron/+bug/1774006
16:03:07 <openstack> Launchpad bug 1774006 in neutron "Fullstack security group test fails on _test_using_multiple_security_groups" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:03:28 <slaweq> it looks for me that both have got same reason
16:03:39 <slaweq> patch for it is proposed: https://review.openstack.org/#/c/572295/
16:03:48 <mlavalle> this last one was mentioned by boden in his deputy email, right?
16:04:13 <slaweq> yes
16:04:18 <slaweq> that's this one :)
16:04:23 <mlavalle> cool
16:04:31 <slaweq> please review it soon as fullstack is on high failure rate because of it
16:05:00 <mlavalle> ack
16:05:04 <slaweq> last action from last week was:
16:05:04 <slaweq> slaweq to propose scenario job with iptables fw driver
16:06:18 <slaweq_> sorry I was disconnected
16:06:50 <slaweq_> so last action from previous week was
16:06:54 <slaweq_> slaweq to propose scenario job with iptables fw driver
16:07:03 <slaweq_> I did patch for neutron:
16:07:04 <slaweq_> Patch for neutron https://review.openstack.org/#/c/571692/
16:07:09 <slaweq_> and to update grafana:
16:07:20 <slaweq_> Patch for grafana: https://review.openstack.org/572386
16:07:51 <mlavalle> I +2ed the Neutron patch yesterday
16:08:12 <mlavalle> I assumed haleyb might want to look at it. That's why didn't W+
16:08:18 <slaweq_> thx
16:08:31 <haleyb> mlavalle: i will look now/right after meeting
16:08:35 <slaweq_> it is still waiting for patch in devstack repo
16:08:43 <slaweq_> so no rush with this one :)
16:09:13 <slaweq_> ok, next topic then
16:09:14 <slaweq_> #topic Grafana
16:09:20 <slaweq_> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:09:33 <slaweq> #topic Grafana
16:09:36 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:10:57 <slaweq> so looking at grafana it looks that fullstack is in bad condition recently
16:11:06 <mlavalle> yeah
16:11:14 <mlavalle> I saw a timeout earlier today
16:11:25 <slaweq> what timeout?
16:11:52 <mlavalle> http://logs.openstack.org/59/572159/2/check/neutron-fullstack/630967b/job-output.txt.gz
16:13:23 <mlavalle> and also, over the weekend, several of my multiple port binding patches failed fullstack with https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/test_l3_agent.py#L325
16:14:05 <slaweq> this one which You sent might be related to patch on which it was running: http://logs.openstack.org/59/572159/2/check/neutron-fullstack/630967b/logs/testr_results.html.gz
16:14:24 <slaweq> so let's talk about fullstack now :)
16:14:29 <mlavalle> ok
16:14:31 <slaweq> #topic Fullstack
16:14:47 <mlavalle> btw, did you notice the Microsoft logo in github?
16:14:47 <slaweq> basically what I found today is that we have two main reasons of failures
16:15:13 <slaweq> Security groups failure, two bugs mentioned before reported - so should be fixed with my patch
16:15:21 * slaweq looking at github
16:15:42 <haleyb> mlavalle: we will be assimilated and all be writing win10 drivers soon :)
16:15:51 <slaweq> LOL
16:16:18 <mlavalle> oh joy, Visual Basic... yaay!
16:16:46 <slaweq> and we will have to rewrite openstack to C# :P
16:17:09 <slaweq> ok, getting back to fullstack now :)
16:17:31 <slaweq> there is also second issue often:
16:17:31 <slaweq> Issue with test_ha_router_restart_agents_no_packet_lost
16:17:41 <slaweq> Bug is reported https://bugs.launchpad.net/neutron/+bug/1775183 and I will investigate it this week
16:17:42 <openstack> Launchpad bug 1775183 in neutron "Fullstack test neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost fails often" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:17:44 <mlavalle> cool
16:17:50 <slaweq> example of failure: https://review.openstack.org/#/c/569083/
16:18:08 <slaweq> sorry, ^^ is link to patch which introduced this failing test
16:18:22 <slaweq> example of failure is in bug report
16:18:50 <slaweq> I don't know if my patch which disabled ipv6 forwarding is not working good or if test is broken
16:18:58 <slaweq> I will debug it this week
16:19:16 <slaweq> if I will not find anything for 1 or 2 days I will send patch to mark this test as unstable
16:19:18 <slaweq> ok?
16:19:21 <mlavalle> ah ok, so no need to worry bout from the pov of my patches
16:20:18 <mlavalle> bte, looking at that failure prompted me to propose this change: https://review.openstack.org/#/c/572159/
16:20:23 <mlavalle> btw^^^^
16:20:53 <slaweq> thx mlavalle :)
16:21:12 <mlavalle> because now we are restarting other agents, not only L2
16:22:02 <slaweq> but as You can see in http://logs.openstack.org/59/572159/2/check/neutron-fullstack/630967b/logs/testr_results.html.gz there is no process_name attribute there :)
16:22:22 <slaweq> so this failure is because of You :P
16:23:05 <mlavalle> hang on, let me look at something?
16:23:16 <slaweq> #action slaweq to debug failing test_ha_router_restart_agents_no_packet_lost fullstack test
16:23:23 <slaweq> mlavalle: sure
16:25:12 <mlavalle> slaweq: here https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/base.py#L91 we call the agent restart method, right?
16:25:45 <slaweq> right
16:25:55 <mlavalle> which I think is this one: https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/resources/process.py#L84
16:26:22 <mlavalle> am I wrong?
16:27:10 <slaweq> not exactly
16:27:18 <slaweq> it's OVSAgentFixture
16:27:26 <slaweq> which inherits from ServiceFixture
16:27:36 <slaweq> and this one not inherits from ProcessFixture
16:27:42 <slaweq> (don't ask me why)
16:27:44 <mlavalle> ahhh, ok
16:28:00 <slaweq> I have no idea if that is intentional of maybe simple bug
16:28:10 <mlavalle> so it is not going to have process_name attribute
16:28:29 <slaweq> yes, it's intentional
16:28:34 <slaweq> so You should do something like
16:28:43 <slaweq> agent.process_fixture.process_name
16:28:47 <slaweq> and should be good
16:28:56 <mlavalle> ok, thanks
16:28:59 <slaweq> probably :)
16:29:00 <mlavalle> will try that
16:29:07 <mlavalle> let's give it a try
16:29:16 <slaweq> ok
16:29:40 <slaweq> so that's all about fullstack tests what I have for today
16:29:46 <mlavalle> cool
16:29:47 <slaweq> anything else to add?
16:30:00 <mlavalle> not from me
16:30:08 <slaweq> ok
16:30:37 <mlavalle> so weird to see the Microsoft logo all the time
16:30:49 <slaweq> I think that scenario jobs and rally and in (surprisingly) good shape now
16:30:53 <slaweq> mlavalle: LOL :)
16:31:06 <slaweq> but where You see this logo?
16:31:21 <slaweq> I have normal github logo at top of page
16:31:23 <mlavalle> go to the top of https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/resources/process.py#L84
16:31:35 <mlavalle> every github page has it now
16:31:42 <slaweq> I have still github logo
16:32:06 <slaweq> so maybe it's not changed around all CDN's nodes which they use
16:32:18 <mlavalle> are you signed in?
16:32:24 <slaweq> yes
16:32:36 <mlavalle> then is is the CDN's
16:33:09 <slaweq> maybe :)
16:33:20 <slaweq> ok, going back to meeting ;)
16:33:39 <slaweq> rally job is on 0% of failure recently
16:33:55 <slaweq> I don't know what happened that it's like that but I'm happy with it :)
16:34:05 <mlavalle> don't jinx it
16:34:11 <mlavalle> don't event mention it
16:34:17 <mlavalle> even^^^
16:34:18 <slaweq> LOL
16:34:40 <slaweq> ok, I will not do it anymore
16:34:47 <slaweq> so what I want to talk about is:
16:34:52 <slaweq> #topic Periodic
16:35:08 <slaweq> I found that many periodic jobs were failed during last week
16:35:21 <slaweq> examples:
16:35:22 <slaweq> * openstack-tox-py27-with-oslo-master - issue http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py27-with-oslo-master/031dc64/testr_results.html.gz
16:35:30 <slaweq> * openstack-tox-py35-with-neutron-lib-master - failure from same reason as openstack-tox-py27-with-oslo-master - http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-neutron-lib-master/4f4b599/testr_results.html.gz
16:35:36 <slaweq> * openstack-tox-py35-with-oslo-master - today failure from same reason: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-oslo-master/348faa8/testr_results.html.gz
16:35:45 <slaweq> all those jobs were failing with same reason
16:36:02 <slaweq> I think that I also saw it somewhere in unit tests
16:36:17 <slaweq> and I think that we should report a bug and someone should take a look on it
16:36:25 <slaweq> did You saw it before?
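
A minimal sketch of the fixture layering discussed above, using only the names mentioned in the log (OVSAgentFixture, ServiceFixture, ProcessFixture, process_fixture, process_name, and a restart method); the class bodies, the process name string, and the way the process fixture is attached are illustrative assumptions, not the actual neutron code:

    # Illustrative sketch only -- the real classes live in
    # neutron/tests/fullstack/resources/process.py and are more involved.
    import fixtures


    class ProcessFixture(fixtures.Fixture):
        """Wraps a single agent process and knows its own process_name."""

        def __init__(self, process_name):
            super().__init__()
            self.process_name = process_name

        def restart(self):
            # In the real fixture this stops and re-spawns the agent process.
            pass


    class ServiceFixture(fixtures.Fixture):
        """Base class for agent fixtures.

        Note it does NOT inherit from ProcessFixture, so agent fixtures have
        no process_name attribute of their own.
        """


    class OVSAgentFixture(ServiceFixture):
        def _setUp(self):
            # The fixture that actually owns the process is kept as an
            # attribute; 'neutron-openvswitch-agent' is an assumed value.
            self.process_fixture = self.useFixture(
                ProcessFixture(process_name='neutron-openvswitch-agent'))


    # In a test, the process name is therefore reached indirectly:
    #     agent = self.useFixture(OVSAgentFixture())
    #     name = agent.process_fixture.process_name
    #     agent.process_fixture.restart()

With that layering, code that restarts an agent or reads its name has to go through agent.process_fixture, which matches the agent.process_fixture.process_name suggestion above.
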
16:36:44 <mlavalle> Yes' Ive seen it a couple of cases in the check queue
16:36:54 <mlavalle> one of them with py35
16:37:47 <slaweq> so that would explain why UT on grafana are also spiking to quite high values from time to time - I bet that it's probably this issue mostly
16:37:52 <mlavalle> if you file the bug, I'll bring it up on Thursday during the OVO meeting
16:38:18 <slaweq> ok, I will file a bug just after meeting and will send it to You
16:38:19 <slaweq> thx
16:38:24 <mlavalle> yeap
16:38:53 <slaweq> #action mlavalle to talk about issue on unit tests on OVO meeting
16:39:46 <slaweq> other periodic jobs' failures are not related to neutron but to some problems with volumes
16:40:01 <slaweq> so that was all what I have for today in my notes
16:40:07 <slaweq> #topic Open discussion
16:40:13 <slaweq> anything else to add/ask?
16:40:22 <mlavalle> not from me
16:40:45 <slaweq> haleyb do You have anything You want to talk?
16:41:07 <slaweq> if not then we can finish earlier today
16:41:08 <haleyb> no, don't think i have any ci issues
16:41:21 * slaweq is really fast today ;)
16:41:30 <slaweq> ok, so thx for attending
16:41:33 <slaweq> bye
16:41:36 <slaweq> #endmeeting