16:00:34 <slaweq> #startmeeting neutron_ci
16:00:34 <openstack> Meeting started Tue Aug 28 16:00:34 2018 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:35 <mlavalle> liuyulong: thanks for attending. Have a great night!
16:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:38 <slaweq> hello again :)
16:00:39 <njohnston> o/
16:00:39 <openstack> The meeting name has been set to 'neutron_ci'
16:00:41 <mlavalle> o/
16:00:58 <mlavalle> last leg of "Meetings Tuesday"
16:01:00 <manjeets> prior o/ was for CI .. lol
16:01:17 <slaweq> :)
16:01:28 <slaweq> short info
16:01:38 <haleyb> hi
16:01:40 <slaweq> I need to finish this meeting in 45 minutes
16:02:00 <slaweq> so mlavalle, either You will continue it after that or we will finish 15 minutes earlier today
16:02:07 <slaweq> ok for You?
16:02:17 <mlavalle> let's finish 15 min earlier
16:02:27 <slaweq> fine :)
16:02:31 <slaweq> #topic Actions from previous meetings
16:02:37 <slaweq> * njohnston to tweak stable branches dashboards
16:02:45 <njohnston> https://review.openstack.org/#/c/597168/
16:03:00 <njohnston> this brings the stable dashboards in line with your reformat of the main one
16:03:17 <njohnston> but with jobs dropped if they don't exist for the stable branches, like all the scenario jobs
16:03:22 <slaweq> njohnston: thx, I will review it soon
16:03:40 <njohnston> and I bumped the versions, so the old dashboard == stable/rocky and the older one == stable/queens now
16:04:17 <slaweq> that's good :)
16:04:31 <slaweq> thx njohnston
16:04:39 <slaweq> ok, next one was:
16:04:41 <slaweq> * slaweq to update grafana dashboard to ((FAILURE + TIME_OUT) / (FAILURE + TIME_OUT + SUCCESS))
16:04:46 <slaweq> Patch: https://review.openstack.org/595763
16:04:46 <njohnston> there were 9 jobs I excluded from the file because Graphite has no record of them for stable branches.  Not sure if that is interesting info, but I have the list if you like
16:05:12 <slaweq> sure, we can check this list together
16:06:35 <njohnston> I put the list in a comment on https://review.openstack.org/#/c/597168
16:07:09 <slaweq> ok, thx
16:07:17 <slaweq> I will check it also
16:07:31 <slaweq> thx for working on this njohnston
16:07:37 <njohnston> np
16:08:13 <slaweq> so getting back to the next action, which was: * slaweq to update grafana dashboard to ((FAILURE + TIME_OUT) / (FAILURE + TIME_OUT + SUCCESS))
16:08:31 <slaweq> I sent a patch, frickler found one issue there so I need to fix it
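For context, the dashboard change expresses that formula as a Graphite query. A minimal sketch of the idea, with illustrative series paths only — the real metric names come from the existing neutron-failure-rate dashboard definition:

    asPercent(
        sumSeries(stats_counts.zuul.pipeline.check.job.neutron-functional.{FAILURE,TIME_OUT}),
        sumSeries(stats_counts.zuul.pipeline.check.job.neutron-functional.{FAILURE,TIME_OUT,SUCCESS}))

i.e. (FAILURE + TIME_OUT) / (FAILURE + TIME_OUT + SUCCESS) per job, so runs that hit the job timeout are counted as failures instead of being dropped from the rate.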
16:08:51 <slaweq> moving on to the next action
16:08:56 <slaweq> * mlavalle to check neutron_tempest_plugin.scenario.test_dns_integration.DNSIntegrationTests.test_server_with_fip issue
16:09:03 <mlavalle> I did
16:09:46 <mlavalle> I found that the Nova API never reports that the instance got IP addresses on its port
16:10:09 <mlavalle> Neutron communicates the vif-plugged event correctly to compute
16:10:21 <slaweq> but I wonder why it happens so often in this job recently and not in other jobs
16:10:29 <slaweq> or maybe You spotted it in different jobs also?
16:10:34 <mlavalle> no
16:10:40 <mlavalle> that is a good question
16:10:50 <mlavalle> I will take a look again at other cases
16:11:21 <mlavalle> I left a good kibana query in the bug
16:11:27 <mlavalle> so it is easy to find other cases
16:11:28 <slaweq> I remember from my previous company that we had such issues somewhere around Juno IIRC
16:11:55 <slaweq> and we even patched nova-compute to check the port status directly in neutron before setting the instance to ERROR
16:12:10 <slaweq> but later I never saw it in newer versions
16:12:31 <slaweq> maybe in this scenario a different rabbitmq is used or something like that
16:12:32 <mlavalle> I'll check another case
16:12:46 <slaweq> mlavalle: ok, thx
16:13:08 <slaweq> #action mlavalle to check other cases of the failing neutron_tempest_plugin.scenario.test_dns_integration.DNSIntegrationTests.test_server_with_fip test
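As a side note on the workaround slaweq described above: the idea was to double-check the port state in Neutron before giving up on the vif-plugged event. A rough, hypothetical sketch of that kind of check — not the actual patch, and the helper name is made up:

    # Hypothetical illustration only: on a vif-plugged-event timeout, ask Neutron
    # for the port status before putting the instance into ERROR.
    from neutronclient.v2_0 import client as neutron_client

    def port_became_active(auth_kwargs, port_id):
        neutron = neutron_client.Client(**auth_kwargs)  # credentials omitted here
        port = neutron.show_port(port_id)['port']
        return port['status'] == 'ACTIVE'

If the port turns out to be ACTIVE despite the missed notification, the boot could be allowed to continue instead of erroring out.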
16:13:55 <slaweq> ok, that's all for actions from previous week
16:14:03 <slaweq> #topic Grafana
16:14:08 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:15:43 <slaweq> do You see anything You want to talk about first?
16:15:55 <mlavalle> I'll follow your lead
16:16:03 <slaweq> there is again a problem with neutron-tempest-plugin-dvr-multinode-scenario
16:16:13 <slaweq> it has been failing 100% of the time for the last few days
16:16:26 <slaweq> I found that it's always an issue with neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA
16:16:34 <slaweq> like:         * http://logs.openstack.org/37/382037/71/check/neutron-tempest-plugin-dvr-multinode-scenario/605ed17/logs/testr_results.html.gz
16:16:59 <slaweq> I reported bug for that https://bugs.launchpad.net/neutron/+bug/1789434 today
16:17:00 <openstack> Launchpad bug 1789434 in neutron "neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA failing 100% times" [High,Confirmed]
16:17:33 <slaweq> it looks like it's somehow related to my patch: https://review.openstack.org/#/c/589410/
16:17:57 <slaweq> but this test was fine on that patch, so probably something happened later that makes it fail now
16:18:48 <slaweq> any volunteer to check that?
16:18:53 <mlavalle> o/
16:19:10 <slaweq> thx mlavalle :)
16:19:29 <haleyb> <5 seconds to volunteer :)
16:19:42 <mlavalle> haleyb: do you want it?
16:19:51 <slaweq> #action mlavalle to check failing router migration from DVR tests
16:20:12 <slaweq> should I assign it to haleyb?
16:20:23 <mlavalle> not unless he explicitly wants it
16:20:32 <mlavalle> otherwise, I'll take it
16:20:38 <haleyb> mlavalle: no, feel free
16:20:47 <mlavalle> haleyb: ack
16:20:50 <slaweq> ok, so we have the winner :)
16:20:52 <slaweq> thx mlavalle
16:20:57 <mlavalle> yaaay!!!!
16:21:02 <mlavalle> I won!!!!
16:21:02 <slaweq> LOL
16:21:53 <slaweq> ok, so let's continue with scenario jobs then
16:22:20 <slaweq> another scenario job which is failing quite often is the designate job which we already talked about
16:22:39 <slaweq> and I also found timeouts a couple of times in neutron-tempest-plugin-scenario-linuxbridge
16:22:47 <slaweq> * http://logs.openstack.org/59/596959/1/check/neutron-tempest-plugin-scenario-linuxbridge/f62f1c6/job-output.txt.gz
16:22:49 <slaweq> * http://logs.openstack.org/18/591818/3/check/neutron-tempest-plugin-scenario-linuxbridge/212183e/job-output.txt.gz
16:22:51 <slaweq> * http://logs.openstack.org/34/596634/1/check/neutron-tempest-plugin-scenario-linuxbridge/098b6f3/job-output.txt.gz
16:26:09 <slaweq> there is virt_type=kvm set but it shouldn't be a problem if it's supported by the host
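(For reference, whether a test node can actually run with virt_type=kvm can be checked with "egrep -c '(vmx|svm)' /proc/cpuinfo" — a non-zero count means the CPU exposes VT extensions — and by the presence of /dev/kvm; if KVM is not available, libvirt guests configured with virt_type=kvm fail to start rather than silently falling back to qemu.)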
16:26:50 <slaweq> is there any volunteer to check why there are such timeouts?
16:27:06 <slaweq> if not, I will report it as a bug and take a look when I have some time
16:27:14 <slaweq> but currently I'm quite overloaded :/
16:28:40 <mlavalle> I am also a bit overloaded
16:29:03 <mlavalle> if nobody else steps up and you are patient with me, then sign me up
16:29:11 <slaweq> mlavalle: thx
16:29:23 <mlavalle> I might not get to it until next week
16:29:30 <slaweq> I will report it as a bug for now and we will see who has some cycles to check it
16:29:35 <slaweq> fine for You?
16:30:00 <haleyb> yes, report as bug and i can look at logs at least
16:30:09 <slaweq> #action slaweq to report a bug about timeouts in neutron-tempest-plugin-scenario-linuxbridge
16:30:12 <slaweq> haleyb: thx
16:30:30 <slaweq> I will send a link to the bug report when I report it
16:30:34 <haleyb> or just recheck, recheck, recheck...
16:30:53 <slaweq> yes, for now it's kind of a workaround but... :)
16:31:04 <mlavalle> sounds good slaweq
16:31:16 <slaweq> thx guys
16:31:21 <slaweq> ok
16:31:36 <slaweq> those were the most frequent scenario job failures I found last week
16:31:54 <slaweq> so let's move on
16:31:59 <slaweq> #topic functional
16:32:19 <slaweq> FYI: We have fixes almost merged for failing functional tests in stable branches, bug: https://bugs.launchpad.net/neutron/+bug/1788185
16:32:19 <openstack> Launchpad bug 1788185 in neutron "[Stable/Queens] Functional tests neutron.tests.functional.agent.l3.test_ha_router failing 100% times " [Critical,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:32:42 <slaweq> basically this issue is caused by the keepalived version which is now in the Xenial repo
16:33:29 <slaweq> I was talking today with frickler and coreycb about it
16:33:47 <slaweq> I will try to prepare a small reproducer and add it to the keepalived bug report
16:34:00 <slaweq> but we should be good using an older version of keepalived for now
16:34:14 <mlavalle> ok
16:34:28 <mlavalle> we merged the patches yesterday, right?
16:34:42 <slaweq> for Queens it's merged
16:34:51 <slaweq> for Pike and Ocata I have to recheck it
16:35:08 <mlavalle> ok
16:35:40 <slaweq> the other issue with functional tests is that we still hit https://bugs.launchpad.net/neutron/+bug/1784836 from time to time
16:35:40 <openstack> Launchpad bug 1784836 in neutron "Functional tests from neutron.tests.functional.db.migrations fails randomly" [Medium,Confirmed]
16:35:48 <slaweq> e.g. in http://logs.openstack.org/18/591818/3/check/neutron-functional/ddb3327/logs/testr_results.html.gz
16:36:18 <slaweq> it doesn't happen very often, but maybe someone with good db experience could take a look at it
16:37:03 <slaweq> mlavalle: do You know who we could potentially ask to look at this one?
16:37:20 <mlavalle> I would ask Mike Bayer
16:37:28 <slaweq> thx mlavalle
16:37:55 <slaweq> #action mlavalle to ask Mike Bayer about functional db migration tests failures
16:38:24 <slaweq> anything else to add related to functional tests?
16:38:30 <mlavalle> not from me
16:39:00 <slaweq> ok, so let's move to next topic then
16:39:05 <slaweq> #topic Fullstack
16:39:21 <slaweq> speaking about fullstack, I have only one thing
16:39:31 <slaweq> we still have quite a lot of failures, but all (or almost all) are caused by https://bugs.launchpad.net/neutron/+bug/1779328 and I still don't know why it happens
16:39:31 <openstack> Launchpad bug 1779328 in neutron "Fullstack tests neutron.tests.fullstack.test_securitygroup.TestSecurityGroupsSameNetwork fails" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:39:56 <slaweq> maybe we should mark this test as unstable again to make our life easier?
16:40:05 <slaweq> what do You think about it?
16:40:35 <mlavalle> yes, let's do it
16:40:46 <slaweq> ok, I will do it then
16:40:51 <njohnston> do we have a rule in elastic-recheck for that failure?
16:41:10 <slaweq> #action slaweq to mark fullstack security group test as unstable again
16:41:15 <slaweq> njohnston: I don't think so
16:42:21 <slaweq> njohnston: do You think we should add it there also?
16:42:38 <slaweq> if we mark it as unstable, it will simply be skipped instead of failing
16:42:44 <mlavalle> yeah
16:43:41 <njohnston> I never think it's a bad idea to look at elastic-recheck rules, but then again it is hard to find cores to +2 them these days :-)
16:44:08 <slaweq> yeah, so maybe let's just mark it in our repo as unstable for now
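For reference, marking a test as unstable in the neutron tree is done with the unstable_test helper from neutron.tests.base, if I recall correctly — it runs the test and converts a failure into a skip that references the bug. A minimal sketch of what the fullstack change could look like; the base class and test method shown here are stand-ins, not the real fullstack code:

    from neutron.tests import base

    class TestSecurityGroupsSameNetwork(base.BaseTestCase):  # stand-in base class

        @base.unstable_test("bug 1779328")
        def test_securitygroup(self):
            ...  # existing test body unchanged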
16:44:19 <slaweq> ok, that was all I had for today
16:44:30 <slaweq> and I need to end this meeting right now :)
16:44:35 <slaweq> perfect timing
16:44:38 <mlavalle> Thanks!
16:44:41 <slaweq> thx guys for attending
16:44:45 <njohnston> thanks!
16:44:46 <slaweq> and see You next week
16:44:47 <mlavalle> o/
16:44:51 <slaweq> #endmeeting