16:00:34 <slaweq> #startmeeting neutron_ci
16:00:34 <openstack> Meeting started Tue Aug 28 16:00:34 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:35 <mlavalle> liuyulong: thanks for attending. Have a great night!
16:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:38 <slaweq> hello again :)
16:00:39 <njohnston> o/
16:00:39 <openstack> The meeting name has been set to 'neutron_ci'
16:00:41 <mlavalle> o/
16:00:58 <mlavalle> last leg of "Meetings Tuesday"
16:01:00 <manjeets> prior o/ was for CI .. lol
16:01:17 <slaweq> :)
16:01:28 <slaweq> short info
16:01:38 <haleyb> hi
16:01:40 <slaweq> I need to finish this meeting in 45 minutes
16:02:00 <slaweq> so mlavalle You will continue it after that or we will finish 15 minutes earlier today
16:02:07 <slaweq> ok for You?
16:02:17 <mlavalle> let's finish 15 min earlier
16:02:27 <slaweq> fine :)
16:02:31 <slaweq> #topic Actions from previous meetings
16:02:37 <slaweq> * njohnston to tweak stable branches dashboards
16:02:45 <njohnston> https://review.openstack.org/#/c/597168/
16:03:00 <njohnston> this brings the stable dashboards in line with your reformat of the main one
16:03:17 <njohnston> but with jobs dropped if they don't exist for the stable branches, like all the scenario jobs
16:03:22 <slaweq> njohnston: thx, I will review it soon
16:03:40 <njohnston> and I bump the versions, so old dashboard == stable/rocky and older == stable/queens now
16:04:17 <slaweq> that's good :)
16:04:31 <slaweq> thx njohnston
16:04:39 <slaweq> ok, next one was:
16:04:41 <slaweq> * slaweq to update grafana dashboard to ((FAILURE + TIME_OUT) / (FAILURE + TIME_OUT + SUCCESS))
16:04:46 <slaweq> Patch: https://review.openstack.org/595763
16:04:46 <njohnston> there were 9 jobs I excluded from the file because Graphite has no record of them for stable branches. Not sure if that is interesting info, but I have the list if you like
16:05:12 <slaweq> sure, we can check this list together
16:06:35 <njohnston> I put the list in a comment on https://review.openstack.org/#/c/597168
16:07:09 <slaweq> ok, thx
16:07:17 <slaweq> I will check it also
16:07:31 <slaweq> thx for working on this njohnston
16:07:37 <njohnston> np
16:08:13 <slaweq> so getting back to the next action, which was: * slaweq to update grafana dashboard to ((FAILURE + TIME_OUT) / (FAILURE + TIME_OUT + SUCCESS))
16:08:31 <slaweq> I sent a patch, frickler found one issue there so I need to fix it
16:08:51 <slaweq> moving on to the next action
16:08:56 <slaweq> * mlavalle to check neutron_tempest_plugin.scenario.test_dns_integration.DNSIntegrationTests.test_server_with_fip issue
16:09:03 <mlavalle> I did
16:09:46 <mlavalle> I found that the Nova API never reports that the instance got IP addresses on the port
16:10:09 <mlavalle> Neutron communicates the vif plugged event correctly to compute
16:10:21 <slaweq> but I wonder why it happens so often in this job recently and not in the others
16:10:29 <slaweq> or maybe You spotted it in different jobs also?
16:10:34 <mlavalle> no
16:10:40 <mlavalle> that is a good question
16:10:50 <mlavalle> I will take a look again with another case
16:11:21 <mlavalle> I left a good kibana query in the bug
16:11:27 <mlavalle> so it is easy to find other cases
16:11:28 <slaweq> I remember from my previous company that we had such issues somewhere around Juno IIRC
16:11:55 <slaweq> and we even patched nova-compute to check the status of the port directly in neutron before setting the instance to ERROR
16:12:10 <slaweq> but later I never saw it in newer versions
16:12:31 <slaweq> maybe in this scenario there is a different rabbitmq used or something like that
16:12:32 <mlavalle> I'll check another case
16:12:46 <slaweq> mlavalle: ok, thx
16:13:08 <slaweq> #action mlavalle to check other cases of the failing neutron_tempest_plugin.scenario.test_dns_integration.DNSIntegrationTests.test_server_with_fip test
16:13:55 <slaweq> ok, that's all for actions from the previous week
16:14:03 <slaweq> #topic Grafana
16:14:08 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:15:43 <slaweq> do You see anything You want to talk about first?
16:15:55 <mlavalle> I'll follow your lead
16:16:03 <slaweq> there is again a problem with neutron-tempest-plugin-dvr-multinode-scenario
16:16:13 <slaweq> it's been failing 100% for a few days
16:16:26 <slaweq> I found that it's always an issue with neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA
16:16:34 <slaweq> like: * http://logs.openstack.org/37/382037/71/check/neutron-tempest-plugin-dvr-multinode-scenario/605ed17/logs/testr_results.html.gz
16:16:59 <slaweq> I reported a bug for that today: https://bugs.launchpad.net/neutron/+bug/1789434
16:17:00 <openstack> Launchpad bug 1789434 in neutron "neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA failing 100% times" [High,Confirmed]
16:17:33 <slaweq> it looks somehow related to my patch: https://review.openstack.org/#/c/589410/
16:17:57 <slaweq> but this test was fine on that patch, so something probably happened later that makes it fail now
16:18:48 <slaweq> any volunteer to check that?
16:18:53 <mlavalle> o/
16:19:10 <slaweq> thx mlavalle :)
16:19:29 <haleyb> <5 seconds to volunteer :)
16:19:42 <mlavalle> haleyb: do you want it?
16:19:51 <slaweq> #action mlavalle to check failing router migration from DVR tests
16:20:12 <slaweq> should I assign it to haleyb?
16:20:23 <mlavalle> not unless he explicitly wants it
16:20:32 <mlavalle> otherwise, I take it
16:20:38 <haleyb> mlavalle: no, feel free
16:20:47 <mlavalle> haleyb: ack
16:20:50 <slaweq> ok, so we have the winner :)
16:20:52 <slaweq> thx mlavalle
16:20:57 <mlavalle> yaaay!!!!
16:21:02 <mlavalle> I won!!!!
16:21:02 <slaweq> LOL
16:21:53 <slaweq> ok, so let's continue with the scenario jobs then
16:22:20 <slaweq> another scenario job which is failing quite often is the designate job which we already talked about
16:22:39 <slaweq> and I also found a couple of timeouts in neutron-tempest-plugin-scenario-linuxbridge
16:22:47 <slaweq> * http://logs.openstack.org/59/596959/1/check/neutron-tempest-plugin-scenario-linuxbridge/f62f1c6/job-output.txt.gz
16:22:49 <slaweq> * http://logs.openstack.org/18/591818/3/check/neutron-tempest-plugin-scenario-linuxbridge/212183e/job-output.txt.gz
16:22:51 <slaweq> * http://logs.openstack.org/34/596634/1/check/neutron-tempest-plugin-scenario-linuxbridge/098b6f3/job-output.txt.gz
16:26:09 <slaweq> there is virt_type=kvm set but it shouldn't be a problem if it's supported by the host
16:26:50 <slaweq> is there any volunteer to check why there are such timeouts?
16:27:06 <slaweq> if not, I will report it as a bug and take a look when I have some time
16:27:14 <slaweq> but currently I'm quite overloaded :/
16:28:40 <mlavalle> I am also a bit overloaded
16:29:03 <mlavalle> if nobody else steps up and you are patient with me, then sign me up
16:29:11 <slaweq> mlavalle: thx
16:29:23 <mlavalle> I might not get to it until next week
16:29:30 <slaweq> I will report it as a bug for now and we will see who has some cycles to check that
16:29:35 <slaweq> fine for You?
16:30:00 <haleyb> yes, report it as a bug and i can look at the logs at least
16:30:09 <slaweq> #action slaweq to report a bug about timeouts in neutron-tempest-plugin-scenario-linuxbridge
16:30:12 <slaweq> haleyb: thx
16:30:30 <slaweq> I will send a link to the bug report when I report it
16:30:34 <haleyb> or just recheck, recheck, recheck...
16:30:53 <slaweq> yes, for now it's kind of a workaround but... :)
16:31:04 <mlavalle> sounds good slaweq
16:31:16 <slaweq> thx guys
16:31:21 <slaweq> ok
16:31:36 <slaweq> those were the most frequent failures in the scenario jobs which I found last week
16:31:54 <slaweq> so let's move on
16:31:59 <slaweq> #topic functional
16:32:19 <slaweq> FYI: We have fixes almost merged for the failing functional tests in stable branches, bug: https://bugs.launchpad.net/neutron/+bug/1788185
16:32:19 <openstack> Launchpad bug 1788185 in neutron "[Stable/Queens] Functional tests neutron.tests.functional.agent.l3.test_ha_router failing 100% times" [Critical,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:32:42 <slaweq> basically this issue is caused by the keepalived version which is now in the Xenial repo
16:33:29 <slaweq> I was talking today with frickler and coreycb about it
16:33:47 <slaweq> I will try to prepare some small reproducer and add it to the keepalived bug report
16:34:00 <slaweq> but we should be good using an older version of keepalived for now
16:34:14 <mlavalle> ok
16:34:28 <mlavalle> we merged the patches yesterday, right?
16:34:42 <slaweq> for Queens it's merged
16:34:51 <slaweq> for Pike and Ocata I have to recheck it
16:35:08 <mlavalle> ok
16:35:40 <slaweq> another issue with functional tests is that we still hit https://bugs.launchpad.net/neutron/+bug/1784836 from time to time
16:35:40 <openstack> Launchpad bug 1784836 in neutron "Functional tests from neutron.tests.functional.db.migrations fails randomly" [Medium,Confirmed]
16:35:48 <slaweq> e.g. in http://logs.openstack.org/18/591818/3/check/neutron-functional/ddb3327/logs/testr_results.html.gz
16:36:18 <slaweq> it's not very often but maybe someone with good db experience could take a look at it
16:37:03 <slaweq> mlavalle: do You know who we could potentially ask to look at this one?
16:37:20 <mlavalle> I would ask Mike Bayer
16:37:28 <slaweq> thx mlavalle
16:37:55 <slaweq> #action mlavalle to ask Mike Bayer about functional db migration tests failures
16:38:24 <slaweq> anything else to add related to functional tests?
16:38:30 <mlavalle> not from me
16:39:00 <slaweq> ok, so let's move to the next topic then
16:39:05 <slaweq> #topic Fullstack
16:39:21 <slaweq> speaking about fullstack, I have only one thing
16:39:31 <slaweq> we still have quite a lot of failures but all (or almost all) are caused by https://bugs.launchpad.net/neutron/+bug/1779328 and I still don't know why it happens
16:39:31 <openstack> Launchpad bug 1779328 in neutron "Fullstack tests neutron.tests.fullstack.test_securitygroup.TestSecurityGroupsSameNetwork fails" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:39:56 <slaweq> maybe we should mark this test as unstable again to make our life easier?
16:40:05 <slaweq> what do You think about it?
16:40:35 <mlavalle> yes, let's do it
16:40:46 <slaweq> ok, I will do it then
16:40:51 <njohnston> do we have a rule in elastic-recheck for that failure?
16:41:10 <slaweq> #action slaweq to mark fullstack security group test as unstable again
16:41:15 <slaweq> njohnston: I don't think so
16:42:21 <slaweq> njohnston: do You think we should add it there also?
16:42:38 <slaweq> if we mark it as unstable, it will simply be skipped instead of failing
16:42:44 <mlavalle> yeah
16:43:41 <njohnston> I never think it's a bad idea to look at elastic-recheck rules, but then again it is hard to find cores to +2 them these days :-)
16:44:08 <slaweq> yeah, so maybe let's just mark it as unstable in our repo for now
16:44:19 <slaweq> ok, that was all I had for today
16:44:30 <slaweq> and I need to end this meeting right now :)
16:44:35 <slaweq> perfect timing
16:44:38 <mlavalle> Thanks!
16:44:41 <slaweq> thx guys for attending
16:44:45 <njohnston> thanks!
16:44:46 <slaweq> and see You next week
16:44:47 <mlavalle> o/
16:44:51 <slaweq> #endmeeting
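For context on the #action to mark the fullstack security group test as unstable: below is a minimal, self-contained sketch of the skip-on-failure pattern discussed above. Neutron carries a similar helper (unstable_test in neutron.tests.base); the decorator body and the test method name here are illustrative assumptions for this sketch, not the exact code from the neutron tree or from the patch slaweq eventually pushed.

```python
import functools
import unittest


def unstable_test(reason):
    """Turn a test failure into a skip that records the tracking bug."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return func(self, *args, **kwargs)
            except unittest.SkipTest:
                # Preserve explicit skips raised by the test itself.
                raise
            except Exception as exc:
                self.skipTest("%s was marked as unstable because of %s, "
                              "failure was: %s" % (self.id(), reason, exc))
        return wrapper
    return decorator


class TestSecurityGroupsSameNetwork(unittest.TestCase):
    """Stand-in for the fullstack test class named in bug 1779328."""

    @unstable_test("bug 1779328")
    def test_connectivity_with_security_groups(self):
        # Placeholder for the real fullstack scenario; if an assertion here
        # raised, the decorator would report a skip instead of a failure.
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

The point of the pattern, as discussed in the meeting, is that the test keeps running and its results stay visible in the job logs, but a failure no longer blocks the gate while the race behind bug 1779328 is investigated.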