16:00:43 #startmeeting neutron_ci
16:00:43 Meeting started Tue Apr 10 16:00:43 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:45 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 hi
16:00:48 o/
16:00:48 The meeting name has been set to 'neutron_ci'
16:00:49 o
16:01:15 #topic Actions from previous meetings
16:01:39 jlibosva, haleyb: are You around?
16:01:43 o/
16:02:23 o/
16:02:37 ok, so I think we can start now
16:02:50 first action from previous week:
16:02:56 haleyb to check router migrations issue
16:03:43 i haven't reproduced the issue yet, seemed to work fine manually, so $more_testing=1
16:04:38 ok
16:04:58 #action haleyb to continue testing why router migrations tests fails
16:05:11 next one was
16:05:12 slaweq will check difference between neutron-tempest-multinode-full and neutron-tempest-dvr-ha-multinode-full
16:05:26 so I compared those jobs definitions: https://pastebin.com/zQTHJ1Rg
16:05:49 Basically difference is that on neutron-tempest-multinode-full L3 agent is only on main node (subnode is only for compute), in neutron-tempest-dvr-ha-multinode-full L3 agent on subnodes is in dvr_snat mode
16:06:32 AFAIR ihrachys wants to check if we needs both those jobs - IMO we should have both of them
16:07:22 ah, one more thing - there is also flag DEVSTACK_GATE_TLSPROXY set in one of those jobs but I have no idea what is it for :)
16:08:25 I think tlsproxy is on by default. afair it's some nova thing.
16:09:24 can't even find it mentioned in latest devstack-gate so maybe completely irrelevant
16:09:41 slaweq, as for the dvr vs. legacy, for what I understand, we have tempest-full job for legacy too. do we need that one then?
16:10:23 tempest-full is singlenode
16:10:27 this one is multinode
16:10:47 no?
16:11:27 but maybe we don't need singlenode job then as multinode should covers it as well
16:11:31 what do You think?
16:12:14 yeah that's what I was alluding to.
16:12:36 tempest-full may come from tempest repo though. we also have full-py3 from the same source
16:13:17 but I guess I don't feel strongly about it anymore since there are legit differences we may want to keep between jobs we maintain.
16:13:24 thanks for checking slaweq
16:13:44 Your welcome ihrachys
16:13:58 so we will left it as it is for now, right?
16:15:10 I think so, slaweq
16:15:17 ok, fine
16:15:32 next action then
16:15:39 mlavalle to take a look why rally jobs are taking so long time
16:15:49 I didn't have time last week
16:16:20 ok, but I'm not sure if there is something really wrong with it now - let's talk about it later :)
16:16:31 moving on
16:16:33 next
16:16:33 slaweq will add openstack-tox-lower-constraints to grafana dashboard
16:16:44 patch merged today https://review.openstack.org/#/c/559162/
16:17:10 and the last one from last week:
16:17:10 slaweq will check old gate-failure bugs
16:17:23 I didn't have time to check them during last week - sorry
16:17:28 #action slaweq will check old gate-failure bugs
16:17:33 do we have a document capturing rationale for the low-constraints job?
16:18:16 ihrachys: are You asking about job?
16:18:51 it's just first time I hear about it. I was not very involved lately.
16:19:07 wanted to understand what the job is for. but maybe I can google myself.
16:19:19 I think there was some email about it few weeks ago
16:19:43 ok I will take a look, nevermine
16:19:46 *nevermind
16:19:54 yes, there is a message in the ML explaining the rationale
16:20:30 thx, I can't find it now
16:20:48 but I think it was given in commit message in patch which added it to neutron repo
16:21:01 so You will find it easily there
16:21:04 ok ok, sorry, I shouldn't have asked dumb questions
16:21:22 ihrachys: it wasn't dumb question for sure :)
16:21:44 ok, moving on to next topic
16:21:45 #topic Grafana
16:21:52 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:22:46 I was checking today last few days there
16:23:43 there was some spike yesterday but it was on all jobs similar IMO
16:25:04 there are still same jobs on quite high failure rate but I think there is nothing "new" what would require special attention
16:25:35 any thoughts?
16:25:43 there is periodic failure it seems
16:25:46 with neutron-lib-master
16:25:59 and looks like for py35 only
16:26:04 yes, I forgot about that one now
16:26:14 but I have it for later discussion prepared :)
16:26:29 http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-neutron-lib-master/66d328a/testr_results.html.gz
16:26:40 ok nevermind then
16:26:43 so IMHO it is due to patch https://review.openstack.org/#/c/541766/
16:26:50 ihrachys: we can talk about it now
16:27:19 I checked failures since 6.04 for this job and there are exactly same errors there each time
16:27:46 yeah looks like related
16:28:07 it looks for me that it's because of this patch: https://review.openstack.org/#/c/541766/ - maybe there is something missing in neutron now
16:28:29 I can try to check it or at least I will report a bug for that
16:28:46 or maybe someone else wants to fix it? :)
16:29:29 I can assign action to someone from You. Any volunteers? :)
16:29:40 if everybody is busy, I can have a look
16:29:55 or maybe ping yamahata ?
16:30:36 I think yamahata is not available now
16:30:52 so jlibosva can You at least report a bug and ask yamahata to check it?
16:30:58 yamahata: I'm here. what's the issue?
16:31:04 yamahata is based out of California
16:31:05 hi yamahata :)
16:31:25 mlavalle: I didn't know that
16:31:38 me neither :)
16:31:49 yamahata: since few days we have failure in periodic job: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-neutron-lib-master/66d328a/testr_results.html.gz
16:31:59 it looks like related to patch https://review.openstack.org/#/c/541766/
16:32:11 do You have any idea why it's failing?
16:32:28 for what I understand, a fix would be test only, and it would just capture any new arguments and ignore them when matching mock call args
16:32:42 * yamahata opening links
16:33:00 let me check it.
16:34:28 now callback registration accepts priority priority_group.DEFAULT_PRIORITY=555000
16:34:52 so function call check needs to be updated to include priority.
16:35:34 yamahata, no. if we do that, unit tests will fail with old neutron-lib
16:35:52 Oh, right.
16:35:53 instead of explicitly matching, we should ignore the argument value (if it's present)
16:36:23 or we should release new neutron-lib version and update test as yamahata propose :)
16:36:34 the test goal is to validate that subscription happened for the right callback / event. it has nothing to do with priorities, so we should ignore it.
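For reference, the fix sketched above - check that the subscription happened for the right callback, resource and event, while ignoring any extra priority argument a newer neutron-lib may pass - could look roughly like the following. This is a minimal illustration, not the actual neutron test code: the helper name, the resources and the simulated calls are placeholders; only the 555000 default priority value comes from the discussion above.

from unittest import mock


def _assert_subscribed(subscribe_mock, callback, resource, event):
    """Check a subscription happened, ignoring extra args such as priority."""
    for args, _kwargs in subscribe_mock.call_args_list:
        # Compare only the first three positional arguments; a newer
        # neutron-lib may append a priority that older releases don't have.
        if args[:3] == (callback, resource, event):
            return
    raise AssertionError('callback %r was not subscribed for %s/%s'
                         % (callback, resource, event))


def test_register():
    subscribe_mock = mock.Mock()
    my_callback = object()
    # Simulate an old-style and a new-style (priority-aware) registration.
    subscribe_mock(my_callback, 'router', 'after_create')
    subscribe_mock(my_callback, 'port', 'after_update', 555000)
    _assert_subscribed(subscribe_mock, my_callback, 'router', 'after_create')
    _assert_subscribed(subscribe_mock, my_callback, 'port', 'after_update')

Matching this way keeps the unit tests passing against both the released neutron-lib and neutron-lib master, which is the compatibility concern raised above.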
16:36:59 ihrachys: right
16:37:08 slaweq, meaning, old queens neutron gate will be incompatible with newer neutron-lib? not ideal.
16:37:44 but old queens neutron gate uses newest neutron-lib also?
16:37:55 I prefer the test is liberal / compatible with different versions. we CAN then release / bump requirements / force checking priority, but what's the point
16:38:09 I agree with You
16:38:14 slaweq, it doesn't; but we don't block anyone from updating neutron-lib
16:38:24 I was just curious :) thx
16:38:41 it's not like we have <=XXX in requirements
16:38:47 so anything fresh is fair game
16:39:00 right
16:39:09 so do we have any volunteer who will fix this test?
16:39:31 I thought yamahata will fix it
16:39:32 let me cook first patch to address test_register
16:39:47 test__init__ failure looks different issue
16:39:59 ok, thx yamahata
16:40:22 anyway I'll look into test__init__ too.
16:40:43 #action yamahata to fix issues with openstack-tox-py35-with-neutron-lib-master periodic job
16:40:47 thanks yamahata!
16:41:15 so I think that periodic jobs are fine and we can go to next topic
16:41:17 #topic Fullstack
16:41:55 I was today looking for fullstack issues in last 2 days
16:42:20 I found one POST_FAILURE: http://logs.openstack.org/14/559414/2/check/neutron-fullstack/1cbd1b0/job-output.txt.gz#_2018-04-09_21_06_31_611265 which looks like some issue with infra
16:42:57 and at least twice failed fullstack tests for patch https://review.openstack.org/#/c/499908/ which looks for me that is related to patch it self
16:43:25 do You found any other fullstack failures in last days?
16:44:09 I wasn't paying attention so no
16:44:25 I didn't see any
16:45:06 me neither
16:45:10 ok, so I will try to keep an eye on it and check when something new will happen :)
16:45:18 next topic then
16:45:19 #topic Scenarios
16:45:37 here we still have same "big players" :)
16:45:49 neutron-tempest-plugin-dvr-multinode-scenario failing on 100% still
16:46:09 but what is interesting there was moment that it was little below 100% at 6.04 and then get back to 100%
16:46:38 are those just the migration tests?
16:46:51 perhaps we should mark them with our favorite unstable_test decorator
16:46:59 in most cases failed tests are those with migration
16:47:13 sometimes there is also test_trunk_subport_lifecycle which fails
16:47:23 right, I still have that on my plate
16:47:36 I was checking few jobs and didn't found other issues
16:49:02 haleyb: as You are checking those failures with migration, do You think it is worth to mark them as unstable for now?
16:49:33 slaweq: yes, i suppose we could
16:49:43 ok, will You do it then?
16:50:22 yes
16:50:26 thx
16:50:48 #action haleyb will mark router migration tests are unstable
16:51:02 thx jlibosva, good idea :)
16:51:16 do we do same with trunk then or it's a different beast?
16:51:48 I think we can do the same with it for now
16:51:55 jlibosva: do You agree?
16:51:56 I'd suggest to do one by one
16:52:10 let's see how the failure rate goes without considering router migration
16:52:17 ok
16:52:20 if it's still high, let's cross trunk out too
16:52:20 ++
16:52:35 ok
16:52:49 ok, so moving on
16:52:53 #topic Rally
16:53:15 eh, I had one item I wanted to bring re scenarios
16:53:22 sorry jlibosva
16:53:29 #topic scenarios
16:53:37 go on then :)
16:53:47 so I've been watching the ovsfw tempest job and it seems to me that it copies other tempest failures
16:54:08 that said, I think the failures are not related to the fact it uses ovsfw driver. I wanted to know opinions about making it voting in the check Q
16:55:15 I agree, it is quite stable since few weeks
16:55:20 would it make sense, long term, to have ovsfw for multinode dvr-ha job?
16:55:24 and then have just one
16:55:42 I think long term it would make sense to make ovsfw the default
16:55:45 I understand that ovsfw is stable already so we can have it enabled and plan for ovsfw for dvr-ha
16:56:02 jlibosva, for all jobs?
16:56:03 which will lead to have ovsfw in all current tempest tests
16:56:11 for devstack, yes, for all jobs
16:56:48 I think that should be the goal, then we can deprecate iptables hybrid driver. I don't think it makes sense to maintain both
16:57:04 iptables driver will stay for LB of course
16:57:21 but that's a long term :)
16:57:37 so for now, we can make the ovsfw voting
16:57:59 is it failing around ~15%?
16:57:59 but there's no migration path from hybrid so how can you deprecate
16:58:22 agreed to make it voting and deal with wider plans separately
16:58:34 there is since Pike
16:58:46 that leaves hybrid plugging behind
16:58:54 mlavalle: it is usually around 10-15%
16:59:03 and with multiple port bindings being planned for rocky, we'll have a way to deal with it too
16:59:42 ok, maybe up to 25% sometimes but it follows other jobs also
16:59:53 I think we are out of time now
16:59:56 jlibosva, ok looks like I was cryo-frozen for a while lol
17:00:06 #endmeeting
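The unstable_test decorator agreed on for the router migration tests in the scenarios topic follows a simple pattern: run the test and turn a failure into a skip, so a known-flaky test stops gating while its failures stay visible in the results. The standalone sketch below only illustrates that idea; the decorator body, the bug reference and the test class are placeholders, not neutron's actual helper or the real migration tests.

import functools
import unittest


def unstable_test(reason):
    """Convert failures of a known-flaky test into skips (illustrative only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return func(self, *args, **kwargs)
            except unittest.SkipTest:
                raise
            except Exception:
                # The test is tracked as unstable: record a skip instead of
                # failing the whole job, keeping the reason in the output.
                raise unittest.SkipTest('unstable test failed: %s' % reason)
        return wrapper
    return decorator


class FakeMigrationTest(unittest.TestCase):
    @unstable_test("placeholder bug reference for router migration flakiness")
    def test_from_legacy_to_dvr(self):
        # Placeholder body standing in for the real scenario test.
        self.assertTrue(True)


if __name__ == '__main__':
    unittest.main()

Passing the bug reference as the decorator argument keeps the skip message pointing at the tracked failure, which makes it easy to find and remove the marker once the underlying issue is fixed.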