16:00:43 <slaweq> #startmeeting neutron_ci
16:00:43 <openstack> Meeting started Tue Apr 10 16:00:43 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <slaweq> hi
16:00:48 <mlavalle> o/
16:00:48 <openstack> The meeting name has been set to 'neutron_ci'
16:00:49 <ihrachys> o/
16:01:15 <slaweq> #topic Actions from previous meetings
16:01:39 <slaweq> jlibosva, haleyb: are You around?
16:01:43 <haleyb> o/
16:02:23 <jlibosva> o/
16:02:37 <slaweq> ok, so I think we can start now
16:02:50 <slaweq> first action from previous week:
16:02:56 <slaweq> haleyb to check router migrations issue
16:03:43 <haleyb> i haven't reproduced the issue yet, seemed to work fine manually, so $more_testing=1
16:04:38 <slaweq> ok
16:04:58 <slaweq> #action haleyb to continue testing why router migration tests fail
16:05:11 <slaweq> next one was
16:05:12 <slaweq> slaweq will check difference between neutron-tempest-multinode-full and neutron-tempest-dvr-ha-multinode-full
16:05:26 <slaweq> so I compared those job definitions: https://pastebin.com/zQTHJ1Rg
16:05:49 <slaweq> Basically the difference is that in neutron-tempest-multinode-full the L3 agent runs only on the main node (the subnode is only a compute), while in neutron-tempest-dvr-ha-multinode-full the L3 agent on subnodes runs in dvr_snat mode
16:06:32 <slaweq> AFAIR ihrachys wanted to check if we need both those jobs - IMO we should keep both of them
16:07:22 <slaweq> ah, one more thing - there is also a flag DEVSTACK_GATE_TLSPROXY set in one of those jobs but I have no idea what it is for :)
16:08:25 <ihrachys> I think tlsproxy is on by default. afair it's some nova thing.
16:09:24 <ihrachys> can't even find it mentioned in latest devstack-gate so maybe completely irrelevant
16:09:41 <ihrachys> slaweq, as for the dvr vs. legacy, from what I understand, we have the tempest-full job for legacy too. do we need that one then?
16:10:23 <slaweq> tempest-full is singlenode
16:10:27 <slaweq> this one is multinode
16:10:47 <slaweq> no?
16:11:27 <slaweq> but maybe we don't need the singlenode job then, as multinode should cover it as well
16:11:31 <slaweq> what do You think?
16:12:14 <ihrachys> yeah that's what I was alluding to.
16:12:36 <ihrachys> tempest-full may come from the tempest repo though. we also have full-py3 from the same source
16:13:17 <ihrachys> but I guess I don't feel strongly about it anymore since there are legit differences we may want to keep between jobs we maintain.
16:13:24 <ihrachys> thanks for checking slaweq
16:13:44 <slaweq> You're welcome ihrachys
16:13:58 <slaweq> so we will leave it as it is for now, right?
16:15:10 <mlavalle> I think so, slaweq
16:15:17 <slaweq> ok, fine
16:15:32 <slaweq> next action then
16:15:39 <slaweq> mlavalle to take a look at why rally jobs are taking such a long time
16:15:49 <mlavalle> I didn't have time last week
16:16:20 <slaweq> ok, but I'm not sure if there is something really wrong with it now - let's talk about it later :)
16:16:31 <slaweq> moving on
16:16:33 <slaweq> next
16:16:33 <slaweq> slaweq will add openstack-tox-lower-constraints to grafana dashboard
16:16:44 <slaweq> patch merged today https://review.openstack.org/#/c/559162/
16:17:10 <slaweq> and the last one from last week:
16:17:10 <slaweq> slaweq will check old gate-failure bugs
16:17:23 <slaweq> I didn't have time to check them during last week - sorry
16:17:28 <slaweq> #action slaweq will check old gate-failure bugs
16:17:33 <ihrachys> do we have a document capturing the rationale for the lower-constraints job?
16:18:16 <slaweq> ihrachys: are You asking about the job?
16:18:51 <ihrachys> it's just the first time I've heard about it. I was not very involved lately.
16:19:07 <ihrachys> wanted to understand what the job is for. but maybe I can google it myself.
16:19:19 <slaweq> I think there was some email about it a few weeks ago
16:19:43 <ihrachys> ok I will take a look, nevermine
16:19:46 <ihrachys> *nevermind
16:19:54 <mlavalle> yes, there is a message in the ML explaining the rationale
16:20:30 <slaweq> thx, I can't find it now
16:20:48 <slaweq> but I think it was given in the commit message of the patch which added it to the neutron repo
16:21:01 <slaweq> so You will find it easily there
16:21:04 <ihrachys> ok ok, sorry, I shouldn't have asked dumb questions
16:21:22 <slaweq> ihrachys: it wasn't a dumb question for sure :)
16:21:44 <slaweq> ok, moving on to the next topic
16:21:45 <slaweq> #topic Grafana
16:21:52 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:22:46 <slaweq> I was checking the last few days there today
16:23:43 <slaweq> there was some spike yesterday but it was similar on all jobs IMO
16:25:04 <slaweq> the same jobs are still at a quite high failure rate but I think there is nothing "new" that would require special attention
16:25:35 <slaweq> any thoughts?
16:25:43 <ihrachys> there is a periodic failure it seems
16:25:46 <ihrachys> with neutron-lib-master
16:25:59 <ihrachys> and looks like for py35 only
16:26:04 <slaweq> yes, I forgot about that one now
16:26:14 <slaweq> but I have it prepared for later discussion :)
16:26:29 <ihrachys> http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-neutron-lib-master/66d328a/testr_results.html.gz
16:26:40 <ihrachys> ok nevermind then
16:26:43 <slaweq> so IMHO it is due to patch https://review.openstack.org/#/c/541766/
16:26:50 <slaweq> ihrachys: we can talk about it now
16:27:19 <slaweq> I checked failures since 6.04 for this job and there are exactly the same errors there each time
16:27:46 <ihrachys> yeah looks related
16:28:07 <slaweq> it looks to me like it's because of this patch: https://review.openstack.org/#/c/541766/ - maybe there is something missing in neutron now
16:28:29 <slaweq> I can try to check it or at least I will report a bug for that
16:28:46 <slaweq> or maybe someone else wants to fix it? :)
16:29:29 <slaweq> I can assign an action to one of You. Any volunteers? :)
16:29:40 <jlibosva> if everybody is busy, I can have a look
16:29:55 <haleyb> or maybe ping yamahata ?
16:30:36 <slaweq> I think yamahata is not available now
16:30:52 <slaweq> so jlibosva can You at least report a bug and ask yamahata to check it?
16:30:58 <yamahata> yamahata: I'm here. what's the issue?
16:31:04 <mlavalle> yamahata is based out of California
16:31:05 <slaweq> hi yamahata :)
16:31:25 <slaweq> mlavalle: I didn't know that
16:31:38 <jlibosva> me neither :)
16:31:49 <slaweq> yamahata: for a few days we have had a failure in the periodic job: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-neutron-lib-master/66d328a/testr_results.html.gz
16:31:59 <slaweq> it looks like it's related to patch https://review.openstack.org/#/c/541766/
16:32:11 <slaweq> do You have any idea why it's failing?
16:32:28 <ihrachys> from what I understand, a fix would be test only, and it would just capture any new arguments and ignore them when matching mock call args
16:32:42 * yamahata opening links
16:33:00 <yamahata> let me check it.
16:34:28 <yamahata> now callback registration accepts a priority, priority_group.DEFAULT_PRIORITY=555000
16:34:52 <yamahata> so the function call check needs to be updated to include priority.
16:35:34 <ihrachys> yamahata, no. if we do that, unit tests will fail with old neutron-lib
16:35:52 <yamahata> Oh, right.
16:35:53 <ihrachys> instead of explicitly matching, we should ignore the argument value (if it's present)
16:36:23 <slaweq> or we should release a new neutron-lib version and update the test as yamahata proposes :)
16:36:34 <ihrachys> the test goal is to validate that subscription happened for the right callback / event. it has nothing to do with priorities, so we should ignore it.
16:36:59 <slaweq> ihrachys: right
16:37:08 <ihrachys> slaweq, meaning, the old queens neutron gate will be incompatible with newer neutron-lib? not ideal.
16:37:44 <slaweq> but does the old queens neutron gate use the newest neutron-lib too?
16:37:55 <ihrachys> I prefer the test to be liberal / compatible with different versions. we CAN then release / bump requirements / force checking priority, but what's the point
16:38:09 <slaweq> I agree with You
16:38:14 <ihrachys> slaweq, it doesn't; but we don't block anyone from updating neutron-lib
16:38:24 <slaweq> I was just curious :) thx
16:38:41 <ihrachys> it's not like we have <=XXX in requirements
16:38:47 <ihrachys> so anything fresh is fair game
16:39:00 <slaweq> right
16:39:09 <slaweq> so do we have any volunteer who will fix this test?
16:39:31 <mlavalle> I thought yamahata would fix it
16:39:32 <yamahata> let me cook a first patch to address test_register
16:39:47 <yamahata> the test__init__ failure looks like a different issue
16:39:59 <slaweq> ok, thx yamahata
16:40:22 <yamahata> anyway I'll look into test__init__ too.
16:40:43 <slaweq> #action yamahata to fix issues with openstack-tox-py35-with-neutron-lib-master periodic job
16:40:47 <mlavalle> thanks yamahata!
16:41:15 <slaweq> so I think that periodic jobs are fine and we can go to the next topic
16:41:17 <slaweq> #topic Fullstack
16:41:55 <slaweq> I was looking today at fullstack issues from the last 2 days
16:42:20 <slaweq> I found one POST_FAILURE: http://logs.openstack.org/14/559414/2/check/neutron-fullstack/1cbd1b0/job-output.txt.gz#_2018-04-09_21_06_31_611265 which looks like some issue with infra
16:42:57 <slaweq> and fullstack tests failed at least twice for patch https://review.openstack.org/#/c/499908/ which looks to me like it is related to the patch itself
16:43:25 <slaweq> did You find any other fullstack failures in the last days?
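(For reference: a minimal sketch of the test-only approach ihrachys describes above for the neutron-lib priority failure - assert that the right callback was subscribed for the right resource/event while ignoring any extra arguments such as the new priority. The helper name, callback and resource/event values below are illustrative, not the actual neutron unit test code.)

    # Illustrative sketch (not the actual neutron test): check that a callback
    # was subscribed for the expected resource/event while ignoring any extra
    # arguments, such as the priority added by newer neutron-lib. A plain
    # assert_any_call(..., priority=mock.ANY) would not work here, because it
    # would fail when an older neutron-lib does not pass priority at all.
    from unittest import mock


    def assert_subscribed(subscribe_mock, callback, resource, event):
        """Pass if subscribe() was called with (callback, resource, event),
        regardless of any additional positional or keyword arguments."""
        for args, kwargs in subscribe_mock.call_args_list:
            if args[:3] == (callback, resource, event):
                return
        raise AssertionError('%r was not subscribed for (%s, %s)'
                             % (callback, resource, event))


    # Example usage against a mocked registry.subscribe (values made up):
    my_callback = lambda *a, **kw: None
    subscribe = mock.Mock()
    subscribe(my_callback, 'port', 'after_update', priority=555000)
    assert_subscribed(subscribe, my_callback, 'port', 'after_update')

This keeps the test compatible with both old and new neutron-lib, which is the point ihrachys makes above about not forcing a requirements bump.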
16:44:09 <ihrachys> I wasn't paying attention so no
16:44:25 <mlavalle> I didn't see any
16:45:06 <jlibosva> me neither
16:45:10 <slaweq> ok, so I will try to keep an eye on it and check when something new happens :)
16:45:18 <slaweq> next topic then
16:45:19 <slaweq> #topic Scenarios
16:45:37 <slaweq> here we still have the same "big players" :)
16:45:49 <slaweq> neutron-tempest-plugin-dvr-multinode-scenario is still failing at 100%
16:46:09 <slaweq> but what is interesting is that there was a moment when it was a little below 100% on 6.04 and then it got back to 100%
16:46:38 <jlibosva> are those just the migration tests?
16:46:51 <jlibosva> perhaps we should mark them with our favorite unstable_test decorator
16:46:59 <slaweq> in most cases the failed tests are those with migration
16:47:13 <slaweq> sometimes there is also test_trunk_subport_lifecycle which fails
16:47:23 <jlibosva> right, I still have that on my plate
16:47:36 <slaweq> I was checking a few jobs and didn't find other issues
16:49:02 <slaweq> haleyb: as You are checking those failures with migration, do You think it is worth marking them as unstable for now?
16:49:33 <haleyb> slaweq: yes, i suppose we could
16:49:43 <slaweq> ok, will You do it then?
16:50:22 <haleyb> yes
16:50:26 <slaweq> thx
16:50:48 <slaweq> #action haleyb will mark router migration tests as unstable
16:51:02 <slaweq> thx jlibosva, good idea :)
16:51:16 <ihrachys> do we do the same with trunk then or is it a different beast?
16:51:48 <slaweq> I think we can do the same with it for now
16:51:55 <slaweq> jlibosva: do You agree?
16:51:56 <jlibosva> I'd suggest doing them one by one
16:52:10 <jlibosva> let's see how the failure rate goes without considering router migration
16:52:17 <ihrachys> ok
16:52:20 <jlibosva> if it's still high, let's cross trunk out too
16:52:20 <slaweq> ++
16:52:35 <slaweq> ok
16:52:49 <slaweq> ok, so moving on
16:52:53 <slaweq> #topic Rally
16:53:15 <jlibosva> eh, I had one item I wanted to bring up re scenarios
16:53:22 <slaweq> sorry jlibosva
16:53:29 <slaweq> #topic scenarios
16:53:37 <slaweq> go on then :)
16:53:47 <jlibosva> so I've been watching the ovsfw tempest job and it seems to me that it just mirrors other tempest failures
16:54:08 <jlibosva> that said, I think the failures are not related to the fact it uses the ovsfw driver. I wanted to know opinions about making it voting in the check queue
16:55:15 <slaweq> I agree, it has been quite stable for a few weeks
16:55:20 <ihrachys> would it make sense, long term, to have ovsfw for the multinode dvr-ha job?
16:55:24 <ihrachys> and then have just one
16:55:42 <jlibosva> I think long term it would make sense to make ovsfw the default
16:55:45 <ihrachys> I understand that ovsfw is stable already so we can have it enabled and plan for ovsfw for dvr-ha
16:56:02 <ihrachys> jlibosva, for all jobs?
16:56:03 <jlibosva> which will lead to having ovsfw in all current tempest tests
16:56:11 <jlibosva> for devstack, yes, for all jobs
16:56:48 <jlibosva> I think that should be the goal, then we can deprecate the iptables hybrid driver. I don't think it makes sense to maintain both
16:57:04 <jlibosva> the iptables driver will stay for LB of course
16:57:21 <jlibosva> but that's long term :)
16:57:37 <jlibosva> so for now, we can make the ovsfw job voting
16:57:59 <mlavalle> is it failing around ~15%?
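(For reference: a minimal sketch of the unstable_test approach jlibosva suggests above and haleyb agreed to apply to the router migration tests. The import paths, class name, base class, test name and bug reference are assumptions for illustration, not the actual neutron-tempest-plugin change.)

    # Illustrative only: mark a flaky scenario test with the unstable_test
    # decorator so its known flakiness does not keep the whole job at 100%
    # failure while the underlying bug is still being investigated.
    # Assumed import paths and placeholder names - adjust to the real test.
    from neutron_tempest_plugin.common import utils
    from neutron_tempest_plugin.scenario import base


    class NetworkMigrationTest(base.BaseTempestTestCase):

        @utils.unstable_test("bug <router-migration-bug>")  # placeholder bug ref
        def test_from_legacy_to_dvr(self):
            # existing migration test body stays unchanged (placeholder here)
            pass

The test keeps running and stays visible in the results, so the failure data haleyb needs for debugging is still collected.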
16:57:59 <ihrachys> but there's no migration path from hybrid so how can you deprecate
16:58:22 <ihrachys> agreed to make it voting and deal with wider plans separately
16:58:34 <jlibosva> there is since Pike
16:58:46 <jlibosva> that leaves hybrid plugging behind
16:58:54 <slaweq> mlavalle: it is usually around 10-15%
16:59:03 <jlibosva> and with multiple port bindings being planned for rocky, we'll have a way to deal with it too
16:59:42 <slaweq> ok, maybe up to 25% sometimes but it follows other jobs also
16:59:53 <slaweq> I think we are out of time now
16:59:56 <ihrachys> jlibosva, ok looks like I was cryo-frozen for a while lol
17:00:06 <slaweq> #endmeeting