16:00:25 <slaweq> #startmeeting neutron_ci
16:00:29 <slaweq> hi
16:00:31 <openstack> Meeting started Tue Feb 12 16:00:25 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:35 <openstack> The meeting name has been set to 'neutron_ci'
16:02:54 <slaweq> let's wait a few minutes for other people to join
16:03:39 <slaweq> but looks like there is no crowd today :)
16:04:13 <ralonsoh> hi! I'm still here
16:04:24 <slaweq> hi ralonsoh, good to see You :)
16:04:42 <slaweq> I know that mlavalle will not be available today
16:04:44 <bcafarel> o/
16:04:52 <slaweq> but maybe haleyb and bcafarel will join
16:04:56 <slaweq> hi bcafarel :)
16:05:06 <slaweq> hongbin: are You around for the CI meeting?
16:05:09 <bcafarel> my spider sense was telling me I should switch IRC channel :)
16:05:12 <haleyb> oh, hi, looking at bugs
16:05:14 <hongbin> o/
16:05:25 <slaweq> ok, so now I think we can start :)
16:05:31 <slaweq> welcome everyone!
16:05:36 <slaweq> #topic Actions from previous meetings
16:05:47 <slaweq> first one was:
16:05:49 <slaweq> mlavalle to continue investigating why the L3 agent is considered down and causes trunk tests to fail
16:06:13 <slaweq> mlavalle gave me some update about his findings on it
16:06:28 <slaweq> he said that the problem is with the test migrating a router from HA to dvr/legacy. That migration implies deleting the existing router's state_change_monitor
16:06:53 <slaweq> so we try to execute 'kill', '-15', 'pid' and that basically causes the agent to be marked "dead"
16:06:58 <slaweq> please look here: http://paste.openstack.org/show/744001/
16:07:56 <slaweq> possibly we are missing a filter in https://github.com/openstack/neutron/blob/master/etc/neutron/rootwrap.d/l3.filters for killing the neutron-keepalived-state-change process
16:08:14 <slaweq> haleyb: do You think that it is possible?
16:08:36 <haleyb> so it's getting an error trying to kill the state change monitor?
16:09:05 <slaweq> IIUC what mlavalle was saying then yes :)
16:09:51 <slaweq> and indeed IMO this KillFilter is missing there
16:10:04 <slaweq> so that could be the issue
16:10:50 <haleyb> yes, that missing KillFilter could be an issue
16:11:12 <slaweq> so mlavalle will send a patch to add it and we will see if that helps
16:11:24 <haleyb> +1
16:12:12 <slaweq> #action mlavalle to check if adding KillFilter for neutron-keepalived-state-change will solve issues with L3 agent in dvr jobs
16:12:14 <bcafarel> and it worked before without the filter?
16:12:50 <slaweq> bcafarel: I think that possibly it wasn't killing the keepalived-state-change process but the agent was running fine
16:13:05 <slaweq> maybe the change py27->py36 for this job triggered this somehow
16:13:32 <slaweq> also mlavalle told me that it happens during router migration
16:13:52 <slaweq> from ha to dvr/legacy
16:14:01 <slaweq> and such a scenario isn't tested in other jobs
16:14:18 <slaweq> so it is possible that python 36 triggered that somehow
16:15:16 <slaweq> can we move on or do You have anything else to add here?
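For reference on the discussion above: oslo.rootwrap KillFilter entries live under the [Filters] section of files like etc/neutron/rootwrap.d/l3.filters, in the form "name: KillFilter, run-as user, command, allowed signals". A minimal sketch of what the missing entry could look like is below; the entry names and interpreter commands are illustrative guesses, not the actual patch mlavalle was going to send:

    # Allow the L3 agent to SIGTERM the neutron-keepalived-state-change
    # monitor (a Python process), assuming the monitor runs under
    # python/python3; exact names here are hypothetical.
    kill_keepalived_monitor_py: KillFilter, root, python, -15
    kill_keepalived_monitor_py3: KillFilter, root, python3, -15

Without a matching KillFilter, rootwrap rejects the 'kill -15 <pid>' call, which would explain the failure seen in the paste.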
16:15:54 <bcafarel> nothing, it was mostly personal curiosity :)
16:15:57 <bcafarel> thanks slaweq
16:16:11 <slaweq> bcafarel: sure :) I'm also wondering how it may be possible
16:17:16 <slaweq> ok, let's move on
16:17:19 <slaweq> next action was:
16:17:24 <slaweq> njohnston to work on grenade job transition
16:17:37 <slaweq> I think he didn't make any progress on this recently
16:18:54 <slaweq> next one was:
16:18:55 <bcafarel> I sent https://review.openstack.org/#/c/636356/ to see how it goes with a simple change in the meantime
16:18:56 <patchbot> patch 636356 - neutron - Switch grenade jobs to python3 - 1 patch set
16:19:00 <slaweq> slaweq to check if the new ovsdb monitor implementation can allow listening only for events from a specific bridge
16:19:35 <slaweq> bcafarel: thx, let's get back to it in the python3 section, ok?
16:20:01 <bcafarel> slaweq: oops sorry multitasking is bad for attention
16:20:08 <slaweq> bcafarel: no, it's fine :)
16:20:35 <slaweq> regarding the ovsdb monitor, I checked that SimpleInterfaceMonitor isn't moved to the new implementation yet so there was nothing to check for now.
16:20:44 <slaweq> ralonsoh is still working on it, right?
16:21:09 <ralonsoh> slaweq, yes, but I'm still facing the same functional tests problems
16:21:15 <ralonsoh> and I don't know how to solve them
16:21:23 <ralonsoh> I think this is a race condition
16:21:28 <ralonsoh> but I can't prove it
16:21:33 <slaweq> ralonsoh: do You have a link to the patch?
16:21:53 <ralonsoh> #link https://review.openstack.org/#/c/612400/
16:21:54 <patchbot> patch 612400 - neutron - [WIP] Add native OVSDB implementation for polling ... - 18 patch sets
16:23:05 <slaweq> a lot of events logged: http://logs.openstack.org/00/612400/18/check/neutron-functional-python27/6e7d69b/job-output.txt.gz#_2019-02-07_12_04_15_376869
16:23:51 <slaweq> is it normal?
16:24:22 <ralonsoh> no, just for testing (it's in the patch comment)
16:24:53 <slaweq> but apart from that "issue" I don't see any failed test in PS18 in the functional tests job
16:26:03 <ralonsoh> sorry, neutron-fullstack
16:26:15 <ralonsoh> test_l2_agent_restart(OVS,VLANs,openflow-cli)
16:26:18 <ralonsoh> always the same
16:27:00 <slaweq> ralonsoh: ok, I will take a look at the logs from this test later
16:27:05 <ralonsoh> thanks!
16:27:09 <slaweq> maybe I will find something :)
16:27:23 <slaweq> ok, let's move on
16:27:29 <slaweq> last action from last week was:
16:27:31 <slaweq> njohnston to take care of periodic UT jobs failures
16:27:47 <slaweq> I know that he fixed this issue as periodic jobs are fine now
16:27:57 <slaweq> but I don't have a link to the specific patch
16:29:31 <slaweq> ok, anything else You want to ask/add about actions from last week?
16:30:56 <slaweq> ok, let's move on then
16:30:58 <slaweq> #topic Python 3
16:31:05 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_ci_python3
16:31:45 <slaweq> FYI: after we merged the switch of some jobs to py3 versions I pushed a patch to update Grafana: https://review.openstack.org/636359
16:31:46 <patchbot> patch 636359 - openstack-infra/project-config - Add new python 3 Neutron jobs to grafana dashboard - 1 patch set
16:32:18 <slaweq> for now from gate/check queues we are only missing:
16:32:20 <slaweq> * neutron-tempest-dvr-ha-multinode-full - patch in progress https://review.openstack.org/633979
16:32:21 <patchbot> patch 633979 - neutron - Migrate neutron-tempest-dvr-ha-multinode-full job ... - 7 patch sets
16:32:22 <slaweq> * neutron-grenade-multinode - a patch with the new job is already proposed: https://review.openstack.org/#/c/622612/ but it is failing
16:32:22 <patchbot> patch 622612 - openstack-dev/grenade - Add grenade-multinode-py3 job - 3 patch sets
16:32:24 <slaweq> * neutron-grenade-dvr-multinode - should be done when the above job is done
16:33:22 <slaweq> bcafarel: so regarding Your patch, I think You should rebase/recheck this patch from njohnston https://review.openstack.org/#/c/622612/ and get it merged
16:33:23 <patchbot> patch 622612 - openstack-dev/grenade - Add grenade-multinode-py3 job - 3 patch sets
16:33:32 <slaweq> then we can switch to using this job in neutron
16:34:01 <slaweq> what do You think?
16:34:50 <bcafarel> having the job defined in grenade itself will be nice indeed
16:35:55 <bcafarel> I'll check once we have some initial results on https://review.openstack.org/#/c/636356/ (just enabling python 3 on our side)
16:35:56 <patchbot> patch 636356 - neutron - Switch grenade jobs to python3 - 1 patch set
16:36:05 <slaweq> ok
16:36:32 <slaweq> #action bcafarel to continue work on grenade jobs switch to python 3
16:36:36 <slaweq> thx bcafarel
16:36:38 <slaweq> :)
16:36:43 <bcafarel> :)
16:36:51 <slaweq> that's all from my side regarding python 3
16:36:58 <slaweq> do You want to add anything?
16:38:06 <bcafarel> the list is getting shorter and shorter, that's good :)
16:38:20 <slaweq> indeed :)
16:38:37 <slaweq> ok, let's move on
16:38:44 <slaweq> #topic Grafana
16:38:51 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:40:50 <slaweq> I don't see anything "very bad" in the dashboard
16:41:01 <slaweq> things are running as usual
16:41:15 <slaweq> do You see anything special that You want to discuss related to grafana?
16:42:46 <slaweq> ok, so I will take this as a no :)
16:42:51 <slaweq> next topic
16:42:55 <slaweq> #topic fullstack/functional
16:43:16 <slaweq> 2 things here
16:43:26 <slaweq> first, about https://review.openstack.org/#/c/629420/ -
16:43:27 <patchbot> patch 629420 - neutron - Revert "Mark mysql related functional tests as uns... - 3 patch sets
16:43:38 <slaweq> I was thinking about this a bit today
16:43:58 <slaweq> and I think that maybe I can add a new decorator, something like "skip_if_timeout"
16:44:18 <slaweq> it would be similar to unstable_test() but would skip only if a timeout exception is raised in the test
16:44:48 <slaweq> that way we can still track how many times this issue occurs and see other failures properly
16:45:01 <slaweq> what do You think about this idea?
16:45:28 <haleyb> i like it better than using unstable_test since it is only catching one condition
16:46:23 <slaweq> haleyb: thx, so I will go with it tomorrow :)
16:46:46 <bcafarel> trying to think of situations where timeout would be an actual failure (that we would then miss)
16:46:52 <bcafarel> but it sounds reasonable indeed
16:46:58 <slaweq> #action slaweq to propose patch with new decorator skip_if_timeout in functional tests
16:47:16 <hongbin> +1
16:47:19 <slaweq> bcafarel: yes, it may happen but it's still better than how it is now :)
16:47:42 <bcafarel> definitely!
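For the skip_if_timeout idea discussed above, a minimal sketch of such a decorator is shown below. It only assumes stdlib plus unittest's skipTest(); the way the timeout exception is detected (matching the class name) and the decorator's module placement are assumptions, not the patch slaweq planned to propose:

    import functools


    def skip_if_timeout(reason):
        """Skip the decorated test only when it fails with a timeout.

        Any other exception is re-raised, so real failures still show up
        and the skips remain countable in the test results.
        """
        def decorator(f):
            @functools.wraps(f)
            def wrapper(self, *args, **kwargs):
                try:
                    return f(self, *args, **kwargs)
                except Exception as e:
                    # 'TimeoutException' stands for whichever timeout error
                    # the functional tests actually hit (e.g. from fixtures
                    # or ovsdbapp); matching on the class name is just one
                    # possible way to detect it.
                    if e.__class__.__name__.endswith('TimeoutException'):
                        self.skipTest('Timeout hit, skipping: %s' % reason)
                    raise
            return wrapper
        return decorator

Usage would be the same as unstable_test(), e.g. decorating a DB migration functional test with @skip_if_timeout("migration sometimes times out on slow CI nodes"), so only the timeout case is skipped while other failures still fail the test.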
16:48:26 <slaweq> ok, and the second thing related to functional tests for today is the new (old) https://bugs.launchpad.net/neutron/+bug/1815142
16:48:27 <openstack> Launchpad bug 1815142 in neutron "ovsdbapp.exceptions.TimeoutException in functional tests" [High,Confirmed]
16:48:34 <slaweq> it hits us from time to time
16:49:00 <slaweq> TBH I suspect that it may be a similar issue to the one with those db tests and it's just a slow VM
16:49:18 <slaweq> but maybe we can ask otherwiseguy to look at it again
16:51:36 <slaweq> that's all from my side regarding fullstack/functional jobs
16:51:44 <slaweq> do You want to add anything else?
16:53:04 <slaweq> ok, so let's move on then
16:53:10 <slaweq> #topic Tempest/Scenario
16:53:25 <slaweq> we still have some random ssh failure issues in scenario jobs
16:53:39 <slaweq> but that is a "known" issue and I don't have any updates about that
16:54:05 <slaweq> today I found some new issue which happened at least 2 times in the linuxbridge job:
16:54:12 <slaweq> https://bugs.launchpad.net/neutron/+bug/1815585
16:54:13 <openstack> Launchpad bug 1815585 in neutron "Floating IP status failed to transition to DOWN in neutron-tempest-plugin-scenario-linuxbridge" [High,Confirmed]
16:54:21 <slaweq> are You aware of such a problem?
16:54:41 <haleyb> i am
16:55:19 <slaweq> do You know what the reason for it could be?
16:56:04 <haleyb> although my initial debugging did not find an answer. basically when the port is removed from the instance, we expect the floating IP status to change, but it didn't
16:56:29 <haleyb> at least that is what the test is doing
16:57:43 <slaweq> haleyb: and of course when doing it locally it works fine, right?
16:58:42 <haleyb> it seemed to
16:59:03 <slaweq> :/
16:59:18 <slaweq> ok, the bug is reported, for now it doesn't happen very often
16:59:25 <slaweq> maybe someone will find the culprit :)
16:59:38 <slaweq> we are running out of time
16:59:42 <slaweq> so thx for attending
16:59:47 <slaweq> and see You all next week :)
16:59:53 <slaweq> #endmeeting
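For the floating IP status discussion above (bug 1815585), the check the scenario test is effectively performing can be sketched as a small polling helper. This is not the neutron-tempest-plugin code; get_fip, the timeout values, and the dict shape it returns are assumptions for illustration:

    import time


    def wait_for_fip_status(get_fip, fip_id, expected_status='DOWN',
                            timeout=120, interval=5):
        """Poll a floating IP until it reaches the expected status.

        get_fip is any callable returning a dict with a 'status' key for
        the given floating IP id (e.g. a thin wrapper around the Neutron
        API). Raises RuntimeError if the status never transitions in time.
        """
        status = None
        deadline = time.time() + timeout
        while time.time() < deadline:
            status = get_fip(fip_id)['status']
            if status == expected_status:
                return
            time.sleep(interval)
        raise RuntimeError(
            'Floating IP %s did not reach status %s within %s seconds '
            '(last seen: %s)' % (fip_id, expected_status, timeout, status))

The failure in the bug corresponds to this kind of wait expiring after the port is removed from the instance: the API keeps reporting the old status instead of DOWN.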