16:00:25 #startmeeting neutron_ci
16:00:29 hi
16:00:31 Meeting started Tue Feb 12 16:00:25 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:32 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:35 The meeting name has been set to 'neutron_ci'
16:02:54 let's wait a few minutes for other people to join
16:03:39 but it looks like there is no crowd today :)
16:04:13 hi! I'm still here
16:04:24 hi ralonsoh, good to see You :)
16:04:42 I know that mlavalle will not be available today
16:04:44 o/
16:04:52 but maybe haleyb and bcafarel will join
16:04:56 hi bcafarel :)
16:05:06 hongbin: are You around for the CI meeting?
16:05:09 my spider sense was telling me I should switch IRC channels :)
16:05:12 oh, hi, looking at bugs
16:05:14 o/
16:05:25 ok, so now I think we can start :)
16:05:31 welcome everyone!
16:05:36 #topic Actions from previous meetings
16:05:47 first one was:
16:05:49 mlavalle to continue investigating why the L3 agent is considered down and causes trunk tests to fail
16:06:13 mlavalle gave me some updates about his findings on it
16:06:28 he said the problem is with the test migrating a router from HA to dvr/legacy; that migration implies deleting the existing router's state_change_monitor
16:06:53 so we try to execute 'kill', '-15', 'pid', and that basically makes the agent "dead"
16:06:58 please look here: http://paste.openstack.org/show/744001/
16:07:56 possibly we are missing a filter in https://github.com/openstack/neutron/blob/master/etc/neutron/rootwrap.d/l3.filters for killing the neutron-keepalived-state-change process
16:08:14 haleyb: do You think that is possible?
16:08:36 so it's getting an error trying to kill the state change monitor?
16:09:05 IIUC what mlavalle was saying, then yes :)
16:09:51 and indeed IMO this KillFilter is missing there
16:10:04 so that could be the issue
16:10:50 yes, that missing KillFilter could be an issue
16:11:12 so mlavalle will send a patch to add it and we will see if that helps
16:11:24 +1
16:12:12 #action mlavalle to check if adding a KillFilter for neutron-keepalived-state-change will solve the issues with the L3 agent in dvr jobs (a rootwrap filter sketch follows below)
16:12:14 and it worked before without the filter?
16:12:50 bcafarel: I think that possibly it wasn't killing the keepalived-state-change process, but the agent was running fine
16:13:05 maybe the py27->py36 change for this job triggered this somehow
16:13:32 also mlavalle told me that it happens during router migration
16:13:52 from ha to dvr/legacy
16:14:01 and such a scenario isn't tested in other jobs
16:14:18 so it is possible that python 3.6 triggered that somehow
16:15:16 can we move on, or do You have anything else to add here?
16:15:54 nothing, it was mostly personal curiosity :)
16:15:57 thanks slaweq
16:16:11 bcafarel: sure :) I'm also wondering how it may be possible
16:17:16 ok, let's move on
16:17:19 next action was:
16:17:24 njohnston to work on the grenade job transition
16:17:37 I think he didn't make any progress on this recently
16:18:54 next one was:
16:18:55 I sent https://review.openstack.org/#/c/636356/ to see how it goes with a simple change in the meantime
16:18:56 patch 636356 - neutron - Switch grenade jobs to python3 - 1 patch set
16:19:00 slaweq to check if the new ovsdb monitor implementation can allow listening only for events from a specific bridge
16:19:35 bcafarel: thx, let's get back to it in the python3 section, ok?
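A minimal sketch of the kind of rootwrap entry such a patch could add to etc/neutron/rootwrap.d/l3.filters, using the KillFilter syntax that file already relies on; the entry names, interpreter names, and signal list here are assumptions, not the contents of the actual patch:

    [Filters]
    # let the L3 agent send SIGTERM/SIGKILL to the neutron-keepalived-state-change
    # monitor, which runs as a python process
    kill_keepalived_monitor_py: KillFilter, root, python, -15, -9
    kill_keepalived_monitor_py3: KillFilter, root, python3, -15, -9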
16:20:01 slaweq: oops sorry, multitasking is bad for attention
16:20:08 bcafarel: no, it's fine :)
16:20:35 regarding the ovsdb monitor, I checked that SimpleInterfaceMonitor isn't moved to the new implementation yet, so there was nothing to check for now
16:20:44 ralonsoh is still working on it, right?
16:21:09 slaweq, yes, but I'm still facing the same functional tests problems
16:21:15 and I don't know how to solve them
16:21:23 I think this is a race condition
16:21:28 but I can't prove it
16:21:33 ralonsoh: do You have a link to the patch?
16:21:53 #link https://review.openstack.org/#/c/612400/
16:21:54 patch 612400 - neutron - [WIP] Add native OVSDB implementation for polling ... - 18 patch sets
16:23:05 a lot of events logged: http://logs.openstack.org/00/612400/18/check/neutron-functional-python27/6e7d69b/job-output.txt.gz#_2019-02-07_12_04_15_376869
16:23:51 is that normal?
16:24:22 no, that's just for testing (it's in the patch comment)
16:24:53 but except for that "issue" I don't see any failed tests in PS18 in the functional tests job
16:26:03 sorry, neutron-fullstack
16:26:15 test_l2_agent_restart(OVS,VLANs,openflow-cli)
16:26:18 always the same
16:27:00 ralonsoh: ok, I will take a look at the logs from this test later
16:27:05 thanks!
16:27:09 maybe I will find something :)
16:27:23 ok, let's move on
16:27:29 the last action from last week was:
16:27:31 njohnston to take care of periodic UT job failures
16:27:47 I know that he fixed this issue, as the periodic jobs are fine now
16:27:57 but I don't have a link to the specific patch
16:29:31 ok, anything else You want to ask/add about actions from last week?
16:30:56 ok, let's move on then
16:30:58 #topic Python 3
16:31:05 Etherpad: https://etherpad.openstack.org/p/neutron_ci_python3
16:31:45 FYI: after we merged the switch of some jobs to py3 versions, I pushed a patch to update Grafana: https://review.openstack.org/636359
16:31:46 patch 636359 - openstack-infra/project-config - Add new python 3 Neutron jobs to grafana dashboard - 1 patch set
16:32:18 for now, from the gate/check queues we are only missing:
16:32:20 * neutron-tempest-dvr-ha-multinode-full - patch in progress: https://review.openstack.org/633979
16:32:21 patch 633979 - neutron - Migrate neutron-tempest-dvr-ha-multinode-full job ... - 7 patch sets
16:32:22 * neutron-grenade-multinode - a patch with the new job is already proposed: https://review.openstack.org/#/c/622612/ but it is failing
16:32:22 patch 622612 - openstack-dev/grenade - Add grenade-multinode-py3 job - 3 patch sets
16:32:24 * neutron-grenade-dvr-multinode - should be done once the above job is done
16:33:22 bcafarel: so regarding Your patch, I think You should rebase/recheck this patch from njohnston https://review.openstack.org/#/c/622612/ and get it merged
16:33:23 patch 622612 - openstack-dev/grenade - Add grenade-multinode-py3 job - 3 patch sets
16:33:32 then we can switch to using this job in neutron
16:34:01 what do You think?
16:34:50 having the job defined in grenade itself will be nice indeed
16:35:55 I'll check once we have some initial results on https://review.openstack.org/#/c/636356/ (just enabling python 3 on our side)
16:35:56 patch 636356 - neutron - Switch grenade jobs to python3 - 1 patch set
16:36:05 ok
16:36:32 #action bcafarel to continue work on the grenade jobs' switch to python 3
16:36:36 thx bcafarel
16:36:38 :)
16:36:43 :)
16:36:51 that's all from my side regarding python 3
16:36:58 do You want to add anything?
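For reference, a minimal sketch of what "just enabling python 3 on our side" usually looks like for a modern devstack-based Zuul job; the job name and parent below are placeholders, and the legacy grenade jobs discussed above may need a different mechanism since they predate this job style:

    - job:
        name: neutron-grenade-multinode-py3   # placeholder name
        parent: grenade-multinode             # placeholder parent
        vars:
          devstack_localrc:
            USE_PYTHON3: true                 # deploy and run services under python3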
16:38:06 the list is getting shorter and shorter, that's good :)
16:38:20 indeed :)
16:38:37 ok, let's move on
16:38:44 #topic Grafana
16:38:51 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:40:50 I don't see anything "very bad" in the dashboard
16:41:01 things are running as usual
16:41:15 do You see anything special that You want to discuss related to grafana?
16:42:46 ok, so I will take that as a no :)
16:42:51 next topic
16:42:55 #topic fullstack/functional
16:43:16 2 things here
16:43:26 first, about https://review.openstack.org/#/c/629420/
16:43:27 patch 629420 - neutron - Revert "Mark mysql related functional tests as uns... - 3 patch sets
16:43:38 I was thinking about this a bit today
16:43:58 and I think that maybe I can add a new decorator, something like "skip_if_timeout"
16:44:18 it would be similar to unstable_test(), but would skip the test only if a timeout exception is raised in it
16:44:48 that way we can still track how many times this issue occurs and see other failures properly
16:45:01 what do You think about this idea?
16:45:28 I like it better than using unstable_test since it only catches one condition
16:46:23 haleyb: thx, so I will go with it tomorrow :)
16:46:46 trying to think of situations where a timeout would be an actual failure (that we would then miss)
16:46:52 but it sounds reasonable indeed
16:46:58 #action slaweq to propose a patch with the new skip_if_timeout decorator in functional tests (sketched below, after the log)
16:47:16 +1
16:47:19 bcafarel: yes, that may happen, but it's still better than how it is now :)
16:47:42 definitely!
16:48:26 ok, and the second thing related to functional tests for today is the new (old) https://bugs.launchpad.net/neutron/+bug/1815142
16:48:27 Launchpad bug 1815142 in neutron "ovsdbapp.exceptions.TimeoutException in functional tests" [High,Confirmed]
16:48:34 it hits us from time to time
16:49:00 TBH I suspect it may be an issue similar to the one with those db tests, and it's just a slow VM
16:49:18 but maybe we can ask otherwiseguy to look at it again
16:51:36 that's all from my side regarding the fullstack/functional jobs
16:51:44 do You want to add anything else?
16:53:04 ok, so let's move on then
16:53:10 #topic Tempest/Scenario
16:53:25 we still have some random ssh failure issues in scenario jobs
16:53:39 but that is a "known" issue and I don't have any updates about it
16:54:05 today I found a new issue which has happened at least 2 times in the linuxbridge job:
16:54:12 https://bugs.launchpad.net/neutron/+bug/1815585
16:54:13 Launchpad bug 1815585 in neutron "Floating IP status failed to transition to DOWN in neutron-tempest-plugin-scenario-linuxbridge" [High,Confirmed]
16:54:21 are You aware of such a problem?
16:54:41 I am
16:55:19 do You know what the reason for it could be?
16:56:04 although my initial debugging did not find an answer. basically, when the port is removed from the instance, we expect the floating IP status to change, but it didn't
16:56:29 at least that is what the test is doing
16:57:43 haleyb: and of course when doing it locally it works fine, right?
16:58:42 it seemed to
16:59:03 :/
16:59:18 ok, the bug is reported; for now it doesn't happen very often
16:59:25 maybe someone will find the culprit :)
16:59:38 we are running out of time
16:59:42 so thx for attending
16:59:47 and see You all next week :)
16:59:53 #endmeeting
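A minimal sketch of the skip_if_timeout decorator discussed in the fullstack/functional topic; which module it would live in and the exact exception class to catch are assumptions, so the actual patch may differ:

    import functools

    import fixtures


    def skip_if_timeout(reason):
        # Unlike a blanket "unstable test" marker, only a timeout leads to a
        # skip; any other exception still fails the test, so real regressions
        # are not hidden while timeout occurrences stay countable in the skip
        # statistics.
        def decorator(f):
            @functools.wraps(f)
            def wrapper(self, *args, **kwargs):
                try:
                    return f(self, *args, **kwargs)
                except fixtures.TimeoutException:
                    # assumption: the slow-node timeouts in these functional
                    # tests surface as fixtures.TimeoutException
                    self.skipTest(reason)
            return wrapper
        return decorator

Usage would then be a one-line change per affected test, e.g. decorating it with @skip_if_timeout('bug 1815142: test timed out, probably a slow CI node').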
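And for bug 1815585, a rough sketch of the status wait the failing scenario test boils down to (remove the port from the server, then poll the floating IP until it goes DOWN); the client method name is illustrative, not the actual neutron-tempest-plugin helper:

    import time


    def wait_for_floatingip_status(client, fip_id, expected='DOWN',
                                   timeout=120, interval=5):
        # Poll the floating IP until it reaches the expected status or give up;
        # in the bug, the status sometimes never transitions to DOWN after the
        # port is removed from the instance.
        deadline = time.time() + timeout
        while time.time() < deadline:
            fip = client.show_floatingip(fip_id)['floatingip']  # illustrative call
            if fip['status'] == expected:
                return fip
            time.sleep(interval)
        raise AssertionError('Floating IP %s did not reach status %s within %ss'
                             % (fip_id, expected, timeout))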