16:00:43 #startmeeting neutron_ci
16:00:44 Meeting started Tue Dec 4 16:00:43 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:45 hi
16:00:45 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:48 The meeting name has been set to 'neutron_ci'
16:01:06 o/
16:01:48 let's wait a few minutes for others
16:02:21 ok
16:03:23 I pinged them in the neutron channel
16:03:24 hi
16:03:35 o/
16:03:36 slaweq found us :)
16:03:44 o/
16:03:51 :)
16:04:17 o/
16:04:29 ok, let's start then
16:04:29 my coffeemaker was slow :-/
16:04:34 #topic Actions from previous meetings
16:04:46 mlavalle to continue tracking not reachable FIP in trunk tests
16:05:03 I did, after we merged your patch
16:05:13 the error is still taking place
16:05:25 yes, but I think it's happening less often now, right?
16:05:51 it may be slightly lower
16:06:22 when exactly did your fix merge in master?
16:07:05 https://review.openstack.org/#/c/620805/
16:07:09 3.12 I think
16:08:00 in master it seems it merged 11/27
16:08:12 ahh, right
16:08:15 this one was for pike
16:08:18 sorry
16:08:32 and yes, there seems to be a lower number of hits since then
16:08:48 so the next step is to change the scenario test
16:09:03 yes, will You do it?
16:09:06 to create the FIP when the port is already bound
16:09:14 and yes, I will do that today
16:09:40 #action mlavalle to change trunk scenario test and see if that will help with FIP issues
16:09:47 ++
16:10:07 ok, next one then
16:10:09 haleyb takes all this week :D
16:10:19 haleyb: did You fix all our bugs?
16:10:21 :P
16:10:41 yeah, you didn't see the changes? :p
16:10:48 ok, thx haleyb++
16:10:55 we can go to the next one then
16:10:57 njohnston to remove neutron-grenade job from neutron's CI queues
16:11:24 So I have a change up to do that, but there are prerequisites
16:11:46 I can remove the job from our queues but leave the job definition there
16:11:53 because it is used elsewhere
16:12:05 I have a change up already for the grenade repo, which uses neutron-grenade
16:12:18 but it looks like nova, cinder, glance, keystone all use neutron-grenade
16:12:33 so I'll need to spin changes for all of those before we can delete the job definition
16:12:57 but in our gate we already have neutron-grenade-py3, right?
16:13:02 oh, and tempest, openstacksdk, and the requirements repo
16:13:37 we have grenade-py3, yes, provided by the grenade repo
16:14:09 so maybe we can remove this neutron-grenade job from our check and gate queues so as not to overload the gates
16:14:29 and add a comment that the definition of this job is used by other projects so it can't be removed now
16:14:34 mlavalle: what do You think?
16:14:47 yes, I think I'll do that to start with, and I'll gradually work towards the other projects replacing neutron-grenade with grenade-py3
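The approach agreed on above would look roughly like this in neutron's Zuul configuration. This is a sketch only; the file layout, neighbouring job names, and comment wording are assumptions for illustration, not the actual change under review.

```yaml
# Sketch (file layout and surrounding job names assumed).
# neutron-grenade is dropped from neutron's own check and gate job
# lists, while its job *definition* is kept because other projects
# (grenade, nova, cinder, glance, keystone, tempest, openstacksdk,
# requirements) still reference it.
- project:
    check:
      jobs:
        - neutron-grenade-multinode        # other jobs stay as they are
        - neutron-grenade-dvr-multinode
        # neutron-grenade intentionally removed from this list
    gate:
      jobs:
        - neutron-grenade-multinode
```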
16:15:19 sounds good
16:15:34 great, so go on with it njohnston :)
16:15:57 #action njohnston will remove neutron-grenade from neutron ci queues and add comment why definition of job is still needed
16:16:09 will do
16:16:13 thx
16:16:17 so, next one
16:16:19 slaweq to continue debugging bug 1798475 when journal log will be available in fullstack tests
16:16:20 bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:16:42 I found this failure on a patch where the journal log is already stored
16:16:48 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/testr_results.html.gz
16:17:10 but I still have no idea why keepalived switched the VIP address from one "host" to the other
16:17:29 I will keep investigating that
16:17:38 #action slaweq to continue debugging bug 1798475
16:17:39 bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:18:12 if I need help I will ping haleyb :)
16:18:26 next one
16:18:28 slaweq to continue fixing functional-py3 tests
16:18:34 still no progress on this one
16:18:49 I hope you didn't mind that I raised the flag on this one in the neutron team meeting
16:18:51 we know that we need to limit output from functional tests, remove all deprecations and things like that, and then check what the result is
16:19:03 njohnston: no, great that You did it
16:19:17 we definitely need more eyes looking at it
16:20:07 if there is anyone else who wants to try to fix it this week, feel free
16:20:21 I will assign it to myself as an action just so I don't forget about it
16:20:24 ok?
16:20:32 +1
16:20:45 so based on how much progress I make on other assignments, I may take a look at this one
16:20:49 #action slaweq to continue fixing functional-py3 tests
16:21:04 mlavalle: thx, ping me if You need any info
16:21:10 ack
16:21:25 ok, next one
16:21:27 njohnston to research py3 conversion for neutron grenade multinode jobs
16:21:46 no progress on that one yet
16:22:03 ok, let's move it to next week then, right?
16:22:09 sounds good
16:22:10 #action njohnston to research py3 conversion for neutron grenade multinode jobs
16:22:12 thx
16:22:19 next:
16:22:22 slaweq to convert neutron-tempest-plugin jobs to py3
16:22:26 Patch https://review.openstack.org/#/c/621401/
16:22:38 I rechecked it a few times and it looks like it works fine
16:22:52 all neutron-tempest-plugin jobs for the master branch are switched to py3 in this patch
16:23:04 and jobs for stable branches are still on py27, of course
16:23:14 please review it if You have some time
16:24:00 slaweq: I will look at it today
16:24:07 thx mlavalle
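To give a feel for what the py3 switch in https://review.openstack.org/#/c/621401/ amounts to: devstack-based Zuul jobs select the Python version through their devstack variables, so a master-branch job would carry something along the lines below, while the stable-branch variants keep the py27 default. The job name and exact variable layout here are assumptions for the sketch, not copied from the patch.

```yaml
# Sketch (job name and layout assumed): run the devstack deployment for
# this master-branch job under python 3 by setting USE_PYTHON3.
- job:
    name: neutron-tempest-plugin-scenario-linuxbridge
    parent: neutron-tempest-plugin-scenario
    vars:
      devstack_localrc:
        USE_PYTHON3: true
```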
16:24:15 ok, so let's move on
16:24:26 njohnston add tempest-slow and networking-ovn-tempest-dsvm-ovs-release to grafana
16:25:14 hmm, I added that config but it looks like I never did a 'git review'. I'll do that right away, sorry I missed it.
16:25:34 no problem, thx njohnston for working on this :)
16:25:57 please add me as a reviewer when You push it
16:26:05 will do
16:26:13 thx
16:26:20 ok, let's move on to the last one
16:26:22 mlavalle to discuss about neutron-tempest-dvr job in L3 meeting
16:26:38 mhhh, I couldn't attend the meeting
16:27:15 I'll discuss with haleyb in the Neutron channel
16:27:15 ok, so will You talk about it this week?
16:27:20 ahh, thx
16:27:32 so I will not add an action for it anymore
16:27:39 please do
16:27:48 any questions/something to add?
16:27:54 so I can report the conclusion next week
16:29:02 ok, let's move on then
16:29:03 #topic Python 3
16:29:09 njohnston: any updates?
16:30:09 I think we should update our etherpad to mark what is already done
16:30:20 I will do that this week
16:30:40 I don't think there are any updates this week
16:30:43 from my end
16:31:00 #action slaweq to update etherpad with what is already converted to py3
16:31:28 ok, from my side there is also nothing more to talk about today
16:31:37 so let's move on, next topic
16:31:44 #topic Grafana
16:31:49 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:34:15 the good news is that tempest jobs and scenario jobs look much better this week
16:34:25 even those multinode dvr jobs
16:35:08 what happened last week?
16:35:19 when?
16:35:53 seems we had a spike around Thursday in gate
16:37:14 hmm, I don't know
16:37:58 possibly there were some infra-related issues
16:38:21 well, let's keep it in mind
16:38:33 one day last week I saw some failures because of missing dependencies or something like that
16:38:50 but I don't remember what it was exactly and on which day
16:39:33 Hey, just a quick sanity check: we have the tempest-slow job in the check queue as voting but not in gate. Is that the intended configuration?
16:40:04 As in, it is not present in the gate queue in any form
16:40:31 in theory, if a job is voting in check, it should be in the gate as well
16:41:04 yes, we should add it to the gate queue as well
16:41:09 yes, that is my impression, I just wanted to see if there is some reason for this to be an exception that I don't know about
16:41:14 but maybe let's add it to grafana first
16:41:29 it turns out it already is in grafana, at least for the check queue
16:41:34 then check how it works for one/two weeks and we will decide then
16:42:02 indeed, I missed it
16:42:50 So I will submit a change to add it to gate, and we can slow-walk that change until we are comfortable with it
16:43:21 sounds like a plan
16:43:39 ok
16:44:10 ok, one more thing I want to mention
16:44:25 not related strictly to grafana but related to job failures
16:44:48 together with hongbin we decided to collect examples of failures which we spot in CI in an etherpad:
16:44:49 https://etherpad.openstack.org/p/neutron-ci-failures
16:45:09 there are already quite a few examples of issues from this week
16:45:40 some of them are not related to neutron but we wanted to have info on how many times such issues happen for us
16:45:48 some are related to neutron for sure
16:46:19 the most common issue related to neutron is problems with connectivity to FIPs
16:46:31 yeap
16:46:49 I don't know if it is the same issue every time but the visible culprit is the same - no connection to the FIP
16:46:55 it shows up in many different jobs
16:47:04 I don't think it is always the same root cause
16:47:53 yes, probably not, but we should take a look at them, collect such examples for a few weeks and report bugs if it hits more often in the same job/test
16:48:30 from other things, I wanted to mention that there is (again) some issue with ovsdbapp timeouts
16:48:44 it hits us in many different jobs
16:49:00 and otherwiseguy was checking it already
16:49:35 there is a patch to turn on the python-ovs debugging to do further troubleshooting for that
16:50:16 #link https://review.openstack.org/#/c/621572/
16:50:33 and a bug reported already: https://bugs.launchpad.net/bugs/1802640
16:50:34 Launchpad bug 1802640 in neutron "TimeoutException: Commands [
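For context on what "turn on the python-ovs debugging" means in practice: the actual troubleshooting change is the review linked above, but the general idea can be approximated with standard Python logging by raising the level of the loggers involved. This is a minimal sketch, not the patch itself, and the logger names used here are assumptions.

```python
# Minimal sketch (not the actual patch): raise the ovsdbapp / python-ovs
# related loggers to DEBUG so that transaction timeouts leave more
# context in the service logs.
import logging

for name in ("ovsdbapp", "ovs.db.idl"):  # logger names assumed
    logging.getLogger(name).setLevel(logging.DEBUG)
```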
ok, let's talk about some specific jobs now
16:51:28 #topic fullstack/functional
16:51:43 regarding fullstack tests, we have a new issue I think
16:52:04 I saw at least 3 times that the test neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_gateway_ip_changed failed:
16:52:12 http://logs.openstack.org/36/617736/10/check/neutron-fullstack/61fa2eb/logs/testr_results.html.gz
16:52:15 http://logs.openstack.org/68/424468/34/check/neutron-fullstack/4422e70/logs/testr_results.html.gz
16:52:17 http://logs.openstack.org/08/620708/2/check/neutron-fullstack/51099db/logs/testr_results.html.gz
16:52:28 is there anyone who wants to take a look at it?
16:52:48 i can try to look at it
16:53:39 is it always the IP address already allocated exception?
16:53:48 thx hongbin
16:54:00 the patch which introduced this test is https://review.openstack.org/#/c/606876/
16:54:21 mlavalle: I don't know, I didn't have time to investigate it
16:55:05 yes, it looks like it is the same culprit every time
16:55:17 it should be easy to fix, I hope :)
16:55:41 #action hongbin to report and check failing neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_gateway_ip_changed test
16:56:19 I also want to highlight again the failing functional db migrations tests
16:56:34 I found them failing at least 5 or 6 times this week
16:56:47 the logs from those tests are now stored properly
16:56:58 but there is not too much info there
16:57:16 for example: http://logs.openstack.org/49/613549/6/gate/neutron-functional/315ddc4/logs/testr_results.html.gz
16:57:53 it looks like the migrations are working but are very slow
16:58:33 I think we should compare such failed tests with passed ones and check whether the slowest migrations are always the same or whether it's random
16:58:48 but I don't know if I will have time for it this week
16:59:43 ok, and that's all I wanted to highlight from the CI issues
16:59:54 thx for attending and see You next week
16:59:58 #endmeeting