16:00:06 #startmeeting neutron_ci
16:00:08 Meeting started Tue Jul 17 16:00:06 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:09 hi again
16:00:11 :)
16:00:11 The meeting name has been set to 'neutron_ci'
16:00:23 o/
16:00:37 mlavalle: haleyb: are You around?
16:00:41 o/
16:01:07 #topic Actions from previous meetings
16:01:21 there is only one action from the last meeting:
16:01:24 mlavalle to follow up with QA team to merge https://review.openstack.org/#/c/578765/
16:01:42 I forgot. I will do it today
16:01:44 hi
16:01:49 yesterday I pushed a new PS there
16:01:54 had the patch open in my browser the entire time
16:01:56 and it's already +W
16:01:57 damn
16:02:02 mlavalle: no problem
16:02:06 :)
16:02:23 I hope it will be merged soon, though I had to recheck it a few times already
16:02:25 non-issue then
16:02:35 hi haleyb :)
16:03:01 so, quickly, next topic
16:03:03 #topic Grafana
16:03:08 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:03:54 I wish there was a way to set the default for that to the last 7 days. Is there any reason not to do that?
16:03:55 as I was checking it earlier today, it looks much better now for most of the jobs
16:04:15 njohnston: I don't know, but it would be good IMO :)
16:05:23 I think that Neutron gates are better when I am on holiday :D
16:05:44 #action njohnston to see if we can set the default time period on the grafana dashboard to now-7d
16:05:56 ^^ thx njohnston
16:06:58 for example fullstack, which was in bad shape a few weeks ago, is now below 20% for the last week
16:07:41 do You want to add something, or can we talk about some specific jobs?
16:07:53 we still missed you!
16:08:01 why?
16:08:10 because you weren't here
16:08:29 ahh thx mlavalle :)
16:08:40 I'm saying even if the gate is in good shape, we still want you to be around
16:09:01 ok, now I will be all the time :P
16:09:12 so the gate will probably be back to bad shape :P
16:09:18 LOL
16:09:34 ok, let's move on
16:09:35 #topic Unit tests
16:09:50 we still have an issue with timeouts in UT sometimes
16:09:59 it's described in https://bugs.launchpad.net/neutron/+bug/1779077
16:09:59 Launchpad bug 1779077 in neutron "Unit test jobs (py35) fails with timeout often" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:10:28 But it's not related only to python 3.5; I saw failing py27 jobs too, and lower-constraints as well sometimes
16:10:45 it happens less often than a few weeks ago IMO
16:10:52 but it still hits us
16:11:35 I checked today in logstash that it's (probably) not only Neutron's problem - some other projects also hit such timeouts in py27/py35 jobs last week
16:12:03 I was asking on the infra channel today but they don't know about any specific reason for it
16:12:42 and I also don't know if the problems in other projects were maybe due to something else; I just checked that it happens from time to time for others too
16:13:13 slaweq, I have a question regarding a py35 issue I'm facing in networking-odl, but I'll ask later in the neutron channel after this meeting
16:13:14 worth taking it into consideration
16:13:21 I want to compare the times of the longest tests in "good" and "bad" runs and maybe I will find something interesting
16:13:36 as it's not very frequent, I'm doing it in the meantime for now
16:13:48 and will update this bug if I find something
16:15:45 that's all from me about UT issues
16:15:50 do You have something to add?
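One way to do that comparison is to pull the raw subunit streams from a passing and a failing run and list the longest tests from each; a minimal sketch using python-subunit and testtools (the testrepository.subunit filename is an assumption about what gets saved from the job logs) could look like this:

```python
import subunit
import testtools

durations = {}

def record(test):
    # each finished test arrives as a dict; 'timestamps' holds (start, stop)
    start, stop = test['timestamps']
    if start and stop:
        durations[test['id']] = (stop - start).total_seconds()

# hypothetical filename: a raw subunit stream downloaded from the job logs
with open('testrepository.subunit', 'rb') as stream:
    case = subunit.ByteStreamToStreamResult(stream, non_subunit_name='stdout')
    result = testtools.StreamToDict(record)
    result.startTestRun()
    case.run(result)
    result.stopTestRun()

# print the 20 slowest tests; diffing this output between a "good" and a
# "bad" run should show whether some tests suddenly got much slower
for test_id, secs in sorted(durations.items(), key=lambda i: i[1], reverse=True)[:20]:
    print('%8.2fs %s' % (secs, test_id))
```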
16:16:05 nope
16:16:13 next topic then
16:16:24 #topic Functional
16:16:44 I think this week I saw again an issue like in http://logs.openstack.org/58/574058/10/check/neutron-functional/f38d685/logs/testr_results.html.gz
16:17:00 but I'm not sure exactly where and I couldn't find it today
16:17:09 did You see such issues recently?
16:17:40 no, I didn't
16:18:13 so just please be aware, and if You see such issues again, please report a bug for it
16:18:22 as there isn't one currently IIRC
16:18:43 we will then be able to track it there
16:18:47 On 7/15 we hit 100% neutron-functional failures in the gate, and then it just went away... I know I saw it at the time, and I don't know what was fixed but now the issue is not there
16:19:26 and it was the weekend and I did not have time to investigate at that moment
16:19:52 100% failures during the weekend might not really be an issue
16:20:04 there aren't many patches then, especially in the gate queue
16:20:08 I saw a bunch of -2s from it
16:20:28 ahh, so maybe there really was something
16:20:42 maybe one of the patches fixed this issue somehow :)
16:21:02 That is my assumption
16:21:25 would be good
16:21:38 let's just check if it happens again
16:23:32 I think we can now go to our "favorite" topic
16:23:34 #topic Scenarios
16:24:09 for a few days we have had 2 jobs with a very high failure rate
16:24:18 fortunately both are non-voting :)
16:24:39 it's neutron-tempest-multinode-full and neutron-tempest-dvr-ha-multinode-full
16:25:11 today I checked failures from the last few days and I found a few issues which happen there
16:25:19 first, neutron-tempest-multinode-full
16:25:50 there are one or two tests failing every time it fails: tempest.api.compute.admin.test_servers_on_multinodes.ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity
16:26:00 and this issue is probably not related to neutron at all
16:26:16 I saw the same failures in tempest jobs too
16:26:34 examples of such failures in neutron patches:
16:26:36 * http://logs.openstack.org/08/555608/34/check/neutron-tempest-multinode-full/1d3aafb/logs/testr_results.html.gz
16:26:44 * http://logs.openstack.org/15/583015/1/check/neutron-tempest-multinode-full/bd5eb5e/logs/testr_results.html.gz
16:26:46 * http://logs.openstack.org/21/567621/9/check/neutron-tempest-multinode-full/6bc503f/logs/testr_results.html.gz
16:26:48 * http://logs.openstack.org/59/582659/1/check/neutron-tempest-multinode-full/c503459/logs/testr_results.html.gz
16:28:39 so for this job I don't think there is a lot we can do
16:28:46 let's go to neutron-tempest-dvr-ha-multinode-full
16:28:54 for this job there are 2 main issues
16:28:56 do we need to open a bug with the nova crew?
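The anti-affinity test named above creates a server group with the anti-affinity policy and boots servers into it with a scheduler hint, so it can only pass when nova finds two distinct compute hosts; a rough manual reproduction sketch (the image, flavor and network names are assumptions from a typical devstack) could be:

```python
import subprocess

def osc(*args):
    """Run an openstack CLI command and return its stdout as text."""
    return subprocess.check_output(('openstack',) + args).decode()

# create an anti-affinity server group and grab its id
group_id = osc('server', 'group', 'create', '--policy', 'anti-affinity',
               '-f', 'value', '-c', 'id', 'ci-anti-affinity').strip()

# boot two servers with the group scheduler hint; with anti-affinity the
# second one must land on a different compute node or scheduling fails
for name in ('anti-affinity-vm-1', 'anti-affinity-vm-2'):
    osc('server', 'create', '--image', 'cirros-0.3.5-x86_64-disk',
        '--flavor', 'm1.nano', '--network', 'private',
        '--hint', 'group=%s' % group_id, '--wait', name)
```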
16:29:05 njohnston: good question
16:29:17 I didn't check if there is already something open
16:29:32 but we can definitely ask on the nova channel
16:29:46 I will ask after the meeting
16:29:52 The nova/neutron liaison is sean-k-mooney; he might be able to help
16:30:15 ok, thx for the info
16:30:26 he is closer to your timezone, slaweq
16:30:33 yes, I know
16:30:41 he is in Ireland IIRC
16:30:45 yeap
16:30:56 so I will ask tomorrow morning
16:31:26 #action slaweq to talk about the issue with test_create_server_with_scheduler_hint_group_anti_affinity with the nova-neutron liaison
16:32:02 ok, so for neutron-tempest-dvr-ha-multinode-full there are generally 2 issues
16:32:14 the first is exactly the same as above
16:32:28 and the second issue is with the tempest.api.compute.volumes.test_attach_volume.AttachVolumeShelveTestJSON.test_{attach, detach}_volume_shelved_or_offload_server tests
16:32:42 examples of failures:
16:32:44 * http://logs.openstack.org/51/414251/74/check/neutron-tempest-dvr-ha-multinode-full/2b4a730/logs/testr_results.html.gz
16:32:46 * http://logs.openstack.org/08/555608/34/check/neutron-tempest-dvr-ha-multinode-full/8d12da6/logs/testr_results.html.gz
16:32:48 * http://logs.openstack.org/29/581029/2/check/neutron-tempest-dvr-ha-multinode-full/f6e69d5/logs/testr_results.html.gz
16:32:50 * http://logs.openstack.org/21/567621/9/check/neutron-tempest-dvr-ha-multinode-full/3d8fd83/logs/testr_results.html.gz
16:32:52 * http://logs.openstack.org/59/582659/1/check/neutron-tempest-dvr-ha-multinode-full/62e9d93/logs/testr_results.html.gz
16:33:05 here it might be related to Neutron, as the issue is with ssh to the instance via a floating IP
16:33:19 but I didn't see this issue in any other job than this one with dvr
16:34:45 in the l3 agent logs there are a few errors like: http://logs.openstack.org/51/414251/74/check/neutron-tempest-dvr-ha-multinode-full/2b4a730/logs/subnode-3/screen-q-l3.txt.gz?level=ERROR
16:34:57 on both subnode-2 and subnode-3
16:35:39 mlavalle: do You think that it might be related?
16:36:05 slaweq: I was taking a look. don't know yet
16:36:47 no tracebacks in the log
16:36:52 just that error
16:37:58 right, don't know what it was doing
16:38:05 the strange thing is that it's only on this test
16:39:04 the message comes from pyroute2: https://github.com/svinota/pyroute2/blob/master/pyroute2/netns/nslink.py#L199
16:39:13 so maybe it's some issue with configuring the FIP again after the server is shelved/unshelved
16:39:16 so it seems, at least
16:40:01 maybe it would be easy to check manually if someone has a dvr environment
16:40:17 it seems they have their own api to handle namespaces
16:40:27 just try to shelve/unshelve a server and check connectivity to it
16:40:40 the message is from the close() code, so maybe it didn't cause a fatal error; i remember seeing it before but can't place the context
16:40:56 yeah, it's closing
16:43:40 I can create a DVR environment easily
16:43:48 if that is the route we want to take
16:44:15 it's my first idea to check, as I saw this happen very often in this dvr scenario but not in others
16:44:34 and it's always this test with shelve/unshelve of the instance
16:44:52 here is what the test is doing: https://github.com/openstack/tempest/blob/master/tempest/api/compute/volumes/test_attach_volume.py#L224
16:45:10 it creates a server and a volume and then shelves the server
16:45:18 and unshelves it
16:45:27 after the unshelve it tries to connect to it
16:45:36 and this fails
16:46:19 i wonder if CONF.validation.run_validation is True, so that it checks ssh before shelving
16:46:56 i guess it must be
16:47:32 http://logs.openstack.org/51/414251/74/check/neutron-tempest-dvr-ha-multinode-full/2b4a730/logs/tempest_conf.txt.gz
16:47:36 it is set to True
16:50:39 ok, mlavalle, if You can deploy a dvr env quickly, can You test it and maybe report a bug for that issue?
16:51:00 not necessarily today, but over the next few days
16:51:10 sure :)
16:51:11 dvr is enough, right?
16:51:18 thx a lot
16:51:19 I don't need ha, right?
16:51:28 and try an 'openstack server shelve/unshelve ...' i guess
16:51:43 mlavalle: good question
16:52:00 the scenario is dvr-ha
16:52:10 but maybe dvr would be enough?
16:52:16 I don't know
16:52:21 slaweq: mhhh, that may require a little more work
16:52:25 I'll still try
16:52:33 haleyb: yes, I think that shelve/unshelve will be what is done in this test
16:52:52 haleyb: ack
16:53:25 it might be enough to try with dvr, and see how it goes. i might have a system to try if i can remember the IP
16:53:38 LOL
16:53:40 ok
16:53:51 thx haleyb
16:54:28 #action mlavalle/haleyb to check dvr (dvr-ha) env and shelve/unshelve server
16:54:34 good? :)
16:54:43 yes
16:54:46 thx
16:54:50 sure, i did find it :)
16:55:09 ok, that's all from me for today :)
16:55:17 do You maybe have anything else to add?
16:55:30 not from me
16:55:51 just to say that I am glad I don't have to run 3 meetings in a row
16:55:58 LOL
16:56:11 yes, it's hard for sure
16:56:18 2 are more than enough for me
16:56:25 it's not the meetings
16:56:30 it's the preparation
16:56:32 * njohnston is grateful to all of you
16:56:43 yes mlavalle, I agree
16:56:56 ok, thx guys for attending
16:56:59 see You next week
16:57:03 #endmeeting
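For reference, the manual check discussed above for a dvr environment (shelve/unshelve a server and see whether the floating IP still answers) could look roughly like the sketch below; the server name, floating IP and sleep-based waits are assumptions, and polling 'openstack server show' for the status would be more robust:

```python
import subprocess
import time

def osc(*args):
    """Run an openstack CLI command and return its stdout as text."""
    return subprocess.check_output(('openstack',) + args).decode()

SERVER = 'dvr-shelve-test'   # hypothetical server booted with a floating IP
FIP = '172.24.4.10'          # hypothetical floating IP attached to it

osc('server', 'shelve', SERVER)
time.sleep(60)               # crude wait for SHELVED_OFFLOADED
osc('server', 'unshelve', SERVER)
time.sleep(60)               # crude wait for ACTIVE

# the tempest test ssh-es into the guest; a TCP check on port 22 through the
# floating IP is enough to see whether the DVR FIP plumbing came back
rc = subprocess.call(['nc', '-z', '-w', '5', FIP, '22'])
print('ssh port reachable' if rc == 0 else 'floating IP unreachable after unshelve')
```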