16:00:14 #startmeeting neutron_ci
16:00:15 Meeting started Tue Sep 25 16:00:14 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:19 welcome again :)
16:00:20 The meeting name has been set to 'neutron_ci'
16:00:52 hi
16:01:24 o/
16:01:34 made it
16:01:44 :)
16:01:48 now I remember why I miss most ci meetings, usually have to leave shortly after they start
16:01:53 still hi again
16:02:01 bcafarel: LOL
16:02:14 ok, let's go then
16:02:16 #topic Actions from previous meetings
16:02:26 * manjeets continue debugging why migration from HA routers fails 100% of the time
16:02:33 manjeets: any progress on this one?
16:02:44 I proposed today to mark those tests as unstable for now: https://review.openstack.org/605057
16:03:37 I think manjeets is not available now
16:04:08 mlavalle: please just take a look at my patch - I think it would be good to make this job pass at least sometimes :)
16:04:43 o/
16:04:48 done
16:04:48 o/
16:04:54 thx mlavalle
16:04:56 hi njohnston
16:05:03 ok, next one
16:05:04 sorry about the repeat there
16:05:05 * mlavalle to check issue with failing test_attach_volume_shelved_or_offload_server test
16:05:22 no problem njohnston :)
16:05:35 no problem njohnston :)
16:05:36 LOL
16:05:40 :-)
16:06:11 ok, mlavalle any update on this shelved/unshelved server test failure?
16:06:17 slaweq: hang on
16:06:38 k
16:08:39 slaweq: I don't find it. I think I left some notes there recently
16:09:02 You haven't found any issues like that recently, right?
16:10:06 yes
16:10:15 but do you have a pointer to the bug?
16:11:05 I don't have one
16:11:10 but let me find it
16:13:01 I can't find it
16:13:09 was it reported as a bug?
16:13:15 maybe we forgot about that?
16:14:07 yeah, that may be the problem
16:14:39 in any case, I spent some time last week searching kibana for it
16:14:44 and didn't find instances
16:14:54 so maybe we will be good with it :)
16:14:57 I'll dig up the query and get back to you
16:15:04 ok, thx
16:15:07 I sent myself an email
16:15:18 with the query that I need to dig up
16:15:53 #action mlavalle will work on logstash query to find if issues with test_attach_volume_shelved_or_offload_server still happen
16:16:05 ok, next one then
16:16:07 * njohnston will continue work on switching the fullstack-python35 job to python36
16:16:35 So it is voting now but as bcafarel pointed out it is still using py35
16:16:39 I am looking into it now
16:16:47 ok
16:16:57 what about removing the old fullstack job with py27?
16:17:05 I think we are ready for that now
16:17:18 agreed, I'll push a change for that
16:17:28 mlavalle: haleyb: are You ok with it?
16:17:51 yes, i'm fine with it
16:17:58 me too
16:18:02 great
16:18:08 ok, thx njohnston for working on this
16:18:26 #action njohnston will debug why fullstack-py36 job is using py35
16:18:47 #action njohnston will send a patch to remove fullstack py27 job completely
16:19:16 ok, and the last one is:
16:19:17 * slaweq will continue debugging multinode-dvr-grenade issue
16:19:39 I was working on this last week but I didn't find anything
16:19:59 I found that this issue happens very often on the master branch and also on stable/pike
16:20:13 but I didn't find it even once on the queens or rocky branches
16:20:36 I suspected that it may be some package which was upgraded recently or something like that
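(Aside on the "mark those tests as unstable" proposal earlier in the meeting, https://review.openstack.org/605057: the usual pattern is a decorator that turns a failure of a known-flaky test into a skip, so the gate is not blocked while the bug is investigated. The sketch below only illustrates that pattern and is not the content of the actual patch; the test name and bug number are placeholders. tempest's lib.decorators module provides an unstable_test helper along these lines that Neutron tests can reuse.)

    import functools
    import unittest


    def unstable_test(bug):
        """Turn a test failure into a skip, recording the tracking bug (illustrative)."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, *args, **kwargs):
                try:
                    return func(self, *args, **kwargs)
                except Exception as exc:
                    raise unittest.SkipTest(
                        "Marked as unstable (bug %s), failure was: %s" % (bug, exc))
            return wrapper
        return decorator


    class RouterMigrationTest(unittest.TestCase):
        @unstable_test(bug="0000000")  # placeholder bug number
        def test_from_ha_to_dvr(self):
            # The real tests migrate a router between HA/DVR modes and check
            # connectivity; this stub only shows where the decorator is applied.
            self.assertTrue(True)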
16:20:37 so the "middle" branches are not affected? weird
16:21:08 but all such packages like ovs, libvirt, qemu and so on are in completely different versions in stable/pike and master branch
16:21:13 so I don't think it's that
16:22:06 I was using a query like: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22'%5BFail%5D%20Couldn'%5C%5C''t%20ping%20server'%5C%22
16:22:25 yeah, a master branch change that only got backported to versions used in pike sounds strange
16:23:05 from last week (now in logstash): 91 failures on master, 33 on stable/pike
16:23:08 * mlavalle found the logstash query: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=build_status:%5C%22FAILURE%5C%22%20AND%20project:%5C%22openstack%2Fneutron%5C%22%20AND%20message:%5C%22test_attach_volume_shelved_or_offload_server%5C%22%20AND%20tags:%5C%22console%5C%22&from=7d
16:25:25 mlavalle: so it looks like it still happens from time to time
16:26:17 slaweq: yeah but my observation last week is that, when it shows up, many other tests also fail
16:26:31 so that makes me suspicious of the changes
16:26:49 and that's the case for the two occurrences at the top
16:26:50 ok, it can be that it fails together with other tests also
16:26:54 in today's result
16:27:13 so I'll dig into this today
16:27:19 ok
16:27:51 mlavalle: which changes are suspicious?
16:28:25 oh, maybe package changes
16:28:27 one example is https://review.openstack.org/#/c/601336
16:28:46 it's at the top of the search today
16:29:37 oh, that's just WIP though
16:30:07 yeah, that's why I say I discount those
16:30:21 as not valid for this bug
16:30:31 right
16:30:36 but I will try to find a valid failure and investigate
16:30:47 +1
16:31:01 thx mlavalle for working on this
16:31:32 my point is that there are fewer failures than the kibana results show
16:32:13 I mean real failures
16:32:19 that is possible :)
16:32:32 * slaweq hopes that it's not another real issue
16:32:40 it could be we can add more debug commands to one of slaweq's patches too, because when it failed everything was "happy" once we logged in to look around - running some more things from the console perhaps? we just don't know where to start
16:33:14 like looking at routes and arp table and flows...
16:33:24 haleyb: are You talking about the grenade issue now?
16:33:52 did i miss a topic change ?
16:34:00 yes
16:34:04 but that's ok
16:34:06 I think so :)
16:34:09 LOL
16:34:16 I'm done with the other one
16:34:27 so Your questions about "which change" were not related to what mlavalle was talking about?
16:34:34 doh, last one i saw was "slaweq will continue debugging multinode-dvr-grenade issue"
16:34:50 it was my fault
16:35:00 I interjected the discussion with my query
16:35:11 so blame it all on me
16:35:23 it's always the dumb PTL anyways
16:35:34 :)
16:35:42 :)
16:35:51 ok, so let's go back to the grenade job now
16:36:13 yes, when I was checking it after logging in to the node, everything was fine
16:36:38 and still there is one important thing - all smoke tests are passing first
16:36:50 and then this instance is created and it fails
16:37:14 btw. as it happens also on pike - we can assume that it's not the openvswitch firewall's fault :)
16:38:05 haleyb: if You want to execute some more commands to debug this, I did 2 small patches:
16:38:09 https://review.openstack.org/#/c/602156/
16:38:14 and
16:38:15 https://review.openstack.org/#/c/602204/
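(Aside on the "more debug commands" idea above: the state haleyb mentions - routes, ARP/neighbour table, OVS flows - could be captured on the failing node with a small helper along these lines. This is only a sketch; the real patches add shell commands to grenade's resources.sh, and the command list and log path here are illustrative assumptions.)

    import subprocess

    # Diagnostics suggested above: routing table, neighbour/ARP table, OVS state.
    DEBUG_COMMANDS = [
        ["ip", "route"],
        ["ip", "neigh"],
        ["sudo", "ovs-vsctl", "show"],
        ["sudo", "ovs-ofctl", "dump-flows", "br-int"],
    ]


    def collect_debug_state(logfile="/tmp/grenade-ping-debug.txt"):
        """Append the output of each diagnostic command to a log file."""
        with open(logfile, "a") as out:
            for cmd in DEBUG_COMMANDS:
                out.write("\n$ %s\n" % " ".join(cmd))
                try:
                    result = subprocess.run(
                        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                        universal_newlines=True, timeout=30)
                    out.write(result.stdout)
                except (OSError, subprocess.TimeoutExpired) as exc:
                    out.write("command failed: %s\n" % exc)


    if __name__ == "__main__":
        collect_debug_state()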
16:38:22 right, that's a good thing. it still could be ovs, or some config on the job side of things
16:38:38 feel free to update it with any new debug information You want there :)
16:38:41 i'll take a look and probably add a few things
16:39:31 for now there is a "sleep 3h" added in https://review.openstack.org/#/c/602204/10/projects/60_nova/resources.sh
16:39:46 and in https://review.openstack.org/#/c/602156/ there is my ssh key added
16:40:10 so if it fails on https://review.openstack.org/#/c/602156/ I can ssh to the master node easily without asking the infra team about that :)
16:40:38 if You want to debug the job with some other things, You may have to remove that sleep :)
16:40:51 :)
16:41:35 * slaweq will buy a beer for whoever solves this issue :)
16:41:51 * haleyb marks it critical :)
16:41:57 LOL
16:42:05 :D
16:42:40 thx haleyb for helping with this one
16:42:52 I think we can move on to the next topic then
16:43:07 #topic Grafana
16:43:14 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:43:56 gate functional looks unhappy
16:45:13 yes
16:45:15 a bit
16:45:54 not sure, we don't know how many runs are in the graph
16:46:00 at least I don't
16:46:14 but worth keeping an eye on, definitely
16:46:30 I can't find specific examples now
16:46:49 but I'm almost sure that it's again this issue with db migration tests which hits us from time to time
16:48:32 ok, as mlavalle said, let's keep an eye on it for now
16:49:04 slaweq: do you mean this bug https://bugs.launchpad.net/neutron/+bug/1687027?
16:49:04 Launchpad bug 1687027 in neutron "test_walk_versions tests fail with "IndexError: tuple index out of range" after timeout" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:49:26 mlavalle: yes
16:49:55 slaweq: we have a query there. I'll run it today
16:50:05 ok
16:50:07 and report my findings in the bug
16:50:33 let's see if we can correlate the spike in functional with the bug
16:50:55 ok
16:53:04 apart from that there are only known issues, like the grenade dvr job failures and the migration-from-HA test failures in the multinode dvr scenario job
16:53:24 so basically I think we will be good if we fix those 2 issues :)
16:53:53 the HA failures are what manjeets is working on, right?
16:54:14 right
16:54:28 ack
16:55:01 ok, so that's all from me for today
16:55:10 do You want to talk about anything else?
16:55:15 #topic Open discussion
16:55:19 I pushed a change to delete the python2 fullstack job https://review.openstack.org/605126 but I was wondering if I needed to update the neutron-fullstack-with-uwsgi job that is non-voting and experimental
16:55:41 I also pushed an update to grafana for fullstack: https://review.openstack.org/605128
16:55:49 hmm, we already merged support for uwsgi, right?
16:56:01 yes, last cycle
16:56:22 so maybe we should make this job at least non-voting but in the check queue?
16:56:32 and switch it to py36 also
16:56:34 is uwsgi enabled by default now, or is it just a supported config?
16:56:42 what do You think?
16:58:01 note that there are also neutron-functional-with-uwsgi and neutron-tempest-with-uwsgi jobs in experimental
16:58:35 * njohnston has no opinion on uwsgi jobs
16:58:38 IMO we should promote it to the check queue as non-voting for now
16:59:12 we can do that
16:59:12 njohnston: NEUTRON_DEPLOY_MOD_WSGI: True in zuul.yaml - so i'm assuming it's not the default, that's tweaking some other flag
16:59:27 ok, we are out of time now
16:59:36 thx for attending the meeting
16:59:39 o/
16:59:40 mlavalle if you agree then I'll do that and also convert them to zuulv3 syntax and add them to grafana
16:59:40 and see You next week
16:59:50 njohnston++
16:59:50 njohnston: ok
16:59:54 #endmeeting