16:00:14 <slaweq> #startmeeting neutron_ci
16:00:15 <openstack> Meeting started Tue Sep 25 16:00:14 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:19 <slaweq> welcome again :)
16:00:20 <openstack> The meeting name has been set to 'neutron_ci'
16:00:52 <haleyb> hi
16:01:24 <mlavalle> o/
16:01:34 <mlavalle> made it
16:01:44 <slaweq> :)
16:01:48 <bcafarel> now I remember why I miss most ci meetings, usually have to leave shortly after they start
16:01:53 <bcafarel> still hi again
16:02:01 <slaweq> bcafarel: LOL
16:02:14 <slaweq> ok, let's go then
16:02:16 <slaweq> #topic Actions from previous meetings
16:02:26 <slaweq> * manjeets continue debugging why migration from HA routers fails 100% of times
16:02:33 <slaweq> manjeets: any progress on this one?
16:02:44 <slaweq> I proposed today to mark those tests as unstable for now: https://review.openstack.org/605057
16:03:37 <slaweq> I think manjeets is not available now
16:04:08 <slaweq> mlavalle: please just take a look at this patch of mine - I think it would be good to make this job pass at least sometimes :)
16:04:43 <njohnston> o/
16:04:48 <mlavalle> done
16:04:48 <njohnston> o/
16:04:54 <slaweq> thx mlavalle
16:04:56 <slaweq> hi njohnston
16:05:03 <slaweq> ok, next one
16:05:04 <njohnston> sorry about the repeat there
16:05:05 <slaweq> * mlavalle to check issue with failing test_attach_volume_shelved_or_offload_server test
16:05:22 <slaweq> no problem njohnston :)
16:05:35 <slaweq> no problem njohnston :)
16:05:36 <slaweq> LOL
16:05:40 <njohnston> :-)
16:06:11 <slaweq> ok, mlavalle any update about this shelved/unshelved server test failure?
16:06:17 <mlavalle> slaweq: hang on
16:06:38 <slaweq> k
16:08:39 <mlavalle> slaweq: I can't find it. I think I left some notes there recently
16:09:02 <slaweq> You didn't find any issues like that recently, right?
16:10:06 <mlavalle> yes
16:10:15 <mlavalle> but do you have a pointer to the bug?
16:11:05 <slaweq> I don't have it
16:11:10 <slaweq> but let me find it
16:13:01 <slaweq> I can't find it
16:13:09 <slaweq> was it reported as a bug?
16:13:15 <slaweq> maybe we forgot about that?
16:14:07 <mlavalle> yeah, that may be the problem
16:14:39 <mlavalle> in any case, I spent some time last week searching kibana for it
16:14:44 <mlavalle> and didn't find instances
16:14:54 <slaweq> so maybe we will be good with it :)
16:14:57 <mlavalle> I'll dig up the query and get back to you
16:15:04 <slaweq> ok, thx
16:15:07 <mlavalle> I sent myself an email
16:15:18 <mlavalle> with the query that I need to dig up
16:15:53 <slaweq> #action mlavalle will work on a logstash query to find out whether issues with test_attach_volume_shelved_or_offload_server still happen
16:16:05 <slaweq> ok, next one then
16:16:07 <slaweq> * njohnston will continue work on switching the fullstack-python35 job to python36
16:16:35 <njohnston> So it is voting now but as bcafarel pointed out it is still using py35
16:16:39 <njohnston> I am looking into it now
16:16:47 <slaweq> ok
16:16:57 <slaweq> what about removing the old fullstack job with py27?
16:17:05 <slaweq> I think we are ready for that now
16:17:18 <njohnston> agreed, I'll push a change for that
16:17:28 <slaweq> mlavalle: haleyb: are You ok with it?
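(A note on the "mark those tests as unstable" patch mentioned above, https://review.openstack.org/605057: the usual shape of such a change is a decorator that turns a failure of a known-flaky test into a skip, so the job can still pass while the root cause is investigated. The sketch below is a generic, self-contained illustration of that pattern only; it is not neutron's actual helper, and the test class and test name are hypothetical.)

    import functools
    import unittest


    def unstable_test(reason):
        """Mark a known-flaky test: failures become skips, passes still pass.

        Illustrative pattern only - neutron ships its own helper for this;
        this is not its implementation.
        """
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, *args, **kwargs):
                try:
                    return func(self, *args, **kwargs)
                except unittest.SkipTest:
                    raise
                except Exception as exc:
                    raise unittest.SkipTest(
                        "Unstable test skipped after failure (%s): %s"
                        % (reason, exc))
            return wrapper
        return decorator


    class TestRouterMigration(unittest.TestCase):

        @unstable_test("HA router migration fails intermittently")
        def test_migrate_ha_router_to_legacy(self):
            # Placeholder body standing in for the real scenario test.
            self.assertTrue(True)

(The effect in the gate is that a known-intermittent failure shows up as a skip in the test results instead of sinking the whole job run.)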
16:17:51 <haleyb> yes, i'm fine with it
16:17:58 <mlavalle> me too
16:18:02 <slaweq> great
16:18:08 <slaweq> ok, thx njohnston for working on this
16:18:26 <slaweq> #action njohnston will debug why fullstack-py36 job is using py35
16:18:47 <slaweq> #action njohnston will send a patch to remove the fullstack py27 job completely
16:19:16 <slaweq> ok, and the last one is:
16:19:17 <slaweq> * slaweq will continue debugging multinode-dvr-grenade issue
16:19:39 <slaweq> I was working on this last week but I didn't find anything
16:19:59 <slaweq> I found that this issue happens very often on the master branch and also on stable/pike
16:20:13 <slaweq> but I didn't find it even once on the queens or rocky branches
16:20:36 <slaweq> I suspected that it may be some package which was upgraded recently or something like that
16:20:37 <bcafarel> so the "middle" branches are not affected? weird
16:21:08 <slaweq> but all such packages like ovs, libvirt, qemu and so on are in completely different versions on stable/pike and the master branch
16:21:13 <slaweq> so I don't think it's that
16:22:06 <slaweq> I was using a query like: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22'%5BFail%5D%20Couldn'%5C%5C''t%20ping%20server'%5C%22
16:22:25 <bcafarel> yeah a master branch change that only got backported to versions used in pike sounds strange
16:23:05 <slaweq> from last week (now in logstash): 91 failures on master, 33 on stable/pike
16:23:08 * mlavalle found the logstash query: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=build_status:%5C%22FAILURE%5C%22%20AND%20project:%5C%22openstack%2Fneutron%5C%22%20AND%20message:%5C%22test_attach_volume_shelved_or_offload_server%5C%22%20AND%20tags:%5C%22console%5C%22&from=7d
16:25:25 <slaweq> mlavalle: so it looks like it still happens from time to time
16:26:17 <mlavalle> slaweq: yeah but my observation last week was that, when it shows up, many other tests also fail
16:26:31 <mlavalle> so that makes me suspicious of the changes
16:26:49 <mlavalle> and that's the case for the two occurrences at the top
16:26:50 <slaweq> ok, it can be that it fails together with other tests also
16:26:54 <mlavalle> in today's results
16:27:13 <mlavalle> so I'll dig into this today
16:27:19 <slaweq> ok
16:27:51 <haleyb> mlavalle: which changes are suspicious?
16:28:25 <haleyb> oh, maybe package changes
16:28:27 <mlavalle> one example is https://review.openstack.org/#/c/601336
16:28:46 <mlavalle> it's at the top of the search today
16:29:37 <haleyb> oh, that's just WIP though
16:30:07 <mlavalle> yeah, that's why I say I discount those
16:30:21 <mlavalle> as not valid for this bug
16:30:31 <haleyb> right
16:30:36 <mlavalle> but I will try to find a valid failure and investigate
16:30:47 <haleyb> +1
16:31:01 <slaweq> thx mlavalle for working on this
16:31:32 <mlavalle> my point is that there are fewer failures than the kibana results show
16:32:13 <mlavalle> I mean real failures
16:32:19 <slaweq> that is possible :)
16:32:32 * slaweq hopes that it's not another real issue
16:32:40 <haleyb> it could be we can add more debug commands to one of slaweq's patches too, because when it failed it was "happy" once we logged in to look around - running some more things from the console perhaps? we just don't know where to start
16:33:14 <haleyb> like looking at routes and arp table and flows...
16:33:24 <slaweq> haleyb: are You talking about the grenade issue now?
16:33:52 <haleyb> did i miss a topic change ?
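(On haleyb's suggestion above about adding more debug commands - routes, ARP table, OVS flows - to one of slaweq's patches: the snippet below is only an illustrative sketch of the kind of command collection such a patch might add on the failing node. It assumes a devstack-style host with an OVS integration bridge named br-int and a log directory under /opt/stack/logs; it is not the content of the reviews discussed in this meeting.)

    import subprocess

    # Commands along the lines haleyb mentioned: routes, neighbour/ARP table,
    # OVS flows and bridge layout. Adjust bridge names/paths for the real job.
    DEBUG_COMMANDS = [
        ["ip", "route"],
        ["ip", "neigh"],
        ["sudo", "ovs-ofctl", "dump-flows", "br-int"],
        ["sudo", "ovs-vsctl", "show"],
    ]


    def collect_debug_output(path="/opt/stack/logs/connectivity-debug.txt"):
        """Append the output of each debug command to a single log file."""
        with open(path, "a") as log:
            for cmd in DEBUG_COMMANDS:
                log.write("$ %s\n" % " ".join(cmd))
                try:
                    result = subprocess.run(cmd, capture_output=True, text=True)
                except OSError as exc:
                    log.write("failed to run: %s\n\n" % exc)
                    continue
                log.write(result.stdout)
                if result.stderr:
                    log.write(result.stderr)
                log.write("\n")


    if __name__ == "__main__":
        collect_debug_output()

(Something like this would be run right after the failed ping check, so the routing, ARP and flow state is captured before the job tears the node down.)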
16:34:00 <mlavalle> yes
16:34:04 <mlavalle> but that's ok
16:34:06 <slaweq> I think so :)
16:34:09 <slaweq> LOL
16:34:16 <mlavalle> I'm done with the other one
16:34:27 <slaweq> so Your questions about "which change" were not related to what mlavalle was talking about?
16:34:34 <haleyb> doh, last one i saw was "slaweq will continue debugging multinode-dvr-grenade issue"
16:34:50 <mlavalle> it was my fault
16:35:00 <mlavalle> I interjected my query into the discussion
16:35:11 <mlavalle> so blame it all on me
16:35:23 <mlavalle> it's always the dumb PTL anyways
16:35:34 <haleyb> :)
16:35:42 <slaweq> :)
16:35:51 <slaweq> ok, so let's go back to the grenade job now
16:36:13 <slaweq> yes, when I was checking that, after logging in to the node it was all fine
16:36:38 <slaweq> and still there is one important thing - all smoke tests pass first
16:36:50 <slaweq> and then this instance gets created and it fails
16:37:14 <slaweq> btw. as it happens also on pike - we can assume that it's not the openvswitch firewall's fault :)
16:38:05 <slaweq> haleyb: if You want to execute some more commands to debug this, I did 2 small patches:
16:38:09 <slaweq> https://review.openstack.org/#/c/602156/
16:38:14 <slaweq> and
16:38:15 <slaweq> https://review.openstack.org/#/c/602204/
16:38:22 <haleyb> right, that's a good thing. it still could be ovs, or some config on the job side of things
16:38:38 <slaweq> feel free to update it with any new debug information You want there :)
16:38:41 <haleyb> i'll take a look and probably add a few things
16:39:31 <slaweq> for now there is a "sleep 3h" added in https://review.openstack.org/#/c/602204/10/projects/60_nova/resources.sh
16:39:46 <slaweq> and in https://review.openstack.org/#/c/602156/ there is my ssh key added
16:40:10 <slaweq> so if it fails on https://review.openstack.org/#/c/602156/ I can ssh to the master node easily without asking the infra team about it :)
16:40:38 <slaweq> if You want to debug the job with some other things, You may have to remove that sleep :)
16:40:51 <haleyb> :)
16:41:35 * slaweq will buy a beer for someone who will solve this issue :)
16:41:51 * haleyb marks it critical :)
16:41:57 <mlavalle> LOL
16:42:05 <slaweq> :D
16:42:40 <slaweq> thx haleyb for help with this one
16:42:52 <slaweq> I think we can move on to the next topic then
16:43:07 <slaweq> #topic Grafana
16:43:14 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:43:56 <njohnston> gate functional looks unhappy
16:45:13 <slaweq> yes
16:45:15 <slaweq> a bit
16:45:54 <mlavalle> not sure, we don't know how many runs are in the graph
16:46:00 <mlavalle> at least I don't
16:46:14 <mlavalle> but worth keeping an eye on, definitely
16:46:30 <slaweq> I can't find specific examples now
16:46:49 <slaweq> but I'm almost sure that it's again this issue with db migration tests which hits us from time to time
16:48:32 <slaweq> ok, as mlavalle said, let's keep an eye on it for now
16:49:04 <mlavalle> slaweq: do you mean this bug https://bugs.launchpad.net/neutron/+bug/1687027?
16:49:04 <openstack> Launchpad bug 1687027 in neutron "test_walk_versions tests fail with "IndexError: tuple index out of range" after timeout" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:49:26 <slaweq> mlavalle: yes
16:49:55 <mlavalle> slaweq: we have a query there. I'll run it today
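(The logstash.openstack.org links pasted earlier in the meeting carry their Kibana queries URL-encoded, which makes them hard to read at a glance. Below is a small standard-library sketch that turns one of them back into a plain query string; the URL is the one mlavalle found for the shelve/offload failures, the helper itself is just illustrative.)

    from urllib.parse import unquote

    PREFIX = "http://logstash.openstack.org/#/dashboard/file/logstash.json?query="

    # The link mlavalle pasted earlier, reproduced here verbatim.
    url = (PREFIX + "build_status:%5C%22FAILURE%5C%22%20AND%20project:%5C%22"
           "openstack%2Fneutron%5C%22%20AND%20message:%5C%22"
           "test_attach_volume_shelved_or_offload_server%5C%22%20AND%20"
           "tags:%5C%22console%5C%22&from=7d")

    # Split off the "&from=7d" timeframe suffix and undo the URL escaping.
    query, _, timeframe = url[len(PREFIX):].partition("&")
    print(unquote(query))
    # build_status:\"FAILURE\" AND project:\"openstack/neutron\"
    #   AND message:\"test_attach_volume_shelved_or_offload_server\"
    #   AND tags:\"console\"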
16:50:05 <slaweq> ok
16:50:07 <mlavalle> and report my findings in the bug
16:50:33 <mlavalle> let's see if we can correlate the spike in functional with the bug
16:50:55 <slaweq> ok
16:53:04 <slaweq> from other things there are only known issues like the grenade dvr job failures and the migration-from-HA test failures in the multinode dvr scenario job
16:53:24 <slaweq> so basically I think we will be good if we fix those 2 issues :)
16:53:53 <mlavalle> the HA failures are what manjeets is working on, right?
16:54:14 <slaweq> right
16:54:28 <mlavalle> ack
16:55:01 <slaweq> ok, so that's all from me for today
16:55:10 <slaweq> do You want to talk about anything else?
16:55:15 <slaweq> #topic Open discussion
16:55:19 <njohnston> I pushed a change to delete the python2 fullstack job https://review.openstack.org/605126 but I was wondering if I needed to update the neutron-fullstack-with-uwsgi job that is nonvoting experimental
16:55:41 <njohnston> I also pushed an update to grafana for fullstack: https://review.openstack.org/605128
16:55:49 <slaweq> hmm, we already merged support for uwsgi, right?
16:56:01 <mlavalle> yes, last cycle
16:56:22 <slaweq> so maybe we should make this job at least non-voting but in the check queue?
16:56:32 <slaweq> and switch it to py36 also
16:56:34 <njohnston> is uwsgi enabled by default now, or is it just a supported config?
16:56:42 <slaweq> what do You think?
16:58:01 <njohnston> note that there are also neutron-functional-with-uwsgi and neutron-tempest-with-uwsgi jobs in experimental
16:58:35 * njohnston has no opinion on uwsgi jobs
16:58:38 <slaweq> IMO we should promote it to the check queue as non-voting for now
16:59:12 <mlavalle> we can do that
16:59:12 <haleyb> njohnston: NEUTRON_DEPLOY_MOD_WSGI: True in zuul.yaml - so i'm assuming it's not the default, that's tweaking some other flag
16:59:27 <slaweq> ok, we are out of time now
16:59:36 <slaweq> thx for attending the meeting
16:59:39 <mlavalle> o/
16:59:40 <njohnston> mlavalle: if you agree then I'll do that and also convert them to zuulv3 syntax and add them to grafana
16:59:40 <slaweq> and see You next week
16:59:50 <slaweq> njohnston++
16:59:50 <mlavalle> njohnston: ok
16:59:54 <slaweq> #endmeeting