16:00:30 <slaweq> #startmeeting neutron_ci
16:00:33 <openstack> Meeting started Tue Apr 3 16:00:30 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:36 <slaweq> hi
16:00:37 <openstack> The meeting name has been set to 'neutron_ci'
16:00:48 <mlavalle> o/
16:01:14 <haleyb> hi
16:01:35 * mlavalle will have to drop out 15 minutes before the top of the hour
16:01:57 <slaweq> please give me 5 minutes because I'm still getting back home (big traffic)
16:02:12 <mlavalle> slaweq: no problem
16:02:21 <slaweq> and I'm on mobile connection now
16:07:13 <slaweq> ok, we can start now
16:07:23 <slaweq> sorry for being late
16:08:01 <slaweq> but I just got back home from Easter - 500 km which I usually do in about 5 hours took me almost 9 because of traffic
16:08:10 <slaweq> ok, are You there? :)
16:08:31 <slaweq> mlavalle, haleyb?
16:08:41 <mlavalle> slaweq: hey
16:08:55 <slaweq> I think that ihrachys and jlibosva are not here now
16:09:16 <mlavalle> they haven't spoken up
16:09:53 <haleyb> i know kuba is at a meetup
16:10:04 <slaweq> ok, so I think we can start
16:10:10 <slaweq> #topic Actions from previous meetings
16:10:21 <slaweq> slaweq will write docs how to debug test jobs
16:10:40 <slaweq> I just pushed the first version of the patch: https://review.openstack.org/#/c/558537/
16:10:56 <slaweq> clarkb reviewed it for me so I will address his comments
16:11:04 <slaweq> but please check it also :)
16:11:30 <slaweq> next one is: haleyb to check router migrations issue
16:11:38 <slaweq> haleyb: any updates?
16:12:15 <haleyb> slaweq: i am still testing it, am on the systems now, so no update yet
16:12:39 <slaweq> ok, so I will keep it as an action for You for this week
16:12:41 <slaweq> ok?
16:12:48 <haleyb> sure
16:12:58 <slaweq> #action haleyb to check router migrations issue
16:13:14 <slaweq> next one was: ihrachys to take a look at the problem with the openstack-tox-py35-with-oslo-master periodic job
16:13:35 <slaweq> AFAIK it is fixed with https://review.openstack.org/#/c/557003/
16:13:57 <slaweq> so I think all is fine here now
16:14:14 <slaweq> so, next: slaweq to make fullstack job gating
16:14:26 <slaweq> done: https://review.openstack.org/#/c/557218/
16:14:39 <slaweq> and also grafana dashboard: https://review.openstack.org/#/c/557266/
16:15:06 <slaweq> it didn't appear there yet but as I asked the infra team, all looks fine there and it should probably appear when the fullstack job fails at least once - we will see
16:16:02 <slaweq> About details of how it works we can talk later, so now moving on to the next action:
16:16:06 <slaweq> slaweq will check difference between neutron-tempest-multinode-full and neutron-tempest-dvr-ha-multinode-full
16:16:19 <slaweq> I didn't have time to do it last week
16:16:29 <mlavalle> what?
16:16:29 <slaweq> I will do it this week for sure
16:16:39 <slaweq> mlavalle: what what?
16:16:42 <slaweq> :)
16:16:45 <mlavalle> LOL
16:16:54 <mlavalle> I know you've been on vacation
16:17:09 <slaweq> yes, I was
16:17:27 <slaweq> that's why I didn't have time to do this comparison of jobs :)
16:18:30 <slaweq> #action slaweq will check difference between neutron-tempest-multinode-full and neutron-tempest-dvr-ha-multinode-full
16:18:40 <slaweq> do You want to add anything/ask about something?
16:18:51 <slaweq> or can we go to the next topic?
16:19:13 <mlavalle> I was kidding with you
16:19:27 <mlavalle> about you not having time on your vacation to do it
16:19:28 <slaweq> mlavalle: I supposed that :)
16:20:39 <slaweq> ok, moving on?
16:22:09 <slaweq> I assume that we can go to the next topic
16:22:17 <slaweq> #topic Grafana
16:22:22 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:23:30 <slaweq> I was checking those graphs from the last 7 days today
16:23:56 <slaweq> There was one big spike last Thursday but I think that it was some problem with infra because all jobs have the same spike there.
16:24:28 <slaweq> Apart from that I think that it was a pretty quiet week.
16:24:57 <mlavalle> I see fullstack trending up
16:25:16 <mlavalle> and also Rally
16:25:57 <slaweq> now yes but it's still not a big failure rate
16:26:21 <slaweq> about rally I have a few examples of failures and I want to talk about them in a few minutes
16:26:46 <mlavalle> ok
16:27:31 <slaweq> about fullstack it could be because of me and my DNM patch: https://review.openstack.org/#/c/558259/
16:27:43 <slaweq> which failed on fullstack a few times today :)
16:27:57 <slaweq> apart from that I don't think there is any problem with it
16:28:19 <slaweq> so we can change topic to fullstack now since we already started on it :)
16:28:23 <slaweq> #topic Fullstack
16:28:24 <mlavalle> that's the example for the doc revision you proposed, right?
16:28:32 <slaweq> mlavalle: right
16:29:05 <slaweq> and this example was failing because the timeout was reached
16:29:52 <slaweq> as I said, fullstack is IMO stable in both queues now (at least I didn't see any problems with it during the last week)
16:30:09 <slaweq> so I also checked bugs with the "fullstack" tag in launchpad
16:30:20 <slaweq> There is one bug with the "fullstack" tag now (except wishlist): https://bugs.launchpad.net/neutron/+bug/1744402
16:30:22 <openstack> Launchpad bug 1744402 in neutron "fullstack security groups test fails because ncat process don't starts" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:30:23 <mlavalle> ok, cool
16:30:46 <slaweq> this bug is assigned to me - I'm aware of it and I want to check logs if it happens again
16:31:14 <slaweq> as I added some small change to the test a few days ago: https://review.openstack.org/#/c/556155/
16:31:28 <slaweq> but I didn't see it since this patch was merged
16:31:42 <slaweq> I will keep an eye on it still :)
16:32:10 <slaweq> do You want to add something about fullstack?
16:33:04 <mlavalle> no, thanks
16:33:09 <slaweq> ok
16:33:14 <slaweq> #topic Scenarios
16:33:32 <slaweq> the only problem which we have is neutron-tempest-plugin-dvr-multinode-scenario, still at 100% failures
16:34:01 * mlavalle just reviewed https://review.openstack.org/#/c/558537. Since manjeets highlighted the entire text, to see where my comments apply, please move the cursor over the comments
16:34:13 <slaweq> but that is because of problems with migration
16:34:31 <slaweq> so haleyb is on that
16:34:39 <slaweq> mlavalle: ok, thx for the review
16:35:27 <manjeets> mlavalle, sorry for making it a little hard to review, I should have done a file comment
16:35:53 <slaweq> I have a question about neutron-tempest-plugin-scenario-linuxbridge - should we maybe try to add it to the gate queue also (like fullstack) now?
16:36:11 <slaweq> what do You think about that?
16:36:32 <mlavalle> how long has it been running in check?
16:37:05 <mlavalle> I mean voting in the check queue?
16:37:18 <slaweq> It has been voting since 14.03: https://review.openstack.org/#/c/552689/
16:37:37 <mlavalle> mmhhh let's hold for a week
16:37:42 <slaweq> sure
16:38:06 <mlavalle> especially since last week tended to be quiet, mostly towards the end
16:38:18 <slaweq> yes, right
16:38:32 <slaweq> let's wait and see if it will still be fine
16:38:35 <slaweq> :)
16:39:07 <slaweq> ok, next topic
16:39:10 <slaweq> #topic Rally
16:39:22 <slaweq> as mlavalle pointed out, it has had some failures recently
16:39:44 <slaweq> so I checked today and found 3 examples of failures from last week
16:39:58 <slaweq> all of them were because of reaching the global job timeout:
16:40:05 <slaweq> http://logs.openstack.org/18/558318/1/check/neutron-rally-neutron/fdee864/job-output.txt.gz
16:40:13 <slaweq> http://logs.openstack.org/84/556584/4/check/neutron-rally-neutron/8a4dc9d/job-output.txt.gz
16:40:17 <mlavalle> ok, I saw the same with one of my patches
16:40:20 <slaweq> http://logs.openstack.org/81/552881/8/check/neutron-rally-neutron/fb6fb63/job-output.txt.gz
16:40:46 <slaweq> in one of those I think it was even stopped after all tests passed
16:41:22 <slaweq> I think that we should check what takes the most time in those jobs and maybe try to speed it up a little bit
16:41:39 <slaweq> is there anyone who wants to check that? :)
16:42:02 <mlavalle> This is the one I saw http://logs.openstack.org/84/556584/4/check/neutron-rally-neutron/8a4dc9d/job-output.txt.gz#_2018-04-03_00_22_53_422472
16:42:56 <mlavalle> I don't know if I will have the bandwidth this week, but if nobody takes a look, I will try
16:43:18 <slaweq> ok, thx
16:43:23 <mlavalle> hopefully I won't be shamed by slaweq if I don't have time next week
16:43:32 <slaweq> for sure not :)
16:43:49 <mlavalle> well, what if you are wearing your Hulk mask?
16:43:57 <slaweq> #action mlavalle to take a look at why rally jobs are taking so long
16:44:10 <slaweq> mlavalle: today I don't have it
16:44:20 <slaweq> but next week - we will see :P
16:45:11 <slaweq> ok, let's move on
16:45:14 <slaweq> #topic Periodic
16:45:36 <slaweq> openstack-tox-py35-with-oslo-master looks like it is fine again - thx ihrachys :)
16:45:58 <slaweq> neutron-tempest-postgres-full sometimes has 100% failures
16:46:14 <slaweq> I checked logs from the last two failures and what I found is:
16:46:21 <slaweq> once it wasn't a real failure but the timeout was reached (after all tests passed): http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/neutron-tempest-postgres-full/03ca3f3/job-output.txt.gz
16:46:32 <slaweq> Another time it was a failure not related to neutron: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/neutron-tempest-postgres-full/d5c0933/job-output.txt.gz#_2018-03-30_07_15_04_817879
16:47:57 <slaweq> IMO if such timeouts happen more often we should try to check it - for now it was only once so it isn't the biggest problem
16:48:02 <slaweq> what do You think about it?
16:48:14 <mlavalle> agree
16:48:23 <mlavalle> let's keep an eye on it
16:48:29 <slaweq> yes
16:49:11 <slaweq> ok, so moving on to the last topic
16:49:16 <slaweq> #topic others
16:49:25 <slaweq> I have one more thing to ask
16:49:37 <slaweq> recently we added the new job openstack-tox-lower-constraints
16:49:48 <slaweq> to our queues
16:50:03 <slaweq> do we want to add it to our grafana dashboard?
16:50:05 <mlavalle> it was a request from infra
16:50:13 <mlavalle> IIRC
16:50:26 <slaweq> yes, I know - I just wanted to ask if we should add it to grafana :)
16:50:41 <slaweq> to have better visibility of what's going on with it
16:50:49 <mlavalle> good point
16:50:52 <mlavalle> yes
16:50:57 <slaweq> ok, so I will do it
16:51:15 <slaweq> #action slaweq will add openstack-tox-lower-constraints to grafana dashboard
16:51:26 <haleyb> +1
16:51:26 <slaweq> and the last thing
16:51:56 <slaweq> I also checked bugs with the tag "gate-failure" on launchpad: https://tinyurl.com/y826rccx
16:52:27 <slaweq> there are quite a few such bugs with "high" priority, older than a few months and not assigned to anybody
16:52:44 <slaweq> maybe we should check them and close those which are not a problem anymore
16:52:48 <slaweq> what do You think?
16:52:55 <mlavalle> yes, let's do it
16:53:13 <slaweq> do You want to go through them now?
16:53:25 <slaweq> or should I do it later maybe?
16:53:27 <mlavalle> I have to drop out now
16:53:42 <mlavalle> as I said at the beginning of the meeting
16:53:49 <slaweq> ok
16:53:55 <mlavalle> but let's try to do them over the week
16:54:03 <mlavalle> do you want to partner on that?
16:54:25 <slaweq> so I will try to check them this week and I will ask if I need something
16:54:31 <slaweq> thx a lot :)
16:54:32 <mlavalle> ok
16:54:41 <slaweq> so that's all from my side
16:54:53 * mlavalle dropping out
16:54:56 <slaweq> sorry for being so quick but I was preparing it today in the car :)
16:54:59 <mlavalle> o/
16:55:05 <slaweq> bye mlavalle
16:55:34 <slaweq> haleyb: do You have anything else? or can we finish a few minutes early?
16:55:54 <haleyb> i am done, lunch here
16:56:06 <slaweq> #action slaweq will check old gate-failure bugs
16:56:33 <slaweq> ok, so bon appetit haleyb :)
16:56:37 <slaweq> and see You
16:56:44 <slaweq> #endmeeting
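
For the "check old gate-failure bugs" action recorded above, a minimal sketch of how the old, unassigned High-priority bugs could be listed with launchpadlib instead of the tinyurl query; assumptions: anonymous Launchpad API access is enough for triage, the consumer name 'neutron-ci-triage' is a placeholder, and the 90-day cutoff stands in for "older than a few months".

    # Minimal sketch, assuming launchpadlib is installed and anonymous access suffices;
    # the consumer name and the 90-day cutoff are placeholders, not project conventions.
    from datetime import datetime, timedelta, timezone

    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_anonymously('neutron-ci-triage', 'production', version='devel')
    neutron = lp.projects['neutron']

    # Open bug tasks tagged "gate-failure" with High importance.
    tasks = neutron.searchTasks(
        tags=['gate-failure'],
        importance=['High'],
        status=['New', 'Confirmed', 'Triaged', 'In Progress'],
    )

    cutoff = datetime.now(timezone.utc) - timedelta(days=90)

    # Keep only unassigned bugs that have not been touched for a few months.
    for task in tasks:
        bug = task.bug
        if task.assignee is None and bug.date_last_updated < cutoff:
            print(bug.id, bug.date_last_updated.date(), bug.title)
            print('    ', task.web_link)

Each bug printed this way is a candidate to either reconfirm against recent gate logs or close as no longer reproducible, which is the triage pass agreed on in the meeting.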