16:00:30 <slaweq> #startmeeting neutron_ci
16:00:33 <openstack> Meeting started Tue Apr 3 16:00:30 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:36 <slaweq> hi
16:00:37 <openstack> The meeting name has been set to 'neutron_ci'
16:00:48 <mlavalle> o/
16:01:14 <haleyb> hi
16:01:35 * mlavalle will have to drop out 15 minutes before the top of the hour
16:01:57 <slaweq> please give me 5 minutes because I'm still getting back home (big traffic)
16:02:12 <mlavalle> slaweq: no problem
16:02:21 <slaweq> and I'm on mobile connection now
16:07:13 <slaweq> ok, we can start now
16:07:23 <slaweq> sorry for being late
16:08:01 <slaweq> but I just got back home from Easter - 500 km which I usually do in about 5 hours took me almost 9 because of traffic
16:08:10 <slaweq> ok, are You there? :)
16:08:31 <slaweq> mlavalle, haleyb?
16:08:41 <mlavalle> slaweq: hey
16:08:55 <slaweq> I think that ihrachys and jlibosva are not here now
16:09:16 <mlavalle> they haven't spoken up
16:09:53 <haleyb> i know kuba is at a meetup
16:10:04 <slaweq> ok, so I think we can start
16:10:10 <slaweq> #topic Actions from previous meetings
16:10:21 <slaweq> slaweq will write docs how to debug test jobs
16:10:40 <slaweq> I just pushed the first version of the patch: https://review.openstack.org/#/c/558537/
16:10:56 <slaweq> clarkb reviewed it for me so I will address his comments
16:11:04 <slaweq> but please check it also :)
16:11:30 <slaweq> next one is: haleyb to check router migrations issue
16:11:38 <slaweq> haleyb: any updates?
16:12:15 <haleyb> slaweq: i am still testing it, am on the systems now, so no update yet
16:12:39 <slaweq> ok, so I will keep it as an action for You for this week
16:12:41 <slaweq> ok?
16:12:48 <haleyb> sure
16:12:58 <slaweq> #action haleyb to check router migrations issue
16:13:14 <slaweq> next one was: ihrachys to take a look at the problem with the openstack-tox-py35-with-oslo-master periodic job
16:13:35 <slaweq> AFAIK it is fixed with https://review.openstack.org/#/c/557003/
16:13:57 <slaweq> so I think all is fine here now
16:14:14 <slaweq> so, next: slaweq to make fullstack job gating
16:14:26 <slaweq> done: https://review.openstack.org/#/c/557218/
16:14:39 <slaweq> and also grafana dashboard: https://review.openstack.org/#/c/557266/
16:15:06 <slaweq> it didn't appear there yet but as I asked the infra team, all looks fine there and it should probably appear when the fullstack job fails at least once - we will see
16:16:02 <slaweq> About details of how it works we can talk later, so now moving on to the next action:
16:16:06 <slaweq> slaweq will check difference between neutron-tempest-multinode-full and neutron-tempest-dvr-ha-multinode-full
16:16:19 <slaweq> I didn't have time to do it last week
16:16:29 <mlavalle> what?
16:16:29 <slaweq> I will do it this week for sure
16:16:39 <slaweq> mlavalle: what what?
16:16:42 <slaweq> :)
16:16:45 <mlavalle> LOL
16:16:54 <mlavalle> I know you've been on vacation
16:17:09 <slaweq> yes, I was
16:17:27 <slaweq> that's why I didn't have time to do this comparison of jobs :)
16:18:30 <slaweq> #action slaweq will check difference between neutron-tempest-multinode-full and neutron-tempest-dvr-ha-multinode-full
16:18:40 <slaweq> do You want to add anything/ask about something?
16:18:51 <slaweq> or can we go to the next topic?
16:19:13 <mlavalle> I was kidding with you
16:19:27 <mlavalle> about you not having time on your vacation to do it
16:19:28 <slaweq> mlavalle: I supposed that :)
16:20:39 <slaweq> ok, moving on?
16:22:09 <slaweq> I assume that we can go to the next topic
16:22:17 <slaweq> #topic Grafana
16:22:22 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:23:30 <slaweq> I was checking those graphs from the last 7 days today
16:23:56 <slaweq> There was one big spike last Thursday but I think that it was some problem with infra because all jobs have the same spike there.
16:24:28 <slaweq> Apart from that I think that it was a pretty quiet week.
16:24:57 <mlavalle> I see fullstack trending up
16:25:16 <mlavalle> and also Rally
16:25:57 <slaweq> now yes but it's still not a big failure rate
16:26:21 <slaweq> about rally I have a few examples of failures and I want to talk about them in a few minutes
16:26:46 <mlavalle> ok
16:27:31 <slaweq> about fullstack it could be because of me and my DNM patch: https://review.openstack.org/#/c/558259/
16:27:43 <slaweq> which failed on fullstack a few times today :)
16:27:57 <slaweq> apart from that I don't think there is any problem with it
16:28:19 <slaweq> so we can change topic to fullstack now since we already started on it :)
16:28:23 <slaweq> #topic Fullstack
16:28:24 <mlavalle> that's the example for the doc revision you proposed, right?
16:28:32 <slaweq> mlavalle: right
16:29:05 <slaweq> and this example was failing because the timeout was reached
16:29:52 <slaweq> as I said, fullstack is IMO stable in both queues now (at least I didn't see any problems with it during the last week)
16:30:09 <slaweq> so I also checked bugs with the "fullstack" tag in launchpad
16:30:20 <slaweq> There is one bug with the "fullstack" tag now (except wishlist): https://bugs.launchpad.net/neutron/+bug/1744402
16:30:22 <openstack> Launchpad bug 1744402 in neutron "fullstack security groups test fails because ncat process don't starts" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:30:23 <mlavalle> ok, cool
16:30:46 <slaweq> this bug is assigned to me - I'm aware of it and I want to check logs if it happens again
16:31:14 <slaweq> as I added some small change to the test a few days ago: https://review.openstack.org/#/c/556155/
16:31:28 <slaweq> but I didn't see it since this patch was merged
16:31:42 <slaweq> I will keep an eye on it still :)
16:32:10 <slaweq> do You want to add something about fullstack?
16:33:04 <mlavalle> no, thanks
16:33:09 <slaweq> ok
16:33:14 <slaweq> #topic Scenarios
16:33:32 <slaweq> the only problem which we have is neutron-tempest-plugin-dvr-multinode-scenario, still at 100% failures
16:34:01 * mlavalle just reviewed https://review.openstack.org/#/c/558537. Since manjeets highlighted the entire text, to see where my comments apply, please move the cursor over the comments
16:34:13 <slaweq> but that is because of problems with migration
16:34:31 <slaweq> so haleyb is on that
16:34:39 <slaweq> mlavalle: ok, thx for the review
16:35:27 <manjeets> mlavalle, sorry for making it a little hard to review, I should have done a file comment
16:35:53 <slaweq> I have a question about neutron-tempest-plugin-scenario-linuxbridge - should we maybe try to add it to the gate queue also (like fullstack) now?
16:36:11 <slaweq> what do You think about that?
16:36:32 <mlavalle> how long has it been running in check?
16:37:05 <mlavalle> I mean voting in the check queue?
16:37:18 <slaweq> It has been voting since 14.03: https://review.openstack.org/#/c/552689/
16:37:37 <mlavalle> mmhhh let's hold for a week
16:37:42 <slaweq> sure
16:38:06 <mlavalle> especially since last week tended to be quiet, mostly towards the end
16:38:18 <slaweq> yes, right
16:38:32 <slaweq> let's wait and see if it will still be fine
16:38:35 <slaweq> :)
16:39:07 <slaweq> ok, next topic
16:39:10 <slaweq> #topic Rally
16:39:22 <slaweq> as mlavalle pointed out, it has had some failures recently
16:39:44 <slaweq> so I checked today and found 3 examples of failures from last week
16:39:58 <slaweq> all of them were because of reaching the global job timeout:
16:40:05 <slaweq> http://logs.openstack.org/18/558318/1/check/neutron-rally-neutron/fdee864/job-output.txt.gz
16:40:13 <slaweq> http://logs.openstack.org/84/556584/4/check/neutron-rally-neutron/8a4dc9d/job-output.txt.gz
16:40:17 <mlavalle> ok, I saw the same with one of my patches
16:40:20 <slaweq> http://logs.openstack.org/81/552881/8/check/neutron-rally-neutron/fb6fb63/job-output.txt.gz
16:40:46 <slaweq> in one of those I think it was even stopped after all tests passed
16:41:22 <slaweq> I think that we should check what takes the most time in those jobs and maybe try to speed it up a little bit
16:41:39 <slaweq> is there anyone who wants to check that? :)
16:42:02 <mlavalle> This is the one I saw http://logs.openstack.org/84/556584/4/check/neutron-rally-neutron/8a4dc9d/job-output.txt.gz#_2018-04-03_00_22_53_422472
16:42:56 <mlavalle> I don't know if I will have the bandwidth this week, but if nobody takes a look, I will try
16:43:18 <slaweq> ok, thx
16:43:23 <mlavalle> hopefully I won't be shamed by slaweq if I don't have time next week
16:43:32 <slaweq> for sure not :)
16:43:49 <mlavalle> well, what if you are wearing your Hulk mask?
16:43:57 <slaweq> #action mlavalle to take a look at why rally jobs are taking so long
16:44:10 <slaweq> mlavalle: today I don't have it
16:44:20 <slaweq> but next week - we will see :P
16:45:11 <slaweq> ok, let's move on
16:45:14 <slaweq> #topic Periodic
16:45:36 <slaweq> openstack-tox-py35-with-oslo-master looks like it is fine again - thx ihrachys :)
16:45:58 <slaweq> neutron-tempest-postgres-full sometimes has 100% failures
16:46:14 <slaweq> I checked logs from the last two failures and what I found is:
16:46:21 <slaweq> once it wasn't a real failure but the timeout was reached (after all tests passed): http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/neutron-tempest-postgres-full/03ca3f3/job-output.txt.gz
16:46:32 <slaweq> Another time it was a failure not related to neutron: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/neutron-tempest-postgres-full/d5c0933/job-output.txt.gz#_2018-03-30_07_15_04_817879
16:47:57 <slaweq> IMO if such timeouts happen more often we should try to check it - for now it was only once so it isn't the biggest problem
16:48:02 <slaweq> what do You think about it?
16:48:14 <mlavalle> agree
16:48:23 <mlavalle> let's keep an eye on it
16:48:29 <slaweq> yes
16:49:11 <slaweq> ok, so moving on to the last topic
16:49:16 <slaweq> #topic others
16:49:25 <slaweq> I have one more thing to ask
16:49:37 <slaweq> recently we added the new job openstack-tox-lower-constraints
16:49:48 <slaweq> to our queues
16:50:03 <slaweq> do we want to add it to our grafana dashboard?
16:50:05 <mlavalle> it was a request from infra
16:50:13 <mlavalle> IIRC
16:50:26 <slaweq> yes, I know - I just wanted to ask if we should add it to grafana :)
16:50:41 <slaweq> to have better visibility of what's going on with it
16:50:49 <mlavalle> good point
16:50:52 <mlavalle> yes
16:50:57 <slaweq> ok, so I will do it
16:51:15 <slaweq> #action slaweq will add openstack-tox-lower-constraints to grafana dashboard
16:51:26 <haleyb> +1
16:51:26 <slaweq> and the last thing
16:51:56 <slaweq> I also checked bugs with the tag "gate-failure" on launchpad: https://tinyurl.com/y826rccx
16:52:27 <slaweq> there are quite a few such bugs with "high" priority, older than a few months and not assigned to anybody
16:52:44 <slaweq> maybe we should check them and close those which are not a problem anymore
16:52:48 <slaweq> what do You think?
16:52:55 <mlavalle> yes, let's do it
16:53:13 <slaweq> do You want to go through them now?
16:53:25 <slaweq> or should I do it later maybe?
16:53:27 <mlavalle> I have to drop out now
16:53:42 <mlavalle> as I said at the beginning of the meeting
16:53:49 <slaweq> ok
16:53:55 <mlavalle> but let's try to do them over the week
16:54:03 <mlavalle> do you want to partner on that?
16:54:25 <slaweq> so I will try to check them this week and I will ask if I need something
16:54:31 <slaweq> thx a lot :)
16:54:32 <mlavalle> ok
16:54:41 <slaweq> so that's all from my side
16:54:53 * mlavalle dropping out
16:54:56 <slaweq> sorry for being so quick but I was preparing it today in the car :)
16:54:59 <mlavalle> o/
16:55:05 <slaweq> bye mlavalle
16:55:34 <slaweq> haleyb: do You have anything else? or can we finish a few minutes early?
16:55:54 <haleyb> i am done, lunch here
16:56:06 <slaweq> #action slaweq will check old gate-failure bugs
16:56:33 <slaweq> ok, so bon appetit haleyb :)
16:56:37 <slaweq> and see You
16:56:44 <slaweq> #endmeeting
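
For the "check old gate-failure bugs" action recorded above, a minimal sketch of how the old, unassigned High-priority bugs could be listed with launchpadlib instead of the tinyurl query; assumptions: anonymous Launchpad API access is enough for triage, the consumer name 'neutron-ci-triage' is a placeholder, and the 90-day cutoff stands in for "older than a few months".

    # Minimal sketch, assuming launchpadlib is installed and anonymous access suffices;
    # the consumer name and the 90-day cutoff are placeholders, not project conventions.
    from datetime import datetime, timedelta, timezone

    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_anonymously('neutron-ci-triage', 'production', version='devel')
    neutron = lp.projects['neutron']

    # Open bug tasks tagged "gate-failure" with High importance.
    tasks = neutron.searchTasks(
        tags=['gate-failure'],
        importance=['High'],
        status=['New', 'Confirmed', 'Triaged', 'In Progress'],
    )

    cutoff = datetime.now(timezone.utc) - timedelta(days=90)

    # Keep only unassigned bugs that have not been touched for a few months.
    for task in tasks:
        bug = task.bug
        if task.assignee is None and bug.date_last_updated < cutoff:
            print(bug.id, bug.date_last_updated.date(), bug.title)
            print('    ', task.web_link)

Each bug printed this way is a candidate to either reconfirm against recent gate logs or close as no longer reproducible, which is the triage pass agreed on in the meeting.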