16:00:05 <slaweq> #startmeeting neutron_ci
16:00:09 <openstack> Meeting started Tue May 15 16:00:05 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:09 <slaweq> hello
16:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:12 <openstack> The meeting name has been set to 'neutron_ci'
16:00:32 <njohnston> o/
16:00:39 <slaweq> hi njohnston
16:00:47 <mlavalle> o/
16:00:50 <jlibosva> o/
16:02:01 <slaweq> I just pinged ihar and haleyb also in neutron channel
16:02:11 <slaweq> maybe they will join us also
16:02:22 <slaweq> but I think that we can start
16:02:28 <slaweq> #topic Actions from previous meetings
16:02:37 <slaweq> mlavalle will check why trunk tests are failing in dvr multinode scenario
16:02:50 <mlavalle> slaweq: didn't have time last week. sorry
16:03:32 <slaweq> so moving to next week?
16:03:39 <mlavalle> yes, please
16:04:02 <slaweq> #action mlavalle will check why trunk tests are failing in dvr multinode scenario
16:04:08 <mlavalle> Thanks
16:04:08 <slaweq> thx
16:04:16 <slaweq> slaweq will continue debugging slow rally tests issue
16:04:28 <slaweq> I also didn't have time to debug it
16:04:36 <slaweq> #action slaweq will continue debugging slow rally tests issue
16:04:43 <slaweq> slaweq to fix neutron-dynamic-routing scenario tests bug
16:05:04 <slaweq> fressi sent fixes: https://review.openstack.org/#/c/567736/ and https://review.openstack.org/#/c/567742/
16:05:17 <slaweq> both are merged now so I hope it should be better
16:05:37 <slaweq> next one is:
16:05:37 <slaweq> slaweq to make 2 scenario jobs gating
16:05:49 <slaweq> ovsfw job done: https://review.openstack.org/#/c/567055/
16:06:14 <slaweq> neutron-tempest-plugin-scenario-linuxbridge done: https://review.openstack.org/#/c/567057/
16:06:24 <ihar> eh sorry for late
16:06:29 <slaweq> hi ihar
16:06:40 <slaweq> second one was merged today
16:07:09 <slaweq> I also did patch to add those new jobs to dashboard: https://review.openstack.org/#/c/567062/ and it's waiting for review now
16:07:26 <slaweq> mlavalle: would be good if You could +1 it :)
16:07:57 <mlavalle> Done
16:08:13 <slaweq> thx
16:08:25 <mlavalle> but more importatly, ihar +1ed it also
16:08:38 <slaweq> thx ihar too :)
16:08:40 <ihar> :)))
16:09:04 * slaweq wish that 2 x +1 could be counted as +2 :)
16:09:26 <slaweq> and the last from previous week is:
16:09:28 <slaweq> haleyb will debug failing security groups fullstack test: https://bugs.launchpad.net/neutron/+bug/1767829
16:09:29 <openstack> Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:09:51 <slaweq> but haleyb is not here probably
16:10:00 <slaweq> and I think that he didn't do anything with that
16:10:07 <ihar> it's assigned to you btw
16:10:15 <slaweq> ihar: yes, I know
16:10:32 <slaweq> but last week haleyb told that he will try to check so action was assigned to him
16:10:33 <slaweq> :)
16:11:02 <ihar> yeah I get, just saying should prolly be assigned to him to nag him some more
16:11:07 <slaweq> I was trying to reproduce it on top of master branch with some DNM patch with extra logging but I couldn't
16:11:51 <slaweq> but he has some patch which cause this issue 100% times AFAIR - maybe I will just try to debug on top of his patch and will find something
16:12:01 <slaweq> I will assign it to me now :)
16:12:02 <mlavalle> yeah, if you assign it to him, you'll have more nagging power
16:12:10 <mlavalle> ;-)
16:12:32 <slaweq> #action slaweq to debug failing security groups fullstack test: https://bugs.launchpad.net/neutron/+bug/1767829
16:12:34 <openstack> Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:12:49 <slaweq> I know that haleyb is very busy so I will try to check it maybe
16:13:11 <slaweq> ok, that were all actions from previous week
16:13:22 <slaweq> let's move to next topic then
16:13:26 <slaweq> #topic Grafana
16:13:32 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:15:43 <ihar> how come not a single failure in fullstack / functional for 4 days!
16:16:08 <ihar> (there are some lately, but there was a long trench)
16:16:27 <ihar> actually 5 days
16:16:35 <slaweq> ihar: are You talking about gate queue?
16:16:45 <slaweq> I think it was very similar in the past here
16:16:47 <ihar> yea
16:16:57 <ihar> oh right it's gate, maybe nothing was in gate
16:17:21 <slaweq> but speaking about Fullstack - it was on high failure rate - around 40-50% at the end of last week - but as I went through failed patches from last week I didn’t found one (or 2) general reasons of such failures,
16:18:04 <mlavalle> so, entropy?
16:18:23 <slaweq> also when I today went through different patches and looking for some example failures I saw quite many jobs marked WIP or something like that when many jobs were red
16:18:55 <mlavalle> did they have fullstack failures?
16:18:59 <slaweq> so I didn't check reasons of failures in such jobs and maybe it had some impact on graphs also
16:19:22 <slaweq> mlavalle: yes, I'm talking about patches which had failures on (almost) everything - fullstack also :)
16:19:35 <ihar> example of a wip job failure?
16:19:44 <mlavalle> ok, yes, maybe they had an impact
16:20:01 <slaweq> ihar: give me a sec
16:20:42 <slaweq> e.g. https://review.openstack.org/#/c/533850/
16:21:26 <slaweq> https://review.openstack.org/#/c/567621/
16:21:33 <slaweq> https://review.openstack.org/#/c/549168/
16:21:42 <ihar> what's 'wip' you refer to
16:22:30 <slaweq> ok, those are not marked as WIP but I assumed that if every job is red then there is probably something wrong with patch
16:23:10 <mlavalle> I don't know the oher one, but 567621 is from a co-worker of mine and it is wip
16:23:21 <mlavalle> I'll ask him to mark it so
16:23:28 <ihar> ok gotcha. or there is an infra issue.
16:23:52 <slaweq> ihar: right, but then it would be probably seen on graphs as well :)
16:24:30 <slaweq> mlavalle: I just wanted to talk that I didn't check failures of one of jobs in such patch :)
16:24:52 <mlavalle> ack
16:25:28 <slaweq> talking about fullstack, there were also failures like: https://review.openstack.org/#/c/499908/ which is clearly related to patch
16:26:05 <slaweq> so, to sum up I didn't found anything new which would repeat many times :)
16:26:29 <slaweq> but I found one issue: http://logs.openstack.org/82/568282/3/check/neutron-fullstack/21baf5f/logs/testr_results.html.gz
16:26:38 <slaweq> which is new for me
16:27:18 <slaweq> so for now just for the record - there was such issue at least once and we should check if it will not repeat more times :)
16:28:22 <slaweq> any questions related to fullstack tests (or grafana)?
16:28:43 <mlavalle> I'm good
16:28:57 <ihar> no
16:29:04 <slaweq> so let's move on
16:29:09 <slaweq> #topic Scenarios
16:29:54 <slaweq> Neutron-tempest-plugin-dvr-multinode-scenario failures rate is better now, after marking few tests as unstable
16:29:57 <slaweq> but it's still not perfect
16:30:00 <haleyb> sorry, i had a meeting conflict and didn't see your ping, will try and watch out of one eye
16:30:11 <slaweq> haleyb: sure :)
16:30:44 <slaweq> I found one example of failure for dvr-multinode scenario: http://logs.openstack.org/86/567086/5/check/neutron-tempest-plugin-dvr-multinode-scenario/59d052d/job-output.txt.gz#_2018-05-15_09_04_47_980425
16:31:25 <ihar> hm, AuthenticationException
16:31:37 <ihar> interesting, it's usually timeouts
16:32:01 <ihar> means that metadata didn't provide the ssh fingerprint
16:32:28 <ihar> the test suck, it doesn't dump instance console output
16:32:37 <ihar> that would give us an idea what fails there
16:32:42 <slaweq> yes, there is no console log there
16:32:44 <ihar> but probably metadata down?
16:33:36 <slaweq> logs of metadata agent looks fine
16:33:56 <slaweq> at first glance at least :)
16:34:45 <ihar> it will look just fine if instance can't reach it
16:34:47 <ihar> :)
16:35:02 <slaweq> but some instances reached it for sure
16:36:52 <ihar> I think the next step is make the test case dump console
16:37:01 <slaweq> I can try to read test code and check if there is maybe anything what could be changed/improved to catch console log in such case
16:37:02 <ihar> (and probably there are more test cases that have the same issue)
16:37:23 <mlavalle> most likely
16:37:37 <slaweq> are You fine with that?
16:37:37 <ihar> that would be nice of you if you can do it
16:37:42 <slaweq> :)
16:38:01 <ihar> note the dumping often times is done in connect check methods
16:38:07 <ihar> not directly in test case
16:38:12 <slaweq> #action slaweq to check why some scenario tests don't log instance console log
16:38:26 <slaweq> yes, I think I saw it before already somewhere :)
16:39:35 <slaweq> ok, I have one more thing about scenario tests
16:39:41 <slaweq> and it is from frickler
16:40:28 <slaweq> he asked me last week to discuss if we can switch neutron-tempest-plugin-designate-scenario job to be voting (gating later) as it looks quite stable since long time
16:40:57 <slaweq> so I'm asking what You think about it?
16:41:10 <mlavalle> if failure rate is good, yes, let's do it
16:41:19 <ihar> I think yes it's a good move. it's very stable.
16:41:20 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=9&fullscreen
16:41:43 <slaweq> yes, it's below 10 % for most of the time at least in last 30 days
16:41:49 <slaweq> so I also think that it is stable
16:41:54 <mlavalle> yes, let';s do it
16:42:06 <slaweq> ok, so I will switch it to be voting for now
16:42:18 <slaweq> and later we can think about making it gating also - right?
16:42:37 <mlavalle> yes
16:42:46 <slaweq> #action slaweq to switch neutron-tempest-plugin-designate-scenario to be voting
16:42:48 <slaweq> thx
16:42:59 <slaweq> ok, moving on
16:43:00 <slaweq> #topic Rally
16:43:00 <mlavalle> thanks to you and frickler !
16:43:19 <slaweq> thx frickler :)
16:43:34 <slaweq> in rally tests there is nothing new
16:43:57 <slaweq> still only failures which I found are related to global timeout, e.g.: http://logs.openstack.org/12/470912/43/check/neutron-rally-neutron/101415f/job-output.txt.gz
16:44:13 <slaweq> so I need to investigate it
16:44:21 <slaweq> this week
16:44:42 <slaweq> next topic is
16:44:42 <slaweq> #topic Periodic
16:45:02 <slaweq> 2 jobs were failing during last week:
16:45:36 <slaweq> neutron-tempest-postgres-full - failed few times but as I checked logs, failures related to some error with volumes or nova service list (or something like that)
16:45:47 <slaweq> so I didn't debug it more
16:46:12 <slaweq> Neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4 - this one should be fixed since today as fressi's patches are merged
16:46:16 <ihar> psql is traditionally unstable. I don't think anyone has taste to fix it so as long as it passes from time to time it's good enough.
16:46:35 <slaweq> ihar: so it's good enough :)
16:47:05 <slaweq> and that was all about periodic jobs
16:47:13 <slaweq> anything to add? questions?
16:47:21 <mlavalle> not from me
16:47:37 <slaweq> ok, so last topic for today
16:47:38 <slaweq> #topic Open discussion
16:47:56 <slaweq> I just wanted to ask if You are fine to cancel next week's meeting due to summit?
16:48:03 <slaweq> if so I will sent an email about that
16:48:13 <mlavalle> I was going to suggest it
16:48:17 <slaweq> :)
16:48:24 <ihar> I am good for less meetings
16:48:29 <ihar> I have a patch to raise
16:48:41 <ihar> for unit tests
16:48:43 <ihar> https://review.openstack.org/#/c/568390/
16:48:52 <ihar> it is *not* ready and I will try to get it ready today
16:49:08 <ihar> but if not someone better pick it up from me because next time I will get to it the next Monday
16:49:14 <ihar> and it fixes a unit test gate issue
16:49:27 <slaweq> ok ihar, go ahead :)
16:49:30 <ihar> (should be easy to resolve the last standing failure there though, just a heads-up)
16:49:41 <slaweq> ignore my last sentence :)
16:49:43 <mlavalle> ok, thanks for the heads up
16:50:30 <slaweq> ihar: I can help You with it if You want
16:50:57 <jlibosva> I have one item I wanted to share
16:50:58 <ihar> just take it over if I don't get it done today
16:51:05 <slaweq> ihar: sure
16:51:18 <slaweq> jlibosva: go on
16:51:31 <jlibosva> We have a community goal to switch to python3
16:51:38 <jlibosva> we used to have functional and fullstack experimental jobs
16:52:08 <jlibosva> we spent some time in the past fixing bugs so functional suite can be executed fine without any errors
16:52:19 <jlibosva> we had one last thing missing, which was caused by bug in oslo.service
16:52:27 <jlibosva> aaaaand, this bug has been fixed after a year or so :)
16:52:38 <slaweq> \o/
16:52:39 <jlibosva> but we lost the experimental jobs
16:52:44 <jlibosva> so I have a patch to add them back - https://review.openstack.org/#/c/568282/
16:53:01 <jlibosva> in the meantime, there are new issues in functional job, I'm working on fixes. I think there are two issues only
16:53:28 <jlibosva> I think it would be worth to have the functional job passing and then move it to check queue - one more thing that could make the zuul vote -1, right? :)
16:53:34 <jlibosva> what are your opinions?
16:53:47 <slaweq> maybe as non-voting for the beginning
16:53:57 <njohnston> +1
16:54:00 <mlavalle> I agree with your proposal
16:54:08 <jlibosva> I mean, it will create yet another job and it seems we have a few already in Neutron :-/
16:54:08 <ihar> of course non voting first but otherwise I just +w'd the jobs
16:54:14 <jlibosva> ihar: thanks
16:54:20 <mlavalle> as slaweq says, non voting first
16:54:24 <jlibosva> ok, I'll be working on patches for it, thanks :)
16:54:27 <ihar> more jobs is fine if we manage failures
16:54:34 <slaweq> thx jlibosva
16:54:40 <jlibosva> gaining some Python 3 knowledge on the fly
16:54:44 <ihar> arguably the project is doing great so far
16:55:55 <slaweq> ok, someone wants to talk about anything else?
16:56:11 <slaweq> if no, we can enjoy our free 4 minutes :)
16:56:33 <ihar> 4 mins is fine with me
16:56:36 <ihar> cheers
16:56:41 <mlavalle> Thanks
16:56:42 <slaweq> thx for attending
16:56:45 <jlibosva> thx bye
16:56:49 <slaweq> we will see in 2 weeks than
16:56:52 <slaweq> see You
16:56:52 <slaweq> https://review.openstack.org/#/c/568282/
16:56:58 <slaweq> #undo
16:56:59 <openstack> Removing item from minutes: #link https://review.openstack.org/#/c/568282/
16:57:04 <slaweq> #endmeeting
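
For reference on the console-dump action item discussed under #topic Scenarios above: below is a minimal sketch of the approach ihar suggests (dump the instance console from the connect-check helper rather than from each test case). This is not the actual neutron-tempest-plugin code; the mixin and method names are hypothetical, and only tempest's ServersClient.get_console_output(), tempest.lib.common.ssh.Client, and oslo.log are assumed as real APIs.

```python
# Sketch only: a hypothetical helper mixin for tempest-based scenario tests
# that dumps the guest console before re-raising SSH failures, so an
# AuthenticationException (e.g. metadata never delivered the key) still
# leaves evidence in the job logs.
from oslo_log import log as logging

LOG = logging.getLogger(__name__)


class ConsoleDumpMixin(object):
    """Hypothetical mixin; assumes self.os_primary.servers_client exists."""

    def log_console_output(self, servers):
        # Log the nova console of every server used by the test, so
        # cloud-init/metadata problems are visible even when SSH never works.
        for server in servers or []:
            output = self.os_primary.servers_client.get_console_output(
                server['id'])['output']
            LOG.debug('Console output for server %s:\n%s',
                      server['id'], output)

    def check_connectivity(self, ssh_client, servers):
        # Wrap the existing connectivity check (ssh_client is assumed to be
        # a tempest.lib.common.ssh.Client) and dump consoles on any failure,
        # keeping the dump in the connect-check method instead of each test.
        try:
            ssh_client.test_connection_auth()
        except Exception:
            self.log_console_output(servers)
            raise
```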