15:00:34 <johnthetubaguy> #startmeeting XenAPI
15:00:34 <openstack> Meeting started Wed Jul 23 15:00:34 2014 UTC and is due to finish in 60 minutes.  The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:38 <openstack> The meeting name has been set to 'xenapi'
15:00:48 <johnthetubaguy> BobBall: how's things?
15:00:53 <johnthetubaguy> #topic CI
15:00:59 <BobBall> Could be better
15:01:08 <BobBall> Failure rate has skyrocketed
15:01:11 <johnthetubaguy> so I got reports of some bad CI results
15:01:15 <BobBall> 1 in 3 patches fail at the moment
15:01:16 <johnthetubaguy> ah, interesting
15:01:22 <johnthetubaguy> that explains things
15:01:25 <johnthetubaguy> do we know why?
15:01:35 <BobBall> Just general races
15:01:39 <BobBall> some may be in xenapi driver
15:01:46 <BobBall> others are definitely generic races
15:01:46 <johnthetubaguy> hmm, you got some examples?
15:01:51 <BobBall> no
15:01:54 <johnthetubaguy> seems like something changed though
15:02:04 <BobBall> oh yes - I enabled the majority of tests
15:02:13 <johnthetubaguy> ah… cool
15:02:19 <johnthetubaguy> that makes total sense then
15:02:20 <BobBall> the last couple of days I've just disabled a few of the bigger culprits again
15:02:25 <BobBall> to get the pass rate back to sensible
15:02:37 <johnthetubaguy> where is the exclusion list again?
15:02:43 <BobBall> but it shows there are real races that we are hitting more than jenkins; whether they are generic or in xenapi is a different question
15:03:05 <johnthetubaguy> yeah, that is good to know
15:03:09 <BobBall> http://git.openstack.org/cgit/stackforge/xenapi-os-testing/tree/tempest_exclusion_list
15:03:46 <johnthetubaguy> tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario?
15:03:54 <johnthetubaguy> that feels like a bad one to skip...
15:04:04 <BobBall> That depended on my devstack change which was merged the other day
15:04:12 <BobBall> but until the pass rate is higher I don't want to enable it
15:04:21 <BobBall> it's been skipped since the CI was set up
15:04:22 <johnthetubaguy> ah, the cirros thing?
15:04:31 <BobBall> indeed
15:04:53 <johnthetubaguy> curious, are we hitting issues because of the cirros image?
15:05:06 <BobBall> no because those 3 tests are still disabled
15:05:30 <johnthetubaguy> so the others are still hard coded to use the other image?
15:05:38 <johnthetubaguy> I guess your tempest config does that
15:05:42 <BobBall> huh?
15:06:04 <johnthetubaguy> well, how do you pick which image? before, there was only one image, so we knew which one it was picking
15:06:15 <BobBall> we always use VHD
15:06:35 <johnthetubaguy> yeah, I was more asking how? is that just tempest configuration?
15:06:38 <BobBall> but there were three tempest tests (including test_minimum_basic_scenario) that are hard-coded to use qcow2 and fall back on euc
15:06:52 <BobBall> ah yes - guess so - can't remember where now
15:07:23 <johnthetubaguy> just wondering if we should double check it doesn't just pick the "first image", that's all
15:07:34 <BobBall> it doesn't
15:07:36 <johnthetubaguy> because devstack will mean it uploads both images, right?
15:07:41 <johnthetubaguy> ah OK
15:07:47 <BobBall> yes
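For reference, a minimal sketch of the tempest.conf options that normally pin which glance images the tests boot; the UUIDs below are placeholders (devstack writes the real values, which for this CI would point at the VHD image):

    [compute]
    # UUIDs of the primary and alternate glance images tempest boots from;
    # placeholder values - devstack fills in the real UUIDs at setup time.
    image_ref = 11111111-1111-1111-1111-111111111111
    image_ref_alt = 22222222-2222-2222-2222-222222222222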
15:07:54 <johnthetubaguy> just trying to rule things out
15:08:14 <johnthetubaguy> so chances of elastic-recheck getting deployed are tiny I suppose?
15:08:23 <BobBall> infinitesimally :D
15:08:37 <BobBall> we're looking at getting some time to move to the upstream CI
15:08:43 <johnthetubaguy> right, I should try to get some help on that front, if we get some effort there
15:08:44 <BobBall> at which point integrating E-R is much more practical
15:08:58 <johnthetubaguy> well it's for "free" I suspect
15:09:07 <johnthetubaguy> well, sort of
15:09:13 <BobBall> nothing is free :)
15:09:47 <johnthetubaguy> so how do I help with moving the tests upstream discussions?
15:09:54 <johnthetubaguy> where are they happening? infra meetings?
15:10:30 <BobBall> at the moment those discussions aren't happening
15:10:44 <johnthetubaguy> OK
15:11:16 <johnthetubaguy> I am sure we want to help with that, and have open reqs for people to help, so fingers crossed there
15:11:48 <BobBall> I'm trying to get an intern to make some real progress on that too :)
15:12:34 <BobBall> anyway - enabling the majority of the tests has increased the average test length from just over an hour to 1.5 hours ish
15:12:40 <johnthetubaguy> OK, cool, that should plug a gap till we can hire someone
15:12:45 <johnthetubaguy> ah yes, interesting
15:12:57 <johnthetubaguy> time for another round of optimisations
15:13:17 <BobBall> I've also added some useful code to see what the recent failures were
15:14:16 <johnthetubaguy> OK sounds good
15:14:30 <johnthetubaguy> feel like I should try and get someone to look into these failures
15:14:46 <johnthetubaguy> your status page is probably the best place to start I guess?
15:14:58 <BobBall> yes
15:15:03 <BobBall> have a look at...
15:15:13 <BobBall> http://f0ab4dc4366795c303a0-8fd069087bab3f263c7f9ddd524fce42.r22.cf1.rackcdn.com/ci_status/all_failures.txt
15:15:16 <BobBall> at the bottom
15:16:42 <johnthetubaguy> interesting, tests relating to notifications
15:16:47 <johnthetubaguy> thanks for those
15:17:04 <BobBall> that's a generic failure
15:17:07 <BobBall> a timeout waiting for it
15:17:29 <johnthetubaguy> yeah, maybe we never raised it
15:17:35 <johnthetubaguy> anyways, fun fun
15:17:52 <johnthetubaguy> and "no tempest failures detected", that is things other than the mirrors issue?
15:17:54 <BobBall> it's raised - been seen in the gate
15:17:57 <BobBall> yes
15:18:18 <johnthetubaguy> can you remember examples? or is it just general stuff we see in the gate?
15:18:52 <BobBall> https://bugs.launchpad.net/ceilometer/+bug/1336755
15:18:55 <uvirtbot> Launchpad bug 1336755 in ceilometer "telemetry_notification_api.py test times out waiting for Nova notification" [Critical,Triaged]
15:19:32 <johnthetubaguy> right, interesting
15:19:40 <johnthetubaguy> OK, thanks for the update
15:19:47 <johnthetubaguy> any more on this you want to cover?
15:19:53 <BobBall> no
15:20:02 <BobBall> just I don't have the bandwidth ATM to debug all of these issues
15:20:10 <BobBall> so to get the rate back up I'm disabling tests :)
15:20:58 <johnthetubaguy> OK, how do we get to fixing those though?
15:21:15 <johnthetubaguy> I do wonder about running two systems, one of them running all the tests
15:21:23 <BobBall> perhaps
15:21:31 <BobBall> but I'd rather see a stress run on one specific test
15:21:42 <BobBall> getting parallel runs of the same test over and over
15:21:47 <BobBall> but I've not worked out how to do that
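A minimal sketch of one way to hammer a single tempest test repeatedly, assuming a configured tempest checkout where `testr run <regex>` works; the test regex and run count below are only illustrative:

    # Hypothetical sketch: repeatedly run one tempest test and count failures.
    import subprocess

    TEST_REGEX = "tempest.scenario.test_minimum_basic"  # illustrative target
    RUNS = 20

    failures = 0
    for i in range(RUNS):
        # Each iteration starts a fresh testr run restricted to the one test.
        # Truly parallel copies would need separate working directories (and
        # probably separate devstacks), since testr keeps state in .testrepository.
        rc = subprocess.call(["testr", "run", TEST_REGEX])
        print("run %d: %s" % (i, "FAIL" if rc else "ok"))
        if rc:
            failures += 1

    print("%d/%d runs failed" % (failures, RUNS))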
15:22:13 <johnthetubaguy> oh, that's a good idea
15:22:59 <johnthetubaguy> so, one more thought on this
15:23:16 <johnthetubaguy> are there any in this list that we expect would never work: http://git.openstack.org/cgit/stackforge/xenapi-os-testing/tree/tempest_exclusion_list
15:23:43 <BobBall> no
15:23:45 <BobBall> they should all work now
15:23:57 <johnthetubaguy> even the encrypted volumes stuff?
15:24:36 <BobBall> yes
15:24:46 <johnthetubaguy> OK, I will not ask how
15:24:55 <BobBall> https://review.openstack.org/#/c/107430/
15:25:10 <BobBall> dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/30/107430/1/17493/testr_results.html.gz
15:25:19 <BobBall> That's an example of all of the tests that we've had passing
15:25:25 <johnthetubaguy> nice
15:25:29 <johnthetubaguy> that's awesome
15:25:36 <BobBall> Oh - sorry - test_encrypted_cinder_volumes is one of the three that depends on euc
15:25:46 <BobBall> so I've not tested it since enabling the euc image
15:27:00 <johnthetubaguy> hmm, OK
15:27:31 <johnthetubaguy> anyways, cool progress here
15:27:48 <BobBall> backward progress in terms of the pass rate
15:27:50 <johnthetubaguy> sounds nuts, but it's nice to see these failures, it's kinda some of what I wanted to see
15:27:52 <BobBall> I should have tried more than 3 runs
15:27:57 <johnthetubaguy> hey ho
15:28:20 <BobBall> over the last 24 hours the most failed test is already excluded:
15:28:21 <BobBall> 7 tempest.thirdparty.boto.test_s3_ec2_images.S3ImagesTest.test_register_get_deregister_ami_image
15:28:24 <BobBall> 12 tempest.thirdparty.boto.test_s3_ec2_images.S3ImagesTest.test_register_get_deregister_aki_image
15:28:27 <BobBall> 18 tempest.thirdparty.boto.test_s3_ec2_images.S3ImagesTest
15:28:29 <BobBall> 24 No tempest failures detected
15:28:35 <BobBall> (that was from 'osci-view failures --recent 24')
15:29:11 <johnthetubaguy> right
15:29:30 <johnthetubaguy> top work, let's try to get some help deciphering some of the errors we are seeing
15:29:42 <johnthetubaguy> #help need to debug some of the errors in the XenServer CI
15:29:48 <johnthetubaguy> #topic Open Discussion
15:30:04 <johnthetubaguy> cool, so any other things people want to raise?
15:30:27 <BobBall> not today
15:30:35 <johnthetubaguy> OK cool
15:30:39 <johnthetubaguy> thanks for the updates
15:30:47 <johnthetubaguy> catch you next … oh wait
15:30:58 <johnthetubaguy> next week we are at the mid-cycle, so maybe skip next week?
15:31:41 <johnthetubaguy> #info no XenAPI meeting next week, resuming the week after
15:31:45 <johnthetubaguy> #endmeeting