15:00:34 <johnthetubaguy> #startmeeting XenAPI
15:00:34 <openstack> Meeting started Wed Jul 23 15:00:34 2014 UTC and is due to finish in 60 minutes. The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:38 <openstack> The meeting name has been set to 'xenapi'
15:00:48 <johnthetubaguy> BobBall: how's things?
15:00:53 <johnthetubaguy> #topic CI
15:00:59 <BobBall> Could be better
15:01:08 <BobBall> Failure rate has skyrocketed
15:01:11 <johnthetubaguy> so I got reports of some bad CI results
15:01:15 <BobBall> 1 in 3 patches fail at the moment
15:01:16 <johnthetubaguy> ah, interesting
15:01:22 <johnthetubaguy> that explains things
15:01:25 <johnthetubaguy> do we know why?
15:01:35 <BobBall> Just general races
15:01:39 <BobBall> some may be in the xenapi driver
15:01:46 <BobBall> others are definitely generic races
15:01:46 <johnthetubaguy> hmm, have you got some examples?
15:01:51 <BobBall> no
15:01:54 <johnthetubaguy> seems like something changed though
15:02:04 <BobBall> oh yes - I enabled the majority of tests
15:02:13 <johnthetubaguy> ah… cool
15:02:19 <johnthetubaguy> that makes total sense then
15:02:20 <BobBall> the last couple of days I've just disabled a few of the bigger culprits again
15:02:25 <BobBall> to get the pass rate back to sensible
15:02:37 <johnthetubaguy> where is the exclusion list again?
15:02:43 <BobBall> but it shows there are real races that we are hitting more than jenkins; whether they are generic or in xenapi is a different question
15:03:05 <johnthetubaguy> yeah, that is good to know
15:03:09 <BobBall> http://git.openstack.org/cgit/stackforge/xenapi-os-testing/tree/tempest_exclusion_list
15:03:46 <johnthetubaguy> tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario?
15:03:54 <johnthetubaguy> that feels like a bad one to skip...
15:04:04 <BobBall> That depended on my devstack change which was merged the other day
15:04:12 <BobBall> but until the pass rate is higher I don't want to enable it
15:04:21 <BobBall> it's been skipped since the CI was set up
15:04:22 <johnthetubaguy> ah, the cirros thing?
15:04:31 <BobBall> indeed
15:04:53 <johnthetubaguy> curious, are we hitting issues because of the cirros image?
15:05:06 <BobBall> no, because those 3 tests are still disabled
15:05:30 <johnthetubaguy> so the others are still hard-coded to use the other image?
15:05:38 <johnthetubaguy> I guess your tempest config does that
15:05:42 <BobBall> huh?
15:06:04 <johnthetubaguy> well, how do you pick which image? before there was only one image, so we knew which it was picking
15:06:15 <BobBall> we always use VHD
15:06:35 <johnthetubaguy> yeah, I was more asking how? is that just tempest configuration?
15:06:38 <BobBall> but there were three tempest tests (including test_minimum_basic_scenario) that are hard-coded to use qcow2 and fall back on euc
15:06:52 <BobBall> ah yes - guess so - can't remember where now
15:07:23 <johnthetubaguy> just wondering if we should double-check it doesn't just pick the "first image", that's all
15:07:34 <BobBall> it doesn't
15:07:36 <johnthetubaguy> because devstack means it uploads both images, right?
15:07:41 <johnthetubaguy> ah OK
15:07:47 <BobBall> yes
15:07:54 <johnthetubaguy> just trying to rule things out
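(On the image-selection question above: tempest takes its guest images from its own configuration rather than whatever happens to be first in glance. A minimal sketch of checking what a run is configured to use, assuming a devstack-style layout; the tempest.conf path and the [compute] image_ref / image_ref_alt option names are assumptions, not something confirmed in the log.)

    # Print which image UUIDs tempest is configured to boot from.
    import configparser

    conf = configparser.ConfigParser()
    conf.read("/opt/stack/tempest/etc/tempest.conf")  # assumed devstack path

    image_ref = conf.get("compute", "image_ref", fallback=None)
    image_ref_alt = conf.get("compute", "image_ref_alt", fallback=None)

    print("primary image:   %s" % image_ref)
    print("alternate image: %s" % image_ref_alt)

    if image_ref == image_ref_alt:
        print("warning: both refs point at the same image")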
15:08:14 <johnthetubaguy> so chances of elastic-recheck getting deployed are tiny I suppose?
15:08:23 <BobBall> infinitesimally :D
15:08:37 <BobBall> we're looking at getting some time to move to the upstream CI
15:08:43 <johnthetubaguy> right, I should try to get some help on that front, if we get some effort there
15:08:44 <BobBall> at which point integrating E-R is much more practical
15:08:58 <johnthetubaguy> well it's for "free" I suspect
15:09:07 <johnthetubaguy> well, sort of
15:09:13 <BobBall> nothing is free :)
15:09:47 <johnthetubaguy> so how do I help with the discussions on moving the tests upstream?
15:09:54 <johnthetubaguy> where are they happening? infra meetings?
15:10:30 <BobBall> at the moment those discussions aren't happening
15:10:44 <johnthetubaguy> OK
15:11:16 <johnthetubaguy> I am sure we want to help with that, and have open reqs for people to help, so fingers crossed there
15:11:48 <BobBall> I'm trying to get an intern to make some real progress on that too :)
15:12:34 <BobBall> anyway - enabling the majority of the tests has increased the average test length from just over an hour to about 1.5 hours
15:12:40 <johnthetubaguy> OK, cool, that should plug a gap till we can hire someone
15:12:45 <johnthetubaguy> ah yes, interesting
15:12:57 <johnthetubaguy> time for another round of optimisations
15:13:17 <BobBall> I've also added some useful code to see what the recent failures were
15:14:16 <johnthetubaguy> OK, sounds good
15:14:30 <johnthetubaguy> feel like I should try and get someone looking into these failures
15:14:46 <johnthetubaguy> your status page is probably the best place to start I guess?
15:14:58 <BobBall> yes
15:15:03 <BobBall> have a look at...
15:15:13 <BobBall> http://f0ab4dc4366795c303a0-8fd069087bab3f263c7f9ddd524fce42.r22.cf1.rackcdn.com/ci_status/all_failures.txt
15:15:16 <BobBall> at the bottom
15:16:42 <johnthetubaguy> interesting, tests relating to notifications
15:16:47 <johnthetubaguy> thanks for those
15:17:04 <BobBall> that's a generic failure
15:17:07 <BobBall> a timeout waiting for it
15:17:29 <johnthetubaguy> yeah, maybe we never raised it
15:17:35 <johnthetubaguy> anyways, fun fun
15:17:52 <johnthetubaguy> and "no tempest failures detected" - that is things other than the mirrors issue?
15:17:54 <BobBall> it's raised - been seen in the gate
15:17:57 <BobBall> yes
15:18:18 <johnthetubaguy> can you remember examples? or is it just general stuff we see in the gate?
15:18:52 <BobBall> https://bugs.launchpad.net/ceilometer/+bug/1336755
15:18:55 <uvirtbot> Launchpad bug 1336755 in ceilometer "telemetry_notification_api.py test times out waiting for Nova notification" [Critical,Triaged]
15:19:32 <johnthetubaguy> right, interesting
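(For anyone wanting to dig into the failure data BobBall links above, a rough sketch of tallying the most frequent failing tests from the all_failures.txt dump. The file layout is an assumption; the filter below simply keeps lines that look like tempest test identifiers, so adjust it to match the real format.)

    # Count how often each tempest test id appears in the CI failure dump.
    import collections
    import urllib.request

    URL = ("http://f0ab4dc4366795c303a0-8fd069087bab3f263c7f9ddd524fce42"
           ".r22.cf1.rackcdn.com/ci_status/all_failures.txt")

    with urllib.request.urlopen(URL) as resp:
        lines = resp.read().decode("utf-8", errors="replace").splitlines()

    # Assumed format: failing test ids appear one per line among other text.
    failures = [line.strip() for line in lines if line.strip().startswith("tempest.")]

    for test, count in collections.Counter(failures).most_common(10):
        print("%4d  %s" % (count, test))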
15:19:40 <johnthetubaguy> OK, thanks for the update
15:19:47 <johnthetubaguy> any more on this you want to cover?
15:19:53 <BobBall> no
15:20:02 <BobBall> just I don't have the bandwidth ATM to debug all of these issues
15:20:10 <BobBall> so to get the rate back up I'm disabling tests :)
15:20:58 <johnthetubaguy> OK, how do we get to fixing those though?
15:21:15 <johnthetubaguy> I do wonder about running two systems, one of them running all the tests
15:21:23 <BobBall> perhaps
15:21:31 <BobBall> but I'd rather see a stress on one specific test
15:21:42 <BobBall> getting parallel runs of the same test over and over
15:21:47 <BobBall> but I've not worked out how to do that
15:22:13 <johnthetubaguy> oh, that's a good idea
15:22:59 <johnthetubaguy> so, one more thought on this
15:23:16 <johnthetubaguy> are there any in this list that we expect would never work: http://git.openstack.org/cgit/stackforge/xenapi-os-testing/tree/tempest_exclusion_list
15:23:43 <BobBall> no
15:23:45 <BobBall> they should all work now
15:23:57 <johnthetubaguy> even the encrypted volumes stuff?
15:24:36 <BobBall> yes
15:24:46 <johnthetubaguy> OK, I will not ask how
15:24:55 <BobBall> https://review.openstack.org/#/c/107430/
15:25:10 <BobBall> dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/30/107430/1/17493/testr_results.html.gz
15:25:19 <BobBall> That's an example of all of the tests that we've had passing
15:25:25 <johnthetubaguy> nice
15:25:29 <johnthetubaguy> that's awesome
15:25:36 <BobBall> Oh - sorry - test_encrypted_cinder_volumes is one of the three that depends on euc
15:25:46 <BobBall> so I've not tested it since enabling the euc image
15:27:00 <johnthetubaguy> hmm, OK
15:27:31 <johnthetubaguy> anyways, cool progress here
15:27:48 <BobBall> backward progress in terms of the pass rate
15:27:50 <johnthetubaguy> sounds nuts, but it's nice to see these failures, it's kinda some of what I wanted to see
15:27:52 <BobBall> I should have tried more than 3 runs
15:27:57 <johnthetubaguy> hey ho
15:28:20 <BobBall> over the last 24 hours the most failed test is already excluded:
15:28:21 <BobBall> 7 tempest.thirdparty.boto.test_s3_ec2_images.S3ImagesTest.test_register_get_deregister_ami_image
15:28:24 <BobBall> 12 tempest.thirdparty.boto.test_s3_ec2_images.S3ImagesTest.test_register_get_deregister_aki_image
15:28:27 <BobBall> 18 tempest.thirdparty.boto.test_s3_ec2_images.S3ImagesTest
15:28:29 <BobBall> 24 No tempest failures detected
15:28:35 <BobBall> (that was from 'osci-view failures --recent 24')
15:29:11 <johnthetubaguy> right
15:29:30 <johnthetubaguy> top work, let's try to get some help deciphering some of the errors we are seeing
15:29:42 <johnthetubaguy> #help need to debug some of the errors in the XenServer CI
15:29:48 <johnthetubaguy> #topic Open Discussion
15:30:04 <johnthetubaguy> cool, so any other things people want to raise?
15:30:27 <BobBall> not today
15:30:35 <johnthetubaguy> OK cool
15:30:39 <johnthetubaguy> thanks for the updates
15:30:47 <johnthetubaguy> catch you next … oh wait
15:30:58 <johnthetubaguy> next week we are at the mid-cycle, so maybe skip next week?
15:31:41 <johnthetubaguy> #info no XenAPI meeting next week, resuming the week after
15:31:45 <johnthetubaguy> #endmeeting
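(Following up on the "parallel runs of the same test over and over" idea from the stress discussion above: a minimal sketch, assuming a tempest checkout at /opt/stack/tempest and that a single test can be driven with python -m testtools.run; both the path and the runner choice are assumptions, and hammering one test in parallel can still trip over shared tenant resources unless tempest's tenant isolation is enabled.)

    # Run one tempest test RUNS times, WORKERS at a time, and report failures.
    import concurrent.futures
    import subprocess

    TEST = ("tempest.scenario.test_minimum_basic."
            "TestMinimumBasicScenario.test_minimum_basic_scenario")
    RUNS = 20       # total repetitions
    WORKERS = 4     # concurrent copies

    def run_once(i):
        # testtools.run exits non-zero when the test fails
        proc = subprocess.run(
            ["python", "-m", "testtools.run", TEST],
            cwd="/opt/stack/tempest",          # assumed tempest checkout
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        return i, proc.returncode

    with concurrent.futures.ThreadPoolExecutor(max_workers=WORKERS) as pool:
        results = list(pool.map(run_once, range(RUNS)))

    failed = [i for i, rc in results if rc != 0]
    print("%d/%d runs failed: %s" % (len(failed), RUNS, failed))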