15:00:34 #startmeeting XenAPI
15:00:34 Meeting started Wed Jul 23 15:00:34 2014 UTC and is due to finish in 60 minutes. The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:38 The meeting name has been set to 'xenapi'
15:00:48 BobBall: how's things?
15:00:53 #topic CI
15:00:59 Could be better
15:01:08 Failure rate has skyrocketed
15:01:11 so I got reports of some bad CI results
15:01:15 1 in 3 patches fail at the moment
15:01:16 ah, interesting
15:01:22 that explains things
15:01:25 do we know why?
15:01:35 Just general races
15:01:39 some may be in the xenapi driver
15:01:46 others are definitely generic races
15:01:46 hmm, have you got some examples?
15:01:51 no
15:01:54 seems like something changed though
15:02:04 oh yes - I enabled the majority of tests
15:02:13 ah… cool
15:02:19 that makes total sense then
15:02:20 the last couple of days I've just disabled a few of the bigger culprits again
15:02:25 to get the pass rate back to sensible
15:02:37 where is the exclusion list again?
15:02:43 but it shows there are real races that we are hitting more than jenkins; whether they are generic or in xenapi is a different question
15:03:05 yeah, that is good to know
15:03:09 http://git.openstack.org/cgit/stackforge/xenapi-os-testing/tree/tempest_exclusion_list
15:03:46 tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario?
15:03:54 that feels like a bad one to skip...
15:04:04 That depended on my devstack change which was merged the other day
15:04:12 but until the pass rate is higher I don't want to enable it
15:04:21 it's been skipped since the CI was set up
15:04:22 ah, the cirros thing?
15:04:31 indeed
15:04:53 curious - are we hitting issues because of the cirros image?
15:05:06 no, because those 3 tests are still disabled
15:05:30 so the others are still hard coded to use the other image?
15:05:38 I guess your tempest config does that
15:05:42 huh?
15:06:04 well, how do you pick which image? before there was only one image, so we knew which it was picking
15:06:15 we always use VHD
15:06:35 yeah, I was more asking how? is that just tempest configuration?
15:06:38 but there were three tempest tests (including test_minimum_basic_scenario) that are hard-coded to use qcow2 and fall back on euc
15:06:52 ah yes - guess so - can't remember where now
15:07:23 just wondering if we should double check it doesn't just pick the "first image", that's all
15:07:34 it doesn't
15:07:36 because devstack will mean it uploads both images, right
15:07:41 ah OK
15:07:47 yes
15:07:54 just trying to rule things out
15:08:14 so chances of elastic-recheck getting deployed are tiny I suppose?
15:08:23 infinitesimally :D
15:08:37 we're looking at getting some time to move to the upstream CI
15:08:43 right, I should try to get some help on that front, if we get some effort there
15:08:44 at which point integrating E-R is much more practical
15:08:58 well it's for "free" I suspect
15:09:07 well, sort of
15:09:13 nothing is free :)
15:09:47 so how do I help with moving the tests upstream discussions?
15:09:54 where are they happening? infra meetings?
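[The image-selection question above comes down to tempest configuration rather than upload order: [compute] image_ref / image_ref_alt point at the images most tests boot (the VHDs devstack uploads), while the [scenario] img_file / ami_img_file options cover the qcow2/euc fallbacks used by test_minimum_basic_scenario and friends. A minimal sketch for double-checking which images a run is pointed at, assuming a devstack host with tempest under /opt/stack/tempest and the Juno-era option names:

    # Hypothetical path; adjust to the CI node's layout.
    TEMPEST_CONF=/opt/stack/tempest/etc/tempest.conf
    # Show the image-related options tempest will actually use.
    grep -E '^(image_ref|image_ref_alt|img_file|ami_img_file|aki_img_file|ari_img_file)' "$TEMPEST_CONF"
]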
15:10:30 at the moment those discussions aren't happening
15:10:44 OK
15:11:16 I am sure we want to help with that, and have open reqs for people to help, so fingers crossed there
15:11:48 I'm trying to get an intern to make some real progress on that too :)
15:12:34 anyway - enabling the majority of the tests has increased the average test length from just over an hour to 1.5-ish
15:12:40 OK, cool, that should plug a gap till we can hire someone
15:12:45 ah yes, interesting
15:12:57 time for another round of optimisations
15:13:17 I've also added some useful code to see what the recent failures were
15:14:16 OK, sounds good
15:14:30 feel like I should try and get someone to look into these failures
15:14:46 your status page is probably the best place to start I guess?
15:14:58 yes
15:15:03 have a look at...
15:15:13 http://f0ab4dc4366795c303a0-8fd069087bab3f263c7f9ddd524fce42.r22.cf1.rackcdn.com/ci_status/all_failures.txt
15:15:16 at the bottom
15:16:42 interesting, tests relating to notifications
15:16:47 thanks for those
15:17:04 that's a generic failure
15:17:07 a timeout waiting for it
15:17:29 yeah, maybe we never raised it
15:17:35 anyways, fun fun
15:17:52 and "no tempest failures detected", that is things other than the mirrors issue?
15:17:54 it's raised - been seen in the gate
15:17:57 yes
15:18:18 can you remember examples? or is it just general stuff we see in the gate?
15:18:52 https://bugs.launchpad.net/ceilometer/+bug/1336755
15:18:55 Launchpad bug 1336755 in ceilometer "telemetry_notification_api.py test times out waiting for Nova notification" [Critical,Triaged]
15:19:32 right, interesting
15:19:40 OK, thanks for the update
15:19:47 any more on this you want to cover?
15:19:53 no
15:20:02 just I don't have the bandwidth ATM to debug all of these issues
15:20:10 so to get the rate back up I'm disabling tests :)
15:20:58 OK, how do we get to fixing those though?
15:21:15 I do wonder about running two systems, one of them running all the tests
15:21:23 perhaps
15:21:31 but I'd rather see a stress on one specific test
15:21:42 getting parallel runs of the same test over and over
15:21:47 but I've not worked out how to do that
15:22:13 oh, that's a good idea
15:22:59 so, one more thought on this
15:23:16 are there any in this list that we expect would never work: http://git.openstack.org/cgit/stackforge/xenapi-os-testing/tree/tempest_exclusion_list
15:23:43 no
15:23:45 they should all work now
15:23:57 even the encrypted volumes stuff?
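[One way to get the "same test over and over" stress run discussed above, without standing up a second full system, is testrepository's until-failure mode; a rough sketch, assuming a devstack node with tempest checked out at /opt/stack/tempest and testr already initialised there:

    # Re-run a single suspect test repeatedly until it fails,
    # to try to catch the race outside a full tempest run.
    cd /opt/stack/tempest
    testr run --until-failure \
        tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario

testr run --parallel adds concurrency across different tests; truly parallel copies of the same test would still need a wrapper of some kind.]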
15:24:36 yes
15:24:46 OK, I will not ask how
15:24:55 https://review.openstack.org/#/c/107430/
15:25:10 dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/30/107430/1/17493/testr_results.html.gz
15:25:19 That's an example of all of the tests that we've had passing
15:25:25 nice
15:25:29 that's awesome
15:25:36 Oh - sorry - test_encrypted_cinder_volumes is one of the three that depends on euc
15:25:46 so I've not tested it since enabling the euc image
15:27:00 hmm, OK
15:27:31 anyways, cool progress here
15:27:48 backward progress in terms of the pass rate
15:27:50 sounds nuts, but it's nice to see these failures, it's kinda some of what I wanted to see
15:27:52 I should have tried more than 3 runs
15:27:57 hey ho
15:28:20 over the last 24 hours the most failed test is already excluded:
15:28:21 7 tempest.thirdparty.boto.test_s3_ec2_images.S3ImagesTest.test_register_get_deregister_ami_image
15:28:24 12 tempest.thirdparty.boto.test_s3_ec2_images.S3ImagesTest.test_register_get_deregister_aki_image
15:28:27 18 tempest.thirdparty.boto.test_s3_ec2_images.S3ImagesTest
15:28:29 24 No tempest failures detected
15:28:35 (that was from 'osci-view failures --recent 24')
15:29:11 right
15:29:30 top work, let's try to get some help deciphering some of the errors we are seeing
15:29:42 #help need to debug some of the errors in the XenServer CI
15:29:48 #topic Open Discussion
15:30:04 cool, so any other things people want to raise?
15:30:27 not today
15:30:35 OK, cool
15:30:39 thanks for the updates
15:30:47 catch you next … oh wait
15:30:58 next week we are at the mid-cycle, so maybe skip next week?
15:31:41 #info no XenAPI meeting next week, resuming the week after
15:31:45 #endmeeting
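[The "most failed test is already excluded" check above can be scripted; a rough sketch, assuming osci-view is on PATH, that cgit serves the raw exclusion list under a plain/ URL, and that the list is one test id per line with no blank or comment lines (all assumptions):

    # Fetch the current exclusion list and drop already-excluded tests
    # from the recent failure counts, leaving only new/unexpected failures.
    curl -s -o tempest_exclusion_list \
        http://git.openstack.org/cgit/stackforge/xenapi-os-testing/plain/tempest_exclusion_list
    osci-view failures --recent 24 | grep -v -F -f tempest_exclusion_list
]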