15:01:18 <johnthetubaguy> #startmeeting XenAPI
15:01:19 <openstack> Meeting started Wed Jun 11 15:01:18 2014 UTC and is due to finish in 60 minutes. The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:24 <openstack> The meeting name has been set to 'xenapi'
15:01:30 <johnthetubaguy> hello all
15:01:59 <johnthetubaguy> hello all
15:02:03 <johnthetubaguy> hows things going?
15:02:07 <BobBall> Fine fine
15:02:08 <BobBall> you?
15:02:12 <johnthetubaguy> what topics do we have for today?
15:02:16 <johnthetubaguy> BobBall: good thanks
15:02:41 <BobBall> our CI is the best again
15:03:04 <BobBall> http://www.rcbops.com/gerrit/reports/nova-cireport.html
15:03:07 <BobBall> the stats have been updated
15:03:36 <johnthetubaguy> #topic CI
15:03:48 * johnthetubaguy is looking at stats
15:04:05 <johnthetubaguy> hehe, so we are better than jenkins, except for the yellow bit
15:04:25 <johnthetubaguy> how do we get the yellow bit down?
15:04:29 <BobBall> we can't
15:04:41 <johnthetubaguy> why not? is that a time it takes to run thing?
15:04:43 <BobBall> unless we get more people in different timezones working on the CI in the same way as infra does
15:05:01 <BobBall> once a patch is missed, it's yellow forever - so we'd need <2 hour responses on _everything_
15:05:15 <johnthetubaguy> right, but how often does it screw up?
15:05:22 <johnthetubaguy> in the last 30 days
15:05:59 <BobBall> very very rarely
15:06:08 <BobBall> but if it happens and we miss even 1 then our yellow bar is bigger
15:06:41 <johnthetubaguy> right, thats fine, just curious how we improve that, people watching it doesn't feel like the correct answer
15:07:01 <BobBall> that's how infra fixes it :)
15:07:20 <BobBall> people go to #openstack-infra and shout until it's fixed
15:07:23 <BobBall> no one does that for xs CI
15:07:45 <johnthetubaguy> … we could monitor things, and make it fix itself a little bit
15:07:59 <johnthetubaguy> but anyway, maybe what we need is a better measure
15:08:16 <BobBall> maybe
15:08:21 <BobBall> more automated emails would be nice
15:08:29 <BobBall> but quite honestly I'm not fussed about the yellow
15:08:33 <johnthetubaguy> thats a monitoring issue on our side right?
15:08:42 <johnthetubaguy> where our = xenserver ci
15:08:58 <BobBall> Sure - or even better on the nova-cireport.html's side
15:09:09 <BobBall> "Hey - I think your CI is down because it hasn't voted on XYZ"
15:09:40 <johnthetubaguy> the reason I say this, is at the summit there was agreement to reduce the yellow bar, and no one complained
15:09:41 <BobBall> I think that ci report is going into infra at some point which makes it easier to add such an email
15:09:48 <BobBall> I complained
15:09:54 <johnthetubaguy> if we are not happy we need to complain and come up with a better idea
15:10:03 <johnthetubaguy> OK, so I was half asleep, what was the response?
15:10:03 <BobBall> I pointed out in the etherpad why it is not appropriate
15:10:12 <BobBall> I haven't followed it up yet
15:10:13 <johnthetubaguy> oh, so no one was reading that
15:10:16 <johnthetubaguy> oops
15:10:19 <BobBall> but neither has anyone else AFAIK
15:10:25 <BobBall> i.e. no formal proposal has been made
15:10:30 <BobBall> that I've seen anyone
15:10:32 <BobBall> anyway*
15:10:45 <johnthetubaguy> agreed, mostly as the gate is screwed right now
15:10:59 <BobBall> Fine - so if/when it's proposed I will definitely argue against it
15:11:12 <BobBall> https://etherpad.openstack.org/p/juno-nova-third-party-ci I think?
15:11:16 <BobBall> it's not loading for me
15:11:25 <johnthetubaguy> I was just trying to get a better idea as a rebuttal
15:11:33 <BobBall> Line 32
15:11:36 <BobBall> Everyone jumps when jenkins is down, but few people (other than those running the CI system) monitor 3rd party CIs with the same enthusiasm. If a 3rd party misses a patch (e.g. gerrit stream monitoring fails), then a new patch is submitted, the old missed patch is forever held as a miss by the stats. IOW I imagine Jenkins miss rate will always be lower than 3rd party miss rate.
15:11:46 <BobBall> My suggestion....
15:11:49 <BobBall> Missed split: No vote vs late vote
15:11:52 <BobBall> disagreements stats (how often does it disagree with jenkins) - perhaps say 'jenkins fail' is only if all tempest failed in Jenkins to avoid known gate instabilities? - why compare to Jenkins rather than some other known, desired state?
15:11:59 <BobBall> correlation % / overlay with infra-jenkins
15:12:10 <BobBall> Low disagreement stats would be the key metric IMO
15:12:19 <johnthetubaguy> maybe
15:12:20 <BobBall> No late votes would also be a key metric
15:12:40 <BobBall> No votes should be 'acceptable' in the case of CI downtime as long as the 'no votes' are not too high
15:12:46 <johnthetubaguy> I like the idea of % late and % missed being different
15:12:49 <BobBall> i.e. maybe 10% 'no votes' is acceptable for a 3rd party CI
15:12:56 <johnthetubaguy> yeah, that seems reasonable
15:13:15 <johnthetubaguy> disagreement is harder, we want them to find other bugs, which would be disagreement
15:13:17 <BobBall> but we did make it clear that reporting must be <2h so no 'late votes' are acceptable (although I also think that's too strict)
15:13:32 <johnthetubaguy> let me find the link
15:13:34 <BobBall> Sure - it would all need to be on a scale
15:13:43 <BobBall> i.e. if you have 10% disagreements then we're happy
15:13:50 <BobBall> but we'll assume that 90% of all jobs should agree
15:14:02 <BobBall> if there are _ANY_ jenkins fails that you pass, then that's a massive red flag
15:14:07 <johnthetubaguy> https://wiki.openstack.org/wiki/HypervisorSupportMatrix/DeprecationPlan
15:14:12 <johnthetubaguy> hmm, it says four hours
15:14:33 <BobBall> but I don't like forcing a CI to match specific arbitrary numbers... the numbers should just give the PTL a feel on what is acceptable or not
15:14:37 <johnthetubaguy> I think an average below two is probably kinder
15:14:50 <BobBall> unacceptable --> warning; no fix/plan --> booting
15:14:56 <johnthetubaguy> right, the idea here was, how do we give a clear bar, rather than a gut feeling
15:15:03 <BobBall> let's all be reasonable here - we're all human after all :D
15:15:18 <BobBall> Sure - but the bar can't be set so low that none of the non-infra tests can match it
15:15:30 <johnthetubaguy> agreed
15:15:43 <BobBall> perhaps another metric that'd be useful is "CHANGES missed" rather than patchsets
15:15:46 <johnthetubaguy> just don't want people feeling like, hey we don't like you, so we don't approve your CI
15:15:57 <BobBall> if you miss patch 4 and patch 5 comes along, then you test 5, 4 shouldn't be a "miss"
15:15:59 <johnthetubaguy> ah, that is an interesting idea
15:16:18 <BobBall> because there is no point going back and testing 4, and the CI is back up and running testing 5...
15:16:30 <BobBall> missed vs late etc
15:16:42 <johnthetubaguy> I think looking at the average reporting time is fine here, thinking about this more
15:17:11 <johnthetubaguy> anyways...
15:17:16 <BobBall> yes
15:17:18 <johnthetubaguy> digging out of that rat hole
15:17:27 <BobBall> rabbit hole. Definitely bigger than a rat hole.
15:17:32 <johnthetubaguy> but I think we understand what we want
15:17:33 <johnthetubaguy> lol
15:17:41 <johnthetubaguy> what else did you want to cover
15:17:55 <BobBall> uhhh... not sure
15:17:56 <BobBall> oh yeah
15:17:57 <johnthetubaguy> I am getting back to setting up a parallel setup to get out some funky stuff
15:18:01 <BobBall> there's a rubbish bug
15:18:24 <BobBall> if you give devstack/d-g a repo (i.e. review.openstack.org repo) then it'll checkout from there
15:18:31 <BobBall> which is correct - right?
15:18:43 <BobBall> BUT in some cases you want to merge, a-la-Zuul
15:18:56 <BobBall> (all cases are safer with merging of course)
15:18:57 <johnthetubaguy> oh, this rings a bell
15:19:08 <BobBall> So there was a change this last week that failed until it was rebased
15:19:09 <johnthetubaguy> I remember turbo hipster guys talking about this one
15:19:11 <BobBall> now it passes
15:19:38 <BobBall> I'm still pushing to try and move more of the CI to an -infra base
15:19:45 <BobBall> which will let it use zuul
15:19:50 <johnthetubaguy> right
15:20:02 <johnthetubaguy> about cirros, did you move to the new image?
15:20:03 <BobBall> but I guess a short term fix might be to add some more hacky flags in D-G to merge rather than checkout
15:20:13 <BobBall> not this week, no
15:20:25 <johnthetubaguy> OK, so no more tests enabled at this point?
15:20:30 <BobBall> correct
15:20:45 <johnthetubaguy> no worries, just checking
15:21:08 <johnthetubaguy> having another meeting this week about getting us more help for this CI
15:21:25 <BobBall> well I'm not sure what the focus would be ATM
15:21:30 <johnthetubaguy> so there is a little bit of progress
15:21:30 <BobBall> apart from adding the quark stuff I guess
15:21:48 <BobBall> (or replacing nova-net with neutron+quark)
15:21:49 <johnthetubaguy> yeah, adding quark, adding more tests
15:22:04 <johnthetubaguy> maybe adding cloudcafe
15:22:41 <johnthetubaguy> but anyways, thats part of the discussion
15:22:43 <johnthetubaguy> I guess
15:22:56 <johnthetubaguy> also, just help with the 24-7 maintenance thing
15:23:03 <BobBall> yeah
15:23:13 <johnthetubaguy> some US people would spread the curve a little further
15:23:20 <johnthetubaguy> and into peak patch creation times
15:23:38 <johnthetubaguy> cool, so we are done for CI I guess?
15:23:53 <BobBall> indeed... but there is a learning curve which might be too long given that we're not having many issues at all ATM
15:24:00 <BobBall> Done indeed
15:24:07 <BobBall> I need to update the nodepool patches with more docs
15:24:14 <BobBall> hoping to do that tomorrow I think
15:24:24 <johnthetubaguy> agreed, but its probably needed, the other thing, is moving to zuul via turbohipster people
15:24:31 <johnthetubaguy> cool
15:24:37 <BobBall> That's a long way off
15:24:38 <johnthetubaguy> #topic Open Discussion
15:24:43 <johnthetubaguy> any thing more?
15:24:50 <BobBall> we need all of the upstreaming done first - which is the start of those nodepool changes ;)
15:25:03 <BobBall> Yeah... I keep meaning to test...
15:25:07 <BobBall> is HVM working?
15:25:13 <johnthetubaguy> BobBall: well, they can run modified branches of some stuff
15:25:16 <johnthetubaguy> HVM working?
15:25:20 <johnthetubaguy> what do you mean?
15:25:28 <BobBall> There was a suggestion on some list somewhere where tempest only worked with PV guests and not HVM
15:25:38 <BobBall> oh, no, think it was on IRC
15:25:55 <BobBall> probably worth switching to just run a full tempest on HVM at some point to prove it does
15:25:56 <johnthetubaguy> oh, no idea, I suspect they just set the image up wrongly
15:25:59 <BobBall> and/or run some specific tests
15:26:09 <BobBall> well does Cirros support running HVM?
15:26:18 <johnthetubaguy> oh, so volume attach will need PV tools right?
15:26:27 <johnthetubaguy> or something like that
15:26:58 <BobBall> right - does cirros include PV tools for that? or would it run fully HVM?
15:27:02 <johnthetubaguy> oh, I doubt cirros is the correct choice for HVM tests
15:27:06 * BobBall doesn't know...
15:27:08 <BobBall> ah ok
15:27:19 <BobBall> maybe that's the answer then
15:27:30 <johnthetubaguy> some of our PVHM images are fairly small
15:27:37 <johnthetubaguy> they would probably do the trick
15:28:03 <johnthetubaguy> (if we turn caching on)
15:28:07 <BobBall> ok great
15:28:33 <johnthetubaguy> it certainly works in production, but its a good point, better image coverage would help
15:28:40 <johnthetubaguy> like testing windows and linux
15:28:51 <johnthetubaguy> oh wait, that will fail, but whatever
15:28:58 <johnthetubaguy> nested HVM, not so good
15:29:11 <johnthetubaguy> anyways, we are all done I guess?
15:29:37 <BobBall> yeah, think so
15:29:44 <johnthetubaguy> cool, thanks BobBall
15:29:53 <johnthetubaguy> catch you next week I guess
15:30:16 <johnthetubaguy> probably earlier on IRC with this nodepool stuff :)
15:30:20 <johnthetubaguy> #endmeeting
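
The "no vote vs late vote" split and the "CHANGES missed rather than patchsets" idea from the CI topic (15:11 to 15:16) could be computed roughly as below. This is a minimal sketch, not how nova-cireport.html works: the `Patchset` record, its field names, the sample data, and the two-hour cutoff (borrowed from the "<2h" figure mentioned in the meeting) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: a vote counts as "late" after the <2h window mentioned in the meeting.
LATE_AFTER_HOURS = 2.0


@dataclass
class Patchset:
    change: str                        # hypothetical change identifier
    number: int                        # patchset number within the change
    vote_delay_hours: Optional[float]  # None means the CI never voted


def classify(ps: Patchset) -> str:
    """Split misses into 'no_vote' and 'late' instead of one 'missed' bucket."""
    if ps.vote_delay_hours is None:
        return "no_vote"
    if ps.vote_delay_hours > LATE_AFTER_HOURS:
        return "late"
    return "on_time"


def patchset_stats(patchsets: List[Patchset]) -> Dict[str, int]:
    """Count every patchset by its classification."""
    counts = {"on_time": 0, "late": 0, "no_vote": 0}
    for ps in patchsets:
        counts[classify(ps)] += 1
    return counts


def missed_changes(patchsets: List[Patchset]) -> List[str]:
    """A change is missed only if its newest patchset never got a vote."""
    latest: Dict[str, Patchset] = {}
    for ps in patchsets:
        current = latest.get(ps.change)
        if current is None or ps.number > current.number:
            latest[ps.change] = ps
    return sorted(c for c, ps in latest.items() if ps.vote_delay_hours is None)


# Hypothetical data: change A missed patchset 4 but voted on 5,
# change B voted late, change C was never voted on.
SAMPLE = [
    Patchset("A", 4, None),
    Patchset("A", 5, 0.5),
    Patchset("B", 1, 3.0),
    Patchset("C", 1, None),
]

print(patchset_stats(SAMPLE))
print(missed_changes(SAMPLE))
```

Counted per patchset, the sample has two "no vote" entries, but counted per change only C is a miss, since the CI recovered in time to test the newer patchset of A.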