15:01:18 <johnthetubaguy> #startmeeting XenAPI
15:01:19 <openstack> Meeting started Wed Jun 11 15:01:18 2014 UTC and is due to finish in 60 minutes.  The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:24 <openstack> The meeting name has been set to 'xenapi'
15:01:30 <johnthetubaguy> hello all
15:01:59 <johnthetubaguy> hello all
15:02:03 <johnthetubaguy> hows things going?
15:02:07 <BobBall> Fine fine
15:02:08 <BobBall> you?
15:02:12 <johnthetubaguy> what topics do we have for today?
15:02:16 <johnthetubaguy> BobBall: good thanks
15:02:41 <BobBall> our CI is the best again
15:03:04 <BobBall> http://www.rcbops.com/gerrit/reports/nova-cireport.html
15:03:07 <BobBall> the stats have been updated
15:03:36 <johnthetubaguy> #topic CI
15:03:48 * johnthetubaguy is looking at stats
15:04:05 <johnthetubaguy> hehe, so we are better than jenkins, except for the yellow bit
15:04:25 <johnthetubaguy> how do we get the yellow bit down?
15:04:29 <BobBall> we can't
15:04:41 <johnthetubaguy> why not? is that a time it takes to run thing?
15:04:43 <BobBall> unless we get more people in different timezones working on the CI in the same way as infra does
15:05:01 <BobBall> once a patch is missed, it's yellow forever - so we'd need <2 hour responses on _everything_
15:05:15 <johnthetubaguy> right, but how often does it screw up?
15:05:22 <johnthetubaguy> in the last 30 days
15:05:59 <BobBall> very very rarely
15:06:08 <BobBall> but if it happens and we miss even 1 then our yellow bar is bigger
15:06:41 <johnthetubaguy> right, thats fine, just curious how we improve that, people watching it doesn't feel like the correct answer
15:07:01 <BobBall> that's how infra fixes it :)
15:07:20 <BobBall> people go to #openstack-infra and shout until it's fixed
15:07:23 <BobBall> no one does that for xs CI
15:07:45 <johnthetubaguy> … we could monitor things, and make it fix itself a little bit
15:07:59 <johnthetubaguy> but anyway, maybe what we need is a better measure
15:08:16 <BobBall> maybe
15:08:21 <BobBall> more automated emails would be nice
15:08:29 <BobBall> but quite honestly I'm not fussed about the yellow
15:08:33 <johnthetubaguy> thats a monitoring issue on our side right?
15:08:42 <johnthetubaguy> where our = xenserver ci
15:08:58 <BobBall> Sure - or even better on the nova-cireport.html's side
15:09:09 <BobBall> "Hey - I think your CI is down because it hasn't voted on XYZ"
15:09:40 <johnthetubaguy> the reason I say this, is at the summit there was agreement to reduce the yellow bar, and no one complained
15:09:41 <BobBall> I think that ci report is going into infra at some point which makes it easier to add such an email
15:09:48 <BobBall> I complained
15:09:54 <johnthetubaguy> if we are not happy we need to complain and come up with a better idea
15:10:03 <johnthetubaguy> OK, so I was half asleep, what was the response?
15:10:03 <BobBall> I pointed out in the etherpad why it is not appropriate
15:10:12 <BobBall> I haven't followed it up yet
15:10:13 <johnthetubaguy> oh, so no one was reading that
15:10:16 <johnthetubaguy> oops
15:10:19 <BobBall> but neither has anyone else AFAIK
15:10:25 <BobBall> i.e. no formal proposal has been made
15:10:30 <BobBall> that I've seen anyone
15:10:32 <BobBall> anyway*
15:10:45 <johnthetubaguy> agreed, mostly as the gate is screwed right now
15:10:59 <BobBall> Fine - so if/when it's proposed I will definitely argue against it
15:11:12 <BobBall> https://etherpad.openstack.org/p/juno-nova-third-party-ci I think?
15:11:16 <BobBall> it's not loading for me
15:11:25 <johnthetubaguy> I was just trying to get a better idea as a rebuttal
15:11:33 <BobBall> Line 32
15:11:36 <BobBall> Everyone jumps when jenkins is down, but few people (other than those running the CI system) monitor 3rd party CIs with the same enthusiasm.  If a 3rd party misses a patch (e.g. gerrit stream monitoring fails), then a new patch is submitted, the old missed patch is forever held as a miss by the stats.  IOW I imagine Jenkins miss rate will always be lower than 3rd party miss rate.
15:11:46 <BobBall> My suggestion....
15:11:49 <BobBall> Missed split: No vote vs late vote
15:11:52 <BobBall> disagreements stats (how often does it disagree with jenkins) - perhaps say 'jenkins fail' is only if all tempest failed in Jenkins to avoid known gate instabilities? - why compare to Jenkins rather than some other known, desired state?
15:11:59 <BobBall> correlation % / overlay with infra-jenkins
15:12:10 <BobBall> Low disagreement stats would be the key metric IMO
15:12:19 <johnthetubaguy> maybe
15:12:20 <BobBall> No late votes would also be a key metric
15:12:40 <BobBall> No votes should be 'acceptable' in the case of CI downtime as long as the 'no votes' are not too high
15:12:46 <johnthetubaguy> I like the idea of % late and % missed being different
15:12:49 <BobBall> i.e. maybe 10% 'no votes' is acceptable for a 3rd party CI
15:12:56 <johnthetubaguy> yeah, that seems reasonable
15:13:15 <johnthetubaguy> disagreement is harder, we want them to find other bugs, which would be disagreement
15:13:17 <BobBall> but we did make it clear that reporting must be <2h so no 'late votes' are acceptable (although I also think that's too strict)
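A minimal sketch of the "no vote vs late vote" split suggested above. The 2-hour deadline comes from the discussion; the function names and the input shape (pairs of upload time and vote time) are assumptions made for illustration, not part of any existing CI report.

```python
from datetime import datetime, timedelta

# Assumed from the "<2h" reporting requirement mentioned in the meeting.
DEADLINE = timedelta(hours=2)


def classify_vote(uploaded_at, voted_at, deadline=DEADLINE):
    """Classify one patchset as 'on_time', 'late' or 'no_vote'."""
    if voted_at is None:
        return "no_vote"
    if voted_at - uploaded_at > deadline:
        return "late"
    return "on_time"


def vote_breakdown(patchsets, deadline=DEADLINE):
    """Return the percentage of patchsets in each category.

    patchsets: iterable of (uploaded_at, voted_at) pairs, where voted_at
    is None if the CI never voted (hypothetical input shape).
    """
    counts = {"on_time": 0, "late": 0, "no_vote": 0}
    for uploaded_at, voted_at in patchsets:
        counts[classify_vote(uploaded_at, voted_at, deadline)] += 1
    total = sum(counts.values())
    return {
        name: (100.0 * count / total if total else 0.0)
        for name, count in counts.items()
    }
```

Reporting the two percentages separately lets a reviewer apply different tolerances, for example accepting some 'no votes' during downtime while treating late votes more strictly.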
15:13:32 <johnthetubaguy> let me find the link
15:13:34 <BobBall> Sure - it would all need to be on a scale
15:13:43 <BobBall> i.e. if you have 10% disagreements then we're happy
15:13:50 <BobBall> but we'll assume that 90% of all jobs should agree
15:14:02 <BobBall> if there are _ANY_ jenkins fails that you pass, then that's a massive red flag
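The disagreement metric described above could be sketched roughly as follows. The input shape (one pair of pass/fail booleans per job) and the function name are assumptions for illustration; the "red flag" case is the one called out in the discussion, where Jenkins fails but the third-party CI passes.

```python
def disagreement_stats(results):
    """Summarise how often a third-party CI disagrees with Jenkins.

    results: iterable of (jenkins_passed, third_party_passed) booleans,
    one pair per job (hypothetical input shape).
    """
    results = list(results)
    total = len(results)
    # Any difference in outcome counts as a disagreement.
    disagreements = sum(1 for jenkins, third in results if jenkins != third)
    # Jenkins failed but the third-party CI passed: flagged separately.
    red_flags = sum(1 for jenkins, third in results if not jenkins and third)
    rate = 100.0 * disagreements / total if total else 0.0
    return {"disagreement_pct": rate, "red_flags": red_flags}
```

Keeping the red-flag count separate from the overall percentage reflects the point that some disagreement is expected (the CI may find hypervisor-specific bugs), while passing a job that Jenkins fails is the suspicious direction.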
15:14:07 <johnthetubaguy> https://wiki.openstack.org/wiki/HypervisorSupportMatrix/DeprecationPlan
15:14:12 <johnthetubaguy> hmm, it says four hours
15:14:33 <BobBall> but I don't like forcing a CI to match specific arbitrary numbers... the numbers should just give the PTL a feel on what is acceptable or not
15:14:37 <johnthetubaguy> I think an average below two is probably kinder
15:14:50 <BobBall> unacceptable --> warning; no fix/plan --> booting
15:14:56 <johnthetubaguy> right, the idea here was, how do we give a clear bar, rather than a gut feeling
15:15:03 <BobBall> let's all be reasonable here - we're all human after all :D
15:15:18 <BobBall> Sure - but the bar can't be set so low that none of the non-infra tests can match it
15:15:30 <johnthetubaguy> agreed
15:15:43 <BobBall> perhaps another metric that'd be useful is "CHANGES missed" rather than patchsets
15:15:46 <johnthetubaguy> just don't want people feeling like, hey we don't like you, so we don't approve your CI
15:15:57 <BobBall> if you miss patch 4 and patch 5 comes along, then you test 5, 4 shouldn't be a "miss"
15:15:59 <johnthetubaguy> ah, that is an interesting idea
15:16:18 <BobBall> because there is no point going back and testing 4, and the CI is back up and running testing 5...
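A small sketch of the "changes missed" idea, as opposed to counting every missed patchset. It assumes a hypothetical mapping from change ID to a dict of patchset number and whether the CI voted on it; under that assumption a change only counts as missed if its latest patchset has no vote.

```python
def missed_patchsets(votes_by_change):
    """Count every patchset without a vote (the per-patchset view)."""
    return sum(
        1
        for patchsets in votes_by_change.values()
        for voted in patchsets.values()
        if not voted
    )


def missed_changes(votes_by_change):
    """Return IDs of changes whose latest patchset has no vote.

    votes_by_change: {change_id: {patchset_number: voted_bool}}
    (hypothetical input shape).
    """
    missed = []
    for change_id, patchsets in votes_by_change.items():
        if not patchsets:
            continue
        latest = max(patchsets)  # highest patchset number
        if not patchsets[latest]:
            missed.append(change_id)
    return sorted(missed)
```

With this definition, missing patch 4 and then testing patch 5 counts as one missed patchset but zero missed changes.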
15:16:30 <BobBall> missed vs late etc
15:16:42 <johnthetubaguy> I think looking at the average reporting time is fine here, thinking about this more
15:17:11 <johnthetubaguy> anyways...
15:17:16 <BobBall> yes
15:17:18 <johnthetubaguy> digging out of that rat hole
15:17:27 <BobBall> rabbit hole.  Definitely bigger than a rat hole.
15:17:32 <johnthetubaguy> but I think we understand what we want
15:17:33 <johnthetubaguy> lol
15:17:41 <johnthetubaguy> what else did you want to cover
15:17:55 <BobBall> uhhh... not sure
15:17:56 <BobBall> oh yeah
15:17:57 <johnthetubaguy> I am getting back to setting up a parallel setup to get out some funky stuff
15:18:01 <BobBall> there's a rubbish bug
15:18:24 <BobBall> if you give devstack/d-g a repo (i.e. review.openstack.org repo) then it'll checkout from there
15:18:31 <BobBall> which is correct - right?
15:18:43 <BobBall> BUT in some cases you want to merge, a-la-Zuul
15:18:56 <BobBall> (all cases are safer with merging of course)
15:18:57 <johnthetubaguy> oh, this rings a bell
15:19:08 <BobBall> So there was a change this last week that failed until it was rebased
15:19:09 <johnthetubaguy> I remember turbo hipster guys talking about this one
15:19:11 <BobBall> now it passes
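The checkout-versus-merge difference described above can be sketched as the git commands each mode would run. This is an illustration only: the function, its parameters, and the example URL and ref are hypothetical, and it is not how devstack or devstack-gate actually implement this. The merge mode also assumes the local target branch is already at the tip of the upstream branch.

```python
def git_commands(repo_url, change_ref, target_branch, merge=False):
    """Return the git command lines for testing a change.

    merge=False: test the change exactly as uploaded (plain checkout).
    merge=True:  merge the change onto the target branch first, similar
                 in spirit to what Zuul does before testing.
    """
    fetch = ["git", "fetch", repo_url, change_ref]
    if not merge:
        return [fetch, ["git", "checkout", "FETCH_HEAD"]]
    return [
        # Assumes target_branch is already up to date locally.
        ["git", "checkout", target_branch],
        fetch,
        ["git", "merge", "FETCH_HEAD"],
    ]


# Hypothetical example values, not taken from the meeting.
for cmd in git_commands(
    "https://review.example.org/openstack/nova",
    "refs/changes/34/1234/5",
    "master",
    merge=True,
):
    print(" ".join(cmd))
```

A change that only passes after a rebase is the kind of case where the two modes give different results: the plain checkout tests the change against its original parent, while the merge tests it against the current branch tip.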
15:19:38 <BobBall> I'm still pushing to try and move more of the CI to an -infra base
15:19:45 <BobBall> which will let it use zuul
15:19:50 <johnthetubaguy> right
15:20:02 <johnthetubaguy> about cirros, did you move to the new image?
15:20:03 <BobBall> but I guess a short term fix might be to add some more hacky flags in D-G to merge rather than checkout
15:20:13 <BobBall> not this week, no
15:20:25 <johnthetubaguy> OK, so no more tests enabled at this point?
15:20:30 <BobBall> correct
15:20:45 <johnthetubaguy> no worries, just checking
15:21:08 <johnthetubaguy> having another meeting this week about getting us more help for this CI
15:21:25 <BobBall> well I'm not sure what the focus would be ATM
15:21:30 <johnthetubaguy> so there is a little bit of progress
15:21:30 <BobBall> apart from adding the quark stuff I guess
15:21:48 <BobBall> (or replacing nova-net with neutron+quark)
15:21:49 <johnthetubaguy> yeah, adding quark, adding more tests
15:22:04 <johnthetubaguy> maybe adding cloudcafe
15:22:41 <johnthetubaguy> but anyways, thats part of the discussion
15:22:43 <johnthetubaguy> I guess
15:22:56 <johnthetubaguy> also, just help with the 24-7 maintenance thing
15:23:03 <BobBall> yeah
15:23:13 <johnthetubaguy> some US people would spread the curve a little further
15:23:20 <johnthetubaguy> and into peak patch creation times
15:23:38 <johnthetubaguy> cool, so we are done for CI I guess?
15:23:53 <BobBall> indeed... but there is a learning curve which might be too long given that we're not having many issues at all ATM
15:24:00 <BobBall> Done indeed
15:24:07 <BobBall> I need to update the nodepool patches with more docs
15:24:14 <BobBall> hoping to do that tomorrow I think
15:24:24 <johnthetubaguy> agreed, but its probably needed, the other thing, is moving to zuul via turbohipster people
15:24:31 <johnthetubaguy> cool
15:24:37 <BobBall> That's a long way off
15:24:38 <johnthetubaguy> #topic Open Discussion
15:24:43 <johnthetubaguy> anything more?
15:24:50 <BobBall> we need all of the upstreaming done first - which is the start of those nodepool changes ;)
15:25:03 <BobBall> Yeah... I keep meaning to test...
15:25:07 <BobBall> is HVM working?
15:25:13 <johnthetubaguy> BobBall: well, they can run modified branches of some stuff
15:25:16 <johnthetubaguy> HVM working?
15:25:20 <johnthetubaguy> what do you mean?
15:25:28 <BobBall> There was a suggestion on some list somewhere where tempest only worked with PV guests and not HVM
15:25:38 <BobBall> oh, no, think it was on IRC
15:25:55 <BobBall> probably worth switching to just run a full tempest on HVM at some point to prove it does
15:25:56 <johnthetubaguy> oh, no idea, I suspect they just set the image up wrongly
15:25:59 <BobBall> and/or run some specific tests
15:26:09 <BobBall> well does Cirros support running HVM?
15:26:18 <johnthetubaguy> oh, so volume attach will need PV tools right?
15:26:27 <johnthetubaguy> or something like that
15:26:58 <BobBall> right - does cirros include PV tools for that? or would it run fully HVM?
15:27:02 <johnthetubaguy> oh, I doubt cirros is the correct choice for HVM tests
15:27:06 * BobBall doesn't know...
15:27:08 <BobBall> ah ok
15:27:19 <BobBall> maybe that's the answer then
15:27:30 <johnthetubaguy> some of our PVHVM images are fairly small
15:27:37 <johnthetubaguy> they would probably do the trick
15:28:03 <johnthetubaguy> (if we turn caching on)
15:28:07 <BobBall> ok great
15:28:33 <johnthetubaguy> it certainly works in production, but its a good point, better image coverage would help
15:28:40 <johnthetubaguy> like testing windows and linux
15:28:51 <johnthetubaguy> oh wait, that will fail, but whatever
15:28:58 <johnthetubaguy> nested HVM, not so good
15:29:11 <johnthetubaguy> anyways, we are all done I guess?
15:29:37 <BobBall> yeah, think so
15:29:44 <johnthetubaguy> cool, thanks BobBall
15:29:53 <johnthetubaguy> catch you next week I guess
15:30:16 <johnthetubaguy> probably earlier on IRC with this nodepool stuff :)
15:30:20 <johnthetubaguy> #endmeeting