19:01:51 <mtaylor> #startmeeting
19:01:52 <openstack> Meeting started Tue Dec  6 19:01:51 2011 UTC.  The chair is mtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:53 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic.
19:02:12 <heckj> \o
19:04:30 <mtaylor> ok. so, I've got a list of things to share with folks - but I think jeblair and ttx had something we were chatting about earlier that they said was a good CI meeting topic...
19:05:25 <mtaylor> I suppose we can come to that later. for now
19:05:27 <lloydde> for the new guy here (me!) the wiki overview on the CI team is?
19:05:49 <mtaylor> lloydde: hi! and we should probably have one of those
19:05:56 <anotherjesse> lloydde: there is ci.openstack.org for some of the information
19:05:56 <lloydde> hehe
19:06:07 <lloydde> thnx anotherjesse!
19:06:21 <mtaylor> lloydde: the CI team essentially runs jenkins, gerrit and the rest of the dev infrastructure
19:06:27 <lloydde> roger
19:06:30 <mtaylor> lloydde: so the folks who are only noticed when things are annoying :)
19:06:42 <mtaylor> #topic CI todo list
19:07:04 <ArseneRei> lloydde: May or may not be helpful. http://wiki.openstack.org/GerritJenkinsGithub
19:07:19 <lloydde> noted
19:07:23 <mtaylor> I keep mentioning our todo list, and I keep promising to make blueprints and bugs out of it
19:07:46 <mtaylor> I still haven't gotten that done yet, so I just pastebinned the todo list so I could share it at the moment
19:07:55 <mtaylor> #link http://paste.ubuntu.com/761935/
19:08:18 <mtaylor> #action mtaylor turn the todo list into proper blueprints by 12/12
19:09:32 <mtaylor> there are essentially three sections to the list - the first section is the general pile of random stuff that needs to be done at some point
19:10:00 <mtaylor> the second section, which is marked Jim, is mainly tasks focused on the integration testing
19:10:14 <mtaylor> and the third section is plugins and bugfixes on the docket for jenkins itself
19:10:34 <mtaylor> which I'm working on hiring someone to do (but which have no current resource assigned to them)
19:10:57 <anotherjesse> some of them go in openstack-common instead of openstack-ci right?
19:11:00 <anotherjesse> client split for instance
19:11:07 <mtaylor> yes
19:11:33 <mtaylor> that's mainly listed in the CI todo list because the way some of the jenkins jobs are structured is either waiting on that or needs to be changed once it's done
19:12:11 <mtaylor> glance client being a good example of that - pip-requires glance is much uglier than the future pip-requires -e git:python-glanceclient
19:13:06 <anotherjesse> mtaylor: I'll help write up the cli separation
19:13:14 <mtaylor> anotherjesse: awesome
19:13:22 <anotherjesse> since I can't help with the ci stuff as much
19:14:57 <mtaylor> anotherjesse: I was chatting with a guy at hp who (similar to you) has his own jenkins up and doing tasks ... and we were talking about getting someone to write a jenkins plugin that would allow us to sensibly check the jenkins configs in to git and actually, you know, version control them
19:15:20 <mtaylor> I think that would be killer helpful in getting more of us better able to work together on that aspect of things
19:15:30 <anotherjesse> mtaylor: currently devstack has a build_jenkins.sh that builds 2 devstack-based jenkins jobs on oneiric
19:16:59 <mtaylor> anotherjesse: that uses the jenkins api, yeah? we've got an outstanding bug in jenkins which prevents us from using that on the public jenkins
19:17:23 <mtaylor> anotherjesse: but I'm working on getting that filed with kohsuke so that it'll be fixed
19:17:38 <anotherjesse> nice
19:18:17 <mtaylor> also - I want to convince ppb of something, but this might be a good place for an initial discussion
19:18:52 <mtaylor> which is that gerrit has built in support for ensuring that devs have signed the CLA
19:19:18 <mtaylor> seems to me with all of our automated things, having reviewers required to go look at a wiki page to see if the person submitting the patch has signed the CLA is a failure point
19:19:41 <mtaylor> when all of the machinery is there to remove the need for reviewers to care about it
19:19:59 <mtaylor> anybody have any good reasons in their head why we _shouldn't_ do that?
19:21:31 <mtaylor> awesome
19:21:47 <mtaylor> #agreed gerrit should handle/enforce CLA signing so devs don't have to
19:21:51 <mtaylor> that was easy
19:21:52 <anotherjesse> I thought we were doing that ...
19:21:56 <mtaylor> nope
19:21:57 <anotherjesse> so I agree that we should
19:22:10 <mtaylor> yeah - the "check the wiki" step is pretty fail
19:22:30 <mtaylor> #topic translations import
19:22:51 <mtaylor> we set up the jobs yesterday to handle the translations import from the launchpad translations into a git commit
19:23:08 <mtaylor> there are two different ways to handle the final step:
19:23:22 <mtaylor> one is to have jenkins submit the translations to gerrit for review as a normal change
19:23:41 <mtaylor> (I lied, there are three ways)
19:23:49 <mtaylor> the second is to give jenkins the permissions to allow it to push the translations changes directly
19:24:14 <mtaylor> the third is to have it submit a change and then write a server-side hook for gerrit which will notice a translations change and automatically merge it
19:24:34 <mtaylor> the third way seems the nicest, because it means we don't have to give all of the jenkins jobs push access to trunk
19:24:51 <mtaylor> but it will mean a couple of extraneous emails on each translations import
19:25:23 <mtaylor> any thoughts/opinions?
19:26:04 <jeblair> the second is the simplest of course
19:26:35 <mtaylor> jaypipes: you here yet? I'm vamping a little bit because I want you in the next topic I want to talk about
19:26:37 <jeblair> but it grants jenkins a lot of permissions
19:26:42 <ArseneRei> I'm exactly sure what the translations are, but the third seems best, despite extraneous e-mails.
19:26:46 <ArseneRei> I'm not*
19:27:38 <jeblair> yep, the only downsides to 3 are extra emails for watchers of projects, and a little extra complexity.
19:28:02 <ArseneRei> Right, but I don't see a good reason to put that work in the hands of Jenkins, other than the simplicity.
19:28:28 <mtaylor> ArseneRei: translations of text strings into other languages - we use the launchpad translations feature to have people do the translating work itself, and it provides files with the translated text
19:29:05 <jeblair> we feel we don't need further code review of these changes (which option 1 would provide) because they are reviewed inside of launchpad
19:29:15 <ArseneRei> mtaylor: Ah, okay. Thanks. Then yes, #3. Jenkins shouldn't be involved on that level, in my opinion.
19:29:32 <ArseneRei> Yeah, that sounds reasonable to me.
19:30:10 <mtaylor> ArseneRei: well, jenkins will still be involved somewhat, as it will be the one fetching the translations from launchpad and making a git commit from them ... but i do believe that having the acceptance of that commit codified inside of gerrit is a nicer place for it
19:30:39 <ArseneRei> mtaylor: Sure, but that's why I said at that level. :)
19:30:47 <mtaylor> excellent
19:31:56 <mtaylor> #agreed have jenkins propose a translations change to gerrit and then have a gerrit hook that notices a translation change from jenkins that only contains changes to po/ and automatically accepts it
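A minimal sketch of the check agreed above, assuming the hook is handed the uploader's account name and the list of paths a change touches. The function name, the "jenkins" account name, and the way those inputs reach the hook are illustrative assumptions; this is not Gerrit's hook API, only the acceptance rule itself.

```python
from typing import Iterable

# Assumption: the account Jenkins uses to upload changes is called "jenkins".
TRANSLATION_UPLOADER = "jenkins"
# From the discussion: translations are expected to live under po/.
TRANSLATION_PREFIX = "po/"


def is_auto_mergeable_translation(uploader: str, changed_paths: Iterable[str]) -> bool:
    """Return True only for a Jenkins-proposed change that touches po/ and nothing else."""
    paths = list(changed_paths)
    # Reject changes from anyone but Jenkins, and changes with no files at all.
    if uploader != TRANSLATION_UPLOADER or not paths:
        return False
    # Every touched path must sit under the translations directory.
    return all(path.startswith(TRANSLATION_PREFIX) for path in paths)
```

Under this rule a change that mixes po/ files with any other path falls back to normal review, which keeps Jenkins from gaining an indirect way to push arbitrary code.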
19:32:29 <jaypipes> mtaylor: here now.
19:32:31 <mtaylor> are we keeping translations in consistent places across projects? I think they're in the po/ subdir everywhere that we use them, yeah?
19:32:38 <mtaylor> jaypipes: excellent
19:32:52 <mtaylor> #topic Integration testing and gating
19:33:26 <mtaylor> so, jeblair is about to send out an email with a longer-form version of this, but I wanted to loop folks here in to both status and thinking
19:33:49 <mtaylor> we've got devstack-based integration testing working from jenkins, which is great
19:34:39 <mtaylor> amongst the features it has are spinning up fresh clean cloud servers for each test, running devstack in them, and then on failures putting the machine to the side and installing the ssh key of the dev who broke it so that they can debug the problem
19:35:15 <jeblair> mtaylor: do we have authorization to do that yet?
19:35:19 <jaypipes> mtaylor: excellent. what are you using for integration tests, though? because I have yet to be able to run tempest properly...
19:35:39 <mtaylor> no. we do not have authorization to hand the vms to devs yet
19:35:54 <jeblair> so that feature will be disabled until we do.  :(
19:35:55 <heckj> mtaylor: in same boat - haven't got tempest all sorted yet myself
19:36:00 <mtaylor> but we've asked and hopefully should have an answer soon
19:36:10 <anotherjesse> mtaylor: eventually it would be nice to have a flow to inject others' keys
19:36:13 <mtaylor> jeblair: we're still running exercise.sh at the moment, yeah?
19:36:21 <mtaylor> anotherjesse: ++
19:36:21 <anotherjesse> since the person who can fix might not be the person who runs
19:36:24 <jeblair> yes, exercise.sh
19:36:31 <mtaylor> anotherjesse: good feature request
19:36:39 * mtaylor adds to list
19:36:46 <jeblair> can we elaborate on that?
19:37:04 <jeblair> as it stands, it adds the keys of the dev who proposed the change
19:37:18 <jeblair> and of course, that person can add whoever else's key he or she wants
19:37:28 <jaypipes> heckj: I'm *almost* there, though... been fixing a lot of bugs as I go along.
19:37:29 <jeblair> how would people like to change that?
19:37:45 <anotherjesse> jeblair: maybe anyone in core can inject a key?
19:38:36 * jaypipes can't get exercise.sh to succeed... fails all tests :( https://bugs.launchpad.net/devstack/+bug/898843
19:38:37 <uvirtbot> Launchpad bug 898843 in devstack "exercise.sh fails all three exercises" [Undecided,New]
19:38:54 <anotherjesse> currently trunk is broken btw
19:39:03 <mtaylor> you know - what might be interesting (and might have to wait until we're running on something with glance) ... would be a button or something to say "hey, I'd also like a copy of that machine over here"
19:39:06 <anotherjesse> about 7 hours ago - so having this in place would be muy bueno
19:40:02 <mtaylor> so that brings us to the next point ... which is gating trunk with this
19:40:18 <anotherjesse> jaypipes: if you have time we can help figure out the fix for that later
19:40:20 <anotherjesse> in #dev
19:40:24 <jaypipes> anotherjesse: coolio :)
19:40:29 <jaypipes> anotherjesse: cheers
19:40:43 <jaypipes> anotherjesse: prolly something silly.. just don't know enough to diagnose
19:41:17 <mtaylor> gating trunk on the integration tests, as it currently stands, may just about kill most of the devs, because of the potential complexity of submitting coordinated changes
19:41:25 <mtaylor> so we have a couple of thoughts on dealing with that
19:42:25 <mtaylor> the first step - and the easiest (it'll get us at least _something_) - is to turn on integration tests post merge and then have those actually send notification somewhere that people will notice it
19:42:28 <anotherjesse> mtaylor: the coordinated changes is the complexity :(
19:42:45 <anotherjesse> mtaylor: ya - knowing when it breaks is half the battle
19:42:51 <jeblair> also, there are a number of technical issues that need to be a lot more solid before we gate trunk on such a complicated test.  currently devstack and the launch scripts are highly susceptible to network and transient errors; that needs to be a lot more solid so we don't have false negatives
19:42:53 <mtaylor> this may be a combination of IRC messages and emails to the QA team, but ensuring actual information is there is at least a start
19:43:21 <mtaylor> yes. also what jeblair said - there are some external depends that we don't have control over such as PyPI and github
19:43:34 <jeblair> (gating stable on those will help us flush those stability problems out)
19:43:45 <ArseneRei> What is gating? Is it freezing a merge until some set of tests pass?
19:44:03 <heckj> I've been seeing a huge number of intermittent errors with PyPI over the past few days - I'd be very loath to gate trunk on it.
19:44:06 <mtaylor> ArseneRei: yes. we currently do not allow code to hit trunk unless it passes unittests
19:44:13 <mtaylor> heckj: yeah - same with github
19:44:15 <ArseneRei> mtaylor: Thanks.
19:44:19 <mtaylor> heckj: so we need to solve those
19:44:25 <mtaylor> additional things we can do to improve on the trunk situation include:
19:45:06 <mtaylor> adding a feature similar to what dprince did to run tests on code before it's approved/reviewed, but without jenkins voting on the change, so that reviewers can assess the failures
19:45:37 <mtaylor> and then developing tooling around submitting coordinated changes that would potentially ease the burden of needing to change nova and devstack at the same time
19:45:54 <_0x44> mtaylor: There was a verbal vote at the essex summit to gate on an integration test that was supported by vish for nova.
19:45:56 <jeblair> that's on the list under "- Pre-review testing" which is chronologically after "- New Gerrit Trigger Plugin" because that's a dependency
19:45:56 <bengrue> "gating trunk on the integration tests, as it currently stands, may just about kill most of the devs, because of the potential complexity of submitting coordinated changes"
19:45:58 <mtaylor> (there are client-side tools we'd need, as well as a change to gerrit or the gerrit trigger plugin to make that work)
19:46:01 <bengrue> may?
19:46:04 <_0x44> mtaylor: I'm not sure why this needs to come up again.
19:46:17 <_0x44> mtaylor: If there's a single integration test, gate on it.
19:46:22 <mtaylor> _0x44: yes. we're talking about the actual implementation of it, not whether or not we agree it should be done
19:46:26 <dprince> mtaylor: we are currently running everything pre-commit to trunk:
19:46:28 <dprince> http://reviewday.ohthree.com/
19:46:49 <_0x44> mtaylor: It's already been done by Ozone...
19:46:57 <mtaylor> _0x44: it's that the integration tests depend on multiple things, not just nova, and the coordination of those things becomes tricky
19:47:00 <dprince> I know there is a bit of disagreement on the use of SmokeStack. But it's what I've got.
19:48:06 <_0x44> mtaylor: If you don't gate it won't be less tricky.
19:48:25 <_0x44> mtaylor: It will just force that trickiness onto the user/downstream distributor
19:48:52 <ArseneRei> mtaylor: On the dprince approach and the intermittent issues: perhaps it could be modified to vote once we know the intermittent issues are absent.
19:49:03 <mtaylor> _0x44: I think what we're talking about is a phased implementation
19:49:09 <dprince> Also. I aim to update that report so that installer (ubuntu) or config management changes (chef) that team titan makes those changes ahead of time.
19:49:40 <dprince> reviewday and smokestack essentially are trying to visualize the "fullstack".
19:49:41 <_0x44> mtaylor: I don't think that's the right way to do it.
19:49:51 <mtaylor> _0x44: based on concerns from vish, actually, on the process of submitting changes sensibly
19:50:20 <mtaylor> _0x44: and the concern about a high level of false negatives at the moment that have nothing to do with the dev's code
19:50:28 <_0x44> mtaylor: The phased implementation that was chosen at essex was to gate on a single integration test.
19:50:42 <mtaylor> but instead on the flakiness of github and pypi
19:51:04 <mtaylor> _0x44: which is what we have at the moment for stable/diablo
19:51:14 <_0x44> mtaylor: Fix those as they come up, don't try to prematurely decide what false-negatives will happen and throw out the value of integration testing because you want to fix occasional false negatives.
19:51:29 <mtaylor> _0x44: we're not prematurely deciding
19:51:41 <mtaylor> we've been running this for weeks and looking at the actual problems
19:51:54 <jeblair> https://jenkins.openstack.org/job/dev-gate-integration-tests-devstack-vm/
19:52:16 <jeblair> those failures are all false negatives
19:52:40 <jeblair> and that job can't even run right now because of a problem with the cloud provider we're using
19:52:57 <jeblair> i don't think it's fair to stop all forward progress on the project until we can handle problems like that
19:53:00 <_0x44> jeblair: There are only five false negatives there.
19:53:21 <mtaylor> so what we're saying is, we need to sort those out, but in the meantime we'd like to at least run this stuff post-merge so that we're not waiting for nirvana before getting feedback
19:53:47 <jeblair> and there are only 30 tests there, that's a 16% false negative rate
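The rate quoted above follows directly from the two counts mentioned in the discussion (5 false negatives out of 30 runs). A small sketch of the arithmetic, with a hypothetical helper name; 5/30 is about 16.7%, which the discussion rounds down to 16%.

```python
def false_negative_rate(false_negatives: int, total_runs: int) -> float:
    """Fraction of runs that failed for reasons unrelated to the change under test."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return false_negatives / total_runs


# Counts taken from the discussion: 5 false negatives in 30 runs.
rate = false_negative_rate(5, 30)
print(f"{rate:.1%}")  # prints 16.7%
```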
19:55:06 <jeblair> it is also worth noting that at this point, i don't believe any integration test can involve less than nova, glance, and keystone
19:55:20 <mtaylor> and devstack
19:55:24 <mtaylor> and python-novaclient
19:55:27 <_0x44> jeblair: That's the point of an integration test...
19:55:50 <jeblair> so this test (will) actually gate(s) all of nova, glance, keystone, devstack, python-novaclient, and openstack-ci (which has the scripts that launch the test)
19:56:32 <jeblair> _0x44: i know, i mean only to give an indication of what's meant by "coordinated change"
19:57:00 <jeblair> (oh, and tempest, the test framework itself, once that's ready)
19:58:04 <mtaylor> jeblair: I wonder, since dprince is already doing pre-review testing on his systems ... any reason to not have reviewday register code review votes on gerrit? at least for coordinated information purposes
19:58:19 <_0x44> jeblair: Only fixes that depend on new features in a specific project need to have coordinated changes. Otherwise you test against the last known good.
19:58:23 <mtaylor> I mean, anybody can register a non-binding review on something
19:58:36 <mtaylor> _0x44: yes. it turns out that's what we're doing
19:58:39 <jmckenty> mtaylor: that seems like a really good idea
19:58:55 <jmckenty> re: having reviewday post reviews to gerrit
19:59:12 <jeblair> mtaylor: i don't believe anything should be prohibiting that
19:59:23 <mtaylor> dprince: ^^^
19:59:44 <mtaylor> dprince: any interest on your side in doing that? or any way I can help you do that?
20:00:07 <ttx> \o
20:00:08 <_0x44> mtaylor: If you're already doing that, then the fact that patches need to land in a specific order in order for tests to pass isn't really a concern for CI. It's a concern for the upstream projects.
20:00:46 <mtaylor> _0x44: yes. I think the larger concern is coordinating changes between devstack and project config changes. landing patches in API/feature order is less problematic
20:00:47 <jeblair> if strict ordering is the approach projects would like to take, then that certainly makes it easier to implement on the CI side
20:01:13 <jeblair> but vishy seemed concerned with that approach the last time we spoke
20:01:19 <_0x44> mtaylor: That still isn't a CI problem, that's a project management/PTL problem.
20:01:29 <jeblair> mtaylor: i'm not sure how devstack is special in that regard
20:01:37 <jmckenty> sorry to interrupt, but are we going to use this channel for PPB?
20:01:39 <jmckenty> it's noon
20:01:45 <jmckenty> er... GMT somethingorother
20:01:48 <mtaylor> it is time to wrap up
20:01:53 <_0x44> Anyway, I'll stfu now.
20:01:54 <mtaylor> let me just say...
20:01:56 <jeblair> mtaylor: during a development cycle, how nova and devstack interact isn't that different than nova and glance
20:02:07 <jeblair> as far as how you would coordinate those changes
20:02:21 <zns> here. PPB?
20:02:24 <mtaylor> _0x44: well, it's not about stfu - I think we might have missed each other on scope of what I'm talking about here ...
20:02:33 <jmckenty> zns, just waiting for the channel
20:02:38 <_0x44> mtaylor: I was meaning for the ppb
20:02:49 <vishy> jmckenty: is the ppb happening?
20:02:51 <mtaylor> _0x44: heh
20:02:54 <mtaylor> #endmeeting