19:01:51 #startmeeting
19:01:52 Meeting started Tue Dec 6 19:01:51 2011 UTC. The chair is mtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:53 Useful Commands: #action #agreed #help #info #idea #link #topic.
19:02:12 \o
19:04:30 ok. so, I've got a list of things to share with folks - but I think jeblair and ttx had something we were chatting about earlier that they said was a good CI meeting topic...
19:05:25 I suppose we can come to that later. for now
19:05:27 for the new guy here (me!) the wiki overview on the CI team is?
19:05:49 lloydde: hi! and we should probably have one of those
19:05:56 lloydde: there is ci.openstack.org for some of the information
19:05:56 hehe
19:06:07 thnx anotherjesse!
19:06:21 lloydde: the CI team essentially runs jenkins, gerrit and the rest of the dev infrastructure
19:06:27 roger
19:06:30 lloydde: so the folks who are only noticed when things are annoying :)
19:06:42 #topic CI todo list
19:07:04 lloydde: May or may not be helpful. http://wiki.openstack.org/GerritJenkinsGithub
19:07:19 noted
19:07:23 I keep mentioning our todo list, and I keep promising to make blueprints and bugs out of it
19:07:46 I still haven't gotten that done yet, so I just pastebinned the todo list so I could share it at the moment
19:07:55 #link http://paste.ubuntu.com/761935/
19:08:18 #action mtaylor turn the todo list into proper blueprints by 12/12
19:09:32 there are essentially three sections to the list - the first section is the general pile of random stuff that needs done at some point
19:10:00 the second section, which is marked Jim, are mainly tasks focused on the integration testing
19:10:14 and the third section are plugins and bugfixes on the docket for jenkins itself
19:10:34 which I'm working on hiring someone to do (but which had no current resource assigned to them)
19:10:57 some of them go in openstack-common instead of openstack-ci right?
19:11:00 client split for instance
19:11:07 yes
19:11:33 that's mainly listed in the CI todo list because the way some of the jenkins jobs are structured is either waiting on that or needs to be changed once it's done
19:12:11 glance client being a good example of that - pip-requires glance is much uglier than the future pip-requires -e git:python-glanceclient
19:13:06 mtaylor: I'll help write up the cli separation
19:13:14 anotherjesse: awesome
19:13:22 since I can't help with the ci stuff as much
19:14:57 anotherjesse: I was chatting with a guy at hp who (similar to you) has his own jenkins up and doing tasks ... and we were talking about getting someone to write a jenkins plugin that would allow us to sensibly check the jenkins configs in to git and actually, you know, version control them
19:15:20 I think that would be killer helpful in getting more of us better able to work together on that aspect of things
19:15:30 mtaylor: currently devstack has a build_jenkins.sh that builds 2 devstack-based jenkins jobs on oneiric
19:16:59 anotherjesse: that uses the jenkins api, yeah? we've got an outstanding bug in jenkins which prevents us from using that on the public jenkins
19:17:23 anotherjesse: but I'm working on getting that filed with kohsuke so that it'll be fixed
19:17:38 nice
19:18:17 also - I want to convince ppb of something, but this might be a good place for an initial discussion
19:18:52 which is that gerrit has built in support for ensuring that devs have signed the CLA
19:19:18 seems to me with all of our automated things, having reviewers required to go look at a wiki page to see if the person submitting the patch has signed the CLA is a failure point
19:19:41 when all of the machinery is there to remove the need for reviewers to care about it
19:19:59 anybody have any good reasons in their head why we _shouldn't_ do that?
19:21:31 awesome
19:21:47 #agreed gerrit should handle/enforce CLA signing so devs don't have to
19:21:51 that was easy
19:21:52 I thought we were doing that ...
19:21:56 nope
19:21:57 so I agree that we should
19:22:10 yeah - the "check the wiki" step is pretty fail
19:22:30 #topic translations import
19:22:51 we set up the jobs yesterday to handle the translations import from the launchpad translations into a git commit
19:23:08 there are two different ways to handle the final step:
19:23:22 one is to have jenkins submit the translations to gerrit for review as a normal change
19:23:41 (I lied, there are three ways)
19:23:49 the second is to give jenkins the permissions to allow it to push the translations changes directly
19:24:14 the third is to have it submit a change and then write a server-side hook for gerrit which will notice a translations change and automatically merge it
19:24:34 the third way seems the nicest, because it means we don't have to give all of the jenkins jobs push access to trunk
19:24:51 but it will mean a couple of extraneous emails on each translations import
19:25:23 any thoughts/opinions?
19:26:04 the second is the simplest of course
19:26:35 jaypipes: you here yet? I'm vamping a little bit because I want you in the next topic I want to talk about
19:26:37 but it grants jenkins a lot of permissions
19:26:42 I'm exactly sure what the translations are, but the third seems best, despite extraneous e-mails.
19:26:46 I'm not*
19:27:38 yep, the only downsides to 3 are extra emails for watchers of projects, and a little extra complexity.
19:28:02 Right, but I don't see a good reason to put that work in the hands of Jenkins, other than the simplicity.
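[Editor's note] The third option hinges on the hook being able to tell a translations import apart from an ordinary change. The following is only a rough sketch of that decision, not the hook that was actually written. It assumes translation files live under a po/ directory and that the importing account is named "jenkins"; the function name and both constants are hypothetical, and the sketch does not call any Gerrit API.

```python
import posixpath
from typing import Iterable

# Assumptions for illustration only: where translations live and
# which account uploads the import.
TRANSLATIONS_DIR = "po/"
BOT_ACCOUNT = "jenkins"


def is_translations_only(changed_paths: Iterable[str], uploader: str) -> bool:
    """Return True when a change is safe to auto-accept under option 3.

    A change qualifies only if it was uploaded by the bot account,
    touches at least one file, and every touched file sits under
    the translations directory.
    """
    paths = [posixpath.normpath(p) for p in changed_paths]
    if uploader != BOT_ACCOUNT or not paths:
        return False
    # normpath collapses "po/../setup.py" to "setup.py", so a path
    # cannot escape the directory by using "..".
    return all(p.startswith(TRANSLATIONS_DIR) for p in paths)


if __name__ == "__main__":
    print(is_translations_only(["po/de.po", "po/fr.po"], "jenkins"))
    print(is_translations_only(["po/de.po", "nova/api.py"], "jenkins"))
```

Checking both the uploader and the paths keeps the auto-accept narrow: a developer's change that happens to touch only translation files would still go through normal review.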
19:28:28 ArseneRei: translations of text strings into other languages - we use the launchpad translations feature to have people do the translating work itself, and it provides files with the translated text
19:29:05 we feel we don't need further code review of these changes (which option 1 would provide) because they are reviewed inside of launchpad
19:29:15 mtaylor: Ah, okay. Thanks. Then yes, #3. Jenkins shouldn't be involved on that level, in my opinion.
19:29:32 Yeah, that sounds reasonable to me.
19:30:10 ArseneRei: well, jenkins will still be involved somewhat, as it will be the one fetching the translations from launchpad and making a git commit from them ... but i do believe that having the acceptance of that commit codified inside of gerrit is a nicer place for it
19:30:39 mtaylor: Sure, but that's why I said at that level. :)
19:30:47 excellent
19:31:56 #agreed have jenkins propose a translations change to gerrit and then have a gerrit hook that notices a translation change from jenkins that only contains changes to po/ and automatically accepts it
19:32:29 mtaylor: here now.
19:32:31 are we keeping translations in consistent places across projects? I think they're in the po/ subdir everywhere that we use them, yeah?
19:32:38 jaypipes: excellent
19:32:52 #topic Integration testing and gating
19:33:26 so, jeblair is about to send out an email with a longer-form version of this, but I wanted to loop folks here in to both status and thinking
19:33:49 we've got devstack-based integration testing working from jenkins, which is great
19:34:39 amongst the features it has are spinning up fresh clean cloud servers for each test, running devstack in them, and then on failures putting the machine to the side and installing the ssh key of the dev who broke it so that they can debug the problem
19:35:15 mtaylor: do we have authorization to do that yet?
19:35:19 mtaylor: excellent. what are you using for integration tests, though? because I have yet to be able to run tempest properly...
19:35:39 no. we do not have authorization to hand the vms to devs yet
19:35:54 so that feature will be disabled until we do. :(
19:35:55 mtaylor: in same boat - haven't got tempest all sorted yet myself
19:36:00 but we've asked and hopefully should have an answer soon
19:36:10 mtaylor: eventually it would be nice to have a flow to inject others' keys
19:36:13 jeblair: we're still running exercise.sh at the moment, yeah?
19:36:21 anotherjesse: ++
19:36:21 since the person who can fix might not be the person who runs
19:36:24 yes, exercise.sh
19:36:31 anotherjesse: good feature request
19:36:39 * mtaylor adds to list
19:36:46 can we elaborate on that?
19:37:04 as it stands, it adds the keys of the dev who proposed the change
19:37:18 and of course, that person can add whoever else's key he or she wants
19:37:28 heckj: I'm *almost* there, though... been fixing a lot of bugs as I go along.
19:37:29 how would people like to change that?
19:37:45 jeblair: maybe anyone in core can inject a key?
19:38:36 * jaypipes can't get exercise.sh to succeed... fails all tests :( https://bugs.launchpad.net/devstack/+bug/898843
19:38:37 Launchpad bug 898843 in devstack "exercise.sh fails all three exercises" [Undecided,New]
19:38:54 currently trunk is broken btw
19:39:03 you know - what might be interesting (and might have to wait until we're running on something with glance) ... would be a button or something to say "hey, I'd also like a copy of that machine over here"
19:39:06 about 7 hours ago - so having this in place would be muy bueno
19:40:02 so that brings us to the next point ... which is gating trunk with this
19:40:18 jaypipes: if you have time we can help figure out the fix for that later
19:40:20 in #dev
19:40:24 anotherjesse: coolio :)
19:40:29 anotherjesse: cheers
19:40:43 anotherjesse: prolly something silly.. just don't know enough to diagnose
19:41:17 gating trunk on the integration tests, it currently stands, may just about kill most of the devs, because of the potential complexity of submitting coordinated changes
19:41:25 so we have a couple of thoughts on dealing with that
19:42:25 the first step - and the easiest (it'll get us at least _something_) - is to turn on integration tests post merge and then have those actually send notification somewhere that people will notice it
19:42:28 mtaylor: the coordinated changes is the complexity :(
19:42:45 mtaylor: ya - knowing when it breaks is half the battle
19:42:51 also, there are a number of technical issues that need to be a lot more solid before we gate trunk on such a complicated test. currently devstack and the launch scripts are highly susceptible to network and transient errors, that needs to be a lot more solid so we don't have false negatives
19:42:53 this may be a combination of IRC messages and emails to the QA team, but ensuring actual information is there is at least a start
19:43:21 yes. also what jeblair said - there are some external depends that we don't have control over such as PyPI and github
19:43:34 (gating stable on those will help us flush those stability problems out)
19:43:45 What is gating? Is it freezing a merge until some set of tests pass?
19:44:03 I've been seeing a huge number of intermittent errors with PyPI over the past few days - I'd be very loath to gate trunk on it.
19:44:06 ArseneRei: yes. we currently do not allow code to hit trunk unless it passes unittests
19:44:13 heckj: yeah - same with github
19:44:15 mtaylor: Thanks.
19:44:19 heckj: so we need to solve those
19:44:25 additional things we can do to improve on the trunk situation include:
19:45:06 adding a feature similar to what dprince did to run tests on code before it's approved/reviewed, but without jenkins voting on the change, so that reviewers can assess the failures
19:45:37 and then developing tooling around submitting coordinated changes that would potentially ease the burden of needing to change nova and devstack at the same time
19:45:54 <_0x44> mtaylor: There was a verbal vote at the essex summit to gate on an integration test that was supported by vish for nova.
19:45:56 that's on the list under "- Pre-review testing" which is chronologically after "- New Gerrit Trigger Plugin" because that's a dependency
19:45:56 "gating trunk on the integration tests, it currently stands, may just about kill most of the devs, because of the potential complexity of submitting coordinated changes"
19:45:58 (there are client-side tools we'd need, as well as a change to gerrit or the gerrit trigger plugin to make that work)
19:46:01 may?
19:46:04 <_0x44> mtaylor: I'm not sure why this needs to come up again.
19:46:17 <_0x44> mtaylor: If there's a single integration test, gate on it.
19:46:22 _0x44: yes. we're talking about the actual implementation of it, not whether or not we agree it should be done
19:46:26 mtaylor: we are currently running everything pre-commit to trunk:
19:46:28 http://reviewday.ohthree.com/
19:46:49 <_0x44> mtaylor: It's already been done by Ozone...
19:46:57 _0x44: it's that the integration tests depend on multiple things, not just nova, and the coordination of those things become tricky
19:47:00 I know there is a bit of disagreement on the use of SmokeStack. But it's what I've got.
19:48:06 <_0x44> mtaylor: If you don't gate it won't be less tricky.
19:48:25 <_0x44> mtaylor: It will just force that trickiness onto the user/downstream distributor
19:48:52 mtaylor: On the dprince and intermittent issue, it can be modified perhaps with voting when knowing if the intermittent issues are absent.
19:49:03 _0x44: I think what we're talking about is a phased implementation
19:49:09 Also. I aim to update that report so that installer (ubuntu) or config management changes (chef) that team titan makes those changes ahead of time.
19:49:40 reviewday and smokestack essentially are trying to visualize the "fullstack".
19:49:41 <_0x44> mtaylor: I don't think that's the right way to do it.
19:49:51 _0x44: based on concerns from vish, actually, on the process of submitting changes sensibly
19:50:20 _0x44: and the concern about a high level of false negatives at the moment that have nothing to do with the devs code
19:50:28 <_0x44> mtaylor: The phased implementation that was chosen at essex was to gate on a single integration test.
19:50:42 but instead on the flakiness of github and pypi
19:51:04 _0x44: which is what we have at the moment for stable/diablo
19:51:14 <_0x44> mtaylor: Fix those as they come up, don't try to prematurely decide what false-negatives will happen and throw out the value of integration testing because you want to fix occasional false negatives.
19:51:29 _0x44: we're not prematurely deciding
19:51:41 we've been running this for weeks and looking at the actual problems
19:51:54 https://jenkins.openstack.org/job/dev-gate-integration-tests-devstack-vm/
19:52:16 those failures are all false negatives
19:52:40 and that job can't even run right now because of a problem with the cloud provider we're using
19:52:57 i don't think it's fair to stop all forward progress on the project until we can handle problems like that
19:53:00 <_0x44> jeblair: There are only five false negatives there.
19:53:21 so what we're saying is, we need to sort those out, but in the mean time we'd like to at least run this stuff post-merge so that we're not waiting for nirvana before getting feedback
19:53:47 and there are only 30 tests there, that's a 16% false negative rate
19:55:06 it is also worth noting that at this point, i don't believe any integration test can involve less than nova, glance, and keystone
19:55:20 and devstack
19:55:24 and python-novaclient
19:55:27 <_0x44> jeblair: That's the point of an integration test...
19:55:50 so this test (will) actually gate(s) all of nova, glance, keystone, devstack, python-novaclient, and openstack-ci (which has the scripts that launch the test)
19:56:32 _0x44: i know, i mean only to give an indication of what's meant by "coordinated change"
19:57:00 (oh, and tempest, the test framework itself, once that's ready)
19:58:04 jeblair: I wonder, since dprince is already doing pre-review testing on his systems ... any reason to not have reviewday register code review votes on gerrit? at least for coordinated information purposes
19:58:19 <_0x44> jeblair: Only fixes that depend on new features in a specific project need to have coordinated changes. Otherwise you test against the last known good.
19:58:23 I mean, anybody can register a non-binding review on something
19:58:36 _0x44: yes. it turns out that's what we're doing
19:58:39 mtaylor: that seems like a really good idea
19:58:55 re: having reviewday post reviews to gerrit
19:59:12 mtaylor: i don't believe anything should be prohibiting that
19:59:23 dprince: ^^^
19:59:44 dprince: any interest on your side in doing that? or any way I can help you do that?
20:00:07 \o
20:00:08 <_0x44> mtaylor: If you're already doing that, then the fact that patches need to land in a specific order in order for tests to pass isn't really a concern for CI. It's a concern for the upstream projects.
20:00:46 _0x44: yes. I think the larger concern is coordinating changes between devstack and project config changes. landing patches in API/feature order is less problematic
20:00:47 if strict ordering is the approach projects would like to take, then that certainly makes it easier to implement on the CI side
20:01:13 but vishy seemed concerned with that approach the last time we spoke
20:01:19 <_0x44> mtaylor: That still isn't a CI problem, that's a project management/PTL problem.
20:01:29 mtaylor: i'm not sure how devstack is special in that regard
20:01:37 sorry to interrupt, but are we going to use this channel for PPB?
20:01:39 it's noon
20:01:45 er... GMT somethingorother
20:01:48 it is time to wrap up
20:01:53 <_0x44> Anyway, I'll stfu now.
20:01:54 let me just say...
20:01:56 mtaylor: during a development cycle, how nova and devstack interact isn't that different than nova and glance
20:02:07 as far as how you would coordinate those changes
20:02:21 here. PPB?
20:02:24 _0x44: well, it's not about stfu - I think we might have missed each other on scope of what I'm talking about here ...
20:02:33 zns, just waiting for the channel
20:02:38 <_0x44> mtaylor: I was meaning for the ppb
20:02:49 jmckenty: is the ppb happening?
20:02:51 _0x44: heh
20:02:54 #endmeeting
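[Editor's note] The false-negative figure quoted at 19:53:47 can be checked with a few lines of arithmetic. This is a small sketch that takes the two numbers from the log at face value (five false negatives out of 30 runs, which the log calls "tests"); the function name is made up for illustration.

```python
def false_negative_rate(false_negatives: int, total_runs: int) -> float:
    """Fraction of runs that failed for reasons unrelated to the change."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return false_negatives / total_runs


if __name__ == "__main__":
    rate = false_negative_rate(5, 30)
    # 5 / 30 is one sixth; the log rounds this down to "16%".
    print(f"{rate:.1%}")  # 16.7%
```

At that rate roughly one gated change in six would be rejected through no fault of its author, which is the concern raised against gating trunk before the flakiness is fixed.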