17:00:47 <davidkranz> #startmeeting qa
17:00:49 <openstack> Meeting started Thu Jan 17 17:00:47 2013 UTC. The chair is davidkranz. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:52 <openstack> The meeting name has been set to 'qa'
17:00:57 <mtreinish> hi
17:01:01 <mlavalle> hi
17:01:03 <ravikumar_hp> hi
17:01:05 <davidkranz> Hi there.
17:01:05 <donaldngo> hi
17:01:24 <davidkranz> Jay couldn't make it today.
17:01:50 <davidkranz> afazekas: Around?
17:02:19 <davidkranz> sdague: Here?
17:02:50 <mtreinish> davidkranz: sdague is out sick today. So, I don't think he'll make it.
17:03:05 <davidkranz> mtreinish: Ok, then let's start.
17:03:10 <davidkranz> #topic Reviews
17:03:34 <davidkranz> I don't think there is much to discuss there.
17:04:07 <davidkranz> Can anyone speak to the state of the quantum tests?
17:04:32 * afazekas is here
17:04:35 <mlavalle> davidkranz: yes
17:04:47 <mlavalle> I am developing the code for 2 BPs
17:04:50 <afrittoli> hi
17:04:54 <mlavalle> basic tests
17:04:59 <mlavalle> and advanced tests
17:05:38 <mlavalle> https://blueprints.launchpad.net/tempest/+spec/quantum-basic-api
17:06:01 <mlavalle> https://blueprints.launchpad.net/tempest/+spec/quantum-extended-api
17:06:12 <mlavalle> coding the first one
17:06:24 <mlavalle> zyluo is writing code for the second one
17:06:30 <davidkranz> mlavalle: I see those blueprints. What is the relation to https://review.openstack.org/#/c/19152/
17:06:36 <ravikumar_hp> mlavalle: will the basic tests be gated tests?
17:07:21 <mlavalle> davidkranz: that's a refactoring of a smoketest that mnewby implemented a month ago
17:08:01 <davidkranz> mlavalle: So your new tests are in addition to those.
17:08:17 <mlavalle> davidkranz: correct
17:08:54 <davidkranz> mlavalle: So it sounds like progress is being made. That's great.
17:09:15 <mlavalle> davidkranz: :-)
17:10:12 <davidkranz> OK, I think the next topic is progress on parallel execution.
17:10:20 <davidkranz> #topic Parallel execution
17:11:34 <afazekas> At the moment the resource reuse has more benefits
17:12:12 <davidkranz> afazekas: Resource reuse is certainly easier being more local.
17:12:56 <donaldngo> have we decided to use testr to run the current tempest tests?
17:13:22 <davidkranz> afazekas: But they are both important. The sheer volume of tests is increasing rapidly.
17:13:44 <davidkranz> donaldngo: That is a work in progress.
17:13:59 <afazekas> Probably we need to add more CPU to the gate VMs in order to see performance improvement from parallel testing
17:14:23 <afazekas> Looks like now, even the heavy load can cause flaky cases
17:15:05 <davidkranz> afazekas: That's probably true.
17:16:01 <davidkranz> afazekas: But still, a lot of time is spent waiting for state changes and that does not need more cpu to eliminate.
17:16:05 <afazekas> I will play with tmpfs as instance and glance image storage, probably it can mitigate the flaky issues, and might increase the performance
17:16:51 <afazekas> davidkranz: since tempest and n-cpu are running on the same cpu, we are affected by the cpu load
17:17:01 <davidkranz> afazekas: That would be great.
17:17:56 <davidkranz> afazekas: It may just be that there are simply too many processes running for a single-cpu instance.
17:18:32 <afazekas> yes
17:18:37 <davidkranz> It would be interesting to compare the performance right now with a 2-cpu instance.
17:18:52 <afazekas> we should test the impact of adding more cpu to qemu
17:19:01 <afazekas> yes
17:19:49 <davidkranz> #topic Open Discussion
17:19:59 <davidkranz> Does anyone have anything else to discuss?
17:20:03 <donaldngo> is the migration to testtools still a work in progress or are we sticking with nose?
17:21:26 <davidkranz> donaldngo: I think cyeoh is working on it based on the chatter in #openstack-qa
17:21:40 <afazekas> Why do we need to switch to testtools?
17:22:10 <donaldngo> yea I saw cyeoh's email, I wasn't sure if this was a proof of concept or a change in direction
17:22:46 <davidkranz> afazekas: Jay and Daryl both tried and failed to get around bugs in the nose multiprocessing plugin
17:23:12 <davidkranz> afazekas: There was no response from the nose people.
17:23:34 <davidkranz> The developer of testtools is part of the OpenStack community and was seen as an alternative.
17:23:38 <afazekas> davidkranz: can you send me links to these bugs?
17:24:02 <davidkranz> afazekas: You should ask jaypipes for the details
17:24:16 <davidkranz> Also, the ci team switched away from nose.
17:24:20 <jhenner> what about fixing the multiprocessing plugin? (I know it is a headache to use)
17:24:35 <davidkranz> But there has not been a decision that we should stop using nose.
17:24:50 <afazekas> IMHO our case is very special, so we might need to develop our own very dynamic test tools, with resource reuse capabilities
17:25:00 <davidkranz> jhenner: I wish jaypipes were here to comment.
17:25:04 <jaypipes> davidkranz: I spent a few hours last night, and have some promising code.
17:25:12 <jaypipes> davidkranz: on testtools + fixtures.
17:25:24 <jaypipes> davidkranz: problem is, it really does require quite a big rewrite.
17:25:41 <jaypipes> and I'm not sure how long it would take
17:25:51 <davidkranz> jaypipes: Cool. I think folks are still unsure that we should do such a rewrite and that we could salvage nose with less work.
17:26:37 <donaldngo> davidkranz++
17:26:40 <afazekas> Is testtools using the threading module for parallel execution?
17:26:40 <davidkranz> jaypipes: testtools is still work-in-progress and I'm not sure how we will decide in the end.
17:28:28 <afazekas> We need to consider IPC between test threads/processes, because of the resource sharing
17:29:35 <davidkranz> I think if someone can make nose do what we need, we would probably stick with it. But no one has.
17:29:52 <davidkranz> And those who have tried are pursuing the testtools approach at the moment.
17:31:58 <davidkranz> Any other comments or other topics to discuss?
17:33:21 <afazekas> Would be nice if we could identify flaky issues more easily.
17:34:13 <afazekas> Basically collecting reviews when a recheck/reverify fixes an issue.
17:34:49 <jhenner> do we know what we need?
17:35:14 <davidkranz> afazekas: There are also the hourly full tempest runs.
17:35:15 <jhenner> I mean, what are our requirements? Are they summarized somewhere?
17:35:21 <davidkranz> afazekas: They are still flaky.
17:35:46 <davidkranz> jhenner: You mean requirements about flakiness?
17:36:22 <afazekas> I want to find coincidences in the flaky cases' log files.
17:37:03 <mtreinish> davidkranz: I didn't think that the hourly has failed recently since I pushed the fix to that error state during build fix.
17:38:08 <davidkranz> mtreinish: It failed yesterday and the day before. Doesn't seem tempest-related.
17:38:40 <mtreinish> davidkranz: ok
17:40:28 <davidkranz> It would make sense to keep track of the failure rate.
17:40:45 <davidkranz> I will ask the ci folks about that.
17:40:48 <jhenner> davidkranz: I meant requirements for the test runner to run. I didn't finish my sentence in time.
17:41:36 <jhenner> I think failure rate can be watched by some Jenkins plugin. Let me check
17:41:36 <davidkranz> jhenner: Ah. I don't think there are any written down.
17:42:13 <davidkranz> jhenner: That would be great.
17:42:45 <jhenner> "Project health" can display number of failures per test in some interval you choose.
17:43:11 <davidkranz> jhenner: Is this something the ci folks have to install or whatever?
17:43:32 <afazekas> davidkranz: I think so
17:43:54 <davidkranz> jeblair: You there?
17:44:02 <jhenner> https://wiki.jenkins-ci.org/display/JENKINS/Project+Health+Report+Plugin
17:45:07 <davidkranz> That looks like what we need exactly.
17:46:43 <davidkranz> I'll ping the ci folks about that.
17:46:57 <jeblair> hi!
17:47:06 <jeblair> scrolling back
17:47:15 <davidkranz> jeblair: We were talking about the need to track flakiness.
17:47:33 <davidkranz> jeblair: And wondering if we could use that plugin.
17:47:48 <davidkranz> jeblair: Or some other idea you might have for that purpose.
17:48:51 <jeblair> davidkranz: we had to stop having jenkins parse test output, because of the impact that has on jenkins (it creates a synchronization point in test runs, and keeping a build history causes too much load)
17:49:03 <jeblair> davidkranz: so i don't think we can use that plugin
17:49:22 <davidkranz> jeblair: What about just tracking the %failure on a day by day basis.
17:49:49 <davidkranz> jeblair: I mean overall success so test output is not needed.
17:49:51 <jeblair> davidkranz: the concern here is which individual tests are failing, right?
17:50:05 <jhenner> jeblair: There is some clustering support for Jenkins. Do you know about that? Wouldn't it help?
17:50:20 <jeblair> davidkranz: (btw, you saw http://status.openstack.org/rechecks/ which is human crowdsourcing for overall job failing flakiness)
17:50:51 <davidkranz> jeblair: I did see that.
17:51:21 <davidkranz> jeblair: It's been my experience that folks ignore flakies unless the heat rises to a certain level
17:51:42 <davidkranz> jeblair: So I just thought it would be good to know when we were there in an obvious and objective way.
17:52:30 <davidkranz> Still trying to move toward a bigger gate but worried about flakiness.
17:52:45 <jeblair> davidkranz: so what's the granularity you want? at the jenkins job level, or individual test level?
17:53:29 <davidkranz> jeblair: Both really, but it seemed like you were saying individual test level was too expensive.
17:53:47 <jeblair> davidkranz: well, just that having _jenkins_ parse that is too expensive
17:54:09 <davidkranz> jeblair: I see. We could scrape log files instead.
17:54:32 <jeblair> davidkranz: also, there's a data-cleanliness issue, in that we run tests for proposed as well as merged changes, and also periodic; you're probably not so concerned with which tests are failing on proposed changes?
17:54:51 <davidkranz> jeblair: Right.
17:55:34 <davidkranz> jeblair: If we could start with job-level failure rate it would be helpful.
17:55:53 <davidkranz> jeblair: If load or other infrastructure issues are a problem it causes random tests to fail.
17:56:22 <jeblair> okay, i think the meeting's almost over, so let's brainstorm about that.
17:56:39 <davidkranz> jeblair: Thanks. That sounds good.
17:57:03 <davidkranz> Anything else before closing the meeting?
17:57:33 <davidkranz> OK, see you all next week.
17:57:37 * afazekas NO
17:58:03 <davidkranz> or on #openstack-qa
17:58:07 <davidkranz> #endmeeting