17:00:25 <jaypipes> #startmeeting
17:00:26 <openstack> Meeting started Thu Jun 21 17:00:25 2012 UTC. The chair is jaypipes. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:31 <JoseSwiftQE> hi!
17:00:35 <rohitk> Hello!
17:01:10 <jaypipes> davidkranz_: around?
17:01:25 <davidkranz_> Here now.
17:02:05 <jaypipes> heya
17:02:08 <davidkranz> jaypipes: You saw that Daryl can't make it, right?
17:02:13 <jaypipes> yeah
17:02:45 <JoseSwiftQE> We're pretty much all bowing out this week, meeting conflicts and the like.
17:02:56 <jaypipes> so I'm getting increasingly frustrated with running tempest with multiple parallel processes...
17:03:11 <davidkranz> jaypipes: Anything I could help with?
17:03:16 <jaypipes> trying to diagnose Nova timeouts is getting very annoying.
17:03:28 <jaypipes> davidkranz: I'm not sure :(
17:04:21 <jaypipes> davidkranz: the problem is that when I run the tests without --processes, it takes a while but eventually completes. If I run with --processes=8 (my box is a 12-core machine), after a while I start seeing RPC timeouts in networking. and then shit starts snowballing after that.
17:05:15 <davidkranz> jaypipes: Hmm.
17:05:27 <jaypipes> davidkranz: and it's not an issue with quotas, because I've made the base compute test class create its own tenant/user for its testing.
17:05:39 <jaypipes> davidkranz: I'm wondering if Nova just can't keep up with it.
17:05:57 <davidkranz> jaypipes: That's what I was thinking. Are you using a single devstack node?
17:05:58 <jaypipes> davidkranz: so I will reset my env (yet again) and try with processes=2 instead, and see if things work better.
17:06:17 <jaypipes> davidkranz: yup, but frankly, the box has 12 cores and 24G of memory... should NOT be an issue
17:06:24 <davidkranz> jaypipes: We need to separate nova stress issues from tempest parallelization issues.
17:07:02 <jaypipes> davidkranz: I'm actually not trying to stress Nova! :) Just trying to run the tempest test suite in a shorter amount of time with parallel processes
17:07:10 <davidkranz> jaypipes: The number of cores might not matter if there is only one api server or nova-network server.
17:07:36 <jaypipes> davidkranz: right.
17:07:42 <davidkranz> jaypipes: You may not be trying to stress, but if you are running 8 copies of tempest then you are!
17:08:11 <jaypipes> davidkranz: here's the kicker, though: when tempest starts crawling (after these RPC timeouts), doing a virsh list --all hangs indefinitely. So I think this may actually be a libvirt issue.
17:08:39 <jaypipes> davidkranz: not running 8 copies of tempest... just one copy of tempest, with all the tests split across 8 processes.
17:08:53 <davidkranz> jaypipes: I will take a try running against a multi-node system with nova-network running on all compute nodes.
17:09:16 <jaypipes> davidkranz: k, I will push my code then for you to pull.
17:10:07 <davidkranz> jaypipes: 8 copies of one with 8 processes still puts the same amount of transient stress, just for shorter duration overall
17:10:34 <davidkranz> jaypipes: I meant "or" one with 8 processes.
17:10:41 <jaypipes> sure
17:10:56 <jaypipes> but that doesn't explain libvirt/QEMU hanging. :(
17:11:23 <jaypipes> I'm going to chat with vish about the libvirt non-blocking mode patch that is currently in the queue to see if that might help
17:11:52 <davidkranz> jaypipes: It is also possible this is a result of some post-essex regression.
17:12:17 <jaypipes> yeah
17:12:22 <davidkranz> jaypipes: I think I should work on getting some version of the stress tests into a job that runs every night.
17:13:04 <jaypipes> davidkranz: that would be good, yes.
17:13:28 <jaypipes> davidkranz: problem is, running stress tests on a 4G VM in the CI environment isn't particularly useful in reporting real errors...
17:13:59 <davidkranz> jaypipes: Yes, we need a real cluster for that. We will also need a real cluster when there are real performance tests.
17:14:09 <jaypipes> indeed
17:14:53 <jaypipes> well, besides me bitching about this, are there particular topics we need to discuss this week?
17:15:25 <davidkranz> jaypipes: Just what we should do about the resource thing.
17:15:40 <davidkranz> jaypipes: Daryl seemed to think it was related to something you were working on.
17:16:16 <jaypipes> davidkranz: well, the original patch I put together for the refactoring of smoke tests did have a resource manager in the base test classes.
17:16:21 <jaypipes> I believe that is what he means
17:16:48 <davidkranz> jaypipes: OK. Perhaps you can comment on the email I sent outlining my "counter-proposal".
17:17:04 <jaypipes> davidkranz: I will, yes
17:17:11 <davidkranz> jaypipes: Great.
17:17:24 <rohitk> jaypipes: Are we taking a direction on the negative tests re-factor? I've submitted a lot of negative tests, how can I help?
17:17:33 <davidkranz> jaypipes: Nothing else that I know of at the moment.
17:18:07 <rohitk> 1. Identifying overlaps in unit tests
17:18:18 <jaypipes> rohitk: we are not adding any more negative tests at this point. Instead, we are looking at using a grammar-based fuzz testing tool like randgen to do negative API testing
17:18:39 <rohitk> jaypipes: hmmm
17:19:07 <rohitk> jaypipes: The randgen would do negative API (blackbox) testing
17:19:41 <jaypipes> rohitk: correct.
17:19:51 <jaypipes> rohitk: although so do the unit tests mostly.
17:19:53 <rohitk> jaypipes: I think that would depend on the FuzzClientManager
17:19:54 <rohitk> ?
17:20:01 <jaypipes> yes.
17:20:45 <rohitk> jaypipes: ok, I'll look up the randgen LP link that you put up in the e-mail
17:20:49 <rohitk> thanks!
17:20:59 <jaypipes> rohitk: basically, the recent addition of so many negative test cases has made tempest run about 200% longer than before, and we need to find a better, faster strategy instead of adding a test method for every possible negative iteration
17:21:30 <rohitk> jaypipes: totally agree, there is little value in making tests unnecessarily run longer
17:22:07 <davidkranz> jaypipes: As soon as you push your code I will give it a try.
17:22:17 <jaypipes> davidkranz: k, thx
17:23:07 * jaypipes wishes there were 30 hours in a day... :(
17:23:55 <jaypipes> alright... JoseSwiftQE, any update on swift?
17:25:19 <rohitk> jaypipes: I've also tracked updates on the Bugs filed for the Skipped tests, i'll wear the SkipCaptain hat for cleaning those up
17:25:30 <vishy> jaypipes: libvirt hang: is it on oneiric?
17:25:44 <jaypipes> vishy: yep
17:25:54 <JoseSwiftQE> jaypipes: No changes since last meeting. Just waiting for reviews.
17:25:57 <jaypipes> rohitk: thx. where are you keeping track of that stuff?
17:26:02 <jaypipes> JoseSwiftQE: k, thx
17:26:10 <vishy> jaypipes: it is a libvirt bug that has been discussed on the ml
17:26:26 <rohitk> i saw updates on many of the keystone bugs filed by myself, haven't tracked them at a place yet,
17:26:31 <rohitk> jaypipes: but will do
17:26:40 <jaypipes> vishy: it's that RPC timeout thing... it's back. Whenever I run with --processes=X where X is >1
17:26:55 <vishy> jaypipes: oh nm then
17:26:57 <jaypipes> vishy: and libvirt just seems to hang and ERROR builds just pile up.
17:27:34 <vishy> jaypipes: oh i have a good idea about that
17:27:42 <jaypipes> vishy: do tell!
17:27:55 <vishy> jaypipes: are you sure it is libvirt that is hanging?
17:28:32 <jaypipes> vishy: if I do a virsh list --all, it hangs. doing ps aux |grep kvm shows a bunch of instances
17:28:42 <vishy> jaypipes: it is probably this: http://www.gossamer-threads.com/lists/openstack/dev/8808?do=post_view_threaded#8808
17:28:51 <vishy> jaypipes: solution: use precise :)
17:29:31 <jaypipes> vishy: heh.
17:30:36 <davidkranz> jaypipes: I think that explains it. Notice the comment from me in that thread. I have been using precise since April...
17:30:46 <jaypipes> davidkranz: k.
17:30:58 <jaypipes> I will try installing 12.04 then
17:31:14 <jaypipes> dist-upgrade from oneiric to precise is a complete FAIL.
17:31:35 <davidkranz> jaypipes: Good idea. Just beware that there are some incompatibilities with glance I ran into.
17:31:36 <jaypipes> I'll pull another 12.04 iso and reinstall everything... ugh.
17:31:37 <rohitk> jaypipes: ++
17:31:45 <jaypipes> davidkranz: what incompats?
17:32:08 <davidkranz> jaypipes: It had to do with resyncing the database.
17:32:30 <davidkranz> I don't remember the details. It was a while ago.
17:32:47 <jaypipes> davidkranz: oh, k
17:32:50 <davidkranz> jaypipes: And they may have been fixed. I was a guinea pig for 12.04 with Adam G.
17:32:55 <jaypipes> heh
17:34:06 <jaypipes> alright y'all, I'm going to head out and install 12.04. davidkranz could you type up a very brief summary to the ML?
17:34:14 <jaypipes> #endmeeting