17:00:25 <jaypipes> #startmeeting
17:00:26 <openstack> Meeting started Thu Jun 21 17:00:25 2012 UTC.  The chair is jaypipes. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:31 <JoseSwiftQE> hi!
17:00:35 <rohitk> Hello!
17:01:10 <jaypipes> davidkranz_: around?
17:01:25 <davidkranz_> Here now.
17:02:05 <jaypipes> heya
17:02:08 <davidkranz> jaypipes: You saw that Daryl can't make it, right?
17:02:13 <jaypipes> yeah
17:02:45 <JoseSwiftQE> We're pretty much all bowing out this week, meeting conflicts and the like.
17:02:56 <jaypipes> so I'm getting increasingly frustrated with running tempest with multiple parallel processes...
17:03:11 <davidkranz> jaypipes: Anything I could help with?
17:03:16 <jaypipes> trying to diagnose Nova timeouts is getting very annoying.
17:03:28 <jaypipes> davidkranz: I'm not sure :(
17:04:21 <jaypipes> davidkranz: the problem is that when I run the tests without --processes, it takes a while but eventually completes. If I run with --processes=8 (my box is a 12-core machine), after a while I start seeing RPC timeouts in networking, and then shit starts snowballing after that.
17:05:15 <davidkranz> jaypipes: Hmm.
17:05:27 <jaypipes> davidkranz: and it's not an issue with quotas, because I've made the base compute test class create its own tenant/user for its testing.
17:05:39 <jaypipes> davidkranz: I'm wondering if Nova just can't keep up with it.
17:05:57 <davidkranz> jaypipes: That's what I was thinking. Are you using a single devstack node?
17:05:58 <jaypipes> davidkranz: so I will reset my env (yet again) and try with processes=2 instead, and see if things work better.
17:06:17 <jaypipes> davidkranz: yup, but frankly, the box has 12 cores and 24G of memory... should NOT be an issue
17:06:24 <davidkranz> jaypipes: We need to separate nova stress issues from tempest parallelization issues.
17:07:02 <jaypipes> davidkranz: I'm actually not trying to stress Nova! :) Just trying to run the tempest test suite in a shorter amount of time with parallel processes
17:07:10 <davidkranz> jaypipes: The number of cores might not matter if there is only one api server or nova-network server.
17:07:36 <jaypipes> davidkranz: right.
17:07:42 <davidkranz> jaypipes: You may not be trying to stress, but if you are running 8 copies of tempest then you are!
17:08:11 <jaypipes> davidkranz: here's the kicker, though: when tempest starts crawling (after these RPC timeouts), doing a virsh list --all hangs indefinitely. So I think this may actually be a libvirt issue.
17:08:39 <jaypipes> davidkranz: not running 8 copies of tempest... just one copy of tempest, with all the tests split across 8 processes.
17:08:53 <davidkranz> jaypipes: I will try running against a multi-node system with nova-network running on all compute nodes.
17:09:16 <jaypipes> davidkranz: k, I will push my code then for you to pull.
17:10:07 <davidkranz> jaypipes: 8 copies of one with 8 processes still puts the same amount of transient stress, just for shorter duration overall
17:10:34 <davidkranz> jaypipes: I meant "or" one with 8 processes.
17:10:41 <jaypipes> sure
17:10:56 <jaypipes> but that doesn't explain libvirt/QEMU hanging. :(
17:11:23 <jaypipes> I'm going to chat with vish about the libvirt non-blocking mode patch that is currently in the queue to see if that might help
17:11:52 <davidkranz> jaypipes: It is also possible this is a result of some post-essex regression.
17:12:17 <jaypipes> yeah
17:12:22 <davidkranz> jaypipes: I think I should work on getting some version of the stress tests into a job that runs every night.
17:13:04 <jaypipes> davidkranz: that would be good, yes.
17:13:28 <jaypipes> davidkranz: problem is, running stress tests on a 4G VM in the CI environment isn't particularly useful in reporting real errors...
17:13:59 <davidkranz> jaypipes: Yes, we need a real cluster for that. We will also need a real cluster when there are real performance tests.
17:14:09 <jaypipes> indeed
17:14:53 <jaypipes> well, besides me bitching about this, are there particular topics we need to discuss this week?
17:15:25 <davidkranz> jaypipes: Just what we should do about the resource thing.
17:15:40 <davidkranz> jaypipes: Daryl seemed to think it was related to something you were working on.
17:16:16 <jaypipes> davidkranz: well, the original patch I put together for the refactoring of smoke tests did have a resource manager in the base test classes.
17:16:21 <jaypipes> I believe that is what he means
17:16:48 <davidkranz> jaypipes: OK. Perhaps you can comment on the email I sent outlining my "counter-proposal".
17:17:04 <jaypipes> davidkranz: I will, yes
17:17:11 <davidkranz> jaypipes: Great.
17:17:24 <rohitk> jaypipes: Are we taking a direction on the negative tests re-factor? I've submitted a lot of negative tests, how can I help?
17:17:33 <davidkranz> jaypipes: Nothing else that I know of at the moment.
17:18:07 <rohitk> 1. Identifying overlaps in unit tests
17:18:18 <jaypipes> rohitk: we are not adding any more negative tests at this point. Instead, we are looking at using a grammar-based fuzz testing tool like randgen to do negative API testing
17:18:39 <rohitk> jaypipes: hmmm
17:19:07 <rohitk> jaypipes: The randgen would do negative API (blackbox) testing
17:19:41 <jaypipes> rohitk: correct.
17:19:51 <jaypipes> rohitk: although so do the unit tests mostly.
17:19:53 <rohitk> jaypipes: I think that would depend on the FuzzClientManager
17:19:54 <rohitk> ?
17:20:01 <jaypipes> yes.
17:20:45 <rohitk> jaypipes: ok, I'll look up the randgen LP link that you put up in the e-mail
17:20:49 <rohitk> thanks!
17:20:59 <jaypipes> rohitk: basically, the recent addition of so many negative test cases has made tempest run about 200% longer than before, and we need to find a better, faster strategy instead of adding a test method for every possible negative iteration
17:21:30 <rohitk> jaypipes: totally agree, there is little value in making tests unnecessarily run longer
17:22:07 <davidkranz> jaypipes: As soon as you push your code I will give it a try.
17:22:17 <jaypipes> davidkranz: k, thx
17:23:07 * jaypipes wishes there were 30 hours in a day... :(
17:23:55 <jaypipes> alright... JoseSwiftQE, any update on swift?
17:25:19 <rohitk> jaypipes: I've also tracked updates on the Bugs filed for the Skipped tests, I'll wear the SkipCaptain hat for cleaning those up
17:25:30 <vishy> jaypipes: libvirt hang: is it on oneiric?
17:25:44 <jaypipes> vishy: yep
17:25:54 <JoseSwiftQE> jaypipes:  No changes since last meeting.  Just waiting for reviews.
17:25:57 <jaypipes> rohitk: thx. where are you keeping track of that stuff?
17:26:02 <jaypipes> JoseSwiftQE: k, thx
17:26:10 <vishy> jaypipes: it is a libvirt bug that has been discussed on the ml
17:26:26 <rohitk> I saw updates on many of the keystone bugs filed by myself, haven't tracked them in one place yet,
17:26:31 <rohitk> jaypipes: but will do
17:26:40 <jaypipes> vishy: it's that RPC timeout thing... it's back. Whenever I run with --processes=X where X is >1
17:26:55 <vishy> jaypipes: oh nm then
17:26:57 <jaypipes> vishy: and libvirt just seems to hang and ERROR builds just pile up.
17:27:34 <vishy> jaypipes: oh i have a good idea about that
17:27:42 <jaypipes> vishy: do tell!
17:27:55 <vishy> jaypipes: are you sure it is libvirt that is hanging?
17:28:32 <jaypipes> vishy: if I do a virsh list --all, it hangs. doing ps aux |grep kvm shows a bunch of instances
17:28:42 <vishy> jaypipes: it is probably this: http://www.gossamer-threads.com/lists/openstack/dev/8808?do=post_view_threaded#8808
17:28:51 <vishy> jaypipes: solution: use precise :)
17:29:31 <jaypipes> vishy: heh.
17:30:36 <davidkranz> jaypipes: I think that explains it. Notice the comment from me in that thread. I have been using precise since April...
17:30:46 <jaypipes> davidkranz: k.
17:30:58 <jaypipes> I will try installing 12.04 then
17:31:14 <jaypipes> dist-upgrade from oneiric to precise is a complete FAIL.
17:31:35 <davidkranz> jaypipes: Good idea. Just beware that there are some incompatibilities with glance I ran into.
17:31:36 <jaypipes> I'll pull another 12.04 iso and reinstall everything... ugh.
17:31:37 <rohitk> jaypipes: ++
17:31:45 <jaypipes> davidkranz: what incompats?
17:32:08 <davidkranz> jaypipes: It had to do with resyncing the database.
17:32:30 <davidkranz> I don't remember the details. It was a while ago.
17:32:47 <jaypipes> davidkranz: oh, k
17:32:50 <davidkranz> jaypipes: And they may have been fixed. I was a guinea pig for 12.04 with Adam G.
17:32:55 <jaypipes> heh
17:34:06 <jaypipes> alright y'all, I'm going to head out and install 12.04. davidkranz could you type up a very brief summary to the ML?
17:34:14 <jaypipes> #endmeeting