Thursday, 2013-12-05

silehtjd__, if you can take a look to the oslo.messaging stuffs13:13
silehtjd__, and,n,z13:13
silehtthx :)13:14
thomasemHey everyone!13:32
thomasemHey, jd__, eglynn: this patch set appears ready to go, could we get some eyes on it? Wanting to get this piece in soon for the dependent patch set that's about to be up for review. :)
jd__thomasem: definitely on my too long todo list :(13:43
eglynnthomasem: I'll try to take a look at it before EoD13:43
jd__probably not today since I'm conferencing13:43
thomasemjd__, eglynn: Thanks! I appreciate the help.13:44
thomasemjd__, Understandable13:44
nprivalovajd__, hi! I have a question for you about several instances of collector in perf tests. ping me if you have a time13:56
jd__nprivalova: likely not today I think, but you can mail me if you want or we can try to chat tomorrow :)13:57
nprivalovajd__, ok. I'll try to ask someone else :) And if no result will mail13:58
jd__sure, I'm interested :)13:58
nprivalovaguys, I have the lab with 3 controllers  and 200 computes. There are HA-mysql installation in controllers. I installed ceilometer only on one controller and on all computes and I believe that if I install ceilometer on 2 controllers more performance will be better. As I understand I may start 3 instances of collector instead of 1. Am I right?14:04
thomasemCan you clarify this statement? "I installed ceilometer only on one controller and on all computes?"?14:12
thomasemAs long as the collectors are hitting the same queue (so it round-robins) I think that's the intended way for it to work.14:12
thomasemby hitting I mean consuming from14:12
nprivalovaso I have ceilometer-api, ceilometer-agent-central, ceilometer-collector and  ceilometer-agent-compute running on one collector and  ceilometer-agent-compute on all computes14:12
nprivalovalooks like if I run ceilometer-collector on 2 controllers more they start to process more messages from queue and DB will be loaded more14:12
thomasemYeah, the DB needs to be able to keep up14:12
thomasemwith multiple connections14:12
thomasemif you have 10 subsequent messages on the queue, and 3 collectors, this is the intended spread, I think:14:12
thomasemceilometer_01 - 1,4,7,1014:12
thomasemcollector_02 - 2,5,814:12
thomasemcollector_03 - 3,6,914:12
thomasemsorry that first ceilometer_01 should be collector_0114:12
nprivalovayep, I believe that db configured to work with multiple connections. Galera is used there14:12
thomasemCool. The addition of collectors (assuming there are always unconsumed messages in the queue) will likely increase the load on your DB, so you could shift the bottle-neck.14:12
thomasemif there is one14:12
thomasemBut at some point the MQ service can't go any faster too14:12
thomasemAnywho, to answer the question, I believe the collector is designed to share a queue with other collectors to process more messages faster.14:13
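[Editor's sketch] The spread thomasem describes above (10 consecutive messages, 3 collectors consuming from one shared queue) can be reproduced with a toy round-robin model. This is only an illustration: `round_robin` is a hypothetical helper, not ceilometer or oslo.messaging code, and it assumes the broker hands out messages strictly in turn, which a real broker may not do (prefetch settings and consumer speed can change the distribution).

```python
from itertools import cycle


def round_robin(messages, consumers):
    """Assign each message to the next consumer in turn (toy model)."""
    assignment = {name: [] for name in consumers}
    # cycle() repeats the consumer names endlessly; zip() stops when
    # the messages run out.
    for message, name in zip(messages, cycle(consumers)):
        assignment[name].append(message)
    return assignment


# Hypothetical collector names, matching the ones used in the chat.
collectors = ["collector_01", "collector_02", "collector_03"]
spread = round_robin(range(1, 11), collectors)

for name, msgs in spread.items():
    print(name, "-", ",".join(str(m) for m in msgs))
```

Under that assumption, each extra collector takes roughly an equal share of the queue, which is why adding collectors can shift load onto the DB as noted above.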
nprivalovaI just want to understand may I start to work on bp about "make getting the data from db faster". On the HK summit there was a lot of discussions about  MQ bottleneck. So I was confused about this fact. "Should I start improving "getting" performance before resolving this bottleneck" - that is my concern14:26
nprivalovanow I've made a test with 200 instances up and interval 5 sec polling. It worked ok, about 9 000 000 entries in db. Today I'm planning to run at least 1000 instances. But I didn't have alarms and events14:27
thomasemSo, nprivalova, you might want to have a look at some of the DB performance testing we're doing. We are looking to scale to 1,000,000 messages/day14:59
thomasemSo we're taking a look at the drivers against various data stores and seeing what we come up with. This goes directly to the DB layer, so it's not looking at the collector speed. Though, I am interested about bottlenecks further up the stack.15:00
thomasemThe speed of my driver won't mean squat without the MQ and the collector keeping up. :)15:02
nprivalovathomasem, I know about your investigations. My purpose is mostly MQ performance. DB performance is really interesting piece so I hope you will get interesting results :) btw, do you measure 'write-speed'? Because 'read-speed' depends on implementation very much15:05
thomasemAll we're testing right now is write-speed.15:06
thomasemWe're going to worry about read-speed after that (which is lesser priority than writes).15:06
thomasemSince writes HAVE to keep up with the flow of messages, queries can take a little more time.15:07
thomasemnprivalova, Cool. I wasn't sure you were. :)15:07
thomasemnprivalova, pretty much, I'm taking a look at various deployment configurations and backends and just hammering it with millions of generated events (from a pool, pseudo-random) and getting RAM, disk I/O, read speed, CPU utilization, etc.15:09
thomasemto find out what needs to be tuned15:09
shadoweris there a documentation describing how to install ceilometer in a multi-node setup? I checked's architecture.html and install/manual.html but I'm still unclear where each service goes16:44
shadowere.g. if I have a bunch of nova compute nodes and a single node with keystone, scheduler, the databases, etc.16:44
*** litong has quit IRC16:44
shadowerI'd want to put the compute agent onto each compute node and the ceilometer-api on the controller node16:44
shadowerbut what about the notification agent, collector and all the other services?16:44
*** litong has joined #openstack-ceilometer16:46
dragondmgordc: Thanks for the review. I've uploaded a new patchset fixing your concerns. If you think we really need to add a new config group for those options I can add that too.20:44
herndongordc: not sure what the error is with trait types... gates didn't catch anything. Couldn't we just fix the problem instead of reverting the whole patch??21:11
gordcwhoops. didn't see messages.21:51
gordcdragondm: i'm ok with not having config group... i'll +2 once jenkins pass21:51
gordcherndon: i'm not sure what error jd__ sees. i'm ok with your patch though... just want to see if there was a reason for revert.21:52
herndonthe event patch can't go in if trait types is there :/21:53
dragondmgordc: Cool. I had to recheck on jenkins (looks like tempest is failing with a spurious glance issue) Hopefully that will go through soon.21:53
herndonI commented on the review. I'm really surprised there is a problem as I tested the migrations with data in the db, and tested up->down->up migrations. This stuff is tricky :(.21:54
gordcherndon: i didn't see any issues either when i tested with data. (saw issues elsewhere but not related to your patch)21:55
gordcdragondm: yeah, i was going to recheck the patch. you beat me to it :)21:55
dragondmHeh. I'm quick. Perhaps a little too quick if the typos in my documentation are any indication.  ( brain.speed > finger.speed ) :P21:56
gordcaside from typos the docs were really good though. i would've avoided your 2000 line patch if your docs weren't so damn clear. lol21:59
dragondmHeh. Thanks. Yah, since I'm basically defining a mini DSL, I figured folks would need to know what to do with it :>  I actually wrote much of the documentation when I wrote the blueprint for the feature.22:01
