Friday, 2016-02-05

00:06 <openstackgerrit> Merged openstack/gnocchi: carbonara: implement full listing for new measures  https://review.openstack.org/276289
04:44 <sriman> Hi guys,
04:45 <sriman> can anyone suggest how to deploy telemetry on devstack?
04:45 <sriman> master branch
05:15 <swamireddy> sriman: Use the line below in your localrc, then run stack.sh
05:15 <swamireddy> sriman: enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer.git
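
In local.conf terms, swamireddy's suggestion amounts to something like the following (the CEILOMETER_BACKEND line is an optional extra, not part of the original advice):

    [[local|localrc]]
    # pull ceilometer in as a devstack plugin; its devstack code
    # enables the telemetry services on stack.sh
    enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer.git
    # optionally pick the storage backend the plugin should set up
    CEILOMETER_BACKEND=mongodb
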
05:30 <openstackgerrit> Ryota MIBU proposed openstack/ceilometer: make event-alarm supported in default  https://review.openstack.org/273432
06:07 <openstackgerrit> OpenStack Proposal Bot proposed openstack/ceilometer: Imported Translations from Zanata  https://review.openstack.org/273346
06:31 <openstackgerrit> Ryota MIBU proposed openstack/aodh: WIP: tempest: copy api tests from tempest tree  https://review.openstack.org/255188
06:31 <openstackgerrit> Ryota MIBU proposed openstack/aodh: tempest: add aodh tempest plugin  https://review.openstack.org/255191
06:48 <openstackgerrit> Sanjana proposed openstack/python-ceilometerclient: Fixing a word spelling  https://review.openstack.org/276597
06:59 <openstackgerrit> Sanjana proposed openstack/python-ceilometerclient: Fixing a word spelling  https://review.openstack.org/276597
08:06 <openstackgerrit> Julien Danjou proposed openstack/gnocchi: carbonara: serialize AggregatedTimeSerie using RLE  https://review.openstack.org/276365
08:54 <openstackgerrit> Merged openstack/ceilometer: tempest: migrate base class for tests  https://review.openstack.org/255707
09:09 <openstackgerrit> Merged openstack/python-ceilometerclient: Fixing a word spelling  https://review.openstack.org/276597
09:50 <openstackgerrit> Edwin Zhai proposed openstack/aodh: Clean config in source code  https://review.openstack.org/276651
10:03 <openstackgerrit> Jinxing Fang proposed openstack/ceilometer: Update the home page  https://review.openstack.org/276660
11:31 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/python-gnocchiclient: Translate resource_id to UUID5 format.  https://review.openstack.org/269493
11:34 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/aodh: Fix alarm reason  https://review.openstack.org/274615
12:58 <openstackgerrit> Nadya Shakhat proposed openstack/ceilometer: [WIP] Add cache abstraction  https://review.openstack.org/276714
13:06 <gordc> _nadya_: just a note, i'd rather we try to fix the racing issue than replace one issue with another.
13:08 <openstackgerrit> gordon chung proposed openstack/gnocchi: add randomness/chaos to metricd - POC  https://review.openstack.org/276485
13:09 <_nadya_> gordc: I see, Gordon. I absolutely agree that there is a race condition in the current handle_sample. I will try to use a lock with Redis and test it. So my plan is to rewrite handle_sample
13:10 <gordc> _nadya_: a lock won't fix the ordering (i tried) :)
13:10 <_nadya_> gordc: and this is a "low-level" race condition
13:10 <_nadya_> gordc: yep!
13:10 <_nadya_> gordc: but anyway I see two pros
13:11 <gordc> _nadya_: the problem i found basically depends on the # of threads and the gap between related samples. if the # of threads is greater than the gap between related samples, it will be a race no matter what.
13:11 <gordc> _nadya_: the only real fix is to fix the threading.
13:14 <_nadya_> gordc: 1. My research shows that the agents work faster with the cache. 2. No load on Rabbit. 2 is very important. I know it doesn't solve the problem, but I want to have at least an alternative in Mitaka.
13:15 <_nadya_> gordc: I wanted to suggest switching off transformers, but that is not an option because of autoscaling
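
A minimal illustration (not Ceilometer code) of the ordering problem gordc describes: once related samples are spread over more worker threads than the gap between them, completion order is arbitrary, so a lock alone cannot restore sample order.

    import concurrent.futures
    import random
    import time

    def handle_sample(seq):
        # simulate the variable per-sample transform latency
        time.sleep(random.uniform(0, 0.01))
        return seq

    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(handle_sample, i) for i in range(8)]
        done = [f.result() for f in concurrent.futures.as_completed(futures)]

    print(done)  # frequently not [0, 1, ..., 7]
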
13:16 <nicodemus_> hello gordc
*** yassine_ has joined #openstack-telemetry13:18
13:18 <gordc> _nadya_: i'd say it's difficult to measure 'faster' when using test data and having a global cache that is conveniently on the same machine as your agents. in a lot of cases the global cache will not be on the same machine as all your agents, and that adds a whole other level of latency/racing
13:18 <_nadya_> gordc: sure. I plan to test it on 200 nodes
13:19 <gordc> regarding rabbit, i don't know. the whole design of an mq is to handle redirection of data...
13:20 <gordc> if rabbit can't handle basic sorting (which is a pretty common use case according to a qpid dev)... well, we've got bigger issues than ceilometer's notification agent. lol
13:20 <_nadya_> gordc: Rabbit is considered the main bottleneck for OpenStack scaling right now, at Mirantis at least :)
13:20 <gordc> nicodemus_: o/
13:20 <gordc> what's up?
13:20 <gordc> _nadya_: should've gone with qpid. lol
13:21 <nicodemus_> gordc: I'm still struggling with metricd's processing
13:21 <_nadya_> gordc: looking into Kafka now...
13:21 <openstackgerrit> Merged openstack/aodh: Clean config in source code  https://review.openstack.org/276651
13:21 <gordc> _nadya_: i do really hope people help with the kafka driver though.
13:22 <gordc> nicodemus_: still not processing fast enough?
13:22 <gordc> jd__: ^ if you have some queries in mind.
13:22 <gordc> nicodemus_: did you manage to get the notification agent up and running?
13:22 <jd__> gordc: sure
13:23 <jd__> nicodemus_: what's the version deployed? how many measures are there? what does metricd show about progression in the log?
13:23 <nicodemus_> unfortunately no... If I devote one metricd worker per instance, it can keep up. But with 16 workers total and 36 instances, more measures go into the ceph pool than are processed
13:23 <nicodemus_> metricd logs show no errors; there are a lot of skipped measures (already processed)
13:25 <nicodemus_> gordc: the notification agent is running, but not doing anything. To be honest, I never used or started it before
13:25 <nicodemus_> ceilometer-agent-compute posts a message on metering.sample with 20 measures (20 counter_name entries in the payload)
13:26 <nicodemus_> ceilometer-collector consumes that message and posts it to gnocchi, which in turn converts it to 20 "measure_" objects in the ceph pool
13:27 <nicodemus_> I don't quite understand what ceilometer-notification-agent would do in this scenario
13:27 <gordc> nicodemus_: did you change notification_topics in your ceilometer.conf to publish to metering? the default workflow should be polling->notification->collector
13:27 <nicodemus_> jd__: I pulled gnocchi from master yesterday and redeployed
13:27 <jd__> nicodemus_: ok, cool
13:28 <gordc> nicodemus_: only pre-Liberty polling agents wrote straight to the collector.
13:28 <nicodemus_> gordc: this is my ceilometer.conf: http://paste.openstack.org/show/486087/
13:29 <nicodemus_> gordc: I was mistaken yesterday. We're using Kilo, not Liberty
13:30 <nicodemus_> we backported the gnocchi dispatcher to Kilo
13:31 <gordc> nicodemus_: ah, yeah, that makes a lot more sense now.
13:32 <nicodemus_> gordc: apologies for that, my mistake.
13:33 <gordc> nicodemus_: ok, so roughly ~450 samples/min passed to gnocchi, and the backlog grows roughly 300 measures/min
13:34 <nicodemus_> I wouldn't say the backlog grows 300 measures/min, it's between 50-100 measures/min
13:35 <nicodemus_> but 16 workers don't seem to be enough
13:36 <gordc> nicodemus_: ... yeah, i'd hope 16 workers could handle 450 measures/min
13:36 <nicodemus_> I'm deploying another 4 metricd instances, looking to find the magic number
13:40 <gordc> nicodemus_: i don't know if you have debug logs, but maybe check roughly how many 'Processing measures for' logs you have vs how many '(already processed)' logs
13:40 <nicodemus_> gordc: I have debug, let me check
13:41 <gordc> nicodemus_: maybe we do need to add more logic to better distribute workers.
13:41 <gordc> testing it out right now.
13:42 <openstackgerrit> Merged openstack/python-gnocchiclient: Translate resource_id to UUID5 format.  https://review.openstack.org/269493
13:46 <nicodemus_> gordc: in a one-minute timeframe, there were 1468 log lines for 'Processing measures' and 954 lines for '(already processed)' across all four metricd instances
13:48 <gordc> nicodemus_: kk. how many total workers again? 16?
13:50 <nicodemus_> gordc: 16 workers total, using redis for coordination
13:51 <nicodemus_> gordc: I have four 4-vcpu instances running 4 metricd workers each. All instances show about 80-90% CPU usage
13:53 <gordc> nicodemus_: i think that's expected, the workers are constantly looping/doing something so i'd imagine the cpu usage would be high.
13:54 <nicodemus_> gordc: and the good thing is there are no errors in the logs :) all of them are working fine
13:55 <gordc> nicodemus_: one small step :)
13:58 <gordc> jd__: seems like more issues with global reqs (http://lists.openstack.org/pipermail/openstack-dev/2016-February/085838.html)
13:58 <gordc> unleash your manifesto to burn it down
14:00 <cdent> I think we should just stop testing anything but head
14:00 <jd__> gordc: I think I just don't care enough anymore to lose time on a reply :)
14:00 <cdent> but I'm rude like that
14:01 <openstackgerrit> Victor Stinner proposed openstack/gnocchi: Don't require trollius on Python 3.4 and newer  https://review.openstack.org/276742
14:02 <gordc> cdent: adapt or burn?
14:02 <cdent> that old stuff is for the sellers
14:02 <jd__> nicodemus_: can you paste an extract of the log of one metricd for ~5 minutes?
14:03 <jd__> nicodemus_: also, what's the archive policy you're using?
14:06 <gordc> cdent: was the nova meetup where you live?
14:06 <cdent> It was about a 3-hour drive away
14:07 <gordc> did you bike or swim?
14:07 <sileht> gordc, can you upgrade your +1 here: https://review.openstack.org/#/c/276110/ ?
14:07 <gordc> sileht: :) i had a question but forgot to hit reply: if postgres doesn't index FKs, do we still need it?
14:07 <sileht> (if the answer satisfies you, of course)
14:08 <sileht> gordc, oh
14:08 <sileht> let's check that
14:09 * gordc is not a postgres expert; i just googled and an old article said 'no auto index but only required depending on data' so i'm not sure
14:09 <gordc> i'm ok either way.
14:10 <sileht> gordc, the alembic migration tests unfortunately don't assert on this kind of thing
14:12 <gordc> sileht: i'm ok with removing it if we don't know whether we need it... we can always add it back if we notice it might help.
14:12 <gordc> does that sound ok?
14:13 <sileht> gordc, let's re-add it later if needed
14:13 <gordc> sileht: kk
14:14 <sileht> gordc, yeah, you're right, only mysql creates an index for FKs
14:15 <gordc> sileht: i like how everything is consistent. *sigh*
14:15 <sileht> me too
14:16 <jd__> I'm down to 2.99 bytes per datapoint
14:16 <jd__> I CAN DO BETTER
14:16 <jd__> I can do negative storage
14:17 <jd__> the more datapoints you store in Gnocchi the more free disk space you'll have
14:17 <sileht> lol
14:17 * jd__ needs sleep
14:18 <gordc> jd__: is the RLE stuff shrinking storage, data retrieval, or both?
14:18 <jd__> both
14:18 <jd__> it shrinks file size
14:18 <gordc> magic
14:18 <jd__> it's not magic, it's just that the approach we took from the beginning was completely unoptimized
14:19 <jd__> (on purpose: I never dug into it, since I never thought I'd spend 2 years writing a tsdb)
14:19 <jd__> FML
14:20 * jd__ googles floating point compression algorithms
14:21 <gordc> i need to figure out how much a sample in legacy mongodb was... 1kb? 10kb?
14:23 <jd__> clearly depends on the metadata
14:24 <gordc> sadly, no one knows what was in metadata.
14:24 <nicodemus_> jd__: I'm using the default archive policy, let me paste it with the logs
14:29 <jd__> gordc: it's at least a few KB anyway, yeah
14:29 <jd__> way bigger
14:32 <ityaptin_> gordc: In mongodb one sample was 1-1.7 KB depending on metadata
14:32 <openstackgerrit> Merged openstack/aodh: gabbi's own paste.ini file  https://review.openstack.org/265330
14:33 <gordc> ityaptin_: at least it's not 1mb :)
14:34 <nicodemus_> jd__: here's the 5-minute log and the archive policy: http://paste.openstack.org/show/486093/
14:35 <ityaptin_> After a longevity run with 200 GB of stored data, the avg size was near 1.2 KB per sample. Of course we should add 10-15% for indexes.
14:35 <ityaptin_> gordc: Yes :)
14:37 <gordc> ityaptin_: seriously? is it because we have big indices or because indexes in mongodb are expensive?
14:41 <ityaptin_> gordc: with the recent news from idegtiarov - both. We have unused indexes, and experience shows that they are big.
14:42 <gordc> ityaptin_: i see. good to know. thanks for sharing the info.
14:43 <ityaptin_> gordc: my pleasure
14:53 <jd__> nicodemus_: ok, so this one is not really struggling: 2016-02-05 14:25:23.095 8435 INFO gnocchi.cli [-] Metricd reporting: 0 measurements bundles across 0 metrics wait to be processed.
14:54 <jd__> that's not 5 minutes though, only 30s :)
14:55 <nicodemus_> jd__: This log was collected with only one ceilometer-compute-agent running, so measures wouldn't build up. Would you like me to capture the logs again with both compute agents running?
14:55 <jd__> nicodemus_: sure!
14:55 <jd__> I'd like to see the log when it struggles to cope
14:56 <nicodemus_> Hmmm... maybe paste.openstack limits the number of lines? there should be 4700 log lines
15:00 <jd__> nicodemus_: ah, maybe
15:00 <jd__> nicodemus_: gist.github.com? or fax it
15:04 <nicodemus_> jd__: ok, collecting. We're also thinking of changing the backend from ceph to swift... do you think it could improve performance, being plain HTTP?
15:04 <gordc> jd__: you know what the diff between this conf https://github.com/openstack/gnocchi/blob/master/gnocchi/storage/_carbonara.py#L50 and this conf https://github.com/openstack/gnocchi/blob/master/gnocchi/cli.py#L76 is?
15:04 <jd__> nicodemus_: do you want to change the backend for performance reasons?
15:06 <jd__> gordc: both should come from prepare_service(), except maybe in test mode where the one from metricd does not come from the conf built in the test
15:06 <jd__> not sure how the tests are run, if that's your problem :)
15:06 <nicodemus_> jd__: it's one thing we thought might improve it
15:07 <gordc> jd__: for some reason conf.metricd.workers exists at the cli layer, but it's gone by the time it hits the storage layer.
15:07 <jd__> nicodemus_: it's hard to say honestly… maybe you should try with the file storage first to see what the diff between Ceph and file is, and have some values to compare with Swift too?
15:08 <jd__> gordc: hmmmm I think they are all registered at the same place, no?
15:08 <gordc> jd__: that's what i thought.
15:09 <nicodemus_> jd__: here's the log: https://gist.github.com/nvlan/32b55d1bd381ac74221c
15:09 <gordc> jd__: testing my random processing patch and it says the value doesn't exist. https://review.openstack.org/#/c/276485/2/gnocchi/storage/_carbonara.py
15:09 <nicodemus_> jd__: when the log ends, there were 1752 "measure_" objects in the gnocchi pool waiting to be processed
15:10 <gordc> nicodemus_: i would definitely start with the file backend first. it's an easier switch if you're just testing.
15:13 <jd__> 2016-02-05 15:01:24.995 31336 DEBUG gnocchi.storage._carbonara [-] Computed new metric 72f07343-ff08-4850-a105-db48133fab28 with 1 new measures in 2.08 seconds process_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:349
15:13 <jd__> clearly it takes 2s to process measures for one metric
15:13 <jd__> it's way too slow
15:14 <jd__> 2016-02-05 15:01:29.164 31337 DEBUG gnocchi.storage._carbonara [-] Computed new metric 02e84891-c93d-4356-a772-572b677e3dcb with 1 new measures in 13.90 seconds process_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:349
15:14 <jd__> 13s sometimes
15:14 <jd__> it's like reading and writing to Ceph is very slow
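
Rough arithmetic on those figures: at 2 s per metric, 16 workers drain at best 16 / 2 = 8 metrics/s (~480/min), barely above the ~450 samples/min coming in; at 13 s per metric the same 16 workers manage only ~74/min, so the backlog can only grow.
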
15:14 <jd__> gordc: because the conf object is conf.storage in the drivers
15:14 <jd__> gordc: the full conf object is not passed, to avoid such hacks
15:14 <jd__> gordc: ;]
15:15 <gordc> dammit. i read it wrong
15:15 <gordc> jd__: thanks
15:15 <jd__> I remember now :)
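
A toy sketch of the pattern jd__ is pointing at (assuming oslo.config; this is not Gnocchi's actual code): the driver is handed only the [storage] group, so metricd-level options are simply not reachable from it.

    from oslo_config import cfg

    conf = cfg.ConfigOpts()
    conf.register_opts([cfg.IntOpt('workers', default=4)], group='metricd')
    conf.register_opts([cfg.IntOpt('aggregation_workers_number', default=1)],
                       group='storage')
    conf(args=[])

    def build_driver(storage_conf):
        # only [storage] options exist here; storage_conf.metricd does not
        return storage_conf.aggregation_workers_number

    print(conf.metricd.workers)        # visible at the cli layer
    print(build_driver(conf.storage))  # all the driver ever sees
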
15:15 <jd__> nicodemus_: so why does it take between 2 and 13s to fetch and write a few files in Ceph? I wonder :/
15:17 <jd__> nicodemus_: can you enlarge your… Ceph? :]
15:18 <nicodemus_> jd__: I just tried a ceph write... http://paste.openstack.org/show/486096/
15:19 <jd__> Average Latency: 1.04703 - is this in seconds?
15:19 <nicodemus_> yes
15:19 <jd__> that looks high
15:20 <jd__> and i'm not impressed by the speed either, but I don't know what counts as slow or fast in Ceph
15:20 <jd__> sileht: an opinion?
15:20 <jd__> I could ask shan
15:20 <nicodemus_> jd__: those were 16 concurrent operations, 55 MB/s
15:21 <nicodemus_> jd__: "measure_" objects should be really tiny, correct?
15:22 <jd__> nicodemus_: yes, it's usually one datapoint with Ceilometer, so a few KB
15:22 <jd__> like 1 KB
15:22 <jd__> nicodemus_: are you running on a 1 Gb or 10 Gb network?
15:23 <nicodemus_> 1 Gb network
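
The exact invocation isn't shown in the paste, but a rados bench run matching those numbers (16 concurrent ops, default 4 MB objects; the pool name is an assumption) would look roughly like:

    # 60-second write test, 16 concurrent ops, default 4 MB objects
    rados bench -p gnocchi 60 write -t 16
    # the same test with 1 KB objects, much closer to a gnocchi measure_ object
    rados bench -p gnocchi 60 write -t 16 -b 1024
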
15:24 <openstackgerrit> Merged openstack/gnocchi: Remove useless indexes  https://review.openstack.org/276110
15:25 <sileht> jd__, ceph performance depends on a ton of parameters
15:26 <jd__> yeah, leseb said it does not look so bad
15:26 <jd__> 55 MB/s is roughly half of a 1 Gb network
15:26 <jd__> but the latency seems very high to me; maybe that's normal, idk
15:26 <sileht> that's a cheap cluster
15:26 <nicodemus_> jd__: I missed one question: unfortunately we cannot enlarge ceph. We did make sure the pgs for the pool are spread across all OSDs
15:27 <jd__> sileht: don't insult nicodemus_'s cluster lol
15:27 <jd__> nicodemus_: ok, fair enough
15:28 <jd__> but clearly if you need 10s to process a metric it's never going to scale
15:28 <gordc> jd__: local backlog?
15:29 <jd__> won't help
15:29 <jd__> it takes 0.01s to retrieve the new measures
15:29 <jd__> but handling all the rest takes 10s
15:29 <nicodemus_> sileht: hahah, I agree the cluster might not be top-notch... but then again, we only have two compute nodes pushing measures from 36 instances
15:29 <jd__> probably because there are ~30 files to manipulate and it's slow
15:30 <jd__> it looks to me like Ceph is the bottleneck here, so I'm not sure how to improve it
15:30 <sileht> ceph is good at spreading IO only if you have a ton of OSDs; otherwise it's just a bottleneck
15:31 <jd__> nicodemus_: try to set aggregation_workers_number to 32 or something?
15:31 <gordc> jd__: ah, it's the processing+write that's taking the bulk of the time
15:31 <jd__> gordc: processing 1 point is going to be blazingly fast… it's the read/write that takes *seconds* :(
15:31 <nicodemus_> jd__: this is the rados bench with a 1024-byte object size: http://paste.openstack.org/show/486100/
15:32 <jd__> nicodemus_: ah, that looks better :)
15:32 <nicodemus_> jd__: latency seems to be much lower
15:32 <jd__> for latency
15:32 <jd__> but the speed is wtf?
15:33 <sileht> for me, a ceph cluster with less than 30 OSDs is hard to use
15:34 <nicodemus_> this cluster has 16 OSDs in three hw nodes
15:34 <openstackgerrit> Igor Degtiarov proposed openstack/ceilometer: [MongoDB] exchange compound index with single field indexes  https://review.openstack.org/276262
15:34 <idegtiarov_> llu, hi, are you around?
15:35 <nicodemus_> it's nearly all for gnocchi, the other pools are hardly used
15:36 <sileht> nicodemus_, 3 nodes, does the gnocchi pool have replica 3?
15:36 <nicodemus_> sileht: yes, pool size is 3
15:37 <sileht> so for each meter gnocchi has to wait for all three nodes
15:39 <nicodemus_> sileht: why would that be? for each PG there is one primary OSD and the other two are for replication, so the request should only go to the primary OSD
15:39 <sileht> nicodemus_, that's not how ceph works: you send the data to the primary OSD, then the primary OSD sends the data to the two others, and only then does the primary OSD ack the write
15:40 <sileht> so you always wait for all nodes
15:41 <sileht> by default min_size = size, so if you lose one node your cluster is just stuck until the failed node comes back
15:41 <sileht> because ceph cannot relocate the missing pg onto another node while respecting size = 3
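
To check the replication settings sileht is describing (pool name again assumed), something like:

    ceph osd pool get gnocchi size       # replica count, 3 here
    ceph osd pool get gnocchi min_size   # writes block below this many copies
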
15:42 <nicodemus_> sileht: I see. So in this case ceph is the bottleneck?
15:43 <nicodemus_> If that's the case, then adding another 16 metricd workers should not make any difference... does that make sense?
15:46 <gordc> probably not much. i'd imagine it'd be waiting for another worker to release the lock on the metric
15:46 <sileht> nicodemus_, it can make sense to increase the number of workers a bit to ensure that all OSDs have something to do, but that can increase the latency of your ceph IO
15:47 <sileht> nicodemus_, if you have only 2 computes and this is not going to increase, perhaps the file driver is sufficient
15:49 <nicodemus_> sileht: this deploy will not increase its # of computes; however, we might deploy in the future at a customer with a high compute count
15:49 <sileht> nicodemus_, in that case you will have more ceph nodes too :)
15:52 <nicodemus_> sileht: if I configure the file backend, wouldn't all reads/writes go to just one disk? Wouldn't that be slower than using several ceph OSDs?
15:55 <sileht> nicodemus_, your ceph is only 76 MB/s; I think your hard drive will perform better
15:56 <sileht> nicodemus_, low-cost hard drives are around 80 MB/s, high-cost around 200 MB/s
15:57 <nicodemus_> sileht: in that case, switching to swift shouldn't make any difference either (both are on the same network)
15:58 <sileht> nicodemus_, swift has one little difference: if I remember correctly, the default min_size is 1
15:58 <sileht> nicodemus_, if the node to replicate the data to is too slow, it will do it async
15:59 <sileht> nicodemus_, so you have more chance of losing data, but when the cluster is not healthy swift performs better by default
16:00 <sileht> ceph or swift can be tweaked to have the same behavior; it's just the default configuration that's different
16:05 <nicodemus_> sileht: the one thing that I don't quite understand: if a rados bench with a 1024-byte object size has an avg latency of 0.021, why would it take metricd over 1s to process a measure? (or, as jd__ saw, even over 10s)
16:10 <sileht> nicodemus_, indeed, that doesn't look good
16:10 <jd__> nicodemus_: well, with that bench the MB/s is very low, so that's weird; if it's true, for whatever reason, it's also a source of slowness
16:11 <sileht> nicodemus_, you need to write more data if you lower the object size to 1024
16:11 <jd__> nicodemus_: though yeah, that's why I'd suggest you bench with the file driver and then with Ceph, to see how metricd copes with your data and whether Ceph is the bottleneck
16:14 <nicodemus_> jd__: I'll give it a try. Considering that I have three gnocchi APIs, the file driver would write locally on each API host's disk, right? For this test should I leave just one?
16:15 <jd__> nicodemus_: yes, start with one
16:15 <jd__> nicodemus_: if it works well enough, you can be fancy and try to export it to other nodes via NFS maybe :)
16:17 <nicodemus_> jd__: I have separated the gnocchi API from the metricd hosts. How would the metricd workers access the data if it's on the gnocchi API host? ...I think I need to have the API and metricd on one single host, correct?
16:18 <jd__> nicodemus_: correct (or use NFS)
16:19 <nicodemus_> jd__: ok. I'll reconfigure then and gather another 5-minute log to see what happens
16:20 <nicodemus_> and thank you all for your insights! :)
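
For the test nicodemus_ is about to run, a gnocchi.conf along these lines would select the file driver (option names as in the Gnocchi of that era; the path, worker count, and coordination URL are illustrative assumptions):

    [storage]
    driver = file
    file_basepath = /var/lib/gnocchi
    # the per-process thread pool jd__ suggested raising earlier
    aggregation_workers_number = 32
    coordination_url = redis://controller:6379
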
16:25 <pas-ha> hi all, I have a question on using ceilo on multi-node devstack
16:26 <pas-ha> how can I force the ceilometer devstack plugin to deploy ceilometer-acompute only?
16:26 <jd__> nicodemus_: you're welcome :)
16:26 <pas-ha> with local.conf variables only, that is
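
pas-ha's question goes unanswered in the log; one plausible sketch for a compute-only node, assuming the plugin honors devstack's standard service toggles (service names as in devstack of that era, untested):

    [[local|localrc]]
    enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer.git
    # keep only the compute polling agent on this node
    disable_service ceilometer-acentral ceilometer-anotification
    disable_service ceilometer-collector ceilometer-api
    enable_service ceilometer-acompute
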
16:45 <jd__> gordc: sileht: https://gist.github.com/jd/d3c23a261bd153d29299
16:47 <jd__> I got down to 0.02 bytes per point, which means that for 4 hours of 1-second points made of the values 0 and 1, the file takes 288 bytes
16:47 <jd__> \o/
16:47 <jd__> better than the 250 KB
16:47 <gordc> jd__: when does the compression happen?
16:47 <jd__> at serialization
16:47 <jd__> I didn't update the patch yet
16:48 <gordc> jd__: kk. so we add ~0.4s per write, i guess?
16:52 <jd__> gordc: yes
16:52 <jd__> that's what I'm worried about for now
16:52 <jd__> lz4 is pretty fast, but maybe we can find something faster
16:53 <jd__> the Facebook Gorilla paper, which is implemented in InfluxDB, uses an XOR-based compression of the data, but I'm a bit too lazy to implement that, actually
16:53 <jd__> hard to know if it's worth my time or not
16:53 <jd__> I'll take a look next week
16:53 <openstackgerrit> Julien Danjou proposed openstack/gnocchi: carbonara: serialize AggregatedTimeSerie using RLE and LZ4  https://review.openstack.org/276365
16:53 <openstackgerrit> Julien Danjou proposed openstack/gnocchi: carbonara: clean unused methods  https://review.openstack.org/276824
16:53 <jd__> gordc: patch updated ^
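
A rough illustration of why jd__'s numbers are plausible (it assumes the python "lz4" package and is not the actual carbonara serialization): 4 hours of 1-second points alternating between 0 and 1 compress to a tiny fraction of their packed size.

    import struct
    import lz4.block

    points = [float(i % 2) for i in range(4 * 3600)]   # 14400 points
    raw = struct.pack('<%dd' % len(points), *points)   # 8 bytes per point

    compressed = lz4.block.compress(raw)
    print(len(raw), len(compressed), len(compressed) / len(points))
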
16:53 <gordc> jd__: yeah, you've been very lazy this cycle. the number of patches has been disappointing.
16:53 <gordc> step up your game!
16:53 <jd__> lol
16:54 <jd__> I'm only #7 on http://stackalytics.com/?metric=commits
16:54 <jd__> :(
16:57 <gordc> sad. lol
16:57 <gordc> bbl. lunch
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Add some resource types tests  https://review.openstack.org/270419
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Implements resource attribute number  https://review.openstack.org/270091
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Simplify how to get keystone url  https://review.openstack.org/276282
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Implements resource attribute bool  https://review.openstack.org/270418
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Implements resource attribute uuid  https://review.openstack.org/270090
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Don't create Ceilometer resource types by default.  https://review.openstack.org/270322
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Move legacy Ceilometer resource into indexer.  https://review.openstack.org/270266
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Move resource type into their own sql table  https://review.openstack.org/269843
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Implements resource type CRUD.  https://review.openstack.org/269844
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Implements resource attribute string  https://review.openstack.org/269888
17:00 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Rework the handling of the resource ID  https://review.openstack.org/276830
17:07 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Rework the handling of the resource ID  https://review.openstack.org/276830
17:44 <openstackgerrit> Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Extend measures batching to named metrics  https://review.openstack.org/273368
19:18 <openstackgerrit> Pradeep Kilambi proposed openstack/ceilometer: Handle malformed resource definitions gracefully  https://review.openstack.org/276879
20:37 <openstackgerrit> Merged openstack/gnocchi: Simplify how to get keystone url  https://review.openstack.org/276282
22:19 <openstackgerrit> gordon chung proposed openstack/gnocchi: add randomness/chaos to metricd - POC  https://review.openstack.org/276485
23:05 <openstackgerrit> Rohit Jaiswal proposed openstack/python-ceilometerclient: Enhances client to support unique meter retrieval  https://review.openstack.org/272633
