openstackgerrit | Merged openstack/gnocchi: carbonara: implement full listing for new measures https://review.openstack.org/276289 | 00:06 |
*** thorst has joined #openstack-telemetry | 00:22 | |
*** thorst has quit IRC | 00:23 | |
*** thorst has joined #openstack-telemetry | 00:24 | |
*** thorst has quit IRC | 00:33 | |
*** farid has joined #openstack-telemetry | 00:52 | |
*** farid has quit IRC | 00:57 | |
*** farid has joined #openstack-telemetry | 00:59 | |
*** chlong has joined #openstack-telemetry | 01:01 | |
*** cheneydc has joined #openstack-telemetry | 01:11 | |
*** r-daneel has quit IRC | 01:24 | |
*** thorst has joined #openstack-telemetry | 01:30 | |
*** chlong has quit IRC | 01:34 | |
*** chlong has joined #openstack-telemetry | 01:35 | |
*** thorst has quit IRC | 01:38 | |
*** raginbajin has quit IRC | 01:42 | |
*** raginbajin has joined #openstack-telemetry | 01:46 | |
*** ljxiash has joined #openstack-telemetry | 01:48 | |
*** liusheng has joined #openstack-telemetry | 02:02 | |
*** thorst has joined #openstack-telemetry | 02:09 | |
*** thorst has quit IRC | 02:10 | |
*** thorst has joined #openstack-telemetry | 02:10 | |
*** davidlenwell has quit IRC | 02:18 | |
*** davidlenwell has joined #openstack-telemetry | 02:19 | |
*** thorst has quit IRC | 02:19 | |
*** davidlenwell has quit IRC | 02:23 | |
*** liamji has joined #openstack-telemetry | 02:26 | |
*** davidlenwell has joined #openstack-telemetry | 02:27 | |
*** farid has quit IRC | 02:29 | |
*** farid has joined #openstack-telemetry | 02:30 | |
*** r-mibu has joined #openstack-telemetry | 02:34 | |
*** prashantD has quit IRC | 02:50 | |
*** achatterjee has joined #openstack-telemetry | 02:50 | |
*** thorst has joined #openstack-telemetry | 03:17 | |
*** sanjana has quit IRC | 03:17 | |
*** prashantD has joined #openstack-telemetry | 03:24 | |
*** thorst has quit IRC | 03:25 | |
*** achatterjee has quit IRC | 03:34 | |
*** ljxiash has quit IRC | 03:39 | |
*** ljxiash has joined #openstack-telemetry | 03:39 | |
*** ljxiash has quit IRC | 03:46 | |
*** sanjana has joined #openstack-telemetry | 04:03 | |
*** achatterjee has joined #openstack-telemetry | 04:08 | |
*** thorst has joined #openstack-telemetry | 04:23 | |
*** thorst has quit IRC | 04:30 | |
*** links has joined #openstack-telemetry | 04:43 | |
*** sriman has joined #openstack-telemetry | 04:44 | |
sriman | Hi guys, | 04:44 |
sriman | can anyone suggest how to deploy telemetry on devstack? | 04:45 |
sriman | master branch | 04:45 |
*** prashantD has quit IRC | 05:03 | |
*** peristeri has quit IRC | 05:06 | |
*** yprokule has joined #openstack-telemetry | 05:10 | |
swamireddy | sriman: Use the line below in your localrc and run stack.sh | 05:15 |
swamireddy | sriman: enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer.git | 05:15 |
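A minimal localrc sketch of swamireddy's suggestion; only the enable_plugin line comes from the chat above, the rest is generic devstack usage and may need adjusting for your environment.

```shell
# localrc (or the [[local|localrc]] section of local.conf) -- minimal sketch
enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer.git

# then, from the devstack checkout:
./stack.sh
```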
*** agireud has quit IRC | 05:20 | |
*** ljxiash has joined #openstack-telemetry | 05:21 | |
*** agireud has joined #openstack-telemetry | 05:23 | |
*** thorst has joined #openstack-telemetry | 05:27 | |
openstackgerrit | Ryota MIBU proposed openstack/ceilometer: make even-alarm supported in default https://review.openstack.org/273432 | 05:30 |
*** thorst has quit IRC | 05:34 | |
*** ljxiash has quit IRC | 05:34 | |
*** ljxiash has joined #openstack-telemetry | 05:35 | |
*** ljxiash has quit IRC | 05:39 | |
*** ljxiash has joined #openstack-telemetry | 05:49 | |
*** ljxiash has quit IRC | 05:54 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/ceilometer: Imported Translations from Zanata https://review.openstack.org/273346 | 06:07 |
*** ljxiash has joined #openstack-telemetry | 06:17 | |
openstackgerrit | Ryota MIBU proposed openstack/aodh: WIP: tempest: copy api tests from tempest tree https://review.openstack.org/255188 | 06:31 |
openstackgerrit | Ryota MIBU proposed openstack/aodh: tempest: add aodh tempest plugin https://review.openstack.org/255191 | 06:31 |
*** thorst has joined #openstack-telemetry | 06:31 | |
*** thorst has quit IRC | 06:39 | |
openstackgerrit | Sanjana proposed openstack/python-ceilometerclient: Fixing a word spelling https://review.openstack.org/276597 | 06:48 |
*** jwcroppe has joined #openstack-telemetry | 06:58 | |
openstackgerrit | Sanjana proposed openstack/python-ceilometerclient: Fixing a word spelling https://review.openstack.org/276597 | 06:59 |
*** ljxiash has quit IRC | 07:01 | |
*** links has quit IRC | 07:04 | |
*** links has joined #openstack-telemetry | 07:14 | |
*** belmoreira has joined #openstack-telemetry | 07:29 | |
*** _nadya_ has joined #openstack-telemetry | 07:34 | |
*** farid has quit IRC | 07:35 | |
*** thorst has joined #openstack-telemetry | 07:37 | |
*** thorst has quit IRC | 07:44 | |
*** safchain has joined #openstack-telemetry | 07:57 | |
openstackgerrit | Julien Danjou proposed openstack/gnocchi: carbonara: serialize AggregatedTimeSerie using RLE https://review.openstack.org/276365 | 08:06 |
*** sriman has quit IRC | 08:09 | |
*** jwcroppe has quit IRC | 08:19 | |
*** ildikov has quit IRC | 08:30 | |
*** thorst has joined #openstack-telemetry | 08:42 | |
*** thorst has quit IRC | 08:48 | |
*** chlong has quit IRC | 08:52 | |
openstackgerrit | Merged openstack/ceilometer: tempest: migrate base class for tests https://review.openstack.org/255707 | 08:54 |
openstackgerrit | Merged openstack/python-ceilometerclient: Fixing a word spelling https://review.openstack.org/276597 | 09:09 |
*** eglynn has joined #openstack-telemetry | 09:10 | |
*** efoley has joined #openstack-telemetry | 09:14 | |
*** openstackgerrit has quit IRC | 09:17 | |
*** yassine has joined #openstack-telemetry | 09:17 | |
*** openstackgerrit has joined #openstack-telemetry | 09:18 | |
*** lsmola has joined #openstack-telemetry | 09:22 | |
*** ildikov has joined #openstack-telemetry | 09:28 | |
*** efoley_ has joined #openstack-telemetry | 09:34 | |
*** efoley has quit IRC | 09:38 | |
*** thorst has joined #openstack-telemetry | 09:47 | |
openstackgerrit | Edwin Zhai proposed openstack/aodh: Clean config in source code https://review.openstack.org/276651 | 09:50 |
*** boris-42 has joined #openstack-telemetry | 09:52 | |
*** thorst has quit IRC | 09:54 | |
*** eglynn has quit IRC | 09:56 | |
*** cheneydc has quit IRC | 09:56 | |
*** lsmola has quit IRC | 09:59 | |
openstackgerrit | Jinxing Fang proposed openstack/ceilometer: Update the home page https://review.openstack.org/276660 | 10:03 |
*** eglynn has joined #openstack-telemetry | 10:14 | |
*** r-mibu has quit IRC | 10:20 | |
*** r-mibu has joined #openstack-telemetry | 10:20 | |
*** thorst has joined #openstack-telemetry | 10:52 | |
*** efoley_ has quit IRC | 10:54 | |
*** thorst has quit IRC | 10:59 | |
*** efoley_ has joined #openstack-telemetry | 11:01 | |
*** efoley__ has joined #openstack-telemetry | 11:04 | |
*** efoley_ has quit IRC | 11:05 | |
*** jwcroppe has joined #openstack-telemetry | 11:26 | |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/python-gnocchiclient: Translate resource_id to UUID5 format. https://review.openstack.org/269493 | 11:31 |
*** jwcroppe has quit IRC | 11:32 | |
*** jwcroppe has joined #openstack-telemetry | 11:33 | |
*** achatterjee has quit IRC | 11:33 | |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/aodh: Fix alarm reason https://review.openstack.org/274615 | 11:34 |
*** efoley__ has quit IRC | 11:34 | |
*** efoley__ has joined #openstack-telemetry | 11:35 | |
*** jwcroppe has quit IRC | 11:37 | |
*** eglynn has quit IRC | 11:38 | |
*** thorst has joined #openstack-telemetry | 11:57 | |
*** thorst has quit IRC | 12:04 | |
*** efoley_ has joined #openstack-telemetry | 12:09 | |
*** efoley__ has quit IRC | 12:13 | |
*** ljxiash has joined #openstack-telemetry | 12:18 | |
*** ljxiash_ has joined #openstack-telemetry | 12:22 | |
*** ljxiash has quit IRC | 12:25 | |
*** idegtiarov_ has joined #openstack-telemetry | 12:28 | |
*** gordc has joined #openstack-telemetry | 12:33 | |
*** yprokule has quit IRC | 12:36 | |
*** efoley_ has quit IRC | 12:38 | |
*** ljxiash_ has quit IRC | 12:39 | |
*** thorst has joined #openstack-telemetry | 12:41 | |
*** links has quit IRC | 12:43 | |
openstackgerrit | Nadya Shakhat proposed openstack/ceilometer: [WIP] Add cache abstraction https://review.openstack.org/276714 | 12:58 |
*** krotscheck has joined #openstack-telemetry | 13:01 | |
*** efoley_ has joined #openstack-telemetry | 13:03 | |
gordc | _nadya_: just a note, i'd rather we try to fix the racing issue than replace one issue with another. | 13:06 |
openstackgerrit | gordon chung proposed openstack/gnocchi: add randomness/chaos to metricd - POC https://review.openstack.org/276485 | 13:08 |
_nadya_ | gordc: I see, Gordon. I absolutely agree that in the current handle_sample there is a race condition. I will try to use a lock with Redis and will test it. So, my plan is to rewrite handle_sample | 13:09 |
gordc | _nadya_: a lock won't fix the ordering (i tried) :) | 13:10 |
_nadya_ | gordc: and this is a "low-lvl" race condition | 13:10 |
_nadya_ | gordc: yep! | 13:10 |
_nadya_ | gordc: but anyway I see two pros | 13:10 |
gordc | _nadya_: the problem i found basically depends on the # of threads and the gap between related samples. if the # of threads is greater than the gap between related samples, it will be a race no matter what. | 13:11 |
gordc | _nadya_: the only real fix is to fix threading. | 13:11 |
_nadya_ | gordc: 1. My research shows that with the cache the agents work faster 2. No load on Rabbit. 2 is very important. I know that it doesn't solve the problem, but I want to have at least an alternative in Mitaka. | 13:14 |
*** nicodemus_ has joined #openstack-telemetry | 13:15 | |
_nadya_ | gordc: I wanted to suggest switching off transformers, but it is not an option because of autoscaling | 13:15 |
nicodemus_ | hello gordc | 13:16 |
*** yassine_ has joined #openstack-telemetry | 13:18 | |
gordc | _nadya_: i'd say it's difficult to calculate 'faster' when using test data and having a global cache which is conveniently on the same machine as your agents. in a lot of cases, global cache will not be on the same machine as all your agents and that adds a whole other level of latency/racing | 13:18 |
*** leitan has joined #openstack-telemetry | 13:18 | |
_nadya_ | gordc: sure. I plan to test it on 200 nodes | 13:18 |
gordc | regarding rabbit. i don't know. the whole design of a mq is to handle redirection of data... | 13:19 |
gordc | if rabbit can't handle basic sorting (which is a pretty common use case according to a qpid dev)... well, we've got bigger issues than ceilometer's notification agent. lol | 13:20 |
_nadya_ | gordc: Rabbit is considered the main bottleneck for OpenStack scaling now, at Mirantis at least :) | 13:20 |
gordc | nicodemus_: o/ | 13:20 |
gordc | what's up? | 13:20 |
gordc | _nadya_: should've went with qpid.lol | 13:20 |
*** yassine has quit IRC | 13:20 | |
nicodemus_ | gordc: I'm still struggling with metricd's processing | 13:21 |
_nadya_ | gordc: looking into Kafka now... | 13:21 |
openstackgerrit | Merged openstack/aodh: Clean config in source code https://review.openstack.org/276651 | 13:21 |
gordc | _nadya_: i do really hope people help with the kafka driver though. | 13:21 |
gordc | nicodemus_: still not processing fast enough? | 13:22 |
gordc | jd__: ^ if you have some queries in mind. | 13:22 |
gordc | nicodemus_: did you manage to get notification agent up and running? | 13:22 |
jd__ | gordc: sure | 13:22 |
jd__ | nicodemus_: what's the version deployed? how many measures are there? what does metricd show about progression in the log? | 13:23 |
nicodemus_ | unfortunately no... If I devote one metricd worker per instance, it can keep up. But with 16 workers total and 36 instances, more measures go into the ceph pool than get processed | 13:23 |
nicodemus_ | metricd logs show no error, there are a lot of skipped measures (already processed) | 13:23 |
*** boris-42 has quit IRC | 13:23 | |
nicodemus_ | gordc: notification agent is running, but not doing anything. To be honest, I never used it/started it before | 13:25 |
nicodemus_ | ceilometer-agent-compute posts a message in metering.sample with 20 measures (20 counter_name in the payload) | 13:25 |
nicodemus_ | ceilometer-collector consumes that message, and posts it to gnocchi, which in turn converts it to 20 "measure_" objects in the ceph pool | 13:26 |
nicodemus_ | I don't quite understand what ceilometer-notification-agent would do in this scenario | 13:27 |
gordc | nicodemus_: did you change notification_topics in your ceilometer.conf to publish to metering? the default workflow should be polling->notification->collector | 13:27 |
nicodemus_ | jd__: I pulled gnocchi from master yesterday and redeployed | 13:27 |
jd__ | nicodemus_: ok, cool | 13:27 |
gordc | nicodemus_: only pre-Liberty polling agents wrote straight to the collector. | 13:28 |
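For reference, a rough sketch of the two dataflows being compared; the option shown is an assumption based on gordc's question (Kilo-era oslo.messaging registered notification_topics under [DEFAULT]), so treat this as illustrative rather than a verified ceilometer.conf reference.

```ini
# Illustrative only -- check the docs for your release.
#
# Kilo-style flow (what nicodemus_ runs):
#   ceilometer-agent-compute -> "metering" topic -> ceilometer-collector -> gnocchi dispatcher
# Liberty+ default flow (what gordc describes):
#   polling agent -> notification agent (transformers) -> collector
[DEFAULT]
# oslo.messaging topic(s) notifications are published on; gordc's question
# is whether this was pointed at "metering"
notification_topics = notifications
```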
nicodemus_ | gordc: this is my ceilometer.conf: http://paste.openstack.org/show/486087/ | 13:28 |
nicodemus_ | gordc: I was mistaken yesterday. We're using Kilo, not Liberty | 13:29 |
nicodemus_ | we backported the gnocchi dispatcher to Kilo | 13:30 |
gordc | nicodemus_: ah, yeah, that makes a lot more sense now. | 13:31 |
nicodemus_ | gordc: apologies for that, my mistake. | 13:32 |
*** pradk has joined #openstack-telemetry | 13:32 | |
gordc | nicodemus_: ok so roughly ~450 samples/min passed to gnocchi and the backlog grows roughly 300 measures/min | 13:33 |
nicodemus_ | I wouldn't say the backlog grows 300 measures/min, it's more like 50-100 measures/min | 13:34 |
nicodemus_ | but, 16 workers don't seem to be enough | 13:35 |
gordc | nicodemus_: ... yeah, i'd hope 16 workers could handle 450 measures/min | 13:36 |
nicodemus_ | I'm deploying another 4 metricd instances, looking to find the magic number | 13:36 |
*** cdent has joined #openstack-telemetry | 13:40 | |
gordc | nicodemus_: i don't know if you have debug logs, but maybe check roughly how many 'Processing measures for' logs you have vs '(already processed)' logs you have? | 13:40 |
nicodemus_ | gordc: I have debug, let me check | 13:40 |
gordc | nicodemus_: maybe we do need to add more logic to better distribute workers. | 13:41 |
gordc | testing it out right now. | 13:41 |
openstackgerrit | Merged openstack/python-gnocchiclient: Translate resource_id to UUID5 format. https://review.openstack.org/269493 | 13:42 |
nicodemus_ | gordc: on a one-minute timeframe, there were 1468 log lines for 'Processing measures' and 954 lines for '(already processed)' on all four metricd instances | 13:46 |
gordc | nicodemus_: kk. how many total workers again? 16? | 13:48 |
nicodemus_ | gordc: 16 workers total, using redis for coordination | 13:50 |
nicodemus_ | gordc: I have four 4-vcpu instances running 4 metricd workers each. All instances show about 80-90% CPU usage | 13:51 |
gordc | nicodemus_: i think that's expected, the workers are constantly looping/doing something so i'd imagine the cpu usage would be high. | 13:53 |
*** liusheng has quit IRC | 13:53 | |
nicodemus_ | gordc: and the good thing, there are no errors in the logs :) all of them are working fine | 13:54 |
*** liusheng has joined #openstack-telemetry | 13:54 | |
*** julim has joined #openstack-telemetry | 13:54 | |
gordc | nicodemus_: one small step :) | 13:55 |
gordc | jd__: seems like more issues with global reqs (http://lists.openstack.org/pipermail/openstack-dev/2016-February/085838.html) | 13:58 |
gordc | unleash your manifesto to burn it down | 13:58 |
cdent | I think we should just stop testing anything but head | 14:00 |
jd__ | gordc: I think I just don't care anymore to lose time on a reply :) | 14:00 |
cdent | but I'm rude like that | 14:00 |
openstackgerrit | Victor Stinner proposed openstack/gnocchi: Don't require trollius on Python 3.4 and newer https://review.openstack.org/276742 | 14:01 |
gordc | cdent: adapt or burn? | 14:02 |
cdent | that old stuff is for the sellers | 14:02 |
jd__ | nicodemus_: can you paste an extract of the log of one metricd for ~5 minutes ? | 14:02 |
jd__ | nicodemus_: also, what's the archive policy you're using? | 14:03 |
gordc | cdent: was nova meetup where you live? | 14:06 |
cdent | It was about a 3 hour drive away | 14:06 |
gordc | did you bike or swim? | 14:07 |
sileht | gordc, can you upgrade your +1 here: https://review.openstack.org/#/c/276110/ ? | 14:07 |
gordc | sileht: :) i had a question but forgot to hit reply: if postgres doesn't index FKs, do we still need it? | 14:07 |
sileht | (if the answer satisfies you of course) | 14:07 |
sileht | gordc, oh | 14:08 |
sileht | lets check that | 14:08 |
* gordc not postgres expert, i just googled and an old article said 'no auto index but only required depending on data' so i'm not sure | 14:09 | |
gordc | i'm ok either way. | 14:09 |
sileht | gordc, alembic migration tests unfortunately don't assert on this kind of thing | 14:10 |
gordc | sileht: i'm ok with removing it if we don't know if we need it... we can always add it back if we notice it might help. | 14:12 |
gordc | that sound ok? | 14:12 |
sileht | gordc, let's readd it later if needed | 14:13 |
gordc | sileht: kk | 14:13 |
sileht | gordc, yeah you're right, only mysql creates an index for FKs | 14:14 |
gordc | sileht: i like how everything is consistent. *sigh* | 14:15 |
sileht | me too | 14:15 |
*** liamji has quit IRC | 14:16 | |
jd__ | I'm down to 2.99 bytes per datapoint | 14:16 |
jd__ | I CAN DO BETTER | 14:16 |
jd__ | I can do negative storage | 14:16 |
jd__ | the more datapoints you store in Gnocchi the more free disk space you'll have | 14:17 |
sileht | lol | 14:17 |
* jd__ needs sleep | 14:17 | |
*** agireud has quit IRC | 14:17 | |
gordc | jd__: is the RLE stuff shrinking storage or data retrieval or both? | 14:18 |
jd__ | both | 14:18 |
jd__ | it shrinks file size | 14:18 |
gordc | magic | 14:18 |
jd__ | it's not magic, it's just that the approach we took since the beginning was completely unoptimized | 14:18 |
jd__ | (on purpose, since I never dug into it, since I never thought I'd spend 2 years writing a tsdb) | 14:19 |
jd__ | FML | 14:19 |
*** agireud has joined #openstack-telemetry | 14:19 | |
* jd__ gools floating point compression algorithm | 14:20 | |
jd__ | s/gools/googles/ | 14:20 |
gordc | i need to figure out how much a sample in legacy mongodb was... 1kb? 10kb? | 14:21 |
jd__ | clearly depends on metadata | 14:23 |
*** peristeri has joined #openstack-telemetry | 14:24 | |
gordc | sadly, no one knows what was in metadata. | 14:24 |
nicodemus_ | jd__: I'm using the default archive policy, let me paste it with the logs | 14:24 |
*** ska has quit IRC | 14:26 | |
jd__ | gordc: it's aaat least a few Kb anyway yeah | 14:29 |
jd__ | way bigger | 14:29 |
ityaptin_ | gordc: In mongodb one sample was 1kb-1.7kb depending on metadata | 14:32 |
openstackgerrit | Merged openstack/aodh: gabbi's own paste.ini file https://review.openstack.org/265330 | 14:32 |
gordc | ityaptin_: at least it's not 1mb :) | 14:33 |
nicodemus_ | jd__: here's the 5 minute log and the archive policy: http://paste.openstack.org/show/486093/ | 14:34 |
ityaptin_ | After a longevity run with 200 GB of stored data the avg size was near 1.2 KB per sample. Of course we should add 10%-15% for indexes. | 14:35 |
ityaptin_ | gordc: Yes :) | 14:35 |
*** eglynn has joined #openstack-telemetry | 14:35 | |
*** eglynn has quit IRC | 14:35 | |
*** eglynn has joined #openstack-telemetry | 14:35 | |
gordc | ityaptin_: seriously? is it because we have big indices or because indexes in mongodb are expensive? | 14:37 |
*** efoley__ has joined #openstack-telemetry | 14:38 | |
*** efoley_ has quit IRC | 14:38 | |
*** KrishR has joined #openstack-telemetry | 14:39 | |
ityaptin_ | gordc: with the recent news from idegtiarov - both. We have unused indexes and experience shows that they are big. | 14:41 |
gordc | ityaptin_: i see. good to know. thanks for sharing info. | 14:42 |
ityaptin_ | gordc: my pleasure | 14:43 |
jd__ | nicodemus_: ok so this one is not really struggling: 2016-02-05 14:25:23.095 8435 INFO gnocchi.cli [-] Metricd reporting: 0 measurements bundles across 0 metrics wait to be processed. | 14:53 |
jd__ | that's not 5 minutes though, only 30s :) | 14:54 |
nicodemus_ | jd__: This log was collected with only one ceilometer-compute-agent running, for measures not to build up. Would you like me to capture the logs again with both compute-agents running? | 14:55 |
jd__ | nicodemus_: sure! | 14:55 |
jd__ | I'd like to see the log when it struggles to cope | 14:55 |
nicodemus_ | Hmmm... maybe paste.openstack limits the number of lines? there should be 4700 log lines | 14:56 |
*** datravis has quit IRC | 14:59 | |
jd__ | nicodemus_: ah maybe | 15:00 |
jd__ | nicodemus_: gist.github.com? or fax it | 15:00 |
*** pradk has quit IRC | 15:03 | |
*** pradk has joined #openstack-telemetry | 15:03 | |
nicodemus_ | jd__: ok, collecting. We're also thinking of changing the backend from ceph to swift... do you think it could improve performance, being plain HTTP? | 15:04 |
gordc | jd__: you know what the diff between this conf https://github.com/openstack/gnocchi/blob/master/gnocchi/storage/_carbonara.py#L50 and this conf https://github.com/openstack/gnocchi/blob/master/gnocchi/cli.py#L76 is? | 15:04 |
jd__ | nicodemus_: do you want to change the backend for performance reason? | 15:04 |
jd__ | gordc: both should come from prepare_service(), except maybe in test mode where the one from metricd does not come from the conf built in the test | 15:06 |
*** efoley__ has quit IRC | 15:06 | |
*** efoley__ has joined #openstack-telemetry | 15:06 | |
jd__ | not sure how the tests are run if that's your problem :) | 15:06 |
nicodemus_ | jd__: it's one thing we thought maybe could improve it | 15:06 |
gordc | jd__: for some reason conf.metricd.workers exists at the cli level, but conf.metricd.workers is gone when it hits the storage layer. | 15:07 |
jd__ | nicodemus_: it's hard to say honestly… maybe you should try with the file storage first to see what the diff between Ceph and file is, and have some values to compare with Swift too? | 15:07 |
jd__ | gordc: hmmmm I think they are all registered at the same place, no? | 15:08 |
gordc | jd__: that's what i thought. | 15:08 |
nicodemus_ | jd__: here's the log: https://gist.github.com/nvlan/32b55d1bd381ac74221c | 15:09 |
gordc | jd__: testing my random processing patch and it says the value doesn't exist. https://review.openstack.org/#/c/276485/2/gnocchi/storage/_carbonara.py | 15:09 |
nicodemus_ | jd__: when the log ends, there were 1752 "measure_" objects in the gnocchi pool waiting to be processed | 15:09 |
gordc | nicodemus_: i would definitely start with file backend first. easier switch if you're just testing. | 15:10 |
jd__ | 2016-02-05 15:01:24.995 31336 DEBUG gnocchi.storage._carbonara [-] Computed new metric 72f07343-ff08-4850-a105-db48133fab28 with 1 new measures in 2.08 seconds process_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:349 | 15:13 |
jd__ | clearly it takes 2s to process measures for one metric | 15:13 |
jd__ | it's way too slow | 15:13 |
jd__ | 2016-02-05 15:01:29.164 31337 DEBUG gnocchi.storage._carbonara [-] Computed new metric 02e84891-c93d-4356-a772-572b677e3dcb with 1 new measures in 13.90 seconds process_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:349 | 15:14 |
jd__ | 13s sometimes | 15:14 |
jd__ | it's like reading and writing to Ceph is very slow | 15:14 |
jd__ | gordc: because the conf object is conf.storage in this driver | 15:14 |
jd__ | gordc: the full conf object is not passed, to avoid such hacks | 15:14 |
jd__ | gordc: ;] | 15:14 |
gordc | dammit. i read it wrong | 15:15 |
gordc | jd__: thanks | 15:15 |
jd__ | s/this driver/drivers/ | 15:15 |
jd__ | I remember now :) | 15:15 |
jd__ | nicodemus_: so why does it take between 2 and 13s to fetch and write a few files in Ceph? I wonder :/ | 15:15 |
jd__ | nicodemus_: can you enlarge your… Ceph? :] | 15:17 |
nicodemus_ | jd__: I just tried ceph write... http://paste.openstack.org/show/486096/ | 15:18 |
jd__ | Average Latency: 1.04703 is this in seconds? | 15:19 |
*** datravis has joined #openstack-telemetry | 15:19 | |
nicodemus_ | yes | 15:19 |
jd__ | that looks high | 15:19 |
jd__ | and i'm not impressed by the speed but I don't know what slow or fast is in Ceph | 15:20 |
jd__ | sileht: an opinion? | 15:20 |
*** ildikov has quit IRC | 15:20 | |
jd__ | I could ask shan | 15:20 |
nicodemus_ | jd__: those were 16 concurrent operations, 55 MB/s | 15:20 |
nicodemus_ | jd__: "measure_" objects should be really tiny, correct? | 15:21 |
jd__ | nicodemus_: yes it's usually one datapoint with Ceilometer, so a few Kb | 15:22 |
jd__ | like 1 Kb | 15:22 |
jd__ | nicodemus_: are you running on 1 Gb or 10 Gb network? | 15:22 |
nicodemus_ | 1Gb network | 15:23 |
openstackgerrit | Merged openstack/gnocchi: Remove useless indexes https://review.openstack.org/276110 | 15:24 |
sileht | jd__, ceph performance depends on a ton of parameters | 15:25 |
jd__ | yeah leseb said it does not look so bad | 15:26 |
jd__ | 55 MB/s is roughly half 1 Gb network | 15:26 |
jd__ | but the latency seems very high to me, maybe that's normal, idk | 15:26 |
sileht | that's a cheap cluster | 15:26 |
nicodemus_ | jd__: I missed one question, unfortunately we cannot enlarge ceph. We did make sure the pgs for the pool are spread across all OSDs | 15:26 |
jd__ | sileht: don't insult nicodemus_ cluster lol | 15:27 |
jd__ | nicodemus_: ok, fair enough | 15:27 |
jd__ | but clearly if you need 10s to process a metric it's never going to scale | 15:28 |
gordc | jd__: local backlog? | 15:28 |
jd__ | won't help | 15:29 |
jd__ | it takes 0.01s to retrieve the new measures | 15:29 |
jd__ | but handling all the rest takes 10s | 15:29 |
nicodemus_ | sileht: hahah I agree the cluster might not be top-notch.. but then again, we only have two compute nodes pushing measures from 36 instances | 15:29 |
jd__ | probably because there are ~30 files to manipulate and it's slow | 15:29 |
jd__ | it looks to me Ceph is the bottleneck here so I'm not sure how to improve | 15:30 |
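(Back-of-the-envelope from the numbers above: at 2-13 s per metric, 16 workers finish somewhere between 16 × 60/13 ≈ 74 and 16 × 60/2 = 480 metrics per minute, so at the slow end they cannot keep up with the ~450 samples/min coming in and a backlog is inevitable until the per-metric I/O time drops.)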
sileht | ceph is good to the spread IO only if you have a ton of OSDs overwise, it's just a bottleneck | 15:30 |
jd__ | nicodemus_: try to set aggregation_workers_number to 32 or something? | 15:31 |
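A sketch of where that option would go, assuming it is registered under the [storage] group of gnocchi.conf; check the generated sample config for your version before relying on the section name.

```ini
# gnocchi.conf -- sketch only; the [storage] placement is an assumption
[storage]
aggregation_workers_number = 32
```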
sileht | overwise/otherwise | 15:31 |
gordc | jd__: ah, it's the processing+write that's taking the bulk of the time | 15:31 |
jd__ | gordc: processing 1 point is going to be blazingly fast… it's read/write that takes *seconds* :( | 15:31 |
nicodemus_ | jd__: this is the rados bench with a 1024-byte object size: http://paste.openstack.org/show/486100/ | 15:31 |
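For anyone reproducing this, hypothetical rados bench invocations along the lines of the two runs pasted above; the pool name and duration are assumptions.

```shell
# default 4 MB objects, 16 concurrent writers (first paste)
rados bench -p gnocchi 60 write -t 16 --no-cleanup
# 1 KB objects, closer to the size of a gnocchi "measure_" object (second paste)
rados bench -p gnocchi 60 write -t 16 -b 1024 --no-cleanup
# remove the benchmark objects afterwards
rados -p gnocchi cleanup
```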
*** jwcroppe has joined #openstack-telemetry | 15:31 | |
jd__ | nicodemus_: ah that looks better :) | 15:32 |
nicodemus_ | jd__: latency seems to be much lower | 15:32 |
jd__ | for latency | 15:32 |
jd__ | but the speed is wtf? | 15:32 |
sileht | for me, a ceph cluster with less than 30 OSDs is hard to use | 15:33 |
nicodemus_ | this cluster has 16 OSDs in three hw nodes | 15:34 |
openstackgerrit | Igor Degtiarov proposed openstack/ceilometer: [MongoDB] exchange compound index with single field indexes https://review.openstack.org/276262 | 15:34 |
idegtiarov_ | llu, hi are you around? | 15:34 |
nicodemus_ | it's nearly all for gnocchi, the other pools are hardly used | 15:35 |
sileht | nicodemus_, 3 nodes, does the gnocchi have replication 3? | 15:36 |
sileht | (pool) | 15:36 |
nicodemus_ | sileht: yes, pool size is 3 | 15:36 |
sileht | so for each meter gnocchi has to wait for the three nodes | 15:37 |
nicodemus_ | sileht: why would that be? for each PG, there is one primary OSD and the other two are for replication, so the request should only go to the primary OSD | 15:39 |
sileht | nicodemus_, that's not how ceph works: you send the data to the primary OSD, then the primary OSD sends the data to the two others, and then the primary OSD acks the write to the client | 15:39 |
sileht | so you always wait for all nodes | 15:40 |
sileht | by default min_size = size, so if you lose one node your cluster is just stuck until the failed node comes back | 15:41 |
sileht | because ceph cannot relocate the missing pg onto another node and respect size = 3 | 15:41 |
nicodemus_ | sileht: I see. So, in this case ceph is being the bottleneck? | 15:42 |
nicodemus_ | If such is the case, then adding another 16 metricd workers should not make any difference... does that make sense? | 15:43 |
gordc | probably not much. i'd imagine it'd be waiting for another worker to release the lock on the metric | 15:46 |
sileht | nicodemus_, it can make sense to increase the number of workers a bit to ensure that at least all OSDs have something to do, but that can increase the latency of your ceph IO | 15:46 |
sileht | s/at least/ | 15:46 |
*** david-lyle has joined #openstack-telemetry | 15:47 | |
sileht | nicodemus_, if you have only 2 computes and this is not going to increase, perhaps the file is sufficient | 15:47 |
sileht | file/file driver | 15:47 |
*** mragupat has joined #openstack-telemetry | 15:48 | |
nicodemus_ | sileht: this deploy will not increase in # of computes, however we might deploy in the future at a customer with a high compute count | 15:49 |
sileht | nicodemus_, in this case you will have more ceph nodes too :) | 15:49 |
nicodemus_ | sileht: if I configure file backend, wouldn't all read/writes go to just one disk? Wouldn't that be slower than using several ceph OSDs? | 15:52 |
*** eglynn has quit IRC | 15:52 | |
sileht | nicodemus_, your ceph is only 76 MB/s, I think your hard drive will perform better | 15:55 |
*** mgarza has joined #openstack-telemetry | 15:55 | |
sileht | nicodemus_, low-cost hard drives are around 80 MB/s, high-cost around 200 MB/s | 15:56 |
nicodemus_ | sileht: In such a case, switching to swift shouldn't make any difference either (both are on the same network) | 15:57 |
sileht | nicodemus_, swift has a little difference, if I remember correctly the min_size by default is 1 | 15:58 |
sileht | nicodemus_, if the node to replicate the data is too slow, it will do it async | 15:58 |
sileht | nicodemus_, so you have more chance of losing data, but when the cluster is not healthy, by default swift performs better | 15:59 |
sileht | ceph or swift can be tweaked to have the same behavior, it's just the default configuration that differs | 16:00 |
*** tongli has quit IRC | 16:00 | |
*** tongli has joined #openstack-telemetry | 16:02 | |
*** _nadya__ has joined #openstack-telemetry | 16:03 | |
*** alextricity25_ has joined #openstack-telemetry | 16:04 | |
nicodemus_ | sileht: the one thing that I don't quite understand: if a rados bench with 1024-byte object size has an avg latency of 0.021, why would it take metricd over 1s to process a measure? (or as jd__ saw, even over 10s) | 16:05 |
*** alextricity25_ is now known as alextricity | 16:06 | |
*** alextricity is now known as alextricity25_ | 16:06 | |
sileht | nicodemus_, indeed that doesn't look good | 16:10 |
jd__ | nicodemus_: well with that bench the MB/s is very low so that's weird, if it's true for whatever reason it's also a source of slowness | 16:10 |
sileht | nicodemus_, you need to write more data if you lower the object size to 1024 | 16:11 |
jd__ | nicodemus_: though yeah that's why I'd suggest you bench with the file driver, and then with Ceph, to see how metricd copes with your data and whether Ceph is the bottleneck | 16:11 |
*** belmoreira has quit IRC | 16:13 | |
*** _nadya_ has quit IRC | 16:14 | |
*** alextricity25 has quit IRC | 16:14 | |
*** vishwanathj has joined #openstack-telemetry | 16:14 | |
nicodemus_ | jd__: I'll give it a try. Considering that I have three gnocchi API, the file driver would write locally on each API host disk, right? For this test should I leave just one? | 16:14 |
jd__ | nicodemus_: yes start with one | 16:15 |
jd__ | nicodemus_: if it works well enough, you can be fancy and try to export to other nodes via NFS maybe :) | 16:15 |
nicodemus_ | jd__: I have separated the gnocchi API from the metricd hosts. How would the metricd workers access the data if it's on the gnocchi API host? ...I think I need to have API and metricd in one single host, correct? | 16:17 |
jd__ | nicodemus_: correct (or use NFS) | 16:18 |
nicodemus_ | jd__: ok. I'll reconfigure then and gather another 5-minute log to see what happens | 16:19 |
nicodemus_ | and thank you all for your insights! :) | 16:20 |
*** david-lyle has quit IRC | 16:23 | |
*** alextricity25_ is now known as alextricity | 16:23 | |
*** alextricity is now known as alextricity25 | 16:23 | |
*** yassine_ has quit IRC | 16:25 | |
*** pas-ha has joined #openstack-telemetry | 16:25 | |
pas-ha | hi all, I have a question on using ceilo on multi-node devstack | 16:25 |
pas-ha | how can I force the ceilometer devstack plugin to deploy ceilometer-acompute only? | 16:26 |
jd__ | nicodemus_: you're welcome :) | 16:26 |
pas-ha | with local.conf variables only that is | 16:26 |
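One hedged possibility (nobody confirmed this in-channel): on the compute-only node, enable the plugin but keep just the compute agent service. The service names below are what the ceilometer devstack plugin of that era used; double-check them against the plugin's devstack/settings.

```shell
# local.conf sketch for a compute-only node -- unverified, adjust to taste
[[local|localrc]]
enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer.git
disable_service ceilometer-api ceilometer-collector ceilometer-acentral ceilometer-anotification
enable_service ceilometer-acompute
```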
*** eglynn has joined #openstack-telemetry | 16:28 | |
*** david-lyle has joined #openstack-telemetry | 16:29 | |
*** mragupat has quit IRC | 16:34 | |
*** mragupat has joined #openstack-telemetry | 16:34 | |
*** tongli has quit IRC | 16:35 | |
*** efoley__ has quit IRC | 16:37 | |
*** shardy has joined #openstack-telemetry | 16:43 | |
jd__ | gordc: sileht: https://gist.github.com/jd/d3c23a261bd153d29299 | 16:45 |
jd__ | I got down to 0.02 bytes per point, which means for 4 hours of 1-second points made of values 0 and 1 the file takes 288 bytes | 16:47 |
jd__ | \o/ | 16:47 |
jd__ | better than the 250 Kb | 16:47 |
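A rough way to sanity-check that figure: 4 hours of 1-second points is 14400 points, and a buffer of that many repeating float64 values collapses to a few hundred bytes under any general-purpose compressor. The one-liner below uses stdlib zlib purely as an illustration, not the RLE + LZ4 combination in the actual patch.

```shell
python -c 'import zlib, struct; data = struct.pack("<14400d", *([0.0, 1.0] * 7200)); print("raw: %d bytes, zlib: %d bytes" % (len(data), len(zlib.compress(data))))'
```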
gordc | jd__: when does the compression happen? | 16:47 |
jd__ | at serialization | 16:47 |
jd__ | I didn't update the patch yet | 16:47 |
gordc | jd__: kk. so we add ~.4s per write i guess? | 16:48 |
*** eglynn has quit IRC | 16:50 | |
jd__ | gordc: yes | 16:52 |
jd__ | that's what I'm worried about for now | 16:52 |
jd__ | lz4 is pretty fast but maybe we can find something faster | 16:52 |
jd__ | the Facebook Gorilla paper that's implemented in InfluxDB uses an XOR-based compression of the data, but I'm a bit too lazy to implement that actually | 16:53 |
jd__ | hard to know if it's worth my time or not | 16:53 |
jd__ | I'll take a look next week | 16:53 |
openstackgerrit | Julien Danjou proposed openstack/gnocchi: carbonara: serialize AggregatedTimeSerie using RLE and LZ4 https://review.openstack.org/276365 | 16:53 |
openstackgerrit | Julien Danjou proposed openstack/gnocchi: carbonara: clean unused methods https://review.openstack.org/276824 | 16:53 |
jd__ | gordc: patch updated ^ | 16:53 |
gordc | jd__: yeah, you've been very lazy this cycle. the number of patches has been disappointing. | 16:53 |
gordc | step up your game! | 16:53 |
jd__ | lol | 16:53 |
jd__ | I'm only #7 on http://stackalytics.com/?metric=commits | 16:54 |
jd__ | :( | 16:54 |
gordc | sad.lol | 16:57 |
gordc | bbl. lunch | 16:57 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Add some resource types tests https://review.openstack.org/270419 | 17:00 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Implements resource attribute number https://review.openstack.org/270091 | 17:00 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Simply how to get keystone url https://review.openstack.org/276282 | 17:00 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Implements resource attribute bool https://review.openstack.org/270418 | 17:00 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Implements resource attribute uuid https://review.openstack.org/270090 | 17:00 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Don't create Ceilometer resource types by default. https://review.openstack.org/270322 | 17:00 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Move legacy Ceilometer resource into indexer. https://review.openstack.org/270266 | 17:00 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Move resource type into their own sql table https://review.openstack.org/269843 | 17:00 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Implements resource type CRUD. https://review.openstack.org/269844 | 17:00 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Implements resource attribute string https://review.openstack.org/269888 | 17:00 |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Rework the handling of the resource ID https://review.openstack.org/276830 | 17:00 |
*** vishwanathj has quit IRC | 17:02 | |
*** vishwanathj has joined #openstack-telemetry | 17:03 | |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Rework the handling of the resource ID https://review.openstack.org/276830 | 17:07 |
*** ildikov has joined #openstack-telemetry | 17:10 | |
*** _nadya__ has quit IRC | 17:14 | |
*** tongli has joined #openstack-telemetry | 17:14 | |
*** david-lyle has quit IRC | 17:14 | |
*** vishwanathj has quit IRC | 17:18 | |
*** KrishR has quit IRC | 17:20 | |
*** farid has joined #openstack-telemetry | 17:20 | |
*** david-lyle has joined #openstack-telemetry | 17:21 | |
*** david-lyle has quit IRC | 17:25 | |
*** prashantD has joined #openstack-telemetry | 17:29 | |
openstackgerrit | Mehdi Abaakouk (sileht) proposed openstack/gnocchi: Extend measures batching to named metrics https://review.openstack.org/273368 | 17:44 |
*** KrishR has joined #openstack-telemetry | 17:45 | |
*** jkraj has joined #openstack-telemetry | 18:22 | |
*** pas-ha has quit IRC | 18:40 | |
*** shardy has quit IRC | 18:59 | |
*** thorst has quit IRC | 19:03 | |
*** thorst has joined #openstack-telemetry | 19:05 | |
openstackgerrit | Pradeep Kilambi proposed openstack/ceilometer: Handle malformed resource definitions gracefully https://review.openstack.org/276879 | 19:18 |
*** thorst has quit IRC | 19:24 | |
*** pradk has quit IRC | 19:27 | |
*** KrishR has quit IRC | 19:32 | |
*** prashantD has quit IRC | 19:33 | |
*** prashantD has joined #openstack-telemetry | 19:34 | |
*** krotscheck is now known as krotscheck_dcm | 19:39 | |
*** _nadya_ has joined #openstack-telemetry | 19:42 | |
*** _nadya_ has quit IRC | 19:46 | |
*** thorst has joined #openstack-telemetry | 19:54 | |
*** thorst has joined #openstack-telemetry | 19:57 | |
*** jwcroppe has quit IRC | 19:58 | |
*** jwcroppe has joined #openstack-telemetry | 19:59 | |
*** KrishR has joined #openstack-telemetry | 20:00 | |
*** agireud has quit IRC | 20:00 | |
*** agireud has joined #openstack-telemetry | 20:02 | |
*** _nadya_ has joined #openstack-telemetry | 20:22 | |
*** _nadya_ has quit IRC | 20:23 | |
openstackgerrit | Merged openstack/gnocchi: Simply how to get keystone url https://review.openstack.org/276282 | 20:37 |
*** david-lyle has joined #openstack-telemetry | 20:39 | |
*** david-lyle has quit IRC | 20:55 | |
*** david-lyle_ has joined #openstack-telemetry | 20:55 | |
*** david-lyle_ is now known as david-lyle | 21:00 | |
*** tongli has quit IRC | 21:00 | |
*** gordc has quit IRC | 21:09 | |
*** safchain has quit IRC | 21:19 | |
*** boris-42 has joined #openstack-telemetry | 21:23 | |
*** _nadya_ has joined #openstack-telemetry | 21:24 | |
*** _nadya_ has quit IRC | 21:28 | |
*** peristeri has quit IRC | 21:29 | |
*** mragupat has quit IRC | 21:40 | |
*** mragupat has joined #openstack-telemetry | 21:41 | |
*** thorst has quit IRC | 22:02 | |
*** thorst has joined #openstack-telemetry | 22:04 | |
*** thorst has quit IRC | 22:09 | |
*** cdent is now known as dentures | 22:10 | |
*** leitan has quit IRC | 22:17 | |
openstackgerrit | gordon chung proposed openstack/gnocchi: add randomness/chaos to metricd - POC https://review.openstack.org/276485 | 22:19 |
*** nicodemus_ has quit IRC | 22:21 | |
*** dentures has quit IRC | 22:39 | |
*** vishwanathj has joined #openstack-telemetry | 22:46 | |
*** vishwanathj has quit IRC | 23:00 | |
openstackgerrit | Rohit Jaiswal proposed openstack/python-ceilometerclient: Enhances client to support unique meter retrieval https://review.openstack.org/272633 | 23:05 |
*** mragupat has quit IRC | 23:08 | |
*** farid has quit IRC | 23:20 | |
*** prashantD has quit IRC | 23:34 | |
*** prashantD has joined #openstack-telemetry | 23:36 | |
*** mgarza has quit IRC | 23:37 | |
*** chlong has joined #openstack-telemetry | 23:38 | |
*** rbak has quit IRC | 23:39 | |
*** KrishR has quit IRC | 23:48 | |
*** leitan has joined #openstack-telemetry | 23:53 |