*** dave-mccowan has quit IRC | 00:17 | |
*** zhangguoqing has joined #openstack-telemetry | 00:41 | |
*** dave-mccowan has joined #openstack-telemetry | 00:51 | |
*** tovin07_ has joined #openstack-telemetry | 00:54 | |
*** dave-mccowan has quit IRC | 00:55 | |
*** lhx__ has joined #openstack-telemetry | 01:47 | |
*** zhurong has joined #openstack-telemetry | 02:11 | |
*** thorst_afk has joined #openstack-telemetry | 02:27 | |
*** thorst_afk has quit IRC | 02:35 | |
*** thorst_afk has joined #openstack-telemetry | 02:37 | |
*** masber has joined #openstack-telemetry | 02:45 | |
*** thorst_afk has quit IRC | 02:52 | |
*** lhx__ has quit IRC | 03:36 | |
*** lhx__ has joined #openstack-telemetry | 03:37 | |
*** boris-42_ has quit IRC | 03:39 | |
*** lhx__ has quit IRC | 04:02 | |
*** lhx__ has joined #openstack-telemetry | 04:03 | |
*** ChanServ changes topic to "#openstack-telemetry is OpenStack Telemetry | http://wiki.openstack.org/Telemetry" | 04:09 | |
-openstackstatus- NOTICE: Sufficient free space has been reclaimed that jobs are passing again; any POST_FAILURE results can now be rechecked. | 04:09 | |
*** zhurong has quit IRC | 04:11 | |
*** zhurong has joined #openstack-telemetry | 04:15 | |
*** Gautam has joined #openstack-telemetry | 04:47 | |
*** links has joined #openstack-telemetry | 05:20 | |
*** yprokule has joined #openstack-telemetry | 05:34 | |
*** lhx__ has quit IRC | 05:43 | |
*** lhx__ has joined #openstack-telemetry | 05:43 | |
*** rcernin has quit IRC | 06:00 | |
*** Gautam_ has joined #openstack-telemetry | 06:01 | |
*** Gautam has quit IRC | 06:04 | |
*** rcernin has joined #openstack-telemetry | 06:17 | |
*** hoonetorg has quit IRC | 06:25 | |
*** hoonetorg has joined #openstack-telemetry | 06:37 | |
*** Gautam has joined #openstack-telemetry | 06:47 | |
*** Gautam_ has quit IRC | 06:49 | |
*** lhx__ has quit IRC | 06:54 | |
*** lhx__ has joined #openstack-telemetry | 06:54 | |
*** toddnni has quit IRC | 07:10 | |
*** jroll has quit IRC | 07:12 | |
*** shardy has joined #openstack-telemetry | 07:14 | |
*** pcaruana has joined #openstack-telemetry | 07:16 | |
*** zhangguoqing has quit IRC | 07:32 | |
*** Gautam has quit IRC | 07:36 | |
*** Gautam has joined #openstack-telemetry | 07:37 | |
*** zhangguoqing has joined #openstack-telemetry | 08:19 | |
*** daidv has joined #openstack-telemetry | 09:02 | |
*** nhanvu has joined #openstack-telemetry | 09:40 | |
*** ricolin has joined #openstack-telemetry | 09:40 | |
*** nhanvu has left #openstack-telemetry | 09:42 | |
ricolin | Hi guys I got a combination alarm question | 09:47 |
---|---|---|
ricolin | will it still work if I use the alarm_id (which I got when creating the combination alarm) and delete it through ceilometer.alarms()? | 09:48 |
ricolin | ceilometer.alarms.delete | 09:50 |
ricolin | jd_, ^^^ | 09:51 |
ricolin | This is about what we can do with the combination alarm resource in heat | 09:52 |
ricolin | so some help will be nice:) | 09:52 |
ricolin | https://review.openstack.org/#/c/439433/10 | 09:53 |
*** links has quit IRC | 09:55 | |
*** adriant has quit IRC | 09:58 | |
*** tovin07_ has quit IRC | 10:03 | |
*** zhurong has quit IRC | 10:04 | |
*** links has joined #openstack-telemetry | 10:08 | |
*** jroll has joined #openstack-telemetry | 10:25 | |
*** daidv has quit IRC | 10:35 | |
*** aks__ has joined #openstack-telemetry | 10:36 | |
aks__ | hi all | 10:36 |
aks__ | do we have cassandra driver in gnocchi ? | 10:37 |
*** cdent has joined #openstack-telemetry | 10:44 | |
jd_ | aks__: no | 10:59 |
aks__ | ok | 11:01 |
*** Gautam has quit IRC | 11:06 | |
*** aks__ has quit IRC | 11:52 | |
*** zhurong has joined #openstack-telemetry | 12:35 | |
*** zhurong has quit IRC | 12:36 | |
*** dave-mccowan has joined #openstack-telemetry | 12:41 | |
openstackgerrit | Merged openstack/python-pankoclient master: use extras https://review.openstack.org/465141 | 12:52 |
*** fguillot has joined #openstack-telemetry | 12:54 | |
*** efoley has joined #openstack-telemetry | 13:00 | |
*** lhx__ has quit IRC | 13:10 | |
*** pradk has quit IRC | 13:11 | |
*** thorst_afk has joined #openstack-telemetry | 13:12 | |
*** donghao has joined #openstack-telemetry | 13:23 | |
Anticimex | mmm, postgres planner analysis of ceilometer-newton with postgres db dispatcher backend | 13:29 |
Anticimex | with minor tweaks 10x-100x improvement in query speed, and i suppose with a better handcrafted schema 1000x-10000x doable | 13:30 |
* Anticimex planning to go improved psql route rather than gnocchi, seems wrong solution to original problem statement | 13:30 | |
sileht | lol | 13:32 |
sileht | storing all metadata for all samples doesn't make any sense... | 13:32 |
*** ddyer has quit IRC | 13:53 | |
*** ddyer has joined #openstack-telemetry | 13:57 | |
*** zhangguoqing has quit IRC | 14:06 | |
*** lhx__ has joined #openstack-telemetry | 14:16 | |
*** pradk has joined #openstack-telemetry | 14:23 | |
*** vint_bra has joined #openstack-telemetry | 14:23 | |
*** thorst_afk has quit IRC | 14:27 | |
*** gordc has joined #openstack-telemetry | 14:32 | |
*** vint_bra has quit IRC | 14:35 | |
Anticimex | sileht: nod, a better schema can improve a lot | 14:37 |
*** chlong has joined #openstack-telemetry | 14:38 | |
Anticimex | a metrics query cross joins sample against sample... which at least the 9.6 planner converts into two sequential scans | 14:38 |
Anticimex | relevant indexes exist, but the planner can be guided a bit. "people on the internet" with similar challenges have been able to improve by O(1000) - O(10000) through query tweaking | 14:39 |
gordc | Anticimex: the better schema was gnocchi ;) | 14:39 |
Anticimex | i saw a ceilometer->gnocchi performance tweaking presentation in BOS and would have to disagree i think :) | 14:40 |
gordc | i think i opened a bug to improve sql backend a few years back. my sql skills weren't enough to overcome that metadata hurdle. | 14:40 |
Anticimex | (politely) | 14:40 |
gordc | if you can modify sql and still maintain same api agreements then all the power to you. | 14:41 |
Anticimex | well, ceilometer-api is deprecated | 14:41 |
gordc | yep :) would've been good two years ago | 14:41 |
Anticimex | so there's officially no api to maintain compatibility with, but our rating engine could gain another O(10k+) from simply doing 1-3 psycopg2 queries rather than a zillion http queries | 14:42 |
Anticimex | so i'm thinking a new psycopg2 dispatcher against improved schema | 14:42 |
Anticimex | a dispatcher with performance tweaks on insert as well of course | 14:43 |
gordc | sure. i thought you were originally trying to build something to match ceilometer api but was performant. | 14:43 |
Anticimex | right, that's where i started looking today | 14:43 |
gordc | no argument if you change api, you can make sql work | 14:43 |
Anticimex | at the queries | 14:43 |
Anticimex | and there's room to improve on sqla/pg-default here too | 14:44 |
Anticimex | our ceilometer-pg vm has 4 cpu cores and 32G RAM with ~10GB data for the month, and a monthly rating run takes on the order of 10 hours, with full cpu usage on the postgres node | 14:45 |
Anticimex | and that's for a 16 compute instance deployment which is like 25% utilized | 14:46 |
gordc | what's ceilometer-pg? your own custom schema? | 14:46 |
Anticimex | oh sorry, the postgres database node our ceilometer dispatches to | 14:47 |
gordc | or existing api+postgres | 14:47 |
Anticimex | that | 14:47 |
Anticimex | on newton | 14:47 |
gordc | oh. yeah. i'm not surprised... jd_ benched it a few years back https://julien.danjou.info/blog/2015/gnocchi-benchmarks | 14:48 |
gordc | that's roughly when we stopped working on ceilometer storage. | 14:48 |
Anticimex | yeah, and i see a first 10x blowup due to pg losing a bitmap scan to a sequential scan | 14:48 |
Anticimex | on a cross join (sample x sample) | 14:49 |
*** vint_bra has joined #openstack-telemetry | 14:49 | |
*** vint_bra has quit IRC | 14:50 | |
gordc | yep... 2+ years ago we realised that :P | 14:50 |
Anticimex | using resource queries to get summary data out, GET /v2/resources/2ce4315b-6b8a-4c6d-bd61-7bd9cf002ebd , so the resulting pg select that blows up on those | 14:50 |
Anticimex | right | 14:51 |
Anticimex | and a little bit of psql tweaking and it can be reduced 10x :) | 14:51 |
Anticimex | i'll get back to chan later today with a quick example of what i mean | 14:51 |
gordc | cool. would be good to learn (although this won't merge because we've deprecated... and openstack is pushing a mysql agenda) | 14:52 |
Anticimex | yeah, will happily keep it outside openstack if i go forward with it. only have to maintain compatibility with the ceilometer dispatcher interface | 14:53 |
Anticimex | seems to me mostly to be monty who's pushing a mysql agenda :-) | 14:53 |
gordc | the pgsql clique is not strong in openstack :P | 14:54 |
Anticimex | clearly | 14:54 |
gordc | if you do your pgsql wizardry and analyze gnocchi, that'd be great too :) | 14:54 |
Anticimex | someone else already did gnocchi analysis, don't need to redo his work | 14:55 |
gordc | you have a link? | 14:55 |
Anticimex | yeah trying to find, forgot its name, 1 sec | 14:56 |
gordc | cool cool, no rush. | 14:58 |
Anticimex | https://www.youtube.com/watch?v=aHaGipVcIJ4 | 15:01 |
Anticimex | was a good presentation, job well done by the guy IMO | 15:03 |
gordc | oh. that's jd_... and akrzos... agreed. really helping us with how to improve things. | 15:05 |
Anticimex | oh are they around in channel? | 15:06 |
akrzos | here | 15:06 |
akrzos | thank Anticimex | 15:06 |
akrzos | thanks* | 15:06 |
Anticimex | ideally if i try to hack together a psycopg2 dispatcher as per my thoughts above for ourselves, i could try to replicate the benchmarks against that | 15:06 |
Anticimex | and would save lots of time if the triple-o recipes were shared i think :) | 15:06 |
*** vint_bra has joined #openstack-telemetry | 15:07 | |
*** rcernin has quit IRC | 15:18 | |
*** shardy has quit IRC | 15:23 | |
*** Tamayo has joined #openstack-telemetry | 15:28 | |
*** sbadia has left #openstack-telemetry | 15:30 | |
jd_ | Anticimex: o/ | 15:40 |
*** r-daneel has joined #openstack-telemetry | 16:06 | |
*** thorst has joined #openstack-telemetry | 16:09 | |
*** donghao has quit IRC | 16:10 | |
*** yprokule has quit IRC | 16:17 | |
*** thorst has quit IRC | 16:23 | |
*** pcaruana has quit IRC | 16:24 | |
*** thorst has joined #openstack-telemetry | 16:26 | |
*** efoley_ has joined #openstack-telemetry | 16:33 | |
*** efoley has quit IRC | 16:37 | |
*** efoley__ has joined #openstack-telemetry | 16:52 | |
*** efoley_ has quit IRC | 16:55 | |
openstackgerrit | gordon chung proposed openstack/python-pankoclient master: switch to use non-legacy SessionClient https://review.openstack.org/465157 | 17:05 |
openstackgerrit | gordon chung proposed openstack/python-pankoclient master: move shell out of osc https://review.openstack.org/465725 | 17:05 |
openstackgerrit | gordon chung proposed openstack/python-pankoclient master: add panko shell https://review.openstack.org/465736 | 17:05 |
*** nicodemus_ has joined #openstack-telemetry | 17:08 | |
nicodemus_ | hello | 17:08 |
*** thorst has quit IRC | 17:09 | |
nicodemus_ | I'm observing an unusual behavior using gnocchi 3.1.1 | 17:10 |
nicodemus_ | when using CEPH backend, metricd starts processing quite fast but after a while each new computed metric takes longer | 17:10 |
*** donghao has joined #openstack-telemetry | 17:11 | |
nicodemus_ | http://paste.openstack.org/show/610585/ this paste shows processing time before and after the restart | 17:11 |
*** ricolin has quit IRC | 17:12 | |
gordc | nicodemus_: play with your filestore settings. that's what helped in my environment. | 17:14 |
gordc | disclaimer: i'm not a ceph expert... (not a ceph anything really) | 17:15 |
nicodemus_ | gordc, you mean CEPH settings? | 17:15 |
*** donghao has quit IRC | 17:15 | |
*** links has quit IRC | 17:16 | |
gordc | nicodemus_: yep | 17:17 |
gordc | nicodemus_: https://www.slideshare.net/GordonChung/gnocchi-profiling-v2/#17 those are configurations i tried... the bottom right is what i settled on | 17:18 |
nicodemus_ | gordc, Interesting... I'll give it a try and see what happens | 17:20 |
nicodemus_ | thanks! | 17:20 |
gordc | nicodemus_: kk, i don't know if there's public data from real ceph operators... that was just me googling and testing random values. | 17:21 |
*** efoley__ has quit IRC | 17:22 | |
*** thorst has joined #openstack-telemetry | 17:29 | |
*** toddnni has joined #openstack-telemetry | 17:33 | |
*** thorst has quit IRC | 17:33 | |
*** thorst has joined #openstack-telemetry | 17:33 | |
*** lhx__ has quit IRC | 17:59 | |
*** thorst has quit IRC | 18:00 | |
*** thorst has joined #openstack-telemetry | 18:19 | |
openstackgerrit | gordon chung proposed openstack/python-pankoclient master: switch to use non-legacy SessionClient https://review.openstack.org/465157 | 18:40 |
openstackgerrit | gordon chung proposed openstack/python-pankoclient master: add panko shell https://review.openstack.org/465736 | 18:40 |
openstackgerrit | gordon chung proposed openstack/python-pankoclient master: move shell out of osc https://review.openstack.org/465725 | 18:40 |
Anticimex | Here's a .sql file write-up of how to improve ceilometer postgresql backend GET /v2/resources/<resource_id> by 31000 X: https://gist.github.com/Millnert/53f471bdd7b173d09e15a60882082a78 | 18:42 |
Anticimex | pg <3 | 18:43 |
Anticimex | it becomes flat in time essentially, vs scaling with number of samples put in | 18:46 |
Anticimex | now.. to find where to put in this new query code in ceilometer-api backend driver thingaling | 18:47 |
gordc | cool, nice analysis. | 18:54 |
*** thorst has quit IRC | 19:02 | |
*** donghao has joined #openstack-telemetry | 19:13 | |
*** donghao has quit IRC | 19:17 | |
nicodemus_ | gordc, let me ask you one question... what exactly does the metric_processing_delay parameter do? Is it an interval for each metricd worker to wake up and see if there are any measures to process? | 19:19 |
nicodemus_ | The log is complaining about 'Metric processing lagging scheduling rate', but if I increase the number of workers, the load on the ceph cluster grows to a point in which the performance is even worse | 19:20 |
nicodemus_ | but then again, if I configure each worker to wake up every 60 seconds, the incoming measures will outpace metric processing | 19:22 |
gordc | nicodemus_: yeah the scheduling is kind of sketchy in gnocchi3. | 19:22 |
gordc | basically that option tells metricd every x seconds, grab a bunch of metrics for the workers to work on. | 19:24 |
gordc | the scheduling is not smart enough to hold back if there is still work to be done by next scheduling cycle | 19:25 |
gordc | for the most part, you can ignore that warning (especially for ceph since it actually has the worst scheduling logic). | 19:26 |
nicodemus_ | so, a high value would do more harm than good | 19:26 |
*** thorst has joined #openstack-telemetry | 19:26 | |
gordc | high value for metric_processing_delay? | 19:27 |
nicodemus_ | that is, if metricd is falling behind the amount of measures incoming | 19:27 |
nicodemus_ | yes, suppose I configure it for 60 seconds, I may end up with N workers idle | 19:27 |
gordc | yes, for ceph, if the measurements to metric ratio is high, the scheduling suffers in v3 | 19:27 |
nicodemus_ | until the next time the scheduler grabs metrics | 19:28 |
nicodemus_ | I see | 19:28 |
nicodemus_ | Is there a silver lining in gnocchi 4? | 19:28 |
gordc | yeah :) gnocchi4 has each worker figure out its own tasks rather than have a central scheduler trying to guess when to schedule more tasks for workers | 19:29 |
gordc | if you want a preview: https://www.slideshare.net/GordonChung/gnocchi-v4-preview | 19:30 |
*** thorst has quit IRC | 19:31 | |
nicodemus_ | gordc, nice! | 19:31 |
nicodemus_ | another reason to consider migrating to gnocchi 4 :) | 19:32 |
gordc | although i think you need to look at your ceph driver. i noticed how all your processing times are all >1s... that should be <10ms i think unless your archive policy is verbose. | 19:33 |
gordc | or i guess your ceph storage is not local. | 19:33 |
gordc | nicodemus_: if you want to hack gnocchi3, you can change: https://github.com/gnocchixyz/gnocchi/blob/stable/3.1/gnocchi/cli.py#L154 to something much larger. | 19:33 |
nicodemus_ | it's all on AWS, it's not the same performance as when I had my dedicated & shiny CEPH cluster | 19:34 |
gordc | ah i see. makes sense. | 19:34 |
nicodemus_ | I'll try to hack cli.py and see what happens | 19:36 |
nicodemus_ | thanks a lot! | 19:36 |
gordc | np | 19:38 |
nicodemus_ | gordc, just one more thing... is there an ETA for gnocchi v4? | 19:40 |
nicodemus_ | (I'm anxious now :P) | 19:40 |
gordc | nicodemus_: ask jd_ :) | 19:45 |
gordc | i think the features are in... just testing and fixes now i guess | 19:45 |
gordc | some time in june seems safe to say | 19:46 |
nicodemus_ | great! can't wait | 19:46 |
gordc | you can test it at github.com/gnocchixyz/gnocchi. was going to send a note to ML asking people to try it | 19:48 |
jd_ | Anticimex: is this 31s for the query? | 19:48 |
*** openstackgerrit has quit IRC | 19:48 | |
Anticimex | 31000 ms -> 1ms yes | 19:48 |
Anticimex | but more correct probably to say it's from O(n) (n = #samples) to O(0.00001*n) or O(1), depending on the btrees etc | 19:49 |
gordc | ... some reason i read that as 31min | 19:49 |
*** aagate has joined #openstack-telemetry | 19:49 | |
jd_ | nicodemus_: good question, I'm on PTO this week (I know, you can't tell) but i'll need to chat with gordc and sileht if we're ready or not; seems to me we are now, I hope they fix all the bugs by the time I come back | 19:50 |
jd_ | Anticimex: ok, I thought 31s was your optimization result :) good then! | 19:50 |
gordc | jd_: i hope they do too. | 19:50 |
Anticimex | jd_: aha, no.. :) | 19:50 |
jd_ | gordc: /nick they | 19:50 |
gordc | yep. they is a very good worker. | 19:51 |
*** openstackgerrit has joined #openstack-telemetry | 20:03 | |
openstackgerrit | gordon chung proposed openstack/python-pankoclient master: clean up utils https://review.openstack.org/467745 | 20:03 |
Anticimex | jd_: more accurately, O(n) -> O(log n) (had to refresh): http://bigocheatsheet.com/ | 20:16 |
openstackgerrit | gordon chung proposed openstack/python-pankoclient master: move shell out of osc https://review.openstack.org/465725 | 20:24 |
*** fguillot has quit IRC | 21:11 | |
*** donghao has joined #openstack-telemetry | 21:15 | |
*** dave-mccowan has quit IRC | 21:20 | |
*** donghao has quit IRC | 21:20 | |
*** pradk has quit IRC | 21:29 | |
*** pradk has joined #openstack-telemetry | 21:31 | |
*** thorst has joined #openstack-telemetry | 21:47 | |
*** rwsu has quit IRC | 22:15 | |
*** vint_bra has quit IRC | 22:18 | |
*** cdent has quit IRC | 22:18 | |
*** adriant has joined #openstack-telemetry | 22:20 | |
*** thorst has quit IRC | 22:23 | |
openstackgerrit | gordon chung proposed openstack/panko master: support uwsgi https://review.openstack.org/467796 | 22:47 |
openstackgerrit | Merged openstack/ceilometer stable/newton: Add support of refereshing the resource info in local cache https://review.openstack.org/467012 | 22:48 |
*** gordc has quit IRC | 22:52 | |
*** pradk has quit IRC | 22:57 | |
*** nicodemus_ has quit IRC | 23:01 |
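
Editorial sketch, not part of the log: Anticimex describes a lookup that the planner runs as a sequential scan (cost growing with the number of samples) and that becomes roughly O(log n) once an index is actually used. The snippet below illustrates only that general effect. It uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the `sample` table, its columns, and the index name `ix_sample_resource_id` are hypothetical, not the ceilometer schema or the queries from the linked gist.

```python
import sqlite3

# In-memory stand-in database; the schema below is hypothetical,
# not the real ceilometer "sample" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (id INTEGER PRIMARY KEY, resource_id TEXT, volume REAL)"
)
conn.executemany(
    "INSERT INTO sample (resource_id, volume) VALUES (?, ?)",
    [(f"res-{i % 100}", float(i)) for i in range(10_000)],
)

QUERY = "SELECT volume FROM sample WHERE resource_id = ?"


def plan(connection, sql, params):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " / ".join(row[-1] for row in rows)


# Without an index the planner has to walk every row of the table.
plan_without_index = plan(conn, QUERY, ("res-7",))

# With a b-tree index on the filter column it can search instead of scan.
conn.execute("CREATE INDEX ix_sample_resource_id ON sample (resource_id)")
plan_with_index = plan(conn, QUERY, ("res-7",))

print(plan_without_index)
print(plan_with_index)
```

The first plan should report a full table scan and the second a search through the index; the exact wording differs between SQLite versions. On PostgreSQL the comparable tool is `EXPLAIN` / `EXPLAIN ANALYZE`, which is presumably what the planner analysis in the conversation relied on.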