openstackgerrit | gordon chung proposed openstack/gnocchi master: WIP bucketise incoming https://review.openstack.org/441389 | 00:05 |
---|---|---|
*** gordc has quit IRC | 00:12 | |
*** catintheroof has quit IRC | 00:31 | |
*** iceyao has joined #openstack-telemetry | 00:38 | |
*** zhurong has joined #openstack-telemetry | 01:15 | |
*** gongysh has joined #openstack-telemetry | 01:49 | |
*** thorst has quit IRC | 01:58 | |
*** lhx__ has joined #openstack-telemetry | 01:58 | |
*** vint_bra has joined #openstack-telemetry | 02:10 | |
*** vint_bra has left #openstack-telemetry | 02:12 | |
*** thorst has joined #openstack-telemetry | 02:21 | |
*** thorst has quit IRC | 02:23 | |
*** g3ek has quit IRC | 02:54 | |
*** g3ek has joined #openstack-telemetry | 03:01 | |
*** chlong_ has joined #openstack-telemetry | 03:15 | |
*** chlong has quit IRC | 03:15 | |
*** thorst has joined #openstack-telemetry | 03:24 | |
*** thorst has quit IRC | 03:28 | |
*** oomichi has quit IRC | 03:30 | |
*** oomichi has joined #openstack-telemetry | 03:33 | |
*** rbak has joined #openstack-telemetry | 03:48 | |
*** g3ek has quit IRC | 03:59 | |
*** thorst has joined #openstack-telemetry | 04:00 | |
*** zhurong has quit IRC | 04:01 | |
*** thorst has quit IRC | 04:03 | |
*** g3ek has joined #openstack-telemetry | 04:05 | |
*** rbak has quit IRC | 04:18 | |
*** links has joined #openstack-telemetry | 04:41 | |
*** iceyao has quit IRC | 04:53 | |
*** thorst has joined #openstack-telemetry | 05:04 | |
*** dhellman_ has joined #openstack-telemetry | 05:17 | |
*** iceyao has joined #openstack-telemetry | 05:18 | |
*** thorst has quit IRC | 05:19 | |
*** iceyao has quit IRC | 05:23 | |
*** dhellman_ has quit IRC | 05:24 | |
*** gongysh has quit IRC | 05:31 | |
*** gongysh has joined #openstack-telemetry | 05:32 | |
*** adriant has quit IRC | 05:39 | |
*** nadya has joined #openstack-telemetry | 05:39 | |
*** Andrew_jedi has joined #openstack-telemetry | 05:51 | |
*** Jack_Iv has joined #openstack-telemetry | 05:59 | |
*** yprokule has joined #openstack-telemetry | 06:00 | |
*** nadya has quit IRC | 06:12 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/panko master: Imported Translations from Zanata https://review.openstack.org/442293 | 06:14 |
*** thorst has joined #openstack-telemetry | 06:16 | |
*** thorst has quit IRC | 06:20 | |
*** lhx__ has quit IRC | 06:21 | |
*** lhx__ has joined #openstack-telemetry | 06:22 | |
*** nadya has joined #openstack-telemetry | 06:25 | |
*** g3ek has quit IRC | 06:32 | |
*** g3ek has joined #openstack-telemetry | 06:41 | |
*** pcaruana has joined #openstack-telemetry | 06:43 | |
*** Jack_Iv has quit IRC | 06:46 | |
*** Jack_Iv has joined #openstack-telemetry | 06:48 | |
*** nadya has quit IRC | 06:50 | |
*** Andrew_jedi has quit IRC | 06:53 | |
*** Jack_Iv has quit IRC | 06:56 | |
*** Gautam has joined #openstack-telemetry | 07:00 | |
*** Andrew_jedi has joined #openstack-telemetry | 07:11 | |
*** thorst has joined #openstack-telemetry | 07:17 | |
*** iceyao has joined #openstack-telemetry | 07:19 | |
*** thorst has quit IRC | 07:21 | |
*** nadya has joined #openstack-telemetry | 07:23 | |
*** iceyao has quit IRC | 07:24 | |
*** nadya has quit IRC | 07:33 | |
*** rcernin has joined #openstack-telemetry | 07:40 | |
*** tesseract has joined #openstack-telemetry | 07:43 | |
*** lhx__ has quit IRC | 07:53 | |
*** lhx__ has joined #openstack-telemetry | 07:53 | |
*** dschultz has quit IRC | 07:54 | |
*** donghao has joined #openstack-telemetry | 08:05 | |
*** donghao has quit IRC | 08:11 | |
*** thorst has joined #openstack-telemetry | 08:18 | |
*** chlong_ has quit IRC | 08:18 | |
*** thorst has quit IRC | 08:22 | |
*** Andrew_jedi has left #openstack-telemetry | 08:23 | |
*** Gautam has quit IRC | 08:23 | |
*** sanchitmalhotra has quit IRC | 08:27 | |
*** sanchitmalhotra has joined #openstack-telemetry | 08:28 | |
*** Jack_I has joined #openstack-telemetry | 08:29 | |
*** amoralej|off is now known as amoralej | 08:31 | |
*** shardy has joined #openstack-telemetry | 08:43 | |
*** sudipto has joined #openstack-telemetry | 08:50 | |
*** sudipto_ has joined #openstack-telemetry | 08:50 | |
*** dschultz has joined #openstack-telemetry | 08:55 | |
*** Gautam has joined #openstack-telemetry | 08:55 | |
*** dschultz has quit IRC | 08:59 | |
*** openstackgerrit has quit IRC | 09:03 | |
*** thorst has joined #openstack-telemetry | 09:18 | |
*** thorst has quit IRC | 09:23 | |
*** thorst has joined #openstack-telemetry | 09:40 | |
*** thorst has quit IRC | 09:44 | |
*** flwang1 has joined #openstack-telemetry | 10:06 | |
flwang1 | jd_: ping | 10:08 |
flwang1 | how can i get the sample id when i'm using 'sample-list'? thanks | 10:08 |
*** lhx__ has quit IRC | 10:20 | |
*** lhx_ has joined #openstack-telemetry | 10:20 | |
*** openstackgerrit has joined #openstack-telemetry | 10:40 | |
openstackgerrit | liusheng proposed openstack/python-pankoclient master: Modify the doc descriptions of pankoclient https://review.openstack.org/441848 | 10:40 |
*** thorst has joined #openstack-telemetry | 10:40 | |
*** thorst has quit IRC | 10:45 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/aodh master: Imported Translations from Zanata https://review.openstack.org/441876 | 10:46 |
*** Jack_Iv has joined #openstack-telemetry | 10:51 | |
*** Jack_Iv has quit IRC | 10:53 | |
*** Jack_Iv has joined #openstack-telemetry | 10:53 | |
*** donghao has joined #openstack-telemetry | 10:54 | |
*** Jack_Iv has quit IRC | 10:55 | |
*** Jack_Iv has joined #openstack-telemetry | 10:55 | |
*** donghao has quit IRC | 11:00 | |
*** Gautam has quit IRC | 11:03 | |
*** Gautam has joined #openstack-telemetry | 11:04 | |
*** zhurong has joined #openstack-telemetry | 11:07 | |
*** Gautam has quit IRC | 11:08 | |
*** masber has quit IRC | 11:09 | |
*** masber has joined #openstack-telemetry | 11:10 | |
*** gongysh has quit IRC | 11:16 | |
*** catintheroof has joined #openstack-telemetry | 11:20 | |
*** cdent has joined #openstack-telemetry | 11:21 | |
*** iceyao has joined #openstack-telemetry | 11:22 | |
*** iceyao has quit IRC | 11:27 | |
*** g3ek has quit IRC | 11:31 | |
*** Gautam has joined #openstack-telemetry | 11:38 | |
*** g3ek has joined #openstack-telemetry | 11:40 | |
*** thorst has joined #openstack-telemetry | 11:41 | |
*** Gautam has quit IRC | 11:42 | |
*** thorst has quit IRC | 11:46 | |
*** dschultz has joined #openstack-telemetry | 11:57 | |
*** vint_bra has joined #openstack-telemetry | 11:58 | |
*** vint_bra has quit IRC | 11:59 | |
*** dschultz has quit IRC | 12:02 | |
flwang1 | jd_: sileht: ping | 12:05 |
flwang1 | we're running into a problem of instance status | 12:05 |
flwang1 | after shelve/unshelve, the instance is always showing active in the samples | 12:06 |
flwang1 | did you see this before? | 12:06 |
*** Jack_Iv has quit IRC | 12:17 | |
*** gkadam has quit IRC | 12:25 | |
*** catinthe_ has joined #openstack-telemetry | 12:29 | |
*** catintheroof has quit IRC | 12:29 | |
*** david-lyle has quit IRC | 12:30 | |
*** catintheroof has joined #openstack-telemetry | 12:30 | |
*** catinthe_ has quit IRC | 12:31 | |
*** thorst has joined #openstack-telemetry | 12:33 | |
*** david-lyle has joined #openstack-telemetry | 12:33 | |
*** shardy has quit IRC | 12:34 | |
*** zhurong has quit IRC | 12:35 | |
*** gkadam has joined #openstack-telemetry | 12:36 | |
*** Jack_Iv has joined #openstack-telemetry | 12:36 | |
*** gkadam has quit IRC | 12:43 | |
*** lhx_ has quit IRC | 12:54 | |
*** iceyao has joined #openstack-telemetry | 12:57 | |
*** dschultz has joined #openstack-telemetry | 13:01 | |
*** gongysh has joined #openstack-telemetry | 13:06 | |
*** gordc has joined #openstack-telemetry | 13:16 | |
*** dschultz has quit IRC | 13:17 | |
gordc | jd_: added some items to https://etherpad.openstack.org/p/gnocchi-incoming-bucket-scheduler to get details on non-metricd bucket handling | 13:24 |
gordc | do you need me to add my proposal? | 13:24 |
*** amoralej is now known as amoralej|lunch | 13:24 | |
*** links has quit IRC | 13:25 | |
*** Gautam has joined #openstack-telemetry | 13:28 | |
sileht | gordc, you haven't tested the latest kombu on windows, seriously ? ;) | 13:33 |
openstackgerrit | Merged openstack/gnocchi master: cleanup unused var https://review.openstack.org/442864 | 13:34 |
jd_ | gordc: I'll reply inline, I think I understood your current code/proposal though so no need to write one, but if you find flaws in my mind let me know :) | 13:35 |
gordc | sileht: i know. i should stop being so lazy :P | 13:44 |
gordc | jd_: ack | 13:44 |
*** lhx_ has joined #openstack-telemetry | 13:46 | |
*** Jack_Iv has quit IRC | 13:50 | |
gordc | jd_: added notes. | 13:50 |
*** sileht has quit IRC | 13:51 | |
*** donghao has joined #openstack-telemetry | 13:54 | |
*** sileht has joined #openstack-telemetry | 13:55 | |
*** efoley has joined #openstack-telemetry | 13:57 | |
*** sileht has quit IRC | 13:59 | |
*** zhurong has joined #openstack-telemetry | 14:01 | |
*** sileht has joined #openstack-telemetry | 14:04 | |
*** sileht has quit IRC | 14:04 | |
jd_ | gordc: me too :) | 14:07 |
*** sileht has joined #openstack-telemetry | 14:08 | |
*** sileht has quit IRC | 14:08 | |
*** sileht has joined #openstack-telemetry | 14:10 | |
*** sileht has quit IRC | 14:11 | |
*** nadya has joined #openstack-telemetry | 14:12 | |
openstackgerrit | Merged openstack/aodh master: Switch to use stable data_utils https://review.openstack.org/442783 | 14:12 |
*** shardy has joined #openstack-telemetry | 14:13 | |
*** pradk has quit IRC | 14:17 | |
*** dave-mccowan has joined #openstack-telemetry | 14:20 | |
*** sileht has joined #openstack-telemetry | 14:26 | |
*** amoralej|lunch is now known as amoralej | 14:26 | |
gordc | jd_: reply added. | 14:26 |
gordc | i think i'll just push the indexing stuff to another patch for now. | 14:26 |
*** nadya has quit IRC | 14:27 | |
gordc | be easier to review anyways | 14:27 |
*** iceyao has quit IRC | 14:32 | |
jd_ | why doing PTG when you can spend hours debating design with gordc on an Etherpad :p | 14:34 |
gordc | jd_: this is very efficient. it's like communicating with mars. type message, send, wait 15 minutes, read, repeat :) | 14:36 |
jd_ | haha | 14:37 |
jd_ | TBH i am not sure it's worse than me trying to explain this in oral English real time with a time constraint of 40 minutes :p | 14:37 |
gordc | true. i cannot focus 40mins anyways. | 14:37 |
*** g3ek has quit IRC | 14:37 | |
* gordc is not listening at design summit | 14:38 | |
*** gongysh has quit IRC | 14:42 | |
*** nadya has joined #openstack-telemetry | 14:42 | |
*** g3ek has joined #openstack-telemetry | 14:47 | |
*** sileht has quit IRC | 14:47 | |
*** chlong_ has joined #openstack-telemetry | 14:48 | |
*** rbak has joined #openstack-telemetry | 14:53 | |
*** sileht has joined #openstack-telemetry | 14:54 | |
*** sileht has quit IRC | 14:54 | |
*** efoley has quit IRC | 14:55 | |
*** jahsis has joined #openstack-telemetry | 15:01 | |
jahsis | Hello all, I installed newton gnocchi on ubuntu 16.04, but when I try to run gnocchi-api, I receive error: | 15:02 |
jahsis | gnocchi-api: error: unrecognized arguments: --config-file=/etc/gnocchi/gnocchi.conf --log-file=/var/log/gnocchi/gnocchi-api.log | 15:02 |
jahsis | anyone had same issue and know how to solve it? | 15:02 |
*** fguillot has joined #openstack-telemetry | 15:05 | |
*** zhurong has quit IRC | 15:05 | |
*** sileht has joined #openstack-telemetry | 15:13 | |
*** pradk has joined #openstack-telemetry | 15:13 | |
gordc | jahsis: http://gnocchi.xyz/running.html#running-as-a-wsgi-application | 15:13 |
jahsis | gordc, thanks, but I think that the ubuntu package shouldn't have a gnocchi-api service. Looks like an issue in the ubuntu package. | 15:17 |
gordc | ah, i see | 15:19 |
jd_ | gordc: I replied again but I think I'm starting to see where we diverge mainly | 15:20 |
*** Gautam has quit IRC | 15:20 | |
jd_ | gordc: it seems you think that computing a hash is a slow operation | 15:20 |
jd_ | whereas it should be considered almost as a noop, especially compared to database CRUD | 15:20 |
gordc | jd_: partly yes, but mainly storing the index so we can change number of buckets whenever we want. | 15:21 |
jd_ | I replied to that also | 15:21 |
gordc | well again. the db CRUD is done regardless | 15:21 |
jd_ | this is a myth you built I think | 15:21 |
jd_ | there is no need to change the number of buckets | 15:21 |
jd_ | consistent hashing is exactly what people built to avoid all the problem you want to implement :p | 15:22 |
jd_ | e.g. rebalancing | 15:22 |
jd_ | also I hate encoding, trying to fix the tooz bug you found :( | 15:23 |
gordc | jd_: this is fair. it's one thing to say we don't allow changing buckets (then i'm ok with no indexing reqs) | 15:24 |
gordc | but if we allow them to change buckets, then you run into issue how we guarantee all measures are processed | 15:25 |
jd_ | definitely | 15:26 |
jd_ | I think I wrote something at some point saying that it's likely to be doable and easy if every bucket is empty, that could be a first step if we want to implement that one day, but with consistent hashing, changing the number of buckets is to be considered a maintenance operation IMHO | 15:27 |
gordc | i think i mention this in gerrit, but the indexing part is basically just for dynamic bucket changes... and since we have it, it allows us to not compute hash constantly (and add zero additional db calls) | 15:28 |
jd_ | but I think it's totally fine to say that it's not supported for now and provide a good guidance on the default number of buckets you need | 15:28 |
gordc | jd_: if we don't support changing buckets, i don't think we need hashring either | 15:29 |
jd_ | gordc: ? | 15:29 |
gordc | it is much simpler to take metric.id mod buckets to find where to send it | 15:30 |
gordc | we will never need to rebalance | 15:30 |
jd_ | this is what I wrote gordc | 15:30 |
gordc | (because we can't) | 15:30 |
gordc | right. | 15:30 |
jd_ | the hashring it used for mapping metricd-workers -> buckets | 15:31 |
jd_ | (via tooz partitioner) | 15:31 |
jd_ | s/it/is/ | 15:31 |
gordc | but there's sorting into buckets step (from api pov) and assigning buckets to metricd pov | 15:31 |
gordc | i don't think it's useful for metricd-workers -> buckets either... i don't see how it's more useful than just dividing them equally? | 15:32 |
gordc | ie if i have 3 metricd, at start, a -> [1,2], b -> [3,4], c->[5,6] | 15:34 |
jd_ | how do you do this mapping then? | 15:35 |
jd_ | how does a know it has to pick 1 and 2 ? | 15:35 |
gordc | if b leaves, it's very simple to just have take a -> 3, c -> 4, | 15:35 |
gordc | you know how many buckets there are. | 15:35 |
gordc | you know how many metricd there are | 15:35 |
*** links has joined #openstack-telemetry | 15:36 | |
gordc | jd_: https://review.openstack.org/#/c/441389/4/gnocchi/cli.py if you look at line 170 | 15:37 |
gordc | hmm. i think there might be some concurrency issue in the code. ignore that :) | 15:40 |
jd_ | yeah that works too, though the rebalancing is going to be much heavier | 15:42 |
*** andreaf has joined #openstack-telemetry | 15:42 | |
jd_ | every metricd is going to lose some of its sacks and gain new sacks to take care of | 15:42 |
*** yprokule has quit IRC | 15:42 | |
jd_ | whereas the hashring would limit that IIUC | 15:43 |
jd_ | (not saying it's particularly a huge problem in metricd's context though) | 15:43 |
andreaf | sileht, jd_, gordc: do you have any idea about what may be wrong with the grenade job on https://review.openstack.org/#/c/439414 ? Do you see that in other patches? | 15:43 |
gordc | sure, but it doesn't have to make any new connections or anything? it's basically, interval1-> look at folder1, folder2. interval2 -> look at folder3, folder4? | 15:44 |
jd_ | andreaf: I think gordc fixed that but we're blocked by requirements | 15:44 |
jd_ | andreaf: IIUC | 15:44 |
jd_ | gordc: I didn't get your interval thing | 15:44 |
andreaf | jd_: heh ok, good to know, I will stop rechecking | 15:44 |
andreaf | jd_: thanks | 15:44 |
gordc | andreaf: (i think) i fixed it :) | 15:45 |
andreaf | gordc: ok another recheck coming then | 15:45 |
gordc | andreaf: https://review.openstack.org/#/c/442913/ | 15:45 |
gordc | well it's not merged | 15:45 |
catintheroof | jd_: hi! do you have an example on how the coordinator url should be configured on gnocchi if i want to use FILE coordinator ? | 15:45 |
gordc | we're blocked by some other pbr patch | 15:45 |
jd_ | catintheroof: file:///some/directory | 15:45 |
gordc | jd_: so metricd has processing_interval, basically every 60s, it will run through its buckets and dump all the metrics_with_measures | 15:46 |
catintheroof | jd_: thx ! | 15:46 |
gordc | if my buckets change, the next interval, i just dump different buckets. | 15:46 |
andreaf | gordc: doh, right | 15:46 |
andreaf | gordc: I was too quick, sorry | 15:46 |
gordc | i think the rebalancing is important if each of those buckets required a separate connection, or if we had some localise stuff | 15:47 |
gordc | this is my understanding | 15:47 |
jd_ | gordc: if a scheduler rebalances too much (imagine an extreme case where 50% of it changes) then it will add measures in its Queue that are already in another Queue, putting a lot of workers on contention because they will try to do the same thing | 15:48 |
jd_ | gordc: I'm not saying it's going to happen every 5 minutes lol but that might happen | 15:49 |
jd_ | gordc: the hashring has the good idea to solve that IIRC for free so | 15:49 |
jd_ | if it's free… :P | 15:49 |
gordc | jd_: ah, yes, in that case. | 15:50 |
gordc | *shrugs* i'll add a note. we can't change buckets anyways | 15:50 |
gordc | :P | 15:50 |
jd_ | it also provides replicas, which might be useful at some point? I wonder | 15:51 |
gordc | maybe, if you change the model of single scheduler/metricd | 15:52 |
gordc | but that just gives you the contentions i would think. | 15:52 |
*** sudipto has quit IRC | 15:52 | |
jd_ | right | 15:52 |
*** sudipto_ has quit IRC | 15:52 | |
jd_ | my main point is that I really don't want to involve the indexer in any of this as tooz ought to be enough | 15:53 |
jd_ | I'd be willing to remove as much as possible access to the indexer even if what we have currently (as you mentioned the AP details) | 15:54 |
gordc | lol. i think you're going to end up with a lot of duplicate data in indexer/storage... and really really long names to parse | 15:55 |
*** jahsis has quit IRC | 15:56 | |
jd_ | long names to parse? | 15:56 |
gordc | well you might need to store all that archive info in the object names | 15:58 |
gordc | i don't know ... just random musings | 15:58 |
*** rcernin has quit IRC | 15:58 | |
sileht | ceph doesn't support long name | 16:04 |
sileht | if ext4 is used 256 chars max I think | 16:04 |
*** donghao has quit IRC | 16:06 | |
gordc | sileht: i'll let you folks figure it out :) | 16:10 |
gordc | jd_: so i have one minor concern, where are we specifying total buckets? in conf? | 16:11 |
gordc | this might get really buggy, if all the conf files don't have same value... | 16:12 |
gordc | i don't know how big an issue this is. | 16:13 |
sileht | gordc, we can share the conf value through the coordinator capability and raise a warning in the log if misconfigured | 16:14 |
jd_ | conf file is ok gordc , they have to be identical everywhere | 16:15 |
jd_ | gnocchi does not work if you specify different database either, it's no different… :) | 16:15 |
sileht | haha :) | 16:16 |
gordc | jd_: have you tried? you have no proof it doesn't work :P | 16:16 |
jd_ | gordc: true. | 16:17 |
jd_ | it'll work with quantum computing, but not yet | 16:17 |
gordc | did we figure out why integration gate still fails randomly? the console shows no alarm, instance, stack | 16:20 |
gordc | http://logs.openstack.org/95/442395/1/gate/gate-telemetry-dsvm-integration-gnocchi-ubuntu-xenial/90640b3/console.html#_2017-03-08_02_48_38_888924 | 16:20 |
gordc | some reason gnocchi/panko return stuff fine. | 16:20 |
*** Kevin_Zheng has quit IRC | 16:23 | |
*** nadya has quit IRC | 16:26 | |
jd_ | gordc: no idea :( | 16:30 |
*** dschultz has joined #openstack-telemetry | 16:40 | |
*** nicodemus_ has joined #openstack-telemetry | 16:40 | |
nicodemus_ | hello | 16:41 |
nicodemus_ | I'm doing some testing with gnocchi stable/3.1 with S3 backend, and gnocchi-metricd is showing a repeating error: http://paste.openstack.org/show/601960/ | 16:43 |
nicodemus_ | I'm not quite sure if it's regarding S3, or perhaps the coordination... has anyone seen this error? | 16:44 |
gordc | i don't recall seeing that, but it's not related to s3... the queue is between scheduler worker giving groups metric_ids to processing workers | 16:48 |
*** links has quit IRC | 16:55 | |
openstackgerrit | Merged openstack/gnocchi master: simplify swift report https://review.openstack.org/441956 | 16:57 |
nicodemus_ | gordc, so the coordination looks like a more plausible culprit | 17:01 |
*** cdent has quit IRC | 17:01 | |
*** nadya has joined #openstack-telemetry | 17:03 | |
*** lhx_ has quit IRC | 17:16 | |
jd_ | nicodemus_: what's your coordination? | 17:18 |
*** tesseract has quit IRC | 17:18 | |
jd_ | I never saw that error either | 17:18 |
*** rbak has quit IRC | 17:20 | |
*** magicboiz has joined #openstack-telemetry | 17:20 | |
nicodemus_ | jd_, it's Redis, but since it's on AWS I'm not 100% sure it's deployed properly. I'm double-checking whether using file for coordination changes anything | 17:20 |
magicboiz | Hi, has anyone seen the gnocchi/ceilometer error "Failed to connect to db, purpose metering retry later: 'NoneType' object has no attribute 'find'" before? http://paste.openstack.org/show/601901/ | 17:21 |
*** rbak has joined #openstack-telemetry | 17:21 | |
jd_ | magicboiz: lol yes it happens if you don't set any database url for ceilometer api | 17:21 |
gordc | jd_: :/ so i'm trying it out right now... if we change buckets, we don't lose measures. we just have stale measures that are visible but can never be processed/removed | 17:23 |
*** vint_bra has joined #openstack-telemetry | 17:24 | |
gordc | which is worse. | 17:24 |
magicboiz | jd_:: in my config I have set something like: | 17:24 |
magicboiz | [dispatcher_gnocchi] | 17:24 |
magicboiz | filter_service_activity = False | 17:24 |
magicboiz | url = <<http://x.x.x.x:gnocchi_api_port>> | 17:24 |
jd_ | gordc: I'm lacking context here :) | 17:24 |
magicboiz | jd_: you mean that? | 17:24 |
jd_ | magicboiz: so that means ceilometer agent will send data to gnocchi at this address if asked to do so | 17:24 |
jd_ | magicboiz: no I don't | 17:24 |
jd_ | magicboiz: I know documentation is sparse but did you take a look at it first? it can help understanding how things work | 17:25 |
magicboiz | jd_: yes I did, believe me. And also blogs, etc. ;) | 17:25 |
gordc | magicboiz: ceilometer-api is not relevant if you have gnocchi. | 17:26 |
magicboiz | jd_: actually, I'm trying to debug why kolla (official openstack project) deploys ceilometer with gnocchi as backend and it fails.... | 17:26 |
magicboiz | gordc: why? | 17:26 |
jd_ | you mean ceilometer (official openstack project) I imagine | 17:26 |
jd_ | :p | 17:26 |
jd_ | gordc: why??? | 17:27 |
magicboiz | jd_: no, I mean kolla: https://docs.openstack.org/developer/kolla/ | 17:27 |
gordc | jd_: magicboiz: um because it's not?lol | 17:27 |
gordc | gnocchi is an alternative to ceilometer+mongodb not an alternative to mongodb | 17:28 |
jd_ | magicboiz: do you mean kolla or kolla (official openstack project)? | 17:28 |
jd_ | ok jk | 17:28 |
magicboiz | gordc: is it not possible to set up gnocchi as backend for ceilometer, instead of using mongodb, while keeping ceilometer running? | 17:29 |
gordc | jd_: scheduler puts metric_ids to process on queue, not where it is. we compute that later. | 17:30 |
gordc | magicboiz: ceilometer is many parts. not just an api, agents still run | 17:31 |
gordc | they just write to gnocchi. | 17:31 |
jd_ | gordc: right that's probably not enough, you'd need to push bucket+metric+files | 17:31 |
magicboiz | according to https://docs.openstack.org/developer/ceilometer/architecture.html#storing-accessing-the-data, it is.... | 17:32 |
gordc | huh? | 17:32 |
gordc | if you notice, there is no ceilometer-api listed anywhere | 17:33 |
jd_ | I second that huh | 17:33 |
magicboiz | gordc: yes, I have ceilometer-agent-compute, collector, etc etc. But my last goal is to eliminate mongo from my deployment, while keeping ceilometer running... | 17:33 |
gordc | or if there is. it's wrong. | 17:33 |
jd_ | IIUC magicboiz wants ceilometer-api that uses gnocchi as a backend | 17:33 |
jd_ | but since that's not possible he is going to keep asking until we say how to do that :p | 17:33 |
gordc | magicboiz: https://docs.openstack.org/developer/ceilometer/architecture.html#high-level-architecture the pipeline pushes straight to gnocchi. | 17:34 |
magicboiz | gordc: ok, so why error "Failed to connect to db, purpose metering retry later: 'NoneType' object has no attribute 'find'"?? http://paste.openstack.org/show/601901/ | 17:35 |
*** nadya has quit IRC | 17:39 | |
gordc | jd_: i could push bucket+metric. but sigh... more changes :P | 17:40 |
jd_ | gordc: that can be a first change before anything? that'll help the further optimization idea anyway | 17:40 |
gordc | jd_: possibly. it just bothers me that they are connected but we're disconnecting them and having to deal with it in many places | 17:42 |
*** shardy has quit IRC | 17:45 | |
gordc | jd_: so, bucket+metric works when processing. but it's broken on delete. | 17:57 |
gordc | we actually can't figure out how to delete if bucket changes. | 17:58 |
gordc | (and unprocessed stuff remains) | 17:58 |
jd_ | why would bucket change? | 17:59 |
jd_ | gordc: ^ | 18:00 |
gordc | jd_: well it's a conf option and people are people | 18:01 |
jd_ | gordc: the number of bucket you mean? | 18:01 |
gordc | right | 18:01 |
jd_ | but it does not change, it's an init parameter and they are created on upgrade | 18:01 |
jd_ | you don't even have to put it in a conf I guess | 18:01 |
jd_ | gnocchi-upgrade --bucket 1024 | 18:02 |
jd_ | and voila | 18:02 |
gordc | basically, if it ever gets changed, there's a good chance there's going to be stuff that we can never figure out how to delete but will be visible | 18:02 |
jd_ | but … it does not change | 18:02 |
jd_ | we already decided that | 18:02 |
gordc | that's what i was planning :) but how would metricd know how many bucket there are? | 18:02 |
jd_ | also it does not work if the ceph pool is deleted you know | 18:02 |
jd_ | gordc: ls? | 18:03 |
gordc | it has to count buckets on start? | 18:03 |
jd_ | yeah | 18:03 |
jd_ | if it's slow we can store it in the storage driver | 18:03 |
jd_ | whatever | 18:03 |
gordc | this seems very fragile. | 18:05 |
jd_ | how so? | 18:05 |
gordc | well this is all dependent on user not ever changing bucket | 18:06 |
jd_ | facepalm | 18:06 |
jd_ | but it can't change bucket | 18:06 |
jd_ | everything depends on user not deleting the ceph pool or shutting the sql server too | 18:07 |
jd_ | lol | 18:07 |
jd_ | gnocchi-upgrade --bucket=32 then you create 32 containers in swift and you write 32 in an object so you don't have to list the buckets and done | 18:07 |
gordc | but that's catastrophic at least | 18:07 |
jd_ | next time you call upgrade it knows how many buckets there are | 18:07 |
jd_ | hahaha | 18:07 |
jd_ | so it's very fragile, but not enough, I get it | 18:08 |
jd_ | :P | 18:08 |
gordc | this will look like it's still working but there's just crap that you can't access but can see | 18:08 |
gordc | exactly :) | 18:08 |
jd_ | so IF the user connects to swift and manipulates containers it will fail indeed | 18:08 |
jd_ | same if it types random sql statements | 18:08 |
jd_ | :D | 18:08 |
gordc | well that's not through our api | 18:08 |
gordc | i'm just saying, my solution protects more stupidity | 18:09 |
gordc | :P | 18:09 |
jd_ | how so? if the user connects to the database and changes things, how does it work? | 18:11 |
jd_ | like the sack of a metric | 18:11 |
gordc | they wouldn't be connecting to db themselves? | 18:12 |
gordc | the idea was no matter what, index knows where it is writing to. whatever process that changes it, knows it has to cleanup the previous place after change. | 18:13 |
gordc | i'll push no-indexer-change patch soon. we can see if we need more idiot-proofing | 18:15 |
jd_ | gordc: so… they would not connect to db but they would connect to swift to create new bucket or change a file? | 18:15 |
jd_ | cmon :p | 18:16 |
jd_ | (change a file in swift I mean) | 18:16 |
gordc | huh? why would they connect to swift? | 18:16 |
jd_ | to change the buckets | 18:17 |
jd_ | since that's the only way to do it | 18:17 |
jd_ | [19:07:44] <jd_>gnocchi-upgrade --bucket=32 then you create 32 containers in swift and you write 32 in an object so you don't have to list the buckets and done | 18:17 |
gordc | you have an upgrade-agent compute new bucket location. if it changes, update bucket in indexer, process any old stuff in previous bucket, next... | 18:17 |
jd_ | if you do that then it's _impossible_ to change the bucket without doing things manually in the storage backend | 18:17 |
gordc | whys that? it's just not one step. | 18:18 |
jd_ | to create buckets? | 18:18 |
jd_ | what's not? | 18:19 |
gordc | gnocchi-upgrade already creates buckets? | 18:19 |
gordc | why would you need to do it manually? | 18:19 |
jd_ | no need | 18:19 |
jd_ | it's you inventing users that mess with buckets | 18:19 |
jd_ | so i'm trying to demonstrate what would be required to do so | 18:20 |
jd_ | mess with the buckets | 18:20 |
gordc | they run gnocchi-upgrade --bucket 32 and then gnocchi-upgrade --bucket 64 | 18:20 |
jd_ | "don't mess with the buckets boyzz 𝅘𝅥𝅮" | 18:20 |
gordc | that doesn't seem hard for user | 18:20 |
jd_ | gordc: ERROR | 18:20 |
jd_ | there are already 32 buckets | 18:20 |
jd_ | move on boyz | 18:21 |
jd_ | that's _easy_ no? | 18:21 |
jd_ | "don't mess with the buckets boyzz 𝅘𝅥𝅮" | 18:21 |
gordc | lol what if they underestimated target size? | 18:21 |
gordc | start over? | 18:21 |
jd_ | gordc: RTFM? | 18:21 |
jd_ | yep | 18:21 |
jd_ | as I said 3 or 4 times today, it's exactly what Swift did for a few years :) | 18:22 |
gordc | then why don't we just set it ourselves to a really big number? | 18:22 |
gordc | and make it constant | 18:22 |
jd_ | then it's not impossible to implement number of bucket change but it's not the first feature I'd do | 18:22 |
jd_ | gordc: I think it's a real option | 18:22 |
jd_ | having a default of 2^12 for example | 18:22 |
jd_ | which should be large enough for most people | 18:22 |
gordc | did i upload that patch? i had it as a constant. | 18:22 |
openstackgerrit | Julien Danjou proposed openstack/ceilometer master: agent: start coordinator at run() and never stops https://review.openstack.org/443267 | 18:25 |
openstackgerrit | Julien Danjou proposed openstack/ceilometer master: agent: only create partition coordinator if backend url provided https://review.openstack.org/443268 | 18:25 |
openstackgerrit | Julien Danjou proposed openstack/ceilometer master: coordination: create coordinator at init time https://review.openstack.org/443269 | 18:25 |
openstackgerrit | Julien Danjou proposed openstack/ceilometer master: coordination: stop checking for _coordinator to be None https://review.openstack.org/443270 | 18:25 |
openstackgerrit | Julien Danjou proposed openstack/ceilometer master: coordination: remove group_id check https://review.openstack.org/443271 | 18:25 |
openstackgerrit | Julien Danjou proposed openstack/ceilometer master: coordination: fix leave_group() async call https://review.openstack.org/443272 | 18:25 |
openstackgerrit | Julien Danjou proposed openstack/ceilometer master: coordination: make group_id to never be None https://review.openstack.org/443273 | 18:25 |
openstackgerrit | gordon chung proposed openstack/gnocchi master: WIP bucketise incoming https://review.openstack.org/441389 | 18:29 |
jd_ | 6 files changed, 71 insertions(+), 210 deletions(-) | 18:29 |
jd_ | the amount of useless code one can write | 18:29 |
jd_ | oneS | 18:29 |
openstackgerrit | Julien Danjou proposed openstack/ceilometer master: coordination: remove started check https://review.openstack.org/443277 | 18:31 |
flwang1 | gordc: | 18:33 |
flwang1 | (01:05:39) flwang1: we're running into a problem of instance status | 18:33 |
flwang1 | (01:06:06) flwang1: after shelve/unshelve, the instance is always showing active in the samples | 18:33 |
flwang1 | (01:06:11) flwang1: did you see this before? | 18:33 |
flwang1 | jd_: ^ | 18:33 |
flwang1 | thanks in advance | 18:33 |
gordc | flwang1: nope... but i also don't manage a (real) cloud | 18:35 |
flwang1 | gordc: ok, IIRC, instance metrics can be collected by notification and pollster, right? | 18:36 |
gordc | yes | 18:36 |
flwang1 | so will the notification impact the result collected by pollster? | 18:36 |
gordc | no | 18:36 |
*** cdent has joined #openstack-telemetry | 18:36 | |
gordc | or i don't know what you mean | 18:36 |
gordc | they both do their own thing. if you're asking if they generate same meters, then maybe, they will | 18:37 |
*** chlong_ has quit IRC | 18:37 | |
flwang1 | and how the pollster get the metadata of the instance? | 18:37 |
flwang1 | gordc: yep, i'm asking if they may generate same samples | 18:39 |
gordc | i imagine so. you can check admin-guide. there's a chart of meter's source | 18:39 |
flwang1 | gordc: ok, i see. the problem is very weird. i know a bit ceilometer i think, but currently the problem is totally out of my knowledge | 18:41 |
flwang1 | because the status collected by pollster is not correct | 18:41 |
flwang1 | ok, i will go through the code and bug you guys later, thanks a lot | 18:41 |
*** flwang1 has quit IRC | 18:41 | |
gordc | kk, bbl. going to get lunch | 18:41 |
*** rcernin has joined #openstack-telemetry | 18:42 | |
*** chlong_ has joined #openstack-telemetry | 18:53 | |
*** cdent has quit IRC | 19:03 | |
nicodemus_ | Does gnocchi stable/3.1 require a specific gnocchi dispatcher version? I'm using ceilometer mitaka, and for each POST gnocchi returns 404 but then the dispatcher doesn't create the resource... | 19:07 |
*** cdent has joined #openstack-telemetry | 19:08 | |
*** rwsu has quit IRC | 19:09 | |
*** pcaruana has quit IRC | 19:13 | |
openstackgerrit | Julien Danjou proposed openstack/ceilometer master: coordination: remove started check https://review.openstack.org/443277 | 19:15 |
jd_ | nicodemus_: it should not, check your gnocchiclient version too | 19:16 |
nicodemus_ | jd_, is there a minimum version of gnocchiclient needed for stable/3.1? | 19:17 |
jd_ | nicodemus_: hum the latest one would be recommended | 19:17 |
jd_ | there was also some change with how ID are encoded | 19:17 |
nicodemus_ | jd_, I'll give it a try with the last client then. Thanks! | 19:17 |
jd_ | cool | 19:18 |
*** Jack_I has quit IRC | 19:18 | |
nicodemus_ | jd_, with gnocchiclient==3.1.1 I have the same behavior... the strange thing is that the dispatcher doesn't give a clear error log, it simply says "Not found (HTTP 404)" after each POST to the gnocchi api | 19:22 |
*** g3ek has quit IRC | 19:32 | |
*** flwang1 has joined #openstack-telemetry | 19:39 | |
*** amoralej is now known as amoralej|off | 19:39 | |
*** cdent has quit IRC | 19:40 | |
*** g3ek has joined #openstack-telemetry | 19:41 | |
flwang | jd_: can you remind me how ceilometer get the nova instance list by polling? Thanks | 19:51 |
*** Jack_I has joined #openstack-telemetry | 19:51 | |
flwang | jd_: gordc: in other words, at this line https://github.com/openstack/ceilometer/blob/kilo-eol/ceilometer/compute/pollsters/instance.py#L25 where is the 'resources' coming from? | 19:52 |
nicodemus_ | I'm seeing that gnocchi api stable/3.0 used to send a text/plain reply while stable/3.1 answers with content-type application/json, is that correct? | 19:56 |
gordc | flwang: https://github.com/openstack/ceilometer/blob/kilo-eol/ceilometer/agent/base.py#L123 | 19:58 |
flwang | gordc: cool, i have to admit i have forgotten most of the ceilometer code :) | 19:59 |
flwang | btw, the instance metric is collected by central agent or compute agent? | 19:59 |
gordc | central. (i remember last time it came up, but just a reminder, it's not there anymore) | 20:00 |
flwang | yep, i know. we're using kilo | 20:01 |
gordc | kk | 20:01 |
gordc | it should be central. but i'm half guessing. 50/50 | 20:01 |
*** narasimha_SV has joined #openstack-telemetry | 20:03 | |
nicodemus_ | Apparently the dispatcher is not creating the resources because the exception it looks for before creating them carries a "resource not found" message, not a plain "Not found" message. My question is, under what circumstances could the gnocchi API reply with a "Not found" instead of a "Resource not found"? | 20:04 |
flwang | gordc: could you please let me know how the agent manager gets the instance list by discovery, https://github.com/openstack/ceilometer/blob/kilo-eol/ceilometer/agent/base.py#L137 | 20:15 |
flwang | sorry, i don't have much time to understand all the code, it's an urgent issue | 20:15 |
*** chlong_ has quit IRC | 20:16 | |
gordc | flwang: https://github.com/openstack/ceilometer/blob/eb970605d7a7263007f36136bf5ae052cf44984a/ceilometer/compute/discovery.py#L120 | 20:25 |
flwang | gordc: so for kilo, is it here https://github.com/openstack/ceilometer/blob/kilo-eol/ceilometer/compute/discovery.py#L40 ? | 20:27 |
*** chlong_ has joined #openstack-telemetry | 20:28 | |
flwang | seems it's collected by compute agent | 20:28 |
nicodemus_ | gnocchi's newton dispatcher doesn't seem to work with the latest gnocchi client :( | 20:28 |
flwang | can anybody confirm that? | 20:28 |
*** sergio_ has joined #openstack-telemetry | 20:28 | |
*** sergio_ is now known as Guest88534 | 20:29 | |
gordc | nicodemus_: maybe open a bug? i recall there being some changes in webob which changed some stuff | 20:30 |
nicodemus_ | newton's dispatcher tries to call 'encode_resource_id' from gnocchiclient's utils.py that is present on client 2.7 but not on 3.1.1 | 20:31 |
nicodemus_ | I'm confusing myself with so many versions | 20:32 |
gordc | ocata dispatcher is designed to work against gnocchiclient 3.1 | 20:33 |
gordc | and gnocchi3.1 | 20:33 |
gordc | so if you want gnocchi 3.1, try just taking the code from ocata | 20:33 |
nicodemus_ | gordc, oooh now that makes sense | 20:34 |
gordc | and if you can contribute that to docs, that's even better. | 20:34 |
nicodemus_ | I'd need to recall the whole process to contribute, but I think I can do that :) | 20:35 |
*** rcernin has quit IRC | 20:36 | |
gordc | nicodemus_: no pressure :) | 20:36 |
flwang | gordc: based on this https://github.com/openstack/ceilometer/blob/ffdb2977e36e99528b70540a0c83de04fb13ffd6/setup.cfg#L55 seems the instance metric is collected by compute agent | 20:37 |
nicodemus_ | Let me ask you just one more question before changing for ocata (not related): if I have a resource that has an 'ended_at' date, what would happen if the dispatcher tries to POST new measures for that resource? | 20:37 |
flwang | gordc: pls skip my last msg | 20:38 |
gordc | flwang: yes, seems so https://github.com/openstack/ceilometer/blob/stable/mitaka/ceilometer/compute/pollsters/instance.py | 20:38 |
nicodemus_ | flwang, I believe instance is collected by compute agent. In my case, I use agent-central to poll through SNMP in order to get measures from the hypervisors | 20:38 |
nicodemus_ | and agent-compute for the instances on each compute node | 20:39 |
flwang | gordc: then i think we're running into a weird bug | 20:39 |
gordc | nicodemus_: i imagine you still can. i don't think it changes status of metrics | 20:39 |
flwang | if the instance metric is collected by compute agent | 20:39 |
nicodemus_ | gordc, kk. Thanks! | 20:39 |
flwang | after the instance is shelved, it won't belong to any host technically | 20:39 |
flwang | did i miss anything? | 20:39 |
flwang | with that context, can the compute agent still get the instance which has been shelved? | 20:40 |
gordc | flwang: maybe? i don't know what happens when shelved | 20:40 |
gordc | no | 20:40 |
gordc | it only queries whatever instances nova tells us is on the host | 20:41 |
gordc | (a little different) in ocata | 20:41 |
gordc | still have no idea if can see 'shelved' | 20:41 |
flwang | ok, so now, can we confirm the 'instance' metric is collected by the compute agent instead of central agent? | 20:42 |
flwang | because i think it will impact the final result | 20:42 |
gordc | yes, compute. | 20:42 |
gordc | i mean, it won't matter since in kilo they would definitely be dependent on what nova tells us is on host | 20:43 |
*** adriant has joined #openstack-telemetry | 20:43 | |
*** Guest88534 has quit IRC | 20:46 | |
openstackgerrit | gordon chung proposed openstack/gnocchi master: push incoming into different sacks https://review.openstack.org/441389 | 20:49 |
flwang | gordc: yep, but if the instance is collected by central agent, then it doesn't make sense to use 'host' as the parameter | 21:02 |
jd_ | nicodemus_: ended_at is just an information field, you can post metric anyway | 21:11 |
nicodemus_ | got it. Thanks jd_ ! | 21:12 |
*** narasimha_SV has quit IRC | 21:13 | |
flwang | gordc: still around? | 21:18 |
flwang | jd_: gordc: i saw there is a discovery_cache https://github.com/openstack/ceilometer/blob/kilo-eol/ceilometer/agent/base.py#L135 | 21:19 |
flwang | so if last time i can see the instance, and this time when polling, the instance is gone, so will the cache be refreshed? | 21:19 |
*** thorst has quit IRC | 21:30 | |
*** thorst has joined #openstack-telemetry | 21:31 | |
*** thorst has quit IRC | 21:35 | |
nicodemus_ | I guess I know the answer, but.. is there any way of forcing gnocchi's URL in ceilometer ocata in the config file? I'm doing some testing and want to use an alternate gnocchi without having to change the endpoint in keystone | 21:36 |
gordc | nicodemus_: um... i imagine there's some endpoint_override param but i'm not entirely sure. | 21:49 |
*** thorst has joined #openstack-telemetry | 21:50 | |
gordc | stepping out. sorry. | 21:50 |
*** gordc has quit IRC | 21:50 | |
*** fguillot has quit IRC | 21:50 | |
openstackgerrit | gordon chung proposed openstack/gnocchi master: push incoming into different sacks https://review.openstack.org/441389 | 21:50 |
*** nicodemus_ has quit IRC | 21:53 | |
*** dave-mccowan has quit IRC | 22:13 | |
*** yassine has quit IRC | 22:13 | |
*** yassine has joined #openstack-telemetry | 22:21 | |
*** chlong_ has quit IRC | 22:24 | |
*** vint_bra has quit IRC | 22:28 | |
*** rwsu has joined #openstack-telemetry | 22:31 | |
*** catintheroof has quit IRC | 22:44 | |
flwang | jd_: ping | 22:53 |
flwang | any telemetry core around? | 22:54 |
*** thorst has quit IRC | 22:54 | |
*** thorst has joined #openstack-telemetry | 22:55 | |
flwang | pls tell me I'm wrong, for this line https://github.com/openstack/ceilometer/blob/kilo-eol/ceilometer/agent/base.py#L135 | 22:55 |
flwang | does that mean the resource list will be cached | 22:55 |
flwang | in other words, even the resource has been deleted, ceilometer will continually insert samples into db? | 22:56 |
*** thorst has quit IRC | 22:59 | |
*** iceyao has joined #openstack-telemetry | 23:04 | |
*** Jack_I has quit IRC | 23:06 | |
*** iceyao has quit IRC | 23:08 | |
*** yassine has quit IRC | 23:16 | |
*** thorst has joined #openstack-telemetry | 23:20 | |
*** thorst has quit IRC | 23:24 | |
*** catintheroof has joined #openstack-telemetry | 23:28 | |
*** g3ek has quit IRC | 23:30 | |
*** joadavis_ has joined #openstack-telemetry | 23:39 | |
*** g3ek has joined #openstack-telemetry | 23:39 | |
*** joadavis has joined #openstack-telemetry | 23:40 | |
*** joadavis_ has quit IRC | 23:41 | |
*** pradk has quit IRC | 23:55 | |
*** david-lyle has quit IRC | 23:56 |
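
The bucket discussion around 18:22 (a constant default such as 2^12, and the cost of changing the bucket count later) can be illustrated with a minimal sketch. This is not the code from the Gnocchi patch under review; `NUM_BUCKETS`, `bucket_for_metric` and `moved_fraction` are hypothetical names, and hashing the metric UUID with SHA-256 is an assumption made purely for illustration.

```python
import hashlib
import uuid

# Assumed constant: the 2^12 default floated in the discussion.
NUM_BUCKETS = 2 ** 12


def bucket_for_metric(metric_id: uuid.UUID, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a metric ID to a bucket index in [0, num_buckets)."""
    # Use a stable hash so every worker computes the same bucket;
    # Python's built-in hash() is not stable across processes.
    digest = hashlib.sha256(metric_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def moved_fraction(metric_ids, old_count: int, new_count: int) -> float:
    """Fraction of metrics whose bucket changes when the count changes."""
    moved = sum(
        1
        for m in metric_ids
        if bucket_for_metric(m, old_count) != bucket_for_metric(m, new_count)
    )
    return moved / len(metric_ids)


if __name__ == "__main__":
    ids = [uuid.uuid4() for _ in range(1000)]
    print("bucket of first metric:", bucket_for_metric(ids[0]))
    print("moved when doubling:", moved_fraction(ids, NUM_BUCKETS, NUM_BUCKETS * 2))
```

With plain modulo placement, changing the bucket count relocates a share of the metrics, which is one reason a fixed, generously sized default is simpler than supporting a bucket-count change from the start.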