15:00:10 <eglynn_> #startmeeting ceilometer
15:00:11 <openstack> Meeting started Thu Nov 27 15:00:10 2014 UTC and is due to finish in 60 minutes. The chair is eglynn_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:15 <openstack> The meeting name has been set to 'ceilometer'
15:00:39 <eglynn_> Happy Thanksgiving y'all :)
15:00:51 <eglynn_> who's all here for the ceilo meeting?
15:00:56 <llu-laptop> o/
15:01:04 <ityaptin> o/
15:01:07 <_elena_> o/
15:01:50 <idegtiarov> o/
15:01:57 <vrovachev> o/
15:02:00 <eglynn_> #topic "Kilo-1 blueprints"
15:02:09 <eglynn_> #link https://launchpad.net/ceilometer/+milestone/kilo-1
15:02:39 <sileht> o/
15:02:45 <gordc> o/
15:02:49 <eglynn_> a "blocked" blueprint is one for which the corresponding specs review hasn't yet landed
15:02:50 <Guest48080> o/
15:03:07 <eglynn_> but we expect it to do so, and the implementation work is progressing in parallel
15:03:44 <eglynn_> does anyone have any BPs other than the 10 listed in https://launchpad.net/ceilometer/+milestone/kilo-1 that they hope to get for k1?
15:04:20 <eglynn_> gordc: e.g. are you thinking about k1 for the notification agent co-ordination stuff?
15:04:49 <gordc> eglynn_: working on it now... i think i might need sileht's listener-pool patch but i'm not exactly sure yet
15:05:01 <gordc> figuring stuff out as i go along...
15:05:06 <eglynn_> gordc: cool enough, thanks!
15:05:20 <eglynn_> kilo-1 is Dec 18 BTW
15:05:43 <gordc> i'm going to say maybe
15:06:24 <eglynn_> gordc: coolness, we can add it to kilo-2 say for now and then bump forward to kilo-1 if all goes well?
15:07:29 <gordc> sure.
15:07:35 <eglynn_> cool
15:07:39 <gordc> sorry. in between meetings. :|
15:07:46 <eglynn_> np!
15:07:48 <eglynn_> anything else on kilo-1?
15:08:08 <gordc> nothing for me. i have an elasticsearch patch but i don't know how to test it currently.
15:08:44 <eglynn_> gordc: elasticsearch also possibly in play for k1, or more likely k2/3?
15:09:13 <gordc> probably same boat as notification... maybe k-1, probably k-2
15:09:19 <eglynn_> cool
15:09:29 <eglynn_> #topic first stable/juno release
15:09:35 <eglynn_> #link https://wiki.openstack.org/wiki/StableBranchRelease#Planned_stable.2Fjuno_releases
15:09:50 <eglynn_> this was planned for today, but pushed back to Dec 4th
15:09:52 <DinaBelova> o/ sorry for the delay, folks :)
15:10:05 <eglynn_> I've started doing backport for likely looking bugs
15:10:10 <eglynn_> *backports
15:10:48 <eglynn_> but if you've a particular bugfix you've landed for k1 that you think is a good backport candidate
15:11:04 <eglynn_> please tag it with "juno-backport-potential"
15:11:12 <DinaBelova> eglynn_, ok, thanks
15:11:40 <eglynn_> #topic "TSDaaS/gnocchi status"
15:11:57 <eglynn_> jd__: anything you want to bring to the table on gnocchi?
15:12:16 <eglynn_> #link https://review.openstack.org/#/q/status:open+project:stackforge/gnocchi,n,z
15:12:31 <eglynn_> progress on refining the RBAC policies
15:12:52 <eglynn_> also a nice patch exposing the back_window in the API
15:13:54 <DinaBelova> eglynn_, after I fix the merge agents chain, I'll start working on the opentsdb driver
15:14:20 <DinaBelova> and a new stackforge repo with the co-processor deletion thing for it :)
15:14:28 <DinaBelova> I hope to start tomorrow
15:14:31 <eglynn_> DinaBelova: cool :)
15:14:41 <eglynn_> DinaBelova: BTW I heard yahoo were evaluating opentsdb for their own internal metrics storage (... just a datapoint showing wide adoption)
15:14:59 <DinaBelova> hehe, nice :)
15:15:07 <eglynn_> DinaBelova: co-processor deletion == retention logic ?
15:15:13 <DinaBelova> eglynn_, yes
15:15:16 <eglynn_> cool
15:15:19 <DinaBelova> as I said, it'll be some Java code
15:15:23 <DinaBelova> but in a separate repo
15:15:27 <DinaBelova> so as not to confuse everyone
15:16:02 <_nadya_> DinaBelova: what about testing? any ideas :)?
15:16:09 <DinaBelova> _nadya_, well :)
15:16:30 <eglynn_> DinaBelova: could this be something that might be contrib'd back to the OpenTSDB folks?
15:16:38 <sileht> what about landing the dispatcher :p ?
15:16:39 <eglynn_> DinaBelova: ... as opposed to maintaining a separate repo
15:16:44 <DinaBelova> _nadya_, currently I wonder how to make integration tests possible
15:16:53 <DinaBelova> if it's possible at all for our OpenStack gate
15:16:58 <jd__> o/
15:17:09 <jd__> yeah I think you have a good summary eglynn_
15:17:12 <jd__> thanks
15:17:17 <eglynn_> jd__: np :)
15:17:20 <DinaBelova> eglynn_, probably that'll be the solution :)
15:17:21 <_nadya_> DinaBelova: TBH, I think it's impossible... we need at least HBase in devstack
15:17:26 <eglynn_> sileht: good point, I was blocked on glance not setting the user_id on samples
15:17:35 <DinaBelova> let's see if that'll be possible as a part of their retention param impl
15:17:43 <eglynn_> sileht: I need to give that dispatcher patch another run-through
15:17:44 <sileht> eglynn_, I have removed that check
15:17:46 <DinaBelova> _nadya_, yeah, I suppose so
15:17:54 <DinaBelova> although
15:17:59 <eglynn_> sileht: coolness, I'd imagine good to fly in that case
15:18:06 <eglynn_> sileht: will look again after this meeting
15:18:17 <DinaBelova> _nadya_, I still hope to see the HBase devstack change being merged
15:18:44 <sileht> eglynn_, and then we can fix the gnocchi DB schema when we land the first glance resource into gnocchi
15:19:05 <eglynn_> sileht: cool, that sounds reasonable
15:19:32 <jd__> since we have created_by_* now I'm less reluctant to allow null project_id/user_id
15:19:55 <eglynn_> jd__: yeah, good point
15:20:58 <llu-laptop> jd__: but we do have existing metrics with null project_id/user_id, how to handle them?
15:21:16 <jd__> I think that replies to your question llu-laptop, no?
15:22:12 <llu-laptop> we have snmp/ipmi and also metrics from SDN like opendaylight & opencontrail
15:22:33 <sileht> jd__, Should I accept null project_id too in the dispatcher ?
15:22:48 <ityaptin> DinaBelova: Also a co-processor may be a useful approach for Ceilometer HBase storage time-to-live.
15:22:50 <jd__> sileht: if we change the schema accordingly yeah
15:22:58 <DinaBelova> ityaptin, yes, for sure
15:23:43 <eglynn_> so the idea would be that the created_by_* gives a fallback on which to evaluate RBAC rules if the primary project *or* user ID is null, amiright?
15:25:00 <eglynn_> jd__: I'm resurrecting the Influx driver, was planning to start with a ceilo-specs proposal
15:25:07 <eglynn_> jd__: ... which brings up the question of whether the specs "process" applies in gnocchi-land?
15:25:31 <jd__> eglynn_: maybe, but for now I'd say, don't lose time with that?
15:25:33 <DinaBelova> eglynn_, I got the same question a day ago from nellysmitt
15:25:34 <DinaBelova> :)
15:25:45 <llu-laptop> sorry, maybe i'm out of context here. I think we are talking about the dispatcher to handle null project/user_id, like https://review.openstack.org/#/c/98798/66/gnocchi/ceilometer/dispatcher.py#Ln94
15:25:50 <jd__> like, what interesting could we discuss on an Influx spec?
15:25:53 <DinaBelova> she was wondering where to start her relationship with gnocchi
15:26:04 <DinaBelova> bugs/blueprints
15:26:09 <DinaBelova> some single place to work in
15:26:29 <DinaBelova> so in the future it'll probably be useful
15:26:50 <eglynn_> jd__: fair point :) ... I was thinking of using a spec to get all my ideas straight on mapping gnocchi semantics to influx features
15:26:51 <jd__> I always have a todo list
15:27:06 <jd__> DinaBelova: just send her on IRC to talk to me? :O
15:27:08 <jd__> :)
15:27:19 <DinaBelova> jd__, it was late :)
15:27:28 <DinaBelova> but anyway, I told her it's A GOOD IDEA :)
15:27:30 <jd__> eglynn_: well if you think you need to write something, yeah why not, I guess it depends on how you work and your degree of confidence
15:27:35 <DinaBelova> sorry for caps
15:27:55 <jd__> eglynn_: I just don't think it's a requirement for now… I prefer to see people spending time writing code rather than specs :)
15:28:05 <jd__> DinaBelova: ack :)
15:28:06 <eglynn_> jd__: yeah, understood
15:28:46 <eglynn_> llu-laptop: yes, sileht said he has, or is going to, relax that requirement to have (s['project_id'] and s['user_id'])
15:29:54 <eglynn_> llu-laptop: see https://review.openstack.org/#/c/98798/66..67/gnocchi/ceilometer/dispatcher.py
15:30:12 <eglynn_> llu-laptop: now only the project_id must be set
15:31:01 <eglynn_> move on?
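For context, the check being relaxed in the gnocchi ceilometer dispatcher looks roughly like the following; this is a minimal sketch with illustrative names, not the exact code from review 98798.

```python
# Minimal sketch of the sample check being relaxed in the gnocchi ceilometer
# dispatcher (review 98798); the function and helper names are illustrative,
# not the exact code from that patch.

def _is_dispatchable(sample):
    # Old behaviour: require both fields, which dropped glance samples
    # because glance does not set user_id.
    #   return bool(sample['project_id'] and sample['user_id'])
    #
    # Relaxed behaviour: only project_id is required; a null user_id is
    # acceptable now that the created_by_* attributes give RBAC a fallback.
    return bool(sample.get('project_id'))


def dispatchable(samples):
    """Filter out samples that cannot be attributed to a project."""
    return [s for s in samples if _is_dispatchable(s)]
```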
15:31:26 <eglynn_> #topic "Too many identical nova notifications"
15:31:31 <eglynn_> #link https://bugs.launchpad.net/ceilometer/+bug/1396257
15:31:33 <uvirtbot> Launchpad bug 1396257 in ceilometer "Redundant 'instance' samples from nova notifications" [Medium,Confirmed]
15:31:49 <ityaptin> Hi!
15:31:54 <ityaptin> Over the last 2 weeks many people have complained that the "instance" statistics show unexpected patterns.
15:32:00 <ityaptin> Right now we collect every nova notification with event_type "compute.instance.*" as an "instance" sample. This behaviour generates a lot of unexpected samples which badly distort the 'instance' statistics.
15:32:01 <eglynn_> so is the key point that we're now triggering off *both* the start & end events?
15:32:11 <ityaptin> For the following workflow: create instance, suspend, add floating ip, terminate instance, nova sends 28 notifications.
15:32:15 <ityaptin> We collect them all!
15:32:27 <ityaptin> There are 16 compute.instance.update messages for minor tasks which, I think, are not important for metering.
15:32:51 <ityaptin> I prepared a paste with some data from these messages
15:32:52 <eglynn_> jd__: do you remember the motivation for including both start and end events here? https://review.openstack.org/#/c/38485/7/ceilometer/compute/notifications.py
15:33:04 <ityaptin> http://paste.openstack.org/show/139591/
15:33:10 <ityaptin> I transformed each message dictionary into a set of (key, value) tuples and printed the difference between the current message and the union of all previous ones at every step. Of course, I didn't print unique, unnecessary fields like timestamp, message_id and payload.audit_period_ending.
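The comparison ityaptin describes can be reconstructed roughly as follows; this is a sketch of the technique with made-up helper names, not the actual script behind the paste.

```python
# Rough reconstruction of the comparison ityaptin describes; the actual
# script behind http://paste.openstack.org/show/139591/ may differ.

IGNORED = {'timestamp', 'message_id', 'payload.audit_period_ending'}


def flatten(msg, prefix=''):
    """Flatten a notification dict into a set of (dotted_key, value) tuples."""
    items = set()
    for key, value in msg.items():
        dotted = prefix + key
        if dotted in IGNORED:
            continue
        if isinstance(value, dict):
            items |= flatten(value, dotted + '.')
        else:
            items.add((dotted, str(value)))
    return items


def print_new_fields(notifications):
    """Print what each message adds over the union of all previous ones."""
    seen = set()
    for i, msg in enumerate(notifications):
        current = flatten(msg)
        print('#%d %s' % (i, msg.get('event_type')))
        for key, value in sorted(current - seen):
            print('    %s = %s' % (key, value))
        seen |= current
```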
15:33:43 <jd__> eglynn_: more data points
15:34:18 <eglynn_> jd__: but not any extra value, if start & end always come together?
15:34:35 <eglynn_> jd__: was the idea to allow "bracketing" as stacktach does?
15:34:47 <DinaBelova> eglynn_, the problem is not only in start-end
15:34:47 <jd__> eglynn_: they are close but not together
15:34:56 <DinaBelova> but also about update notifications
15:34:59 <eglynn_> jd__: e.g. to bound the time taken to spin up the instance
15:35:07 <DinaBelova> that are almost identical in fact
15:35:09 <jd__> eglynn_: the idea is to have as many data points as we can, not trying to do anything fancy
15:35:23 <gordc> regardless of whether you specify .end or .start, we'll get the notifications in events.
15:35:24 <jd__> eglynn_: that'd be more a job for the event-API part I'd say
15:35:58 <gordc> events takes every single notification we get... from the info priority.
15:36:10 <eglynn_> DinaBelova: yeah, but we changed the policy for start|end (previously we just triggered compute.instance.create.start and compute.instance.delete.end)
15:36:23 <eglynn_> sorry the opposite!
15:36:27 <DinaBelova> eglynn_, a-ha, I did not know this
15:36:33 <eglynn_> compute.instance.create.end and compute.instance.delete.start
15:36:57 <eglynn_> so the idea was not to bill for time when the instance is being deleted, or before it's fully usable
15:37:01 <jd__> yeah and then people started to think the duration of sampling was the duration of the instance uptime
15:37:11 <jd__> I changed that so it confuses more people and they stop doing that
15:37:17 <DinaBelova> eglynn_, I am basically not against start/end for creation and deletion, or any other process like this
15:37:26 <jd__> 😈
15:37:39 <DinaBelova> what makes me kind of crazy is the updates for different tasks from nova
15:37:56 <DinaBelova> and they are almost meaningless...
15:38:09 <eglynn_> ityaptin: so the other question is the tightly bunched compute.instance.update notifications
15:38:12 <DinaBelova> because if creation/deletion went ok, we'll see *.end
15:38:50 <eglynn_> DinaBelova: I dunno if it's possible to distinguish between a single isolated update event, and a tight sequence of them that's part of the same workflow
15:38:50 <ityaptin> I think we don't need update notifications
15:39:05 <gordc> maybe add a filter to ignore samples with the same key value within a short time?... too hacky?
15:39:10 <ityaptin> It's minor updates which are not important for us
15:39:11 <eglynn_> ityaptin: what about the changes in resource metadata
15:39:30 <eglynn_> gordc: yeah, sounds a bit fragile
15:39:34 <DinaBelova> gordc, kind of it... we have no guarantee that some timeout would squash all of them
15:39:42 <llu-laptop> gordc: sounds risky
15:40:00 <gordc> sounds so risky it might work? :) j/k
15:40:44 <gordc> it's either that or maybe services are flooding the 'info' priority as we do with logging
15:40:59 <llu-laptop> what about when one notification hits one agent, and another lands on a different agent?
15:41:00 <DinaBelova> gordc, lol
15:41:18 <ityaptin> eglynn: we collect metadata from instance.create.*, instance.terminate.* and smth else.
15:41:19 <llu-laptop> in that case, the timeout doesn't work either
15:41:35 <gordc> llu-laptop: my coordination will fix that... assuming it works. lol
15:41:53 <eglynn_> gordc: this discussion feeds into your bug about completely dropping "existence" samples
15:42:14 <eglynn_> ityaptin: I'm thinking about "isolated" compute.instance.update notifications
15:42:15 <gordc> eglynn_: yeah i was going to bring that up too...
15:42:38 <eglynn_> ityaptin: (i.e. after the instance is spun up, when some attribute is updated subsequently)
15:42:43 <gordc> eglynn_: i assume these volume=1 metrics make no sense in gnocchi?
15:42:48 <llu-laptop> gordc: agreed, that seems the right path
15:42:54 <ityaptin> eglynn: yep, I think about it too.
15:43:07 <gordc> llu-laptop: dropping 'existence' meters?
15:43:19 <llu-laptop> gordc: yes
15:43:25 <eglynn_> gordc: good point, certainly pre-aggregating a sequence of 1s is kinda wasted effort
15:43:34 <eglynn_> jd__: ^^^ amiright?
15:43:37 <gordc> :) cool cool.
15:44:02 * jd__ reads backlog
15:44:05 <gordc> yeah, i think we just need a migration plan... and the ability to alarm on events (not sure that's critical)
15:44:29 <llu-laptop> gordc: but that would cause the user to use a different set of APIs to retrieve those events instead of metering?
15:44:39 <jd__> volume=1 metrics make no sense in gnocchi
15:45:02 <jd__> in Ceilometer it's just used to track the existence of a resource
15:45:09 <jd__> that's what the Gnocchi indexer of resources is for
15:45:14 <gordc> llu-laptop: yeah... or do we want the api to convert events into samples? back to the same issue of too many hits then.
15:45:17 <jd__> and is much more efficient…
15:46:00 <eglynn_> jd__: so would we need a deleted_at attribute on the gnocchi generic resource to capture the end of its lifespan?
15:46:23 <eglynn_> jd__: ... i.e. how to capture the end of the sequence of 1s for a particular resource
15:46:38 <gordc> eglynn_: or you can look at events.
15:47:04 <jd__> eglynn_: there are started_at, ended_at, if that's not enough we can add more fields
15:47:22 <jd__> or you can look at gordc's answer
15:47:24 <jd__> ;)
15:48:18 <gordc> eglynn_: i'll switch my bug to a bp and we can discuss deprecating meters on the side.
15:48:29 <eglynn_> jd__: a-ha, k ... forgot about those resource fields
15:49:03 <eglynn_> jd__: ... do we currently set that in the ceilo dispatcher?
15:49:17 <eglynn_> jd__: ... or is the assumption that we'll special case the *.delete.start notification handling?
15:49:44 <eglynn_> gordc: cool, sounds like we need more discussion/thought on this one
15:49:44 <jd__> I think we update them in the dispatcher
15:49:55 <jd__> but the best thing to do will be to use the events later yeah
15:52:47 <eglynn_> so doing it in the dispatcher would require checking the event_type in the sample?
15:54:13 <eglynn_> i.e. checking if the sample was generated from a resource deletion notification
15:55:02 <eglynn_> sounds a bit unreliable if we don't get those notifications for all resources
15:55:06 <eglynn_> ... or if the deletion events don't follow a predictable naming pattern
15:55:22 <eglynn_> anyhoo, not going to solve it here
15:55:35 <jd__> that's a good reason to have a schema :)
15:55:49 <eglynn_> yeah
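Pulling the thread above together, here is a hypothetical sketch of the dispatcher-side approach being discussed, i.e. ending a resource's lifespan in gnocchi instead of recording volume=1 existence samples; the endpoint path, payload, and metadata lookup are assumptions, not the actual gnocchi API or dispatcher code.

```python
# Hypothetical sketch of the idea discussed above: when the dispatcher sees a
# sample generated from a deletion notification, close the resource's lifespan
# in gnocchi instead of relying on volume=1 "existence" samples.  The endpoint
# path, payload and metadata key are assumptions, not actual gnocchi or
# dispatcher code.

import json

import requests

GNOCCHI_URL = 'http://localhost:8041'  # assumed endpoint; auth omitted


def maybe_end_resource(sample):
    # Fragile parts flagged in the discussion: this assumes the originating
    # event_type is carried in the sample metadata, that deletion
    # notifications arrive for every resource type, and that they follow a
    # predictable naming pattern.
    event_type = sample.get('resource_metadata', {}).get('event_type', '')
    if not event_type.endswith('.delete.start'):
        return
    resp = requests.patch(
        '%s/v1/resource/generic/%s' % (GNOCCHI_URL, sample['resource_id']),
        headers={'Content-Type': 'application/json'},
        data=json.dumps({'ended_at': sample['timestamp']}))
    resp.raise_for_status()
```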
15:56:07 <eglynn_> #topic "Reminder on dates/location for mid-cycle"
15:56:07 <jd__> changing the routing pattern would also be a good optimization
15:56:14 <jd__> like we could subscribe to only some notifications
15:57:09 <eglynn_> yeah, so currently we see them all, but discard the ones that don't match a handler's declared event_types?
15:58:12 <eglynn_> running out of time
15:58:18 <eglynn_> just a reminder to update https://etherpad.openstack.org/p/CeilometerKiloMidCycle if you plan to attend
15:58:35 <eglynn_> 6 names up so far
15:58:40 <eglynn_> we'll make a call next week on whether we've quorum
15:59:11 <eglynn_> #topic "Open discussion"
15:59:30 <eglynn_> a minute left if anyone was anything to raise?
15:59:36 <eglynn_> *has anything
16:00:05 <DinaBelova> nope for me
16:00:11 <eglynn_> k, let's call it a wrap ... thanks for your time! :)
16:00:23 <eglynn_> #endmeeting ceilometer