15:00:10 <eglynn_> #startmeeting ceilometer
15:00:11 <openstack> Meeting started Thu Nov 27 15:00:10 2014 UTC and is due to finish in 60 minutes.  The chair is eglynn_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:15 <openstack> The meeting name has been set to 'ceilometer'
15:00:39 <eglynn_> Happy Thanksgiving y'all :)
15:00:51 <eglynn_> who's all here for the ceilo meeting?
15:00:56 <llu-laptop> o/
15:01:04 <ityaptin> o/
15:01:07 <_elena_> o/
15:01:50 <idegtiarov> o/
15:01:57 <vrovachev> o/
15:02:00 <eglynn_> #topic "Kilo-1 blueprints"
15:02:09 <eglynn_> #link https://launchpad.net/ceilometer/+milestone/kilo-1
15:02:39 <sileht> o/
15:02:45 <gordc> o/
15:02:49 <eglynn_> a "blocked" blueprint is one for which the corresponding specs review hasn't yet landed
15:02:50 <Guest48080> o/
15:03:07 <eglynn_> but we expect it to do so, and the implementation work is progressing in parallel
15:03:44 <eglynn_> does anyone have BPs other than the 10 listed in https://launchpad.net/ceilometer/+milestone/kilo-1 that they hope to get in for k1?
15:04:20 <eglynn_> gordc: e.g. are you thinking about k1 for the notification agent co-ordination stuff?
15:04:49 <gordc> eglynn_: working on it now... i think i might need sileht's listener-pool patch but i'm not exactly sure yet
15:05:01 <gordc> figuring stuff out as i go along...
15:05:06 <eglynn_> gordc: cool enough, thanks!
15:05:20 <eglynn_> kilo-1 is Dec 18 BTW
15:05:43 <gordc> i'm going to say maybe
15:06:24 <eglynn_> gordc: coolness, we can add it to kilo-2 say for now and then bump forward to kilo-1 if all goes well?
15:07:29 <gordc> sure.
15:07:35 <eglynn_> cool
15:07:39 <gordc> sorry. in between meetings. :|
15:07:46 <eglynn_> np!
15:07:48 <eglynn_> anything else on kilo-1?
15:08:08 <gordc> nothing for me. i have an elasticsearch patch but i don't know how to test it currently.
15:08:44 <eglynn_> gordc: elasticsearch also possibly in play for k1, or more likely k2/3?
15:09:13 <gordc> probably same boat as notification... maybe k-1 probably k-2
15:09:19 <eglynn_> cool
15:09:29 <eglynn_> #topic first stable/juno release
15:09:35 <eglynn_> #link https://wiki.openstack.org/wiki/StableBranchRelease#Planned_stable.2Fjuno_releases
15:09:50 <eglynn_> this was planned for today, but pushed back to Dec 4th
15:09:52 <DinaBelova> o/ sorry for the delay, folks :)
15:10:05 <eglynn_> I've started doing backports for likely-looking bugs
15:10:48 <eglynn_> but if you've a particular bugfix you've landed for k1 that you think is a good backport candidate
15:11:04 <eglynn_> please tag with "juno-backport-potential"
15:11:12 <DinaBelova> eglynn_, ok, thanks
15:11:40 <eglynn_> #topic "TSDaaS/gnocchi status"
15:11:57 <eglynn_> jd__: anything you want to bring to the table on gnocchi?
15:12:16 <eglynn_> #link https://review.openstack.org/#/q/status:open+project:stackforge/gnocchi,n,z
15:12:31 <eglynn_> progress on refining the RBAC policies
15:12:52 <eglynn_> also a nice patch exposing the back_window in the API
15:13:54 <DinaBelova> eglynn_, after I fix the agents-merging chain, I'll start working on the opentsdb driver
15:14:20 <DinaBelova> and a new stackforge repo with the co-processor deletion thing for it :)
15:14:28 <DinaBelova> I hope to start tomorrow
15:14:31 <eglynn_> DinaBelova: cool :)
15:14:41 <eglynn_> DinaBelova: BTW I heard yahoo were evaluating opentsdb for their own internal metrics storage (... just a datapoint showing wide adoption)
15:14:59 <DinaBelova> hehe, nice :)
15:15:07 <eglynn_> DinaBelova: co-processor deletion == retention logic ?
15:15:13 <DinaBelova> eglynn_, yes
15:15:16 <eglynn_> cool
15:15:19 <DinaBelova> as I said, it'll be some java code
15:15:23 <DinaBelova> but in a separate repo
15:15:27 <DinaBelova> so as not to confuse everyone
15:16:02 <_nadya_> DinaBelova: what about testing? any ideas :)?
15:16:09 <DinaBelova> _nadya_, well :)
15:16:30 <eglynn_> DinaBelova: could this be something that might be contrib'd back to the OpenTSDB folks?
15:16:38 <sileht> what about landing the dispatcher :p ?
15:16:39 <eglynn_> DinaBelova: ... as opposed to maintaining a separate repo
15:16:44 <DinaBelova> _nadya_, currently I wonder how to make integration tests possible
15:16:53 <DinaBelova> if it's possible at all for our OpenStack gate
15:16:58 <jd__> o/
15:17:09 <jd__> yeah I think you have a good summary eglynn_
15:17:12 <jd__> thanks
15:17:17 <eglynn_> jd__: np :)
15:17:20 <DinaBelova> eglynn_, probably that'll be the solution :)
15:17:21 <_nadya_> DinaBelova: TBH, I think it's impossible... We need at least HBase in devstack
15:17:26 <eglynn_> sileht: good point, I was blocked on glance not setting the user_id on samples
15:17:35 <DinaBelova> let's see if that'll be possible as a part of their retention param impl
15:17:43 <eglynn_> sileht: I need to give that dispatcher patch another run-through
15:17:44 <sileht> eglynn_, I have removed that check
15:17:46 <DinaBelova> _nadya_, yeah, I suppose so
15:17:54 <DinaBelova> although
15:17:59 <eglynn_> sileht: coolness, I'd imagine it's good to fly in that case
15:18:06 <eglynn_> sileht: will look again after this meeting
15:18:17 <DinaBelova> _nadya_, I still hope to see the HBase devstack change being merged
15:18:44 <sileht> eglynn_, and then we can fix the gnocchi DB schema when we land the first glance resource into gnocchi
15:19:05 <eglynn_> sileht: cool, that sounds reasonable
15:19:32 <jd__> since we have created_by_* now I'm less reluctant to allow null project_id/user_id
15:19:55 <eglynn_> jd__: yeah, good point
15:20:58 <llu-laptop> jd__: but we do have existing metrics with null project_id/user_id, how do we handle them?
15:21:16 <jd__> I think that answers your question llu-laptop, no?
15:22:12 <llu-laptop> we have snmp/ipmi and also metrics from SDNs like opendaylight & opencontrail
15:22:33 <sileht> jd__, Should I accept null project_id too in the dispatcher ?
15:22:48 <ityaptin> DinaBelova: Also a co-processor may be a useful approach for Ceilometer HBase storage time-to-live.
15:22:50 <jd__> sileht: if we change the schema accordingly yeah
15:22:58 <DinaBelova> ityaptin, yes, for sure
15:23:43 <eglynn_> so the idea would be that the created_by_* gives a fallback on which to evaluate RBAC rules if the primary project *or* user ID is null, amiright?
15:25:00 <eglynn_> jd__: I'm resurrecting the Influx driver, was planning to start with a ceilo-specs proposal
15:25:07 <eglynn_> jd__: ... which brings up the question of whether the specs "process" applies in gnocchi-land?
15:25:31 <jd__> eglynn_: maybe, but for now I'd say, don't lose time with that?
15:25:33 <DinaBelova> eglynn_, I got the same question a day ago from nellysmitt
15:25:34 <DinaBelova> :)
15:25:45 <llu-laptop> sorry, maybe i'm out of context here. I think we are talking about the dispatcher handling null project/user_id, like https://review.openstack.org/#/c/98798/66/gnocchi/ceilometer/dispatcher.py#Ln94
15:25:50 <jd__> like, what of interest could we discuss on an Influx spec?
15:25:53 <DinaBelova> she was wondering where to start her relationship with gnocchi
15:26:04 <DinaBelova> bugs/blueprints
15:26:09 <DinaBelova> some single place to work in
15:26:29 <DinaBelova> so it'll probably be useful in future
15:26:50 <eglynn_> jd__: fair point :) ... I was thinking of using a spec to get all my ideas straight on mapping gnocchi semantics to influx features
15:26:51 <jd__> I've always got a todo list
15:27:06 <jd__> DinaBelova: just send her on IRC and talk to me? :O
15:27:08 <jd__> :)
15:27:19 <DinaBelova> jd__, it was late :)
15:27:28 <DinaBelova> but anyway, I told her it's A GOOD IDEA :)
15:27:30 <jd__> eglynn_: well if you think you need to write something, yeah why not, I guess it depends on how you work and your degree of confidence
15:27:35 <DinaBelova> sorry for caps
15:27:55 <jd__> eglynn_: I just don't think it's a requirement for now… I prefer to see people spending time writing code rather than specs :)
15:28:05 <jd__> DinaBelova: ack :)
15:28:06 <eglynn_> jd__: yeah, understood
15:28:46 <eglynn_> llu-laptop: yes, sileht said he has relaxed, or is going to relax, that requirement to have (s['project_id'] and s['user_id'])
15:29:54 <eglynn_> llu-laptop: see https://review.openstack.org/#/c/98798/66..67/gnocchi/ceilometer/dispatcher.py
15:30:12 <eglynn_> llu-laptop: now only the project_id must be set
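For context, a minimal sketch of the relaxed check being discussed, assuming a flat sample dict; the helper name is illustrative and this is not the actual gnocchi dispatcher code:

    # Illustrative only: the dispatcher used to require both IDs on a sample
    # before recording it; the relaxed rule discussed above keeps only the
    # project_id requirement, so glance samples (which lack user_id) pass.
    def _is_recordable(sample):
        # old rule: sample.get('project_id') and sample.get('user_id')
        return sample.get('project_id') is not None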
15:31:01 <eglynn_> move on?
15:31:26 <eglynn_> #topic "Too many identical nova notifications"
15:31:31 <eglynn_> #link https://bugs.launchpad.net/ceilometer/+bug/1396257
15:31:33 <uvirtbot> Launchpad bug 1396257 in ceilometer "Redundant 'instance' samples from nova notifications" [Medium,Confirmed]
15:31:49 <ityaptin> Hi!
15:31:54 <ityaptin> Over the last 2 weeks many people have complained that the "instance" statistics show unexpected patterns.
15:32:00 <ityaptin> Right now we collect every nova notification with event_type "compute.instance.*" as an "instance" sample. This behavior generates too many unexpected samples, which badly skews the 'instance' statistics.
15:32:01 <eglynn_> so is the key point that we're now triggering off *both* the start & end events?
15:32:11 <ityaptin> During the following workflow: create instance, suspend, add floating ip, terminate instance, nova sends 28 notifications.
15:32:15 <ityaptin> We collect them all!
15:32:27 <ityaptin> There are 16 compute.instance.update messages for minor tasks which are not important for metering, I think.
15:32:51 <ityaptin> I prepared a paste with some data from these messages
15:32:52 <eglynn_> jd__: do you remember the motivation for including both start and end events here? https://review.openstack.org/#/c/38485/7/ceilometer/compute/notifications.py
15:33:04 <ityaptin> http://paste.openstack.org/show/139591/
15:33:10 <ityaptin> I transformed each message dictionary into a set of (key, value) tuples and, at every step, printed the difference between the current message and the union of all previous ones. Of course, I didn't print unique, unnecessary fields like timestamp, message_id and payload.audit_period_ending.
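A rough sketch of that comparison, assuming flat notification payload dicts with hashable values; the function and field names are illustrative:

    # Flatten each payload into a set of (key, value) tuples and print what
    # each message adds beyond the union of everything seen so far, skipping
    # fields that are unique on every message anyway.
    IGNORED = {'timestamp', 'message_id', 'audit_period_ending'}

    def diff_notifications(payloads):
        seen = set()
        for payload in payloads:
            current = {(k, v) for k, v in payload.items() if k not in IGNORED}
            print(sorted(current - seen))  # only the genuinely new pairs
            seen |= current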
15:33:43 <jd__> eglynn_: more data points
15:34:18 <eglynn_> jd__: but no extra value, if start & end always come together?
15:34:35 <eglynn_> jd__: was the idea to allow "bracketing" as stacktach does?
15:34:47 <DinaBelova> eglynn_, the problem is not only with start-end
15:34:47 <jd__> eglynn_: they are close but not together
15:34:56 <DinaBelova> but also with the update notifications
15:34:59 <eglynn_> jd__: e.g. to bound the time taken to spin up the instance
15:35:07 <DinaBelova> that are almost identical in fact
15:35:09 <jd__> eglynn_: the idea is to have as many data points as we can, not trying to do anything fancy
15:35:23 <gordc> regardless of whether you specify .end or .start, we'll get the notifications in events.
15:35:24 <jd__> eglynn_: that'd be more a job for the event-API part I'd say
15:35:58 <gordc> events takes every single notification we get... from the info priority.
15:36:10 <eglynn_> DinaBelova: yeah, but we changed the policy for start|end (previously we just triggered on compute.instance.create.end and compute.instance.delete.start)
15:36:27 <DinaBelova> eglynn_, a-ha, I did not know this
15:36:57 <eglynn_> so the idea was not to bill for time when the instance is being deleted, or before it's fully usable
15:37:01 <jd__> yeah and then people started to think the duration of sampling was the duration of the instance uptime
15:37:11 <jd__> I changed that so it confuses more people and they stop doing that
15:37:17 <DinaBelova> eglynn_, I am basically not against start/end for creation and deletion, or any other process like this
15:37:26 <jd__> 😈
15:37:39 <DinaBelova> what drives me kind of crazy is the updates for different nova tasks
15:37:56 <DinaBelova> and they are almost meaningless...
15:38:09 <eglynn_> ityaptin: so the other question is the tightly bunched compute.instance.update notifications
15:38:12 <DinaBelova> because if creation/deletion went ok, we'll see the *.end
15:38:50 <eglynn_> DinaBelova: I dunno if it's possible to distinguish between a single isolated update event, and a tight sequence of them that's part of the same workflow
15:38:50 <ityaptin> I think we don't need update notifications
15:39:05 <gordc> maybe add a filter to ignore samples with the same key value within a short time? ...too hacky?
15:39:10 <ityaptin> They're minor updates which are not important for us
15:39:11 <eglynn_> ityaptin: what about the changes in resource metadata?
15:39:30 <eglynn_> gordc: yeah, sounds a bit fragile
15:39:34 <DinaBelova> gordc, kind of... we have no guarantee that some timeout will squash all of them
15:39:42 <llu-laptop> gordc: sounds risky
15:40:00 <gordc> sounds so risky it might work? :) j/k
15:40:44 <gordc> it's either that, or maybe the services are flooding the 'info' priority the way we do with logging
15:40:59 <llu-laptop> what about when one notification hits one agent, and another lands on a different agent?
15:41:00 <DinaBelova> gordc, lol
15:41:18 <ityaptin> eglynn: we collect metadata from instance.create.*, instance.terminate.* and something else.
15:41:19 <llu-laptop> in that case, the timeout doesn't work either
15:41:35 <gordc> llu-laptop: my coordination work will fix that... assuming it works. lol
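To make the idea concrete, a hypothetical sketch of the time-window de-duplication filter gordc floats above; as the discussion notes it is fragile, since the window is arbitrary and per-agent state misses duplicates delivered to a different agent:

    import time

    # Hypothetical filter: drop a sample if an identical (resource_id, meter)
    # pair was already seen within a short window on this agent.
    class RecentDuplicateFilter(object):
        def __init__(self, window=5.0):
            self.window = window          # seconds; arbitrary choice
            self._last_seen = {}

        def allow(self, sample):
            key = (sample['resource_id'], sample['counter_name'])
            now = time.time()
            last = self._last_seen.get(key)
            self._last_seen[key] = now
            return last is None or (now - last) > self.window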
15:41:53 <eglynn_> gordc: this discussion feeds into your bug about completely dropping "existence" samples
15:42:14 <eglynn_> ityaptin: I'm thinking about "isolated" compute.instance.update notifications
15:42:15 <gordc> eglynn_: yeah i was going to bring that up too...
15:42:38 <eglynn_> ityaptin: (i.e. after the instance is spun up, when some attribute is updated subsequently)
15:42:43 <gordc> eglynn_: i assume these volume=1 metrics make no sense in gnocchi?
15:42:48 <llu-laptop> gordc: agreed, that seems the right path
15:42:54 <ityaptin> eglynn: yep, I think about it too.
15:43:07 <gordc> llu-laptop: dropping 'existence' meters?
15:43:19 <llu-laptop> gordc: yes
15:43:25 <eglynn_> gordc: good point, certainly pre-aggregating a sequence of 1s is kinda wasted effort
15:43:34 <eglynn_> jd__: ^^^ amiright?
15:43:37 <gordc> :) cool cool.
15:44:02 * jd__ reads backlog
15:44:05 <gordc> yeah, i think we just need a migration plan... and the ability to alarm on events (not sure that's critical)
15:44:29 <llu-laptop> gordc: but that would force users to use a different set of APIs to retrieve those events instead of metering?
15:44:39 <jd__> volume=1 metrics make no sense in gnocchi
15:45:02 <jd__> in Ceilometer it's just used to track existence of a resource
15:45:09 <jd__> that's what the Gnocchi indexer of resources is for
15:45:14 <gordc> llu-laptop: yeah... or do we want the api to convert events into samples? back to the same issue of too many hits then.
15:45:17 <jd__> and is much more efficient…
15:46:00 <eglynn_> jd__: so would we need a deleted_at attribute on the gnocchi generic resource to capture the end of its lifespan?
15:46:23 <eglynn_> jd__: ... i.e. how to capture the end of the sequence of 1s for a particular resource
15:46:38 <gordc> eglynn_: or you can look at events.
15:47:04 <jd__> eglynn_: there are started_at, ended_at, if that's not enough we can add more fields
15:47:22 <jd__> or you can look at gordc answer
15:47:24 <jd__> ;)
15:48:18 <gordc> eglynn_: i'll switch my bug to a bp and we can discuss deprecating meters on the side.
15:48:29 <eglynn_> jd__: a-ha, k ... forgot about those resource fields
15:49:03 <eglynn_> jd__: ... do we currently set that in the ceilo dispatcher?
15:49:17 <eglynn_> jd__: ... or is the assumption that we'll special case the *.delete.start notification handling?
15:49:44 <eglynn_> gordc: cool, sounds like we need more discussion/thought on this one
15:49:44 <jd__> I think we update them in the dispatcher
15:49:55 <jd__> but the best thing to do will be to use the events later yeah
15:52:47 <eglynn_> so doing it in the dispatcher would require checking the event_type in the sample?
15:54:13 <eglynn_> i.e. checking if the sample was generated from a resource deletion notification
15:55:02 <eglynn_> sounds a bit unreliable if we don't get those notifications for all resources
15:55:06 <eglynn_> ... or if the deletion events don't follow a predictable naming pattern
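For illustration, a rough sketch of the dispatcher-side option being debated here, i.e. inspecting the originating event_type and stamping the resource's ended_at; the helper and the event_type field on the sample are assumptions, and as noted above the approach breaks down if deletion notifications are missing or named inconsistently:

    # Hypothetical: close the gnocchi resource when the sample was generated
    # from a deletion notification, relying on the '*.delete.*' naming pattern.
    def maybe_close_resource(sample, resource):
        event_type = sample.get('event_type', '')
        if '.delete.' in event_type:
            resource['ended_at'] = sample['timestamp']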
15:55:22 <eglynn_> anyhoo, not going to solve it here
15:55:35 <jd__> that's a good reason to have a schema :)
15:55:49 <eglynn_> yeah
15:56:07 <eglynn_> #topic "Reminder on dates/location for mid-cycle"
15:56:07 <jd__> changing the routing pattern would also be a good optimization
15:56:14 <jd__> like we could subscribe to only some notifications
15:57:09 <eglynn_> yeah, so currently we see them all, but discard the ones that don't match a handler's declared event_types?
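For reference, roughly how that consume-then-discard pattern can be expressed with oslo.messaging; the regex, topic and endpoint here are examples only, and the optimization jd__ suggests would instead avoid delivering unwanted notifications at all by adjusting the broker-side routing:

    import oslo_messaging
    from oslo_config import cfg

    # Endpoint-level filtering: the listener still receives every notification
    # on the topic, and oslo.messaging drops the ones whose event_type doesn't
    # match the endpoint's filter_rule.
    class InstanceEndpoint(object):
        filter_rule = oslo_messaging.NotificationFilter(
            event_type=r'^compute\.instance\..*')

        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            pass  # build an 'instance' sample from the payload here

    transport = oslo_messaging.get_notification_transport(cfg.CONF)
    targets = [oslo_messaging.Target(topic='notifications')]
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [InstanceEndpoint()])
    # listener.start() would begin consuming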
15:58:12 <eglynn_> running out of time
15:58:18 <eglynn_> just a reminder to update https://etherpad.openstack.org/p/CeilometerKiloMidCycle if you plan to attend
15:58:35 <eglynn_> 6 names up so far
15:58:40 <eglynn_> we'll make a call next week on whether we've quorum
15:59:11 <eglynn_> #topic "Open discussion"
15:59:30 <eglynn_> a minute left if anyone has anything to raise?
16:00:05 <DinaBelova> nope for me
16:00:11 <eglynn_> k, let's call it a wrap ... thanks for your time! :)
16:00:23 <eglynn_> #endmeeting ceilometer