16:00:22 #startmeeting
16:00:22 Meeting started Thu May 10 16:00:22 2012 UTC. The chair is dachary. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:22 #chair nijaba dachary
16:00:22 #meetingname ceilometer
16:00:22 #link https://lists.launchpad.net/openstack/msg11523.html
16:00:22 #topic actions from previous meetings
16:00:22 #link http://eavesdrop.openstack.org/meetings/openstack-meeting/2012/openstack-meeting.2012-05-03-16.00.html
16:00:22 #info dachary removed obsolete comment about floating IP http://wiki.openstack.org/EfficientMetering?action=diff&rev2=70&rev1=69
16:00:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:23 #info dachary o6 : note that the resource_id is the container id. http://wiki.openstack.org/EfficientMetering?action=diff&rev2=71&rev1=70
16:00:24 Current chairs: dachary nijaba
16:00:25 The meeting name has been set to 'ceilometer'
16:00:34 jaypipes: hi
16:00:42 jd___: actions ?
16:01:21 nijaba: actions ?
16:01:25 #info The discussion about adding the source notion to the schema took place on the mailing list https://lists.launchpad.net/openstack/msg11217.html
16:01:25 #info The conclusion was to add a source field to the event record, but no additional record type to list existing sources.
16:02:15 nijaba: could you explain that a bit more please?
16:02:21 nijaba: what are existing sources?
16:03:00 jaypipes: sources could be different installations of openstack, or metering of other projects not sharing their creds with keystone
16:03:08 #info jd___ add Swift counters, add resource ID info in counter definition, describe the table http://wiki.openstack.org/EfficientMetering?action=diff&rev2=57&rev1=54
16:04:03 nijaba: k, got it. so basically, a source field that is NULLable.
16:04:22 jaypipes: or set to a default, as the implementor prefers
16:04:28 gotcha
16:04:37 #topic meeting organisation
16:04:37 #info This is 2/5 meetings to decide the details of the architecture of the Metering project https://launchpad.net/ceilometer
16:04:37 #info Today's focus is on the definition of external REST API
16:04:37 #info There have not been enough discussions on the list to cover all aspects and the focus of this meeting was modified to cope with it.
16:04:37 #info The meeting is time boxed and there will not be enough time to introduce innovative ideas and research for solutions.
16:04:38 #info The debate will be about the pros and cons of the options already discussed on the mailing list.
16:04:38 #link https://lists.launchpad.net/openstack/msg11368.html
16:04:54 comments anyone ?
16:05:06 dachary: on which topic? ;)
16:05:12 organization ;-)
16:05:23 * nijaba +1 the org
16:05:23 ok
16:05:26 :-D
16:05:28 #topic API defaults and API extensions
16:05:33 My only comment is that I believe Ceilometer shouldn't invent its own API extensions mechanism... it should use the system in Nova.
16:05:40 +1 jaypipes
16:05:44 jaypipes: +1
16:05:48 +1 jaypipes
16:05:56 * nijaba had no idea this was going on, so +1
16:05:59 it has its rough edges, but it gets you 90% of the way there.
16:06:17 I propose we table "extensions" for now and concentrate on the core API pending further discussion of extensions on the list.
16:06:28 +1 jaypipes
16:06:37 dachary: also, it might just be my misunderstanding, but I want to make sure that API extensions and plugins are clearly delineated.
16:07:00 dachary: the description in the mailing list thread of API extensions seems to bleed a bit into plugin land.
:)
16:07:24 well, I kind of assume we only need plugins for the purpose of implementing API extensions
16:07:37 that's my understanding as well
16:07:42 Essentially, things like backend stores and such should not be API extensions, but rather plugins that use an adapter/driver model to have a pluggable implementation, using that same external API
16:07:43 which may not be true but I was only thinking about the API at the time
16:07:52 the other type of "plugins" being agents
16:08:02 we can also use plugins to add event monitors and polling to the agents running on the compute nodes
16:08:31 ok, just wanted to make sure things like /extensions/MongoDbBackend/ etc weren't being considered...
16:08:33 dhellmann: polling? the whole model we are discussing is push...
16:08:45 As far as the API is concerned, my suggestion was that each API extension is implemented as a plugin with a predefined interface.
16:08:48 well, we're going to have to poll libvirt, right?
16:08:50 some effort has been done in https://github.com/cloudbuilders/openstack-munin
16:09:08 dhellmann: an agent should poll libvirt and push
16:09:20 #agreed Ceilometer shouldn't invent its own API extensions mechanism... it should use the system in Nova.
16:09:33 dachary: k
16:09:39 nijaba, exactly. There may be other things that we want to/need to poll, though.
16:09:48 dhellmann: right
16:09:50 would nova-instancemonitor be useful ?
16:10:05 woorea: and recently improved by https://github.com/sileht/openstack-munin
16:10:16 not sure who brought it up on the ML, but I also agreed with the statement that ceilometer should try as much as possible to disaggregate the concept of collection from the concept of aggregation or reporting.
16:10:16 sprintnode: yes, but let's save this for the agent discussion
16:10:18 #link https://github.com/cloudbuilders/openstack-munin
16:10:23 #link https://github.com/sileht/openstack-munin
16:10:36 exactly
16:10:43 ok
16:10:54 nijaba, so if we define a plugin API for all of the things that poll and another for things that care about notification events then it is easy to add new counters
16:10:59 jaypipes: there seems to be a consensus on that (aggregation != collection)
16:11:02 but those munin plugins are using sql db directly
16:11:14 nova db
16:11:19 dachary +1
16:11:29 dhellmann: here we are talking about the external rest API, not the internal agent API that will be discussed on the 24th
16:11:56 nijaba, sure, I'm speaking more generally about plugins: Use the nova system and use it everywhere.
16:12:10 dhellmann: ah, makes sense then :)
16:12:17 #action dachary add info to the wiki on the topic of poll versus push
16:12:43 so, should we discuss the core API?
16:12:54 let's move on to the next topic
16:13:08 #topic API defaults
16:13:14 #info GET list components
16:13:14 #info GET list components meters (argument : name of the component)
16:13:14 #info GET list [user_id|project_id|source]
16:13:14 #info GET list of meter_type
16:13:14 #info GET list of events per [user_id|project_id|source] ( allow to specify user_id or project_id
16:13:15 or both )
16:13:15 #info GET sum of (meter_volume, meter_duration) for meter_type and [user_id|project_id|source]
16:13:16 #info other ?
16:13:22 this is the current list in the wiki
16:13:41 would "GET list of events" allow for filtering by event type?
16:13:46 I'm under the impression that there is a thin line between the "core API" and the "extensions"
16:13:53 dachary: there was a proposal to allow queries for user_id && project_id
16:14:02 for any one of the counters
16:14:10 for example, I may want to charge a user a flat rate to create an instance and then a separate rate for keeping it alive for a period of time.
So I need to know about creation events and aggregated runtime
16:14:20 #info GET list of events per user_id && project_id
16:14:21 for me : sum meter_volume and meter_duration is aggregation, not collector
16:14:46 Doesn't quite look like a RESTful API that is similar to the other OpenStack APIs...
16:15:05 jaypipes: what would you suggest?
16:15:47 nijaba: perhaps it is just me not understanding :) I was thinking of an API like GET /components, GET /components/, GET /components//events, etc
16:15:48 #link http://wiki.openstack.org/OpenStackRESTAPI
16:15:54 what are "components"?
16:16:05 dhellmann: swift, nova etc.
16:16:09 dhellmann: I assume a component was "nova-compute" or "nova-network", etc
16:16:15 dachary, is that the "source" field?
16:16:24 dhellmann: no
16:16:30 for me source is the host
16:16:48 #link http://wiki.openstack.org/EfficientMetering#Meters
16:16:54 dhellmann: source should be unique per AUTH system, not per component
16:17:00 it's the Component column of the above link dhellmann
16:17:08 woorea: nor the host
16:17:14 aha, I didn't realize that was a key piece of information
16:17:19 why would a client want that list?
16:17:46 dachary: may I suggest renaming "meter" to "metric"?
16:18:09 dhellmann: it was a suggestion from doug at hp yesterday on the ml
16:18:34 jaypipes: the proposal is poorly formatted because we focus on the semantics. However, I fully agree that it should be a PATH ( or arguments I don't mind ) from which the parameters to the query are parsed.
16:19:03 dachary: gotcha. no probs.
16:19:44 jaypipes: we renamed counter into meter during the last meeting ;-) I'm ok with metric too (no strong feelings on names) but I'm not sure it will be readable.
16:19:45 Would a network xmit counter be the network traffic sent over the most recent hour, sent since the VM was booted, or sent since the VM host was booted?
16:20:09 * dachary not being a native english speaker does not help ;-)
16:20:15 dachary: :) no worries
16:20:18 Weighed: xmit is a delta
16:20:27 * jaypipes would prefer counter or metric to meter, but not a big deal
16:20:34 Weighed: generally we store only deltas at this moment
16:20:39 Weighed: whatever the duration specifies, but should be a delta from the last measure
16:21:39 +1
16:21:44 on?
16:21:46 So the client cannot select a duration?
16:21:57 nijaba: on a delta from the last measure ;-)
16:22:05 Weighed: Client can
16:22:06 Weighed: yes it can, in the sum API
16:22:17 time interval should be part of the query and drive the results
16:22:21 But it will return delta sum for given period
16:22:39 having a delta assumes you have the old value if you're polling from absolute counter, which may not be the case on agent restart
16:22:40 GET sum of (counter_volume, counter_duration) for counter_type and account_id
16:22:40 optional start and end for counter_datetime
16:22:41 should the query for a list of events allow filtering by type?
16:22:53 Weighed: the client must be able to select a duration. Actually I think (start + end) should be a common parameter to all queries.
16:22:54 this is what is specified in the wiki
16:23:03 or is that implied in that you ask each meter for the list?
16:23:26 same applies to list
16:23:32 so I agree with dachary
16:23:45 http://wiki.openstack.org/EfficientMetering#API
16:24:02 how are "get list of meter_type" and "list components meters" different?
16:24:35 dhellmann: list component meter will restrict the query to a component
16:24:36 I agree but we need also to decide if the end pointer is an end of current window or the beginning of the next window (less, less or equal)
16:24:39 dhellmann: I think the query for a list of events should allow filtering by type
16:25:10 ss7pro: I tend to link [start,end[
16:25:23 s/link/like/
16:25:27 nijaba, I'm still trying to understand how the component part of the API is useful.
I'll have to find that email thread.
16:25:28 * nijaba too
16:25:44 ok so end is a closing value
16:25:57 dachary, is that start <= timestamp < end?
16:26:07 yes
16:26:11 agreed
16:26:45 #agreed all meters have a [start,end[ ( start <= timestamp < end ) that limits the returned result to the events that fall in this period
16:26:53 dam
16:26:57 dhellmann: https://lists.launchpad.net/openstack/msg11504.html
16:26:58 #agreed all queries have a [start,end[ ( start <= timestamp < end ) that limits the returned result to the events that fall in this period
16:27:29 we need event_type (eg : start or end) + timestamp
16:27:39 nor start + end + event_type
16:27:51 There is one query that everyone agrees on, I think : GET /events that returns raw events.
16:28:04 What are raw events ?
16:28:11 nijaba, thanks
16:28:12 list of deltas ?
16:28:15 ss7pro: what is stored in the DB
16:28:26 ss7pro, yes, the discrete values recorded in the database
16:28:27 ss7pro: sorry for being imprecise
16:28:27 ss7pro: with no aggregation
16:28:31 raw is unprocessed info, just collected
16:28:44 no business rules applied
16:28:45 There is one query that everyone agrees on, I think : GET /events that returns all fields for each event ( as described in http://wiki.openstack.org/EfficientMetering#Storage )
16:29:09 what about components ?
16:29:16 raw events are determined by the service you query. nova= vm state changes, network usage, block storage create/delete ...
16:29:27 the list of components is just a list of strings for the names, right?
16:29:31 how do we link events to components ?
16:29:57 ss7pro, the meter type defines the component
16:29:59 events need a serviceTypeId associated with them
16:30:01 DanD_: we're talking about the events stored in the ceilometer storage, not the events sent by the nova component (for instance)
16:30:31 components generate events that are collected by "counters" (raw data) and the processed by business process
16:30:37 I know, but you still need to have the meta data to determine what to return on a query
16:30:43 s/the/then
16:30:49 dhellmann: But what part of the code will decide which counter belongs to which component ?
16:30:58 eg: external network traffic ?
16:31:00 ss7pro, the code that defines the counter
16:31:10 dhellmann +1
16:31:10 ss7pro: that's GET list of meter_type : return the list of all meters available . It describes the available meters as shown in http://wiki.openstack.org/EfficientMetering#Meters
16:31:10 the thing that actually collects the data
16:31:50 dhellmann: so it will require collector to be able to query openstack api
16:32:02 I would actually prefer to leave components out of the API entirely. Focusing just on the meters would let other systems inject data for aggregation without worrying about where it comes from.
16:32:20 ss7pro, I don't understand that conclusion
16:32:28 dhellmann: the "component" part of http://wiki.openstack.org/EfficientMetering#Meters is merely a hint
16:32:30 a collector can query openstack api, libvirt, logs, whatever
16:32:56 dhellmann: I think covering component will have little effect as long as it is an option in the query, not a req
16:32:59 So how to guess what is the component for traffic to/from single IP address ?
16:33:03 dachary, it was until we added to the API. If the API has to be able to provide a list of components and has to know which meters are part of which component, then we have to store that information somewhere the API can find it.
16:33:19 woorea: yes.
And then it passes along the information to the storage that stores it as described in http://wiki.openstack.org/EfficientMetering#Storage
16:33:32 ss7pro, a human would look at the documentation
16:33:55 metrics need to have a set of meta data associated with them so you can determine how to apply billing farther downstream. the component or service type along with other things like location, type, ... all contribute to the charges you will apply to the metric
16:34:04 but how collector can do this without API query ?
16:34:13 i sent an arch diagram to the list yesterday
16:34:18 dhellmann: I agree. The information in http://wiki.openstack.org/EfficientMetering#Meters must be stored somewhere. I'm not sure where. Database ? Configuration file ? Configuration file specific to an API extension ?
16:34:29 where you can see the scopes of every component
16:34:32 woorea: I missed it could you link the mail ?
16:34:34 DanD_ why does it matter that "quantum" collected billing data for me to calculate the bill? The meter type should be enough, right? "Network traffic in/out"
16:34:40 all the metrics need to be associated with a resource
16:35:12 Divakar: But how to guess resource without nova API query ?
16:35:38 If we take example of counting traffic for single ip address ?
16:35:42 DanD_: in my mind the http://wiki.openstack.org/EfficientMetering#Meters are metadata common to all rows found in http://wiki.openstack.org/EfficientMetering#Storage and that can be looked up using the meter_type
16:35:42 dachary, maybe the code that defines the meters should provide a plugin for the API service to add a component name? we can work that out on the list, though.
16:35:45 one needs to have a list of resources
16:36:20 we charge differently depending on some of the characteristics of the service that we are metering. i.e. what data center it is in, ...
16:36:25 it can be separate api which provides the inventory data
16:36:27 Divakar: yes, that's the resource_id field of each record from http://wiki.openstack.org/EfficientMetering#Storage
16:36:44 dachary: I think it is always a good idea to always store the dictionary with the data anyway
16:37:14 DanD_, that's a reasonable point. Somewhere in the spec there is a notion of "extra" data associated with each metering event, but that is not exposed in the aggregation API
16:37:14 here the diagram: http://markmail.org/search/?q=openstack%20ceilometer%20woorea#query:openstack%20ceilometer%20woorea+page:1+mid:lshk27cd7miz64tl+state:results
16:37:17 dhellmann: ok
16:37:39 ah, right, dachary, the resource_id can lead to that other information
16:38:07 so we should be able to aggregate by resource_id
16:38:20 dhellmann: good point
16:38:36 woorea: thanks
16:38:41 although if a resource does not exist at the point of billing (because the instance was destroyed, for example) that might not be enough
16:38:43 dhellmann, if you don't allow the API queries to filter based on the criteria you use for billing, how do you separate the data after the fact?
16:39:30 DanD_, also a good point. I was expecting to pull the raw data out and "translate" it to the type of data we need in our existing billing system.
16:40:30 DanD_, is a component for you just the name "compute" or is a specific instance of a compute node?
16:40:35 If a VM is disabled, would its CPU use be 0 or be NaN? 0 is OK for billing needs, but for diagnostics it is good to know the difference between down and 0
16:41:00 dhellmann: true. However, the billing is expected to extract meta data information independently. Otherwise we will end up replicating the full logs / archive all events from all components and providing a database of all historical events that ever happened in openstack. I believe that was agreed on during the last meeting.
16:41:09 depends on what you define as aggregation I guess.
If you plan to just pull relatively raw data out of the API, then that works. but if you are looking to get something like, how much large vm usage did account x consume in data center 1 then it's harder
16:41:21 dachary, because we are dealing with ephemeral objects, we might have to collect that data
16:41:46 DanD_, we want to be able to report for our customers how much they spent on each VM, not just how much on a type of VM
16:41:52 so we need both
16:41:57 Weighed: metering != monitoring: I would not do diagnostic with it
16:42:00 yes, I agree
16:42:07 * nijaba agrees too
16:42:43 we then need to expose resource_id to the query...
16:42:52 since the metrics is going to be provided as samples if the vm is down for a particular period of time, there will be no sample isn't it?
16:42:53 and we do
16:42:54 DanD_, location of the resource is not something we've discussed collecting but I think we need to add that
16:43:47 we differentiate on region, data center and availability zone as well as the characteristics of the VM for compute
16:43:59 dhellmann: yes we do, it is the resource_id isn't it?
16:44:00 we might want to do the same
16:44:14 nijaba, I thought the resource ID was the UUID of the actual object (the instance, for example)
16:44:29 dhellmann: we need a pointer to the resource, that is unique. That's what resource_id provides. Matching this unique id to the actual resource is outside of the scope of the metering project. If we try to fit that in, we will never complete the project I'm afraid ;-)
16:44:32 the billing system can only query for the other information if that object still exists, which it may not
16:44:33 dhellmann: ok, so location as in zone... got it...
16:44:42 dhellmann: yes, resource ID was the UUID of the actual object (the instance, for example)
16:45:32 dachary, well, I'm afraid I'm with DanD_ on this one
16:45:40 pull(rest api) or push(driver) are the options for a billing system to integrate with ceilometer
16:45:49 dachary: but I would think that the zone (which is a subset of datacenter) is indeed needed
16:46:07 woorea: billing will pull the data
16:46:09 users can choose the way they want to work with ceilometer
16:46:43 ss7pro: we should offer the two options
16:46:46 dachary: one should be able to correlate the metering data with the resource in use for which a unique identifier of the resource isn't it a must to have?
16:46:47 that calls for a different storage and schema.
16:46:47 woorea: not in what we are proposing atm, but you are welcome to propose
16:46:54 if you use an external billing provider, then a pull model is not viable, not both options
16:47:03 in order to be able to audit the billing information, the user is going to want to know the names and unique ids of the things causing the charges. We need to record that at the time the charge is incurred. Each meter type will need to define what that data is
16:47:15 woorea: but that should be done via the ml and discussed in a separate meeting
16:47:18 nijaba: suppose that ceilometer is not visible from outside
16:47:23 +1 dhellmann
16:47:35 nijaba: ok
16:47:35 DanD_, we're building a bridge to pull data out of ceilometer and push it into our existing billing system.
16:47:38 probably just a cron job
16:48:02 I think we must acknowledge that one hour won't be enough to resolve this. We will need to keep discussing this on the list and resolve the points that were raised.
16:48:03 you need a translation layer between those two pieces anyway because they are likely to have different views of the data
16:48:16 dachary: +1
16:48:23 dachary, +1
16:48:25 that's basically what we do as well.
The benefit of exposing a push model would be that it would provide some leverage to get billing providers to conform
16:48:59 dhellmann: would you take the action to reformulate the API proposal as a start point for the discussion on the ML?
16:49:06 sure
16:49:20 I will post a summary of this discussion to the list so that we can start independent threads to address each issue. Do you agree on this ?
16:49:23 dhellmann: thanks :)
16:49:28 yes
16:49:29 ok :-)
16:49:31 +1
16:49:34 +1
16:49:35 +1
16:49:38 dhellmann: the action is on you, thanks ;-)
16:49:58 #action dhellmann reformulate the API proposal as a start point for the discussion on the ML.
16:50:17 #action dhellmann: reformulate the API proposal as a start point for the discussion on the ML
16:50:20 dachary: I think we need to push other topics by one week as a consequence....
16:50:30 That will give me time to think about the need to store meta data information and revisit the storage if it needs to be.
16:50:40 nijaba: +1
16:50:51 dachary, +1
16:50:58 #action dachary push next meetings one week
16:50:58 dachary: metadata is needed
16:51:26 counting network traffic is a typical example
16:51:26 ss7pro: I see why it is. I can't figure out how it will actually work.
16:51:42 ss7pro: yes
16:51:54 dachary: dictionary is the extended schema with data definition = metadata.
16:52:03 it's easy to say : this is outside of the scope. But it makes it a lot more difficult for the billing.
16:52:24 dachary: what does?
16:52:56 not storing metadata in the storage makes it more difficult for the billing to figure out what a resource_id relates to
16:52:57 dachary: There's also one more thing that ip addresses assigned to instances may change. We need also to track which ip address belong to which instances as this data is not available now
16:53:42 ss7pro: that's what I'm afraid of. The extent of the "required metadata" is virtually boundless.
16:54:00 dachary: agreed on the metadata
16:54:04 (not sure it's proper english but you get my meaning ;-)
16:54:36 unless someone wants to add something, are we done ?
16:54:38 ss7pro: why would we care about this? will you bill differently on which instance an IP is attached to?
16:55:09 nijaba: maybe not but it's an information that is valuable to the customer. To track the bandwidth usage for instance.
16:55:17 we only bill based on internal and external traffic, but I could see where you would charge for incremental addresses
16:55:47 DanD_: yes :-)
16:55:49 nijaba: You need to know which customer generated traffic.
16:56:12 ss7pro: you will know that because tenant_id / project_id is part of the record.
16:56:15 So if instances are changing ip address (this is possible with quantum) you need to be sure that you charge the right customer
16:56:15 DanD_: floating ip billing: yes, but billing per floating per instance_type seems far fetched
16:56:21 each meter is associated to a tenant.
16:56:57 dachary: But collector needs to be aware of it, so it'll need to query nova API each time it's doing collection
16:57:06 we have a component that filters traffic based on IP and tracks the total bytes
16:57:29 DanD_, we will probably be doing that, too
16:57:53 and we will do that too.
16:58:11 Our customers will want to know which IP is responsible for the most of the bandwidth used.
16:58:30 that's a lot harder
16:58:31 thank you for your participation. That was a very rich session :-)
16:58:46 too rich maybe ;)
16:58:50 It's also needed to differentiate between internal and external traffic
16:58:55 thanks all!
16:58:58 this is a complicated problem. :-)
16:59:01 thanks
16:59:02 thanks!
16:59:06 thanks!
16:59:10 nijaba: it shows the problem is not resolved ;-) That was no troll session.
16:59:17 true
16:59:20 #endmeeting
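
Editor's illustration (not part of the meeting log): the two points the meeting settled on, the half-open [start,end[ window for every query and a "sum of (meter_volume, meter_duration)" per meter_type, can be sketched roughly as below. This is only a sketch under assumptions: the Event record, the select_events and sum_meter helpers, and the in-memory list are hypothetical and stand in for whatever storage and API Ceilometer ends up with. The field names are borrowed from the discussion above, not from a finalized schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Hypothetical set of filterable fields, taken from the names used in the log.
FILTERABLE = {"source", "user_id", "project_id", "resource_id", "meter_type"}


@dataclass(frozen=True)
class Event:
    """Hypothetical raw event record; not the real Ceilometer schema."""
    source: Optional[str]   # nullable or defaulted, as discussed in the log
    user_id: str
    project_id: str
    resource_id: str
    meter_type: str
    meter_volume: float
    meter_duration: float
    timestamp: datetime


def select_events(events: Iterable[Event], start: datetime, end: datetime,
                  **filters: str) -> List[Event]:
    """Return events with start <= timestamp < end that match every filter."""
    unknown = set(filters) - FILTERABLE
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    return [
        e for e in events
        if start <= e.timestamp < end  # half-open window: end is excluded
        and all(getattr(e, name) == value for name, value in filters.items())
    ]


def sum_meter(events: Iterable[Event], start: datetime, end: datetime,
              meter_type: str, **filters: str) -> Tuple[float, float]:
    """Sum (meter_volume, meter_duration) for one meter_type in the window."""
    selected = select_events(events, start, end,
                             meter_type=meter_type, **filters)
    return (sum(e.meter_volume for e in selected),
            sum(e.meter_duration for e in selected))
```

Because the window excludes its end, two back-to-back billing periods that share a boundary never count the same event twice, which is the question ss7pro raised about "less, less or equal".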