21:00:02 #startmeeting Ceilometer
21:00:02 #meetingtopic Ceilometer
21:00:02 #chair nijaba
21:00:02 #link http://wiki.openstack.org/Meetings/MeteringAgenda
21:00:03 Meeting started Wed Nov 7 21:00:02 2012 UTC. The chair is nijaba. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:05 The meeting name has been set to 'ceilometer'
21:00:08 Current chairs: nijaba
21:00:11 Hello everyone! Show of hands, who is around for the ceilometer meeting?
21:00:11 o/
21:00:12 o/
21:00:29 \o
21:00:31 o/
21:00:36 0/
21:00:41 o/
21:01:10 o/
21:01:18 #topic actions from previous meeting
21:01:38 #topic nijaba to send private email to all committers to come vote on the etherpad in the next 24h
21:01:44 that was done
21:02:05 #topic nijaba to then update wiki page as follows: 3+ votes=high, 1 or 2 votes=medium, low for rest in terms of priorities
21:02:19 I updated the roadmap page, we'll discuss this a bit later
21:02:40 #topic nijaba to prepare survey for ml next week
21:02:56 some of you may have been
21:03:04 invited to check it out
21:03:15 we'll discuss it in a bit
21:03:33 #topic dhellmann update versioning in ceilometer repo to match openstack standards
21:03:39 I'm working on that right now.
21:03:54 (hi)
21:03:56 I think I've figured out what to do, so I expect to have the changes ready before our next meeting.
21:03:57 cool. any issues?
21:04:04 it's confusing, but I'm working it out
21:04:14 if I run into blockers, I know who to talk to for help
21:04:19 isn't that just changing version to 2013.1 ?
21:04:28 ok, should we carry the action for next meeting?
21:04:37 jd__: no, there are some modules in common for keeping it up to date correctly
21:04:39 nijaba: yes
21:04:51 dhellmann: ok :)
21:04:52 jd__: dhellmann has to use a funky version generation script
21:05:09 #action dhellmann update versioning in ceilometer repo to match openstack standards
21:05:21 I've been trying to cargo-cult it from some changes proposed for quantum
21:05:38 #topic nijaba to add eglynn to ceilometer drivers on Saturday if all goes well
21:05:42 but I guess I'm going to have to actually learn how this code works :-)
21:05:56 eglynn is now a core dev for ceilometer. congrats!
21:06:00 welcome!
21:06:01 thanks!
21:06:16 \o/
21:06:19 go eglynn
21:06:42 asalkeld: you'll likely be next ;)
21:06:51 I've seen no objections to asalkeld's nomination. I need to go back and look at the date on that email. We wait 5 days, right?
21:07:02 right
21:07:03 t
21:07:09 that should be tomorrow
21:07:15 ah, good
21:07:34 #topic dhellmann update readthedocs copy of our docs
21:07:43 I believe that is done
21:08:01 nice, thanks
21:08:06 (cear
21:08:18 zykes-: ??
21:08:31 #topic jd and nijaba to start preparing a video demo of ceilometer
21:08:52 nijaba: nothing :)
21:08:54 we discussed it on Monday night, and now have a plan!
21:09:20 jd__ and I are to start a script, but have not started on it yet
21:09:27 so we'll carry it on
21:09:31 suggestions welcome :)
21:09:45 #action jd and nijaba to start preparing a video demo of ceilometer
21:09:48 this is an intro to ceilometer itself, right? not just for developers?
21:09:55 right
21:10:08 using horizon or ?
21:10:10 video demo, nice
21:10:24 we were thinking 5 min intro slides, then short demo
21:10:36 using the den
21:10:58 debug stuff jd__ wrote which is very nice to show activity
21:11:11 sounds like a good idea
21:12:04 #topic Review priorities as proposed on EfficientMetering/RoadMap
21:12:22 i think timjr is doing some interesting stuff with visualizations
21:12:26 might be interesting also
21:12:40 ok, so I updated the page using the result from our "voting" for priorities
21:12:53 harlowja: that would be welcome
21:12:58 nijaba: did we really get the sqlalchemy backend as "low" priority?
21:13:16 dhellmann: yep, so I think a few items need adjustment
21:13:34 indeed, that's a high priority for us at DH
21:13:47 jtran: what's the status of that driver, I haven't had a chance to look at it in a couple of days
21:13:59 sorry, reading back..
21:14:15 oh sqlalchemy, i implemented the API methods for max and sum
21:14:22 I don't think there is anything left.
21:14:32 at least i didn't see any tickets in that regard
21:14:33 oh, good
21:14:39 I'll run some tests with it asap
21:14:45 nice
21:14:57 oop, late to the party
21:15:05 jtran: should I mark it as done on http://wiki.openstack.org/EfficientMetering/RoadMap?
21:15:07 yeah, so I'm currently screwing around with zipkin
21:15:09 timjr: signed u up for everything
21:15:11 ha
21:15:13 drat
21:15:29 nijaba: to be safe i'd wait until dhellmann takes a quick test
21:15:52 zipkin is a scala implementation of dapper (google's "distributed tracing infrastructure"). it's open source, and it has some d3.js rendering in the front end
21:16:04 jtran: done but not tested is still done, I think ;)
21:16:16 I think that's a really nice system for understanding what your openstack cluster is up to... so I'm prototyping with it to see what's required
21:16:17 nijaba: in that case, yes ! ;)
21:17:06 timjr: and you are basing it off the data we collect?
21:17:22 no
21:17:40 nijaba: did a single vote translate to a "low" priority on the roadmap?
21:18:02 eglynn: yep, that was what we agreed on, but need to tune now
21:18:22 * eglynn thought he'd voted for the 'assess Synaps' task
21:18:40 timjr: so this work is exploratory?
21:18:48 yes
21:19:11 eglynn: ah, right, my mistake there, should have been marked
21:19:14 whatever I do for monitoring, I don't want to make it impossible to use it for dapper-style tracing
21:19:25 so this is an easy way to check :)
21:19:29 nijaba: cool
21:19:34 eglynn: fixed
21:19:39 thanks!
21:19:43 timjr: makes sense
21:19:46 anyt
21:20:08 anything else on the roadmap that does not make sense before I sort it?
21:20:50 how long do we have to get features in?
21:20:54 also, if you know the bug # for the bugless tasks, feel free to complete
21:21:03 nijaba: we should take out the nova-volume item, since we decided not to do it
21:21:33 asalkeld: until G3 for the base implementation
21:21:41 http://wiki.openstack.org/GrizzlyReleaseSchedule
21:21:49 g3 is feb 21
21:21:58 dhellmann: I just wanted to keep the decision documented...
21:22:06 there is the monitoring blueprint
21:22:12 that's going to be a bit of a challenge with the holiday season at the start of this cycle
21:22:18 nijaba: ah, ok
21:22:21 but not sure it could land in time
21:22:27 https://blueprints.launchpad.net/ceilometer/+spec/monitoring
21:22:30 lots to do
21:22:57 asalkeld: the teambox link on that page gives me a 404
21:23:10 asalkeld: that should depend on the multi-publisher blueprint I think, no?
21:23:10 yea, me too - I'll sort it
21:23:19 sure
21:23:35 I'll add it then :)
21:24:00 say, any reason why compute agent is in ceilometer and not in nova?
21:24:27 just seems to make more sense there IMO
21:24:39 because we're not core I'd say
21:24:40 asalkeld: it depends on ceilometer code that was only in our project at the time, and we were trying to be "self contained" as much as possible for the last cycle
21:24:44 asalkeld: simplicity, but we should push as much as possible to nova now that we are incubated
21:24:54 k
21:25:09 just makes that whole db issue go away
21:25:34 yeah -- it would be nice if there was a way to have nova load extensions that wanted periodic tasks
21:25:49 aren't we moving towards avoiding DB access by using the novaclient?
21:25:52 then we could release it as a plugin
21:26:15 (to list instances on a host etc.)
21:26:19 eglynn: yes, but we do still import nova's libvirt wrapper code, and it would be nice to get rid of that dependency, too
21:26:46 yea, dhellmann we could just have the same agent, but in nova
21:26:53 I think we could export the function we need over RPC
21:27:02 asalkeld: that might make sense
21:27:05 pollster for metering network bandwidth is marked as partially done. Any plan from anyone to complete?
21:27:07 that's something that could be accepted
21:27:33 nijaba: I think that's done
21:27:42 dhellmann: ok, I'll fix then
21:27:45 dhellmann: not external
21:27:48 the associated bug is marked "fix released"
21:27:50 nijaba: not done
21:27:57 jtran: external traffic?
21:28:07 we only have VM vif counters
21:28:09 the network bandwidth metering is only implemented for internal network bandwidth
21:28:16 jtran: do you have a good solution to distinguish external traffic?
21:28:28 jtran: dhellmann used something totally different
21:28:29 nijaba: no. i looked into that before using iptables accounting.
is not easy
21:28:33 jtran: I don't think there's any way for us to get external stats
21:28:49 we're asking our router for those stats (each tenant has a software router)
21:28:51 dhellmann: ok, if we are not considering external traffic, then please go ahead and mark it done
21:28:53 jtran: so I think we should mark it done even though we have limitations
21:29:21 * jd__ agrees
21:29:23 I agree. Even if we find a better solution, we're likely to need a couple of implementations for different configurations.
21:29:46 updated
21:30:16 I yhon you can get it from openvswitch
21:30:21 *I think
21:30:34 so, if you agree, I'll action myself to sort the list by prio, add bugs if none exist for each item on the list
21:30:45 salmon_: i haven't looked at the openvswitch/quantum implementation. that's probably likely.
21:31:00 nijaba: don't hesitate to use blueprints also for changes/features :)
21:31:03 but I would need someone's help to give t-shirt size to features
21:31:07 backtracking to the ceilo-agent-moves-into-nova idea for a sec ...
21:31:18 so that seems to leak some monitoring-related concerns into nova, frequency of the polling cycle etc.
21:31:30 also, can we rely on the timeliness of periodic tasks within a loaded nova compute agent?
21:31:46 have as a separate daemon
21:31:51 (as now)
21:32:06 and same/similar config options
21:32:07 would that defeat the purpose slightly?
21:32:08 eglynn: and keep it under our responsibility to maintain it
21:32:21 what would be the point of moving it to another git repo, then?
21:32:25 (i.e. to simplify the deployment, one fewer worker etc.)
21:32:43 eglynn, it's to make accessing the data easier
21:33:04 not having to import nova stuff from ceilometer
21:33:15 eglynn: if ceilometer provides a library, could we then ask the nova people to write hook-ins to that library, idk
21:33:15 yep, it would allow 'private' APIs to be used freely
21:33:22 more libraries maybe
21:33:30 yar
21:33:44 (just an idea)
21:33:44 so we make nova import ceilometer stuff instead?
21:33:49 other option though would be for nova to export a stable public API for ceilo to use
21:34:08 jd__, yes but minimal stats_send() api
21:34:16 eglynn: I like that better. How practical is it?
21:34:27 not really
21:34:29 if it exposes an API, wouldn't you have to poll?
21:34:40 we do anyway
21:34:40 timjr: yep, as now
21:34:57 problem is nova is only one project
21:34:58 poll a stable API exported via RPC providing CPU time, IO, etc for all virt supported by nova
21:35:15 then to do the same for all projects?
21:35:16 that sounds reasonable to me
21:35:21 that's something looking doable and acceptable for nova
21:35:38 Ok, it seems that we have a good discussion topic here. Should we action someone to think up a proposal to be debated next week?
21:35:40 hmmm, ya, so polling is one way, the monitoring stuff seems like it would be the push part though
21:35:55 monitoring stuff?
21:35:59 any volunteer?
21:36:10 harlowja: I think it's fine to have a queryable API for system stats like cpu time and so on
21:36:30 nijaba: I can work up a proposal for discussion
21:36:40 I can help
21:36:46 I can comment :p
21:36:47 asalkeld cool
21:36:55 but should nova be doing that, or should it just be broadcasting and letting some other system provide the query on top of that raw data
21:36:56 timjr: not really so nice polling large clusters
21:37:08 #action eglynn to write up a nova integration proposal to be discussed next week
21:37:11 jeffreyb1: we would likely not use it for production monitoring
21:37:17 jeffreyb1: but there's no harm having it
21:37:35 timjr: famous last words
21:37:41 problem is it uses rpc
21:37:42 jeffreyb1: could be convenient: hit a little status URL to find out what a node thinks it's doing, instead of going off to your monitoring dashboard
21:37:49 timjr: harm being code confusion
21:38:03 timjr: yup, def convenient but not a good way to go long term IMO
21:38:08 Ok, so could you all please take a few moments today or tomorrow to help me fill the roadmap with the valid links? That would help a lot
21:38:28 and t shirt sizes
21:38:36 well, you've got to gather all the stats anyway, putting up the polling API is mostly about keeping a local buffer of stat values
21:38:46 XXXXXXX-small
21:38:51 nijaba: how do you want to proceed?
21:38:53 timjr: agreed
21:38:56 timjr: those stats are good for r/t monitoring but the polling will get out of hand
21:39:17 jeffreyb1: again, I would not use polling for actual monitoring
21:39:19 timjr: simplicity though, start simple no?
21:39:38 I'd suggest each one have a pass at it for the action they care about in the next 24h
21:39:39 jeffreyb1: by out of hand, too frequent?
21:39:40 harlowja: I don't plan to implement it at present, but if ceilometer wants it, I don't see any conflict with our needs
21:39:41 timjr: so a different mechanism for monitoring of the same stats?
21:39:41 don't do local buffering, have simple broadcasting, get as far as u can with that, then add in local buffering, polling
21:39:54 nijaba: ack
21:39:58 nijaba: ack
21:40:01 eglynn: thinking of polling large clusters, 1000s of machines, kind of a pain
21:40:06 jeffreyb1: yeah. hadoop does that, for example.
21:40:13 shall we move on?
21:40:30 yes
21:40:30 sorry to be a pain, but could we please keep on the agenda until the open discussion?
21:40:35 eglynn: rather see fire and forget, let the collector deal with it
21:40:45 nijaba: sure
21:41:03 ok, I think we are ready to move to the next topic
21:41:04 jeffreyb1: re. scale, a local agent would just be polling the instances local to each compute node
21:41:25 nijaba: k
21:41:32 #topic Review survey prepared by nijaba
21:41:50 if you had the chance to review it, any comments about it?
21:42:07 do you think it is ready to be shared widely?
21:42:17 meaning the openstack ml
21:42:45 nijaba: do you have a link handy?
21:43:06 nijaba: i tried submitting my survey and it says "requires input" ...
21:43:08 reviewed
21:43:10 #link http://www.surveymonkey.com/s/SY55BHR
21:43:24 even tho i made sure all fields had an order #.
21:43:46 'this question requires an answer'
21:43:51 jtran: really? I did not have this issue... :(
21:44:01 i reproduced it right now
21:44:22 survey questions numbered 1-16. then i even put something in question #2. click submit and that's what i get. using chrome on osx
21:44:29 1-14 i meant
21:45:34 jtran: yep, I just had the same pb. Did not use to have it. I'll disable the check for now, but will need to figure out what is going on
21:46:35 ok, I removed the restriction
21:46:43 sorry guys, I need to take kids to school, be back in ~15min
21:47:08 asalkeld: we'll be around :)
21:47:50 what's next?
21:47:58 one general point on the survey, how are we gonna set expectations in terms of being bound by the result?
21:48:01 so anyone against us sharing the survey widely?
21:48:06 (e.g.
for guidance only?)
21:48:15 eglynn: just a poll, not commitment
21:48:26 nijaba: sounds fair
21:48:29 it's really to make sure we are not too far off our potential users
21:49:08 dhellmann: since you suggested it, what's your pov?
21:49:44 we should stress those expectations in the email we send to the list
21:49:52 in the invitation, I mean
21:49:59 dhellmann: +1
21:50:08 cool
21:50:11 I'm not sure how to ask for input without asking for input. ;-)
21:50:53 eglynn: asalkeld: do you mind if I remove the qpid and zeromq items from the list? It seems the issue comes from having more than 14 items in the list
21:51:15 nijaba: fair enough
21:51:18 and eglynn I think qpid will be a req for rhat in any case, right?
21:51:29 nijaba: let's keep those and remove some of the internal architectural stuff
21:51:32 (I think Rh has sufficient interest in qpid to test anyway)
21:51:33 like removing nova imports
21:51:47 dhellmann: that would work too
21:51:48 * dhellmann can't see the list any more because he submitted answers to the survey
21:52:22 "remove db access" looks like another "feature" users won't care about and that we're going to do anyway
21:52:40 dhellmann: I'll remove the nova import and the sqlalchemy and it should work. thanks
21:52:51 ok
21:54:36 ok, fixed now
21:55:09 so, I'll action myself to send the email tomorrow, unless someone is against that
21:55:16 +1
21:55:35 #action nijaba to send an invite to fill t
21:55:40 what's the whole survey stuff ?
21:55:41 #action nijaba to send an invite to fill the survey
21:56:06 #topic Open Discussion
21:56:25 zykes-: not sure I understand your question
21:56:32 did folks get a chance to review sandywalsh's unification write-up?
21:56:35 #link http://wiki.openstack.org/UnifiedInstrumentationMetering
21:56:44 yes
21:56:46 zykes-: we are soliciting input from potential users about what features they consider important
21:56:47 good stuff
21:56:57 unfortunately, no
21:57:14 yes, i read it
21:57:38 it seems complicated though
21:57:53 but i think the right direction, i think we are working on unifying the bottom layer
21:57:58 timjr: mainly
21:58:08 yep, agreed
21:58:16 really struggling with the reliance on rabbit
21:58:17 I wasn't sure tho' about the "Remove the Compute service that Ceilometer uses ..." suggestion
21:58:19 kinda ties in with the earlier discussion on moving stuff into nova
21:58:33 tach is one way, but i don't think the only way
21:58:48 http://wiki.openstack.org/InstrumentationMetricsMonitoring is the other one that is more 'low level'
21:58:49 well, I don't mind if people want to use amqp to send around messages, but I would consider that a configuration option
21:58:57 the notification system already does
21:58:57 back
21:59:08 timjr: sure
21:59:57 jeffreyb1: is the reliance on rabbit still a huge problem for you if sufficiently partitioned from the prod message bus?
22:00:26 (e.g. a separate rabbit broker/cluster)
22:00:37 eglynn: anything other than a simple point-to-point communication has many of its own failure modes that you would want to monitor
22:00:39 I thought we already agreed we would support multiple publishing methods.
22:00:42 eglynn: as tim alluded to, so long as it is a config option and pluggable to use something else then it is not a prob
22:01:02 dhellmann: yep
22:01:02 * nijaba let us run overtime as I do not think th
22:01:05 jeffreyb1: cool
22:01:06 btw, vish closed the BP and marked it as 'obsolete'
22:01:13 * nijaba let us run overtime as I do not think there is another meeting after us
22:01:27 he suggested we put it in openstack-common or "external"
22:01:41 yeah, i think that means somehow it wasn't clear enough :)
22:01:42 dhellmann: yep, that was acted for me too
22:02:00 the functions you call can be in a separate library, but the calls will have to land in nova and other components...
22:02:11 sure
22:02:50 seems not everyone wants a single library to emit the stats
22:03:10 might need to have tracing and metering/monitoring
22:03:13 um... well, I guess there's no accounting for taste
22:03:32 as separate entities
22:03:39 I am not fussed
22:03:43 fwiw, I've been looking at https://github.com/BrightcoveOS/Diamond/ this week and it has some of the stuff we've discussed doing with different polling rates and publishing methods already
22:04:09 asalkeld: I would hope that the API is good enough that switching from two libraries to one is a simple matter of refactoring
22:04:11 cool looking
22:04:30 we're going to be using it for monitoring here at DH, so I wrote a ceph plugin for it. pretty easy. could use some polish, but maybe we can steal ideas or even collaborate
22:04:40 dhellmann: that's an interesting link
22:04:42 dhellmann: interesting ...
22:05:12 I'm not super happy with the "scan a directory for plugins" approach they took, and the packaging is rough, but all the pieces seem to be there.
22:05:39 they're focusing on monitoring, of course
22:05:55 I'm not sure if you can specify that the same data goes to different sources at different rates.
22:06:03 sorry, different destinations not sources
22:06:08 yip
22:06:09 * nijaba_ was temporarily disconnected :(
22:06:18 that would be fairly crucial
22:06:25 eglynn: yeah, definitely
22:06:46 they've been very welcoming of patches this week, even without me contacting them directly, so that might be worth a go
22:06:48 class Metric(object):
22:06:48 def __init__(self, path, value, timestamp=None, precision=0):
22:07:02 don't see how we can add more info
22:07:13 user/resource info
22:07:24 yeah, it's definitely not good enough for billing
22:07:48 although the publisher pulls data out of the Metric, so if we change that class we could add data that is only used by some publishers
22:08:01 they just need **kwargs
22:08:08 dhellmann: and then come the transport issue...
22:08:10 I'm not necessarily suggesting we use their daemon instead of ours, but we might get some ideas about, for example, how to configure things
22:08:33 defo worth a sniff around
22:08:35 nijaba_: some of that didn't come through, I think, I'm not sure what you mean
22:08:40 so would a set of arbitrary ket/value pairs be sufficient for billing purposes?
22:08:44 key/value, even
22:08:47 timjr: no
22:08:54 dhellmann: what else do you need?
22:09:06 we need timestamps, for one
22:09:11 messages need to be signed for auditing purposes
22:09:11 oh, sure
22:09:18 , timestamp=None
22:09:21 that's an interesting one
22:09:25 they have that
22:09:26 and counters for auditability
22:09:37 we need the metadata so consumers can compute rates based on properties of the instance
22:09:39 well we write the handler
22:09:43 nijaba: you mean unique message IDs?
22:09:46 and we need to know the owner
22:09:55 timjr: no, incremental counters
22:10:14 timjr: so that you can detect missing or inserted messages
22:10:20 I see
22:10:29 so that is like a stateful metric?
22:10:39 nijaba: did counters make it onto the priority list for grizzly?
:-)
22:10:44 sequenced metric
22:11:09 hmm
22:11:14 we don't really care a
22:11:23 about the order
22:11:33 more about tampering
22:11:41 nod
22:11:44 that makes sense
22:12:22 well worth a look
22:12:44 should we action something here?
22:12:59 I think if tampering were to become an issue, you've got some fundamental access control problems on your openstack cluster
22:13:06 nijaba: I'm not sure what that action would be.
22:13:07 ... but I can see being paranoid where billing is concerned
22:13:48 timjr: yep, that's something people tend to become paranoid about
22:13:52 * dhellmann needs to leave soon
22:13:57 * nijaba too
22:14:08 "investigate diamond for use to generate stats"
22:14:16 should we end the meeting for now?
22:14:20 yip
22:14:21 btw
22:14:23 k
22:14:29 ok
22:14:37 asalkeld: care to take that action?
22:14:49 sure
22:14:51 for chargeback >> bufunfa uses ceilometer
22:15:01 for people that care
22:15:07 #action asalkeld investigate diamond for use to generate stats
22:15:32 #endmeeting
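
Note (not part of the log): the open discussion lists what a metric would need for billing: owner and resource metadata, a timestamp, signed messages, and incremental counters so that missing or inserted messages can be detected. A minimal Python sketch of how those pieces could fit together follows. Only the Metric constructor arguments (path, value, timestamp, precision) are quoted in the log; the **kwargs field, SigningPublisher, verify and find_gaps are hypothetical names made up for illustration and are not Diamond's or ceilometer's actual API.

```python
import hashlib
import hmac
import json
import time


class Metric(object):
    """Metric record; billing fields (owner, metadata) ride along in **kwargs."""

    def __init__(self, path, value, timestamp=None, precision=0, **kwargs):
        self.path = path
        self.value = value
        self.timestamp = time.time() if timestamp is None else timestamp
        self.precision = precision
        self.extra = kwargs  # hypothetical: e.g. owner="tenant-a"


class SigningPublisher(object):
    """Hypothetical publisher that numbers and signs each outgoing message."""

    def __init__(self, secret):
        self._secret = secret  # shared secret, as bytes
        self._sequence = 0     # incremental counter for auditability

    def publish(self, metric):
        self._sequence += 1
        body = {
            "seq": self._sequence,
            "path": metric.path,
            "value": metric.value,
            "timestamp": metric.timestamp,
            "extra": metric.extra,
        }
        # Serialize deterministically so the receiver signs the same bytes.
        payload = json.dumps(body, sort_keys=True)
        signature = hmac.new(
            self._secret, payload.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        return {"payload": payload, "signature": signature}


def verify(message, secret):
    """Return True if the payload was signed with the given secret."""
    expected = hmac.new(
        secret, message["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, message["signature"])


def find_gaps(messages):
    """Return sequence numbers missing between the lowest and highest seen."""
    seen = set(json.loads(m["payload"])["seq"] for m in messages)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

A collector would call verify() on each message to reject tampered payloads and find_gaps() on a batch to spot dropped ones; a real deployment would also need per-source counters and some form of key distribution, which this sketch leaves out.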