15:00:51 <jd__> #startmeeting ceilometer
15:00:51 <openstack> Meeting started Thu Oct  3 15:00:51 2013 UTC and is due to finish in 60 minutes.  The chair is jd__. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:53 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:55 <openstack> The meeting name has been set to 'ceilometer'
15:01:00 <dragondm> o/
15:01:04 <lsmola> o/
15:01:05 <dhellmann> o/
15:01:05 <eglynn> o/
15:01:09 <gordc> o/
15:01:12 <jd__> hi guys
15:01:22 <DanD> o/
15:01:26 <thomasm> o/
15:01:26 <sileht> o/
15:01:28 <sandywalsh> o/
15:01:29 <apmelton> o/
15:02:26 <jd__> #topic Havana release
15:02:39 <jd__> a few words about Havana: RC1 was released yesterday
15:02:41 * dragondm applauds
15:02:45 <thomasm> Wahoo!
15:02:47 <eglynn> \o/
15:02:51 <terriyu> o/
15:02:52 <dhellmann> nice work, everyone!
15:02:54 <sandywalsh> congrats y'all
15:03:07 <lsmola> yaaay
15:03:16 <jd__> the milestone-proposed branch has been cut, so fixes can be included if needed in case we have to respin an RC2 release
15:03:23 <jd__> it's time to test heavily!
15:03:34 <jd__> congratulations to you all too! :)
15:03:48 <eglynn> jd__: re. needing an rc2 release ...
15:04:04 <eglynn> on https://review.openstack.org/47542 we discussed considering the leaking of admin-ness to non-admin users as a separate bug
15:04:05 <jd__> you can merge patches in master once again and that'll be in Icehouse
15:04:16 <eglynn> sounds like an RC2 candidate?
15:04:58 <jd__> eglynn: maybe
15:05:28 <eglynn> jd__: k, I'll file a bug and work on a fix
15:05:42 <jd__> I think I'd like to see the fix before deciding
15:05:45 <eglynn> jd__: (we can decide later if target'd at Icehouse or RC2)
15:05:48 <jd__> (I know that sounds weird)
15:05:58 <eglynn> jd__: no, that's fair enough
15:06:02 <jd__> cool :)
15:06:49 <jd__> and for people not following at home, I think I'll still be the benevolent democratic dictator for the Icehouse release
15:07:21 * dragondm hands jd__ a handful of medals and a funny hat.
15:07:26 <lsmola> hehe
15:07:26 <eglynn> with 100% of the vote, Stalin would have been proud of that ;)
15:07:39 <thomasm> Why'd you take my hat?
15:07:49 <jd__> eglynn: 100% of 0 votes :-)
15:08:01 <thomasm> Lol
15:08:05 <eglynn> :)
15:08:11 <gordc> jd__: you just broke maths.
15:08:38 <jd__> :-)
15:08:40 <jd__> #topic Release python-ceilometerclient?
15:08:49 <eglynn> I'd like to include https://review.openstack.org/#/c/49551 if poss
15:09:03 <jd__> works for me
15:09:06 <eglynn> cool
15:09:21 <jd__> anybody can ping me or eglynn to get a release out
15:09:33 <jd__> eglynn: so ping me or yourself for this to happen
15:09:38 <lsmola> eglynn, cool
15:09:43 <eglynn> cool
15:10:35 <jd__> #topic Talking about Hardware Agent
15:10:41 <jd__> #link https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
15:10:47 <jd__> lsmola: floor is yours
15:10:54 <lsmola> I have prepared several questions
15:10:59 <jd__> 42!
15:11:13 <lsmola> i will just throw them in so we can start discussing them
15:11:22 <lsmola> 1. Should it act as a central agent, or should it be deployed on every piece of hardware? It seems the original author planned for both:
15:11:23 <lsmola> http://www.cloudcomp.ch/2013/07/hardware-extension-for-ceilometer/
15:11:23 <lsmola> So would the settings determining what is visible to the agent reside in the agent somehow?
15:11:23 <lsmola> 2. Regarding security, is it better to have an agent on each baremetal node, or rather to run only a daemon like snmpd there and allow communication to and from the central agent?
15:11:23 <lsmola> 3. There is very little documentation about how to set this up. I couldn't find where the agent gets the list of resources it should poll, e.g. the list of routers or baremetal nodes it can see. Does anybody know?
15:11:29 <lsmola> 4. About the IPMI inspector: it is likely that the IPMI credentials will reside in Ironic (at least for the undercloud), so the inspector should be able to talk either to the Ironic API or to IPMI directly, right? That way it also works when Ceilometer knows the IPMI credentials.
15:11:32 <thomasm> ow my eyes
15:11:39 <lsmola> hehe
15:11:56 <jd__> and you got only 49 minutes left to answer this
15:12:01 <lsmola> there was a warning :-) you had to catch it
15:12:21 <dhellmann> what are the security implications of running an agent on the server with an image owned by the tenant?
15:12:31 <dhellmann> and how would we get the agent into the image in the first place?
15:13:01 <lsmola> dhellmann, our primary use case is to use it in the undercloud via tripleo
15:13:13 <dhellmann> ah, ok, that makes more sense :-)
15:13:15 <lsmola> dhellmann, so it would be an image element of tripleo
15:13:26 <dhellmann> sure
15:13:26 <eglynn> lsmola: "should it act as a central agent" == "should it not run on-host" ?
15:13:58 * eglynn confused on the "central agent" reference in #1
15:14:03 <lsmola> eglynn, if i get it correctly, central agent is running on Control node, right?
15:14:26 <lsmola> eglynn, so there is only one agent, polling everything
15:14:48 <eglynn> lsmola: well in theory it could be running anywhere, but yes you could call it a control node, and yes it polls everything
15:14:56 <sandywalsh> there would have to be something on the host answering the poll though.
15:14:57 <lsmola> jd__, btw. how is it with scaling of the central agent? i think that was the biggest concern of the tripleo guys
15:15:13 <eglynn> lsmola: so you mean, load the hardware monitoring piece as an extension into the existing central agent?
15:15:15 <dhellmann> would a central agent use IPMI and SNMP?
15:15:23 <lsmola> sandywalsh, for SNMP, it will be snmpd
15:15:27 <jaypipes> hi guys, sorry I'm late... wanted to let you know I should have the Alembic removal patch updated and done today.
15:15:33 <sandywalsh> lsmola: gotcha
15:15:37 <lsmola> sandywalsh, there is also image element for this in tripleo
15:15:51 <jd__> lsmola: I should be working on it soon
15:16:04 <lsmola> sandywalsh, though you have to allow communication in firewall
15:17:00 <lsmola> ok so, anybody has an idea how the architecture should work?
15:17:06 <sandywalsh> yes, there is also the rackspace agent, which is open sourced. Perhaps it could be adapted for guest-side reporting. It's available for windows and linux.
15:17:12 <dhellmann> I'm not sure how we could handle baremetal securely using either mode for any use case other than tripleo. Are we trying for tenant images at all?
15:17:27 <sandywalsh> I think it's very xen-specific currently though
15:17:53 <dragondm> sandywalsh: yah, the rs agent uses xen store for communication.
15:18:12 <sandywalsh> yeah :/
15:19:27 <lsmola> hm
15:19:34 <jd__> if the user runs on baremetal, the monitoring can either be done by IPMI or the like, or the user could have an agent posting untrusted metrics via the API that wouldn't be used for billing but could be used for stuff like autoscaling
15:19:36 <sandywalsh> #link https://github.com/rackerlabs/openstack-guest-agents-unix
15:19:48 <sandywalsh> #link https://github.com/rackerlabs/openstack-guest-agents-windows-xenserver
15:20:28 <eglynn> is IPMI alone sufficient though?
15:21:01 <eglynn> (from what we discussed previously, seemed that was limited to temperature, fan speed, voltage etc.)
15:21:02 <lsmola> eglynn, i am not sure, from metrics i saw, it didn't have everything
15:21:27 <lsmola> eglynn, there is a full list in the IPMI blueprint
15:22:27 <lsmola> jd__, the other thing is, that we might want to read syslog or other things from baremetal
15:22:30 <eglynn> lsmola: you mean the monitoring-physical-devices BP?
15:22:31 <jd__> eglynn: it may not indeed
15:22:43 <lsmola> jd__, so it would be much easier with agent on the baremetal
15:22:52 <jd__> lsmola: well everything that comes from baremetal cannot be trusted
15:23:00 <jd__> you can run an agent on baremetal and use the API to post samples
15:23:03 <sandywalsh> but not everyone wants an agent on their machines
15:23:12 <sandywalsh> tradeoff
15:23:18 <lsmola> eglynn, https://blueprints.launchpad.net/ceilometer/+spec/ipmi-inspector-for-monitoring-physical-devices
15:23:19 <jd__> it should just be tagged as something coming from the user, not the operator
15:23:26 <sandywalsh> yep
15:23:26 <jd__> sandywalsh: clearly, it'll be optional
15:23:27 <eglynn> lsmola: thanks
15:23:32 <lsmola> eglynn, it should be the IPMI inspector for the hardware agent
15:24:24 <lsmola> sandywalsh, so it seems to me, it should be matter of configuration
15:24:43 <sandywalsh> yep, if you want the buttery goodness, you have to install it
15:25:18 <lsmola> sandywalsh, for tripleo, there could be several images prepared, so..
15:25:41 <sandywalsh> lsmola: right, our rackspace images have the agent preinstalled.
15:25:57 <lsmola> sandywalsh, yep
15:26:21 <lsmola> sandywalsh, and it could be preconfigured too (or some configuration scripts prepared)
15:26:49 <sandywalsh> some mount a "configuration drive"
15:27:24 <eglynn> so it seems like two alternative modes are possible (on- and off-host) ... are we saying that *both* approaches have their place?
15:27:39 <lsmola> by the way, regarding question 1, I have no idea how it works now; i haven't found anywhere the list of hosts that it polls. anybody have an idea?
15:27:40 <eglynn> (i.e. depends on the cloud deployer's policies)
15:27:57 <lsmola> eglynn, yes
15:27:57 <sandywalsh> yep, from the cm perspective I don't think things should really change. It's just a source of events.
15:28:18 <sandywalsh> be it from these agents/pollsters or something like Diamond or a periodic_task
15:28:43 <sandywalsh> whether it's a part of core CM is perhaps a bigger question
15:28:49 <jd__> eglynn: I think so indeed
15:28:58 <eglynn> and in the off-host/central-agent case, is the potential for data acquisition more constrained?
15:29:05 <sandywalsh> should we just focus on providing a solid api for these things
15:29:13 <eglynn> (so the two approaches aren't fully interchangeable)
15:29:47 <sandywalsh> (api = http/udp/rpc-cast/whatever)
15:29:47 <jd__> eglynn: no they wouldn't be
15:30:09 <jd__> sandywalsh: we do have an HTTP API for posting samples, that should be enough
15:30:32 <jd__> I don't think the problem is in Ceilometer, it's rather on how to build things around what it offers right now :)
15:30:33 <sandywalsh> jd__: yep, depends on the volume/frequency, but a good start for sure
15:30:43 <lsmola> jd__, yes exactly
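For illustration, a minimal sketch of what an on-host agent posting one sample through the v2 REST API could look like; the field names follow the Havana-era POST /v2/meters/<name> endpoint, while the endpoint URL, token, meter name, and resource id are hypothetical placeholders.

```python
# Hypothetical sketch: an on-host agent posting one untrusted sample to the
# Ceilometer v2 API. Endpoint, token, meter name, and IDs are placeholders.
import json
import requests

CEILOMETER = "http://ceilometer.example.com:8777"   # assumed API endpoint
TOKEN = "<keystone-token>"                          # obtained from keystone

samples = [{
    "counter_name": "hardware.cpu.util",            # illustrative meter name
    "counter_type": "gauge",
    "counter_unit": "%",
    "counter_volume": 12.5,
    "resource_id": "baremetal-node-01",
    "resource_metadata": {"source": "on-host-agent"},
}]

resp = requests.post(
    "%s/v2/meters/hardware.cpu.util" % CEILOMETER,
    headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
    data=json.dumps(samples),
)
resp.raise_for_status()
```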
15:31:02 <jd__> sandywalsh: yeah, I also think there's a good chance your baremetal might not have access to RabbitMQ for example for security reasons
15:31:07 <jd__> sandywalsh: like different networks
15:31:44 <sandywalsh> sure, there's going to be a pile of deployment considerations ... too many to deal with. We should just focus on the API, and perhaps some example pollsters/agents
15:31:46 <lsmola> jd__, well for snmp, you need to allow udp communication with snmpd
15:31:49 <sandywalsh> (imho :)
15:31:56 <lsmola> jd__, if it's remote
15:32:12 <jd__> lsmola: yeah but that's just for polling, not a problem  :)
15:32:24 <lsmola> jd__, ok
15:32:37 <jd__> does that answer enough of your questions lsmola ?
15:32:47 <lsmola> jd__, well
15:33:04 <lsmola> jd__, I have almost the same amount of confusion :-D
15:33:21 <jd__> you're welcome.
15:33:29 <thomasm> hah
15:33:29 <lsmola> jd__, ok, so the main thing is, can we leave the hardware agent as it is?
15:33:39 <lsmola> jd__, as separate from the central agent
15:33:57 <lsmola> jd__, as it can be used as a central or per-host agent
15:34:02 <jd__> lsmola: good question, I think we are going to work on the central agent and improve it
15:34:12 <jd__> having to do that twice for no reason does not sound like a good option to me
15:34:15 <lsmola> jd__, this is a starting point for me, to get the hardware agent in
15:34:20 <jd__> considering the code for the hardware agent is quite the same AFAIK
15:34:38 <jd__> question is: is there anything you can't do with the central agent right now?
15:35:05 <lsmola> jd__, well i guess you can put the central agent on each host too, if you want, right?
15:35:26 <lsmola> jd__, is there some horizontal scaling of the central agent?
15:35:45 <jd__> lsmola: not yet, but this is going to be implemented
15:35:49 <lsmola> jd__, tripleo is afraid of large deployments
15:35:57 <lsmola> jd__, ok, fair enough
15:36:06 <sandywalsh> (we all are :)
15:36:14 <lsmola> hehe
15:36:21 <jd__> lsmola: if there is a hardware agent I think it should be the agent polling things from inside the host and posting stuff to the REST API for example
15:36:36 <jd__> so it's not really polling hardware, but instances-on-hardware
15:36:46 <lsmola> jd__, yes, or directly to the message bus
15:37:05 <jd__> lsmola: if you can access it, for sure
15:37:10 <lsmola> jd__, yes
15:37:26 <jd__> so that means basically that you can reuse the central agent
15:37:36 <jd__> only enable the pollsters that poll local things like CPU time etc
15:37:47 <lsmola> jd__, ok cool
15:37:49 <jd__> and write a new publisher to publish over HTTP
15:38:03 <lsmola> jd__, then there is no reason to have separate hardware agent I guess
15:38:04 <jd__> something like that (maybe I miss some detail, I'm thinking out loud)
15:38:11 <lsmola> jd__, at least i don't see any
15:38:19 <jd__> lsmola: maybe there is and we don't see it yet, but for now I don't think so
15:38:24 <jd__> let's keep things simple if we can!
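To make the "reuse the central agent and add a new publisher" idea concrete, here is a rough sketch of what such an HTTP publisher plugin might look like. It assumes the Havana-era publisher interface (a PublisherBase subclass implementing publish_samples(context, samples)); the class name, the rest:// URL scheme, and the pipeline wiring are assumptions for illustration, not the actual merged code.

```python
# Hypothetical sketch of an HTTP publisher plugin for the central agent.
# Assumes the Havana-era plugin interface; names and wiring are illustrative.
import json
import requests

from ceilometer import publisher


class RestPublisher(publisher.PublisherBase):
    """Push polled samples to a Ceilometer REST API over HTTP."""

    def __init__(self, parsed_url):
        # e.g. a (hypothetical) rest://ceilometer.example.com:8777 entry
        # in pipeline.yaml would end up here as a parsed URL
        self.endpoint = "http://%s/v2/meters" % parsed_url.netloc

    def publish_samples(self, context, samples):
        # post each sample to its meter endpoint
        for sample in samples:
            body = [{
                "counter_name": sample.name,
                "counter_type": sample.type,
                "counter_unit": sample.unit,
                "counter_volume": sample.volume,
                "resource_id": sample.resource_id,
                "resource_metadata": sample.resource_metadata,
            }]
            requests.post(
                "%s/%s" % (self.endpoint, sample.name),
                headers={"Content-Type": "application/json"},
                data=json.dumps(body),
            )
```

In pipeline.yaml this would then be listed as a publisher for the locally enabled pollsters only (the rest:// scheme name above is assumed, not an existing one).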
15:38:24 <eglynn> if there's an agent posting samples to the REST API from on-host, then we have to propagate keystone credentials onto every host also, right?
15:38:30 <lsmola> jd__, agree
15:38:44 <jd__> eglynn: right
15:39:04 <jd__> eglynn: maybe we could use a new role or something to have a better split on permissions for posting samples from trusted sources
15:39:13 <lsmola> jd__, i guess it will be safer to run only daemons that can be polled on the hosts
15:39:14 <jd__> that's another (part of the) issue
15:39:39 <jd__> lsmola: I can't see how safer it would be :)
15:39:46 <sandywalsh> yeah, we should support anonymous posts
15:39:56 <sandywalsh> for monitoring  ... not billing
15:40:01 <eglynn> jd__: hmmm, yeah a very limited RBAC policy, plus possibly rate limiting to avoid DDoSing
15:40:12 <sandywalsh> yep
15:40:19 <sandywalsh> everything should be rate limited
15:40:31 <sandywalsh> or just use udp
15:40:36 <jd__> eglynn: I'm pretty sure we can re-use or add a middleware on Oslo for that
15:40:42 <jd__> eglynn: the DoSing thing
15:40:46 <eglynn> cool enough
15:40:51 <jd__> but good idea
15:40:59 <jd__> (yet another part of the problem :-)
15:41:29 <jd__> lsmola: if you've got further points, don't hesitate to send a mail to the list I guess
15:41:49 <lsmola> jd__, ok, so my main testing scheme is to run the central agent and SNMPd on each host
15:42:13 <lsmola> jd__, i think I can go with this
15:42:25 <jd__> cool
15:42:34 <lsmola> jd__, i will try to talk with llu about merging the hardware agent into the central agent
15:42:44 <jd__> lsmola: sounds like a good idea
15:42:47 <lsmola> jd__, he seems to be on vacation
15:42:55 <jd__> it's too bad llu is not here
15:42:59 <jd__> ok :)
15:43:00 <lsmola> jd__, so when i catch him
15:43:19 <lsmola> jd__, could you also post the decision to the blueprint?
15:43:56 <jd__> hm I can try but if I forget feel free to do it instead
15:44:13 <lsmola> jd__, ok fair enough :-)
15:44:22 <jd__> #topic Open discussion
15:44:42 <lsmola> jd__, i think that's all regarding hardware agent, I have my starting point
15:44:45 <lsmola> :-)
15:45:31 <vvechkanov> Hello all. I want to ask about notifications via email. Is it planned to add them?
15:45:34 <jd__> if anyone has anything else go ahead
15:45:40 <jd__> vvechkanov: for alarms?
15:45:43 <vvechkanov> Yes
15:45:52 <jd__> vvechkanov: it's planned somehow, but nobody took the task over AFAIK
15:46:07 <eglynn> IIRC we were thinking Marconi would provide SNS-like functionality
15:46:17 <eglynn> (for email/SMS/etc notifications)
15:46:31 <eglynn> not sure if that's still on the Marconi roadmap tho'
15:46:36 <jd__> you can possibly write a Marconi target too
15:46:58 <eglynn> perhaps we could do with a ceilo/marconi session at summit?
15:47:05 <vvechkanov> As far as I know, Marconi doesn't plan to provide notifications...
15:47:08 <eglynn> k
15:47:17 <vvechkanov> Am I wrong?
15:47:35 <eglynn> dunno, I'd need to check with the Marconi folks
15:47:57 <eglynn> (I've had that on my mind also, so I'll chase it up)
15:48:13 <gordc> i had a question about notification.info queue... is that strictly for ceilometer? apparently some ppl have been listening to that queue (for whatever reason) and when we enable ceilometer, they start racing and grabbing random msgs. wondering what's the correct way to impl this?
15:48:14 <jd__> eglynn: yeah good idea
15:48:43 <jd__> gordc: no, we consume from it by default but you can configure another queue to be used
15:48:57 <jd__> gordc: just add more topics in nova.conf for example, and change the topic in ceilometer.conf
15:49:22 <sandywalsh> gordc: we have our services configured to publish to two exchanges, one for stacktach/ceilometer and another for our billing dept.
15:49:24 <jd__> notification_topics=notifications,ceilometer in nova.conf and notification_topics=ceilometer in ceilometer.conf
15:49:30 <jd__> something like that
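Spelled out, the split suggested in the two lines above would look roughly like this; the [DEFAULT] section placement is assumed, and the exact option name and default topic may differ per release.

```ini
# nova.conf -- keep the default topic for other consumers, add one for ceilometer
[DEFAULT]
notification_topics = notifications,ceilometer

# ceilometer.conf -- consume only from the dedicated topic
[DEFAULT]
notification_topics = ceilometer
```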
15:50:08 <gordc> jd__, sandywalsh: cool cool. that's what i thought. wanted to confirm that's how everyone else is doing it before i make stuff up. :)
15:50:43 <jd__> sandywalsh: by the way I think that your notification patch introduces a new issue
15:50:47 <jd__> (kinda)
15:51:02 <sandywalsh> conflicting rabbit configs?
15:51:04 <jd__> because in such a context, now Ceilometer publishes and consumes only on "notification_topics"
15:51:18 <jd__> so with the setup I've described, it's problematic
15:51:31 <jd__> we should have notification_topics and notification_topics_to_consume
15:52:01 <sandywalsh> right, because we can't inject into the rpc_notifier ... so we'll need the _to_consume part
15:52:07 <sandywalsh> k, I'll look into that
15:53:26 <jd__> cool
15:53:40 <dragondm> btw, I gather all of the -2's for FF will be lifted soon?
15:53:53 <jd__> dragondm: already done (unless I forgot some)
15:54:06 <dragondm> Yah, there are still some.
15:54:34 <jd__> dragondm: feel free to send them to me so I can lift :)
15:54:53 <dragondm> The auto-expire hit a few. I've revived mine: https://review.openstack.org/#/c/42713/
15:55:02 <jd__> ah indeed
15:55:14 <jd__> I didn't look into expired ones
15:55:49 <jd__> anything else guys? otherwise closing in a minute
15:56:34 <jd__> #endmeeting