15:03:44 <jd__> #startmeeting ceilometer
15:03:45 <openstack> Meeting started Thu Jan 23 15:03:44 2014 UTC and is due to finish in 60 minutes. The chair is jd__. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:46 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:48 <openstack> The meeting name has been set to 'ceilometer'
15:04:08 <sileht> o/
15:04:10 <eglynn> o/
15:04:11 <ildikov_> o/
15:04:13 <llu-laptop> o/
15:04:39 <lsmola_> o/
15:04:40 <nadya_> o/
15:04:45 <scroiset> o/
15:05:27 <jd__> #topic Milestone status icehouse-2 / icehouse-3
15:05:41 <jd__> so icehouse-2 is being released right now
15:05:47 <eglynn> a tad disappointing how things panned out overall with icehouse-2
15:05:52 <tongli> o/
15:05:53 * jd__ nods
15:06:01 <eglynn> ... i.e. chronic problems in the gate caused stuff that was ready to go to be bumped to i-3 :(
15:06:18 <eglynn> ... /me would have preferred if the milestone had been delayed by a few days to allow the verification queue to drain
15:06:25 <nadya_> my fix got merged 2 hours after i-2 was released :(
15:06:42 <eglynn> yep, I had something similar with a BP
15:06:56 <jd__> :(
15:07:08 <ildikov_> my fix also failed on the gate...
15:07:27 <ildikov_> so it is at the end of the waiting list right now
15:07:41 <eglynn> upside: it gives us a head-start on our icehouse-3 slate
15:07:45 <dhellmann> o/
15:07:47 <jd__> we also need to triage icehouse-3
15:08:05 <eglynn> note we've only 5 full person-weeks of dev time remaining before the freeze for icehouse-3
15:08:13 <eglynn> #link https://wiki.openstack.org/wiki/Icehouse_Release_Schedule
15:08:25 <eglynn> ... gotta put my hand up, I'm usually one of the worst offenders for landing stuff late in the cycle
15:08:39 <eglynn> ... however for i-3 let's try to aim for a more even flow-rate on BP patches
15:08:56 <eglynn> (... as there's gonna be an even bigger rush on the gate for i-3)
15:08:59 <jd__> if you have blueprints for icehouse-3, please update them
15:09:19 <eglynn> (... given the amount of stuff that was bumped off i-2, plus the natural backloading onto the 3rd milestone that always happens in every OS release)
15:09:39 <ildikov_> jd__: I have a bp for i-3, which depends on the complex query one
15:10:10 <jd__> hm
15:10:48 <ildikov_> jd__: it would be good to get the complex query discussion settled in the early phase of i-3, as the feature is ready to fly already
15:11:36 <jd__> we'll try to do that indeed
15:11:44 <nadya_> we need to hurry up right now :) a lot of things to discuss
15:11:59 <ildikov_> jd__: tnx
15:12:39 <eglynn> over the final 2-3 weeks of the icehouse-3 lead-in, one possible strategy is to prioritize BPs over bugfixes
15:12:58 <eglynn> (... much easier to get a bugfix landed post i-3 than to beg an FFE on an unfinished BP)
15:13:10 <eglynn> ... justsayin
15:13:19 <llu-laptop> eglynn: +1
15:14:15 <dhellmann> eglynn: prioritizing reviews on anything associated with a bp or bug report between now and then seems like a good idea, too -- encourage people to help with project tracking by filing the right "paperwork"
15:14:21 <jd__> I think we'll rediscuss this next week too anyway
15:14:25 <jd__> just update your bp in the meantime
15:14:31 <jd__> dhellmann: +1
15:14:35 <eglynn> dhellmann: absolutely
15:14:49 <jd__> moving on as we have a lot of topics
15:14:50 <jd__> #topic Tempest integration
15:14:57 <jd__> nadya_: a word on that?
15:15:05 <nadya_> yep!
15:15:07 <nadya_> Our client is approved!
Congrats :)
15:15:31 <nadya_> But we are still facing problems. There is a bug in devstack (and tempest) in the handling of the trailing '/' in the URL. #link https://bugs.launchpad.net/tempest/+bug/1271556
15:15:34 <jd__> clap clap clap
15:15:39 <eglynn> \o/
15:15:47 <dhellmann> nice work!
15:15:57 <ildikov_> nadya_: +1
15:16:01 <nadya_> because of this we have a -1 from Jenkins here #link https://review.openstack.org/#/c/67164/
15:16:30 <nadya_> but we are working on a fix
15:16:40 <jd__> great
15:16:42 <nadya_> Besides, we have two nova-related patches: #link https://review.openstack.org/#/c/46744/ and #link https://review.openstack.org/#/c/64136/ . yassine, do you have any news? :)
15:17:01 <nadya_> if you are here...
15:17:21 <nadya_> We are still working on pollster testing. The Tempest part is started, but we need #link https://review.openstack.org/#/c/66551/ for testing. JFYI
15:18:06 <nadya_> and the last point is alarm testing. It got lost: #link https://review.openstack.org/#/c/39237/ . Nick, nsaje, are you here?
15:18:37 <jd__> I'd suggest stealing the patch if it's not restored
15:18:48 <eglynn> +1
15:19:11 <jd__> otherwise that looks really good, thanks for taking care of it nadya_
15:19:14 <nadya_> yep, actually Vadim has started to fix the alarm patch
15:19:27 <eglynn> nadya_: great! thanks
15:19:45 <nadya_> So I think that's all from my side on this topic
15:19:54 <nadya_> ur welcome :)
15:19:56 <jd__> great :)
15:19:58 <jd__> #topic Release python-ceilometerclient?
15:20:07 <eglynn> I need this patch to land asap: https://review.openstack.org/68637
15:20:20 <jd__> I'll review
15:20:24 <eglynn> a super-simple fix
15:20:35 <eglynn> but the alarm-threshold-create verb is pretty much borked without it
15:20:43 <eglynn> (the legacy alarm-create verb works fine tho')
15:20:47 <ildikov_> eglynn: it looks good on the gate now
15:21:00 <eglynn> ildikov_: cool, thanks
15:21:26 <jd__> #topic Discussion of the resource loader support patch
15:21:27 <eglynn> once that's landed I'll cut a new client
15:21:36 <jd__> eglynn: ack :)
15:21:38 <jd__> #link http://lists.openstack.org/pipermail/openstack-dev/2014-January/024837.html
15:21:52 <jd__> I still haven't managed to read that thread
15:22:06 <llu-laptop> jd__: what's your concern about the 'not generic enough'?
15:22:39 <jd__> my main concern is that we externalized resources into a separate file with a separate module to handle it
15:22:48 <jd__> whereas I see no reason not to have it in the pipeline definition
15:23:09 <jd__> and problems like caching are not to be solved at that level
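For concreteness, jd__'s suggestion amounts to something like the following: the resource list declared inline in pipeline.yaml, rather than in a separate file with its own loader module. This is a minimal sketch assuming the Icehouse-era flat pipeline layout; the `resources` key and the SNMP endpoints are illustrative, not a merged syntax.

```yaml
# Sketch only: resource list inline in the pipeline definition
# (the "resources" key and endpoint values here are illustrative).
-
    name: snmp_pipeline
    interval: 600
    meters:
        - "hardware.cpu.load.*"
    # explicit endpoints for this pipeline's pollsters; if the key is
    # omitted, discovery would be left to the pollster itself
    resources:
        - snmp://10.0.0.11
        - snmp://10.0.0.12
    transformers:
    publishers:
        - rpc://
```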
15:24:12 <lsmola_> jd__: hmm, what about resources that we want to have automatically retrieved?
15:24:29 <llu-laptop> i just want to give the admin a way to get resource endpoints without restarting the agent
15:24:35 <jd__> lsmola_: could you elaborate?
15:24:37 <lsmola_> jd__: for example for tripleo, we will ask nova to give us a list of IPs
15:25:00 <lsmola_> jd__: should we implement that as part of inspector logic that can be turned on?
15:25:02 <llu-laptop> lsmola_: i think we can have a restful resource loader then
15:25:14 <jd__> llu-laptop: reloading the file is a different problem, don't try to solve two problems at the same time
15:25:19 <lsmola_> jd__: or should this reside somewhere apart from ceilometer, as a plugin?
15:25:30 <jd__> llu-laptop: if we want automatic reloading of config files, we'll do that in a generic way for all files, for example
15:26:09 <jd__> lsmola_: that would be a sort of extension for getting the resource list, we don't have that yet, but we could build something around that ultimately
15:26:36 <lsmola_> jd__: ok, rbrady will probably do it once the patches are in
15:26:57 <lsmola_> jd__: I just wanted to make sure that it makes sense to have it in the ceilometer codebase
15:27:04 <jd__> lsmola_: basically we already list resources when we poll for things like instances (we list instances)
15:27:16 <lsmola_> jd__: given that you can use it only when you deploy openstack via tripleo
15:27:54 <jd__> llu-laptop: does that help you?
15:28:04 <llu-laptop> jd__: so your suggestion is to drop the resource loader idea, and leave it to the pollster or inspector themselves?
15:28:13 <lsmola_> jd__: ok then, we will send the patch, thank you
15:28:28 <jd__> llu-laptop: no, have them be part of the pipeline definition
15:29:00 <llu-laptop> jd__: add another pipeline definition 'resourceloader'?
15:29:52 <jd__> llu-laptop: "resources" for the resource list you want, and if you don't know the resource list, then it's up to the pollster to be able to build one I guess
15:30:23 <jd__> if there can be different types of resource list for a pollster, yeah, having a resourceloader parameter for a pipeline would make sense
15:30:47 <jd__> so far we have an implicit resourceloader for all pollsters
15:31:20 <llu-laptop> jd__: ok, this is just what I mean by saying 'leave it to the pollster'
15:31:35 <jd__> llu-laptop: understood :)
15:31:43 <eglynn> e.g. a resource loader for the compute agent that polls nova-api?
15:31:52 <jd__> llu-laptop: I'm ok with 'leave it to the pollster' if there is no corner case, at least for now maybe
15:32:00 <eglynn> (... to discover the local virts)
15:32:13 <jd__> eglynn: yeah, we have that already, but it's used implicitly anyway
15:32:23 <llu-laptop> eglynn: I think the compute agent pollsters already do that, don't they?
15:32:49 <jd__> I don't know if we need to make it explicit in the pipeline – I don't know if there are cases where it might be useful to be able to change it
15:32:50 <eglynn> yep, but just wondering, for consistency, would that be moved into the new resourceloader abstraction?
15:32:58 <jd__> eglynn: it should, if we go down that road
15:33:04 <eglynn> cool
15:33:24 <llu-laptop> so currently, we don't see any imediate need for a resource loader
15:33:25 <llu-laptop> ?
15:33:43 <jd__> llu-laptop: I don't, but I am not the sacred holder of All The Use-Cases
15:33:44 <llu-laptop> s/imediate/immediate/
15:33:53 <jd__> so if you see use-cases, go ahead
15:34:11 <dhellmann> I may be missing some context, but I think this is part of what the "cache" argument passed to the pollsters was supposed to help with.
15:34:21 <eglynn> on a side-note ... I was a little stumped also by the concept that the baremetal hosts had no user or project ID being metered
15:34:22 <jd__> OTOH let's not implement YACO (Yet Another Config Option) just for the sake of having one
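To make 'leave it to the pollster' concrete, here is a rough sketch of the pattern dhellmann alludes to: a pollster that discovers its own resources when the pipeline supplies none, sharing the discovered list through the per-cycle cache dict so sibling pollsters don't repeat the lookup. The class, cache key, and get_samples signature are only loosely modelled on ceilometer's pollster plugin interface of the time; treat all names as illustrative, not authoritative.

```python
# Illustrative sketch only -- not ceilometer's actual pollster API.
class SNMPHostPollster(object):

    CACHE_KEY = 'snmp.hosts'  # hypothetical per-cycle cache key

    def _discover_hosts(self):
        # e.g. ask nova for the list of IPs, as lsmola_ describes for tripleo
        return ['10.0.0.11', '10.0.0.12']

    def _poll_one(self, host):
        # placeholder: a real pollster would build a ceilometer Sample here
        return {'resource': host, 'meter': 'hardware.cpu.load.1min'}

    def get_samples(self, manager, cache, resources=None):
        # an explicit resource list from the pipeline wins; otherwise
        # discover one per polling cycle and share it via the cache dict
        if not resources:
            if self.CACHE_KEY not in cache:
                cache[self.CACHE_KEY] = self._discover_hosts()
            resources = cache[self.CACHE_KEY]
        for host in resources:
            yield self._poll_one(host)
```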
15:34:46 <eglynn> lsmola_: does tripleo and/or ironic surface the "identity" associated with baremetal hosts?
15:34:50 <jd__> dhellmann: the question is about what's put inside the cache and by whom :)
15:34:51 <eglynn> (... even if it's always an administrative user/tenant)
15:34:52 <dhellmann> if there was a set of pollsters that needed some data (the resources), a base class could load them and store them in the cache on each iteration
15:34:59 <dhellmann> jd__: ah
15:35:18 <lsmola_> eglynn: I believe an IP address is enough for SNMP, right?
15:35:22 <jd__> dhellmann: i.e. the list of resources is cached, but the question is how you get that list of resources in the first place (that's why we talk about a resourceloader)
15:35:23 <llu-laptop> eglynn: this is not for baremetal only
15:35:30 <lsmola_> eglynn: that should be stored in the Undercloud nova
15:35:50 <dhellmann> jd__: ok, got it
15:35:52 <eglynn> llu-laptop: sure, but at least *some* of the hosts would generally be baremetal, right?
15:36:37 <eglynn> ... /me just thinking in terms of capturing "owner" identity where it makes sense
15:37:12 <llu-laptop> eglynn: yes, but from the snmp point of view there is no way to know the undercloud project-id, because it doesn't require the undercloud in order to work
15:37:26 <eglynn> llu-laptop: a-ha, I see ...
15:37:37 * jd__ stares at 'undercloud'
15:37:38 <lsmola_> eglynn: not sure what the 'owner' of baremetal is :-)
15:38:19 <eglynn> lsmola_: the user that "registered" the host, if such a concept even exists (?)
15:38:21 <lsmola_> eglynn: I believe we don't work with projects/tenants in the Undercloud
15:38:30 <eglynn> lsmola_: k
15:38:32 <lsmola_> eglynn: or users
15:38:37 <jd__> shall we move on, gentlemen?
15:38:43 <eglynn> sure
15:38:44 <lsmola_> eglynn: it might appear some day
15:38:48 <jd__> we've got an overcloud topic next
15:39:02 <jd__> #topic Handle Heat notifications: new meters?, targeted to I3?
15:39:13 <jd__> scroiset: around?
15:39:18 <scroiset> it's me, yeah
15:39:19 <jd__> #link https://blueprints.launchpad.net/ceilometer/+spec/handle-heat-notifications
15:39:32 <scroiset> I would like to share, and agree with you on, the resulting meters we will generate from heat notifications
15:39:34 <jd__> please enlighten us about your evil plan
15:39:44 <scroiset> I propose those described in the BP
15:40:17 <scroiset> It's to be able to bill on stack CRUD, firstly
15:40:18 <jd__> what's in the whiteboard looks more like notifications than samples
15:40:33 <jd__> but we can map them to samples for sure
15:41:07 <scroiset> notifications are described here #link https://wiki.openstack.org/wiki/SystemUsageData#orchestration.stack..7Bcreate.2Cupdate.2Cdelete.2Csuspend.2Cresume.7D..7Bstart.2Cerror.2Cend.7D:
15:41:14 <eglynn> the autoscaling aspect struck me as being a bit circular
15:41:22 <scroiset> the samples proposed differ from the notifications
15:41:32 <jd__> eglynn: you want to autoscale on the autoscaling meters?
15:41:53 <jd__> also known as übercloud
15:41:54 <scroiset> jd__: no
15:42:19 <eglynn> well, would the flow be something like ... ceilo compute stats -> fire autoscale alarm -> heat launches instance -> heat notification of scale-up -> more ceilo samples
15:42:20 <scroiset> 1/ I want to be able to bill on stack CRUD
15:42:38 <scroiset> 2/ I want to be notified when an autoscaling is done
15:42:58 <jd__> scroiset: you may want to bill on the number of stacks too, I think that one's missing
15:43:00 <scroiset> the bp is for 1/
15:43:22 <jd__> otherwise I see no objection
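A sketch of the mapping scroiset is proposing: turning the orchestration.stack.{create,update,delete,suspend,resume}.{start,error,end} notifications from the SystemUsageData page into billable samples. The handler shape loosely follows ceilometer's notification plugins; the meter names and payload fields here are assumptions, not the merged implementation.

```python
# Illustrative only -- loosely modelled on ceilometer's notification
# handlers; meter names and payload fields are assumptions.
STACK_OPS = ('create', 'update', 'delete', 'suspend', 'resume')

class StackCRUDHandler(object):
    # one wildcard per CRUD verb, matching
    # orchestration.stack.{create,...}.{start,error,end}
    event_types = ['orchestration.stack.%s.*' % op for op in STACK_OPS]

    def process_notification(self, message):
        # e.g. event_type == 'orchestration.stack.create.end'
        _, _, op, phase = message['event_type'].split('.')
        if phase != 'end':
            return  # bill only on completed operations
        yield {
            'name': 'stack.%s' % op,   # assumed meter name, e.g. stack.create
            'type': 'delta',
            'unit': 'stack',
            'volume': 1,
            'resource_id': message['payload']['stack_identity'],
            'project_id': message['payload']['tenant_id'],
        }
```

Counting the stack.create samples over a period would then give the "number of stacks" billing jd__ mentions.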
15:43:28 <eglynn> ... yeah, now that I've written a flow down, it doesn't seem that unnatural
15:43:34 <scroiset> jd__: yes, I can do it by counting the stack.create samples
15:43:51 <llu-laptop> eglynn: I don't see why the last 'more ceilo samples' would definitely trigger another 'fire autoscale alarm'
15:44:10 <eglynn> llu-laptop: yep, it wouldn't
15:44:29 <scroiset> llu-laptop: no, it wouldn't indeed
15:44:33 <eglynn> (different meter in general, of course)
15:45:01 <eglynn> ... just me thinking aloud, ignore & carry on :)
15:45:53 <scroiset> ... so, for the new meters/samples, do you see the need?
15:46:56 <scroiset> ... for billing purposes only.
15:47:15 <scroiset> my point 2/ is another BP #link https://blueprints.launchpad.net/ceilometer/+spec/alarm-on-notification
15:48:51 <scroiset> I'm feeling alone here, I'm surely not being clear... am I?
15:49:05 <nadya_> we are here :)
15:49:07 <eglynn> well, that other BP is intended to allow alarming on notifications (as opposed to allowing an operator to bill on the number of notifications over a period)
15:49:20 <tongli> @scroiset, I am working on that now.
15:49:37 <scroiset> tongli: I saw you're the owner
15:49:40 <tongli> @scroiset, not exactly sure what your concern is.
15:50:01 <tongli> @scroiset, planning to submit the patch later today or tomorrow.
15:50:04 <jd__> re
15:50:05 <eglynn> (presuming that you want to be able to bill on the number of autoscale events, say, not to generate an alarm on a single autoscale notification being received)
15:50:05 <jd__> sorry, I got kicked out by a local BOFH
15:50:18 <scroiset> tongli: I would like to create an alarm on the event orchestration.autoscaling.end, to be alerted when it occurs
15:50:38 <jd__> yeah, tongli is working on that
15:50:42 <scroiset> tongli: cool
15:50:50 <jd__> let's circle back to that at the end if we have time
15:50:58 <jd__> I think scroiset's concerns are covered now
15:51:08 <tongli> @scroiset, yeah, you will be able to do that when the patch gets merged, I've been working with jd__ and eglynn on it.
15:51:08 <jd__> #topic Should I proceed with aggregation?
15:51:18 <jd__> #link https://blueprints.launchpad.net/ceilometer/+spec/aggregation-and-rolling-up
15:51:27 <eglynn> I left some comments on https://etherpad.openstack.org/p/ceilometer-aggregation
15:51:36 <nadya_> So guys, I created a bp and have started implementation
15:51:44 <jd__> I'm not really opinionated yet on that one
15:51:46 <eglynn> IIUC only stats queries that have periods that actually line up with wall-clock boundaries will benefit from the pre-aggregation
15:51:59 <eglynn> nadya_: is that correct? ... or a gross simplification?
15:52:42 <eglynn> (... in practice, I'm not sure these wallclock-clamped queries will be the majority)
15:53:03 <jaypipes> eglynn: as opposed to what exactly?
15:53:04 <eglynn> as I think alarming, charting applications etc. would tend to use NOW as their baseline for queries
15:53:08 <nadya_> there should be a mechanism for merging old data and online data
15:53:13 <eglynn> ... not NOW-(minutes past the hour)
15:53:29 * dhellmann apologizes for having to leave early
15:53:55 <eglynn> jaypipes: say a stats query with a period of one hour, but with start and end timestamps decoupled from an hour boundary
15:54:07 <nadya_> NOW is not a problem. You may use the cache for 10 hours before NOW and get other data from the db directly
15:54:09 <eglynn> jaypipes: ... I put a worked example in nadya_'s etherpad
15:54:15 <jaypipes> eglynn: ah, yes, agree completely.
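To pin down the alignment concern (the same shape as the worked example eglynn mentions putting in the etherpad): hourly pre-aggregates can only serve a stats query whose window edges fall exactly on the hour. A minimal illustration; the function name is hypothetical.

```python
# Illustrative: hourly pre-aggregates only help when the query window
# lines up with wall-clock hour boundaries.
from datetime import datetime

def _on_the_hour(t):
    return (t.minute, t.second, t.microsecond) == (0, 0, 0)

def servable_from_hourly_cache(start, end):
    # a [start, end) stats window can be answered purely from hourly
    # buckets only if both edges sit on wall-clock hour boundaries
    return _on_the_hour(start) and _on_the_hour(end)

# query window 10:07-11:06 vs. cached buckets 10:00-10:59, 11:00-11:59
print(servable_from_hourly_cache(datetime(2014, 1, 23, 10, 7),
                                 datetime(2014, 1, 23, 11, 6)))  # False: total cache miss
print(servable_from_hourly_cache(datetime(2014, 1, 23, 10, 0),
                                 datetime(2014, 1, 23, 11, 0)))  # True: served from cache
```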
15:54:41 <jaypipes> eglynn: I'm still not sold on the idea that the aggregate table has value over just a simple caching layer for the statistics table.
15:55:50 <eglynn> nadya_: but if the query periods are 10:07-11:06, 11:07-12:06, ... and the cache periods are 10:00-10:59, 11:00-11:59, ... then it's a total cache-miss, or?
15:56:32 <nadya_> eglynn, we just do not have a cache for this
15:57:21 <nadya_> eglynn, if I create half-hour aggregates it will work as well
15:57:43 <nadya_> it may be configurable. I think an hour is ok for now
15:57:43 <eglynn> nadya_: yep, so I'm wondering, if such queries are not in the majority, would the cache give that much benefit?
15:58:30 <nadya_> eglynn, the hour-cache is for long queries by definition
15:58:31 <jaypipes> eglynn: the only time I can see those queries being in the majority is in a graphical user interface that shows a graph of meters on hourly intervals...
15:59:24 <jaypipes> eglynn: but I'm not sold that such a use case is not better implemented as a simple memcache caching layer that saves the results of a SQL query against the main meter table...
15:59:29 <lsmola_> jaypipes: something like that is being implemented in Horizon
15:59:59 <eglynn> lsmola_: the current horizon metering dashboard doesn't use time-clamped queries, or?
16:00:09 * jd__ is not that sold on caching either
16:00:11 <jaypipes> lsmola_: sure, understood. but the architecture/design of a backend server should never be dictated by the needs of a front-end UI.
16:00:44 <lsmola_> eglynn: you mean with use of the period parameter?
16:01:43 <eglynn> lsmola_: I mean [(start, start+period), (start+period, start+2*period), ..., (end-period, end)]
16:01:46 <jd__> we need to wrap up now, guys
16:01:54 <jd__> it may be better to continue this on the list as needed
16:01:57 <nadya_> I'm afraid we're out of time. To sum up, this functionality is to make long-range queries faster
16:02:21 <eglynn> lsmola_: where start % 1hour != 0 even if period == 1hour
16:02:22 <lsmola_> eglynn: well, that is shown in the timeseries line chart
16:02:28 <cody-somerville> :)
16:02:30 <jaypipes> continue discussion in #openstack-ceilometer?
16:02:38 <eglynn> jaypipes: sure
16:02:42 <jd__> #endmeeting