15:03:44 #startmeeting ceilometer
15:03:45 Meeting started Thu Jan 23 15:03:44 2014 UTC and is due to finish in 60 minutes. The chair is jd__. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:46 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:48 The meeting name has been set to 'ceilometer'
15:04:08 o/
15:04:10 o/
15:04:11 o/
15:04:13 o/
15:04:39 o/
15:04:40 o/
15:04:45 o/
15:05:27 #topic Milestone status icehouse-2 / icehouse-3
15:05:41 so icehouse-2 is being released right now
15:05:47 a tad disappointing how things panned out overall with icehouse-2
15:05:52 o/
15:05:53 * jd__ nods
15:06:01 ... i.e. chronic problems in the gate caused stuff that was ready to go to be bumped to i-3 :(
15:06:18 ... /me would have preferred if the milestone had been delayed by a few days to allow the verification queue to drain
15:06:25 my fix got merged two hours after i-2 was released :(
15:06:42 yep, I had a similar experience with a BP
15:06:56 :(
15:07:08 my fix also failed on the gate...
15:07:27 so it is at the end of the waiting list right now
15:07:41 upside: it gives us a head-start on our icehouse-3 slate
15:07:45 o/
15:07:47 we also need to triage icehouse-3
15:08:05 note we've only 5 full person-weeks of dev time remaining before the freeze for icehouse-3
15:08:13 #link https://wiki.openstack.org/wiki/Icehouse_Release_Schedule
15:08:25 ... gotta put my hand up, I'm usually one of the worst offenders for landing stuff late in the cycle
15:08:39 ... however for i-3 let's try to aim for a more even flow-rate on BP patches
15:08:56 (... as there's gonna be an even bigger rush on the gate for i-3)
15:08:59 if you have a blueprint for icehouse-3, please update it
15:09:19 (... given the amount of stuff that was bumped off i-2, plus the natural backloading onto the 3rd milestone that always happens in every OS release)
15:09:39 jd__: I have a bp for i-3, which depends on the complex query one
15:10:10 hm
15:10:48 jd__: it would be good to get the complex query discussion settled in the early phase of i-3, as the feature is ready to fly already
15:11:36 we'll try to do that indeed
15:11:44 we need to hurry up right now :) a lot of things to discuss
15:11:59 jd__: tnx
15:12:39 over the final 2-3 weeks of the icehouse-3 lead-in, one possible strategy is to prioritize BPs over bugfixes
15:12:58 (... much easier to get a bugfix landed post i-3 than to beg a FFE on an unfinished BP)
15:13:10 ... justsayin
15:13:19 eglynn: +1
15:14:15 eglynn: prioritizing reviews on anything associated with a bp or bug report between now and then seems like a good idea, too -- encourage people to help with project tracking by filing the right "paperwork"
15:14:21 I think we'll revisit this next week anyway
15:14:25 just update your bp in the meantime
15:14:31 dhellmann: +1
15:14:35 dhellmann: absolutely
15:14:49 moving on as we have a lot of topics
15:14:50 #topic Tempest integration
15:14:57 nadya_: a word on that?
15:15:05 yep!
15:15:07 Our client is approved! Congrats :)
15:15:31 But we are still facing problems. There is a bug in devstack (and tempest) in the handling of the trailing '/' in the URL. #link https://bugs.launchpad.net/tempest/+bug/1271556
15:15:34 clap clap clap
15:15:39 \o/
15:15:47 nice work!
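
For context on the trailing-'/' problem nadya_ mentions, here is a minimal Python sketch of the classic urljoin pitfall that this class of devstack/tempest bug tends to revolve around; whether bug 1271556 reduces to exactly this case is an assumption, and the endpoint address is a placeholder.

    from urllib.parse import urljoin  # urlparse.urljoin on Python 2

    base_without_slash = "http://192.0.2.1:8777/v2"   # placeholder endpoint
    base_with_slash = "http://192.0.2.1:8777/v2/"

    # Without the trailing slash the last path segment gets *replaced*;
    # with it, the relative path is appended as intended.
    print(urljoin(base_without_slash, "meters"))  # http://192.0.2.1:8777/meters
    print(urljoin(base_with_slash, "meters"))     # http://192.0.2.1:8777/v2/meters
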
15:15:57 nadya_: +1
15:16:01 because of this we have a -1 from Jenkins here #link https://review.openstack.org/#/c/67164/
15:16:30 but we are working on a fix
15:16:40 great
15:16:42 Besides, we have two nova-related patches: #link https://review.openstack.org/#/c/46744/ and #link https://review.openstack.org/#/c/64136/ . yassine, do you have any news :)?
15:17:01 if you are here...
15:17:21 We are still working on pollster testing. The Tempest part has started, but we need #link https://review.openstack.org/#/c/66551/ for testing. JFYI
15:18:06 and the last point is alarm testing. The patch seems lost: #link https://review.openstack.org/#/c/39237/ . Nick, nsaje, are you here?
15:18:37 I'd suggest stealing the patch if it's not restored
15:18:48 +1
15:19:11 otherwise that looks really good, thanks for taking care of it nadya_
15:19:14 yep, actually Vadim has started fixing the alarm patch
15:19:27 nadya_: great! thanks
15:19:45 So I think that's all from my side on this topic
15:19:54 you're welcome :)
15:19:56 great :)
15:19:58 #topic Release python-ceilometerclient?
15:20:07 I need this patch to land asap: https://review.openstack.org/68637
15:20:20 I'll review
15:20:24 a super-simple fix
15:20:35 but the alarm-threshold-create verb is pretty much borked without it
15:20:43 (legacy alarm-create verb works fine tho')
15:20:47 eglynn: it looks good on the gate now
15:21:00 ildikov_: cool, thanks
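
As a reminder of what the borked alarm-threshold-create verb is meant to produce, here is a rough sketch of the equivalent raw request against the v2 API, assuming the threshold_rule alarm representation; the endpoint, token, and numbers are placeholders, and the real client of course negotiates authentication through keystone rather than passing a raw token.

    import json
    import requests

    # Placeholder values throughout -- this only illustrates the payload shape.
    alarm = {
        "name": "cpu_high",
        "type": "threshold",
        "threshold_rule": {
            "meter_name": "cpu_util",
            "comparison_operator": "gt",
            "threshold": 70.0,
            "statistic": "avg",
            "period": 600,            # seconds per evaluation window
            "evaluation_periods": 3,  # consecutive windows that must breach
        },
        "alarm_actions": ["log://"],
    }

    resp = requests.post(
        "http://192.0.2.1:8777/v2/alarms",
        headers={"X-Auth-Token": "<token>",
                 "Content-Type": "application/json"},
        data=json.dumps(alarm),
    )
    resp.raise_for_status()
    print(resp.json()["alarm_id"])
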
15:21:26 #topic Discussion of the resource loader support patch
15:21:27 once that's landed I'll cut a new client
15:21:36 eglynn: ack :)
15:21:38 #link http://lists.openstack.org/pipermail/openstack-dev/2014-January/024837.html
15:21:52 I still haven't managed to read that thread
15:22:06 jd__: what's your concern about it being 'not generic enough'?
15:22:39 my main concern is that we externalized resources into an external file with a separate module for handling it
15:22:48 whereas I see no reason not to have it in the pipeline definition
15:23:09 and problems like caching are not to be solved at that level
15:24:12 jd__: hmm, what about resources that we want to have automatically retrieved?
15:24:29 i just want to give the admin a way to get resource endpoints without restarting the agent
15:24:35 lsmola_: could you elaborate?
15:24:37 jd__: for example for tripleo, we will ask nova to give us a list of IPs
15:25:00 jd__: should we implement that as part of inspector logic that can be turned on?
15:25:02 lsmola_: i think we can have a restful resource loader then
15:25:14 llu-laptop: reloading the file is a different problem, don't try to solve two problems at the same time
15:25:19 jd__: or should this reside somewhere apart from ceilometer, as a plugin?
15:25:30 llu-laptop: if we want to have automatic reload of config files, we'll do that in a generic way for all files for example
15:26:09 lsmola_: that would be a sort of extension for getting the resources list, we don't have that yet, but we could build something around that ultimately
15:26:36 jd__: ok, rbrady will probably do it once the patches are in
15:26:57 jd__: I just wanted to make sure that it makes sense to have it in the ceilometer codebase
15:27:04 lsmola_: basically we already list resources when we poll for things like instances (we list instances)
15:27:16 jd__: given you can use it only when you deploy openstack via tripleo
15:27:54 llu-laptop: does that help you?
15:28:04 jd__: so your suggestion is to drop the resource loader idea, and leave it to the pollsters or inspectors themselves?
15:28:13 jd__: ok then, we will send the patch, thank you
15:28:28 llu-laptop: no, have them as part of the pipeline definition
15:29:00 jd__: add another pipeline definition 'resourceloader'?
15:29:52 llu-laptop: "resources" for the resources list you want, and if you don't know the resource list, then it's up to the pollster to be able to build one I guess
15:30:23 if there can be different types of resource lists for a pollster, yeah, having a resourceloader parameter for a pipeline would make sense
15:30:47 so far we have an implicit resourceloader for all pollsters
15:31:20 jd__: ok, this is just what I mean by saying 'leave it to the pollster'
15:31:35 llu-laptop: understood :)
15:31:43 e.g. a resource loader for the compute agent that polls nova-api?
15:31:52 llu-laptop: I'm ok with 'leave it to the pollster' if there is no corner case, at least for now maybe
15:32:00 (... to discover the local virts)
15:32:13 eglynn: yeah, we have that already, but it's used implicitly anyway
15:32:23 eglynn: I think the compute agent pollsters already do that, don't they?
15:32:49 I don't know if we need to make it explicit in the pipeline – I don't know if there are cases where it might be useful to be able to change it
15:32:50 yep, but just wondering, for consistency would that be moved into the new resourceloader abstraction?
15:32:58 eglynn: it should, if we go down that road
15:33:04 cool
15:33:24 so currently, we don't see any immediate need for a resource loader
15:33:25 ?
15:33:43 llu-laptop: I don't, but I am not the sacred holder of All The Use-Cases
15:33:53 so if you see use-cases, go ahead
15:34:11 I may be missing some context, but I think this is part of what the "cache" argument passed to the pollsters was supposed to help with.
15:34:21 on a side-note ... I was a little stumped also by the concept that the baremetal hosts had no user or project ID being metered
15:34:22 OTOH let's not implement YACO (Yet Another Config Option) just for the sake of having one
15:34:46 lsmola_: does tripleo and/or ironic surface the "identity" associated with baremetal hosts?
15:34:50 dhellmann: the question is about what's put inside the cache and by whom :)
15:34:51 (... even if it's always an administrative user/tenant)
15:34:52 if there was a set of pollsters that needed some data (the resources), a base class could load them and store them in the cache on each iteration
15:34:59 jd__: ah
15:35:18 eglynn: I believe an IP address is enough for SNMP, right?
15:35:22 dhellmann: i.e. the list of resources is cached, but the question is how do you get that list of resources (that's why we talk about a resourceloader)
15:35:23 eglynn: this is not for baremetal only
15:35:30 eglynn: that should be stored in the undercloud nova
15:35:50 jd__: ok, got it
15:35:52 llu-laptop: sure, but at least *some* of the hosts would generally be baremetal, right?
15:36:37 ... /me just thinking in terms of capturing "owner" identity where it makes sense
15:37:12 eglynn: yes, but from the SNMP point of view there is no way to know the undercloud project-id, because SNMP doesn't require the undercloud to work
15:37:26 llu-laptop: a-ha, I see ...
15:37:37 * jd__ stares at 'undercloud'
15:37:38 eglynn: not sure what the 'owner' of a baremetal host is :-)
15:38:19 lsmola_: the user that "registered" the host, if such a concept even exists (?)
15:38:21 eglynn: I believe we don't work with projects/tenants in the undercloud
15:38:30 lsmola_: k
15:38:32 eglynn: or users
15:38:37 shall we move on, gentlemen?
15:38:43 sure
15:38:44 eglynn: it might appear some day
15:38:48 we got an overcloud topic next
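
To make "have them as part of the pipeline definition" concrete, here is one possible shape, sketched against the icehouse-era flat pipeline format; the resources key is exactly the hypothetical being debated above, not a shipped option, and the SNMP endpoints are invented.

    import yaml

    pipelines = yaml.safe_load("""
    - name: snmp_pipeline
      interval: 600
      meters:
          - "hardware.cpu.*"
      resources:            # hypothetical key: the static list jd__ proposes
          - snmp://192.0.2.11
          - snmp://192.0.2.12
      transformers:
      publishers:
          - rpc://
    """)

    for pipeline in pipelines:
        # Pollsters that cannot discover their endpoints (e.g. SNMP) would use
        # the explicit list; ones that can (e.g. compute) simply omit the key
        # and keep today's implicit discovery.
        targets = pipeline.get("resources") or "<pollster discovers>"
        print(pipeline["name"], targets)
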
15:39:02 #topic Handle Heat notifications: new meters? targeted to i-3?
15:39:13 scroiset: around?
15:39:18 it's me, yeah
15:39:19 #link https://blueprints.launchpad.net/ceilometer/+spec/handle-heat-notifications
15:39:32 I would like to share, and agree with you on, the resulting meters we will generate from heat notifications
15:39:34 please enlighten us about your evil plan
15:39:44 I propose those described in the BP
15:40:17 firstly, it's to be able to bill on stack CRUD
15:40:18 what's in the whiteboard looks more like notifications than samples
15:40:33 but we can map them to samples for sure
15:41:07 the notifications are described here #link https://wiki.openstack.org/wiki/SystemUsageData#orchestration.stack..7Bcreate.2Cupdate.2Cdelete.2Csuspend.2Cresume.7D..7Bstart.2Cerror.2Cend.7D:
15:41:14 the autoscaling aspect struck me as being a bit circular
15:41:22 the samples proposed differ from the notifications
15:41:32 eglynn: you want to autoscale on the autoscaling meters?
15:41:53 also known as übercloud
15:41:54 jd__: no
15:42:19 well, would the flow be something like ... ceilo compute stats -> fire autoscale alarm -> heat launches instance -> heat notification of scale-up -> more ceilo samples
15:42:20 1/ I want to be able to bill on stack CRUD
15:42:38 2/ I want to be notified when an autoscaling is done
15:42:58 scroiset: you may want to bill on the number of stacks too, I think this one's missing
15:43:00 the bp is for 1/
15:43:22 otherwise I see no objection
15:43:28 ... yeah, now that I've written the flow down, it doesn't seem that unnatural
15:43:34 jd__: yes, I can do it by counting the stack.create samples
15:43:51 eglynn: I don't see why the last 'more ceilo samples' would definitely trigger another 'fire autoscale alarm'
15:44:10 llu-laptop: yep, it wouldn't
15:44:29 llu-laptop: no, it wouldn't indeed
15:44:33 (different meter in general of course)
15:45:01 ... just me thinking aloud, ignore & carry on :)
15:45:53 ... so, for the new meters/samples, do you see the need?
15:46:56 ... billing purposes only.
15:47:15 my point 2/ is another BP #link https://blueprints.launchpad.net/ceilometer/+spec/alarm-on-notification
15:48:51 I'm feeling alone here, I'm surely not being clear... am I?
15:49:05 we are here :)
15:49:07 well, that other BP is intended to allow alarming on notifications (as opposed to allowing an operator to bill on the number of notifications over a period)
15:49:20 @scroiset, I am working on that now.
15:49:37 tongli: I saw you're the owner
15:49:40 @scroiset, not exactly sure what your concern is.
15:50:01 @scroiset, planning to submit the patch later today or tomorrow.
15:50:04 re
15:50:05 (presuming that you want to be able to bill on the number of autoscale events, say, not to generate an alarm on a single autoscale notification being received)
15:50:05 sorry, I got kicked out by a local BOFH
15:50:18 tongli: I would like to create an alarm on the event orchestration.autoscaling.end, to be alerted when it occurs
15:50:38 yeah, tongli is working on that
15:50:42 tongli: cool
15:50:50 let's circle back to that at the end if we have time
15:50:58 I think scroiset's concerns are addressed now
15:51:08 @scroiset, yeah, you will be able to do that when the patch gets merged, I've been working with jd__ and eglynn on it.
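
Before the topic switch, a standalone sketch (deliberately not the ceilometer notification-plugin API) of the 1/ mapping scroiset describes: one terminal Heat notification becomes one billable sample, and billing on the number of stacks falls out of counting stack.create. The payload field names are taken on trust from the SystemUsageData page linked above and should be treated as assumptions.

    # Terminal CRUD events become countable meters; .start/.error variants
    # are ignored here since they carry no billable completion.
    EVENT_TO_METER = {
        "orchestration.stack.create.end": "stack.create",
        "orchestration.stack.update.end": "stack.update",
        "orchestration.stack.delete.end": "stack.delete",
        "orchestration.stack.suspend.end": "stack.suspend",
        "orchestration.stack.resume.end": "stack.resume",
    }

    def notification_to_sample(notification):
        """Map one Heat notification body to a Ceilometer-style sample dict."""
        meter = EVENT_TO_METER.get(notification.get("event_type"))
        if meter is None:
            return None
        payload = notification["payload"]
        return {
            "name": meter,
            "type": "delta",  # one event == one countable occurrence
            "unit": "stack",
            "volume": 1,
            "resource_id": payload["stack_identity"],  # assumed field name
            "project_id": payload["tenant_id"],        # assumed field name
            "timestamp": notification["timestamp"],
        }
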
15:51:08 #topic Should I proceed with aggregation?
15:51:18 #link https://blueprints.launchpad.net/ceilometer/+spec/aggregation-and-rolling-up
15:51:27 I left some comments on https://etherpad.openstack.org/p/ceilometer-aggregation
15:51:36 So guys, I created a bp and have started implementation
15:51:44 I'm not really opinionated yet on that one
15:51:46 IIUC only stats queries that have periods that actually line up with wall-clock boundaries will benefit from the pre-aggregation
15:51:59 nadya_: is that correct? ... or a gross simplification?
15:52:42 (... in practice, I'm not sure these wallclock-clamped queries will be the majority)
15:53:03 eglynn: as opposed to what exactly?
15:53:04 as I think alarming, charting applications etc. would tend to use NOW as their baseline for queries
15:53:08 there should be a mechanism for merging old data and online data
15:53:13 ... not NOW-(minutes past the hour)
15:53:29 * dhellmann apologizes for having to leave early
15:53:55 jaypipes: say a stats query with a period of one hour, but with start and end timestamps decoupled from an hour boundary
15:54:07 NOW is not a problem. You may use the cache for the 10 hours before NOW and get other data from the db directly
15:54:09 jaypipes: ... I put a worked example in nadya_'s etherpad
15:54:15 eglynn: ah, yes, agree completely.
15:54:41 eglynn: I'm still not sold on the idea that the aggregate table has value over just a simple caching layer for the statistics table.
15:55:50 nadya_: but if the query periods are 10:07-11:06, 11:07-12:06, ... and the cache periods are 10:00-10:59, 11:00-11:59, ... then it's a total cache miss, no?
15:56:32 eglynn, we just do not have a cache for this
15:57:21 eglynn, if I create half-hour aggregates it will work as well
15:57:43 it may be configurable. I think an hour is ok for now
15:57:43 nadya_: yep, so I'm wondering, if such queries are in the majority, would the cache give that much benefit
15:58:30 eglynn, the hour cache is for long queries by definition
15:58:31 eglynn: the only time I can see those queries being in the majority is in a graphical user interface that shows a graph of meters on hourly intervals...
15:59:24 eglynn: but I'm not sold that such a use case is not better implemented as a simple memcache caching layer that saves the results of a SQL query against the main meter table...
15:59:29 jaypipes: something like that is being implemented in Horizon
15:59:59 lsmola_: the current horizon metering dashboard doesn't use time-clamped queries, does it?
16:00:09 * jd__ is not that sold on caching either
16:00:11 lsmola_: sure, understood. but the architecture/design of a backend server should never be dictated by the needs of a front-end UI.
16:00:44 eglynn: you mean with the use of the period parameter?
16:01:43 lsmola_: I mean [(start, start+period), (start+period, start+2*period), ..., (end-period, end)]
16:01:46 we need to wrap up now, guys
16:01:54 it may be better to continue this on the list as needed
16:01:57 I'm afraid we're out of time. To sum up, this functionality is to make long-range queries faster
16:02:21 lsmola_: where start % 1hour != 0 even if period == 1hour
16:02:22 eglynn: well, that is shown in the timeseries line chart
16:02:28 :)
16:02:30 continue the discussion in #openstack-ceilometer?
16:02:38 jaypipes: sure
16:02:42 #endmeeting
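
As a postscript, a tiny executable version of eglynn's worked example from the aggregation discussion: whether a statistics window can be served from hour-boundary pre-aggregates at all. Purely illustrative of the cache-hit condition, nothing more.

    from datetime import datetime

    def served_by_hour_cache(start):
        """An hour-boundary aggregate table can only serve windows whose
        start is clamped to the wall clock (minutes and seconds == 0)."""
        return start.minute == start.second == start.microsecond == 0

    # eglynn's example: periods 10:07-11:06, 11:07-12:06, ... never line up
    # with cached 10:00-10:59, 11:00-11:59, ... buckets -> a total cache miss.
    print(served_by_hour_cache(datetime(2014, 1, 23, 10, 0)))  # True: cache hit
    print(served_by_hour_cache(datetime(2014, 1, 23, 10, 7)))  # False: go to db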