15:00:18 <jd__> #startmeeting ceilometer
15:00:19 <openstack> Meeting started Thu Feb 6 15:00:18 2014 UTC and is due to finish in 60 minutes. The chair is jd__. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:22 <openstack> The meeting name has been set to 'ceilometer'
15:00:40 <eglynn> o/
15:00:41 <jd__> #link https://wiki.openstack.org/wiki/Meetings/Ceilometer
15:00:42 <tongli> hi, @jd__
15:00:44 <tongli> o/
15:00:47 <jd__> hi everyone
15:00:50 <gordc> o/
15:00:53 <ildikov_> o/
15:02:04 <nprivalova> o/
15:02:24 <jd__> #topic Milestone status icehouse-3
15:02:39 <jd__> #link https://launchpad.net/ceilometer/+milestone/icehouse-3
15:02:53 <jd__> so a lot of things are started, but it'd be great to finish ASAP
15:02:57 <ildikov_> we still need approval for this patch: https://review.openstack.org/#/c/62157/
15:03:03 <jd__> otherwise we'll be caught in the gate storm
15:03:33 <jd__> ildikov_: yeah I'll try to take a look at it
15:03:40 <gordc> ildikov_: i may have time to review tomorrow as well.
15:03:50 <ildikov_> jd__: thanks
15:04:13 <jd__> otherwise not much to add on my part yet
15:04:35 <ildikov_> thanks guys, it would be really good if we could go on with the statistics bp and also have the patch sets of the complex query landed in i-3
15:04:38 <jd__> anything else about one of your blueprints?
15:04:59 <nprivalova> I'm still confused about aggregation
15:05:17 <nprivalova> not sure whether I should continue or not
15:05:52 <jd__> nprivalova: do you have a requirement on that?
15:05:52 <eglynn> nprivalova: did we come to any conclusion on the overlapping periods issue I raised?
15:05:52 <sileht> o/
15:05:59 <ityaptin> o/
15:06:30 <eglynn> nprivalova: ... i.e. the question of whether aggregation can be helpful in the common case of periods that overlap
15:06:31 * jd__ dodges the issue
15:06:46 <nprivalova> eglynn: we agreed that it is not for alarming
15:07:17 <nprivalova> #link https://blueprints.launchpad.net/ceilometer/+spec/base-aggregation
15:07:45 <eglynn> nprivalova: k, then the question really is the potential benefit for the other common cases of recurring statistics queries
15:08:10 <eglynn> nprivalova: ... if we can detect when the same query constraints recur
15:08:13 <nprivalova> yep, I agree. I saw a comment about the billing use case
15:08:29 <eglynn> nprivalova: ... and match the actual query constraints to the pre-aggregated values
15:08:49 <nprivalova> anyway, I think we may continue with the meeting :)
15:08:53 <jd__> ok
15:09:00 <jd__> #topic Tempest integration
15:09:06 <jd__> wassup on that?
15:09:23 <nprivalova> we have the following
15:09:25 <nprivalova> https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:bp/add-basic-ceilometer-tests,n,z
15:09:46 <nprivalova> so the notifications part is done
15:09:52 <nprivalova> but we have a bug :)
15:10:29 <nprivalova> #link https://bugs.launchpad.net/ceilometer/+bug/1274607
15:10:31 <uvirtbot> Launchpad bug 1274607 in ceilometer "ceilometer-agent-notification is broken without eventlet monkey patching" [Critical,In progress]
15:11:01 <nprivalova> yep, so that's why we have only -1 from Jenkins
15:11:16 <jd__> fair enough, that one should be resolved soon fortunately
15:11:18 <nprivalova> I'm testing the fix
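For context, a minimal sketch of the class of fix bug 1274607 calls for; this assumes the standard eventlet pattern and is not the actual patch under review:

```python
# Minimal sketch, not the actual fix under review: an eventlet-based agent
# entry point has to monkey-patch the standard library before any other
# import pulls in un-patched socket/threading modules, otherwise blocking
# I/O calls hang the green-threaded notification agent.
import eventlet
eventlet.monkey_patch()

# ... only after this point should the agent's real imports and main() run.
```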
15:11:48 <jd__> #topic Release python-ceilometerclient?
15:11:57 <eglynn> no need for this AFAIK
15:12:12 <jd__> ok :)
15:12:15 <jd__> #topic Polling-on-demand discussion (ityaptin)
15:12:28 <jd__> ityaptin: enlighten us
15:12:46 <ityaptin> about pollsters on demand. The use cases for this feature are tests and debugging.
15:13:06 <nprivalova> #link https://review.openstack.org/#/c/66551/
15:13:12 <jd__> (nprivalova: the fix works if you have https://review.openstack.org/#/c/71124/)
15:13:34 <eglynn> so the purpose of this is to trigger polling for tests ... could the same be achieved by simply configuring the test with a v. short pipeline interval?
15:13:37 * dhellmann apologizes for being late
15:14:13 <gordc> https://blueprints.launchpad.net/ceilometer/+spec/run-all-pollsters-on-demand
15:14:29 <ityaptin> There is also a proposal to turn on this feature only with a 'debug' flag, because somebody could DoS ceilometer by triggering polling.
15:14:31 <jd__> dhellmann: you're… not fired!
15:14:37 * dhellmann whew!
15:14:50 <eglynn> i.e. the test needs to precipitate events that happen relatively infrequently (i.e. polling cycles with the boilerplate pipeline.yaml)
15:15:10 <eglynn> ... so one approach would be simply to make these events more frequent in the test scenario
15:15:11 <gordc> ityaptin: how does the flag get set? the DoS issue was a concern when i read the bp
15:15:12 <sileht> fyi: I have added this to devstack: CEILOMETER_PIPELINE_INTERVAL=10
15:15:22 <jd__> the problem is that polling != having samples anyway, there's no guarantee that samples are going to be available N seconds after being polled
15:15:27 <dhellmann> is this for tempest tests or unit tests?
15:15:28 <jd__> nothing's synchronous
15:15:31 <jd__> dhellmann: tempest
15:15:36 <tongli> @eglynn, that still won't be the same, I would think.
15:15:40 <sileht> perhaps we can just set a different value for devstack-gate
15:15:57 <jd__> DoS concern? I doubt that, it's a feature available on RPC
15:16:07 <jd__> sure the admin can DoS himself, but well.. he's admin
15:16:11 <eglynn> tongli: not exactly equivalent, but perhaps a close enough analogue?
15:16:31 <nprivalova> I think it's not only for tempest. When I install devstack it is useful just to check that the pollsters work ok, without waiting for the interval
15:16:47 <jd__> nprivalova: agreed
15:16:50 <ityaptin> gordc: For example - debug option
15:16:56 <eglynn> nprivalova: ... but the test has to wait anyway for some "ingestion" lag
15:17:05 <tongli> @eglynn, I think it will be nice to hit the enter key and then expect the code to hit the break point.
15:17:38 <sileht> eglynn, agree, for example the swift account size is done by an async swift task
15:17:48 <tongli> @eglynn, @ityaptin, or you use the new notification alarm.
15:18:01 <sileht> eglynn, so you have to wait until swift has updated the value before ceilometer polls it
15:18:15 <tongli> which will simply trigger it as soon as a notification is present on the bus.
15:18:16 * jd__ has no problem with that feature
15:18:30 <dhellmann> if we're going to have a special test mode, it seems like it makes the most sense to make that a separate executable that runs the polling one time and exits
15:18:41 <nprivalova> actually the question was about the default value for the debug flag :)
15:18:42 <dhellmann> rather than adding a test mode to the main service
15:19:03 <gordc> dhellmann: agreed
15:19:05 <eglynn> dhellmann: ... that sounds reasonable to me
15:19:05 <jd__> dhellmann: if that were synchronous, that'd be better
15:19:09 <ityaptin> tongli: If we want to test pollsters, that is not suitable
15:19:23 <jd__> dhellmann FTW
15:19:24 <tongli> @ityaptin, true.
15:19:29 <dhellmann> jd__: yeah, just refactor the code that runs the polling pipelines so it can be called from a console script
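A rough sketch of the separate one-shot executable dhellmann suggests here; every name in it (load_pipelines, poll_and_publish, run_once) is a placeholder for illustration, not an existing Ceilometer API:

```python
# Hypothetical sketch of a "run the pollsters once and exit" console script.
# In a real patch the two helpers would be the refactored pipeline/polling
# code shared with the periodic agent; here they are stubs so the sketch is
# self-contained.
import sys


def load_pipelines():
    # Placeholder: the real version would parse pipeline.yaml and return the
    # configured polling pipelines.
    return []


def poll_and_publish(pipeline):
    # Placeholder: the real version would run each pollster in the pipeline
    # once and publish the resulting samples.
    pass


def run_once(argv=None):
    """Run every configured polling pipeline a single time, then exit."""
    for pipeline in load_pipelines():
        poll_and_publish(pipeline)
    return 0


if __name__ == '__main__':
    sys.exit(run_once(sys.argv[1:]))
```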
15:19:45 <jd__> dhellmann: I vote for that definitely, because that would be much better for Tempest
15:19:47 <dhellmann> that wouldn't do anything for testing the collector
15:20:04 <dhellmann> do we need some way to have the collector notify tests when data is available?
15:20:15 <jd__> what I don't know is if it's reasonable to use that trick in tempest?
15:20:28 <jd__> dhellmann: that'd be great
15:20:41 <dhellmann> jd__: good point, we wouldn't really be testing the polling service
15:20:48 <dhellmann> but we could have separate tests for that
15:21:18 <dhellmann> if the point is to show that the service works and the pollsters work, do they have to run in the same test to know they work?
15:21:22 <jd__> or we can also use the API method as in the current patch if it's synchronous, i.e. the GET /pollsters returns only when all pollsters are run
15:21:24 <nprivalova> we may set configs only in devstack
15:21:37 <nprivalova> there is no way to hack something in tempest
15:21:48 <jd__> having a callback on the collector is another issue, I don't have a solution yet but we can think about something else later I guess
15:21:49 <eglynn> in general I wonder how tempest handles asserting that other asynchronous tasks have completed?
15:21:50 <dhellmann> nprivalova: ah, so we have to set devstack to configure ceilometer for tempest?
15:21:57 <eglynn> such as spinning up an instance
15:22:04 <eglynn> or a volume becoming available?
15:22:07 <nprivalova> dhellmann: AFAIK, yes
15:22:09 <dhellmann> eglynn: I see a lot of polling for status and timing out in the errors in the recheck list
15:22:16 <dhellmann> nprivalova: ok
15:22:18 <jd__> eglynn: by waiting and timing out, which has the potential to make Ceilometer the new Neutron :/
15:23:08 <eglynn> yeah I guess
15:23:22 <nprivalova> maybe we should move it to the mailing list?
15:23:32 <dhellmann> are we emitting any sort of notification of having received data?
15:23:38 <eglynn> ... /me is made a bit nervous by making big changes to the ceilo execution path for testing
15:23:43 <dhellmann> could we write the test to watch the log for a specific message, or to listen for a notification?
15:23:52 <dhellmann> eglynn: yeah
15:24:01 <eglynn> ... in the sense that we end up testing something other than what actually runs in prod
15:24:02 <jd__> sending notifications when we receive notifications?
15:24:28 <dhellmann> jd__: otherwise I guess the test would call the api over and over until it got the data it wanted?
15:24:32 <jd__> Ceilometer inception
15:24:41 <nprivalova> oh no :)
15:24:53 <jd__> dhellmann: yeah… polling and timing out :(
15:24:53 <eglynn> yeah so in prod it's not the extra notification being emitted that has value, it's the data being visible in the API
15:25:29 <dhellmann> eglynn: sure, I'm just trying to figure out how to write the test with the least polling
15:25:36 <dhellmann> maybe polling is the best thing we can do
15:25:41 <eglynn> ... I dunno, suppose we did something funky with mongo replication
15:25:44 <jd__> I think so for now
15:25:54 <dhellmann> polling would certainly be simplest
15:25:55 <eglynn> ... and the data stopped being visible from a secondary replica
15:26:01 <eglynn> ... but the tests still pass
15:26:03 <jd__> now the question is, is it acceptable to have a different path for polling (a request to the API) rather than the regular timer, in terms of testing
15:26:17 <dhellmann> eglynn: but our tests aren't for mongo, they're for our code
15:26:24 <nprivalova> notifications are another question. Now we are speaking only about polling
15:26:51 <eglynn> dhellmann: I'm thinking of our mongo storage driver doing some replication-aware logic that has the potential to be broken
15:26:53 <dhellmann> nprivalova: what I was hinting at was having ceilometer send a notification that the test could listen for to know when data had arrived, instead of polling the API
15:27:14 <dhellmann> eglynn: if we have to put replication logic in our driver, then we'd have to test for it -- we don't have anything like that now, right?
15:27:49 <eglynn> dhellmann: nope, we don't ... that was just an off-the-cuff example of something that could break
15:27:56 <dhellmann> eglynn: ok
15:28:01 <jd__> I think this is going too far?
15:28:14 <tongli> @dhellmann, I am working on the notification alarm, if that is what you asked.
15:28:24 <jd__> I think my previous question is a good one, can I haz a cheese^W^Wyour opinion?
15:28:27 <eglynn> dhellmann: and might not be caught by a test that just asserted for a special notification that the notification agent had seen the incoming metering message
15:28:45 <tongli> @dhellmann, when a notification appears, you can make something happen.
15:28:45 <dhellmann> jd__: ok, I think we're talking about 2 different things
15:28:56 <dhellmann> tongli: good point, just a sec
15:29:09 <dhellmann> jd__: I was talking about how the test would know when ceilometer's collector had received data
15:29:19 <nprivalova> let us write to the mailing list again because honestly I don't see any solution now
15:29:48 <dhellmann> nprivalova: good idea
15:29:51 <jd__> dhellmann: I know, but that's a different topic than the one we're discussing
15:29:59 <dhellmann> sorry, I thought we had moved on
15:30:02 <jd__> dhellmann: so I'd like to have an answer on the first point, first :)
15:30:16 * dhellmann wonders when jd__ became such a stickler ;-)
15:30:17 <jd__> which is having a different path used to poll the data
15:30:20 <nprivalova> and please take a look into the notification tests in tempest, because we need to be sure that the tests are correct
15:30:26 <jd__> lol
15:30:35 <dhellmann> I think it's a mistake to build something in for testing that is too different from something that would be useful in production
15:30:49 <dhellmann> we have a periodic polling loop, so we need a test that shows that we poll periodically
15:30:49 <eglynn> dhellmann: +1
15:30:57 <jd__> agreed
15:31:02 <dhellmann> if we have an API to trigger polling, then we need a *separate* test to show that the api triggers polling
15:31:16 * jd__ hits the channel with his mallet
15:31:19 <dhellmann> so we might as well just test for the code we have now, since we can't avoid it
15:31:51 <dhellmann> if, as nprivalova says, we have to use the devstack configuration, then we will need to adjust the polling interval there to something relatively small and use that for the test
15:32:06 * jd__ nods
15:32:12 <eglynn> yep ... my suggestion exactly
15:32:17 <jd__> as far as the notification of notifications received is concerned, I think it's something we should think about
15:32:19 <dhellmann> alternately, if we could have the test adjust that interval -- maybe by starting a second copy of the service? -- then we could do all of this in tempest
15:32:25 <jd__> but probably not here and now :)
15:33:15 <dhellmann> jd__: for notification of notifications, we might be able to use the alarm trigger feature, but that is using some production code to test other production code
15:33:25 <jd__> indeed
15:33:28 <dhellmann> so it might be better conceptually to just have the test poll the API looking for the data
15:33:47 <eglynn> as long as the polling is "smart" enough, is that approach really that bad?
15:33:50 <jd__> that would be good enough for now anyway
15:33:54 <dhellmann> which is less elegant, in some sense, but more "correct" from a testing standpoint
15:33:57 <jd__> eglynn: we'll see?
15:34:04 <dhellmann> eglynn: nah, it just feels a little heavy-handed
15:34:28 <eglynn> by "smart" I mean say using a reasonably adaptive/backed-off intra-poll delay
15:34:43 <jd__> it's tempest, you can hammer the API
15:34:43 <dhellmann> eglynn: right
15:34:47 <dhellmann> haha
15:34:56 <jd__> "adaptive", tsss :)
15:35:01 <eglynn> LOL :)
15:35:04 <jd__> GIVE ME THE DAMN DATA YOU API
15:35:12 <jd__> that's how we should do it
15:35:29 <jd__> shall we move on gentlemen?
15:35:29 * dhellmann opens a blueprint to change the API to allow queries in all caps
15:35:39 <nprivalova> unfortunately we should commit it to devstack first :)
15:35:41 * jd__ puts his mallet away
15:35:54 <sileht> devstack already has the CEILOMETER_PIPELINE_INTERVAL configuration variable, so we just have to set it in gate-devstack
15:36:10 <jd__> (and gentlewomen)
15:36:26 <jd__> nprivalova: would that be a problem?
15:36:35 <nprivalova> sileht: I will work on this
15:36:39 <jd__> good point sileht
15:36:53 <dhellmann> sileht saves us from over-engineering
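A small sketch of the "smart" polling eglynn describes, written as a tempest-style helper; fetch_samples is an assumed stand-in for whatever client call the real test would use, and the timeout and back-off values are arbitrary:

```python
# Illustration only: poll the API for expected data with an exponential
# back-off and an overall deadline, instead of a fixed-interval busy loop.
import time


def wait_for_samples(fetch_samples, meter_name, timeout=120, initial_delay=1):
    """Return samples for meter_name once they appear, or raise on timeout."""
    deadline = time.time() + timeout
    delay = initial_delay
    while time.time() < deadline:
        samples = fetch_samples(meter_name)
        if samples:
            return samples
        time.sleep(delay)
        delay = min(delay * 2, 15)  # back off, but cap the inter-poll delay
    raise AssertionError('no %s samples within %d seconds' % (meter_name, timeout))
```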
15:37:18 <jd__> #topic Work with metadata discussion
15:37:18 <nprivalova> jd__, I don't know :) maybe you have the power to commit everything everywhere
15:37:31 <nprivalova> it's me again
15:37:41 <jd__> nprivalova: I may or may not have some super power :D
15:38:08 <nprivalova> Long story short:
15:38:17 <nprivalova> When a user requests meters or resources, their metadata is flattened.
15:38:21 <nprivalova> On the other hand, when a meter or resource is stored to the db, its metadata is flattened too.
15:38:28 <nprivalova> These two processes are independent and now two different flatten functions exist.
15:38:33 <nprivalova> We decided to keep only one of them (related bug #link https://bugs.launchpad.net/ceilometer/+bug/1268618).
15:38:35 <uvirtbot> Launchpad bug 1268618 in ceilometer "similar flatten dict methods exists" [Medium,In progress]
15:38:45 <nprivalova> After some discussions with the team I decided to use dict_to_keyval everywhere. The reason is that this func allows the user to create queries on lists and doesn't contain bugs.
15:38:55 <nprivalova> So the question: the API layer is the only place where recursive_keypairs is used, and this function contains a bug.
15:39:16 <nprivalova> The perfect solution is to change recursive_keypairs=>dict_to_keyval in the API, but the output of these funcs is different
15:39:20 <nprivalova> You may take a look here #link https://review.openstack.org/#/c/67704/4/ceilometer/api/controllers/v2.py
15:39:29 <nprivalova> Is it absolutely forbidden to make any changes in API output? We may postpone changing recursive_keypairs=>dict_to_keyval in the API, but maybe we could fix the bug in recursive_keypairs and fix all our wrong tests?
15:40:05 <dhellmann> nprivalova: what's the bug in recursive_keypairs?
15:40:20 <eglynn> well it would be forbidden I'd say to make changes that could break existing API callers
15:40:37 <dhellmann> yes, changing the return format would require an API version bump
15:40:45 <nprivalova> should I fix the bug but simulate it again in the API to keep the behaviour?
15:40:50 <eglynn> #link https://wiki.openstack.org/wiki/APIChangeGuidelines
15:40:53 <dhellmann> which isn't out of the question, but is probably not something we want to do at this point in the cycle
15:41:18 * jd__ shakes in fear of APIv3
15:41:35 <nprivalova> #link https://bugs.launchpad.net/ceilometer/+bug/1268628
15:41:37 <uvirtbot> Launchpad bug 1268628 in ceilometer "recursive_keypairs doesn't throw 'separator' param to next iteration" [Undecided,In progress]
15:41:50 <gordc> nprivalova: i guess your fix is good then. i actually don't like how we're outputting some odd formatting... but it will change output to fix it.
15:41:50 <dhellmann> nprivalova: ah
15:42:39 <gordc> since the consensus is to not change output i think we need to keep your patch in to keep output consistent as before.
15:42:59 <nprivalova> yep, just wanted to clear that up
15:43:20 <jd__> cool
15:43:28 <jd__> I like it when we all agree
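For context, an illustration (not the actual Ceilometer utility code) of a dict_to_keyval-style flattener, and of the separator-propagation mistake described in bug 1268628 above:

```python
# Illustration only: turn nested metadata into (flattened.key, value) pairs,
# including positional keys for list items. The separator must be passed
# through on every recursive call; bug 1268628 describes exactly this class
# of mistake (a custom separator being dropped after the first level).
def flatten(metadata, parent='', separator='.'):
    for key, value in sorted(metadata.items()):
        name = '%s%s%s' % (parent, separator, key) if parent else key
        if isinstance(value, dict):
            # Forgetting to forward `separator` here would silently fall back
            # to the default on nested levels.
            for pair in flatten(value, name, separator):
                yield pair
        elif isinstance(value, (list, tuple)):
            for index, item in enumerate(value):
                yield ('%s%s%d' % (name, separator, index), item)
        else:
            yield (name, value)


# Example: dict(flatten({'disk': {'ephemeral': 0}, 'tags': ['a', 'b']}))
# -> {'disk.ephemeral': 0, 'tags.0': 'a', 'tags.1': 'b'}
```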
15:43:46 <jd__> #topic Open discussion
15:43:48 <nprivalova> and one more cr https://review.openstack.org/#/c/68583/
15:43:53 <gordc> nprivalova: you've no idea how much it bothers me seeing 'a.b:c:d' keys. lol i'll review the patch again
15:44:15 <nprivalova> gordc: cool :)
15:44:41 <tongli> anyone know if the ctrl+c problem was fixed or not?
15:44:59 <nprivalova> tongli: where? in devstack?
15:45:06 <tongli> yes.
15:45:20 <tongli> I do not think that is specific to devstack though.
15:45:33 <sileht> tongli, https://review.openstack.org/#/c/70338/ this one is missing I think for the CTRL+C issue
15:45:51 <nprivalova> tongli: oh, I'm not alone :) Today I faced it several times with devstack-master
15:46:04 <gordc> tongli: it's patched. the oslo sync code just got merged.
15:46:20 <tongli> ok. good.
15:47:11 <nprivalova> do you have some bug-scrub procedure?
15:48:10 <eglynn> nprivalova: ... do you mean triaging the bug queue?
15:48:17 <eglynn> ... or squashing the actual bugs with a concerted effort at fixing?
15:48:33 <eglynn> ... such as a "bug-squashing day"
15:49:04 <nprivalova> I meant clean-up
15:49:18 <nprivalova> and triaging, yes
15:49:40 <tongli> nova had a few of these days in the past
15:49:53 <eglynn> a clean-up that ends with a neater, prioritized queue ... but not necessarily with fixed bugs, right?
15:50:16 <ddutta> Hi, I am a noob in ceilometer .... was reading code ..... any place I can help to start learning about the code?
15:50:33 <dhellmann> hi, ddutta!
15:50:39 <jd__> ddutta: try fixing a bug?
15:50:46 <nprivalova> and the same with bps. I just found a bug in the 'Confirmed' state that was fixed half a year ago :)
15:50:59 <ddutta> btw I found a trivial typo too :) https://review.openstack.org/#/c/71431/
15:51:37 <ddutta> dhellmann: hi ... would love to do something here as my interests are in streaming data mining and machine learning :) ...
15:52:22 <dhellmann> ddutta: you've seen http://docs.openstack.org/developer/ceilometer/ right?
15:52:22 <eglynn> nprivalova: ... the newer bugs seem to be triaged fairly rapidly in general, but it seems like we may need to do a periodic trawl of the older ones for dupes/stales etc.
15:52:32 <ddutta> will take on some simple bugs for starters to get more code and design insight ......
15:52:36 <gordc> nprivalova: which bug was that? i occasionally run through bugs to clean them up a bit... i tend to let jenkins switch bug status so i guess it missed it in this case.
15:52:53 <ddutta> dhellmann: yes I started to read those
15:53:52 <gordc> ddutta: i tend to throw breakpoints in code i'm interested in and step through... probably doesn't work for everyone but works for me.
15:54:13 <dhellmann> ddutta: +2 on that patch, good eye
15:54:20 <nprivalova> gordc: ah, ok. it was https://bugs.launchpad.net/ceilometer/+bug/1217412 . We've changed the status
15:54:21 <uvirtbot> Launchpad bug 1217412 in ceilometer "HBase DB driver losing historical resource metadata" [Medium,Fix released]
15:55:18 <ddutta> dhellmann: thx .... on to the bugs now
15:55:27 <ddutta> gordc: good idea ....
15:55:57 <gordc> nprivalova: ah, yeah. that status wasn't updated by the build... i guess anyone can change the status, so if you notice anything feel free to make updates.
15:57:28 <jd__> time to wrap up guys
15:57:41 <jd__> feel free to continue in #openstack-ceilometer :)
15:57:48 <jd__> happy hacking!
15:57:50 <jd__> #endmeeting