15:01:04 #startmeeting ceilometer
15:01:04 Meeting started Thu Jan 9 15:01:04 2014 UTC and is due to finish in 60 minutes. The chair is jd__. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:08 The meeting name has been set to 'ceilometer'
15:01:10 o/
15:01:13 o/
15:01:16 o/
15:01:19 o/
15:01:20 o/
15:01:24 o/
15:01:26 o/
15:01:26 o/
15:01:29 o/
15:01:43 Happy New Year! :)
15:01:48 happy new year folks
15:01:48 o/
15:01:53 o/
15:01:58 happy new year
15:02:01 Happy new year! :)
15:02:25 Happy New Year to all
15:02:38 #topic Tempest integration (nprivalova)
15:02:42 nprivalova: the floor is yours
15:02:52 hi all!
15:03:55 First of all I'd like to draw your attention to our tempest tests :)
15:04:15 We have a big problem with code duplication
15:04:48 duplication within the tempest code tree?
15:05:02 for now I guess we've managed the situation, but the question is how to track the blueprints
15:05:22 eglynn, yes. We had several variants of the client, e.g.
15:06:00 from the mailing list I got only one suggestion: to create a spreadsheet with changes
15:06:26 but I think it is not much better than just having an etherpad
15:06:28 nprivalova: yeah, the requirement to have a tempest-specific client is awkward
15:06:31 but I guess there's a good reason behind it
15:06:44 o/
15:06:47 (to avoid bugs in the API being masked by compensating bugs in the client library)
15:07:03 (... at least that was my understanding of the motivation)
15:08:02 yes, I'm not against a client in Tempest. But we had several change requests with its implementation in Tempest
15:08:14 eglynn: I understand that as well
15:08:26 where would this spreadsheet live?
15:08:29 on github?
15:08:45 wiki page?
15:09:05 that's my question :) Nova guys use the following https://docs.google.com/spreadsheet/ccc?key=0AmYuZ6T4IJETdEVNTWlYVUVOWURmOERSZ0VGc1BBQWc&usp=drive_web#gid=0
15:09:29 wow, that's a long list
15:10:00 yes, it doesn't look very good
15:10:17 for now we just have https://etherpad.openstack.org/p/ceilometer-test-plan
15:10:23 D=
15:11:02 whatever works
15:11:24 pick one and others will follow :)
15:11:26 but my question is: how do we make people look at it? People just come to tempest and start to create tests. But these tests are already being implemented by others
15:11:49 nprivalova: there's no magic bullet, just yell at them, -1 and point to the original patch? :)
15:12:03 you can't force people to look for something unfortunately
15:12:26 an email to the mailing list announcing it would help, but I agree with jd__ that you should -1 patches that duplicate effort
15:12:37 hmmm, I seem to remember the tempest folks at one point suggesting the use of an initially empty WIP patch as a mutex
15:12:46 (to avoid duplication of effort)
15:12:54 I guess we expect to have enough tests that we don't want to open bugs or blueprints for them individually?
15:13:00 eglynn: that only works if people look at Gerrit first
15:13:17 nprivalova: ask dkranz about that mutex idea, I can't recall the details
15:13:20 while we're at it, I'd like to emphasize something on Tempest testing
15:13:21 jd__: true that
15:13:31 ok. I hope that at least the core team is aware of our document :)
15:13:38 I'd like people writing Tempest tests to be _very_ careful about how they test things, avoiding race conditions
15:14:00 typically it is NOT safe to do things like "spawn a VM and check that there's a meter", since you don't know how much time is needed to have a meter
15:14:19 we don't want ceilo to become the new neutron ;)
15:14:20 writing such tests that would work 95% of the time is a terrible idea and is going to make us look like Neutron
15:14:28 JFYI, our client is https://review.openstack.org/#/c/55276/
15:14:31 * jd__ high fives eglynn's great mind
15:14:39 LOL :)
15:14:51 so *PLEASE* be careful about what you write
15:15:04 jd__, please come and review us :)
15:15:05 and add a synchronous mechanism if needed in Ceilometer to make tests *RELIABLE*
15:15:27 nprivalova: I wish I had enough time, but it's getting harder to follow everything :)
15:15:57 so I'm just throwing that out there for now, that's all I can do, and if it was obvious to all of you, wonderful ;)
15:16:06 nprivalova: we talked at one point about having ceilometer-core added to reviews, are the test authors doing that?
15:16:24 dhellmann: I know I'm on reviews, don't know about ceilometer-core
15:16:26 but we definitely need a core reviewer from the ceilometer team for tempest tests. I hope we will have more tests soon
15:16:40 * dhellmann has 280 unread threads related to code reviews in his mailbox
15:16:43 dhellmann, yes
15:17:20 nprivalova: good, thanks
15:18:08 my goal for this meeting is to remind you about tempest. So it's rather ok for now, we may move on. I will ping you at the next meeting
15:18:17 :)
15:18:24 cool
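The race condition jd__ warns about above disappears if a test polls for the expected meter with an explicit timeout instead of asserting immediately after the triggering action. Below is a minimal sketch of that pattern in Python; the `client.list_samples()` call is a hypothetical stand-in, not the real Tempest client API:

```python
import time


def wait_for_samples(client, meter_name, resource_id,
                     timeout=120, interval=5):
    """Poll until at least one sample for the meter appears, or time out.

    `client.list_samples` is a hypothetical stand-in for whatever the
    Tempest-side ceilometer client ends up exposing.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        samples = client.list_samples(meter_name, resource_id=resource_id)
        if samples:
            return samples
        time.sleep(interval)
    raise AssertionError("no %s samples for resource %s after %d seconds"
                         % (meter_name, resource_id, timeout))


# The unsafe "spawn a VM and check that there's a meter" then becomes:
#   server = self.create_server()
#   samples = wait_for_samples(telemetry_client, 'instance', server['id'])
```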
15:18:35 moving on
15:18:42 #topic Batch notification processing (herndon)
15:18:50 it might be good to have tempest status as a recurring topic
15:19:00 like the client lib
15:19:07 dhellmann: +1
15:19:11 dhellmann, yep. I've added it to the agenda
15:19:44 I'll keep it on the agenda
15:19:46 herndon_: around?
15:19:49 yep
15:19:55 herndon_: enlighten us
15:20:00 just didn't want to interrupt
15:20:19 :)
15:20:20 so, I'd like to chat about this email thread that happened before the holidays: http://lists.openstack.org/pipermail/openstack-dev/2013-December/022789.html
15:20:48 It seems like for the most part people liked the idea of consuming notifications in batches.
15:21:03 I believe DanD had some concerns, wanted to give him an opportunity to voice those if he's around
15:21:22 am here
15:21:35 care to elaborate a bit?
15:21:51 herndon_: I think the end result is "please write a patch", no?
15:21:56 herndon_: or is anything unclear?
15:22:07 unclear, i.e., which patch to write
15:22:12 my concern with the message queue based approach is that we then don't get caching/batch consumption for other collection methods
15:22:25 we already have API POST
15:22:32 discussions on IPMI, SNMP
15:22:38 herndon_: change the prototype of data recording to accept multiple records to be able to batch INSERTs first, then the RPC part
15:22:40 those should also support this
15:23:00 sandywalsh proposed putting the code into ceilometer, not oslo, as a first step. This will likely have a faster turnaround time
15:23:11 which code?
15:23:15 yep, in the collector
15:23:19 would the batching only apply to samples derived from notifications, or also to metering messages derived from polling?
15:23:44 optionally turned on, but it could be shared by all consumers (sample/event/etc)
15:23:46 if we put it in the processing pipeline then it could apply to all collection
15:23:59 It could do both I suppose. I'm mostly interested in notifications (and specifically, events), but it would be good to be able to batch everything.
15:24:09 * eglynn is thinking about timeliness for alarm triggering
15:24:18 As I understand it, it'd apply to anything that consumes notifications? instead of a single message in and a single message out, it'd transact in lists...
15:24:26 you need to be able to turn it off as well
15:24:26 eglynn, it would be optionally turned on
15:24:44 DanD - the problem with batching up http-posted data is there's no way to tell the client that something went wrong... I think that data must be inserted directly.
15:24:50 DanD, not sure if the pipeline is the right place ... perhaps before it?
15:25:14 with AMQP, we have the opportunity to hold on to the message until it is committed to the storage back end.
15:25:17 sandy, that could work as well
15:25:33 could we have another publisher for batch?
15:25:53 herndon_: how? AMQP doesn't interact with the storage backend in ceilometer
15:25:54 yeah, it's really only beneficial for queue-based inputs ... wouldn't be good for other forms like http
15:26:32 llu-laptop: the collector wouldn't ack anything in the batch until the batch is committed. so if something goes wrong, all of those messages just get requeued
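jd__'s suggestion at 15:22:38 (change the prototype of data recording to accept multiple records) amounts to turning N single-row INSERTs into one transactional bulk write, whose outcome then decides whether the whole batch of queue messages gets acked or requeued. A self-contained sketch of that idea, using sqlite3 purely for illustration; the real ceilometer storage drivers and schema are different, and the table below is invented:

```python
import sqlite3


def record_samples(conn, samples):
    """Insert a batch of samples in a single transaction.

    On failure the transaction rolls back, so the caller can leave the
    corresponding queue messages un-acked and let them be redelivered.
    """
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT INTO sample (meter, resource_id, volume, timestamp)"
            " VALUES (?, ?, ?, ?)",
            [(s["meter"], s["resource_id"], s["volume"], s["timestamp"])
             for s in samples])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (meter TEXT, resource_id TEXT,"
             " volume REAL, timestamp TEXT)")
record_samples(conn, [{"meter": "cpu_util", "resource_id": "vm-1",
                       "volume": 0.42, "timestamp": "2014-01-09T15:26:32"}])
```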
15:26:36 would we need selectivity for batching metering messages for some meters but not others?
15:26:50 (i.e. suppress batching for cpu_util samples)
15:27:00 there are also udp messages that do not need the reliability but could have huge volumes
15:27:18 agree with eglynn on selective batching
15:27:57 eglynn: I don't understand...
15:28:23 herndon_: I'd want to ensure that cpu_util samples hit the DB immediately on receipt
15:28:29 eglynn, if samples are coming from events, then timeliness can't be that big a concern anyway. There's always latency there.
15:28:30 batching for http is useful if the incoming http rate is reasonably high. you can batch the incoming requests based on size/time and do async responses back to the client after the request is completely processed
15:28:47 eglynn, I would assume you'd want UDP for things like CPU samples?
15:28:48 herndon_: as, say, autoscaling actions may be driven off alarms based on cpu_util
15:28:55 ok, I see
15:29:14 sandywalsh: well, out-of-the-box AMQP is used for everything
15:29:32 sandywalsh: ... but yeah, could configure the pipeline.yaml that way
15:29:55 but the metering data doesn't come in on the notification topic, right?
15:30:07 eglynn, yeah, perhaps we should reconsider using the queue for samples as default
15:30:19 herndon_, hmm, good point
15:30:44 herndon_: yeah, that's why I asked whether it just applied to notifications
15:30:51 thinking for starters, let's just batch all of the notifications and build up a more generally useful batching mechanism. Then we can decide how to use it elsewhere?
15:30:53 herndon: metering data has its own metering topic, if I remember the configuration file correctly
15:31:14 herndon: until someone sets it to the notification topic, if that is allowed
15:31:47 ildikov, perfect
15:32:00 (as it should be :)
15:32:07 batching notification = sending an array of notifications in one AMQP message?
15:32:21 herndon_ ^^ sorry, I'm typing too fast and mistyped :(
15:32:28 no, consuming a batch of notifications off the queue
15:32:41 ok so it's just on the reader side
15:32:46 jd__, yeah
15:32:47 no, it means holding them so we can batch-write them to storage so it's not 1 read : 1 write
15:32:48 yeah
15:32:56 that's what I understood from the list thread but this discussion confused me
15:33:09 N queue reads : 1 disk write
15:33:42 Sounds like it's time to write a BP and start writing the code
15:33:45 1 ack per batch; acking the sequence number acks everything prior
15:33:49 herndon_: my point ;)
15:33:58 back to my original question :), do we want to support batch writes/caching for all sources of data, not just message queues?
15:34:15 rhochmuth, Are we able to specify messages that failed in the bulk ack scheme so they either get dead-lettered or pushed back onto the queue?
15:34:20 DanD: other sources of data being..?
15:34:25 DanD: on the storage side?
15:34:26 DanD, my vote is start with message queues
15:34:35 Something like bulk ack up to message 1001, except for messages 24 and 32
15:34:49 thomasem, i don't think so
15:34:55 thomasem: if you "nack" single messages before the greater "ack", then yes
15:35:05 Gotcha
15:35:05 cool
15:35:20 I knew we had talked about that previously but the details slipped my mind. :) Thanks!
15:35:22 rhochmuth, I don't think bulk ack() would work in a multi-collector scenario either
15:35:23 should we implement our own TCP?
15:35:31 sandywalsh, why?
15:35:42 nprivalova: I didn't read that
15:35:43 DanD - batching is not going to take place in oslo, so theoretically we could create UDP batches.
15:35:55 jd__, :)
15:36:17 sandywalsh: afaik a bulk ack only acks messages for that client.
15:36:17 nprivalova - care to fill us in on what that means?
15:36:21 so we are talking about batching on the ceilometer queue, not the service queue?
15:36:29 ^^ dragondm
15:36:30 :)
15:36:37 That's how it's supposed to work
15:36:40 dragondm: that
15:36:41 dragondm, that was my concern ... if that's the case, we're ok in the happy-day scenario
15:36:43 that's correct
15:37:00 dragondm, otherwise, we have to individually ack for thomasem's use-case
15:37:25 either way, not a biggie
15:37:36 I think the message tag used to ack is specific to the client
15:37:37 in some way
15:37:47 it's specific to the channel.
15:37:50 So, consumer A won't affect consumer B
15:37:56 when acking or nacking
15:38:01 anyway, I think we're done, I'll write up the BP and send it out.
15:38:04 yup.
15:38:14 I think the big issue is eglynn's concern, and if they're on different queues we're golden. Just need to support batching on a per-queue basis
15:38:42 so, then it's just like jd__ said, submit a branch ... sounds like a worthy effort
15:38:46 herndon_: ... means a warning against re-inventing the wheel I'd guess
15:38:46 sandywalsh: yep, agreed (IIUC)
15:38:54 +1
15:39:06 moving on then :)
15:39:07 herndon_, we need to send ack and nack, 'sliding window'. all of these remind me of the TCP protocol implementation. Just an association :)
15:39:09 neat
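Putting the pieces together, here is a rough sketch of the "N queue reads : 1 disk write" consumer discussed above, with the per-batch ack and single-message nack semantics mentioned at 15:33:45 and 15:34:55. It is written against pika/RabbitMQ purely for illustration; the queue name and the `record_samples` callable are assumptions, and the real collector would go through OpenStack's messaging layer rather than raw pika:

```python
import json

import pika  # assumes a plain RabbitMQ broker for this sketch


def drain_batch(channel, queue, max_batch=100):
    """Pull up to max_batch messages off the queue without acking any."""
    batch = []
    for _ in range(max_batch):
        method, _props, body = channel.basic_get(queue)
        if method is None:  # queue is drained
            break
        batch.append((method.delivery_tag, json.loads(body)))
    return batch


def process_once(channel, queue, record_samples):
    batch = drain_batch(channel, queue)
    if not batch:
        return
    last_tag = batch[-1][0]
    try:
        record_samples([sample for _tag, sample in batch])
    except Exception:
        # one nack with multiple=True requeues the whole un-acked batch
        channel.basic_nack(last_tag, multiple=True, requeue=True)
        raise
    # acks are per-channel, so this cannot ack another consumer's
    # messages; multiple=True acks every delivery tag up to this one
    channel.basic_ack(last_tag, multiple=True)


# Hypothetical wiring:
#   channel = pika.BlockingConnection(pika.ConnectionParameters()).channel()
#   process_once(channel, 'metering.sample', record_samples)
```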
15:39:32 #topic Release python-ceilometerclient?
15:39:47 I'm guessing herndon_ wants to wait for https://review.openstack.org/54582/
15:39:50 Haha, support for the event API is coming, promise.
15:39:54 yes please!
15:39:54 :P
15:40:25 fair enough
15:40:32 eglynn likes it, so it MUST be ready to go :)
15:40:38 herndon_: is that the only patch you're waiting on landing?
15:40:43 let's wait for the next meeting to see if we can do it, unless eglynn handles the burden before
15:40:48 for the client? yes
15:41:07 herndon_: I'll cut the client release once that patch lands
15:41:20 jd__: cool enough
15:41:34 #topic Open discussion
15:41:41 there's some time left if you want to raise anything
15:41:53 Hello, I posted two blueprints.
15:42:05 Sorry I am asking this again and haven't got any direction so far on how to proceed - Is an Active/Active Ceilometer central agent fully supported in Havana or is that something planned for Icehouse? Can someone provide some input, any docs?
15:42:06 sorry for the noise, but I need some reviews for https://review.openstack.org/#/q/status:open+project:openstack/ceilometer+branch:master+topic:bp/support-resources-pipeline-item,n,z it's been there for some time
15:42:12 Those are https://blueprints.launchpad.net/ceilometer/+spec/monitoring-network and https://blueprints.launchpad.net/ceilometer/+spec/monitoring-network-from-opendaylight.
15:42:19 I hope for your comments.
15:42:28 We have started work on supporting VMware vSphere for Ceilometer. We've come up with technical aspects of the implementation, like mapping meters from Ceilometer to the counters available in vSphere. The corresponding blueprint is https://wiki.openstack.org/wiki/Ceilometer/blueprints/vmware-vcenter-server. It would be good if it gets reviewed by others.
15:43:08 hvprash_: havana doesn't have an active/active central agent
15:43:37 And I posted patches for those BPs.
15:43:41 any proposed blueprint? or is it under the central agent improvement bp?
15:43:57 I've started a POC for aggregation and rolling up. If you are interested in it please take a look https://review.openstack.org/#/c/65681/ . I'm working on a detailed description in the bp
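For a sense of what "aggregation and rolling up" can mean here, the sketch below averages raw samples into fixed time windows per meter and resource. It is a generic illustration only, not code from the POC under review:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def roll_up(samples, period=timedelta(minutes=5)):
    """Average samples into fixed windows, keyed by meter and resource.

    Each sample is assumed to be a dict with 'meter', 'resource_id',
    'timestamp' (a datetime) and a numeric 'volume'.
    """
    seconds = period.total_seconds()
    buckets = defaultdict(list)
    for s in samples:
        window = datetime.fromtimestamp(
            (s["timestamp"].timestamp() // seconds) * seconds)
        buckets[(s["meter"], s["resource_id"], window)].append(s["volume"])
    return [{"meter": meter, "resource_id": rid, "window": window,
             "avg": sum(vols) / len(vols), "count": len(vols)}
            for (meter, rid, window), vols in sorted(buckets.items())]
```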
15:44:50 IIRC we chatted before about a mid-cycle meet-up for the euro-ceilo devs at FOSDEM
15:44:57 is that still on the cards?
15:45:02 sadly, /me won't be able to make it to FOSDEM this year :(
15:45:04 Those BPs are not under the central agent improvement bp.
15:45:25 Should I move them under the central agent improvement bp?
15:45:36 hvprash_: icehouse we hope
15:45:48 hvprash_: that's what the central agent improvement bp was created for, it's an umbrella
15:45:50 jd__: can you refer me to the bp?
15:46:09 eglynn: that sucks! how is that possible :(
15:46:15 ... but if there's a quorum of other folks attending it might be worth organizing something
15:46:16 llu-laptop, ah ok!
15:46:29 hvprash_: https://blueprints.launchpad.net/ceilometer/+spec/central-agent-improvement
15:46:54 eglynn: I'll send a mail on -dev
15:47:23 jd__, assuming that also includes the unification of the ceilometer central agent, like multiple datacenter sources etc
15:47:32 eglynn: should we move FOSDEM to Dublin next year for ya? ;)
15:47:52 jd__: now you're talkin'!
15:47:58 jd__: thanks for the pointer to the central agent improvement bp.. is this being actively targeted for icehouse-3?
15:48:06 eglynn: that'll just change the beer's color.
15:48:22 absubram_: it is, fingers crossed
15:48:26 jd__: yeah, we don't do fruit in our beer!
15:48:35 No, just pure happiness.
15:48:37 ah, one question from me! I need your heads for discussion https://review.openstack.org/#/c/52670/7/ceilometer/api/controllers/v2.py
15:48:44 thomasem: yup :>
15:49:03 dhellmann ^^
15:49:06 thanks jd__
15:49:26 eglynn: Is there a formal process for getting the technical implementation of a BP verified?
15:49:43 Akhil: git review
15:50:01 yep, wot he said ...
15:51:17 sileht: going to FOSDEM this year btw?
15:51:47 eglynn: Which is after the code is written and we post a review. Is there any platform for discussing architecture before coding, or is BP approval enough?
15:51:48 nprivalova: I'll take a look
15:51:59 jd__, I don't think so
15:52:47 Akhil: Just ML and IRC chats...
15:52:49 Akhil: BP approval is usually enough, but if it's controversial it's best to also raise it on the ML and/or IRC to give folks a chance to object before you do all the implementation work
15:52:53 dhellmann, thanks
15:53:02 jd__, do you mind if I sync offline with you to get some insight into this BP? not sure if all the cases are captured in it, or it might be my lack of understanding
15:53:38 jd__: llu-laptop: just a quick request.. if there is a plan to add active/active HA support via this central agent improvement bp.. can we add that to the whiteboard in the bp?
15:54:17 eglynn: Thanks !!! We'll send out a mail with the points of concern.
15:54:28 Akhil: cool
15:54:37 hvprash_: we can discuss it on openstack-dev I think
15:54:49 jd__, that works.. thx a lot
15:54:59 so who's going to FOSDEM finally, probably nijaba and me, that's it?
15:55:08 * dhellmann can't make it
15:55:20 not sure if it's worth organizing something if I'm alone coding
15:55:27 Awwww
15:55:28 :-)
15:55:33 You can put a camera on yourself
15:55:53 Or telepresence robots
15:56:18 * nealph votes for robots
15:56:19 * dhellmann would watch a live-stream of jd__ coding
15:56:20 is that telemetry?
15:57:57 heh
15:58:13 If we were to hold up little gauges to the camera, perhaps.
16:00:07 that ends it, dudes
16:00:09 #endmeeting