15:01:08 #startmeeting ceilometer
15:01:09 Meeting started Thu Jul 31 15:01:08 2014 UTC and is due to finish in 60 minutes. The chair is eglynn. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:13 The meeting name has been set to 'ceilometer'
15:01:16 o/
15:01:19 o/
15:01:20 o/
15:01:20 yay!
15:01:23 o/
15:01:27 o/
15:01:27 o/
15:01:34 hey y'all
15:02:10 o/
15:02:13 #topic Juno-3 planning
15:02:28 k, so we're down to the wire now on getting juno specs landed
15:02:49 major ones to land are ...
15:03:15 'big-data' sql part 2 by gordc
15:03:29 s/by/featuring/
15:03:40 o/
15:03:55 * gordc needs to get to work
15:03:59 and one of the central agent SPoF specs, featuring fabiog_ or nsaje
15:04:41 I'm think most of the rest of the open proposals are going to punt to kilo at this stage
15:04:46 *thinking
15:05:03 is the Gnocchi dispatcher for the collector targeted/tracked?
15:05:11 o/
15:05:18 jd__: yep
15:05:39 jd__: https://blueprints.launchpad.net/ceilometer/+spec/dispatcher-for-gnocchi-integration
15:05:57 here's the current list for juno-3 ... https://launchpad.net/ceilometer/+milestone/juno-3
15:06:02 thanks
15:06:24 so I'm going to add in "blocked" launchpad BPs for the two I mentioned earlier
15:06:30 (pending the specs getting landed)
15:06:34 eglynn: so just the central agent related spec we need to push through? any other ones that we should prioritise?
15:06:48 (so that they show on the radar used by the release manager)
15:07:13 gordc, central agent HA do you mean?
15:07:19 gordc: yep we need your sql-a II and one of the central agent specs (or a combo)
15:07:23 links for the major ones eglynn mentioned: #link https://review.openstack.org/110985 #link https://review.openstack.org/101009
15:07:28 oh, sorry
15:07:28 cool cool
15:08:07 so, please do fire up your reviewing engines and let's target EoW to get the specs aspect closed off if possible
15:08:15 and/or #link https://review.openstack.org/#/c/101282/
15:08:48 is there a way to push https://review.openstack.org/#/c/104784/ faster?
15:09:03 nsaje and eglynn: I will update it with the latest changes
15:09:20 Kurt_Rao: from the moment we can agree on how to do it :)
15:09:25 Kurt_Rao: I guess address the issues in the -1 reviews?
15:09:37 so which one is for central agent SPOF, https://review.openstack.org/110985 or https://review.openstack.org/#/c/101282/ ? I think the former one, right?
15:10:14 llu-laptop: they are different approaches, it's up to you cores which approach to take, or a combination of the two
15:10:14 llu-laptop: these capture alternative approaches to the same problem
15:10:15 eglynn: the problem is that there are conflicting preferences
15:10:34 ildikov: conflicting preferences on instance.uptime?
15:10:48 nsaje: eglynn: got that, thx
15:10:50 eglynn: on how to have smth that looks like an instance.uptime
15:11:25 Sure. Can Fabio please comment on how notification works if the VM is on for a really long time? If no state change notification is received, no new sample will be generated, and that means the billing system will not know what the uptime is
15:11:34 Kurt_Rao: so if we cannot agree there, then a meeting topic maybe would speed up the process a bit
15:11:39 ildikov: OK, so it'll only land if those conflicts are resolved on gerrit I guess
15:11:44 * eglynn states the obvious ...
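(For reference: the billing pattern Kurt_Rao argues for a few lines below, "simply query the statistics api with sum aggregation", would look roughly like the following against the Ceilometer v2 API. This is a hedged sketch only: `instance.uptime` is the meter proposed in the spec under review, not an existing meter, and the endpoint, token, and instance UUID are placeholders.)

```python
# Hedged sketch: sum-aggregated statistics over a proposed uptime-style
# meter, as a billing system might consume it. 'instance.uptime' is the
# meter from the spec under review, not an existing Ceilometer meter;
# the endpoint, token, and UUID below are placeholders.
import requests

CEILOMETER_API = "http://localhost:8777"   # assumed API endpoint
TOKEN = "<keystone-token>"                 # placeholder auth token

resp = requests.get(
    CEILOMETER_API + "/v2/meters/instance.uptime/statistics",
    headers={"X-Auth-Token": TOKEN},
    params={
        "q.field": "resource_id",          # restrict to a single VM
        "q.op": "eq",
        "q.value": "<instance-uuid>",
        "aggregate.func": "sum",           # selectable aggregate: sum
        "period": 3600,                    # hourly buckets
    },
)
resp.raise_for_status()
for bucket in resp.json():
    # the sum appears as the classic 'sum' field, or under 'aggregate'
    # when selectable aggregates are requested
    total = bucket.get("sum") or bucket.get("aggregate", {}).get("sum")
    print(bucket["period_start"], total)
```

Summing periodic uptime samples per billing period is what makes the proposed meter convenient for billing, which is the property Kurt_Rao highlights; the open question on gerrit is how those samples get generated for long-running VMs that emit no state-change notifications.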
15:12:20 ildikov, Kurt_Rao: let's try to resolve this on gerrit if possible
15:12:21 eglynn: but thanks for reminding ;)
15:12:26 ok
15:12:27 sure
15:12:52 Kurt_Rao: good point, can you add that to gerrit as well (if you haven't already)
15:13:27 Kurt_Rao: if there are no new notifications it means that the VM has been running all the time
15:13:47 Kurt_Rao: why do you need a constant reminder?
15:13:50 or it crashed and you don't know it?
15:14:19 jd__: if it crashed there is a state change
15:14:28 gordc: Kurt_Rao: yeap, I've raised this problem earlier on gerrit, so the mechanism for how that meter would be generated is not clear
15:14:34 fabiog_: i think he's using polling as a heartbeat type verification. but yeah, we can discuss later or on gerrit
15:14:53 fabiog_: and you need something to notify you
15:14:55 but maybe we should continue this in the open discussion part, if we have time there today
15:14:58 * jd__ shuts up
15:15:05 gordc: and jd__: nova does that already
15:15:14 * gordc puts away the tape.
15:15:59 fabiog_: I think we'd better discuss on gerrit. But the reason I like the spec is, it's really easy for the billing system to use it. Simply query the statistics api with sum aggregation
15:16:11 yeah in general, easiest to follow if all the points and counter-points are captured in the same place (i.e. gerrit)
15:17:11 so I guess the point to bear in mind is that juno-3 will be cut 5 weeks from today
15:18:12 ... and with that happy thought ;)
15:18:21 shall we move on to the next topic?
15:18:28 wow...that silence speaks volumes. :)
15:18:40 nealph: LOL :)
15:18:54 :D
15:18:59 #topic Tempest status / in-tree functional testing
15:19:13 hehe, ok
15:19:28 so I guess y'all saw the thread on the ML from the QA crew?
15:19:51 #link http://lists.openstack.org/pipermail/openstack-dev/2014-July/041057.html
15:20:05 fyi, so we have nova and glance notifications merged, and swift, cinder, neutron almost done - that's about the api testing
15:20:12 just to fit the topic
15:20:45 DinaBelova: so the outstanding tempest patches, just the alarm-history now?
15:21:06 DinaBelova: did you guys ever run Ilya's performance tests against current code to see if performance has degraded since the previous tests?
15:21:35 * eglynn answers own question ... https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:bp/add-basic-ceilometer-tests,n,z
15:21:36 eglynn, 'almost done' == 'final review cycle'
15:22:08 DinaBelova: cool
15:22:30 gordc, the last time we ran tests was around mid-cycle
15:22:34 gordc: when do you suspect the degradation happened, between juno-1 and juno-2?
15:22:44 gordc, on master I mean
15:23:20 eglynn: when was juno-2? i guess the tempest tests started to fail around July 1?
15:23:39 gordc: juno-2 was July 24th
15:24:03 gordc: yeah so that's about mid-way between j1 & j2
15:24:12 eglynn: although it started to get better right before we disabled tempest tests...
15:24:17 well, we used ~end of June code to test - at that time it was ok
15:24:25 DinaBelova: would it be possible to spin up the tests again for juno-2?
15:24:27 at least on testing envs
15:24:34 of course, sure
15:24:52 action on ityaptin, please :)
15:25:05 actually, didn't we talk about making it easy for the perf test runs to be reproducible?
15:25:15 DinaBelova: thanks! also are there instructions on how to run the tests myself?
15:25:21 eglynn, yeah, sorry, still in progress
15:25:27 ityaptin is finishing the doc
15:25:27 DinaBelova: cool cool
15:25:30 #action ityaptin re-run performance tests against juno-2
15:25:45 to make that actually reproducible by other folks
15:25:49 not only him :)
15:26:01 ityaptin promises to finish it today :)
15:26:03 :D
15:26:18 DinaBelova: cool, that would be really useful so that gordc can, say, bisect the timeline between j1 & j2
15:26:54 eglynn, sorry, not so easy to find time for doc writing with all this multi-node-jenkins research
15:26:54 DinaBelova: also, do you wanna speak to your experimental mongodb jobs in the gate?
15:27:01 a-ha
15:27:03 yeah
15:27:13 only one change is separating us from this
15:27:26 https://review.openstack.org/#/c/110247/
15:27:30 it was even approved
15:27:44 but eventually the gate job failed
15:27:53 although it was ok during the check
15:27:59 so the approval was removed
15:28:27 after I went through the logs I found nothing related to the change, although I could not actually find the reason...
15:28:38 that's why I'm trying to ping sdague :)
15:28:57 probably his experience will help me to find the actual issue - if there was one
15:28:19 and if there is no reason to keep this change on review - it'll be merged I guess
15:29:24 DinaBelova: a-ha, ok, does it seem like a transient failure?
15:29:36 eglynn, yes, indeed
15:30:04 eglynn, actually I can't understand what the reason for it was, but I see nothing familiar in the previous runs' logs
15:30:18 latest run is all green, so maybe a few more green rechecks might convince Joshua?
15:30:30 (i.e. convince him to re-approve)
15:30:35 eglynn, I'll run some :)
15:30:40 cool, thanks!
15:30:50 #topic TSDaaS/gnocchi status
15:30:58 eglynn: whoa
15:31:08 jd__: floor is yours, sir!
15:31:13 I had added the second half of the earlier topic
15:31:25 cdent: did I cut that off too abruptly?
15:31:25 so what's new: a couple of new API calls just to make things simpler to use
15:31:32 yeah: in-tree functional testing
15:31:40 The topic is: what are we doing about that?
15:31:45 (if anything)
15:31:49 hehe, I have a python-opentsdbclient repo, btw :) /me moving the code from the abandoned patch to it
15:31:51 cdent: I mentioned that at the start of the topic
15:31:56 the new statistics/carbonara code has been merged, I don't recall if it was the case last week :)
15:32:01 cdent: ... and nobody bit
15:32:02 I didn't see any plan or resolution
15:32:24 It seemed rather that we moved on before any biting could happen..
15:32:27 cdent: ok, let's return to it after the gnocchi update
15:32:39 ✓
15:33:03 now I'm just a bit worried that nothing has moved on the dispatcher code (hint ildikov :)
15:33:19 and I'm gonna start working on archive policy next
15:33:26 jd out
15:33:37 jd__: /me is still in docco writing mode, so don't worry, I'm already suffering for my sins :( :)
15:33:37 jd__: cool, I'll need to work with you on the archiving policy
15:33:58 eglynn: more than reviewing my patches?
15:33:58 jd__: ... to fit it in with the capabilities of influxdb
15:33:59 ildikov, hehe :D
15:34:03 eglynn: ack
15:34:12 eglynn: well I'll start something soon and we'll iterate over that I guess
15:34:24 jd__, cool!
15:34:52 jd__: yeah, so I reached out to pauldix on the status of the influx releases, it's a little behind but prolly enough for me to get restarted with the driver
15:35:04 cool
15:35:59 anything else on the tasty Italian pasta?
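(A rough illustration of the Gnocchi dispatcher work tracked above, i.e. the blueprint linked at 15:05:39 and nudged at 15:33:03: a minimal sketch of a collector dispatcher plugin following the `ceilometer.dispatcher.Base` contract of the Juno era. The Gnocchi URL, route, and payload mapping here are illustrative assumptions, not the eventual implementation.)

```python
# Minimal sketch of a collector dispatcher plugin, following the
# ceilometer.dispatcher.Base contract of the Juno era. The Gnocchi
# endpoint and payload mapping are hypothetical placeholders; the real
# dispatcher maps samples onto Gnocchi resources and their metrics.
import requests

from ceilometer import dispatcher

GNOCCHI_URL = "http://localhost:8041"      # assumed Gnocchi endpoint


class GnocchiDispatcher(dispatcher.Base):
    """Forward metering samples from the collector to Gnocchi."""

    def record_metering_data(self, data):
        # 'data' is a sample dict, or a list of them, handed over by
        # the collector after it pulls samples off the queue
        samples = data if isinstance(data, list) else [data]
        for sample in samples:
            requests.post(
                GNOCCHI_URL + "/v1/measures",   # hypothetical route
                json=[{"timestamp": sample["timestamp"],
                       "value": sample["counter_volume"]}],
            )

    def record_events(self, events):
        # Gnocchi stores measures, not events; ignored in this sketch
        pass
```

A real plugin would be registered under the `ceilometer.dispatcher` entry-point namespace and enabled through the collector's dispatcher option, which is what lets it run alongside or instead of the default database dispatcher.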
15:36:18 * DinaBelova found it in the local shop
15:36:20 not from me
15:36:30 eglynn: we could have an Amatriciana too ;-)
15:36:34 neither from me :(
15:36:43 ildikov: no ETA btw?
15:36:53 *NO PRESSURE*
15:36:54 :-D
15:37:31 jd__, you're too cruel :)
15:37:36 jd__: will start writing some code this week
15:37:45 cool :)
15:37:50 ildikov: ok cool, though remember we're on Thursday already ;-)
15:37:55 oh yeah a shameless plug for votes ... https://www.openstack.org/vote-paris/Presentation/rethinking-ceilometer-metric-storage-with-gnocchi-time-series-as-a-service
15:37:56 * jd__ whispers
15:38:21 jd__: no need to remind me about this simple fact... ;)
15:38:33 :-)
15:38:49 k, shall we return to the in-tree tests?
15:38:54 * jd__ nods
15:38:56 eglynn, yeah
15:38:57 #topic in-tree functional testing
15:39:04 so I already linked the ML thread above
15:39:17 It's basically just a question of: are we going to try to do something about it soonish, or wait for tempest librarization?
15:39:28 this was also discussed a bit at the project/status meeting on Tuesday
15:39:37 cdent: the time horizon is kilo
15:39:55 I didn't read that thread but I definitely like the idea
15:40:04 It really surprises me, given the state of testing, that it's not a higher priority.
15:40:14 since I've been ranting about that for months… :D
15:40:30 I pushed for a small number of solid exemplars to be in place initially, to avoid too much divergence/duplication of effort when the other projects follow suit
15:40:55 cdent: to be exact here's the statement ...
15:41:04 i'd guess that we'd expect to see a good pattern in place by juno, and maybe push harder for wide adoption in kilo
15:41:17 that sounds to me like it drills down to the usual problem "who's gonna work on that?"
15:41:25 sounds like politics
15:41:36 "good pattern" == "built out for 1 or maybe 2 projects"
15:41:38 if nobody has the time there's no point discussing it…
15:41:53 sounds like resource-starvation more than politics
15:42:14 the jeblair comment, not the jd__ comment
15:42:28 cdent: ?
15:42:33 It's politics to prioritize features over fundamentals
15:42:34 I think there is general agreement on it being the only sane approach to solve the tempest madness
15:42:52 well, the question is in the resources, yeah.. Vadim is still busy on the current tempest tests (just simple, just as tempesty as they can be, but actually no time now for this approach from him)
15:43:02 jeblair: just taking your name in vain above, explaining the discussion at the project/release status meeting on Tuesday last
15:43:13 cdent: i mean that we have a couple of projects that have started on a pattern for in-tree functional testing
15:43:29 cdent: so we'll work out the kinks there, and hopefully then other projects will have an easier time adopting it
15:43:31 full context is here: http://eavesdrop.openstack.org/meetings/project/2014/project.2014-07-29-21.05.log.html#l-18
15:43:41 eglynn, thanks for the link here
15:43:50 thanks
15:44:23 jeblair: +1 on the kinks being ironed out once or twice, as opposed to multiple times in parallel in potentially different ways
15:44:24 DinaBelova: sorry, been at the Nova meetup this week. I just reapproved the d-g change
15:44:31 so the action, for now, as far as ceilo is concerned is wait, yes?
15:44:35 sdague, oh, thank you sir!!
15:44:53 From my tablet in a coffee shop :-)
15:44:54 cdent: we're good at that
15:45:10 sdague, hehe, nice :) have a nice coffee-break :)
15:45:17 cdent: i think so; maybe start thinking about how it would work and check in on other projects as they make progress on it
15:45:45 cdent: no, the actions are to participate in the discussion, ensure our needs will be catered for, keep abreast of the exemplar projects as they progress
15:45:48 It would be useful for there to be some stronger expression on what these things need to do. Not _now_, but _why_.
15:45:55 swift and neutron both have some amount of functional testing
15:45:58 or what eglynn says :)
15:45:58 s/now/how/
15:46:18 jeblair, cdent, eglynn - Vadim may take action on this in the next few weeks (speaking about the research)
15:46:26 and we all will keep track of this
15:46:34 As the stuff I've read feels like it makes some assumptions about the goals, and it would help if it was a bit more explicit.
15:46:51 * jd__ hopes next step will be in-tree documentation
15:47:09 jd__, ++
15:47:19 cdent: a big part of the goal is better testing closer to the project -- so we catch bugs in a simpler environment instead of counting on a huge complex deployment to expose them
15:47:37 * cdent nods
15:47:47 that's understood
15:47:54 ok, if we've defined some vector of working/investigating, may we move on?
15:47:59 yes
15:47:59 cdent: perhaps you could also chime in on the ML thread seeking that explicitness?
15:48:04 will do
15:48:18 moving on?
15:48:26 +1
15:48:29 #topic Meeting for Central Agent HA revised proposal
15:48:48 fabiog_: was the intent to seek a separate meeting on this?
15:48:53 eglynn: yes
15:49:15 I would like to have a meeting over the phone and shared screen to illustrate the proposal
15:49:28 eglynn, yeah, with slides, etc
15:49:36 fabiog_: so we've only 5 weeks left to juno-3, so we'll need to move fast on this
15:49:38 I also contacted Joshua to have him on the meeting and see how we can leverage Taskflow
15:49:51 there is also another proposal on the table
15:49:52 fabiog_ is right here - it'll be more visual
15:49:59 fabiog_: when did you have in mind?
15:50:16 eglynn: I can do it next week as early as Mon/Tue
15:50:20 nsaje, will you attend the call?
15:50:23 depends on people's avail
15:50:24 it'll be cool
15:50:26 fabiog_: will the slide deck be distributed in advance?
15:50:38 fabiog_: Monday is a public holiday in Ireland
15:50:43 I can send it around later today
15:50:50 ok let's try Tue
15:50:53 +1
15:50:56 +1
15:51:02 fabiog_: yep tmrw or Tues
15:51:06 DinaBelova: yes
15:51:20 is this time ok for you guys? I will try to invite someone from Y! to explain what Taskflow can do
15:51:24 nsaje, cool, just to have all interested eyes in one place
15:51:32 I mean Tue 8am PDT
15:51:33 not good for me but doesn't matter
15:51:47 fabiog_: yep, this time works for me, can you send an invite?
15:51:55 I work asynchronously :-)
15:52:00 jd__: well we can find a time that does suit?
15:52:08 fabiog_: PDT means UTC-?
15:52:09 yes I will send an invite to all of you
15:52:14 eglynn: doodle?
15:52:17 fabiog_: I would participate too
15:52:19 eglynn, jd__ is async :D
15:52:27 jd__: yep, a capital idea!
15:52:39 if you want to have everyone, just doodle it
15:52:40 * eglynn loves doodle :)
15:52:46 UTC is PDT+7
15:52:52 otherwise I don't really care to be absent, I'll bash the final specs ;-P
15:53:04 is now around 9am PDT?
15:53:11 yes Kurt_Rao
15:53:15 :)
15:53:16 Kurt_Rao: yes
15:53:24 fabiog_: http://doodle.com/create
15:53:25 #action fabiog_ create a http://doodle.com/ poll to find the best meeting time
15:53:32 move on?
15:53:46 #topic scaling for firehoses
15:53:54 cdent: floor is yours sir!
15:54:01 this message: http://lists.openstack.org/pipermail/openstack-dev/2014-July/041645.html
15:54:20 raised the question of how many notifications ceilometer can _really_ handle
15:54:21 tl;dr: how to make the swift middleware scale?
15:54:40 I think it is more general than that.
15:54:47 I think so too
15:54:56 infinite, because it's cloud scale
15:55:00 NEXT TOPIC PLEASE
15:55:01 ... at least the question in that mail
15:55:16 jd__: does the approach of rolling up samples in the middleware have any legs, d'ya think?
15:55:32 The general question is: is ceilometer going to be able to cope with modern services that are ossum?
15:55:47 I have no faith that it does.
15:55:57 http://www.urbandictionary.com/define.php?term=ossum
15:56:21 eglynn, current research shows ~400 notifications per second on mongodb on a small installation
15:56:33 cdent ^^
15:56:49 up to 800 with tuning of the backend, etc.
15:56:54 cdent: that's a broad statement, you applying that to all of ceilo or the swift middleware>
15:56:55 but I guess not more now
15:57:06 the queue will be the bottleneck now
15:57:15 s/>/?/
15:57:33 since all components are scalable, I don't really see the bottleneck yet?
15:58:11 jd__, we had no opportunity to test on all HA/scalable services - not so many resources
15:58:23 Okay, so the sense I get is that this guy is just throwing noise around and we can mostly ignore him in the short term.
15:58:30 cdent :D
15:58:31 ;)
15:59:06 cdent: IIUC the concern is one notification per request hitting swift-proxy
15:59:08 ?
15:59:09 I think the question is OK and the answer is "we design everything to scale; if there's a bottleneck, show us"
15:59:50 * cdent regrets raising the point in this context
15:59:59 jd__, no scalable central agent for now ;)
16:00:09 although it's not from this part
16:00:14 cdent: why?
16:00:17 I mean not this consuming part
16:00:30 cdent: time too limited, or?
16:00:42 DinaBelova: right, and we're working on it
16:00:50 IRC is a very poor format for measured discussion.
16:01:10 I'll take it back to the mailing list.
16:01:11 you mean, to troll?
16:01:15 :-)
16:01:16 cdent: we're running out of time today ... can we punt ironic/IPMI to the next meeting?
16:01:25 jd__, an active/passive solution won't add scalability, only HA I guess
16:01:26 Yes.
16:01:29 oh, yeah
16:01:31 out of time
16:01:41 amazebals
16:01:51 sorry folks, gonna have to cut it off now
16:02:01 DinaBelova: that's what we're working on
16:02:08 #endmeeting
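(Closing note on the firehose question: a hedged sketch of the "rolling up samples in the middleware" idea eglynn floats at 15:55:16. Instead of emitting one notification per swift-proxy request, the middleware batches a counter and flushes one aggregated notification per window. All names here are illustrative; this is not the actual swift metering middleware code.)

```python
# Hedged sketch: roll up per-request samples in a WSGI middleware and
# emit one aggregated notification per FLUSH_EVERY requests, rather
# than one notification per request. Names and the notification payload
# are illustrative assumptions, not the real swift middleware.
import threading
import time

FLUSH_EVERY = 100  # assumed batch size


class RollupMiddleware(object):
    def __init__(self, app, notify):
        self.app = app
        self.notify = notify          # callable that emits one notification
        self.lock = threading.Lock()
        self.pending = 0
        self.window_start = time.time()

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        finally:
            with self.lock:
                self.pending += 1
                if self.pending >= FLUSH_EVERY:
                    # one notification summarising many requests
                    self.notify({
                        "requests": self.pending,
                        "period_start": self.window_start,
                        "period_end": time.time(),
                    })
                    self.pending = 0
                    self.window_start = time.time()
```

The trade-off is coarser-grained samples in exchange for cutting notification traffic by the batch factor, which is the lever that matters when a busy proxy is the firehose.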