15:01:00 #startmeeting ceilometer
15:01:01 Meeting started Thu Apr 24 15:01:00 2014 UTC and is due to finish in 60 minutes. The chair is eglynn. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:04 The meeting name has been set to 'ceilometer'
15:01:09 o/
15:01:11 o/
15:01:15 o/
15:01:15 o/
15:01:17 Hello!
15:01:18 o/
15:01:28 o/
15:01:41 o/
15:01:56 o/
15:02:03 o/
15:02:11 welcome all! ... the new timeslot seems to suit more peeps at a 1st glance
15:02:36 eglynn: +1 :)
15:02:42 <_nadya_> o/
15:02:47 #topic summit scheduling
15:02:55 o/
15:02:57 so the ceilometer track is gonna have 10 slots this time round
15:03:05 (one less than either HK or Portland)
15:03:19 all day Wed plus morning slots on the Thurs
15:03:28 o/
15:03:36 (clashing with Heat sessions unfortunately, but c'est la vie ...)
15:03:46 I count 18 proposals at this point...
15:04:08 nealph: yeap, we're over-subscribed to the tune of circa 2:1
15:04:19 * nealph smells a consolidation coming
15:04:22 ... which is *good* as it shows interest in the project
15:04:28 nealph: :)
15:04:31 nealph: those decisions are why the PTLs get paid the big bucks ;-)
15:04:45 riiiiight. :)
15:04:48 we had several shared slots in HK too, as I remember
15:04:54 dhellmann: LOL :)
15:05:12 dhellmann: I only accept payment in the form of the finest craft beer ;)
15:05:35 yeah so the downside is the high contention for the available slots
15:05:57 the ceilo core team has been working on a "collaborative scheduling" exercise
15:06:17 I'll translate the output of that into something vaguely coherent on sched.org by EoW
15:06:34 ideally all these discussions would be done on gerrit in the open
15:06:43 maybe we'll get to that for Paris ...
15:07:14 also new for this summit ... there'll be a dedicated space for overflow sessions from each project track
15:07:20 eglynn: the source for the current proposal system is in git somewhere, so we could possibly add features
15:07:47 dhellmann: cool, let's think about that far enough in advance of the K* summit
15:07:53 * dhellmann nods
15:08:12 see https://wiki.openstack.org/wiki/Summit under "Program pods" for a short blurb on the pod idea
15:08:41 ... inevitably the contention will result in us punting some of the session proposals to the pod
15:09:00 ... we've also identified a bunch of candidate sessions for merging
15:10:08 eglynn: looking at the pod description... seems like some conversations would fit there well and others not so much
15:10:29 i.e. collaboration within ceilometer team yes, cross-team no
15:10:40 nealph: yeap, that is true
15:10:54 okay... guessing the core team has that in mind. :)
15:10:58 nealph: cross-team also more likely to suffer from scheduling conflicts
15:12:00 nealph: ... yeah IIRC none of the punts are obviously cross-team proposals, so I think we're good on that point
15:12:01 * nealph sighs
15:12:26 eglynn: I think so too
15:12:46 nealph: sigh == "scheduling is hard" ?
15:13:11 cool... appreciate the work the core team is doing. excited to see the schedule. sighing because we always seem to conflict with heat. :)
15:13:30 (and was hoping to talk about tripleo) :)
15:13:42 nealph: yeah I wasn't filled with joy about that conflict either
15:14:15 nealph: ... in previous summits heat and ceilo were kept apart because we had some pressing cross-project issues to discuss
15:14:29 (i.e. autoscaling/alarming)
15:15:42 perhaps I'm remembering wrong then... regardless, will be good sessions I'm sure.
15:15:43 nealph: ... but yeah we've conflicted before also, I guess only so many ways to slice and dice the 4 days
15:16:59 BTW any cores who haven't fully filled in their prefs on the proposals (and want to get their say in), pls do so by EoD today
15:17:13 move on folks?
15:17:25 +1
15:17:31 #topic update on f20-based gating
15:17:42 we discussed this last week
15:17:52 BTW https://review.openstack.org/86842 hasn't landed yet
15:18:29 I suspect because the reviewer pulling the trigger on the +1 becomes unofficial "nursemaid" for the new job
15:18:34 eglynn: at least it seems close to
15:18:36 so many +'s :)
15:19:00 <_nadya_> HP doesn't have the image as I understood
15:19:08 only rackspace
15:19:13 possibly that's the reason
15:19:22 <_nadya_> yep
15:19:26 _nadya_: yeah, there was push-back from the infra guys on that redundancy issue
15:20:10 _nadya_: at the infra meeting, when you brought it up ... pushback away from f20, towards trusty, right?
15:20:28 <_nadya_> I guess the only thing we may do is to wait
15:20:58 possibly after summit we'll have ubuntu14/f20 - both will be cool for mongo
15:21:02 * eglynn not entirely understanding the infra team's logic TBH
15:21:21 not on the image redundancy across HP & RAX clouds
15:21:39 ... more on the point that the TC decides distro policy for projects, infra just implements it
15:21:52 TC policy is ...
15:21:53 <_nadya_> eglynn: jobs may run on hp-cloud or rackspace. it's not determined as I understand (maybe wrong)
15:22:10 #link http://eavesdrop.openstack.org/meetings/tc/2013/tc.2013-01-08-20.02.html
15:22:26 _nadya_, yes, that's true
15:22:32 _nadya_: jobs need to be runnable on *both* clouds, or?
15:22:38 that's not determined
15:22:40 <_nadya_> eglynn: yep
15:23:00 eglynn - if there is no image on HP -> jobs going to it will fail
15:23:11 and be ok for rackspace
15:23:12 DinaBelova: yep, agreed
15:23:38 but on the TC policy point, pasting here for reference ...
15:23:51 "OpenStack will target its development efforts to latest Ubuntu/Fedora, but will not introduce any changes that would make it impossible to run on the latest Ubuntu LTS or latest RHEL."
15:24:13 infra interpretation is ...
15:24:18 "basic functionality really ought to be done in the context of one of the long-term distros"
15:25:19 sounds like a tension between target-to-latest and gate-on-long-term-distros
15:25:51 dhellmann: ... were you around on the TC when that distro policy ^^^ was agreed?
15:27:13 ... k, I'll take that discussion on the distro policy off-line
15:27:17 ... moving on
15:27:22 eglynn: no, I think that predates me
15:27:28 :)
15:27:49 we might have inherited that from the ppb
15:28:14 dhellmann: cool, I'll see if I can clarify with the infra folks at their next meeting
15:28:25 eglynn: good idea
15:29:05 * eglynn doesn't want to be caught in the mongo-less gate situation again, so having alternative gate distros with the latest versions is goodness IMO
15:29:10 #topic launchpad housekeeping
15:29:25 DinaBelova has identified a bunch of bugs & BPs in LP that need attention
15:29:35 (in terms of status reflecting reality)
15:29:44 #link https://etherpad.openstack.org/p/ceilometer-launchpad-cleaning
15:29:49 well, yes
15:30:04 maybe we could have some periodic rounds on LP
15:30:21 while surfing the launchpad it turned out that there are some things that should be fixed I guess
15:30:25 yeah if anyone wants to pitch in and help with the bureaucracy ... we could divide and conquer
15:30:36 ildikov, well, there are triage-days in some of the OS projects
15:30:47 maybe use DinaBelova's etherpad as a mutex?
15:30:50 DinaBelova: sounds good
15:31:01 DinaBelova: I thought something similar
15:31:26 i.e. if you're gonna attack a section, mark it on the etherpad so that your effort isn't duplicated?
15:31:29 well, so they are running 1-2 days a month if there is not much load
15:31:39 triage-days I mean
15:32:02 yeah, I'd be on for regular triage days in general
15:32:19 eglynn, DinaBelova: or having a round robin schedule for cores and whoever wants to join for checking LP periodically
15:32:29 eglynn: it was an issue earlier too
15:32:41 eglynn: and I think it will be a continuous issue TBH
15:32:45 ildikov, eglynn - it's up to you))
15:32:56 the solution I mean
15:32:56 :)
15:33:02 and other core team members, sure)
15:33:07 so I'm definitely on for triage-days or anything similar
15:33:21 DinaBelova: thanks for the etherpad BTW
15:33:31 my preference is to avoid too much heavy-weight scheduling of the core team's time
15:33:35 eglynn, please mark this as info - decision about triage days
15:33:40 ildikov, np
15:33:41 DinaBelova: in the long term it will not be effective, but it will be good for now as a heads-up for sure
15:34:04 ildikov, I guess yes
15:34:16 ... as everyone has chaotic demands on their schedules, hard to make round-robin scheduling stick
15:34:18 eglynn: sure, that's true also
15:34:20 * jd__ used to triage NEW once a week at least :(
15:34:30 for now the mess is huge :(
15:34:47 eglynn: it is a painful process anyway I think, no one likes administration...
15:35:00 jd__, well, I guess it'll be great - as now there is a traffic jam really))
15:35:07 jd__: right, I'll follow your lead and go with a once-a-week trawl
15:35:15 DinaBelova: i'll take a quick look through the list. thanks for building it.
15:35:22 gordc, np
15:35:42 ... and if anyone wants to also pitch in on a best-effort basis, that would be welcome also
15:36:02 gordc, the main problem here is that there is also a huge list of completely new bugs/BPs
15:36:11 and I did not mention them here
15:36:34 you can't really clean BP i think because you can't even delete it, they just rot
15:36:42 s/it/them/
15:37:01 well, at least we may set priority for the almost merged things
15:37:12 does launchpad have any advanced features to help this kind of work?
15:37:14 as there are lots of them here too
15:37:32 llu-laptop, don't think so
15:37:43 :(
15:38:21 what's the thought on moving to gerrit for blueprint review?
15:38:37 ... as was recently discussed for nova on the ML?
15:38:41 eglynn: like nova-specs?
15:38:50 +1
15:39:03 ... not a solution for existing BP cruft, but might prevent the accretion in the future
15:39:36 eglynn: +1 from me too
15:39:58 iiuc, those nova specs all have a blueprint, too, and the reviews are only used to track changes to the implementation plan but not the status of the blueprint's implementation or schedule
15:39:59 #action eglynn look into the mechanics of reviewing new BPs in gerrit
15:40:08 +1, this
15:40:13 ildikov, eglynn but please notice that that will also make the load on the core reviewers even higher
15:40:20 on LP the outdated ones can be set to an invalid state or something like this
15:40:20 ... as it's now
15:40:39 DinaBelova: sure, but at least the BPs will finally be reviewed
15:40:45 ildikov, sure
15:40:46 DinaBelova: yes, true, but the tradeoff is that code reviews should be easier, because we would have agreed to the design in advance
15:40:55 and not appearing on the LP without the need
15:41:03 dhellmann +1
15:41:14 DinaBelova: ... true enough, but we'd have to look at that extra upfront workload as an investment for the future
15:41:16 dhellmann: +1
15:41:53 k, we're up against the shot-clock here so better move on
15:41:56 b.t.w. will the approval of a ceilometer-spec patch be reflected on the launchpad blueprint?
15:42:31 llu-laptop: my understanding was that "approved" status would be gated on the gerrit review of the BP
15:42:48 llu-laptop: ... as opposed to just being set on an ad-hoc basis
15:43:16 eglynn: got that, please move on
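
(Aside on llu-laptop's question above about advanced Launchpad features: the web UI offers little for bulk triage, but the API is scriptable via the launchpadlib Python library. Below is a minimal sketch of a helper that lists untriaged bugs for the once-a-week trawl eglynn and jd__ agree on; the file name, output format, and the list_untriaged() helper are illustrative assumptions, not an agreed team tool.)

    # triage_helper.py - hypothetical helper; lists bugs still in the
    # "New" (untriaged) state for a Launchpad project.
    from launchpadlib.launchpad import Launchpad


    def list_untriaged(project_name="ceilometer"):
        # Anonymous login is enough for reading public bug data.
        lp = Launchpad.login_anonymously("triage-helper", "production")
        project = lp.projects[project_name]
        # searchTasks() filters server-side; "New" means not yet triaged.
        for task in project.searchTasks(status=["New"]):
            print("%-12s %s" % (task.status, task.title))


    if __name__ == "__main__":
        list_untriaged()

(Something like this could also feed DinaBelova's etherpad, so each weekly pass starts from a fresh list rather than a manual LP search.)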
15:43:23 #topic Tempest integration
15:43:31 _nadya_: anything new to report?
15:44:11 btw these patches are the following
15:44:12 <_nadya_> no valuable updates. Postgres doesn't work quickly enough either :(
15:44:19 #link https://review.openstack.org/#/q/status:open+project:openstack/tempest+owner:vrovachev,n,z
15:44:45 <_nadya_> DinaBelova: some became abandoned today or yesterday
15:44:55 DinaBelova: so those patches are still effectively blocked from landing by the sqla performance issues?
15:44:56 already restored)
15:45:02 eglynn, yes
15:45:13 we're blocked by the f20/ubuntu14
15:45:21 with mongo
15:45:31 that is working 30x faster
15:45:33 <_nadya_> we may move on to 'performance tests' :)
15:45:43 k, so no change then until we sort out the sqla issues and/or gate the longer-running tests on mongo
15:45:48 _nadya_, yes)
15:46:04 <_nadya_> eglynn: yep
15:46:04 eglynn, yes, it looks so
15:46:23 fair enough, so that dovetails nicely into the next topic
15:46:30 #topic Performance testing
15:46:41 ityaptin: the floor is yours, sir!
15:47:10 as you know, we started performance testing
15:47:54 #link https://docs.google.com/document/d/1ARpKiYW2WN94JloG0prNcLjMeom-ySVhe8fvjXG_uRU/edit?usp=sharing
15:47:57 we test mysql, mongo, hbase as standalone backends, and an hbase cluster on VMs
15:48:01 ityaptin: it was on the meeting agenda and discussed on the IRC channel yesterday
15:48:56 and I'd ask all core reviewers to take a look at this document
15:49:19 as currently we got feedback from you, eglynn, I guess and that's it
15:49:22 I mean from the core team
15:49:43 the test results show that mysql works more than 30 times slower than hbase or mongo
15:49:47 dhellmann, ildikov, jd__, gordc ^^
15:49:50 ityaptin: I think it will be very useful after having the revised SQL models, etc
15:50:08 DinaBelova, ityaptin: so I was also wondering if the test harness used to generate the load was up on github or somewhere similar?
15:50:14 does that include any tuning of mysql itself? or changes to our indexes?
15:50:29 ildikov, yes)
15:50:38 so we can compare how much the situation is better
15:50:53 <_nadya_> to evaluate the "new model" we need results for the old one. to compare
15:51:02 ityaptin: ... having the loadgen logic in the public domain would be really useful for anyone following up on your results
15:51:05 eglynn, not yet) but I can, if you want)
15:51:15 <_nadya_> dhellmann: no, no tuning
15:51:21 ildikov: excellent, that would be great!
15:51:23 ityaptin, I guess it'll be anyway better to share it
15:51:29 ityaptin: ^^^
15:51:47 darned tab completion of irc nicks!
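
(The load-generation harness referenced above was not yet public at this point. Purely to make the "time per message" metric concrete, here is a minimal sketch of that style of load generator; the sample fields and the record_sample callable are illustrative assumptions, not ityaptin's actual harness.)

    # loadgen_sketch.py - hypothetical; feeds synthetic samples to a
    # storage callable and reports rolling time-per-message.
    import time
    import uuid


    def generate_samples(count):
        # Dicts shaped loosely like ceilometer samples; the fields are
        # illustrative, not the real sample schema.
        for i in range(count):
            yield {
                "counter_name": "cpu_util",
                "counter_volume": float(i % 100),
                "resource_id": str(uuid.uuid4()),
                "timestamp": time.time(),
            }


    def run(record_sample, count=100000, report_every=10000):
        # record_sample is whatever write path is under test, e.g. a
        # storage driver method wrapped in a lambda.
        start = batch_start = time.time()
        for i, s in enumerate(generate_samples(count), 1):
            record_sample(s)
            if i % report_every == 0:
                now = time.time()
                print("%d msgs: %.3f ms/msg" %
                      (i, (now - batch_start) * 1000.0 / report_every))
                batch_start = now
        total = time.time() - start
        print("total %.1fs, %.3f ms/msg" % (total, total * 1000.0 / count))

(Running this against each backend's write path in turn, and comparing the per-batch numbers, is what surfaces patterns like the hbase saw-tooth discussed below.)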
15:51:57 okay, and one more time - core team - please take a look at the results
15:52:05 and please propose your own cases/ideas
15:52:18 eglynn: np ;)
15:52:20 as ityaptin is going to continue this work
15:52:21 _nadya_: it would be useful to have some profiling info about what part of the sql driver is so slow, to see if tuning helps
15:52:26 <_nadya_> so the main plea is "please tell us what results do you want to see"
15:52:30 mongo was really slow at one point, until we improved the indexes there
15:53:17 dhellmann, I'll try to find the bottleneck in mysql
15:53:22 <_nadya_> dhellmann: ok, will take that into account
15:53:23 DinaBelova: this is with a single worker?... i get the feeling sql will never work unless with multiple workers (it was consistently 10x slower for me)
15:53:46 gordc, yes a single one for now
15:54:09 gordc: there is a 3-collector setup
15:54:13 gordc, ityaptin is planning to try other variants too
15:54:17 dhellmann: did you mean SQLAlchemy as the driver there?
15:54:17 dhellmann, if we have some results, they will be shown
15:54:47 DinaBelova, ityaptin, _nadya_: any ideas on the cause of the saw-tooth pattern observed in time-per-message for hbase as the load is scaled up in the 3-collector case?
15:55:13 we had 2 ideas:
15:55:15 I wonder if there's a more efficient way to implement _create_or_update() in that sql driver
15:55:26 ... i.e. the semi-regular spikes in the time-per-message
15:55:46 1) it's the connection pool size in hbase and mongo - that turned out not to be true
15:55:50 the api response times are concerning. especially for only 100k samples... i was testing with 1 million and it was comparable/better than that on sql.
15:56:09 _create_or_update was changed not so long ago, or was it just planned to be changed?
15:56:13 we tested different pool sizes in hbase and all results show the same pattern
15:56:38 dhellmann: i'd hope to drop the _create_or_update logic... the update option is a bottleneck.
15:56:39 <_nadya_> eglynn: ityaptin is speaking about the 1-collector case
15:56:40 2) greenpool size
15:56:48 gordc: yeah
15:57:32 green pool size is not tested yet.
15:57:36 <_nadya_> eglynn: there is a peak there with 400 m/s. Regarding the 3-collectors case I don't know the answer yet
15:57:49 fyi, this is the etherpad i created for the ceilometer reschema session: https://etherpad.openstack.org/p/ceilometer-schema
15:57:54 _nadya_: k
15:58:25 <_nadya_> eglynn: actually we don't tune hbase yet. only schema optimization
15:58:39 ok, any questions here? as that's close to the end of the meeting
15:58:58 gordc: yeah re. API responsiveness, the "was 141.458 a typo or in seconds?" comment is revealing
15:59:23 ... that sounds unusable as an API
15:59:31 :(
15:59:38 eglynn, yeah...
15:59:39 eglynn: agreed.. especially for such a small set.
15:59:55 ... would cause an LB or haproxy to drop the connection long before the API call completes :(
16:00:17 right, we have a lot of work to do on performance
16:00:24 but we're outta time now
16:00:31 eglynn, +1
16:00:40 let's continue the discussion in ATL
16:00:53 thanks folks! ... let's close now
16:01:01 bye
16:01:01 #endmeeting ceilometer
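
(Follow-up on the _create_or_update() discussion above: the pattern in question is the classic select-then-insert "get or create". A simplified sketch of that shape, assuming a generic SQLAlchemy session and model - illustrative only, not the actual ceilometer driver code:)

    # Hypothetical sketch of the select-then-insert pattern flagged as a
    # bottleneck above; "model" and its fields are illustrative.
    def _create_or_update(session, model, key, **values):
        # Round trip 1: look the row up by primary key.
        obj = session.query(model).get(key)
        if obj is None:
            obj = model(id=key, **values)
            session.add(obj)
        else:
            # The UPDATE branch is the costly one: every sample for an
            # already-seen resource/meter rewrites the same hot row.
            for attr, val in values.items():
                setattr(obj, attr, val)
        # Round trip 2: emit the INSERT or UPDATE.
        session.flush()
        return obj

(Two round trips per sample plus row contention on hot resources would be consistent with the 10x-30x gaps reported above; plausible mitigations, such as caching already-seen keys in the collector or a dialect-native upsert like MySQL's INSERT ... ON DUPLICATE KEY UPDATE, were not discussed in this meeting and would need measurement.)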