15:00:50 <eglynn> #startmeeting ceilometer
15:00:50 <openstack> Meeting started Thu Sep 11 15:00:50 2014 UTC and is due to finish in 60 minutes. The chair is eglynn. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:54 <openstack> The meeting name has been set to 'ceilometer'
15:01:10 <eglynn> hey folks, who is around for the ceilo weekly?
15:01:13 <nealph> o/
15:01:14 <nsaje> o/
15:01:16 <llu-laptop> o/
15:01:41 <eglynn> #topic Juno-3 status and FFEs
15:01:52 <cdent> o/
15:01:55 <nsaje> people taking a rest after j-3 :)
15:01:55 <eglynn> so we've made some progress on the FFEs :)
15:02:09 <sileht> o/
15:02:39 <eglynn> nealph's pass-event-format has landed with a bunch of +1s from the PaaS side of the house, nice work! :)
15:02:56 <nealph> eglynn: thx!
15:03:06 <idegtiarov> o/
15:03:29 <eglynn> the main resource renormalization patch for bigger-data-sql is safely landed
15:03:44 <eglynn> so the outstanding one there is the sql-a ORM->core switch
15:04:00 <cdent> +1
15:04:04 <gordc> o/
15:04:17 <nsaje> that's great news
15:04:24 <eglynn> cdent has been doing some detailed measurements of the performance benefits of the ORM switchout using rally
15:04:45 <gordc> thanks to all those who've helped test out the performance patches.
15:05:26 <eglynn> gordc: agreed :)
15:05:33 <cdent> the stack of variables involved in any performance test makes it hard to be absolutely sure about the details of the improvement, but it's definitely an improvement
15:06:10 <gordc> cdent: good to hear.
15:06:29 <eglynn> cdent, gordc: I'm thinking we should declare victory on it sooner rather than later as the window of opportunity to land this may start closing soon
15:06:45 <cdent> I think that's fair and safe.
15:06:48 <DinaBelova> o/
15:07:07 <gordc> eglynn: you mean the sqla-core patch or the tc gap requirement?
15:07:19 <eglynn> ... so I think I'll go ahead and approve, and we can close out quantifying the performance gain in parallel
15:07:30 <gordc> eglynn: gotcha... i think that's safe.
15:08:09 <eglynn> gordc: mainly the sqla-core patch in the short term
15:08:24 <eglynn> gordc: ... but in the longer term also the overall aggregated benefit of all the sql-a improvements made during Juno
15:08:34 <gordc> eglynn: makes sense to me.
15:08:45 <eglynn> (as we're likely to have to make an end-of-cycle progress report to the TC)
15:08:47 <eglynn> cool
15:09:06 <eglynn> the last outstanding FFE is ipmi-support
15:09:17 <llu-laptop> sorry to shout out here, the ipmi patches https://review.openstack.org/#/q/project:openstack/ceilometer+topic:bp/ipmi-support,n,z will appreciate more reviews
15:09:18 <eglynn> still under active review
15:09:51 <eglynn> llu-laptop: yep, it feels like it's getting close, gordc has added a bunch of feedback on the latter patch
15:10:14 <gordc> llu-laptop: i gave it a quick look... i'm not really familiar enough to approve it.
15:10:26 <nsaje> llu: I'll get to it after the meeting as well
15:10:36 <eglynn> cool, thanks folks
15:10:45 <llu-laptop> eglynn, gordc: thx. Sorry gzhai isn't around here now, will notify him tomorrow morning
15:10:56 <llu-laptop> nsaje: thanks for the review
15:11:02 <eglynn> llu-laptop: thank you!
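
For readers following the sqla-core discussion above: the ORM-to-Core switch trades SQLAlchemy's per-object unit-of-work bookkeeping for direct Core statements on the write path, which is where the rally measurements are looking for gains. A minimal sketch of the two paths, using a hypothetical samples table rather than ceilometer's real schema:

```python
# Minimal sketch of the ORM -> Core trade-off discussed above; the
# "samples" table and columns are hypothetical stand-ins, not ceilometer's schema.
from sqlalchemy import Column, Float, Integer, MetaData, String, Table, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
samples = Table(
    "samples", metadata,
    Column("id", Integer, primary_key=True),
    Column("counter_name", String(255)),
    Column("volume", Float),
)
metadata.create_all(engine)

Base = declarative_base(metadata=metadata)

class Sample(Base):
    # ORM mapping over the same underlying table
    __table__ = samples

rows = [{"counter_name": "cpu_util", "volume": float(i)} for i in range(1000)]

# ORM write path: every row becomes a tracked object in the session's
# unit of work, so bulk inserts pay per-object bookkeeping costs.
session = Session(engine)
session.add_all([Sample(**row) for row in rows])
session.commit()

# Core write path: one executemany against the Table, no object tracking,
# which is where the measured speed-up is expected to come from.
with engine.begin() as conn:
    conn.execute(samples.insert(), rows)
```
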
15:11:25 <eglynn> so other than those 3 FFEs, we've also got a bunch of bugs still targeted at RC1
15:11:27 <eglynn> https://launchpad.net/ceilometer/+milestone/juno-rc1
15:12:02 <eglynn> I'm going to start bumping the non-critical ones that look like they won't make it
15:12:32 <llu-laptop> what's the status of bug https://bugs.launchpad.net/bugs/1337761?
15:12:35 <uvirtbot> Launchpad bug 1337761 in ceilometer "significant performance degradation when ceilometer middleware for swift proxy uses" [High,In progress]
15:12:42 <llu-laptop> it seems no patch for that?
15:12:45 <eglynn> so if you've any further fixes up your sleeves, please get them proposed soon :)
15:12:56 <eglynn> cdent: can you speak to that one? ^^^
15:13:30 <gordc> i just got a patch in... 30hr merge time
15:13:37 <eglynn> ouch!
15:13:49 <cdent> there was a patch in progress on that and it stalled out. nsaje and DinaBelova gave lots of good feedback that the author didn't appear to respond to
15:14:02 <gordc> i'm just happy i didn't need to recheck
15:14:05 <eglynn> cdent: ok, it doesn't look like it's likely to get traction by RC1
15:14:11 <cdent> no
15:14:19 <eglynn> sounds like a candidate for bumping, sadly
15:14:36 <eglynn> just to clarify the endgame for RC1 ...
15:14:50 <eglynn> I think I mis-spoke at this meeting last week
15:15:01 <cdent> I think one of the reasons it stalled is because several people said "we should totally change that and not even have it in the ceilometer package"
15:15:25 <eglynn> until RC1, everything just needs to be landed on master, no need for backporting
15:15:43 <eglynn> then the proposed/juno branch will be cut, based off the juno-rc1 tag
15:16:12 <eglynn> from RC1 on, any patches for RCn (n>1) will need to be landed on master first then backported to proposed/juno
15:16:26 <eglynn> meanwhile master is opened for kilo development
15:16:31 <nsaje> good approach!
15:16:59 <llu-laptop> so no close window on master branch?
15:17:08 <eglynn> so expect a lot of scrutiny from the rel mgr for any post-RC1 patches, definitely in our interest to get as much as possible landed for RC1
15:17:12 <cdent> is there a date for RC1 to be cut or is that at the whim of ttx
15:17:54 <nsaje> 25th of Sep if I'm not mistaken
15:18:04 <nsaje> https://wiki.openstack.org/wiki/Juno_Release_Schedule
15:18:13 <cdent> thanks
15:18:22 <ttx> cdent: no, it's when the RC1 buglist is empty
15:18:27 <eglynn> cdent: not a set date, it's decided by agreement with PTLs, likely to be EoW next week
15:18:43 <cdent> thanks ttx :)
15:18:49 <ttx> ideally in the second half of September, yes
15:19:10 <nsaje> likely not if there's a 30hr merge time :)
15:20:01 <eglynn> is it my imagination, or has that merge time been slowly reducing since the juno-3 gate rush?
15:20:16 <eglynn> ... me wonders if 30hrs was an outlier
15:20:38 <gordc> eglynn: you don't want to look at zuul
15:20:59 <eglynn> gordc: ... it generally makes my eyes bleed :(
15:21:17 * cdent loves to look at zuul
15:22:39 * eglynn hates the feeling of deflation when a nearly-complete verify is cancelled and all jobs for that patch are reset to "queued"
15:22:51 <eglynn> ... so close, and yet so far
15:23:40 <eglynn> OK, to sum up ... overall I think we're in relatively good shape on the FFE BPs, so we need to switch focus a bit to bugs over the coming week
15:23:47 <eglynn> shall we move on?
15:23:56 <eglynn> #topic TSDaaS/gnocchi status
15:23:56 <cdent> so to clarify: master is not kilo until rc>=2?
15:24:20 <eglynn> is the pasta-chef around, I wonder?
15:25:01 <eglynn> cdent: no, master == kilo from RC1 onwards
15:25:16 <cdent> thanks eglynn
15:25:29 <sileht> I will start the work around the ceilometer dispatcher
15:25:30 <gordc> eglynn: we have an internal meeting
15:25:45 <eglynn> cdent: there may not even be an RC2 if we don't need it (RCn with n>=2 is on-demand)
15:25:59 <eglynn> gordc: cool enough, let's punt gnocchi so
15:26:06 <eglynn> #topic Tempest status
15:26:42 <eglynn> this one may be short also, as DinaBelova is officially on vacation IIUC
15:26:58 <gordc> i think we're going to need to start re-enabling tests post-juno given the state of the gate.
15:27:22 <eglynn> gordc: do you mean the nova notification test that was skipped?
15:27:43 <eglynn> #link https://review.openstack.org/115212
15:27:59 <gordc> eglynn: yeah. seems like there's bigger issues... i'm not sure there's anything specifically wrong with our notification test though
15:28:13 <gordc> doesn't seem like it based on memory/cpu usage of ceilometer services
15:28:45 <eglynn> gordc: ... yeah, /me notes Dina's comment "all the gate slaves eating kind of 30-40% of RAM on them (unfortunately, ZooKeeper among them :))"
15:28:52 <gordc> #link http://lists.openstack.org/pipermail/openstack-dev/2014-September/045394.html
15:29:04 <gordc> ^^ memory usage issues
15:29:42 <eglynn> ... that ZK usage mentioned is orthogonal to our central agent scale out, right?
15:30:29 <cdent> the devstack stuff I'm working on right now is related to "larger issues" of processes not existing for random reasons, e.g. the notification agent being unavailable
15:30:30 <cdent> not sure if that is related here
15:30:42 <DinaBelova> eglynn, the issue is that ZK is installed on all the slaves, doing just nothing while eating RAM - in production it's ok, in the gate - not so much
15:30:46 <cdent> but there's a huge slew of changes related to memory and cpu usage in devstack queued up
15:31:08 <DinaBelova> that's not the central agent HA issue or anything like that - that's an infra issue
15:31:11 <gordc> i'm not sure how decisions are made to pull in dependencies... technically the central agent scale out isn't being enabled in gates
15:31:12 <cdent> the zk is apparently leakage from other images and is supposed to be fixed soonish
15:31:19 <DinaBelova> and ZK is not the only service there
15:31:27 <cdent> (according to discussion tuesday night)
15:31:37 <DinaBelova> gordc, ++
15:31:54 * DinaBelova found the laptop :)
15:32:01 <eglynn> DinaBelova: a-ha, so that ZK installation on the jenkins slaves would have pre-dated the ceilometer central agent's usage of ZK via tooz, right?
15:32:29 <eglynn> gordc: yeah that was my working assumption also
15:32:29 <nsaje> eglynn: the central agent tests don't really use tooz or ZK
15:32:36 <DinaBelova> eglynn, for the gate we can even use memcached
15:32:45 <DinaBelova> not such a resource-eating thing in the gate
15:32:48 <DinaBelova> nsaje ++
15:32:53 <DinaBelova> we're using tooz
15:32:59 <DinaBelova> not the ZK directly
15:33:06 <eglynn> DinaBelova, nsaje: cool, got it
15:33:20 <nsaje> DinaBelova: I'm guessing coordination_url is empty in devstack (as it's the default), no?
15:33:32 <DinaBelova> nsaje, I'm 100% sure yes
15:33:37 <jd__> o/
15:33:46 <DinaBelova> no coordination url set there for sure
15:33:49 <DinaBelova> jd__ ++
15:33:51 <DinaBelova> oops
15:33:52 <eglynn> nsaje: I guess the longer term goal would be to enable central agent scale out in the gate also?
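
Background on the coordination_url point above: the central agent coordinates through tooz, so ZooKeeper versus memcached is just a matter of the backend URL handed to tooz, and an empty coordination URL (the devstack default nsaje mentions) means no coordinator is started at all. A minimal tooz sketch, with illustrative URLs and group/member names that are not the gate's actual configuration:

```python
# Hedged sketch of tooz-based coordination as used by the central agent;
# backend URLs, group and member names below are illustrative only.
from tooz import coordination

# ZooKeeper backend (what production deployments would point at):
# backend_url = "zookeeper://127.0.0.1:2181"
# memcached backend (the lighter-weight option suggested for the gate):
backend_url = "memcached://127.0.0.1:11211"

coordinator = coordination.get_coordinator(backend_url, b"central-agent-1")
coordinator.start()

group = b"ceilometer-central-agents"
try:
    coordinator.create_group(group).get()
except coordination.GroupAlreadyExist:
    pass  # another agent created the group first
coordinator.join_group(group).get()

# ... polling tasks would be partitioned across group members here ...

coordinator.leave_group(group).get()
coordinator.stop()
```
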
15:33:55 <DinaBelova> I meant hello :)
15:34:03 <nsaje> eglynn: definitely
15:34:08 <eglynn> nsaje: cool
15:34:19 <DinaBelova> eglynn, of course - but with memcached for the first time I guess
15:34:23 <nsaje> eglynn: but we could easily use memcached
15:34:25 <gordc> eglynn: +1, be good to have it enabled and verified.
15:34:41 <eglynn> yeap, agreed
15:34:44 <gordc> nsaje: yeah memcached might be better... it works fine locally for me too.
15:34:50 <nsaje> as we won't be doing functional testing with simulated agent crashes etc. in the gate
15:34:59 <nsaje> so no worry about backend performance
15:35:07 <DinaBelova> gordc, ZK works well too :) it's just not a gating thing :)
15:35:24 <eglynn> nsaje: still, if we're recommending tooz/zk in production, that's what we should have running in the gate also, right?
15:35:25 <gordc> DinaBelova: agreed
15:35:47 <DinaBelova> eglynn, in that case I'd propose periodic jobs probably
15:35:48 <eglynn> (even if we don't explicitly test the scale in/out)
15:35:48 <jd__> shouldn't be a problem to have ZK in the gate, it's already installed at least
15:35:56 <nsaje> eglynn: ok, good point
15:35:58 <jd__> we use it for tooz testing
15:36:04 <DinaBelova> jd__, well... that's actually the problem
15:36:18 <DinaBelova> jd__ - did you see the RAM overusage thread?
15:36:30 <jd__> I saw it but didn't read it through
15:36:51 <eglynn> jd__: see Dina's comment on https://review.openstack.org/115212 ... "all the gate slaves eating kind of 30-40% of RAM on them (unfortunately, ZooKeeper among them :))"
15:36:54 <DinaBelova> jd__, it turned out that ZK is one of the things installed on all the gate slaves that eats too much RAM
15:37:13 <jd__> too bad
15:37:16 <DinaBelova> there was a big noise around it last week
15:37:27 <DinaBelova> jd__ - well, this overusage is not only because of ZK
15:37:37 <DinaBelova> but it's among the 'bad' services
15:38:04 <eglynn> ok, so it sounds like the ZK issue is something we'll need to solve in order for the ceilo central agent to be run "realistically" in the gate
15:38:07 <DinaBelova> the problem is that currently it's installed on all the slaves no matter whether we need it for the testing or not
15:38:15 <DinaBelova> eglynn, yes, indeed
15:38:45 <DinaBelova> jd__, to learn more about that RAM overusage research you may ping the infra folks
15:38:52 <cdent> there's some work in progress to narrow the number of workers that a lot of services are using, which will make room for things like zk
15:38:59 <llu-laptop> DinaBelova: which openstack service actually uses ZK currently in the gate?
15:39:11 <DinaBelova> llu-laptop, it is installed to test tooz
15:39:37 <jd__> nova might be
15:39:53 <eglynn> service groups?
15:39:57 <DinaBelova> cdent, yes, I hope so as well
15:40:38 <jd__> eglynn: yes
15:41:40 <gordc> cdent: just fyi, i enabled multiple workers for the collector to verify multi-worker functionality works... but kept it at half the # of cores
15:41:57 <eglynn> ok, so we're not going to solve the problem here & now ... but was there a sense in that discussion that ZK is gonna be removed from the mix by the infra folks?
15:42:02 * cdent nods at gordc
15:42:22 <cdent> eglynn: somebody was looking into it to "fix the leakage"
15:42:26 <cdent> so that it was only there where it was needed
15:42:29 <eglynn> ... i.e. if it's considered one of the 'bad' services
15:42:38 <eglynn> cdent: a-ha, ok
15:42:47 <DinaBelova> eglynn, it was the beginning of this discussion when I saw it...
15:42:54 <DinaBelova> so I'm not sure about the result
15:43:12 <eglynn> right-o, all good info ... seems like the discussion/investigation is still active
15:43:13 <DinaBelova> I guess cdent is closer to the truth here than I am
15:43:33 <cdent> I'm just reporting what I heard in os-qa, but there wasn't any timetable
15:43:46 <DinaBelova> eglynn, yeah... because nobody wants to remove an innocent package there :)
15:44:13 <eglynn> DinaBelova: yeah, that would be unjust :)
15:44:37 <DinaBelova> cdent, I was lurking in the infra channel (my irc client was ringing about words like ZK, etc.) but it was ~1am in my time zone at the time so I just left it
15:45:06 <cdent> Yeah, seems like all the action for those guys is in the late or early hours
15:45:43 <eglynn> yeap, unfortunate timing from the European perspective
15:45:50 <cdent> since jd__ is back can we get a gnocchi update before we run out of time?
15:46:00 <eglynn> jd__: BTW we punted on the gnocchi topic earlier, do you want to circle back to that?
15:46:06 <eglynn> cdent: snap! :)
15:46:07 <cdent> jinx
15:46:12 <jd__> sure, I don't have much to say though
15:46:28 <jd__> I've resumed my work on the archive policy today
15:46:37 <jd__> and I'm planning to continue and get this merged ASAP
15:46:41 <jd__> eglynn: I'll need your review on that
15:46:49 <jd__> I also need to continue the discussion with you about InfluxDB
15:46:51 <eglynn> jd__: I'll take another look at those patches before EoD today
15:47:12 <jd__> the collector dispatcher job should be handled by sileht, fingers crossed :)
15:47:36 <jd__> we also need to check some things about the result fetch part
15:47:46 <jd__> I think there's something wrong with how we merge data from different timeseries
15:47:51 <jd__> (we already discussed that but didn't make any change)
15:48:05 <jd__> and once that's done we should be pretty good towards having a Juno release
15:48:19 <jd__> (all of what I just said, I mean :)
15:48:20 <eglynn> sileht: can you sync up with ildikov on that dispatcher work
15:48:22 <eglynn> ?
15:49:52 <eglynn> jd__: the "result fetch part" is the measurements get from multiple entities, or?
15:50:12 <jd__> nop, it's about what I'd call aggregation
15:50:30 <jd__> it's about how, if we have results spanned over multiple archives for an entity (with different granularity), the current code merges them
15:50:35 <jd__> and the results are kinda "weird" in the end
15:50:45 <eglynn> jd__: a-ha, got it, thanks
15:50:47 <jd__> there's no technical challenge, just a correct design to pick
15:51:02 <jd__> and change the code to reflect what we think is right
15:51:22 <jd__> I'll probably run a mail thread for that when I get to it
15:51:28 <DinaBelova> sileht, also idegtiarov wanted to work on the dispatcher - he mentioned that ~2 weeks ago at the meeting - he was on vacation last week, so the plan was to start work on it this week or next
15:51:30 <jd__> (when I don't know, I ask :-)
15:51:53 <eglynn> jd__: yeah, a thread on that would be good, thanks!
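
To illustrate the merge problem jd__ describes: if one archive for an entity holds 60-second means and another holds 300-second means, naively concatenating the two series puts points with different meanings onto one axis. A toy sketch of the effect (plain Python for illustration only, not gnocchi code):

```python
# Toy illustration of merging two archives for the same entity that were
# stored at different granularities; values and timestamps are made up.
fine = [          # 60-second means for the last 5 minutes: (timestamp, mean)
    (0, 10.0), (60, 12.0), (120, 11.0), (180, 15.0), (240, 14.0),
]
coarse = [        # 300-second means covering the same window and earlier
    (-300, 9.0), (0, 12.4),
]

# Naive merge: sort every point together regardless of granularity.
merged = sorted(fine + coarse)
print(merged)
# [(-300, 9.0), (0, 10.0), (0, 12.4), (60, 12.0), (120, 11.0), (180, 15.0), (240, 14.0)]
# The two points at timestamp 0 describe different spans (a 60s mean vs a
# 300s mean), so a consumer cannot interpret the merged series without also
# knowing each point's granularity -- the design question to settle before
# changing the fetch/merge code.
```
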
15:53:25 <eglynn> DinaBelova: yep, so sileht, idegtiarov & ildikov will need to coordinate their efforts on that to avoid any duplication of effort
15:53:35 <DinaBelova> eglynn, yeah, +1
15:54:56 <eglynn> #topic open discussion
15:55:15 <eglynn> jd__ has nominated two new ceilometer cores in the past couple of days :)
15:55:29 <eglynn> #link http://lists.openstack.org/pipermail/openstack-dev/2014-September/045537.html
15:55:35 <eglynn> #link http://lists.openstack.org/pipermail/openstack-dev/2014-September/045734.html
15:55:53 <vrovachev> applause :)
15:56:00 <eglynn> vrovachev: indeed :)
15:56:02 <cdent> yeah, +1 on both of those
15:56:17 <nsaje> thanks for the nomination jd__ !
15:56:20 <cdent> do I get a vote or am I only able to unofficially say "woot!"?
15:56:34 <eglynn> if you haven't done so yet, please consider chiming in on the ML by Monday next
15:57:01 <eglynn> cdent: all w00ts will be appreciated I'm sure :)
15:57:10 <cdent> excellent
15:58:35 <eglynn> anyone got anything else they'd like to chat about?
15:58:49 <cdent> nope
15:58:56 <eglynn> right-o, let's call this a wrap so
15:59:06 <eglynn> ... thanks all for a productive meeting! :)
15:59:10 <eglynn> #endmeeting ceilometer