15:00:50 <eglynn> #startmeeting ceilometer
15:00:50 <openstack> Meeting started Thu Sep 11 15:00:50 2014 UTC and is due to finish in 60 minutes.  The chair is eglynn. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:54 <openstack> The meeting name has been set to 'ceilometer'
15:01:10 <eglynn> hey folks, who is around for the ceilo weekly?
15:01:13 <nealph> o/
15:01:14 <nsaje> o/
15:01:16 <llu-laptop> o/
15:01:41 <eglynn> #topic Juno-3 status and FFEs
15:01:52 <cdent> o/
15:01:55 <nsaje> people taking a rest after j-3 :)
15:01:55 <eglynn> so we've made some progress on the FFEs :)
15:02:09 <sileht> o/
15:02:39 <eglynn> nealph's pass-event-format patch has landed with a bunch of +1s from the PaaS side of the house, nice work! :)
15:02:56 <nealph> eglynn: thx!
15:03:06 <idegtiarov> o/
15:03:29 <eglynn> the main resource renormalization patch for bigger-data-sql is safely landed
15:03:44 <eglynn> so the outstanding one there is the sql-a ORM->core switch
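(For context, the ORM->core switch mentioned here means moving hot-path queries off SQLAlchemy's ORM layer onto its Core expression language. A minimal sketch of the idea follows, using a made-up sample table rather than ceilometer's real schema, and SQLAlchemy <2.0 style to match the Juno-era library; it is an illustration, not the actual patch.)

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    engine = sa.create_engine("sqlite://")
    Base = declarative_base()

    class Sample(Base):
        __tablename__ = "sample"   # hypothetical table, not ceilometer's schema
        id = sa.Column(sa.Integer, primary_key=True)
        meter_id = sa.Column(sa.Integer)
        volume = sa.Column(sa.Float)

    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    # ORM style: every row is materialised as a full Sample object.
    session = Session()
    orm_rows = session.query(Sample).filter(Sample.meter_id == 1).all()

    # Core style: plain row tuples straight from the driver, skipping the
    # ORM's identity map and object construction -- which is the perf win.
    sample_t = Sample.__table__
    with engine.connect() as conn:
        core_rows = conn.execute(
            sa.select([sample_t]).where(sample_t.c.meter_id == 1)).fetchall()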
15:04:00 <cdent> +1
15:04:04 <gordc> o/
15:04:17 <nsaje> that's great news
15:04:24 <eglynn> cdent has been doing some detailed measurements of the performance benefits of the ORM switchout using rally
15:04:45 <gordc> thanks to all those who've helped test out the performance patches.
15:05:26 <eglynn> gordc: agreed :)
15:05:33 <cdent> the stack of variables involved in any performance test makes it hard to be absolutely sure about the details of the improvement, but it's definitely an improvement
15:06:10 <gordc> cdent: good to hear.
15:06:29 <eglynn> cdent, gordc: I'm thinking we should declare victory on it sooner rather than later as the window of opportunity to land this may start closing soon
15:06:45 <cdent> I think that's fair and safe.
15:06:48 <DinaBelova> o/
15:07:07 <gordc> eglynn: you mean the sqla-core patch or the tc gap requirement?
15:07:19 <eglynn> ... so I think I'll go ahead and approve, and we can close out quantifying the performance gain in parallel
15:07:30 <gordc> eglynn: gotcha... i think that's safe.
15:08:09 <eglynn> gordc: mainly the sqla-core patch in the short term
15:08:24 <eglynn> gordc: ... but in the longer term also the overall aggregated benefit of all the sql-a improvements made during Juno
15:08:34 <gordc> eglynn: makes sense to me.
15:08:45 <eglynn> (as we're likely to have to make an end-of-cycle progress report to the TC)
15:08:47 <eglynn> cool
15:09:06 <eglynn> the last outstanding FFE is ipmi-support
15:09:17 <llu-laptop> sorry to shout out here, the ipmi patches https://review.openstack.org/#/q/project:openstack/ceilometer+topic:bp/ipmi-support,n,z would appreciate more reviews
15:09:18 <eglynn> still under active review
15:09:51 <eglynn> llu-laptop: yep, it feels like it's getting close, gordc has added a bunch of feedback on the latter patch
15:10:14 <gordc> llu-laptop: i gave it a quick look... i'm not really familiar enough to approve it.
15:10:26 <nsaje> llu: I'll get to it after the meeting as well
15:10:36 <eglynn> cool, thanks folks
15:10:45 <llu-laptop> eglynn, gordc: thx. Sorry gzhai isn't around here now, will notify him tomorrow morning
15:10:56 <llu-laptop> nsaje: thanks for the review
15:11:02 <eglynn> llu-laptop: thank you!
15:11:25 <eglynn> so other than those 3 FFEs, we've also got a bunch of bugs still targeted at RC1
15:11:27 <eglynn> https://launchpad.net/ceilometer/+milestone/juno-rc1
15:12:02 <eglynn> I'm going to start bumping the non-critical ones that look like they won't make it
15:12:32 <llu-laptop> what's the status of bug https://bugs.launchpad.net/bugs/1337761?
15:12:35 <uvirtbot> Launchpad bug 1337761 in ceilometer "significant performance degradation when ceilometer middleware for swift proxy uses" [High,In progress]
15:12:42 <llu-laptop> it seems no patch for that?
15:12:45 <eglynn> so if you've any further fixes up your sleeves, please get them proposed soon :)
15:12:56 <eglynn> cdent: can you speak to that one? ^^^
15:13:30 <gordc> i just got a patch in... 30hr merge time
15:13:37 <eglynn> ouch!
15:13:49 <cdent> there was a patch in progress on that and it stalled out. nsaje and DinaBelova gave lots of good feedback that the author didn't appear to respond to
15:14:02 <gordc> i'm just happy i didn't need to recheck
15:14:05 <eglynn> cdent: ok, it doesn't look like it's likely to get traction by RC1
15:14:11 <cdent> no
15:14:19 <eglynn> sounds like a candidate for bumping, sadly
15:14:36 <eglynn> just to clarify the endgame for RC1 ...
15:14:50 <eglynn> I think I mis-spoke at this meeting last week
15:15:01 <cdent> I think one of the reasons it stalled is because several people said "we should totally change that and not even have it in the ceilometer package"
15:15:25 <eglynn> until RC1, everything just needs to be landed on master, no need for backporting
15:15:43 <eglynn> then the proposed/juno branch will be cut, based off the juno-rc1 tag
15:16:12 <eglynn> from RC1 on, any patches for RCn (n>1) will need to be landed on master first then backported to proposed/juno
15:16:26 <eglynn> meanwhile master is opened for kilo development
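(Concretely, the backport flow being described usually looks roughly like this; a sketch assuming the fix has already merged on master and that gerrit's proposed/juno branch exists, with placeholder branch and commit names:)

    git fetch origin
    git checkout -b my-juno-backport origin/proposed/juno
    git cherry-pick -x <sha-of-the-master-commit>   # -x records the original commit id
    git review proposed/juno                        # submit the backport against proposed/juno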
15:16:31 <nsaje> good approach!
15:16:59 <llu-laptop> so no closed window on the master branch?
15:17:08 <eglynn> so expect a lot of scrutiny from the rel mgr for any post-RC1 patches, definitely in our interest to get as much as possible landed for RC1
15:17:12 <cdent> is there a date for RC1 to be cut or is that at the whim of ttx
15:17:54 <nsaje> 25th of Sep if I'm not mistaken
15:18:04 <nsaje> https://wiki.openstack.org/wiki/Juno_Release_Schedule
15:18:13 <cdent> thanks
15:18:22 <ttx> cdent: no, it's when the RC1 buglist is empty
15:18:27 <eglynn> cdent: not a set date, it's decided by agreement with PTLs, likely to be EoW next week
15:18:43 <cdent> thanks ttx :)
15:18:49 <ttx> ideally in the second half of September, yes
15:19:10 <nsaje> likely not if there's a 30hr merge time :)
15:20:01 <eglynn> is it my imagination, or has that merge time been slowly reducing since the juno-3 gate rush?
15:20:16 <eglynn> ... me wonders if 30hrs was an outlier
15:20:38 <gordc> eglynn: you don't want to look at zuul
15:20:59 <eglynn> gordc: ... it generally makes my eyes bleed :(
15:21:17 * cdent loves to look at zuul
15:22:39 * eglynn hates the feeling of deflation when a nearly-complete verify is cancelled and all jobs for that patch are reset to "queued"
15:22:51 <eglynn> ... so close, and yet so far
15:23:40 <eglynn> OK, to sum up ... overall I think we're in relatively good shape on the FFE BPs, so we need to switch focus a bit to bugs over the coming week
15:23:47 <eglynn> shall we move on?
15:23:56 <eglynn> #topic TSDaaS/gnocchi status
15:23:56 <cdent> so to clarify: master is not kilo until rc>=2?
15:24:20 <eglynn> is the pasta-chef around, I wonder?
15:25:01 <eglynn> cdent: no, master == kilo from RC1 onwards
15:25:16 <cdent> thanks eglynn
15:25:29 <sileht> I will start the work around the ceilometer dispatcher
15:25:30 <gordc> eglynn: we have an internal meeting
15:25:45 <eglynn> cdent: there may not even be an RC2 if we don't need it (RCn with n>=2 is on-demand)
15:25:59 <eglynn> gordc: cool enough, let's punt gnocchi so
15:26:06 <eglynn> #topic Tempest status
15:26:42 <eglynn> this one may be short also, as DinaBelova is officially on vacation IIUC
15:26:58 <gordc> i think we're going to need to start re-enabling tests post-juno given the state of the gate.
15:27:22 <eglynn> gordc: do you mean the nova notification test that was skipped?
15:27:43 <eglynn> #link https://review.openstack.org/115212
15:27:59 <gordc> eglynn: yeah. seems like there's bigger issues... i'm not sure there's anything specifically wrong with our notification test though
15:28:13 <gordc> doesn't seem like it based on memory/cpu usage of ceilometer services
15:28:45 <eglynn> gordc: ... yeah, /me notes Dina's comment "all the gate slaves eating kind of 30-40% of RAM on them (unfortunately, ZooKeeper among them :))"
15:28:52 <gordc> #link http://lists.openstack.org/pipermail/openstack-dev/2014-September/045394.html
15:29:04 <gordc> ^^ memory usage issues
15:29:42 <eglynn> ... that ZK usage mentioned is orthogonal to our central agent scale out, right?
15:30:29 <cdent> the devstack stuff I'm working on right now is related to "larger issues" of processes not existing for random reasons, e.g. the notification agent being unavailable
15:30:30 <cdent> not sure if that is related here
15:30:42 <DinaBelova> eglynn, the issue is that ZK is installed on all the slaves and is doing nothing but eating RAM - in production it's ok, in the gate - not so much
15:30:46 <cdent> but there's a huge slew of changes related to memory and cpu usage in devstack queued up
15:31:08 <DinaBelova> that's not the central agent HA issue or any thing like that - that's infra issue
15:31:11 <gordc> i'm not sure how decisions are made to pull in dependencies... technically the central agent scale out isn't being enabled in gates
15:31:12 <cdent> the zk is apparently leakage from other images and is supposed to be fixed soonish
15:31:19 <DinaBelova> and ZK is not the only service there
15:31:27 <cdent> (according to discussion tuesday night)
15:31:37 <DinaBelova> gordc, ++
15:31:54 * DinaBelova found the laptop :)
15:32:01 <eglynn> DinaBelova: a-ha, so that ZK installation in the jenkins slaves would have pre-dated the ceilometer central agent's usage of ZK via tooz, right?
15:32:29 <eglynn> gordc: yeah that was my working assumption also
15:32:29 <nsaje> eglynn: central agent tests don't really use either tooz or ZK
15:32:36 <DinaBelova> eglynn, for the gate we can even use memcached
15:32:45 <DinaBelova> not such a resource-eating thing in the gate
15:32:48 <DinaBelova> nsaje ++
15:32:53 <DinaBelova> we're using tooz
15:32:59 <DinaBelova> not the ZK directly
15:33:06 <eglynn> DinaBelova, nsaje: cool, got it
15:33:20 <nsaje> DinaBelova: I'm guessing coordination_url is empty in devstack (as it's the default), no?
15:33:32 <DinaBelova> nsaje, I'm 100% sure yes
15:33:37 <jd__> o/
15:33:46 <DinaBelova> no coordination url set there for sure
15:33:49 <DinaBelova> jd__ ++
15:33:51 <DinaBelova> oops
15:33:52 <eglynn> nsaje: I guess the longer term goal would be to enable central agent scale out in the gate also?
15:33:55 <DinaBelova> I meant hello :)
15:34:03 <nsaje> eglynn: definitely
15:34:08 <eglynn> nsaje: cool
15:34:19 <DinaBelova> eglynn, of course - but with memcached at first I guess
15:34:23 <nsaje> eglynn: but we could easily use memcached
15:34:25 <gordc> eglynn: +1, be good to have it enabled and verified.
15:34:41 <eglynn> yeap, agreed
15:34:44 <gordc> nsaje: yeah memcached might be better... it works fine locally for me too.
15:34:50 <nsaje> as we won't be doing functional testing that simulates agent crashes etc. in the gate
15:34:59 <nsaje> so no worry about backend performance
15:35:07 <DinaBelova> gordc, ZK works good as well :) it's just not the gating thing :)
15:35:24 <eglynn> nsaje: still if we're recommending tooz/zk in production, that's what we should have running in the gate also, right?
15:35:25 <gordc> DinaBelova: agreed
15:35:47 <DinaBelova> eglynn, in that case I'd probably propose periodic jobs
15:35:48 <eglynn> (even if we don't explicitly test the scale in/out)
15:35:48 <jd__> shouldn't be a problem to have ZK in the gate, it's already installed at least
15:35:56 <nsaje> eglynn: ok, good point
15:35:58 <jd__> we use it for tooz testing
15:36:04 <DinaBelova> jd__, well... that's actually the problem
15:36:18 <DinaBelova> jd__ - did you see the RAM overusage thread?
15:36:30 <jd__> I saw it but didn't read it through
15:36:51 <eglynn> jd__: see Dina's comment on https://review.openstack.org/115212 ... "on all the gate slaves eating kind of 30-40% of RAM on them (unfortunately, ZooKeeper among them :))"
15:36:54 <DinaBelova> jd__, it turned out that ZK is one of the things installed on all the gate slaves that eats too much RAM
15:37:13 <jd__> too bad
15:37:16 <DinaBelova> there was a big noise around it last week
15:37:27 <DinaBelova> jd__ - well, this overusage is not only because of ZK
15:37:37 <DinaBelova> but it's among the 'bad' services
15:38:04 <eglynn> ok, so it sounds like the ZK issue is something we'll need to solve in order for the ceilo central agent to be run "realistically" in the gate
15:38:07 <DinaBelova> the problem is that currently it's installed on all the slaves whether we need it for testing or not
15:38:15 <DinaBelova> eglynn, yes, indeed
15:38:45 <DinaBelova> jd__, to learn more about that RAM overusage research you may ping infra folks
15:38:52 <cdent> there's some work in progress to narrow the number of workers that a lot of services are using which will make room for things like zk
15:38:59 <llu-laptop> DinaBelova: which openstack service actually uses ZK currently in the gate?
15:39:11 <DinaBelova> llu-laptop, it is installed to test tooz
15:39:37 <jd__> nova might be
15:39:53 <eglynn> service groups?
15:39:57 <DinaBelova> cdent, yes, I hope so as well
15:40:38 <jd__> eglynn: yes
15:41:40 <gordc> cdent: just fyi, i enabled multi workers for the collector to verify multi-worker functionality works... but kept it at half the # of cores
15:41:57 <eglynn> ok, so we're not going to solve the problem here & now ... but was there a sense in that discussion that ZK is gonna be removed from the mix by the infra folks?
15:42:02 * cdent nods at gordc
15:42:22 <cdent> eglynn: somebody was looking into it to "fix the leakage"
15:42:26 <cdent> so that it was only there where it was needed
15:42:29 <eglynn> ... i.e. if it's considered one of the 'bad' services
15:42:38 <eglynn> cdent: a-ha, ok
15:42:47 <DinaBelova> eglynn, the discussion was only just beginning when I saw it...
15:42:54 <DinaBelova> so I'm not sure about the result
15:43:12 <eglynn> right-o, all good info ... seems like the discussion/investigation is still active
15:43:13 <DinaBelova> I guess cdent is closer to the truth here than I
15:43:33 <cdent> I'm just reporting what I heard in os-qa, but there wasn't any timetable
15:43:46 <DinaBelova> eglynn, yeah... because nobody wants to remove an innocent package there :)
15:44:13 <eglynn> DinaBelova: yeah, that would be unjust :)
15:44:37 <DinaBelova> cdent, I was lurking in the infra channel (my irc client was pinging on words like ZK, etc.) but it was ~1am in my time zone at the time so I just left it
15:45:06 <cdent> Yeah, seems like all the action for those guys happens in the late or early hours
15:45:43 <eglynn> yeap, unfortunate timing from the European perspective
15:45:43 <cdent> since jd__ is back can we get a gnocchi update before we run out of time?
15:46:00 <eglynn> jd__: BTW we punted on the gnocchi topic earlier, do you want to circle back to that?
15:46:06 <eglynn> cdent: snap! :)
15:46:07 <cdent> jinx
15:46:12 <jd__> sure I don't have much to say though
15:46:28 <jd__> I've resumed my work on the archive policy today
15:46:37 <jd__> and I'm planning to continue and get this merged ASAP
15:46:41 <jd__> eglynn: I'll need your review on that
15:46:49 <jd__> I need to also continue to discuss with you about InfluxDB
15:46:51 <eglynn> jd__: I'll take another look at those patches before EoD today
15:47:12 <jd__> the collector dispatcher job should be handled by sileht, fingers crossed :)
15:47:36 <jd__> we also need to check some things about the result fetch part
15:47:46 <jd__> I think there's something wrong with how we merge data from different timeseries
15:47:51 <jd__> (we already discussed that but didn't make any change)
15:48:05 <jd__> and once that's done we should be pretty good towards having a Juno release
15:48:19 <jd__> (all of what I just said, I mean :)
15:48:20 <eglynn> sileht: can you sync up with ildikov on that dispatcher work?
15:49:52 <eglynn> jd__: "the result fetch part" is on the measurements get from multiple entities or?
15:50:12 <jd__> nope, it's about what I'd call aggregation
15:50:30 <jd__> it's about how, if we have results spanning multiple archives for an entity (with different granularities), the current code merges them
15:50:35 <jd__> and the results are kinda "weird" in the end
15:50:45 <eglynn> jd__: a-ha, got it, thanks
15:50:47 <jd__> there's no technical challenge, just a correct design to pick
15:51:02 <jd__> and change the code to reflect what we think is right
15:51:22 <jd__> I'll probably run a mail thread for that when I get to it
15:51:28 <DinaBelova> sileht, idegtiarov also wanted to work on the dispatcher - he mentioned that ~2 weeks ago at the meeting - he was on vacation last week, so the plan was to start work on it this week or next
15:51:30 <jd__> (when I don't know I ask :-)
15:51:53 <eglynn> jd__: yeah, a thread on that would be good, thanks!
15:53:25 <eglynn> DinaBelova: yep, so sileht, idegtiarov & ildikov will need to coordinate their efforts on that to avoid any duplication of effort
15:53:35 <DinaBelova> eglynn, yeah, +1
15:54:56 <eglynn> #topic open discussion
15:55:15 <eglynn> jd__ has nominated two new ceilometer cores in the past couple days :)
15:55:29 <eglynn> #link http://lists.openstack.org/pipermail/openstack-dev/2014-September/045537.html
15:55:35 <eglynn> #link http://lists.openstack.org/pipermail/openstack-dev/2014-September/045734.html
15:55:53 <vrovachev> applause :)
15:56:00 <eglynn> vrovachev: indeed :)
15:56:02 <cdent> yeah, +1 on both of those
15:56:17 <nsaje> thanks for the nomination jd__ !
15:56:20 <cdent> do I get a vote or am I only able to unofficially say "woot!"?
15:56:34 <eglynn> if you haven't done so yet, please consider chiming in on the ML by Monday next
15:57:01 <eglynn> cdent: all w00ts will be appreciated I'm sure :)
15:57:10 <cdent> excellent
15:58:35 <eglynn> anyone got anything else they'd like to chat about?
15:58:49 <cdent> nope
15:58:56 <eglynn> right-o, let's call this a wrap so
15:59:06 <eglynn> ... thanks all for a productive meeting! :)
15:59:10 <eglynn> #endmeeting ceilometer