15:02:27 <rhochmuth> #startmeeting monasca
15:02:28 <openstack> Meeting started Wed Mar 9 15:02:27 2016 UTC and is due to finish in 60 minutes. The chair is rhochmuth. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:31 <openstack> The meeting name has been set to 'monasca'
15:02:35 <rhochmuth> o/
15:02:43 <tgraichen> o/
15:02:43 <slogan_r_> morning
15:02:43 <ho_away> hi
15:02:44 <bklei> o/
15:02:48 <rhochmuth> running a little late this morning
15:02:50 <shinya_kwbt> o/
15:02:50 <rhochmuth> sorry
15:03:10 <slogan_r_> it's 7 am here, late is good :-)
15:03:55 <rhochmuth> #topic summit
15:04:18 <rhochmuth> meant to mention that the agenda is posted at https://etherpad.openstack.org/p/monasca-team-meeting-agenda
15:04:26 <rhochmuth> Agenda for Wednesday March 9, 2016 (15:00 UTC)
15:04:26 <rhochmuth> 1. Austin Summit Monasca Sessions
15:04:26 <rhochmuth> 2. Monasca Agent discussion from previous week with SAP regarding overriding dimensions
15:04:26 <rhochmuth> 3. Stale alarms for metrics that don't exist anymore (deleted vms) -- how should we address?
15:04:26 <rhochmuth> 4. Multiple metrics per http request on statistics and measurements resource.
15:04:27 <rhochmuth> 1. See review at https://review.openstack.org/#/c/289675/
15:04:27 <rhochmuth> 5. Brief status update for Anomaly Detection
15:04:46 <rhochmuth> So, it looks like we had a decent number of Monasca related sessions
15:04:48 <rhochmuth> accepted
15:04:56 <rhochmuth> Thanks everyone
15:05:22 <jobrs> hi
15:05:28 <rhochmuth> hi jobrs
15:05:43 <jobrs> here for topic 2.
15:05:50 <rhochmuth> i've put in for four design summit sessions at the summit
15:06:08 <rhochmuth> assuming that occurs, we should be able to have some discussions there
15:06:37 <ho_away> nice!
15:06:56 <rhochmuth> so, i'll keep you posted on what i get, and then we can adjust the agenda as we get closer
15:07:45 <rhochmuth> the summary is overall planning/status, discussion on new features and performance, logging api/implementation, and monasca/networking/broadview
15:08:05 <rhochmuth> if we run over, then we can find open spots to discuss more
15:08:16 <rhochmuth> does that sound reasonable?
15:08:32 <bklei> sounds good to me
15:08:58 <slogan_r_> sounds good here
15:09:05 <ho_away> if possible, i would like to add anomaly detection
15:09:18 <ho_away> in new features?
15:09:30 <rhochmuth> ho_away: if you'll be attending then i'll request another spot for that topic
15:09:41 <rhochmuth> unfortunately, the folks from bristol won't be there
15:09:54 <rhochmuth> so, i was worried about general attendance
15:10:14 <rhochmuth> but i'll request another spot, and we can discuss that topic in detail with whoever attends
15:10:20 <ho_away> i know, i didn't get permission for it yet but i will go there.
15:10:37 <rhochmuth> i'll definitely be up for a discussion and planning on that topic
15:10:58 <ho_away> after i get permission i will let you know thanks!
15:11:03 <rhochmuth> ok, thanks
15:11:14 <rhochmuth> #topic agent
15:11:30 <rhochmuth> jobrs: this is a carry over from last week
15:11:36 <rhochmuth> i guess we broke you
15:11:44 <jobrs> yep
15:11:53 <rhochmuth> so, do you have a proposal
15:12:09 <jobrs> I shared some links last time, they are in the logs
15:12:19 <rhochmuth> we could add a parameter to adjust the behaviour
15:12:29 <rhochmuth> keep the old or the new
15:12:41 <rhochmuth> based on a parameter
15:12:48 <rhochmuth> would that be acceptable?
15:12:59 <jobrs> I would be happy already if we would have a shared view on what the 'service' dimension is good for
15:13:14 <jobrs> a) openstack service
15:13:19 <jobrs> b) technical service
15:13:50 <jobrs> a) means that plugins for generic components/services do not know the answer
15:14:17 <jobrs> b) means that plugins for generic components set it themselves
15:14:33 <jobrs> but b) also means that there is no standard dimension for openstack services as registered in the ks catalog
15:14:53 <rhochmuth> This is what we've been trying to do
15:14:57 <jobrs> b) also means that we have quite some redundancy between component and service - what is the difference at all?
15:15:24 <jobrs> to me the previous behavior was more consistent
15:15:44 <jobrs> have 'user' set 'service' if it is a generic component
15:16:04 <jobrs> and not do something like component='mysql', service='mysql',process='mysqld'
15:16:06 <rhochmuth> service = compute, networking, …, when the entity being monitored corresponds to a specific openstack component, such as nova-api
15:16:16 <jobrs> exactly
15:16:23 <rhochmuth> not done yet
15:16:40 <jobrs> it is for monasca at least
15:17:00 <rhochmuth> then service = mysql, rabbitmq, …, if it is a "shared" service, unless the shared component isn't really being shared
15:17:23 <rhochmuth> it is often the case that mysql and rabbitmq are shared across many openstack services
15:17:38 <rhochmuth> but it can be the case that it is deployed 1-to-1 with a service
15:17:39 <jobrs> agreed, and in that case I believe it is fair that the one configuring the agent is taking care to set the --service parameter on monasca-setup
15:18:22 <rhochmuth> So, component should correspond to the specific "process" that is being monitored
15:18:28 <jobrs> in any case the plugin cannot possibly know
15:18:29 <rhochmuth> so, for Nova
15:18:39 <rhochmuth> service=compute, component = nova-api, ...
15:18:49 <rhochmuth> For "mysql"
15:18:52 <rhochmuth> service = mysql
15:18:56 <rhochmuth> component = mysqld
15:19:14 <rhochmuth> In some cases, the values are the same
15:19:21 <jobrs> there is a process dimension
15:19:56 <jobrs> the plugin cannot know for what service mysql is used
15:20:11 <jobrs> same with apache
15:20:36 <jobrs> etc., so plugins should IMHO set dimensions sparingly
15:20:45 <rhochmuth> But, it can be overridden
15:20:46 <jobrs> or at least not override 'service'
15:20:57 <jobrs> no, it cannot - no longer
15:21:17 <jobrs> the order was reversed
15:21:32 <rhochmuth> got it
15:21:44 <rhochmuth> i think we just coded to what worked for our deployment
15:22:03 <jobrs> same with us :-)
15:22:22 <rhochmuth> we ran into a problem because there were plugins that were not setting the service dimension
15:22:39 <rhochmuth> this created problems in the ui
15:23:01 <rhochmuth> so, in that case what we wanted was to always supply a service dimension = "uncategorized"
15:23:01 <jobrs> sure, but this can be fixed when configuring the agent, no?
15:23:30 <bklei> i'm concerned about automatically changing the default dimensions -- if we do that i'd prefer the change be config file driven so old behavior continues to work
15:23:49 <jobrs> maybe this belongs to the UI layer? not sure
15:23:56 <bklei> we've run into issues with dimension changes and bloat
15:24:30 <jobrs> us too, that is a big issue in my opinion
15:24:32 <rhochmuth> so, my proposal is to restore the old behaviour, and then create an option to enable the new behaviour
15:24:52 <bklei> +1
15:24:53 <rhochmuth> i'm trying to get rbrandt
15:25:15 <rhochmuth> he's not around
15:25:16 <jobrs> adds complexity
15:25:24 <Christian____> there is an "old" bug: A metrics graph becomes not to appear when adding/deleting a dimension https://bugs.launchpad.net/monasca/+bug/1485859
15:25:24 <openstack> Launchpad bug 1485859 in Monasca "A metrics graph becomes not to appear when adding/deleting a dimension." [Undecided,Triaged]
15:25:51 <Christian____> that means changing dimensions is not a good idea...
15:26:18 <jobrs> that is what I meant with big issue. you cannot force people not to add dimensions, that is what they are good for
15:26:46 <rhochmuth> what we wanted to have happen was to always have a dimension of service=uncategorized
15:26:57 <rhochmuth> if one wasn't supplied
15:27:27 <jobrs> to me this looks like a presentation-layer problem
15:27:29 <rhochmuth> i'm not exactly sure at this point, why that ended up modifying the default behaviour
15:28:01 <rhochmuth> so, i'll check with rbrandt and come up with a proposal to fix
15:28:10 <rhochmuth> if that sounds ok
15:29:17 <rhochmuth> it isn't a presentation layer problem, btw
15:29:33 <rhochmuth> the problem occurs when searching for metrics and alarms
15:29:51 <rhochmuth> we don't support being able to get metrics and alarms that don't have a supplied dimension
15:29:51 <jobrs> I was just talking of the specific case of the service dimension
15:30:05 <bklei> agreed -- if this is the issue rbak found -- you end up with metrics you can't query for without merging
15:30:13 <jobrs> other than that I believe it is a bigger issue which will not be solved by the default service domain value
15:30:37 <jobrs> +1
15:30:38 <rhochmuth> so, you can't search for the absence of a dimension
15:30:48 <rhochmuth> today
15:30:55 <rhochmuth> and there is no way to do that in some databases
15:30:58 <rhochmuth> like influxdb
15:31:38 <rhochmuth> so, we wanted to supply a default dimension = uncategorized everywhere
15:31:53 <rhochmuth> but that broke the original behaviour
15:31:55 <jobrs> but it is not just "service"
15:32:09 <rhochmuth> yes, it applies to any dimension
15:32:36 <jobrs> so this is what I do not like about the option, it does not really fix the problem (for us)
15:32:51 <rhochmuth> but from a ui perspective we usually only group by hostname, service,
15:33:00 <rhochmuth> why?
15:33:29 <rhochmuth> why doesn't it work?
15:33:41 <jobrs> we have other dimensions
15:34:09 <jobrs> e.g. in kubernetes: namespace, ressource_controller, ...
15:34:37 <jobrs> the great thing about dimensions is that you can have your own ones
15:35:25 <Christian____> Roland: Is the number of dimensions fixed (and should not be changed)? All not used dimensions will have default value "uncategorized"?
15:37:38 <rhochmuth> Besides going back to the old behaviour, which doesn't fix the problem I'm trying to address, is there a specific proposal that we can implement
15:37:55 <rhochmuth> We have a problem
15:38:10 <rhochmuth> I'm just looking for a specific way to resolve at this point that is implementable
15:38:22 <jobrs> sure
15:38:47 <jobrs> unfortunately I am not an influxdb expert
15:38:48 <tgraichen> but in your case, can't we just add service=uncategorized in the monasca agent in case the service is not set from the plugin and not via agent config?
15:39:33 <rhochmuth> ok, i'll look into that
15:39:53 <rhochmuth> i don't have an answer right now
15:39:54 <slogan_r_> that would imply the db defaults to that value when items added or removed, correct? Is that the case now?
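A minimal sketch of the dimension precedence discussed in the agent topic. It is not taken from the monasca-agent code: `resolve_dimensions` and its `plugin_wins` flag are hypothetical names. The flag stands in for the proposed option to choose between the old order (agent configuration overrides the plugin) and the new order (plugin overrides the agent configuration), as the discussion describes them. The `service=uncategorized` fallback is only applied when neither side supplies a service, following tgraichen's suggestion.

```python
from typing import Dict, Optional

# Fallback value named in the discussion.
DEFAULT_SERVICE = "uncategorized"


def resolve_dimensions(
    plugin_dims: Optional[Dict[str, str]],
    agent_dims: Optional[Dict[str, str]],
    plugin_wins: bool = False,
) -> Dict[str, str]:
    """Merge plugin and agent-config dimensions (hypothetical helper).

    plugin_wins=False: agent configuration overrides plugin values.
    plugin_wins=True:  plugin values override agent configuration.
    """
    plugin_dims = plugin_dims or {}
    agent_dims = agent_dims or {}
    if plugin_wins:
        merged = {**agent_dims, **plugin_dims}
    else:
        merged = {**plugin_dims, **agent_dims}
    # Only fill in a service when neither source provided one.
    merged.setdefault("service", DEFAULT_SERVICE)
    return merged


# A generic mysql plugin on a host whose agent was set up with service=compute.
print(resolve_dimensions({"component": "mysqld", "service": "mysql"},
                         {"service": "compute"}))
```

With the default flag the operator's `service` value survives; a plugin that sets no service and an agent without one end up with `service=uncategorized`, so the metric can still be found by a service filter.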
15:39:56 <jobrs> the global configuration is overridden by the plugins
15:40:05 <rhochmuth> but, if it is possible, we'll try to do that
15:40:15 <jobrs> my proposal would be to make merge-metrics a default
15:40:20 <slogan_r_> er, added or updated
15:40:40 <rhochmuth> ok, i like that idea
15:41:00 <jobrs> uh, ...my default
15:41:05 <rhochmuth> not sure why we didn't do that, but i'll investigate and try to get back to that
15:41:48 <jobrs> so you tell the API which dimensions should be expanded
15:41:58 <jobrs> and the remaining ones are merged
15:42:05 <jobrs> that gives some stability
15:42:51 <jobrs> rbak's grafana plugin is exposing this behavior to the user, so it is possible it seems
15:43:31 <jobrs> if the api would support it, too, then the db-driver could do optimizations to reduce the number of queries (mid-term)
15:46:05 <rhochmuth> ok, i'll look at the code and work with rbrandt and see what we come up with
15:46:09 <rhochmuth> sound good?
15:46:51 <jobrs> sounds great
15:46:55 <jobrs> thank you
15:47:06 <rhochmuth> ok, thanks
15:47:11 <rhochmuth> switching topics
15:47:19 <rhochmuth> #topic stale alarms
15:47:28 <bklei> that's me
15:47:34 <rhochmuth> sorry time's up
15:47:38 <rhochmuth> just kidding
15:47:41 <bklei> anyone else encounter this? :)
15:47:44 <ho_away> lol
15:47:47 <rhochmuth> yes,
15:47:48 <bklei> so here's the scenario
15:48:08 <bklei> the overview page in horizon, when tracking vms with alarm defs
15:48:08 <rhochmuth> we, this is one of the top issues we've been looking at
15:48:17 <bklei> goes gray after a vm goes away
15:48:25 <bklei> and requires a manual alarm delete step
15:48:48 <bklei> so we could have a prune process specific to this, but i wonder if there's a better idea/solution
15:48:54 <bklei> to handle stale stuff in the UI
15:49:01 <rhochmuth> there is a better way
15:49:09 <bklei> good, i'm all ears
15:49:27 <rhochmuth> but first let me say that we are solving this in our helion distribution using a script and cron job
15:49:36 <rhochmuth> that is a short-term solution
15:49:48 <rhochmuth> because the better solution is harder
15:49:50 <bklei> right, and we could do the same here -- we'd welcome you sharing ^^
15:50:07 <rhochmuth> i'll try and get that script open-sourced somewhere
15:50:26 <bklei> gracias, and there's discussion about a more elegant solution?
15:50:27 <rhochmuth> the way we were intending was to use the Events API
15:50:34 <bklei> maybe discussion in austin?
15:50:46 <rhochmuth> Sure, we can discuss
15:50:48 <bklei> aah, so lifecycle trigger
15:50:50 <bklei> i like that
15:50:53 <rhochmuth> Correct
15:51:02 <rhochmuth> the Events API would receive all VM lifecycle events
15:51:36 <bklei> that's clean -- so we can do the cron/script as a workaround till then
15:51:40 <rhochmuth> and the events engine would have a handler associated with it to delete the alarm, when the VM end event occurs
15:52:06 <rhochmuth> The only alternative right now is a cron/script
15:52:26 <rhochmuth> the script actually invokes the nova api to determine the VMs that have been deleted
15:52:36 <rhochmuth> so it is not purely time based
15:52:38 <slogan_r_> that would work
15:52:40 <bklei> nice
15:53:06 <rhochmuth> the problem with the events api/engine is that it is not getting any development time right now
15:53:15 <rhochmuth> so, it could be a long wait
15:53:25 <rhochmuth> for that to be delivered
15:53:28 <slogan_r_> so, I presume these events events are defined by nova and they are publishing to OSLO?
15:53:34 <rhochmuth> correct
15:53:40 <slogan_r_> s/events events/events/
15:53:48 <rhochmuth> openstack notification, VM lifecycle events
15:53:56 <rhochmuth> there is a wiki that describes them all
15:54:02 <rhochmuth> but i don't have the link
15:54:06 <rhochmuth> right now
15:55:10 <rhochmuth> so, is that enough of a discussion
15:55:12 <rhochmuth> on that topic
15:55:14 <bklei> thx
15:55:23 <rhochmuth> we are running out of time
15:55:34 <rhochmuth> #topic multiple metrics
15:55:39 <rhochmuth> So, i posted some code
15:55:42 <rhochmuth> it isn't complete
15:55:58 <rhochmuth> if bklei and rbak could look at it that would be good
15:56:26 <rhochmuth> what i would like to do is add a query parameter, "multiple_metrics" or something similar
15:56:37 <bklei> yeah, looks good to me if we add the parm to differentiate behavior
15:56:45 <rhochmuth> to enable returning multiple metrics in a single measurements or statistics resource
15:56:53 <bklei> we'll definitely be using that
15:56:57 <rbak> I'll take a look as soon as the meeting is done.
15:56:59 <rhochmuth> this will probably improve your overall query performance 10X at least
15:57:01 <rhochmuth> if not more
15:57:08 <rhochmuth> i'm actually hoping for 100X
15:57:30 <rhochmuth> i don't think this is possible for influxdb using an in-database query
15:57:43 <bklei> gonna be freaking awesome -- in vertica land :)
15:57:43 <rhochmuth> so, please take a look
15:58:27 <rhochmuth> I think we are out of time for anomaly detection
15:58:43 <ho_away> ok, next week :-)
15:58:48 <rhochmuth> sorry ho_away
15:58:52 <rhochmuth> you are up first next week
15:59:03 <rhochmuth> i'll also touch-base with luis
15:59:04 <ho_away> thanks!
15:59:29 <ho_away> yeah, i will send you email to have a meeting
15:59:31 <rhochmuth> jobrs: will get back to you
15:59:36 <bklei> thx for hosting rhochmuth!
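The cron-script workaround described under the stale alarms topic might look roughly like the following. This is a sketch, not the Helion script: `find_stale_alarm_ids` is a hypothetical helper, and both the alarm layout (an `id` plus a `metrics` list carrying `dimensions`) and the use of a `resource_id` dimension as the VM identifier are assumptions. Fetching the alarm list from Monasca and the active server IDs from Nova, and issuing the deletes, is left to the caller.

```python
from typing import Dict, Iterable, List, Set


def find_stale_alarm_ids(
    alarms: Iterable[Dict],
    active_vm_ids: Set[str],
    vm_dimension: str = "resource_id",  # assumed name of the VM id dimension
) -> List[str]:
    """Return IDs of alarms whose metrics all refer to VMs that no longer exist."""
    stale = []
    for alarm in alarms:
        # Collect every VM id referenced by this alarm's metrics.
        vm_ids = {
            metric.get("dimensions", {}).get(vm_dimension)
            for metric in alarm.get("metrics", [])
        }
        vm_ids.discard(None)
        # Skip alarms not tied to a VM; flag those whose VMs are all gone.
        if vm_ids and vm_ids.isdisjoint(active_vm_ids):
            stale.append(alarm["id"])
    return stale


alarms = [
    {"id": "a1", "metrics": [{"name": "cpu.utilization_perc",
                              "dimensions": {"resource_id": "vm-1"}}]},
    {"id": "a2", "metrics": [{"name": "cpu.utilization_perc",
                              "dimensions": {"resource_id": "vm-2"}}]},
    {"id": "a3", "metrics": [{"name": "disk.space_used_perc",
                              "dimensions": {"hostname": "node-1"}}]},
]
print(find_stale_alarm_ids(alarms, active_vm_ids={"vm-1"}))
```

Because the decision is driven by the list of VMs that still exist rather than by alarm age, the check is not purely time based, which matches the behaviour rhochmuth describes.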
15:59:39 <rhochmuth> thx ho_away
15:59:45 <rhochmuth> thanks everyone
15:59:49 <tgraichen> bye
15:59:51 <jobrs> thank you, looking forward to multiple metrics, too
15:59:54 <slogan_r_> later
15:59:56 <ho_away> thanks Roland
16:00:04 <Christian____> bye and thx
16:00:16 <shinya_kwbt> bye
16:00:16 <rhochmuth> #endmeeting
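One way to picture the multiple-metrics response, combined with jobrs's suggestion of naming the dimensions to expand and merging the rest, is the grouping below. It is an illustration only: `group_measurements`, the `expand` argument and the row layout are hypothetical and do not reflect the code under review or the final query parameter.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One stored sample: (dimensions, ISO timestamp, value). Layout is assumed.
Row = Tuple[Dict[str, str], str, float]


def group_measurements(rows: Sequence[Row], expand: Sequence[str]) -> List[Dict]:
    """Return one series per distinct combination of the expanded dimensions.

    Dimensions not listed in `expand` are merged away. An empty `expand`
    therefore yields a single merged series.
    """
    series = defaultdict(list)
    for dimensions, timestamp, value in rows:
        # A missing dimension is grouped under the empty string.
        key = tuple(dimensions.get(name, "") for name in expand)
        series[key].append((timestamp, value))
    return [
        {"dimensions": dict(zip(expand, key)), "measurements": sorted(points)}
        for key, points in sorted(series.items())
    ]


rows = [
    ({"hostname": "h1", "service": "compute"}, "2016-03-09T15:00:00Z", 1.0),
    ({"hostname": "h2", "service": "compute"}, "2016-03-09T15:00:00Z", 2.0),
    ({"hostname": "h1", "service": "compute"}, "2016-03-09T15:01:00Z", 3.0),
]
for entry in group_measurements(rows, expand=["hostname"]):
    print(entry)
```

Returning every series from one call is what would replace the one-request-per-metric pattern; whether the grouping can be pushed down into the database depends on the backend, as noted in the discussion.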