15:00:33 #startmeeting monasca
15:00:34 c y'all
15:00:36 Meeting started Wed Feb 1 15:00:33 2017 UTC and is due to finish in 60 minutes. The chair is rhochmuth. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:37 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:40 The meeting name has been set to 'monasca'
15:00:44 https://etherpad.openstack.org/p/monasca-team-meeting-agenda
15:00:54 Agenda for Wednesday February 1 2017 (15:00 UTC)
15:00:54 1. What were the most important features added in the Ocata release? In response to Nick Chase, Editor in Chief, OpenStack:Unlocked.
15:00:54 2. Review new blueprints: Lots of exciting new developments in progress.
15:00:54 1. Templated alarms: https://blueprints.launchpad.net/monasca/+spec/templated-alarms
15:00:54 2. Monasca sidecar: https://blueprints.launchpad.net/monasca/+spec/monasca-sidecar
15:00:54 3. Prometheus instrumentation: https://blueprints.launchpad.net/monasca/+spec/prometheus-instrumentation
15:00:55 4. Mapper: https://review.openstack.org/#/c/420774/
15:00:55 5. Log Query API: https://blueprints.launchpad.net/monasca/+spec/log-query-api
15:00:56 6. Alarm inhibition/silencing
15:00:56 7. Alarm grouping
15:00:57 8. Monasca Query Language (MQL)
15:00:57 3. Reviews
15:00:58 1. https://review.openstack.org/#/c/427571/
15:00:58 1. FYI: https://review.openstack.org/#/c/425152/
15:01:22 o\
15:01:24 o/
15:01:29 o/
15:01:31 o/
15:01:38 hi everyone
15:01:40 o/
15:01:43 hi
15:01:47 looks like a good agenda to cover this week
15:01:55 exciting times are ahead
15:02:17 hi
15:02:18 witek: are there any items you need to cover on the ocata release?
15:02:33 stable branch comes soon
15:03:02 it sounds like we are in good shape
15:03:02 but we can wait till we have everything that we need in
15:03:29 o/
15:03:45 so, let's get started then
15:03:59 #topic What were the most important features added in the Ocata release?
15:04:25 i had an email from Nick Chase the other day asking about the most important features that we've done for the Ocata release
15:04:32 i owe him a response
15:04:46 i realize this release we didn't add as many features
15:04:56 but, the future is looking really good
15:05:27 anyway, if anyone has an important feature that i should respond to nick chase about, please let me know
15:06:03 Kibana plugin for logs has been added as OpenStack repo
15:06:40 and what does that include specifically?
15:06:58 authorisation and multi-tenancy
15:07:24 Are all the queries now scoped to the tenant ID?
15:07:28 the ES indices are scoped per project
15:07:40 yes
15:08:07 So, basically parity with Grafana, in a snese
15:08:11 sense
15:08:15 ?
15:08:45 with the difference, that it does not use monasca-datasource
15:09:08 but basically yes
15:09:30 and, it still requires your forked version of Kibana?
15:09:57 no, no forked Kibana
15:10:15 so, it basically integrates with the Kibana plugin system
15:10:22 correct
15:10:23 without any changes to their source
15:10:51 note to future self, pay more attention
15:10:54 :-)
15:11:17 witek: that is really great news!
15:11:35 your team has made great progress
15:11:45 did ES ever get back to you?
15:11:56 thank you, I will forward to the team :)
15:12:10 no
15:12:30 so, basically, this is all working without any help or involvement from ES?
15:12:43 right
15:12:54 fantastic!
15:13:17 so, do folks have any other favorite features worth mentioning
15:13:26 i know Tomasz has done a lot of work
15:13:46 adding support for the request ID to the monasca APIs was done
15:14:01 there was a lot of infra work
15:14:11 there are some great things in progress
15:14:26 support for InfluxDB 1.1 was added
15:14:30 by me
15:14:33 can't forget that
15:14:37 :-)
15:14:42 thank you rhochmuth
15:14:56 several interesting additions to the agent
15:15:20 @Roland support for influxdb 1.1. -- is that for Java implementation?
15:15:24 support for docker, kubernetes and prometheus
15:15:50 Guest20448: That was only completed for the Python code
15:16:20 So, I think I have enough for Nick
15:16:31 I'll cc several folks, on my response
15:16:58 the docker plugin is in the agent.
15:17:06 ahh yes, thanks
15:17:08 the kubernetes and prometheus monitoring is still up for review
15:17:10 :(
15:17:40 that's ok, i think i'll discuss a lot of the great work that is in progress
15:17:45 too
15:17:55 but, the kibana work will be the highlight
15:18:02 of this release
15:18:26 thanks everyone
15:18:38 #topic blueprints and upcoming work
15:19:00 so, i wanted to point out that there is a lot of work in progress
15:19:09 several blueprints have been created
15:19:19 and a few more are about to be created
15:19:42 Templated alarm descriptions for human readable alerts
15:19:43 https://blueprints.launchpad.net/monasca/+spec/templated-alarms
15:20:45 There is an implementation that is available on sap's own branch
15:20:55 that will get moved over at some point
15:21:16 but, want to make folks aware of what is coming and to start to provide feedback
15:21:23 ... in the near future
15:21:35 jobrs: thx
15:22:19 last week we discussed, https://blueprints.launchpad.net/monasca/+spec/monasca-sidecar
15:22:35 wanted to check in on whether anyone has had some time to look at this and supply feedback
15:23:21 it sounds like some of the issues seen with monasca-statsd that led to the sidecar are being resolved
15:23:54 timothyv89: how so?
15:24:05 timothyb89: ^
15:24:16 now that we're supporting dogstatsd: https://review.openstack.org/#/c/426706/
15:24:54 we created statsd-coverage for notification, persister and api without running into issues
15:24:55 one of the issues we saw was language support, dogstatsd has bindings for many languages
15:25:24 IMHO we should just add compatibility to the agent.
15:25:34 agress
15:25:37 agree
15:25:58 submitted the first part yesterday.
What is missing is (restored) support for histograms
15:26:10 i thought the sidecar was also desirable for other reasons in a kubernetes env
15:26:16 yes
15:26:24 it still does solve some problems, yes
15:26:43 +1, for blackbox monitoring
15:27:05 running the statsd as side container still carries the need then for authenticating with the api unless you run the statsd as a service in kubernetes which causes high udp traffic
15:27:31 we use prom-statsd sidecar
15:27:40 yes i think that is the right answer
15:27:43 which is compatible with DogStatsd
15:28:31 so, is the statsd work jobrs is adding replacing the sidecar
15:28:36 or do we want both
15:28:41 i thought both
15:28:50 yep
15:29:03 ok, so i think we are good then
15:29:12 i'm actually going to try to start reviewing code again
15:29:26 sorry, i've been really busy and got behind on any significant reviews
15:29:49 and relying on others to carry this work forward and keep it moving along
15:30:35 So, on a related topic, there is the "mapper",
15:30:42 https://review.openstack.org/#/c/420774/
15:31:02 let me add one thing to statsd: I believe monasca-sidecar is great, but it would be perfect if only it would process DogStatsd
15:31:25 +1 to ^
15:31:41 and in regards to the review I will be looking at it again today or tomorrow
15:31:45 that is an option as well, sure
15:32:02 thx jobrs and timothyb89
15:32:36 there is a new blueprint for https://blueprints.launchpad.net/monasca/+spec/log-query-api
15:32:45 adding a log query api
15:33:29 this is being worked on by witek and steve simpson
15:33:44 opposite order
15:33:45 Also see, https://wiki.openstack.org/wiki/Monasca/Logging/Query_API_Design
15:34:09 OK, Steve is leading the charge
15:34:20 witek is contributing to it
15:35:24 Steve, are you there?
15:35:24 And, we have a number of engineers at HPE working on alarm inhibition, alarm silencing, alarm grouping, and a new Monasca Query Language (MQL)
15:35:47 blueprints will be showing up soon for those other areas that i mentioned
15:36:02 that's great -- charter has talked about alarm silencing too
15:36:06 where's waldo?
15:36:58 also, there is a lot of work in creating helm charts for monasca between hpe and sap
15:37:03 or sap and hpe
15:37:19 no blueprint for that
15:37:34 i'm wondering with all this work if we should plan on a mid-cycle
15:38:10 sure
15:38:13 seems like we have enough that a couple of days of focused planning and discussions would be a good idea
15:38:19 +1
15:38:36 do we want to do this in late february
15:38:48 slightly before or after the PTG
15:38:55 just in case someone is going to the PTG?
15:39:26 no conflict from charter guys
15:40:27 we could hold it then on wednesday and thursday during the same week of ptg
15:40:52 if there aren't any conflicts
15:41:07 is SAP going to PTG?
15:41:34 jobrs: ^^^
15:41:42 dhague: ^^^
15:42:21 currently not planned
15:42:33 our impression was that participation would be very low on Monasca
15:42:41 correct
15:42:52 we would like to do our own monasca mid-cycle remotely
15:43:05 via videoconferencing the same week as ptg
15:43:11 probably wednesday and thursday
15:43:24 if that works for you and your team?
15:43:41 basically, in three weeks
15:43:54 should be fine
15:44:06 cool, sounds good to me then
15:44:56 let's tentatively mark that wed and thursday down and unless any conflicts occur we'll meet remotely that week
15:45:13 +1
15:45:27 #topic reviews
15:45:35 https://review.openstack.org/#/c/427571/
15:46:56 i just added a +1
15:47:13 https://review.openstack.org/#/c/425152/
15:47:19 this looks good to me
15:48:04 tomasz has been busy
15:48:06 https://review.openstack.org/#/c/425559/
15:49:24 https://review.openstack.org/#/c/356403/
15:49:32 I'll take a look at both of the above
15:49:40 but, it looks good to me
15:50:19 i think that is the end of today's agenda
15:50:24 #topic open floor
15:50:52 I have a very technical question about the thresholder
15:51:03 jobrs: sure
15:51:40 we observe that deleted alarm-definitions and their alarms are still processed, causing alarm-state-transition messages
15:51:54 until we restart/kill the thresholder.
15:52:22 we noticed this after we introduced a canary test which creates temporary alarm-definitions every minute
15:52:24 if you delete the alarm definition, all alarms created by the definition should have been deleted too
15:52:32 no errors or suspicious entries in the logs.
15:53:19 config DB is also clean; code-review of thresholder also did not reveal anything suspicious
15:53:29 jobrs: that's a lot of alarm definitions to be pushing through the system on a constant basis
15:53:43 hmmm, we can take a look and verify
15:54:08 I can tell you something that may help in debugging
15:54:22 so, if i understand, you create an alarm definition, which results in new alarms
15:54:48 If i'm not mistaken it is still the API which handles the alarm definition deletion, while the threshold engine waits for a message on kafka before clearing its internal memory
15:55:01 then, you delete the alarm definition, and the alarms are still there?
15:55:53 I delete the alarm-definition, the API updates the deleted_at field and the alarms are cleared from the database by the thresholder event handler.
15:56:12 our cleanup job takes care of the rest: removing alarm-definitions with deleted_at != null from the tables
15:56:58 the database is clean.
but if I watch the alarm-state-transitions topic (using kafkacat) I observe new messages about very old alarms/alarmdefs
15:57:52 hmmm, so the threshold engine still has the alarm definition or alarm
15:57:58 rbrndt, rhochmuth: continue on IRC?
15:58:00 yes
15:58:04 has a bug been filed about this?
15:58:15 no
15:58:29 it sounds like something that is probably happening everywhere, perhaps we should continue in a bug report
15:58:34 and I could not believe that this is a bug in the code. So we are looking for other explanations
15:59:21 on the surface sounds like a bug
15:59:30 thanks, that is a good proposal. I will file a bug and let you know, ok?
15:59:31 if you can create a report
15:59:38 that would be great
15:59:44 we'll start looking into it
15:59:45 we started already ?
15:59:48 lots of detail will help us reproduce this faster
16:00:01 sure
16:00:08 tomasztrebski: we started an hour ago
16:00:12 I just wanted to check whether you ran into something similar
16:00:12 we are wrapping up
16:00:17 thanks jobrs
16:00:23 i need to end the meeting
16:00:27 for the next group
16:00:28 damn it...I messed up hours
16:00:32 thanks everyone
16:00:45 tomasztrebski: i saw your reviews in the etherpad agenda
16:00:48 i'll review
16:00:59 rhochmuth: thx
16:01:00 see you all next week
16:01:02 cheers everyone
16:01:04 bey
16:01:05 or sooner
16:01:08 *bye
16:01:11 #endmeeting
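Editor's note on the open-floor item: the symptom described there (messages on the alarm-state-transitions topic that refer to alarm definitions already deleted from the config DB) could be captured for a bug report with a small filter like the one below. This is only a sketch under stated assumptions, not the Monasca team's tooling. It assumes each message is one JSON object per line, as a console consumer such as kafkacat would print it, and that the payload sits under an "alarm-transitioned" key with an "alarmDefinitionId" field. Both field names are assumptions and should be checked against real messages. The function name find_stale_transitions and the set of live definition IDs are hypothetical.

```python
import json
from typing import Iterable, List, Set


def find_stale_transitions(lines: Iterable[str],
                           live_definition_ids: Set[str]) -> List[dict]:
    """Return transition events whose alarm definition is no longer live.

    `lines` holds one JSON message per entry (assumed format).
    `live_definition_ids` is the set of alarm definition IDs that still
    exist in the config DB (how it is obtained is left to the caller).
    """
    stale = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not valid JSON (e.g. consumer status lines).
            continue
        if not isinstance(message, dict):
            continue
        # Assumed envelope key; adjust after inspecting real messages.
        event = message.get("alarm-transitioned")
        if not isinstance(event, dict):
            continue
        definition_id = event.get("alarmDefinitionId")  # assumed field name
        if definition_id is not None and definition_id not in live_definition_ids:
            stale.append(event)
    return stale
```

In practice the lines could come from sys.stdin with the consumer output piped in, and the stale events could be attached to the bug report together with the matching deleted_at values.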