15:00:43 #startmeeting monasca
15:00:43 Meeting started Wed Sep 11 15:00:43 2019 UTC and is due to finish in 60 minutes. The chair is witek_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:46 The meeting name has been set to 'monasca'
15:01:04 hello everyone
15:01:06 hello
15:01:52 anyone else around?
15:01:59 hi
15:02:14 hi hosanai, good to see you
15:03:28 small group but let's start
15:03:36 #topic monasca.io
15:04:07 thanks to Roland for extending the domain registration
15:04:16 and Tim for fixing the website
15:04:34 so we have monasca.io up and running again
15:05:09 we have also agreed we can use it as the community landing page
15:05:21 we'll have to update the content
15:05:54 after completing this we
15:06:29 the content is maintained in GitHub
15:06:49 https://github.com/monasca/monasca.github.io
15:07:17 the same rules apply as for the other repos
15:07:42 contributors should create PRs and sign their commits
15:07:58 these will be reviewed by core reviewers
15:08:15 thanks again to Roland and Tim
15:08:38 hi dougsz
15:08:48 o/ Sorry I'm late
15:09:02 hi Wasaac
15:09:02 hi, sorry same as ^
15:09:12 just a short update on the monasca.io page
15:09:43 it's up again and we have approval to use it as the community landing page
15:10:08 Now that it is back, do we need to let any users know it is operational again? I think it was a user that first pointed out it was down
15:10:44 Where would one announce to users if not on that page?
15:10:59 I'm sure they could figure it out themselves if they try deploying a helm chart with a dependency
15:11:28 joadavis: are you thinking of openstack-discuss?
15:12:22 Might be nice to update it a little before promoting it? I recall one of our customers got annoyed at the "Documentation coming soon" page! (http://monasca.io/docs/index.html)
15:12:29 I was trying to remember which user it was that first reported the problem. Was it t-mobile?
15:12:57 I was thinking of a more direct contact than a broadcast message
15:13:14 👍
15:13:40 I'll send an email to the reporter, and after updating the content we can promote it on openstack-discuss
15:14:27 #topic Review Priority flag
15:14:43 http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009162.html
15:15:14 I've suggested a simple process for setting the review priority flag
15:15:59 any change could be proposed for prioritization by anyone on the list or in the meeting
15:16:25 Any core reviewer, preferably from a different company, could confirm such a proposed change by setting RV +1.
15:16:42 what do you think?
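For reference, a minimal sketch of how such prioritized changes could be listed once the flag is in use, via the standard Gerrit REST API on review.opendev.org. The label name, query string, and output formatting below are assumptions based on the proposal discussed here, not an agreed interface:

    # Hypothetical sketch: list open Monasca changes carrying a Review-Priority +1
    # vote on review.opendev.org. The query string and label name are assumptions
    # following the proposal above, not settled project conventions.
    import json
    import requests

    GERRIT = "https://review.opendev.org"
    QUERY = "status:open projects:openstack/monasca label:Review-Priority=+1"

    resp = requests.get(f"{GERRIT}/changes/", params={"q": QUERY, "n": 50})
    resp.raise_for_status()

    # Gerrit prefixes its JSON responses with ")]}'" to prevent XSSI; drop that line.
    changes = json.loads(resp.text.split("\n", 1)[1])
    for change in changes:
        print(f"{change['_number']:>7}  {change['project']}: {change['subject']}")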
15:16:47 I think it's a good idea - we can focus review efforts
15:17:23 we also give more visibility to important changes
15:19:02 additionally, Doug suggested publishing a Gerrit dashboard with the list of prioritized changes as one of the items
15:19:06 http://www.tinyurl.com/monasca
15:20:19 cool
15:21:23 I guess we can push a change to our docs and see in the review if folks like it
15:22:56 Sounds like a good plan
15:22:58 +1
15:23:32 cool, thanks
15:23:49 #topic Kafka client status update
15:24:25 when testing the Kafka publisher in the API with the tempest tests for Python 2
15:25:11 I've found a bug in monasca-common
15:25:11 https://review.opendev.org/680653
15:26:27 with this bugfix all tempest tests for the API Kafka client upgrade pass
15:27:59 with the changes in the API and notification (the persister change is already merged) we have all Python components updated
15:28:31 so the goal is getting close to completion
15:29:10 I'd appreciate your reviews
15:30:12 that's all from me, do you have other topics?
15:31:01 brtknr asked me to give an update about his work on improving InfluxDB performance
15:31:20 I probably should, but don't currently. If anyone is interested in discussing Monasca Events I did think about it a week ago. :P
15:31:25 #topic improving InfluxDB performance
15:32:08 The first change is to support scoping dimension queries by time to speed up Grafana dashboard loads: https://review.opendev.org/#/c/670318
15:32:51 The problem we have hit is that InfluxDB cannot accurately return dimension values within a specified time window. It can return them within the shard duration, which can vary between ~days and ~weeks.
15:33:16 The current situation is that you load a DB with a dynamic query to fetch all hostnames for the time series.
15:33:44 With this patch you would load a subset of hostnames within the time window on the dashboard. This is a lot faster.
15:34:01 However, what we actually get are hostnames from outside the time window on the dashboard.
15:34:21 The query is a lot faster, but the API doesn't really do what it says.
15:34:34 oops, any idea why?
15:34:44 sorry, I'm trying not to laugh out loud at that
15:34:57 :) It's a fundamental limitation with Influx
15:35:34 So to confirm, the list of hosts returned will contain all hosts that did post data in the time window
15:35:36 If the return is a superset of what you are looking for, it would be tempting to do some post-processing filtering to remove hostnames you don't want
15:36:00 Wasaac: yes, that's correct
15:36:00 But it will also include those that posted outside the time window, but within the shard window
15:36:29 joadavis: yes, the problem is the dimension values are not returned with timestamps, so no post-processing is possible
15:36:38 yuck
15:37:17 brtknr tried alternative queries, which work, but the performance is worse than just getting all values
15:37:35 and right now (without scoping) we get all dimension values which were ever written?
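To make the limitation under discussion concrete, here is a minimal sketch of the kind of time-scoped dimension (tag value) query involved, assuming the Python influxdb client; the measurement and tag names are illustrative and this is not the actual monasca-api code:

    # Minimal sketch of the time-scoped dimension query discussed above.
    # Measurement/tag names are illustrative; this is not the monasca-api code.
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086, database="mon")

    # Scoping the meta query by time makes it much cheaper, but InfluxDB only
    # applies the time predicate at shard granularity, so hostnames written
    # outside the requested window (but inside the same shard) still come back.
    query = (
        'SHOW TAG VALUES FROM "cpu.idle_perc" WITH KEY = "hostname" '
        "WHERE time > now() - 1h"
    )
    hostnames = [point["value"] for point in client.query(query).get_points()]
    print(hostnames)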
15:38:06 yeah, and in large DBs that query can take 30 mins+ and lock up the InfluxDB instance while it runs
15:38:43 Sounds to me like getting a faster return is worth more than having an accurate list
15:38:50 Provided that's understood by users
15:38:53 Wasaac: +1
15:38:57 That is the question
15:39:36 the list will get much shorter and eventually be updated after the shard "expires" completely
15:39:46 seems good enough to me
15:40:05 ok, thanks, we will press on with that change then
15:40:21 If anyone thinks of any comments please add them to: https://review.opendev.org/#/c/670318
15:40:37 The second change to improve performance is to use an InfluxDB database per tenant
15:40:56 So that queries run against smaller datasets
15:41:33 This one seems to work nicely, but brtknr needs to investigate tempest failures
15:41:57 so it requires changes both in the API and the persister?
15:42:19 That's correct, and a migration script to move data to the new layout
15:42:22 https://review.opendev.org/#/q/topic:story/2006331+(status:open+OR+status:merged)
15:42:31 ^ If anyone is interested
15:42:56 That's it from me on these two.
15:43:24 that's the first step in implementing a scalable InfluxDB setup
15:43:40 very nice, thanks
15:44:43 do you think automatic partitioning (not only based on tenant) would be possible in the future?
15:45:54 Hmm, I suppose it should be with the right adapter
15:46:28 It would be nice to investigate TimescaleDB at that stage since it provides clustering out of the box
15:47:13 I'm not sure about their licensing
15:48:06 https://www.timescale.com/products
15:48:38 some features are limited to the enterprise version
15:48:46 but I haven't looked in detail
15:49:37 Hmm, yes, automated data retention policies are missing from the open-source version
15:49:42 there is also https://eng.uber.com/m3/
15:49:56 with only a Go client
15:50:20 which is not a blocker though
15:50:37 Yeah, but good point - it's clearly not a task for this cycle
15:53:09 hosanai: do you have any update on monasca-analytics?
15:53:42 500 million metrics per second, Uber claims; I wonder if anyone has actually deployed it outside of Uber
15:54:01 :)
15:54:12 witek_: now I'm working on Python 3.6/3.7 support.
15:54:22 * oops, aggregates
15:55:07 hosanai: any idea about the time frame? the community goal has its deadline in 3 days
15:55:55 witek_: I'll do my best :-)
15:56:12 Glad hosanai mentioned py3. A documentation question came up - we should be sure that the documentation reflects any Python 3 changes, to match up with any changes we made in the code or CLI
15:56:56 likely not much difference, but I hadn't thought of that aspect when I looked at py3 gate tests in the past
15:57:41 thanks for the info.
15:58:51 any last comment before I wrap up?
15:59:20 thanks for joining today
15:59:42 let's make some progress on reviews
15:59:43 thanks all
15:59:47 thx!
15:59:50 thanks all, bye
15:59:51 and see you next week
15:59:53 bye
15:59:57 #endmeeting
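A footnote on the "one InfluxDB database per tenant" change discussed above: a minimal, hypothetical sketch of how per-tenant databases could be derived and selected. The naming scheme and helper are assumptions for illustration only, not the actual persister/API change under review in story/2006331:

    # Hypothetical sketch of the per-tenant database idea discussed above.
    # The naming scheme and helper are assumptions, not the actual change.
    from influxdb import InfluxDBClient

    DATABASE_PREFIX = "mon"  # assumed base name shared by all tenant databases

    def tenant_database(project_id: str) -> str:
        """Derive a per-tenant InfluxDB database name from the Keystone project id."""
        return f"{DATABASE_PREFIX}_{project_id}"

    client = InfluxDBClient(host="localhost", port=8086)
    db = tenant_database("3d0e0b1a9a1c4e5f8a7b6c5d4e3f2a1b")
    client.create_database(db)   # no-op if the database already exists
    client.switch_database(db)
    # Queries and writes now touch only this tenant's (much smaller) dataset.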