15:00:43 <witek_> #startmeeting monasca
15:00:43 <openstack> Meeting started Wed Sep 11 15:00:43 2019 UTC and is due to finish in 60 minutes.  The chair is witek_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:46 <openstack> The meeting name has been set to 'monasca'
15:01:04 <witek_> hello everyone
15:01:06 <joadavis> hello
15:01:52 <witek_> anyone else around?
15:01:59 <hosanai> hi
15:02:14 <witek_> hi hosanai, good to see you
15:03:28 <witek_> small group but let's start
15:03:36 <witek_> #topic monasca.io
15:04:07 <witek_> thanks to Roland for extending the domain registration
15:04:16 <witek_> and Tim for fixing the website
15:04:34 <witek_> so we have monasca.io up and running again
15:05:09 <witek_> we have also agreed we can use it as the community landing page
15:05:21 <witek_> we'll have to update the content
15:05:54 <witek_> after completing this we
15:06:29 <witek_> the content is maintained in GitHub
15:06:49 <witek_> https://github.com/monasca/monasca.github.io
15:07:17 <witek_> the same rules as for other repos are valid
15:07:42 <witek_> contributors should create PRs and sign their commits
15:07:58 <witek_> these will be reviewed by core reviewers
15:08:15 <witek_> thanks again to Roland and Tim
15:08:38 <witek_> hi dougsz
15:08:48 <Wasaac> o/ Sorry I'm late
15:09:02 <witek_> hi Wasaac
15:09:02 <dougsz> hi, sorry same as ^
15:09:12 <witek_> just a short update on the monasca.io page
15:09:43 <witek_> it's up again and we have approval to use it as the community landing page
15:10:08 <joadavis> Now that it is back, do we need to let any users know it is operational again?  I think it was a user that first pointed out it was down
15:10:44 <Wasaac> Where would one announce to users if not on that page?
15:10:59 <joadavis> I'm sure they could figure it out themselves if they try deploying a helm chart with dependency
15:11:28 <witek_> joadavis: are you thinking of openstack-discuss?
15:12:22 <dougsz> Might be nice to update it a little before promoting it? I recall one of our customers got annoyed at the "Documentation, coming soon" page! (http://monasca.io/docs/index.html)
15:12:29 <joadavis> I was trying to remember which user it was that first reported the problem. Was it t-mobile?
15:12:57 <joadavis> I was thinking a more direct contact than a broadcast message
15:13:14 <dougsz> 👍
15:13:40 <witek_> I'll send an email to the reporter, and after updating the content we can promote at openstack-discuss
15:14:27 <witek_> #topic Review Priority flag
15:14:43 <witek_> http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009162.html
15:15:14 <witek_> I've suggested a simple process for setting review priority flag
15:15:59 <witek_> any change could be proposed to get prioritized by anyone in the list or in the meeting
15:16:25 <witek_> Any core reviewer, preferably from a different company, could confirm such proposed change by setting RV +1.
15:16:42 <witek_> what do you think?
15:16:47 <dougsz> I think it's a good idea - we can focus review efforts
15:17:23 <witek_> we also give more visibility to important changes
15:19:02 <witek_> additionally, Doug suggested publishing Gerrit Dashboard with the list of prioritized changes as one of the items
15:19:06 <witek_> http://www.tinyurl.com/monasca
15:20:19 <joadavis> cool
15:21:23 <witek_> I guess we can push a change to our docs and see in the review if folks like it
15:22:56 <dougsz> Sounds like a good plan
15:22:58 <Wasaac> +1
15:23:32 <witek_> cool, thanks
15:23:49 <witek_> #topic Kafka client status update
15:24:25 <witek_> when testing the Kafka publisher in the API with tempest tests for Python 2
15:25:11 <witek_> I've found a bug in monasca-common
15:25:11 <witek_> https://review.opendev.org/680653
15:26:27 <witek_> with this bugfix all tempest tests for API Kafka client upgrade pass
15:27:59 <witek_> with the changes in API and notification (persister already merged) we have all python components updated
15:28:31 <witek_> so the goal is getting close to completion
15:29:10 <witek_> I'll appreciate your reviews
15:30:12 <witek_> that's all from me, do you have other topics?
15:31:01 <dougsz> brtknr asked me to give an update about his work on improving InfluxDB performance
15:31:20 <joadavis> I probably should, but don't currently.  If anyone is interested in discussing Monasca Events I did think about it a week ago. :P
15:31:25 <witek_> #topic improving InfluxDB performance
15:32:08 <dougsz> The first change is to support scoping dimension queries by time to speed up Grafana dashboard loads: https://review.opendev.org/#/c/670318
15:32:51 <dougsz> The problem we have hit is that InfluxDB cannot accurately return dimension values within a specified time window. It can return them within the shard duration, which can vary between ~days and ~weeks.
15:33:16 <dougsz> The current situation is that you load a DB with a dynamic query to fetch all hostnames for the time series.
15:33:44 <dougsz> With this patch you would load a subset of hostnames within the timewindow on the dashboard. This is a lot faster.
15:34:01 <dougsz> However, what we actually get are hostnames from outside the time window on the dashboard.
15:34:21 <dougsz> The query is a lot faster, but the API doesn't really do what it says.
15:34:34 <witek_> oops, any idea why?
15:34:44 <joadavis> sorry, I'm trying not to laugh out loud at that
15:34:57 <dougsz> :) It's a fundamental limitation with Influx
15:35:34 <Wasaac> So to confirm, the list of hosts returned will contain all hosts that did post data in the time window
15:35:36 <joadavis> If the return is a superset of what you are looking for, it would be tempting to do some post-processing filtering to remove hostnames you don't want
15:36:00 <dougsz> Wasaac: yes, that's correct
15:36:00 <Wasaac> But will also include those that posted outside the time window, but within the shard window
15:36:29 <dougsz> joadavis: yes, the problem is the dimension values are not returned with timestamps, so no post-processing is possible
15:36:38 <joadavis> yuck
15:37:17 <dougsz> brtknr tried alternative queries, which work, but the performance is worse than just getting all values
15:37:35 <witek_> and right now (without scoping) we get all dimension values which were ever written?
15:38:06 <dougsz> yeah, and in large DBs, that query can take 30mins + and lock up the InfluxDB instance while it runs
15:38:43 <Wasaac> Sounds to me that getting a faster return is worth more than having an accurate list
15:38:50 <Wasaac> Provided that's understood by users
15:38:53 <witek_> Wasaac: +1
15:38:57 <dougsz> That is the question
15:39:36 <witek_> the list will get much shorter and eventually updated after the shard "expires" completely
15:39:46 <witek_> seems to be good enough to me
15:40:05 <dougsz> ok, thanks, we will press on with that change then
15:40:21 <dougsz> If anyone thinks of any comments please add them to: https://review.opendev.org/#/c/670318
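The shard-granularity behaviour described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not InfluxDB's implementation and not the code in the review: the 7-day shard duration, the epoch alignment of shards, and the function names (`shard_index`, `hostnames_exact`, `hostnames_shard_scoped`) are all hypothetical. It shows why a time-scoped dimension query returns a superset: every host with data in any shard that overlaps the window is returned, and since values carry no timestamps the extra hosts cannot be filtered out afterwards.

```python
from datetime import datetime, timedelta

# Assumed shard duration; in practice it depends on the retention policy.
SHARD_DURATION = timedelta(days=7)
EPOCH = datetime(1970, 1, 1)


def shard_index(ts):
    """Index of the fixed-width shard that contains timestamp ts."""
    return (ts - EPOCH) // SHARD_DURATION


def hostnames_exact(points, start, end):
    """What the dashboard would ideally get: hosts seen in [start, end)."""
    return {host for ts, host in points if start <= ts < end}


def hostnames_shard_scoped(points, start, end):
    """Model of the faster query: hosts from every shard overlapping the window."""
    wanted = range(shard_index(start), shard_index(end) + 1)
    return {host for ts, host in points if shard_index(ts) in wanted}


# Hypothetical measurements as (timestamp, hostname) pairs.
points = [
    (datetime(2019, 9, 11, 10, 0), "host-a"),  # inside the window
    (datetime(2019, 9, 6, 8, 0), "host-b"),    # same shard, outside the window
    (datetime(2019, 8, 20, 12, 0), "host-c"),  # older shard
]
start = datetime(2019, 9, 11, 9, 0)
end = datetime(2019, 9, 11, 11, 0)

print(sorted(hostnames_exact(points, start, end)))
print(sorted(hostnames_shard_scoped(points, start, end)))
```

In this model the scoped query still drops hosts from older shards (`host-c`), which is where the speed-up comes from, while hosts such as `host-b` linger until their shard no longer overlaps the window.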
15:40:37 <dougsz> The second change to improve performance is to use an InfluxDB per tenant
15:40:56 <dougsz> So that queries run against smaller datasets
15:41:33 <dougsz> This one seems to work nicely, but brtknr needs to investigate tempest failures
15:41:57 <witek_> so it requires changes both in API and persister?
15:42:19 <dougsz> That's correct, and a migration script to move data to the new layout
15:42:22 <dougsz> https://review.opendev.org/#/q/topic:story/2006331+(status:open+OR+status:merged)
15:42:31 <dougsz> ^ If anyone is interested
15:42:56 <dougsz> That's it from me on these two.
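The per-tenant layout can be sketched as a routing step on the write path. This is a minimal illustration, assuming "an InfluxDB per tenant" means one database per tenant; the naming scheme (`<base>_<tenant_id>`), the base name `mon`, and the helpers `database_for` and `group_by_database` are hypothetical and are not taken from the patches under review.

```python
from collections import defaultdict

# Assumed base database name; the real value would come from configuration.
BASE_DB_NAME = "mon"


def database_for(tenant_id, base=BASE_DB_NAME):
    """Hypothetical naming scheme: one database per tenant."""
    return f"{base}_{tenant_id}"


def group_by_database(points):
    """Split a batch of points into per-database batches keyed by tenant."""
    batches = defaultdict(list)
    for point in points:
        batches[database_for(point["tenant_id"])].append(point)
    return dict(batches)


# Hypothetical batch as it might arrive at the persister.
batch = [
    {"tenant_id": "t1", "name": "cpu.idle_perc", "value": 97.0},
    {"tenant_id": "t2", "name": "cpu.idle_perc", "value": 12.5},
    {"tenant_id": "t1", "name": "mem.free_mb", "value": 2048.0},
]

for db_name, db_points in sorted(group_by_database(batch).items()):
    print(db_name, len(db_points))
```

Under this layout a read for a given tenant would resolve the same database name and therefore scan only that tenant's series, which is why both the API and the persister need to agree on the scheme and why existing data needs a migration.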
15:43:24 <witek_> that's the first step in implementing scalable InfluxDB setup
15:43:40 <witek_> very nice, thanks
15:44:43 <witek_> do you think automatic partitioning (not only based on tenant) would be possible in future?
15:45:54 <dougsz> Hmm, I suppose it should be with the right adapter
15:46:28 <dougsz> It would be nice to investigate TimescaleDB at that stage since it provides clustering out of the box
15:47:13 <witek_> I'm not sure about their licensing
15:48:06 <witek_> https://www.timescale.com/products
15:48:38 <witek_> some features are limited to enterprise version
15:48:46 <witek_> but I haven't looked in detail
15:49:37 <dougsz> Hmm, yes, automated data retention policies are missing from open-source
15:49:42 <dougsz> there is also https://eng.uber.com/m3/
15:49:56 <witek_> with only Go client
15:50:20 <witek_> which is not a blocker though
15:50:37 <dougsz> Yeah, but good point - it's clearly not a task for this cycle
15:53:09 <witek_> hosanai: do you have any update on monasca-analytics?
15:53:42 <dougsz> 500 million metrics per second, Uber claim, I wonder if anyone actually deployed it outside of Uber
15:54:01 <witek_> :)
15:54:12 <hosanai> witek_: now i'm working on python3.6/3.7 support.
15:54:22 <dougsz> * oops, aggregates
15:55:07 <witek_> hosanai: any idea about the time frame? the community goal has the deadline in 3 days
15:55:55 <hosanai> witek_: do my best :-)
15:56:12 <joadavis> Glad hosanai mentioned py3.  A documentation question came up - we should be sure that documentation reflects any python 3 changes to match up with any changes we made in code or cli
15:56:56 <joadavis> likely not much difference, but I hadn't thought of that aspect when I looked at py3 gate tests in the past
15:57:41 <hosanai> thanks for the info.
15:58:51 <witek_> any last comment before I wrap up?
15:59:20 <witek_> thanks for joining today
15:59:42 <witek_> let's make some progress on reviews
15:59:43 <joadavis> thanks all
15:59:47 <hosanai> thx!
15:59:50 <dougsz> thanks all, bye
15:59:51 <witek_> and see you next week
15:59:53 <witek_> bye
15:59:57 <witek_> #endmeeting