*** openstackgerrit has quit IRC | 01:30 | |
*** openstackgerrit has joined #openstack-monasca | 01:33 | |
openstackgerrit | jianweizhang proposed openstack/monasca-api master: Docker support keystone insecure option https://review.openstack.org/651418 | 01:33 |
---|---|---|
*** openstackstatus has quit IRC | 04:35 | |
*** openstackstatus has joined #openstack-monasca | 04:36 | |
*** ChanServ sets mode: +v openstackstatus | 04:36 | |
*** pcaruana has joined #openstack-monasca | 05:06 | |
*** witek has joined #openstack-monasca | 07:23 | |
*** pcaruana has quit IRC | 07:34 | |
*** pcaruana has joined #openstack-monasca | 07:35 | |
*** mohankumar has joined #openstack-monasca | 07:44 | |
*** bandorf has joined #openstack-monasca | 08:07 | |
*** bandorf has quit IRC | 08:08 | |
*** dougsz has joined #openstack-monasca | 08:09 | |
*** bandorf has joined #openstack-monasca | 08:14 | |
openstackgerrit | Michał Piotrowski proposed openstack/monasca-thresh master: Create Docker image and build in Zuul https://review.openstack.org/649298 | 08:28 |
openstackgerrit | Adrian Czarnecki proposed openstack/monasca-api master: [WIP] Merge log-api and api https://review.openstack.org/651249 | 08:31 |
openstackgerrit | Adrian Czarnecki proposed openstack/monasca-api master: [WIP] Merge log-api and api https://review.openstack.org/651249 | 08:34 |
*** chaconpiza has quit IRC | 08:35 | |
*** oneswig has joined #openstack-monasca | 08:40 | |
*** chaconpiza has joined #openstack-monasca | 08:44 | |
oneswig | A quick question arising from the discussion here http://eavesdrop.openstack.org/irclogs/%23openstack-telemetry/%23openstack-telemetry.2019-04-09.log.html#t2019-04-09T06:36:15 - wondered if the Monasca team had considered Gnocchi as an HA-capable free alternative to InfluxDB? Apparently Gnocchi can speak InfluxDB protocol, so it may not be too hard to achieve. | 08:54 |
oneswig | I guess given the discussion was driven by concerns over Gnocchi's ongoing development, it might not be so wise... | 08:55 |
witek | oneswig: we did discuss it during the PTG in fall 2017 | 08:58 |
witek | but nobody was willing to invest time and implement it | 08:59 |
oneswig | Thanks witek - so many features, so little time... | 08:59 |
witek | I think the project is not actively developed anymore | 09:00 |
witek | oneswig: have you considered using Kafka consumer groups to replicate measurements between InfluxDB instances? | 09:09 |
openstackgerrit | Merged openstack/monasca-common master: Use proper naming for docker services image zuul jobs https://review.openstack.org/650011 | 09:33 |
oneswig | witek: no, I don't think so | 09:40 |
*** oneswig has quit IRC | 09:40 | |
openstackgerrit | Merged openstack/monasca-api master: Use proper naming for docker service image zuul job https://review.openstack.org/650885 | 09:46 |
openstackgerrit | Merged openstack/monasca-log-api master: Use proper naming for docker service image zuul job https://review.openstack.org/650887 | 09:46 |
openstackgerrit | Merged openstack/monasca-persister master: Add coverage report display https://review.openstack.org/650264 | 09:57 |
*** mohankumar has quit IRC | 11:16 | |
openstackgerrit | Merged openstack/monasca-tempest-plugin master: Use proper naming for docker service image zuul job https://review.openstack.org/650886 | 11:21 |
*** mohankumar has joined #openstack-monasca | 11:36 | |
*** bobh has joined #openstack-monasca | 11:55 | |
openstackgerrit | Michał Piotrowski proposed openstack/monasca-ui master: Unit tests fail https://review.openstack.org/651511 | 12:07 |
*** haru5ny has joined #openstack-monasca | 12:09 | |
*** haru5ny has quit IRC | 12:11 | |
openstackgerrit | Michał Piotrowski proposed openstack/monasca-ui master: Unit tests fail https://review.openstack.org/651512 | 12:12 |
*** haru5ny has joined #openstack-monasca | 12:12 | |
*** haru5ny has quit IRC | 12:13 | |
*** haru5ny has joined #openstack-monasca | 12:14 | |
*** haru5ny has quit IRC | 12:14 | |
*** bobh has quit IRC | 12:18 | |
openstackgerrit | Adrian Czarnecki proposed openstack/monasca-api master: [WIP] Merge log-api and api https://review.openstack.org/651249 | 12:46 |
*** mohankumar has quit IRC | 12:58 | |
*** irclogbot_1 has joined #openstack-monasca | 13:03 | |
*** altlogbot_2 has joined #openstack-monasca | 13:07 | |
openstackgerrit | Merged openstack/monasca-agent master: Use proper naming for docker service image zuul job https://review.openstack.org/650882 | 13:14 |
openstackgerrit | Merged openstack/monasca-notification master: Use proper naming for docker service image zuul job https://review.openstack.org/650883 | 13:23 |
*** openstackgerrit has quit IRC | 14:14 | |
witek | Courtesy Monasca meeting reminder in #openstack-monasca: witek, jayahn,iurygregory,ezpz,igorn,haad,sc,joadavis, akiraY,tobiajo,dougsz_,fouadben, amofakhar, aagate, haruki,kaiokmo,pandiyan,charana,guilhermesp,chaconpiza,toabctl | 15:00 |
witek | #startmeeting monasca | 15:00 |
dougsz | hello all | 15:00 |
openstack | Meeting started Wed Apr 10 15:00:52 2019 UTC and is due to finish in 60 minutes. The chair is witek. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
*** openstack changes topic to " (Meeting topic: monasca)" | 15:00 | |
openstack | The meeting name has been set to 'monasca' | 15:00 |
witek | hi dougsz | 15:00 |
chaconpiza | Hi | 15:01 |
witek | hi chaconpiza | 15:01 |
bandorf | hi, everybody | 15:01 |
witek | hi | 15:01 |
witek | agenda for today: | 15:02 |
witek | https://etherpad.openstack.org/p/monasca-team-meeting-agenda | 15:02 |
witek | I don | 15:02 |
witek | sorry, let's start | 15:02 |
witek | #topic monasca-thresh replacement | 15:02 |
*** openstack changes topic to "monasca-thresh replacement (Meeting topic: monasca)" | 15:02 | |
witek | I started thinking about how we can replace monasca-thresh | 15:03 |
witek | as we urgently need to replace it | 15:03 |
witek | and so I looked at how Prometheus and Aodh do this | 15:04 |
witek | and they both don't work on streams but query from the DB | 15:04 |
witek | which is much easier to implement | 15:04 |
witek | and then I thought we could actually try to use what Prometheus offers | 15:05 |
witek | and came up with this document | 15:05 |
witek | https://docs.google.com/presentation/d/1tvllnWaridOG-t-qj9D2brddeQXsYNyZwoYUfby_3Ns/edit?usp=sharing | 15:05 |
witek | I've seen your first comments, thanks a lot for that | 15:05 |
witek | I'd like to start discussion, what do you think of that approach? is that plausible? | 15:06 |
bandorf | maybe we can discuss smaller topics first? and then conclude whether it's plausible? | 15:07 |
*** haru5ny has joined #openstack-monasca | 15:07 | |
witek | right, do we have to discuss if monasca-thresh should be replaced? | 15:08 |
chaconpiza | What about the upgrade from current solution to the new one using Prometheus for current clients? | 15:09 |
Dobroslaw | hi | 15:09 |
joadavis | hi Dobroslaw | 15:09 |
witek | chaconpiza: you mean, what operator would have to do to upgrade from one Monasca version to another? | 15:09 |
chaconpiza | yes | 15:10 |
bandorf | I propose to discuss this (migration) later, when a decision has been taken | 15:10 |
witek | the measurement schema would change, so although the data is saved in InfluxDB, some data migration would have to happen if the new functionality were required | 15:11 |
joadavis | well, if we keep the monasca api and just use prometheus for the thresholding and alarming, it might not be much change for a current client | 15:11 |
bandorf | Regarding your problem statement, witek: I agree with topics 1, 2 and 5. | 15:11 |
bandorf | Topic 4 (complex cluster): I can't really judge. | 15:11 |
bandorf | Topic 3 (high resource consumption): this is certainly true. However, I'm not sure if it is caused by Monasca itself or by Storm. | 15:12 |
Dobroslaw | I'm not sure if Prometheus actually will be lighter than storm... | 15:12 |
joadavis | yes, would definitely want to qualify performance | 15:13 |
joadavis | and footprint | 15:13 |
Dobroslaw | it would be nice to find someone using Prometheus in production who could tell us how many resources it uses, on average and during data spikes | 15:13 |
Dobroslaw | I found quite a few people complaining about memory usage | 15:14 |
dougsz | Dobroslaw: We're using it, I haven't benchmarked it yet, but I've frequently seen it at the top of `top` | 15:14 |
bandorf | Using "remote read" from influx causes some further overhead - don't know, to what extent | 15:15 |
Dobroslaw | and it doesn't have built-in max memory tuning options | 15:15 |
dougsz | In addition to extending alarm expression language (#2) we also have a requirement to include metadata with alarms | 15:15 |
Dobroslaw | I think I linked to a discussion about it, something like using 10x more memory per measurement... | 15:15 |
witek | dougsz: where does the metadata come from, and can that requirement be addressed with Prometheus? | 15:17 |
joadavis | I've talked to a few people who have the impression that Prometheus has a smaller footprint than Monasca, but I suspect that is relative to their install (or just marketing speak) | 15:17 |
dougsz | witek: For example, we want to create a Jira ticket for every log error message. The metadata would include a snippet of the error message. Not sure if it can be done with Prometheus either. I think the approach would be to use something like mtail to make logs scrapable. | 15:18 |
Dobroslaw | it's an invasive change, HA will need to be handled differently, and I'm not sure how to quickly test it with Monasca | 15:18 |
witek | Dobroslaw: what would be an alternative? | 15:20 |
Dobroslaw | unfortunately I don't have an alternative, I'm just raising an important point: Monasca would most likely be installed on the same machine as Prometheus | 15:21 |
Dobroslaw | and sharing resources with it | 15:21 |
joadavis | we may need a POC to show it can be done... | 15:22 |
witek | remote read is for sure an important aspect: Prometheus normally makes use of built-in aggregations, and in the proposed setup the calculation would have to be done on the complete dataset | 15:22 |
witek | complete dataset for a given alerting rule only of course, normally the last 10 minutes of data or so | 15:24 |
witek | dougsz: how do you use Prometheus, do you have many alerting rules? how much data? | 15:25 |
dougsz | We aren't using it at scale yet and we don't have a large number of alerting rules. | 15:27 |
dougsz | We've combined it with mtail to generate metrics from log messages | 15:27 |
dougsz | Currently we use Prometheus as the TSDB, no Influx yet | 15:28 |
dougsz | We use kolla-ansible for the deployment - there are quite a few exporters included in that out of the box | 15:30 |
witek | yes, for the collector part we should advertise the monasca-agent Prometheus plugin better | 15:31 |
witek | thanks dougsz | 15:32 |
dougsz | +1 - I think that's a big win - Prometheus exporters are generally pretty up-to-date and it's great we can take advantage of them from the Monasca Agent. | 15:32 |
witek | bandorf has commented on the delay until the alarm gets triggered | 15:32 |
witek | is that an issue? | 15:32 |
dougsz | I think it's a good point. | 15:33 |
witek | is it a requirement for anyone? | 15:33 |
bandorf | I had a brief discussion with Cristiano (Product Management) about this. His opinion was: In a typical OpenStack environment, it should be OK. In other scenarios (IoT-demo-fire alarm) it is not. | 15:34 |
dougsz | Generally we haven't used the buffering capabilities of Kafka too much, but it's slightly concerning that alarms could stop working if there was a large burst of metrics. | 15:34 |
joadavis | may depend on use case. Some of the auto-scaling/self healing may want faster alarming | 15:34 |
joadavis | to reduce downtimes and interruptions | 15:36 |
witek | I think the streaming based implementation would be much more complicated, requiring knowledge of Kafka Streams or Apache Storm | 15:37 |
witek | or not scalable, like monasca-aggregator | 15:38 |
witek | the only way to scale aggregator is to shard the data and consume from different Kafka topics | 15:39 |
witek | which is also a valid approach after all | 15:39 |
witek | I have one other concern about the Prometheus-based setup | 15:40 |
witek | Prometheus defines all its alerting rules and notifications via config files | 15:41 |
witek | there is no API for setting them | 15:41 |
witek | only query API to get the current configuration | 15:41 |
joadavis | yeah, that is a concern especially if we do an HA setup (keeping the config files in sync) | 15:42 |
joadavis | does changing a rule then require restarting the Prometheus service? | 15:42 |
witek | reloading | 15:43 |
witek | ok, let's sum up what we have on advantages: | 15:44 |
witek | * great community eco-system with many integrations | 15:45 |
witek | * very flexible alerting rules | 15:45 |
witek | * and query language for visualisations | 15:45 |
witek | * easy deployment | 15:46 |
witek | anything else? | 15:47 |
witek | disadvantages: | 15:48 |
dougsz | * could also monitor the monasca components directly? eg. alert if influxdb goes down | 15:48 |
witek | yes, I'm not sure if that's Prometheus specific | 15:49 |
joadavis | disadvantage: * potentially large footprint and resource usage | 15:49 |
joadavis | disadvantage: * no guaranteed delivery of metrics (requirement for billing systems, not as much a concern for alerting) | 15:50 |
witek | * remote read requires getting complete data chunks from InfluxDB for every evaluation | 15:50 |
joadavis | disadvantage: * no native HA support, requires work to design | 15:51 |
dougsz | disadvantage: * HA model for Prometheus server isn't totally clear (to me at least) | 15:51 |
witek | joadavis: well, with Kafka and InfluxDB we do get guaranteed delivery | 15:51 |
dougsz | disadvantage: * Alerting chain is even more complex. Eg. Monasca API -> Kafka -> Persister -> Influx -> Prometheus -> Alert manager | 15:52 |
bandorf | disadvantage: * longer latency time until alarm gets fired | 15:52 |
bandorf | unknown: * impact of 'remote read to influxdb' | 15:53 |
witek | I would also argue about the HA model: it's the same model as for InfluxDB, and we can use the API and Kafka to help make it better | 15:53 |
witek | disadvantage: no API for alerting rules and notifications, config based operation | 15:54 |
joadavis | I have a question about whether this puts Cassandra out of our design, but we are short on time so we can save that for another day | 15:54 |
witek | for this set up, we could not use Cassandra, it does not have remote read | 15:55 |
witek | OK, let's cut it here for now | 15:56 |
witek | let's quickly go through the other topics: | 15:56 |
witek | #topic Retirement of Openstack Ansible Monasca roles | 15:56 |
*** openstack changes topic to "Retirement of Openstack Ansible Monasca roles (Meeting topic: monasca)" | 15:56 | |
witek | http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004610.html | 15:56 |
witek | guimaluf: are you around? | 15:57 |
witek | unfortunately I don't know anyone using OSA | 15:57 |
witek | #topic Telemetry discussion | 15:58 |
*** openstack changes topic to "Telemetry discussion (Meeting topic: monasca)" | 15:58 | |
witek | http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004851.html | 15:58 |
witek | there was a kick-off meeting for the Telemetry project yesterday | 15:58 |
witek | with the new PTL | 15:58 |
witek | after nobody had stood for PTL in Train | 15:59 |
witek | anyway, they have considered whether they should continue to rely on Gnocchi or search for alternatives | 16:00 |
joadavis | I want us to have a good response for that | 16:00 |
joadavis | I need to write a thoughtful email back and recommend monasca-ceilometer :) | 16:00 |
witek | as Mark has written in his email, it would be good to maintain just one monitoring project in OpenStack | 16:01 |
dougsz | was just thinking about ceilosca | 16:01 |
joadavis | but we could also have larger discussions about where the monasca agent and ceilometer agent overlap and how to make mon-agent cover all | 16:01 |
witek | joadavis: do we want to sync about the answer to the mailing list? | 16:03 |
joadavis | sure. I can write a draft and send it to you, or you can | 16:03 |
witek | OK, ping you offline | 16:03 |
joadavis | with these kinds of questions I start thinking in pictures, but that is hard to do in text emails | 16:04 |
witek | #topic PTG | 16:04 |
*** openstack changes topic to "PTG (Meeting topic: monasca)" | 16:04 | |
witek | we have a conflict with self-healing session on the first day, Thursday | 16:04 |
witek | should we start our sessions on Friday? | 16:04 |
witek | and free the slot? | 16:05 |
dougsz | sounds sensible | 16:05 |
chaconpiza | +1 | 16:05 |
witek | joadavis: chaconpiza ? | 16:05 |
Dobroslaw | I'm not sure if chaconpiza will be returning on Friday | 16:05 |
chaconpiza | I will come back on Saturday, I found a good connection flight :) | 16:06 |
Dobroslaw | oh, great | 16:06 |
joadavis | I'm ok with that. I think one of our goals for this PTG should be working with other projects and SIGs | 16:06 |
witek | OK, thanks for joining today | 16:07 |
witek | and for good discussion | 16:07 |
witek | next week I'm on vacation | 16:07 |
witek | so could someone else please start the meeting | 16:07 |
witek | all from me, bye | 16:08 |
dougsz | Thanks all, and have a good vacation | 16:08 |
Dobroslaw | bye | 16:08 |
joadavis | bye | 16:08 |
haru5ny | thank you, bye. | 16:08 |
chaconpiza | Ok, enjoy the vacation. Bye. | 16:08 |
witek | #endmeeting | 16:08 |
*** openstack changes topic to "OpenStack Monitoring as a Service | https://wiki.openstack.org/wiki/Monasca" | 16:08 | |
openstack | Meeting ended Wed Apr 10 16:08:38 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:08 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/monasca/2019/monasca.2019-04-10-15.00.html | 16:08 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/monasca/2019/monasca.2019-04-10-15.00.txt | 16:08 |
openstack | Log: http://eavesdrop.openstack.org/meetings/monasca/2019/monasca.2019-04-10-15.00.log.html | 16:08 |
*** haru5ny has quit IRC | 16:08 | |
joadavis | One topic we didn't get to in the meeting - is anyone working on python3 support? I suspect we have some check tests running in Zuul which are labeled py3 but aren't actually executing the unit tests (at least for monasca-agent). | 16:21 |
*** altlogbot_2 has quit IRC | 16:45 | |
*** dougsz has quit IRC | 17:00 | |
*** witek has quit IRC | 17:44 | |
-openstackstatus- NOTICE: Restarting Gerrit on review.openstack.org to pick up new configuration for the replication plugin | 19:05 | |
*** bobh has joined #openstack-monasca | 21:39 | |
*** bobh has quit IRC | 21:47 | |
*** bobh has joined #openstack-monasca | 21:48 | |
*** bobh has quit IRC | 22:14 |
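
Note on the meeting discussion (15:04 and 15:24): witek contrasts stream-based thresholding with the query-based approach, where each alerting rule is evaluated periodically against only its own recent window of stored data (roughly the last 10 minutes). The following is a minimal sketch of that idea, not Monasca or Prometheus code. The `Measurement` class, the `evaluate_rule` function, the average-over-window rule and the default threshold are all hypothetical names and values chosen for illustration; the state names merely mirror the usual OK / ALARM / UNDETERMINED convention.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


@dataclass(frozen=True)
class Measurement:
    """One stored data point (hypothetical shape, for illustration only)."""
    timestamp: float  # seconds since epoch
    value: float


def evaluate_rule(
    measurements: Iterable[Measurement],
    now: float,
    window_seconds: float = 600.0,  # assumed: "the last 10 minutes of data or so"
    threshold: float = 90.0,        # assumed threshold, purely illustrative
) -> str:
    """Evaluate the rule avg(value) > threshold over the most recent window.

    In a real setup the window would come from a database query;
    here it is filtered from an in-memory iterable instead.
    """
    window = [
        m.value
        for m in measurements
        if now - window_seconds <= m.timestamp <= now
    ]
    if not window:
        # No data in the window: the rule cannot be decided either way.
        return "UNDETERMINED"
    return "ALARM" if mean(window) > threshold else "OK"
```

Because every evaluation re-reads the whole window, the cost grows with the number of rules and the amount of data per window, which is the remote-read overhead raised during the meeting. The delay before an alarm fires is also bounded below by the evaluation interval, which matches the latency concern bandorf raised.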