Tuesday, 2019-05-28

01:13 *** ricolin has joined #openstack-publiccloud
07:03 *** hberaud|gone is now known as hberaud
07:07 *** damien_r has joined #openstack-publiccloud
07:12 *** ricolin_ has joined #openstack-publiccloud
07:15 *** ricolin has quit IRC
07:20 *** gtema has joined #openstack-publiccloud
07:31 *** witek has joined #openstack-publiccloud
07:33 *** logan- has quit IRC
07:36 *** logan- has joined #openstack-publiccloud
09:56 *** hberaud is now known as hberaud|school-r
10:02 *** ricolin_ has quit IRC
10:04 *** hberaud|school-r is now known as hberaud|lunch
10:48 *** gtema has quit IRC
10:50 *** gtema has joined #openstack-publiccloud
11:13 *** tobias-urdin has joined #openstack-publiccloud
11:14 *** hberaud|lunch is now known as hberaud
11:32 *** gtema has quit IRC
12:02 *** ncastele has joined #openstack-publiccloud
12:15 *** ncastele has quit IRC
13:18 *** gtema has joined #openstack-publiccloud
13:56 *** damien_r has quit IRC
14:00 <tobberydberg> o/
14:00 <witek> hello
14:01 <tobberydberg> hi witek
14:03 <witek> just two of us today?
14:03 *** ncastele has joined #openstack-publiccloud
14:04 <tobberydberg> looks like the two of us at this point at least =)
14:04 <ncastele> A bit late but I'm here :)
14:04 <witek> hi ncastele
14:04 <ncastele> Hi :)
14:05 <tobias-urdin> o/
14:05 <tobberydberg> hi ncastele and tobias-urdin
14:05 <tobias-urdin> i missed the last meeting, but read through the meeting logs
14:05 <tobberydberg> +1
14:05 <tobberydberg> #startmeeting publiccloud_wg
14:05 <openstack> Meeting started Tue May 28 14:05:47 2019 UTC and is due to finish in 60 minutes.  The chair is tobberydberg. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:05 *** openstack changes topic to " (Meeting topic: publiccloud_wg)"
14:05 <openstack> The meeting name has been set to 'publiccloud_wg'
14:06 <ncastele> so what do we want to achieve by the end of this hour?
14:06 <tobberydberg> So, continue the discussions from last week
14:07 <tobberydberg> Wrap up from last week
14:07 <tobberydberg> we pretty much agreed that a good first step (phase 1) would be to focus on collecting data and storing the raw data
14:07 <tobberydberg> #link https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal
14:08 <ncastele> yep, then we discussed prometheus as a possible technical solution
14:08 <tobberydberg> Yes
14:08 <tobberydberg> I still like that idea ... have extremely limited experience though
14:09 <ncastele> that's my concern, I'm not enough into prometheus to feel comfortable about this solution
14:09 <ncastele> we should probably go deeper into our needs for collecting and storing data, and challenge those needs with someone who has a better overview/understanding of prometheus
14:09 <gtema> I personally think this is kind of a misuse of an existing solution for different purposes
14:09 <tobberydberg> I would say we should definitely first find the measurements that we need, and then choose a technology that can deliver them
14:09 <witek> I think we can split the discussion into two parts, collection and storage
14:10 <tobias-urdin> prometheus doesn't consider itself a reliable storage for data being used for billing, for example, according to their page iirc
14:10 <gtema> ++
14:10 <tobberydberg> agree with that witek
14:10 <ncastele> +1
14:10 <tobberydberg> tobias-urdin: exporting the data to another storage backend?
14:11 <tobberydberg> #link https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
14:11 <tobias-urdin> you mean using prometheus for just the scraping of data, then storing it some other place? might be an idea
14:11 <tobberydberg> yes
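[Editor's note: the remote-endpoints page linked above boils down to a `remote_write` block in `prometheus.yml` — keep Prometheus for scraping, forward every sample to a long-term store. A minimal sketch; the endpoint URL is a placeholder, not a real service:]

```yaml
# Hypothetical prometheus.yml fragment: Prometheus scrapes as usual,
# but streams all samples to an external TSDB via the remote-write API.
remote_write:
  - url: "https://tsdb.example.com/api/v1/write"
```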
14:11 <gtema> tobberydberg - nope, the purpose is totally different, where losing a few measurements is not a big deal
14:12 <tobias-urdin> personally I don't like scraping first, because you have to specify what to scrape and also, during issues, there is no queue-like behavior where you can retrieve data that you missed
14:12 <tobberydberg> (TBH ... ceilometer never gave us that reliability either ;-) )
14:13 <gtema> I prefer exporting data directly to any TSDB
14:13 <tobias-urdin> +1
14:13 <tobias-urdin> on that, i like the approach of having a tested solution do the collecting part and just writing integrations for openstack
14:14 <tobberydberg> you mean more the setup of ceilometer but with another storage?
14:14 <tobias-urdin> maybe for hypervisor-based data, scraping is optimal, for the reasons mnaser said about scaling
14:14 <tobias-urdin> which is pretty much what ceilometer does today
14:14 <tobias-urdin> central polling and agent polling on compute nodes
14:15 <tobberydberg> right
14:15 <ncastele> can we challenge a bit the usage of a TSDB for our purpose? I know that it seems obvious, but we are already using a TSDB on our side and it has some limitations
14:16 <witek> ncastele: what do you mean?
14:16 <tobias-urdin> what are you running? imo billing and tsdb don't necessarily need to be related, other than that we aggregate and save metrics to billing
14:17 <ncastele> we are using a TSDB to push heartbeats of our instances for billing purposes, and as we need a lot of information for each point (instance id, status, flavor, tenant, ...), with the volume of instances we are handling, it's hard to handle the load
14:17 <tobberydberg> "aggregate and save metrics to billing" that is what is important imo
14:18 <tobias-urdin> all depends on how much you need to "prove", i.e. how much do you want to aggregate
14:18 <witek> ncastele: a common solution to that is to push measurements to a StatsD daemon which does aggregations and saves to the TSDB
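[Editor's note: a rough sketch of witek's StatsD suggestion. StatsD clients emit plain-text datagrams over UDP; the daemon aggregates them per flush interval and writes a single point to the TSDB. The metric names and daemon address below are assumptions for illustration only:]

```python
import socket

def statsd_payload(metric, value, kind="c"):
    """Build a StatsD plain-text datagram; "c" = counter, "g" = gauge."""
    return f"{metric}:{value}|{kind}".encode()

def emit(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a local StatsD daemon (assumed address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

# e.g. one heartbeat per running instance; the daemon sums them per interval:
# emit(statsd_payload("instances.heartbeat", 1))
```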
14:19 <ncastele> when we are talking about metrics, which kind of metrics are you talking about? Just events (like creation, deletion, upsize, etc.), or more something like heartbeats (a complete overview of everything that is running in the infrastructure)?
14:19 <gtema> and coming from billing in telecoms: storing only aggregates is a very bad approach
14:20 <tobberydberg> I mean, there are those two categories of metrics ... some that are event-driven only and some that are more "scraping/heartbeat" driven
14:20 <gtema> the raw values always need to be stored, and later additionally aggregates for "invoicing purposes"
14:21 <tobias-urdin> fwiw, metrics when i speak is volume-based, so nothing with events, we pretty much only care about usage/volume and not events
14:22 <ncastele> raw data for usage in metrics could be pretty huge regarding the number of instances we are handling, tbh, even in a tsdb storage
14:22 <witek> gtema: that's a different type of aggregate
14:25 <tobberydberg> for example tobias-urdin ... how would you collect data for a virtual router? Usage? Or on an event basis?
14:26 <tobias-urdin> do you want data from the virtual router (bandwidth for example) or just the amount of virtual routers (or interfaces etc)?
14:26 <ncastele> it depends on your BM around virtual routers: do you plan to charge for the virtual router resource, or for their traffic?
14:26 <tobias-urdin> we go by the second one
14:27 <ncastele> bandwidth should always be billed (so stored/aggregated) by usage (number of GiB)
14:27 <ncastele> for instances, we can just bill the number of hours
14:28 <witek> btw. here the list of metrics provided by the OVS Monasca plugin
14:28 <witek> https://opendev.org/openstack/monasca-agent/src/branch/master/docs/Ovs.md
14:28 <ncastele> imo they're two different ways of collecting, we don't have to use the same algorithm/backend to store and collect hours of instances as GiB of bandwidth
14:28 <tobberydberg> both, bandwidth is of course usage ... but the existence of a router is something else
14:29 <tobias-urdin> ncastele: +1 i'd prefer a very good distinction between them both, because they can easily rack up a lot of space, especially if you want raw values
14:30 <ncastele> hours of instances/routers/etc. should not take a lot of space imo: we "just" need to store creation and deletion dates
14:31 <tobberydberg> +1
14:31 <tobias-urdin> regarding metrics that we polled each minute, they easily racked up to like 1 TB in a short timespan before we limited our scope
14:31 <tobberydberg> I mean, router existence can be aggregated as usage as well, but the calculation of that can easily be done via events as well, with less raw data
14:31 <tobias-urdin> then we swapped to gnocchi and we use pretty much nothing now, since we can aggregate on the interval we want
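[Editor's note: ncastele's point that instance hours need only a creation and a deletion date can be sketched as below. Rounding each started hour up is an assumed policy, not something agreed in the meeting:]

```python
from datetime import datetime, timezone
import math

def billable_hours(created_at, deleted_at=None, now=None):
    """Hours to bill for one resource, derived from two stored timestamps.

    A still-running resource (deleted_at is None) is billed up to `now`.
    Rounding partial hours up is an assumption; providers pick their own policy.
    """
    end = deleted_at or now or datetime.now(timezone.utc)
    seconds = max(0.0, (end - created_at).total_seconds())
    return math.ceil(seconds / 3600)
```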
14:34 <tobberydberg> so you are using ceilometer for all the collection of data today tobias-urdin?
14:35 <tobias-urdin> yes
14:35 <tobberydberg> with gnocchi then, and that works well for you for all kinds of resources?
14:36 *** dasp has quit IRC
14:36 <ncastele> same on our side: ceilometer for data collection, then some mongo db for aggregating, and long-term storage in a postgresql and a tsdb (and it's working, but we reached some limitations in this architecture)
14:37 <witek> ncastele: have you looked at Monasca?
14:38 <tobias-urdin> gnocchi is ok, but imo both ceilometer and gnocchi became more troublesome to use, could be simplified a little. our third-party billing engine also does some instance hour lookups against nova's simpleusage api
14:39 <ncastele> witek: nope, not yet unfortunately :/
14:40 <tobberydberg> So we all have a little bit of different setups and preferences when it comes to the collection AND storage parts. (just trying to get some kind of view of the situation here)
14:41 <ncastele> the approach we were thinking of before discovering this working group, to achieve per-second billing, was just some dirty SQL queries on the nova database to collect usage. The main issue with this approach is that it needs a specific implementation for each data collection
14:42 <tobias-urdin> ncastele: what kind of usage do you want to pull from the nova db?
14:42 <tobberydberg> yea, that can probably "work" for instances, but definitely not for neutron resources, since they are deleted from the database
14:43 <ncastele> tobberydberg: yes. We should probably take time, for each of us, to define our needs regarding collecting, so we will be able to address each of those needs with a solution more easily
14:43 <tobberydberg> don't think that is the way to solve it though :-)
14:43 <ncastele> don't think so either :)
14:43 <tobberydberg> I believe so too
14:44 <witek> in the long term, I think the services should instrument their code to provide application-specific metrics, these could be scraped or pushed to the monitoring system depending on a given architecture
14:44 <ncastele> tobias-urdin: we wanted to pull instance id, flavor, start, stop, because that's 90% of what we need to bill instances
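[Editor's note: ncastele's "dirty SQL" approach could look roughly like this. The column names mirror nova's `instances` table but should be treated as assumptions, and the snippet runs against an in-memory stand-in, not a real nova database:]

```python
import sqlite3

# Hypothetical sketch of pulling instance id, flavor, start and stop straight
# from a nova-like database. Column names (uuid, instance_type_id, created_at,
# deleted_at, project_id) are assumptions modelled on nova's instances table.
LIFETIME_QUERY = """
SELECT uuid, instance_type_id, created_at, COALESCE(deleted_at, :now)
FROM instances
WHERE project_id = :project AND created_at <= :now
"""

def instance_lifetimes(conn, project, now):
    """One (id, flavor, start, stop) row per instance; running ones stop at `now`."""
    return conn.execute(LIFETIME_QUERY, {"project": project, "now": now}).fetchall()

# Stand-in database so the sketch is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (uuid TEXT, instance_type_id TEXT, "
             "created_at TEXT, deleted_at TEXT, project_id TEXT)")
conn.execute("INSERT INTO instances VALUES "
             "('i-1', 'small', '2019-05-28 00:00', NULL, 'p1')")
rows = instance_lifetimes(conn, "p1", "2019-05-28 14:00")
```

As noted in the discussion, this only "works" for resources whose rows survive deletion (nova soft-deletes); neutron resources are removed outright, which is one reason the per-service SQL approach does not generalise.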
14:44 <witek> the code instrumentation is a flexible and future-proof approach, the monitoring will evolve together with the application
14:45 <tobberydberg> My suggestion is that we get a summarised list of the metrics we need to measure, it is not all about instances, and how these can be measured (scraping or what not)
14:46 <tobberydberg> Do you think that is a potential way forward? I'm open for any suggestions
14:47 <ncastele> that's a good start. We will not cover all resources/services, but that's a good way to focus on those we need to go forward
14:47 <tobberydberg> witek: Agree that would be the best solution, but I'm pretty sure that won't happen any time soon
14:48 <witek> tobberydberg: it could be the mission of this group to drive and coordinate, there are other groups interested as well, like e.g. self-healing
14:49 <tobberydberg> witek: absolutely, might be that we come to that conclusion after all
14:50 <tobberydberg> so, added this section "Metrics we need" under the section "Limitation of scope"
14:50 <witek> sounds good
14:50 <ncastele> +1
14:51 <ncastele> can we plan to fill it in for the next meeting?
14:51 <tobberydberg> Would be good if everyone can try to identify and contribute to this section before the next meeting, which will be next Thursday at the same time (1400 UTC) in this channel
14:51 <tobberydberg> you were quicker than me ncastele :-)
14:52 <ncastele> :)
14:52 <tobias-urdin> cool, sounds like a good next step
14:55 <tobberydberg> added some examples of a suggestion of how to structure that: resource, units, how to collect the data ... feel free to change and structure it in whatever way you feel works best
14:55 <tobberydberg> Anything else someone wants to raise before we close today's meeting?
14:56 <ncastele> not on my side :)
14:59 <witek> thanks tobberydberg
14:59 <witek> see you next week
14:59 <ncastele> thanks for this exceptional meeting
14:59 <ncastele> see u next week
14:59 <tobias-urdin> not really, we might want some heads up to see what the ceilometer-reboot comes up with
14:59 <tobias-urdin> thanks tobberydberg
14:59 <tobberydberg> thanks for today folks! Talk next week!
14:59 <tobberydberg> indeed
15:00 <tobberydberg> #endmeeting
15:00 *** openstack changes topic to "New meeting time!! Thursday odd weeks at 1400 UTC in this channel!!"
15:00 <openstack> Meeting ended Tue May 28 15:00:08 2019 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
15:00 <openstack> Minutes:        http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.html
15:00 <openstack> Minutes (text): http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.txt
15:00 <openstack> Log:            http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.log.html
15:32 *** hberaud is now known as hberaud|school-r
15:33 *** ncastele has quit IRC
15:39 *** hberaud|school-r is now known as hberaud
15:53 *** witek has quit IRC
16:10 *** ricolin_ has joined #openstack-publiccloud
16:28 *** gtema has quit IRC
17:00 *** ricolin_ has quit IRC
17:01 *** hberaud is now known as hberaud|gone
17:17 *** irclogbot_0 has quit IRC
17:19 *** irclogbot_0 has joined #openstack-publiccloud
20:20 *** dasp has joined #openstack-publiccloud
20:38 *** dasp has quit IRC
20:38 *** dasp has joined #openstack-publiccloud
