*** ricolin has joined #openstack-publiccloud | 01:13 | |
*** hberaud|gone is now known as hberaud | 07:03 | |
*** damien_r has joined #openstack-publiccloud | 07:07 | |
*** ricolin_ has joined #openstack-publiccloud | 07:12 | |
*** ricolin has quit IRC | 07:15 | |
*** gtema has joined #openstack-publiccloud | 07:20 | |
*** witek has joined #openstack-publiccloud | 07:31 | |
*** logan- has quit IRC | 07:33 | |
*** logan- has joined #openstack-publiccloud | 07:36 | |
*** hberaud is now known as hberaud|school-r | 09:56 | |
*** ricolin_ has quit IRC | 10:02 | |
*** hberaud|school-r is now known as hberaud|lunch | 10:04 | |
*** gtema has quit IRC | 10:48 | |
*** gtema has joined #openstack-publiccloud | 10:50 | |
*** tobias-urdin has joined #openstack-publiccloud | 11:13 | |
*** hberaud|lunch is now known as hberaud | 11:14 | |
*** gtema has quit IRC | 11:32 | |
*** ncastele has joined #openstack-publiccloud | 12:02 | |
*** ncastele has quit IRC | 12:15 | |
*** gtema has joined #openstack-publiccloud | 13:18 | |
*** damien_r has quit IRC | 13:56 | |
tobberydberg | o/ | 14:00 |
witek | hello | 14:00 |
tobberydberg | hi witek | 14:01 |
witek | just two of us today? | 14:03 |
*** ncastele has joined #openstack-publiccloud | 14:03 | |
tobberydberg | looks like the 2 of us at this point at least =) | 14:04 |
ncastele | A bit late but I'm here :) | 14:04 |
witek | hi ncastele | 14:04 |
ncastele | Hi :) | 14:04 |
tobias-urdin | o/ | 14:05 |
tobberydberg | hi ncastele and tobias-urdin | 14:05 |
tobias-urdin | i missed the last meeting, but read through the meeting logs | 14:05 |
tobberydberg | +1 | 14:05 |
tobberydberg | #startmeeting publiccloud_wg | 14:05 |
openstack | Meeting started Tue May 28 14:05:47 2019 UTC and is due to finish in 60 minutes. The chair is tobberydberg. Information about MeetBot at http://wiki.debian.org/MeetBot. | 14:05 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 14:05 |
*** openstack changes topic to " (Meeting topic: publiccloud_wg)" | 14:05 | |
openstack | The meeting name has been set to 'publiccloud_wg' | 14:05 |
ncastele | so what do we want to achieve by the end of this hour? | 14:06 |
tobberydberg | So, continue the discussions from last week | 14:06 |
tobberydberg | Wrap up from last week | 14:07 |
tobberydberg | we agreed pretty much on that a first good step (phase 1) would be to focus on collecting data and storing the raw data | 14:07 |
tobberydberg | #link https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal | 14:07 |
ncastele | yep, then we discussed prometheus as a possible technical solution | 14:08 |
tobberydberg | Yes | 14:08 |
tobberydberg | I still like that idea .... have extremely limited experience though | 14:08 |
ncastele | that's my concern, I'm not enough into prometheus to feel comfortable about this solution | 14:09 |
ncastele | we should probably go deeper into our needs for collecting and storing data, and challenge those needs with someone who has a better overview/understanding of prometheus | 14:09 |
gtema | I personally think this is kind of misuse of the existing solution for different purposes | 14:09 |
tobberydberg | I would say, we should definitely first of all find the measurements that we need, and then choose technology that can solve that | 14:09 |
witek | I think we can split the discussion into two parts, collection and storage | 14:09 |
tobias-urdin | prometheus doesn't consider itself reliable storage for data used for billing, for example, according to their page iirc | 14:10 |
gtema | ++ | 14:10 |
tobberydberg | agree with that witek | 14:10 |
ncastele | +1 | 14:10 |
tobberydberg | tobias-urdin exporting the data to another storage backend? | 14:10 |
tobberydberg | #link https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage | 14:11 |
tobias-urdin | you mean using prometheus for just the scraping of data then store it some other place, might be an idea | 14:11 |
tobberydberg | yes | 14:11 |
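To make the scrape-then-forward idea above concrete: a minimal sketch of a custom usage exporter, assuming the prometheus_client Python library. get_running_instances() and the metric name are hypothetical placeholders, and Prometheus would hand the scraped samples to a long-term backend via the remote-write integrations linked above.

```python
# Minimal sketch of a custom usage exporter that Prometheus could scrape and
# then forward to a long-term backend via remote_write (see the integrations
# link above). Assumes the prometheus_client library; get_running_instances()
# is a hypothetical placeholder for whatever provides the usage data.
import time

from prometheus_client import Gauge, start_http_server

INSTANCE_RUNNING = Gauge(
    'billing_instance_running',
    'Whether an instance is currently running (1) or stopped/absent (0)',
    ['project_id', 'instance_id', 'flavor'],
)


def get_running_instances():
    """Hypothetical placeholder: return (project_id, instance_id, flavor) tuples."""
    return []


if __name__ == '__main__':
    start_http_server(9100)  # expose /metrics for Prometheus to scrape
    while True:
        for project_id, instance_id, flavor in get_running_instances():
            INSTANCE_RUNNING.labels(project_id, instance_id, flavor).set(1)
        time.sleep(60)  # refresh roughly once per scrape interval
```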
gtema | tobberydberg - nope, the purpose is totally different, where losing a few measurements is not a big deal | 14:11 |
tobias-urdin | personally I don't like scraping, first because you have to specify what to scrape, and also during issues there is no queue-like behavior where you can retrieve data that you missed | 14:12 |
tobberydberg | (TBH ... ceilometer never gave us that reliability either ;-) ) | 14:12 |
gtema | I prefer exporting data directly to any TSDB | 14:13 |
tobias-urdin | +1 | 14:13 |
tobias-urdin | on that, i like the approach of having a tested solution do the collecting part and just writing integrations for openstack | 14:13 |
tobberydberg | you mean more the setup of ceilometer but with another storage? | 14:14 |
tobias-urdin | maybe for hypervisor based data, scraping is optimal, for the reasons mnaser said about scaling | 14:14 |
tobias-urdin | which is pretty much what ceilometer does today | 14:14 |
tobias-urdin | central polling and agent polling on compute nodes | 14:14 |
tobberydberg | right | 14:15 |
ncastele | can we challenge a bit the usage of a TSDB for our purpose? I know that it seems obvious, but we are already using a TSDB on our side and it has some limitations | 14:15 |
witek | ncastele: what do you mean? | 14:16 |
tobias-urdin | what are you running? imo billing and tsdb don't necessarily need to be related other than we aggregate and save metrics to billing | 14:16 |
ncastele | we are using a TSDB to push heartbeats of our instances for billing purposes, and as we need a lot of information for each point (instance id, status, flavor, tenant, ...), with the volume of instances we are handling, it's hard to handle the load | 14:17 |
tobberydberg | "aggregate and save metrics to billing" that is what is important imo | 14:17 |
tobias-urdin | all depends on how much you need to "prove" i.e how much do you want to aggregate | 14:18 |
witek | ncastele: a common solution to that is to push measurements to a StatsD daemon which does aggregations and saves to the TSDB | 14:18 |
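As a rough illustration of witek's StatsD suggestion, a sketch assuming the `statsd` Python client package; the host, port, prefix, and metric names are made-up examples, not anything agreed here.

```python
# Rough illustration of pushing per-instance heartbeats to a StatsD daemon,
# which aggregates before flushing to the TSDB (per witek's suggestion).
# Assumes the `statsd` Python client; host, port, prefix and metric names
# are made-up examples.
import statsd

client = statsd.StatsClient('localhost', 8125, prefix='billing')


def report_heartbeat(project_id, flavor):
    # One increment per running instance per reporting interval; StatsD
    # flushes the aggregated count to the TSDB instead of every raw sample.
    client.incr('instances.%s.%s' % (project_id, flavor))
```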
ncastele | when we are talking about metrics, which kind of metrics are you talking about? Just events (like creation, deletion, upsize, etc.), or more something like heartbeat (a complete overview of everything that is running into the infrastructure) | 14:19 |
ncastele | ? | 14:19 |
gtema | and coming from billing in telecoms: storing only aggregates is a very bad approach | 14:19 |
tobberydberg | I mean, there are those two categories of metrics ... some that are event driven only and those that are more "scraping/heartbeat" driven | 14:20 |
gtema | always the raw values need to be stored, and later additionally aggregates for "invoicing purposes" | 14:20 |
tobias-urdin | fwiw when i say metrics i mean volume-based, so nothing with events, we pretty much only care about usage/volume and not events | 14:21 |
ncastele | raw usage data in metrics could be pretty huge given the number of instances we are handling, tbh, even in tsdb storage | 14:22 |
witek | gtema: that's a different type of aggregate | 14:22 |
tobberydberg | for example tobias-urdin ... how would you collect data for a virtual router? Usage? Or on an event basis? | 14:25 |
tobias-urdin | do you want data from the virtual router (bandwidth for example) or just the amount of virtual routers (or interfaces etc)? | 14:26 |
ncastele | it depends on your BM around virtual routers: do you plan to charge for the virtual router resource, or its traffic? | 14:26 |
tobias-urdin | we go by the second one | 14:26 |
ncastele | bandwidth should always be billed (so stored/aggregated) by usage (number of GiB) | 14:27 |
ncastele | for instances, we can just bill the number of hours | 14:27 |
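Just to make the GiB arithmetic concrete, a tiny hedged example of turning a raw byte counter into billable GiB; the round-up policy is an assumption, not something the group decided.

```python
# Tiny example of turning a raw byte counter into billable GiB.
# Rounding up to the next whole GiB is an assumption, not an agreed policy.
import math

GIB = 1024 ** 3


def billable_gib(bytes_transferred):
    return math.ceil(bytes_transferred / GIB)


# billable_gib(5_500_000_000) -> 6  (about 5.12 GiB, billed as 6)
```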
witek | btw. here the list of metrics provided by OVS Monasca plugin | 14:28 |
witek | https://opendev.org/openstack/monasca-agent/src/branch/master/docs/Ovs.md | 14:28 |
ncastele | imo those are two different ways of collecting, we don't have to use the same algorithm/backend to store and collect instance hours as we do for GiB of bandwidth | 14:28 |
tobberydberg | both, bandwidth is of course usage ... but the existence of a router is something else | 14:28 |
tobias-urdin | ncastele: +1 i'd prefer a very good distinction between the two because they can easily rack up a lot of space, especially if you want raw values | 14:29 |
ncastele | hours of instances/routers/etc. should not take a lot of space imo: we "just" need to store creation and deletion dates | 14:30 |
tobberydberg | +1 | 14:31 |
tobias-urdin | regarding metrics that we polled each minute, they easily racked up to like 1 TB in a short timespan before we limited our scope | 14:31 |
tobberydberg | I mean, router existence can be aggregated as usage as well, but the calculation of that can easily be done via events as well, with less raw data | 14:31 |
tobias-urdin | then we swapped to gnocchi and we use pretty much nothing now since we can aggregate on the interval we want | 14:31 |
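A minimal sketch of the event-based calculation tobberydberg describes above: billable hours derived only from a resource's creation and deletion timestamps, so no per-minute raw samples are needed. Rounding up to whole hours is an assumption made for illustration.

```python
# Minimal sketch of the event-based calculation: billable hours for a
# resource (instance, router, ...) derived only from its creation and
# deletion timestamps, so no per-minute raw samples are needed.
# Rounding up to whole hours is an assumption made for illustration.
import math
from datetime import datetime, timezone


def billable_hours(created_at, deleted_at=None, now=None):
    now = now or datetime.now(timezone.utc)
    end = deleted_at or now  # still running: bill up to "now"
    seconds = max((end - created_at).total_seconds(), 0)
    return math.ceil(seconds / 3600)


# created 2019-05-01 10:00 UTC, deleted 2019-05-01 12:30 UTC -> 3 hours
```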
tobberydberg | so you are using ceilometer for all the collection of data today tobias-urdin ? | 14:34 |
tobias-urdin | yes | 14:35 |
tobberydberg | with gnocchi then, and that works well for you for all kinds of resources? | 14:35 |
*** dasp has quit IRC | 14:36 | |
ncastele | same on our side: ceilometer for data collection, then some MongoDB for aggregating, and long-term storage in PostgreSQL and a TSDB (and it's working, but we reached some limitations in this architecture) | 14:36 |
witek | ncastele: have you looked at Monasca? | 14:37 |
tobias-urdin | gnocchi is ok, but imo both ceilometer and gnocchi became more troublesome to use, could be simplified a little. our third-party billing engine also does some instance hour lookups against nova's simpleusage api | 14:38 |
ncastele | witek nope, not yet unfortunately :/ | 14:39 |
tobberydberg | So we all have slightly different setups and preferences when it comes to the collection AND storage parts. (just trying to get some kind of view of the situation here) | 14:40 |
ncastele | the approach we were thinking of before discovering this working group, to achieve per-second billing, was just some dirty SQL queries on the nova database to collect usage. The main issue with this approach is that it needs a specific implementation for each data collection | 14:41 |
tobias-urdin | ncastele: what kind of usage do you want to pull from the nova db? | 14:42 |
tobberydberg | yea, that can probably "work" for instances, but definitely not for neutron resources since they are deleted from the database | 14:42 |
ncastele | tobberydberg yes. We should probably take time, for each of us, to define our needs regarding collection so we will be able to address each of those needs more easily with a solution | 14:43 |
tobberydberg | don't think that is the way to solve it though :-) | 14:43 |
ncastele | don't think either :) | 14:43 |
tobberydberg | I believe so too | 14:43 |
witek | in the long term, I think the services should instrument their code to provide application-specific metrics; these could be scraped or pushed to the monitoring system depending on a given architecture | 14:44 |
ncastele | tobias-urdin: we wanted to pull instance id, flavor, start, stop, because that's 90% of what we need to bill instances | 14:44 |
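For reference, the kind of lookup tobias-urdin mentioned earlier (nova's simple usage API) can cover most of that without raw SQL; a sketch assuming python-novaclient and an already-authenticated keystoneauth session, with the project id and time window as placeholders.

```python
# Sketch of pulling instance usage (hours, flavor, start/stop) through
# nova's os-simple-tenant-usage API instead of raw SQL on the nova DB.
# Assumes python-novaclient and an already-authenticated keystoneauth
# session; the project id and time window below are placeholders.
from datetime import datetime

from novaclient import client as nova_client


def project_usage(session, project_id, start, end):
    nova = nova_client.Client('2.1', session=session)
    usage = nova.usage.get(project_id, start, end)
    for server in getattr(usage, 'server_usages', []):
        print(server['name'], server['flavor'], server['hours'])
    return getattr(usage, 'total_hours', 0)


# project_usage(sess, 'PROJECT_ID', datetime(2019, 5, 1), datetime(2019, 6, 1))
```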
witek | the code instrumentation is a flexible and future-proof approach, the monitoring will evolve together with the application | 14:44 |
tobberydberg | My suggestion is that we get a summarised list of the metrics we need to measure (it is not all about instances) and how these can be measured (scraping or what not) | 14:45 |
tobberydberg | Do you think that is a potential way forward? I'm open for any suggestions | 14:46 |
ncastele | that's a good start. We will not cover all resources/services, but that's a good way to focus on those we need to go forward | 14:47 |
tobberydberg | witek Agree that would be the best solution, but I'm pretty sure that won't happen any time soon | 14:47 |
witek | tobberydberg: it could be the mission of this group to drive and coordinate, there are other groups interested as well, like e.g. self-healing | 14:48 |
tobberydberg | witek absolutely, might be that we come to that conclusion after all | 14:49 |
tobberydberg | so, added this section "Metrics we need" under the section "Limitation of scope" | 14:50 |
witek | sounds good | 14:50 |
ncastele | +1 | 14:50 |
ncastele | can we plan to fill it in before the next meeting? | 14:51 |
tobberydberg | Would be good if everyone can try to identify and contribute to this section before the next meeting, which will be next Thursday at the same time (1400 UTC) in this channel | 14:51 |
tobberydberg | you were quicker than me ncastele :-) | 14:51 |
ncastele | :) | 14:52 |
tobias-urdin | cool, sounds like good next step | 14:52 |
tobberydberg | added some examples of a suggested structure for that: resource, units, how to collect the data ... feel free to change and restructure it in whatever way you feel works best | 14:55 |
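Purely as an illustration of the resource/units/collection structure mentioned above, a hypothetical sketch; the entries are examples, not the agreed list from the etherpad.

```python
# Purely illustrative sketch of the "resource / unit / how to collect"
# structure mentioned above; the entries are hypothetical examples, not
# the agreed list from the etherpad.
METRICS_WE_NEED = [
    {'resource': 'instance', 'unit': 'hours', 'collect': 'create/delete events'},
    {'resource': 'router', 'unit': 'hours', 'collect': 'create/delete events'},
    {'resource': 'bandwidth', 'unit': 'GiB', 'collect': 'polled byte counters'},
    {'resource': 'volume', 'unit': 'GiB-hours', 'collect': 'polled size over lifetime'},
]
```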
tobberydberg | Anything else someone wants to raise before we close today's meeting? | 14:55 |
ncastele | not on my side :) | 14:56 |
witek | thanks tobberydberg | 14:59 |
witek | see you next week | 14:59 |
ncastele | thanks for this exceptional meeting | 14:59 |
ncastele | see u next week | 14:59 |
tobias-urdin | not really, we might want some heads up to see what the ceilometer-reboot comes up with | 14:59 |
tobias-urdin | thanks tobberydberg | 14:59 |
tobberydberg | thanks for today folks! Talk next week! | 14:59 |
tobberydberg | indeed | 14:59 |
tobberydberg | #endmeeting | 15:00 |
*** openstack changes topic to "New meeting time!! Thursday odd weeks at 1400 UTC in this channel!!" | 15:00 | |
openstack | Meeting ended Tue May 28 15:00:08 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.html | 15:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.txt | 15:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.log.html | 15:00 |
*** hberaud is now known as hberaud|school-r | 15:32 | |
*** ncastele has quit IRC | 15:33 | |
*** hberaud|school-r is now known as hberaud | 15:39 | |
*** witek has quit IRC | 15:53 | |
*** ricolin_ has joined #openstack-publiccloud | 16:10 | |
*** gtema has quit IRC | 16:28 | |
*** ricolin_ has quit IRC | 17:00 | |
*** hberaud is now known as hberaud|gone | 17:01 | |
*** irclogbot_0 has quit IRC | 17:17 | |
*** irclogbot_0 has joined #openstack-publiccloud | 17:19 | |
*** dasp has joined #openstack-publiccloud | 20:20 | |
*** dasp has quit IRC | 20:38 | |
*** dasp has joined #openstack-publiccloud | 20:38 |