Tuesday, 2019-05-28

01:13 *** ricolin has joined #openstack-publiccloud
07:03 *** hberaud|gone is now known as hberaud
07:07 *** damien_r has joined #openstack-publiccloud
07:12 *** ricolin_ has joined #openstack-publiccloud
07:15 *** ricolin has quit IRC
07:20 *** gtema has joined #openstack-publiccloud
07:31 *** witek has joined #openstack-publiccloud
07:33 *** logan- has quit IRC
07:36 *** logan- has joined #openstack-publiccloud
09:56 *** hberaud is now known as hberaud|school-r
10:02 *** ricolin_ has quit IRC
10:04 *** hberaud|school-r is now known as hberaud|lunch
10:48 *** gtema has quit IRC
10:50 *** gtema has joined #openstack-publiccloud
11:13 *** tobias-urdin has joined #openstack-publiccloud
11:14 *** hberaud|lunch is now known as hberaud
11:32 *** gtema has quit IRC
12:02 *** ncastele has joined #openstack-publiccloud
12:15 *** ncastele has quit IRC
13:18 *** gtema has joined #openstack-publiccloud
13:56 *** damien_r has quit IRC
14:00 <tobberydberg> o/
14:00 <witek> hello
14:01 <tobberydberg> hi witek
14:03 <witek> just two of us today?
14:03 *** ncastele has joined #openstack-publiccloud
14:04 <tobberydberg> looks like the two of us at this point at least =)
14:04 <ncastele> A bit late but I'm here :)
14:04 <witek> hi ncastele
14:04 <ncastele> Hi :)
14:05 <tobias-urdin> o/
14:05 <tobberydberg> hi ncastele and tobias-urdin
14:05 <tobias-urdin> i missed the last meeting, but read through the meeting logs
14:05 <tobberydberg> +1
14:05 <tobberydberg> #startmeeting publiccloud_wg
14:05 <openstack> Meeting started Tue May 28 14:05:47 2019 UTC and is due to finish in 60 minutes.  The chair is tobberydberg. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:05 *** openstack changes topic to " (Meeting topic: publiccloud_wg)"
14:05 <openstack> The meeting name has been set to 'publiccloud_wg'
14:06 <ncastele> so what do we want to achieve by the end of this hour?
14:06 <tobberydberg> So, continue the discussions from last week
14:07 <tobberydberg> Wrap up from last week
14:07 <tobberydberg> we pretty much agreed that a good first step (phase 1) would be to focus on collecting data and storing the raw data
14:07 <tobberydberg> #link https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal
14:08 <ncastele> yep, then we discussed prometheus as a possible technical solution
14:08 <tobberydberg> Yes
14:08 <tobberydberg> I still like that idea ... have extremely limited experience though
14:09 <ncastele> that's my concern, I'm not enough into prometheus to feel comfortable about this solution
14:09 <ncastele> we should probably go deeper into our needs for collecting and storing data, and challenge those needs with someone who has a better overview/understanding of prometheus
14:09 <gtema> I personally think this is kind of a misuse of an existing solution for different purposes
14:09 <tobberydberg> I would say we should definitely first find the measurements that we need, and then choose a technology that can deliver them
14:09 <witek> I think we can split the discussion into two parts, collection and storage
14:10 <tobias-urdin> prometheus doesn't consider itself a reliable storage for data being used for billing, for example, according to their page iirc
14:10 <gtema> ++
14:10 <tobberydberg> agree with that witek
14:10 <ncastele> +1
14:10 <tobberydberg> tobias-urdin: exporting the data to another storage backend?
14:11 <tobberydberg> #link https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
14:11 <tobias-urdin> you mean using prometheus for just the scraping of data, then storing it some other place? might be an idea
14:11 <tobberydberg> yes
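[Editor's note: the remote-endpoints page linked above boils down to a `remote_write` block in `prometheus.yml` — keep Prometheus for scraping, forward every sample to a long-term store. A minimal sketch; the endpoint URL is a placeholder, not a real service:]

```yaml
# Hypothetical prometheus.yml fragment: Prometheus scrapes as usual,
# but streams all samples to an external TSDB via the remote-write API.
remote_write:
  - url: "https://tsdb.example.com/api/v1/write"
```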
14:11 <gtema> tobberydberg - nope, the purpose is totally different, where losing a few measurements is not a big deal
14:12 <tobias-urdin> personally I don't like scraping first, because you have to specify what to scrape and also, during issues, there is no queue-like behavior where you can retrieve data that you missed
14:12 <tobberydberg> (TBH ... ceilometer never gave us that reliability either ;-) )
14:13 <gtema> I prefer exporting data directly to any TSDB
14:13 <tobias-urdin> +1
14:13 <tobias-urdin> on that, i like the approach of having a tested solution do the collecting part and just writing integrations for openstack
14:14 <tobberydberg> you mean more the setup of ceilometer but with another storage?
14:14 <tobias-urdin> maybe for hypervisor-based data, scraping is optimal, for the reasons mnaser said about scaling
14:14 <tobias-urdin> which is pretty much what ceilometer does today
14:14 <tobias-urdin> central polling and agent polling on compute nodes
14:15 <tobberydberg> right
14:15 <ncastele> can we challenge a bit the usage of a TSDB for our purpose? I know that it seems obvious, but we are already using a TSDB on our side and it has some limitations
14:16 <witek> ncastele: what do you mean?
14:16 <tobias-urdin> what are you running? imo billing and tsdb don't necessarily need to be related, other than that we aggregate and save metrics to billing
14:17 <ncastele> we are using a TSDB to push heartbeats of our instances for billing purposes, and as we need a lot of information for each point (instance id, status, flavor, tenant, ...), with the volume of instances we are handling, it's hard to handle the load
14:17 <tobberydberg> "aggregate and save metrics to billing" that is what is important imo
14:18 <tobias-urdin> all depends on how much you need to "prove", i.e. how much do you want to aggregate
14:18 <witek> ncastele: a common solution to that is to push measurements to a StatsD daemon which does aggregations and saves to the TSDB
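[Editor's note: a rough sketch of witek's StatsD suggestion. StatsD clients emit plain-text datagrams over UDP; the daemon aggregates them per flush interval and writes a single point to the TSDB. The metric names and daemon address below are assumptions for illustration only:]

```python
import socket

def statsd_payload(metric, value, kind="c"):
    """Build a StatsD plain-text datagram; "c" = counter, "g" = gauge."""
    return f"{metric}:{value}|{kind}".encode()

def emit(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a local StatsD daemon (assumed address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

# e.g. one heartbeat per running instance; the daemon sums them per interval:
# emit(statsd_payload("instances.heartbeat", 1))
```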
14:19 <ncastele> when we are talking about metrics, which kind of metrics are you talking about? Just events (like creation, deletion, upsize, etc.), or more something like heartbeats (a complete overview of everything that is running in the infrastructure)?
14:19 <gtema> and coming from billing in telecoms: storing only aggregates is a very bad approach
14:20 <tobberydberg> I mean, there are those two categories of metrics ... some that are event-driven only and some that are more "scraping/heartbeat" driven
14:20 <gtema> the raw values always need to be stored, and later additionally aggregates for "invoicing purposes"
14:21 <tobias-urdin> fwiw, metrics when i speak is volume-based, so nothing with events, we pretty much only care about usage/volume and not events
14:22 <ncastele> raw data for usage in metrics could be pretty huge regarding the number of instances we are handling, tbh, even in a tsdb storage
14:22 <witek> gtema: that's a different type of aggregate
14:25 <tobberydberg> for example tobias-urdin ... how would you collect data for a virtual router? Usage? Or on an event basis?
14:26 <tobias-urdin> do you want data from the virtual router (bandwidth for example) or just the amount of virtual routers (or interfaces etc)?
14:26 <ncastele> it depends on your BM around virtual routers: do you plan to charge for the virtual router resource, or for their traffic?
14:26 <tobias-urdin> we go by the second one
14:27 <ncastele> bandwidth should always be billed (so stored/aggregated) by usage (number of GiB)
14:27 <ncastele> for instances, we can just bill the number of hours
14:28 <witek> btw. here the list of metrics provided by the OVS Monasca plugin
14:28 <witek> https://opendev.org/openstack/monasca-agent/src/branch/master/docs/Ovs.md
14:28 <ncastele> imo they're two different ways of collecting, we don't have to use the same algorithm/backend to store and collect hours of instances as GiB of bandwidth
14:28 <tobberydberg> both, bandwidth is of course usage ... but the existence of a router is something else
14:29 <tobias-urdin> ncastele: +1 i'd prefer a very good distinction between them both, because they can easily rack up a lot of space, especially if you want raw values
14:30 <ncastele> hours of instances/routers/etc. should not take a lot of space imo: we "just" need to store creation and deletion dates
14:31 <tobberydberg> +1
14:31 <tobias-urdin> regarding metrics that we polled each minute, they easily racked up to like 1 TB in a short timespan before we limited our scope
14:31 <tobberydberg> I mean, router existence can be aggregated as usage as well, but the calculation of that can easily be done via events as well, with less raw data
14:31 <tobias-urdin> then we swapped to gnocchi and we use pretty much nothing now, since we can aggregate on the interval we want
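[Editor's note: ncastele's point that instance hours need only a creation and a deletion date can be sketched as below. Rounding each started hour up is an assumed policy, not something agreed in the meeting:]

```python
from datetime import datetime, timezone
import math

def billable_hours(created_at, deleted_at=None, now=None):
    """Hours to bill for one resource, derived from two stored timestamps.

    A still-running resource (deleted_at is None) is billed up to `now`.
    Rounding partial hours up is an assumption; providers pick their own policy.
    """
    end = deleted_at or now or datetime.now(timezone.utc)
    seconds = max(0.0, (end - created_at).total_seconds())
    return math.ceil(seconds / 3600)
```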
14:34 <tobberydberg> so you are using ceilometer for all the collection of data today tobias-urdin?
14:35 <tobias-urdin> yes
14:35 <tobberydberg> with gnocchi then, and that works well for you for all kinds of resources?
14:36 *** dasp has quit IRC
14:36 <ncastele> same on our side: ceilometer for data collection, then some mongo db for aggregating, and long-term storage in a postgresql and a tsdb (and it's working, but we reached some limitations in this architecture)
14:37 <witek> ncastele: have you looked at Monasca?
14:38 <tobias-urdin> gnocchi is ok, but imo both ceilometer and gnocchi became more troublesome to use, could be simplified a little. our third-party billing engine also does some instance hour lookups against nova's simpleusage api
14:39 <ncastele> witek: nope, not yet unfortunately :/
14:40 <tobberydberg> So we all have a little bit of different setups and preferences when it comes to the collection AND storage parts. (just trying to get some kind of view of the situation here)
14:41 <ncastele> the approach we were thinking of before discovering this working group, to achieve per-second billing, was just some dirty SQL queries on the nova database to collect usage. The main issue with this approach is that it needs a specific implementation for each data collection
14:42 <tobias-urdin> ncastele: what kind of usage do you want to pull from the nova db?
14:42 <tobberydberg> yea, that can probably "work" for instances, but definitely not for neutron resources, since they are deleted from the database
14:43 <ncastele> tobberydberg: yes. We should probably take time, for each of us, to define our needs regarding collecting, so we will be able to address each of those needs with a solution more easily
14:43 <tobberydberg> don't think that is the way to solve it though :-)
14:43 <ncastele> don't think so either :)
14:43 <tobberydberg> I believe so too
14:44 <witek> in the long term, I think the services should instrument their code to provide application-specific metrics, these could be scraped or pushed to the monitoring system depending on a given architecture
14:44 <ncastele> tobias-urdin: we wanted to pull instance id, flavor, start, stop, because that's 90% of what we need to bill instances
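[Editor's note: ncastele's "dirty SQL" approach could look roughly like this. The column names mirror nova's `instances` table but should be treated as assumptions, and the snippet runs against an in-memory stand-in, not a real nova database:]

```python
import sqlite3

# Hypothetical sketch of pulling instance id, flavor, start and stop straight
# from a nova-like database. Column names (uuid, instance_type_id, created_at,
# deleted_at, project_id) are assumptions modelled on nova's instances table.
LIFETIME_QUERY = """
SELECT uuid, instance_type_id, created_at, COALESCE(deleted_at, :now)
FROM instances
WHERE project_id = :project AND created_at <= :now
"""

def instance_lifetimes(conn, project, now):
    """One (id, flavor, start, stop) row per instance; running ones stop at `now`."""
    return conn.execute(LIFETIME_QUERY, {"project": project, "now": now}).fetchall()

# Stand-in database so the sketch is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (uuid TEXT, instance_type_id TEXT, "
             "created_at TEXT, deleted_at TEXT, project_id TEXT)")
conn.execute("INSERT INTO instances VALUES "
             "('i-1', 'small', '2019-05-28 00:00', NULL, 'p1')")
rows = instance_lifetimes(conn, "p1", "2019-05-28 14:00")
```

As noted in the discussion, this only "works" for resources whose rows survive deletion (nova soft-deletes); neutron resources are removed outright, which is one reason the per-service SQL approach does not generalise.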
14:44 <witek> the code instrumentation is a flexible and future-proof approach, the monitoring will evolve together with the application
14:45 <tobberydberg> My suggestion is that we get a summarised list of the metrics we need to measure, it is not all about instances, and how these can be measured (scraping or what not)
14:46 <tobberydberg> Do you think that is a potential way forward? I'm open for any suggestions
14:47 <ncastele> that's a good start. We will not cover all resources/services, but that's a good way to focus on those we need to go forward
14:47 <tobberydberg> witek: Agree that would be the best solution, but I'm pretty sure that won't happen any time soon
14:48 <witek> tobberydberg: it could be the mission of this group to drive and coordinate, there are other groups interested as well, like e.g. self-healing
14:49 <tobberydberg> witek: absolutely, might be that we come to that conclusion after all
14:50 <tobberydberg> so, added this section "Metrics we need" under the section "Limitation of scope"
14:50 <witek> sounds good
14:50 <ncastele> +1
14:51 <ncastele> can we plan to fill it in for the next meeting?
14:51 <tobberydberg> Would be good if everyone can try to identify and contribute to this section before the next meeting, which will be next Thursday at the same time (1400 UTC) in this channel
14:51 <tobberydberg> you were quicker than me ncastele :-)
14:52 <ncastele> :)
14:52 <tobias-urdin> cool, sounds like a good next step
14:55 <tobberydberg> added some examples of a suggestion of how to structure that: resource, units, how to collect the data ... feel free to change and structure it in whatever way you feel works best
14:55 <tobberydberg> Anything else someone wants to raise before we close today's meeting?
14:56 <ncastele> not on my side :)
14:59 <witek> thanks tobberydberg
14:59 <witek> see you next week
14:59 <ncastele> thanks for this exceptional meeting
14:59 <ncastele> see u next week
14:59 <tobias-urdin> not really, we might want some heads up to see what the ceilometer-reboot comes up with
14:59 <tobias-urdin> thanks tobberydberg
14:59 <tobberydberg> thanks for today folks! Talk next week!
14:59 <tobberydberg> indeed
15:00 <tobberydberg> #endmeeting
15:00 *** openstack changes topic to "New meeting time!! Thursday odd weeks at 1400 UTC in this channel!!"
15:00 <openstack> Meeting ended Tue May 28 15:00:08 2019 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
15:00 <openstack> Minutes:        http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.html
15:00 <openstack> Minutes (text): http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.txt
15:00 <openstack> Log:            http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.log.html
15:32 *** hberaud is now known as hberaud|school-r
15:33 *** ncastele has quit IRC
15:39 *** hberaud|school-r is now known as hberaud
15:53 *** witek has quit IRC
16:10 *** ricolin_ has joined #openstack-publiccloud
16:28 *** gtema has quit IRC
17:00 *** ricolin_ has quit IRC
17:01 *** hberaud is now known as hberaud|gone
17:17 *** irclogbot_0 has quit IRC
17:19 *** irclogbot_0 has joined #openstack-publiccloud
20:20 *** dasp has joined #openstack-publiccloud
20:38 *** dasp has quit IRC
20:38 *** dasp has joined #openstack-publiccloud
