14:10:45 <tobberydberg> #startmeeting publiccloud_wg
14:10:46 <openstack> Meeting started Thu Jun 6 14:10:45 2019 UTC and is due to finish in 60 minutes. The chair is tobberydberg. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:10:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:10:50 <openstack> The meeting name has been set to 'publiccloud_wg'
14:11:22 <tobberydberg> Simple agenda this time
14:11:26 <ncastele> Yep
14:11:29 <tobberydberg> #topic 1. Joint development effort billing
14:12:04 <tobberydberg> notes from last meeting are here: #link http://eavesdrop.openstack.org/meetings/publiccloud_wg/2019/publiccloud_wg.2019-05-28-14.05.log.html
14:12:39 <ncastele> homework has been done in the etherpad, and I'm quite happy we both wrote the same stuff
14:12:47 <ncastele> https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal
14:13:40 <tobberydberg> Yea, some parts are there for sure ... might be more there ... would be good if more people fill in their needs as well so we can get a full list of the needs
14:13:49 <ncastele> +1
14:13:50 <tobberydberg> But yes, pretty much the same
14:14:41 <tobberydberg> I guess we both feel that we need something that fires events as well as continuous reports .... scraping method potentially
14:16:17 <ncastele> Scraping is a way to go, but can lead to an impact on control plane performance
14:17:00 <tobberydberg> What I was thinking about is that we basically need a tool or a set of tools that can handle all forms of metrics, not only the metrics for the resources we bill for, but for things that are for general information/statistics purposes as well
14:17:25 <tobberydberg> well ... depends on how that scraping is done I would assume
14:17:31 <ncastele> what do you mean by things for general information?
14:18:01 <tobberydberg> could be security groups and rules for instance
14:18:31 <tobberydberg> SOME might bill for that (we do not), but it would still be of interest for me to see the usage of that
14:19:05 <tobberydberg> Hard to track such things today since neutron doesn't store anything deleted in the database
14:19:13 <ncastele> Yes
14:19:21 <tobberydberg> barbican stuff is another thing
14:19:23 <wondra> What do you mean by scraping?
14:19:29 <wondra> API requests?
14:19:33 <ncastele> is there no retention at all in neutron db?
14:20:04 <ncastele> API requests, or nice SQL queries on database
14:20:19 <wondra> Nice? :-)
14:20:28 <tobberydberg> not what I know of, no (on the retention)
14:20:41 <wondra> What about the usage of ceilometer events as I suggested? That is a log of the history in itself.
14:21:29 <tobberydberg> wondra I guess you can do scraping in different forms ... but one can be db queries, you can also scrape each compute node for its resources ...
14:21:57 <ncastele> I don't know ceilometer well enough to be sure it can answer all our needs
14:22:37 <tobberydberg> What I hear from people, the ceilometer agents are pretty stable in that sense, but still, people do not fully rely on it anyone
14:22:42 <tobberydberg> *anyway
14:22:48 <wondra> Our billing is based on that. We did it back in Kilo and will have to rework it due to an API change for Ocata.
14:23:05 <tobberydberg> usually having something else as well and then comparing the results
14:23:07 <wondra> You get events about every entity along with details, like the owner, size, etc.
14:23:32 <tobberydberg> yes, which is good
14:23:44 <ncastele> Is it easy to enhance the content of an event with custom information depending on the service?
14:24:04 <tobberydberg> the storage of that data and possibility to query it is hard to work with though
14:24:26 <wondra> Dunno. But you can choose what you store from the notification which is being sent by the particular OpenStack project in the events pipeline.yaml
14:24:29 <wondra> https://docs.openstack.org/ceilometer/ocata/events.html
14:25:45 <wondra> Querying it would be done with the Panko API, which we haven't learned yet. Still on the old Ceilometer one. We basically query it every day at midnight for every tenant and compute the usage.
14:26:24 <tobberydberg> It's been a little bit too long since I looked into all the side projects around ceilometer, but listening to people it seems not to work very well
14:26:56 <wondra> How does CloudKitty work anyway? Does it query Panko or Gnocchi?
14:27:04 <tobberydberg> gnocchi being outside of openstack makes it worse
14:27:19 <tobberydberg> think it uses gnocchi
14:27:27 <witek> they don't rely on a specific DB from what I know
14:27:44 <ncastele> Cloudkitty offers drivers for prometheus/ceilometer/gnocchi
14:27:45 <witek> they can use Gnocchi, Monasca, Prometheus
14:28:09 <tobberydberg> ok ok
14:28:16 <witek> :)
14:29:45 <tobberydberg> are you folks basing your billing today on all native openstack services?
14:30:29 <tobberydberg> we do not, nowadays not at all using openstack telemetry services for that (unfortunately)
14:30:32 <ncastele> for the collecting part, yes, we are relying on ceilometer (in a heartbeat way, not in an event way)
14:30:45 <ncastele> for instances
14:31:11 <ncastele> For volumes, we have some sql queries that are pushing usage data into ceilometer
14:31:38 <tobberydberg> ok, so ceilometer doesn't work for that?
14:31:45 <ncastele> but it's a scraping way (scraping hypervisor, scraping our ceph clusters, etc.)
14:32:26 <ncastele> ceilometer does the work because we scrape every 5 minutes, but we bill by hour, meaning that we accept we can lose some points
14:32:59 <mnaser> oh i didn't know cloudkitty can use prometheus TIL
14:33:21 <tobberydberg> ok. I would say that we need something that is down at seconds level of usage for all types of resources
14:33:49 <mnaser> i think if we forget scraping and bring evented monitoring, we dont have to track things per second anymore, because you can easily introspect things "what time did it start and end"
14:33:54 <ncastele> but when we come to seconds billing, it's not possible for us to use the actual system because for this precision, we need to base on events/on precise dates and time
14:34:09 <ncastele> +1 mnaser
14:34:13 <tobberydberg> mnaser totally agree
14:34:30 <mnaser> then it's $bill_in_whatever_increment_you_want
14:34:41 <tobberydberg> byes
14:35:05 <witek> seconds meaning 1s, 10s or 30s ?
14:35:10 <tobberydberg> do we need the scraping for other purposes?
14:35:15 <tobberydberg> 1s
14:35:36 <ncastele> seconds meaning 1s :D
14:35:43 <tobberydberg> or even less than seconds event based will give you
14:36:09 <wondra> I believe that we do it for Floating IPs. Cinder is fine with cinder-volume-audit in cron, but Neutron does not have that.
14:36:12 <witek> the actual question is if such fine resolution is really useful for billing?
14:36:14 <ncastele> it depends, if we trust our event based collecting at 100%, then we do not need scraping. But scraping can consolidate the event based process
14:36:31 <tobberydberg> I would say CPU load etc etc will need scraping if we would like to cover those bits as well
14:37:25 <wondra> To clarify - most openstack components issue the .exist event, which allows you to find entities that you missed the .start notifications for. Floating IPs do not have it.
14:38:59 <tobberydberg> so same question again, if events are enough, is ceilometer reliable to use in its current shape and form?
14:39:02 <wondra> Billing by CPU load - the white unicorn of public clouds?
14:39:12 <tobberydberg> (in just collecting the "metrics")
14:39:28 <wondra> Dunno. I'm 4 releases behind.
14:39:34 <ncastele> Dunno either
14:39:46 <tobberydberg> mnaser what is your feeling there?
14:39:55 <wondra> Eh, actually 6. Damn.
14:40:46 <tobberydberg> wondra I was thinking just to have the same source of data for stuff that is interesting to visualise for users ... not necessarily bill for it =)
14:41:25 <tobberydberg> wondra just go directly to Train this fall ;-)
14:41:42 <wondra> Having visualisations in our customer portal would be nice. I've got example code from a bachelor's thesis for Gnocchi.
14:42:24 <tobberydberg> +1
14:42:50 <wondra> With our main product being VPS, we don't need it in Horizon.
14:43:10 <mnaser> i think it's probably easier to start building the structure for the billable stuff
14:43:11 <tobberydberg> The biggest issue that I see here moving forward will be the storage and real time query bits
14:43:13 <mnaser> imho
14:43:45 <mnaser> well, we think of things as a resource, that's what ceilometer did at the time, resource X (could be cinder uuid) which starts created_at $x and deleted_at $y (or none)
14:44:05 <mnaser> and then we have a tool that says "query how many seconds has resource X been used in the period A => B" and it can do all the math
14:44:09 <mnaser> i have a lot of that code
14:45:38 <tobberydberg> event driven only, and I think that will cover the billing bits
14:46:10 <tobberydberg> so, using ceilometer for that mnaser? Reading off the rabbit queue?
14:46:27 <mnaser> nope, reading the db :\
14:46:38 <tobberydberg> neutron?
14:46:46 <mnaser> not billing anything for that
14:47:34 <tobberydberg> ok. I totally agree that reading off the database is very easy for a lot of the resources, but not for all
14:47:53 <tobberydberg> object storage is another thing that will be hard to track that way
14:48:26 <tobberydberg> dynamic size stuff
14:48:36 <mnaser> well i was thinking for that sort of thing, parsing logs might be the way to go
14:49:03 <mnaser> logs are pretty much events of download/upload i guess
14:49:20 <wondra> Reading the DB is bad. There's no clear contract. Any OpenStack project can change it and break the code.
14:49:36 <tobberydberg> true
14:52:17 <ncastele> agree with wondra, the db contract is not stable. But it's the easiest and least impacting way to read data
14:52:34 <tobberydberg> So, just a few more minutes left of today's meeting, would be good to sum it up and find action points until next meeting
14:52:50 <ncastele> I tried a PoC on reading from Nova APIs as an admin, and I quickly reached some limits
14:53:06 <wondra> But the deployment tools are trailing the OpenStack releases. Maybe a billing product that reads the database could only exist for 1 year old releases...
14:53:43 <tobberydberg> #action All - continue to define what we really need to track
14:55:08 <tobberydberg> what else? more clear suggestions on collection method of events? DB queries? whatnot?
14:55:39 <tobberydberg> Should we try to book a meeting for next week as well to keep up the pace before all vacations?
14:56:54 <tobberydberg> I'm happy to take a meeting same time next week
15:00:49 <wondra> Not against. I have said most of what I know, though.
15:01:03 <tobberydberg> well, time is up for today ... any opinions on above before we close down?
15:01:38 <tobberydberg> Thanks wondra ... have you added your needs of metrics to the etherpad as well?
15:01:49 <witek> good understanding of what exactly has to be collected is important
15:02:23 <wondra> yes, I have.
15:02:29 <tobberydberg> yes, I think so too and that is what we need to base tooling etc on to be successful
15:02:37 <tobberydberg> +1 wondra
15:03:47 <tobberydberg> Ok. So I'll end today's meeting. I'll send out a reminder for a meeting next week, and we continue with once per week for a few more weeks and see what we can get out of that
15:03:52 <tobberydberg> Thanks for today folks!
15:04:20 <witek> thanks, I'm on vacation, see you in 3 weeks
15:05:17 <tobberydberg> #endmeeting
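
Illustration of the event-based computation mnaser describes at 14:43:45 and 14:44:05: a resource carries a created_at and an optional deleted_at, and a tool answers "how many seconds has resource X been used in the period A => B". What follows is a minimal Python sketch under stated assumptions, not code from any participant. The function name billable_seconds and its parameters are hypothetical, the period is treated as half-open [A, B), and a resource with no deleted_at is assumed to still exist at the end of the period.

```python
from datetime import datetime, timezone
from typing import Optional


def billable_seconds(
    created_at: datetime,
    deleted_at: Optional[datetime],
    period_start: datetime,
    period_end: datetime,
) -> float:
    """Return the seconds a resource existed inside [period_start, period_end).

    Hypothetical helper: the name and signature are illustrative only.
    """
    if period_end <= period_start:
        raise ValueError("period_end must be after period_start")

    # Assumption: a resource that was never deleted is alive until the period ends.
    effective_end = period_end if deleted_at is None else min(deleted_at, period_end)
    # Usage before the period starts belongs to an earlier billing period.
    effective_start = max(created_at, period_start)

    # A negative difference means the lifetime and the period do not overlap.
    return max(0.0, (effective_end - effective_start).total_seconds())


if __name__ == "__main__":
    utc = timezone.utc
    june_start = datetime(2019, 6, 1, tzinfo=utc)
    june_end = datetime(2019, 7, 1, tzinfo=utc)

    # Created before the period, deleted 1 hour and 30 seconds into it.
    print(billable_seconds(
        datetime(2019, 5, 31, tzinfo=utc),
        datetime(2019, 6, 1, 1, 0, 30, tzinfo=utc),
        june_start,
        june_end,
    ))
```

Billing in any increment then reduces to rounding or multiplying the returned seconds, in line with the "$bill_in_whatever_increment_you_want" remark at 14:34:30.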