11:00:22 <oneswig> #startmeeting scientific-sig
11:00:23 <openstack> Meeting started Wed Jul 31 11:00:22 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:26 <openstack> The meeting name has been set to 'scientific_sig'
11:00:34 <oneswig> Greetings
11:00:47 <verdurin> Afternoon.
11:00:48 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_July_31st_2019
11:00:50 <dh3> hi
11:00:53 <peschk_l> o/
11:00:56 <belmoreira> o/
11:01:11 <priteau> \o
11:01:13 <oneswig> Hi all
11:02:12 <oneswig> Two quick points to cover first.
11:02:20 <oneswig> #topic OpenStack user survey
11:02:35 <oneswig> #link add your scientific openstack cloud here and be counted https://www.openstack.org/user-survey/survey-2019/landing
11:02:52 <oneswig> It's true, we've yet to do ours...
11:03:41 <oneswig> #topic Open Infra days Rome / Milan
11:03:51 <oneswig> Two infra days in one
11:04:09 <oneswig> #link CFP closes today (not sure what time) https://openinfraday.it/call-for-paper/
11:04:56 <oneswig> what better accompaniment to the delights of Italy than the delights of open infrastructure :-)
11:05:30 <oneswig> Ok, let's move to today's main event
11:05:34 <oneswig> #topic Monitoring for Chargeback and Accounting
11:06:02 <oneswig> priteau: thanks for doing this investigation and coming along to talk about it.
11:06:18 <priteau> Happy to share :)
11:06:22 <oneswig> #link CloudKitty+Monasca for accounting and chargeback https://www.stackhpc.com/cloudkitty-and-monasca-1.html
11:06:41 <oneswig> ^^ That's Pierre's first findings
11:07:22 <witek> nice summary
11:07:46 <oneswig> Hi witek, thanks for joining
11:07:47 <peschk_l> I've read it this morning, we'd be glad to help with this :)
11:08:00 <oneswig> excellent
11:08:45 <priteau> Nice to see that you joined peschk_l. I am still fairly new to CloudKitty so hopefully I didn't write anything completely absurd.
11:08:57 <witek> priteau: have you tried editing `monasca_field_definitions.yaml` to include additional metadata?
11:09:20 <oneswig> We've been looking for integrated options for accounting for usage and this is the first contender investigated
11:09:51 <priteau> witek: I did, but didn't manage yet to get the flavor info in there. There should be a way to do it though, because I see actual values for `created_at`, `launched_at`, `host`…
11:10:48 <priteau> I also discovered in the process that Monasca has a restriction of 16 metadata values per metric
11:11:49 <verdurin> Impressive work. I noted the caveat about account not being taken of instance flavour for Ceilosca. That's a shame.
11:12:46 <priteau> verdurin: It should be possible to pass flavor data by changing the configuration, just haven't found the perfect incantation yet.
11:13:27 <priteau> I will update the post once I get it to work.
11:13:42 <verdurin> Great, thanks.
11:13:45 <oneswig> I think the most interesting piece is the final section and the different ways that usage data can be collected
11:13:54 <witek> priteau: I assume you've seen that example already: https://opendev.org/openstack/monasca-ceilometer/src/branch/master/etc/ceilometer/examples/monasca_field_definitions.yaml.full-example
11:13:57 <priteau> Gnocchi stores this metadata by default, so it would be good to have feature parity when using Monasca
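
For context on witek's suggestion above: a minimal, unverified sketch of the kind of addition being discussed, assuming the schema shown in the monasca_field_definitions.yaml.full-example linked by witek (a metadata section holding common and per-meter lists of resource_metadata fields). The flavor field names below are assumptions rather than confirmed Ceilosca keys:

    # Illustrative sketch only -- defer to the full example linked above for the
    # authoritative schema; the flavor.* paths are assumptions, not verified keys.
    metadata:
      common:
        - created_at      # fields priteau reports seeing populated already
        - launched_at
        - host
      instance:
        - flavor.name     # assumed resource_metadata path for the instance flavor
        - flavor.id
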
11:14:38 <belmoreira> priteau: nice post. Instance flavor would be a great addition. Did you look into volume types?
11:15:12 <priteau> witek: Yes, I tried importing monasca_field_definitions.yaml from master, and also adapting meter definitions in Ceilometer. I am sure it must be something minor missing, just need some more debugging to figure it out.
11:16:12 <priteau> belmoreira: Not yet looked at charging for volumes, and the cloud I am using doesn't actually run Cinder (it's a bare-metal cloud). But you can check peschk_l's post about it: https://www.objectif-libre.com/en/blog/2018/03/14/integration-monasca-et-cloudkitty/
11:16:50 <oneswig> The model of polling for usage is subject to rounding errors and coarse granularity. Can you cover the options for using notification events?
11:17:25 <dh3> presumably when flavor type is in there, Cloudkitty can charge differently for different flavors? I'm thinking dedicated vs overcommitted CPUs
11:17:41 <peschk_l> note that the configuration format of cloudkitty has changed since this post, the reference is available here: https://docs.openstack.org/cloudkitty/latest/admin/configuration/collector.html#metric-collection
11:18:05 <peschk_l> dh3: yes, that's a pretty standard use case
11:18:19 <priteau> dh3: Yes, that's the idea. Check the official CloudKitty documentation for an example: https://docs.openstack.org/cloudkitty/latest/user/rating/hashmap.html#examples
11:19:30 <dh3> OK, I see, is there a way to scale based on flavor name (e.g. "all m1.* flavors cost 8 times what o1.* flavors cost") or are the flavor costs all set explicitly?
11:19:42 <priteau> oneswig: Ceilometer is the main option for capturing and publishing notification events, as it already supports many OpenStack services. However, it looks like we may have other options in the future, such as Monasca capturing events directly: http://specs.openstack.org/openstack/monasca-specs/specs/stein/approved/monasca-events-listener.html
11:20:37 <witek> yes, the pipeline for persisting notifications is only missing the collecting agent
11:20:48 <noggin143> FYI, there is also an initiative going on in the public cloud working group looking at alternatives to ceilometer - https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal
11:20:51 <oneswig> priteau: thanks, I was wondering about that.
11:20:58 <peschk_l> dh3: you need to explicitly set the price of each flavor. But globbing would be a nice addition
11:21:06 <priteau> dh3: I don't think you can do this with the hashmap rating module, but there is also a pyscript one which should allow complete customisation
11:21:27 <dh3> peschk_l priteau thanks!
11:21:52 <oneswig> noggin143: thanks - it would be good to connect to that effort. I added Pierre's blog to their etherpad
11:22:21 <witek> they're meeting Thursdays, bi-weekly
11:22:38 <priteau> dh3: But presumably you want to charge differently for each m1.* flavor, so you would have to enter multiple costs explicitly anyway
11:22:48 <dh3> yes
11:23:04 <peschk_l> I believe they've a meeting tomorrow at 14h UTC
11:23:13 <priteau> noggin143: thanks for the reminder about the public cloud SIG effort, I will try to join their meetings
11:23:50 <oneswig> do you know if there has been further discussion beyond the content of the etherpad?
11:24:21 <noggin143> there was a discussion at the Denver summit if I remember rightly.
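
To make the per-flavor pricing dh3 asks about concrete: a minimal sketch of the metric collection format peschk_l links above (CloudKitty's metrics.yml), assuming a Gnocchi-style collector. The metric, groupby and metadata names are illustrative, and the valid extra_args differ per collector:

    # Sketch of a CloudKitty metrics.yml entry exposing flavor data for rating.
    # Names and extra_args are illustrative; see the collector reference linked above.
    metrics:
      cpu:
        unit: instance
        alt_name: instance
        groupby:
          - id
          - project_id
        metadata:
          - flavor_id      # once available here, a hashmap field mapping on
          - flavor_name    # flavor_id (or flavor_name) can set per-flavor costs
        extra_args:
          aggregation_method: max
          resource_type: instance

As peschk_l and priteau note, the hashmap module still needs one explicit mapping per flavor; only the pyscript module would allow wildcard-style rules.
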
11:24:32 <peschk_l> (dh3: CK has a meeting Friday at 15h UTC, if you want to ask us questions)
11:24:35 <priteau> #link http://eavesdrop.openstack.org/#Public_Cloud_SIG
11:25:21 <oneswig> unfortunately eavesdrop does not collect meeting logs for their channel
11:25:52 <noggin143> there was also a good presentation at Denver comparing the different agents - https://www.openstack.org/videos/summits/denver-2019/monasca-openstack-monitoring-1
11:26:21 <witek> http://eavesdrop.openstack.org/irclogs/%23openstack-publiccloud/
11:26:27 <oneswig> I recognise that guy :-)
11:26:32 <witek> :)
11:26:50 <priteau> Log of their last meeting: http://eavesdrop.openstack.org/irclogs/%23openstack-publiccloud/%23openstack-publiccloud.2019-07-18.log.html#t2019-07-18T14:00:31
11:27:27 <witek> there have been several discussions following the Denver PTG/Summit
11:28:59 <oneswig> priteau: there was also the option of having the Monasca Agent draw data from prometheus openstack exporters, what's the latest there?
11:30:09 <priteau> oneswig: I got this alternative solution to work using a recent version of prometheus-openstack-exporter, which gathers server status. I extended the exporter to report the flavor ID and was able to charge based on this information.
11:31:55 <priteau> I need to figure out what limitations there are compared to using Ceilometer and notifications, will report in another blog post soon.
11:32:25 <oneswig> Thanks priteau, I think that would be useful information.
11:32:32 <oneswig> noggin143: does CERN collect usage data using APEL?
11:32:41 <peschk_l> one thing with the prometheus openstack exporters is that they do not provide many chargeable metrics
11:33:29 <peschk_l> they're very convenient for monitoring, but do not provide enough information for rating
11:33:52 <noggin143> CERN exports usage data to APEL using a combination of data sources and cASO
11:33:56 <noggin143> #link https://caso.readthedocs.io/en/stable/
11:34:16 <priteau> peschk_l: https://github.com/openstack-exporter/openstack-exporter now has metrics such as `openstack_nova_server_status`, which I've used for rating with CloudKitty.
11:34:47 <peschk_l> priteau: good to know, thanks
11:34:49 <noggin143> we tried ceilometer for nearly 2 years but there were many issues so belmoreira wrote a sensor to pull the data from libvirt
11:36:08 <noggin143> INFN are also interested in this area (Christina Duma)
11:36:39 <oneswig> noggin143: do they use a similar setup to CERN?
11:37:24 <priteau> noggin143: did you have issues with Ceilometer extracting the data or its metrics storage backend? (originally MongoDB, then Gnocchi)
11:39:09 <belmoreira> priteau: we only used MongoDB (Gnocchi was still not available). Extracting data was nearly impossible
11:40:16 <belmoreira> currently we only keep ceilometer for the notifications
11:40:21 <priteau> I had the same experience on Chameleon. We had to purge the MongoDB database regularly if we wanted queries to return a result.
11:40:28 <noggin143> For INFN, they did not have a good solution and were looking for advice
11:41:10 <priteau> oneswig: I just found that the publiccloud meeting logs are actually at http://eavesdrop.openstack.org/meetings/publiccloud_wg/
11:42:17 <priteau> They still use publiccloud_wg instead of publiccloud_sig for their meeting name.
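
A similarly hedged sketch of the exporter-based route priteau describes above: rating on openstack_nova_server_status from openstack-exporter via CloudKitty's Prometheus collector, assuming the extended exporter adds a flavor label. The label names and extra_args here are assumptions:

    # Sketch only -- assumes openstack_nova_server_status carries a flavor_id label
    # (per priteau's extension) and that the Prometheus collector is in use.
    metrics:
      openstack_nova_server_status:
        unit: instance
        alt_name: instance
        groupby:
          - id
          - project_id
        metadata:
          - flavor_id
        extra_args:
          aggregation_method: max
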
11:42:26 <noggin143> ceilometer at scale also created very high keystone load
11:42:29 <oneswig> ah ok, thanks priteau
11:43:57 <oneswig> noggin143: I think we've been using the caso collector, or something similar to it, for parts of the IRIS project.
11:45:50 <oneswig> Did anyone have more to add on this, or suggestions for other areas to explore?
11:46:55 <priteau> Something nice about CloudKitty is that it's very easy to extend. I wrote a proof of concept collector that uses the Nova os-simple-tenant-usage API to charge for historical Nova usage.
11:48:18 <peschk_l> Well, we'd love some feedback from people using CK with monasca, as there are only a few existing integrations. priteau's article was nice for that
11:49:11 <peschk_l> priteau: is the code available somewhere?
11:49:50 <priteau> peschk_l: I'll clean it up a bit and share it
11:50:11 <witek> one thing I'd like to move forward in future is to instrument the OpenStack services themselves
11:51:02 <witek> so that they can provide valuable measurements about their state (and usage) without the need for external black-box monitoring
11:51:40 <oneswig> witek: sounds like really useful insights.
11:51:59 <oneswig> I am sure there are a lot of "golden signals" that are buried deep within these services.
11:52:27 <priteau> This would be great for monitoring large OpenStack deployments. You can monitor things externally, like API response time, but having internal insight would be even better.
11:52:49 <dh3> witek: that sounds good, we do some ad-hoc monitoring of e.g. rabbitmq queue sizes, would be nice to have that instrumented "out of the box"
11:54:19 <oneswig> We are nearing time to close. Final points for discussion?
11:55:13 <oneswig> #topic AOB
11:56:17 <dh3> just a quick call for anyone with experience of Spark/Hail talking to Ceph rgw/S3 - got a user getting unexpected Content-Length errors which I can't reproduce - would like to talk to someone else who has done this
11:57:30 <oneswig> Might be one for verdurin?
11:58:04 <verdurin> dh3: we have some interest here, but no-one running it yet
11:58:29 <dh3> verdurin: ah OK, thanks anyway
11:58:30 <oneswig> dh3: how has that rolling ceph upgrade gone?
11:58:40 <dh3> oneswig: complete :)
11:58:51 <oneswig> I'm watching one deep-scrub before moving to Nautilus, hopefully this afternoon
11:58:57 <dh3> (now we are battling large omap objects and orphaned bits and pieces...........)
11:59:36 <oneswig> dh3 - that's a story for another day, surely
11:59:49 <oneswig> Are you going to CERN Ceph day in September?
11:59:58 <dh3> no but my colleague Matthew is
12:00:09 <dh3> he will probably have war stories to share :)
12:00:14 <oneswig> Great - hopefully I'll see him there.
12:00:21 <oneswig> OK we are out of time
12:00:29 <verdurin> Thanks, bye.
12:00:31 <oneswig> thanks priteau and everyone for joining
12:00:36 <oneswig> #endmeeting