11:00:22 <oneswig> #startmeeting scientific-sig
11:00:23 <openstack> Meeting started Wed Jul 31 11:00:22 2019 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:26 <openstack> The meeting name has been set to 'scientific_sig'
11:00:34 <oneswig> Greetings
11:00:47 <verdurin> Afternoon.
11:00:48 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_July_31st_2019
11:00:50 <dh3> hi
11:00:53 <peschk_l> o/
11:00:56 <belmoreira> o/
11:01:11 <priteau> \o
11:01:13 <oneswig> Hi all
11:02:12 <oneswig> Two quick points to cover first.
11:02:20 <oneswig> #topic OpenStack user survey
11:02:35 <oneswig> #link add your scientific openstack cloud here and be counted https://www.openstack.org/user-survey/survey-2019/landing
11:02:52 <oneswig> It's true, we've yet to do ours...
11:03:41 <oneswig> #topic Open Infra days Rome / Milan
11:03:51 <oneswig> Two infra days in one
11:04:09 <oneswig> #link CFP closes today (not sure what time) https://openinfraday.it/call-for-paper/
11:04:56 <oneswig> what better accompaniment to the delights of Italy than the delights of open infrastructure :-)
11:05:30 <oneswig> Ok, let's move to today's main event
11:05:34 <oneswig> #topic Monitoring for Chargeback and Accounting
11:06:02 <oneswig> priteau: thanks for doing this investigation and coming along to talk about it.
11:06:18 <priteau> Happy to share :)
11:06:22 <oneswig> #link CloudKitty+Monasca for accounting and chargeback https://www.stackhpc.com/cloudkitty-and-monasca-1.html
11:06:41 <oneswig> ^^ That's Pierre's first findings
11:07:22 <witek> nice summary
11:07:46 <oneswig> Hi witek, thanks for joining
11:07:47 <peschk_l> I've read it this morning, we'd be glad to help with this :)
11:08:00 <oneswig> excellent
11:08:45 <priteau> Nice to see that you joined peschk_l. I am still fairly new to CloudKitty so hopefully I didn't write anything completely absurd.
11:08:57 <witek> priteau: have you tried editing `monasca_field_definitions.yaml` to include additional metadata?
11:09:20 <oneswig> We've been looking for integrated options for accounting for usage and this is the first contender investigated
11:09:51 <priteau> witek: I did, but haven't yet managed to get the flavor info in there. There should be a way to do it though, because I see actual values for `created_at`, `launched_at`, `host`…
11:10:48 <priteau> I also discovered in the process that Monasca has a restriction of 16 metadata values per metric
11:11:49 <verdurin> Impressive work. I noted the caveat that instance flavour is not taken into account for Ceilosca. That's a shame.
11:12:46 <priteau> verdurin: It should be possible to pass flavor data by changing the configuration, just haven't found the perfect incantation yet.
11:13:27 <priteau> I will update the post once I get it to work.
11:13:42 <verdurin> Great, thanks.
11:13:45 <oneswig> I think the most interesting piece is the final section and the different ways that usage data can be collected
11:13:54 <witek> priteau: I assume you've seen that example already: https://opendev.org/openstack/monasca-ceilometer/src/branch/master/etc/ceilometer/examples/monasca_field_definitions.yaml.full-example
11:13:57 <priteau> Gnocchi stores this metadata by default, so it would be good to have feature parity when using Monasca
11:14:38 <belmoreira> priteau: nice post. Instance flavor would be a great addition. Did you look into volume types?
11:15:12 <priteau> witek: Yes, I tried importing monasca_field_definitions.yaml from master, and also adapting meter definitions in Ceilometer. I am sure it's something minor that's missing, just need some more debugging to figure it out.
11:16:12 <priteau> belmoreira: Not yet looked at charging for volumes, and the cloud I am using doesn't actually run Cinder (it's a bare-metal cloud). But you can check peschk_l's post about it: https://www.objectif-libre.com/en/blog/2018/03/14/integration-monasca-et-cloudkitty/
11:16:50 <oneswig> The model of polling for usage is subject to rounding errors and coarse granularity.  Can you cover the options for using notification events?
11:17:25 <dh3> presumably when flavor type is in there, Cloudkitty can charge differently for different flavors? I'm thinking dedicated vs overcommitted CPUs
11:17:41 <peschk_l> note that the configuration format of cloudkitty has changed since this post; the reference is available here: https://docs.openstack.org/cloudkitty/latest/admin/configuration/collector.html#metric-collection
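(For context, a minimal sketch of a per-metric entry in the CloudKitty collector configuration referenced above. The key names (unit, alt_name, groupby, metadata, extra_args) are recalled from the CloudKitty docs and may differ between releases, so treat this as an assumption and check the linked reference.)

    metrics:
      cpu:
        unit: instance
        alt_name: instance
        groupby:
          - id
          - project_id
        metadata:
          - flavor_id
        extra_args:
          aggregation_method: max
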
11:18:05 <peschk_l> dh3: yes, that's a pretty standard usecase
11:18:19 <priteau> dh3: Yes, that's the idea. Check the official CloudKitty documentation for an example: https://docs.openstack.org/cloudkitty/latest/user/rating/hashmap.html#examples
11:19:30 <dh3> OK, I see, is there a way to scale based on flavor name (e.g. "all m1.* flavors cost 8 times what o1.* flavors cost") or are the flavor costs all set explicitly?
11:19:42 <priteau> oneswig: Ceilometer is the main option for capturing and publishing notification events, as it already supports many OpenStack services. However, it looks like we may have other options in the future, such as Monasca capturing events directly: http://specs.openstack.org/openstack/monasca-specs/specs/stein/approved/monasca-events-listener.html
11:20:37 <witek> yes, the pipeline for persisting notifications is only missing the collecting agent
11:20:48 <noggin143> FYI, there is also an initiative going on in the public cloud working group looking at alternatives to ceilometer - https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal
11:20:51 <oneswig> priteau: thanks, I was wondering about that.
11:20:58 <peschk_l> dh3: you need to explicitly set the price of each flavor. But globbing would be a nice addition
11:21:06 <priteau> dh3: I don't think you can do this with the hashmap rating module, but there is also a pyscript one which should allow complete customisation
11:21:27 <dh3> peschk_l priteau thanks!
11:21:52 <oneswig> noggin143: thanks - it would be good to connect to that effort.  I added Pierre's blog to their etherpad
11:22:21 <witek> they're meeting Thursdays, bi-weekly
11:22:38 <priteau> dh3: But presumably you want to charge differently for each m1.* flavor, so you would have to enter multiple costs explicitly anyway
11:22:48 <dh3> yes
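(A generic illustration of the glob-based pricing idea discussed above, written as standalone Python using fnmatch rather than CloudKitty's actual pyscript API; the patterns and prices are made up.)

    from fnmatch import fnmatch

    # Flavor-name patterns mapped to an hourly price; first match wins.
    PRICE_PATTERNS = [
        ("m1.*", 0.08),  # e.g. all m1.* flavors cost 8x the o1.* price
        ("o1.*", 0.01),
    ]

    def hourly_price(flavor_name, default=0.0):
        for pattern, price in PRICE_PATTERNS:
            if fnmatch(flavor_name, pattern):
                return price
        return default

    print(hourly_price("m1.large"))  # -> 0.08
    print(hourly_price("o1.small"))  # -> 0.01
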
11:23:04 <peschk_l> I believe they have a meeting tomorrow at 14:00 UTC
11:23:13 <priteau> noggin143: thanks for the reminder about the public cloud SIG effort, I will try to join their meetings
11:23:50 <oneswig> do you know if there has been further discussion beyond the content of the etherpad?
11:24:21 <noggin143> there was a discussion at the Denver summit if I remember rightly.
11:24:32 <peschk_l> (dh3: CK has a meeting Friday at 15:00 UTC, if you want to ask us questions)
11:24:35 <priteau> #link http://eavesdrop.openstack.org/#Public_Cloud_SIG
11:25:21 <oneswig> unfortunately eavesdrop does not collect meeting logs for their channel
11:25:52 <noggin143> there was also a good presentation at Denver comparing the different agents - https://www.openstack.org/videos/summits/denver-2019/monasca-openstack-monitoring-1
11:26:21 <witek> http://eavesdrop.openstack.org/irclogs/%23openstack-publiccloud/
11:26:27 <oneswig> I recognise that guy :-)
11:26:32 <witek> :)
11:26:50 <priteau> Log of their last meeting: http://eavesdrop.openstack.org/irclogs/%23openstack-publiccloud/%23openstack-publiccloud.2019-07-18.log.html#t2019-07-18T14:00:31
11:27:27 <witek> there have been several discussion following the Denver PTG/Summit
11:28:59 <oneswig> priteau: there was also the option of having the Monasca Agent draw data from prometheus openstack exporters, what's the latest there?
11:30:09 <priteau> oneswig: I got this alternative solution to work using a recent version of prometheus-openstack-exporter, which gathers server status. I extended the exporter to report the flavor ID and was able to charge based on this information.
11:31:55 <priteau> I need to figure out what limitations there are compared to using Ceilometer and notifications, will report in another blog post soon.
11:32:25 <oneswig> Thanks priteau, I think that would be useful information.
11:32:32 <oneswig> noggin143: does CERN collect usage data using APEL?
11:32:41 <peschk_l> one thing with the prometheus openstack exporters is that they do not provide many chargeable metrics
11:33:29 <peschk_l> they're very convenient for monitoring, but do not provide enough information for rating
11:33:52 <noggin143> CERN exports usage data to APEL using a combination of data sources and cASO
11:33:56 <noggin143> #link https://caso.readthedocs.io/en/stable/
11:34:16 <priteau> peschk_l: https://github.com/openstack-exporter/openstack-exporter now has metrics such as `openstack_nova_server_status`, which I've used for rating with CloudKitty.
11:34:47 <peschk_l> priteau: good to know, thanks
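(As an illustration, a scraped sample from openstack-exporter might look like the line below; the label set varies by exporter version, and the flavor_id label reflects priteau's local extension rather than anything guaranteed upstream.)

    openstack_nova_server_status{id="6e48...",name="vm-01",flavor_id="2",status="ACTIVE"} 0
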
11:34:49 <noggin143> we tried ceilometer for nearly 2 years but there were many issues so belmoreira wrote a sensor to pull the data from libvirt
11:36:08 <noggin143> INFN are also interested in this area (Christina Duma)
11:36:39 <oneswig> noggin143: do they use a similar setup to CERN?
11:37:24 <priteau> noggin143: did you have issues with Ceilometer extracting the data or its metrics storage backend? (originally MongoDB, then Gnocchi)
11:39:09 <belmoreira> priteau: we only used MongoDB (Gnocchi was not yet available). Extracting data was nearly impossible
11:40:16 <belmoreira> currently we only keep ceilometer for the notifications
11:40:21 <priteau> I had the same experience on Chameleon. We had to purge the MongoDB database regularly if we wanted queries to return a result.
11:40:28 <noggin143> For INFN, they did not have a good solution and were looking for advice
11:41:10 <priteau> oneswig: I just found that the publiccloud meeting logs are actually at http://eavesdrop.openstack.org/meetings/publiccloud_wg/
11:42:17 <priteau> They still use publiccloud_wg instead of publiccloud_sig for their meeting name.
11:42:26 <noggin143> ceilometer at scale also created very high keystone load
11:42:29 <oneswig> ah ok, thanks priteau
11:43:57 <oneswig> noggin143: I think we've been using the caso collector, or something similar to it, for parts of the IRIS project.
11:45:50 <oneswig> Did anyone have more to add on this, or suggestions for other areas to explore?
11:46:55 <priteau> Something nice about CloudKitty is that it's very easy to extend. I wrote a proof of concept collector that uses the Nova os-simple-tenant-usage API to charge for historical Nova usage.
11:48:18 <peschk_l> Well, we'd love some feedback from people using CK with Monasca, as there are only a few existing integrations. priteau's article was nice for that
11:49:11 <peschk_l> priteau: is the code available somewhere?
11:49:50 <priteau> peschk_l: I'll clean it up a bit and share it
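(Not the proof-of-concept collector itself, but a rough sketch of querying the Nova os-simple-tenant-usage API with python-novaclient to obtain per-server hours; the auth URL, credentials and project ID are placeholders.)

    from datetime import datetime, timedelta

    from keystoneauth1 import loading, session
    from novaclient import client as nova_client

    # Placeholder credentials; substitute real values or load them from the environment.
    loader = loading.get_plugin_loader("password")
    auth = loader.load_from_options(
        auth_url="http://keystone.example.com:5000/v3",
        username="admin",
        password="secret",
        project_name="admin",
        user_domain_id="default",
        project_domain_id="default",
    )
    nova = nova_client.Client("2.1", session=session.Session(auth=auth))

    # os-simple-tenant-usage aggregates usage per project over a time period.
    end = datetime.utcnow()
    start = end - timedelta(days=30)
    usage = nova.usage.get("PROJECT_ID", start, end)

    for srv in usage.server_usages:
        # Each entry includes fields such as 'name', 'flavor' and 'hours'.
        print(srv["name"], srv["flavor"], srv["hours"])
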
11:50:11 <witek> one thing I'd like to move forward in future is to instrument the OpenStack services themselves
11:51:02 <witek> so that they can provide valuable measurements about their state (and usage) without the need for external black-box monitoring
11:51:40 <oneswig> witek: sounds like really useful insights.
11:51:59 <oneswig> I am sure there are a lot of "golden signals" that are buried deep within these services.
11:52:27 <priteau> This would be great for monitoring large OpenStack deployments. You can monitor things externally, like API response time, but having internal insight would be even better.
11:52:49 <dh3> witek: that sounds good, we do some ad-hoc monitoring of e.g. rabbitmq queue sizes, would be nice to have that instrumented "out of the box"
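(A minimal sketch of the in-process instrumentation idea, using prometheus_client purely as an example library; the metric name and export mechanism are assumptions, and a service could just as well publish through the Monasca agent.)

    import random
    import time

    from prometheus_client import Gauge, start_http_server

    # Hypothetical metric; a real service would expose its own internal state,
    # e.g. RPC queue depth, DB connection pool usage, notification lag.
    rpc_queue_depth = Gauge(
        "service_rpc_queue_depth",
        "Messages waiting in the service's RPC queue",
    )

    if __name__ == "__main__":
        start_http_server(9100)  # serve /metrics for a scraper
        while True:
            # A real service would read the actual queue length from the
            # messaging layer instead of this random placeholder value.
            rpc_queue_depth.set(random.randint(0, 50))
            time.sleep(10)
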
11:54:19 <oneswig> We are nearing time to close.  Final points for discussion?
11:55:13 <oneswig> #topic AOB
11:56:17 <dh3> just a quick call for anyone with experience of Spark/Hail talking to Ceph rgw/S3 - got a user getting unexpected Content-Length errors which I can't reproduce - would like to talk to someone else who has done this
11:57:30 <oneswig> Might be one for verdurin?
11:58:04 <verdurin> dh3: we have some interest here, but no-one running it yet
11:58:29 <dh3> verdurin: ah OK, thanks anyway
11:58:30 <oneswig> dh3: how has that rolling ceph upgrade gone?
11:58:40 <dh3> oneswig: complete :)
11:58:51 <oneswig> I'm watching one deep-scrub before moving to Nautilus, hopefully this afternoon
11:58:57 <dh3> (now we are battling large omap objects and orphaned bits and pieces...........)
11:59:36 <oneswig> dh3 - that's a story for another day, surely
11:59:49 <oneswig> Are you going to CERN Ceph day in September?
11:59:58 <dh3> no but my colleague Matthew is
12:00:09 <dh3> he will probably have war stories to share :)
12:00:14 <oneswig> Great - hopefully I'll see him there.
12:00:21 <oneswig> OK we are out of time
12:00:29 <verdurin> Thanks, bye.
12:00:31 <oneswig> thanks priteau and everyone for joining
12:00:36 <oneswig> #endmeeting