15:01:33 <witek> #startmeeting monasca
15:01:33 <openstack> Meeting started Wed Feb 20 15:01:33 2019 UTC and is due to finish in 60 minutes. The chair is witek. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:36 <openstack> The meeting name has been set to 'monasca'
15:01:45 <witek> hello everyone
15:01:48 <koji_n> hello
15:01:54 <witek> hi koji_n
15:02:37 <Dobroslaw> Hi
15:02:38 <mohankumar> hi everyone
15:02:50 <witek> hi Dobroslaw and mohankumar
15:03:12 <witek> Courtesy Monasca meeting reminder in #openstack-monasca: witek, jayahn, iurygregory, ezpz, igorn, haad, sc, joadavis, akiraY, tobiajo, dougsz_, fouadben, amofakhar, aagate, haruki, kaiokmo, pandiyan, charana, guilhermesp, chaconpiza, toabctl
15:03:24 <dougsz> hi all
15:03:26 <sc> I'm here
15:03:34 <witek> agenda:
15:03:38 <witek> https://etherpad.openstack.org/p/monasca-team-meeting-agenda
15:03:46 <witek> #topic reviews
15:03:58 <witek> I have just one new one:
15:04:03 <witek> https://review.openstack.org/636337
15:04:50 <joadavis> more tests are appreciated. will have to read that one
15:05:47 <witek> I have also updated the alembic DevStack change to install the missing jira dependency for monasca-notification
15:05:53 <witek> https://review.openstack.org/622361
15:06:40 <witek> any other reviews to draw attention to?
15:07:23 <dougsz> There was this one: https://review.openstack.org/#/c/637190/
15:08:01 <witek> oh yes, thanks dougsz
15:08:23 <Dobroslaw> dougsz: thx for comments
15:08:35 <Dobroslaw> michal is still thinking about it
15:09:19 <Dobroslaw> he said that the API gladly accepts metrics with old timestamps, but the persister is not so happy about them
15:09:29 <Dobroslaw> he will look into it more
15:09:36 <dougsz> Sounds good, great to have another contributor.
15:09:52 <dougsz> thanks Dobroslaw
15:10:06 <Dobroslaw> yep, he is learning quite fast
15:10:33 <witek> persister will drop messages based on retention period
15:12:05 <Dobroslaw> shouldn't they be dropped already in the API?
15:13:26 <mohankumar> witek: quick question, does the persister have a retention policy, or are you referring to InfluxDB retention policies?
15:13:37 <witek> we don't control the retention policy in the API
15:13:54 <witek> mohankumar: InfluxDB retention period
15:14:11 <mohankumar> witek: okay
15:14:26 <Dobroslaw> ok
15:14:52 <witek> Dobroslaw: we could do that, but that's additional logic which would have to run on every message
15:15:12 <Dobroslaw> hmmm, yea
15:16:40 <witek> let's discuss it further in review
15:16:45 <Dobroslaw> ok
15:17:01 <witek> can we move on to the next topic?
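
[Editor's note: the dropping behavior witek describes is enforced by InfluxDB itself, not by Monasca code. A minimal InfluxQL sketch, assuming InfluxDB 1.x and the database name "mon" (an assumption; adjust to your deployment). Points with timestamps older than the retention policy's duration are rejected at write time, which is why the persister sees errors for old metrics that the API happily accepted:]

    -- Create a retention policy on the metrics database (names illustrative).
    CREATE RETENTION POLICY "six_weeks" ON "mon" DURATION 6w REPLICATION 1 DEFAULT
    -- Inspect the currently configured policies.
    SHOW RETENTION POLICIES ON "mon"
    -- A point with a timestamp older than now() - 6w is rejected with a
    -- "points beyond retention policy" partial-write error.
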
15:17:28 <witek> #topic Reliance Jio deployment
15:17:41 <mayankkapoor> Hi all
15:17:45 <witek> hi mayankkapoor
15:17:45 <mayankkapoor> Sorry it's been a while
15:17:49 <mayankkapoor> Wanted to give a quick update
15:18:03 <mayankkapoor> I've mentioned the status of the deployment in the meeting agenda
15:18:32 <witek> thanks, great to hear that
15:18:45 <mayankkapoor> Deployed across 352 bare-metals at the moment (single OpenStack cluster), working fine, a few issues we work through as they come
15:19:16 <mayankkapoor> Any specific items I should talk about?
15:19:49 <joadavis> Are there any of the Monasca metrics you are finding particularly useful?
15:20:01 <witek> you've deployed with Docker Swarm, would it be possible to share the configuration?
15:20:47 <mayankkapoor> @joadavis: We've started with CPU and RAM mainly. We have built a custom console for our users, and we're showing these in that UI.
15:20:52 <openstackgerrit> weizj proposed openstack/python-monascaclient master: Update hacking version https://review.openstack.org/627725
15:21:50 <mayankkapoor> @witek: Sure, no problem. How do you recommend we share the config? github.com?
15:22:14 <witek> yes, github would be great
15:22:28 <mayankkapoor> Ok, let me upload to github.com and send the link here
15:22:37 <witek> great, thanks
15:22:38 <Dobroslaw> mayankkapoor: any Monasca components working slower or more unexpectedly than others?
15:22:56 <witek> have you managed to improve persister performance?
15:23:51 <mayankkapoor> @Dobroslaw: We had some issues with timeouts on InfluxDB, but that was mainly due to bombarding InfluxDB with large batch sizes (50k) and lots of persisters (5).
15:24:25 <mayankkapoor> Now we're using 5k batch size and 10 persisters. mohankumar works with me and can confirm the latest config
15:25:02 <mayankkapoor> @witek: Things are working fine with the persisters and InfluxDB now with 5k batch and 10 persisters
15:25:16 <mayankkapoor> Still early days though
15:25:39 <dougsz> mayankkapoor: How many metrics a second do you ingest?
15:25:54 <mayankkapoor> We've also built our own monitoring scripts for the errors/warnings we got, which we'll share in the github.com repo
15:26:42 <mohankumar> witek: hi, I'm concerned about DB write speed. We use API POSTs to write into the DB; a clustered DB would help to scale the DB and improve performance
15:27:01 <mayankkapoor> @dougsz: I think I'll need some help to calculate the exact number. So, 352 bare-metals with monasca-agent sending libvirt plugin data every 30 seconds. Roughly 5 VMs on each bare-metal.
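
[Editor's note: a back-of-the-envelope answer to dougsz's question, using only the figures given in the discussion. The per-VM and per-host metric counts are illustrative assumptions, not measured values:]

    # Rough ingest-rate estimate for the deployment described above.
    hosts = 352            # bare-metal nodes running monasca-agent
    vms_per_host = 5       # "roughly 5 VMs on each bare-metal"
    interval_s = 30        # agent collection interval in seconds
    metrics_per_vm = 20    # assumed libvirt metrics per VM (illustrative)
    metrics_per_host = 30  # assumed system metrics per host (illustrative)

    per_cycle = hosts * (metrics_per_host + vms_per_host * metrics_per_vm)
    rate = per_cycle / interval_s
    print(f"{per_cycle} measurements every {interval_s}s ~= {rate:.0f} measurements/s")
    # 352 * (30 + 5 * 20) = 45760 per cycle, roughly 1500 measurements/s
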
15:27:21 <joadavis> If I had more expertise, I'd love to share the Cassandra setup with the Docker-based installs. :/
15:28:54 <joadavis> we use Cassandra in a nicely clustered setup, but we still install using our own methods
15:29:17 <witek> mohankumar: InfluxDB can ingest >1,000,000 measurements/s, the bottleneck is the persister
15:29:31 <witek> in particular the old Kafka consumer
15:29:55 <witek> I hope I can provide the new implementation soon
15:30:00 <mohankumar> witek: I'm getting InfluxDB timeout errors if I increase the batch size
15:30:10 <mayankkapoor> @dougsz: We can share some data from our Kafka admin page to give you some idea of the TPS
15:31:19 <dougsz> thanks mayankkapoor, I'm always interested to hear about performance at scale
15:31:46 <mayankkapoor> The main thing we did for this setup was use GlusterFS for HA across three Docker worker VMs
15:31:53 <mayankkapoor> This was a huge risk
15:32:12 <mohankumar> witek: just to add on, the persister gives an InfluxDB timeout error if I increase the batch size
15:32:17 <mayankkapoor> However, we reasoned that we're not running active-active containers, so it might be ok
15:33:29 <mayankkapoor> So when a stateful container dies, it respawns on another node and has access to the same data it had previously
15:34:31 <mayankkapoor> We tested each component, MySQL, InfluxDB and the Monasca containers, individually for HA with GlusterFS. Then we proceeded to the prod deployment.
15:34:39 <witek> so you add HA to InfluxDB that way
15:34:43 <mayankkapoor> Yup
15:34:50 <witek> do you know what the performance impact is?
15:35:18 <joadavis> cool
15:35:27 <mayankkapoor> Hmm, no we don't yet, haven't gotten around to testing a setup without GlusterFS and comparing
15:36:00 <dougsz> mayankkapoor: Are you using the rdma transport for the Gluster share?
15:36:43 <mayankkapoor> @dougsz: Hmm, need to check. We're using a GlusterFS replicated volume with 3 replicas, and we haven't changed any of the defaults.
15:37:38 <dougsz> Cool, there is also nufa, which is quite neat - if your Gluster storage is local to the Docker workers it can write directly to a local drive
15:38:00 <dougsz> This might be a useful reference, we use it for HPC activities: https://github.com/stackhpc/ansible-role-gluster-cluster
15:38:38 <mayankkapoor> Based on reading the GlusterFS docs, RDMA transport needs to be enabled explicitly. So no, we haven't enabled RDMA yet.
15:39:12 <mayankkapoor> Yeah, our Gluster storage is local to the worker VMs
15:39:22 <mayankkapoor> Hmm wait, we're using Ceph
15:39:27 <mayankkapoor> So not local
15:39:31 <mayankkapoor> We used Ceph RBD
15:39:44 <dougsz> Ah ok
15:40:49 <mayankkapoor> So we'll share our deployment doc and config on github.com for review
15:41:49 <witek> you mentioned problems with network metrics, do you mean the standard system plugin metrics?
15:43:37 <mayankkapoor> @witek: We've disabled the following in the libvirt plugin: ping_check=false, vm_ping_check_enable=false, vm_network_check_enable=false
15:44:19 <witek> i see, thanks
15:44:27 <mayankkapoor> If we enable vm_network_check_enable, it loads our OpenContrail controllers too much. We tested load balancing on our OpenContrail controllers, and that worked fine
15:45:07 <mayankkapoor> The current hypothesis that we need to test is that monasca-agent gets some unexpected response from the SDN controllers and keeps querying the controllers rapidly
15:45:20 <mayankkapoor> Haven't gotten around to checking this yet.
15:45:48 <mayankkapoor> Rather than every 30 sec, it queries faster
15:46:07 <witek> please report a bug if you can confirm that
15:46:19 <mayankkapoor> sure
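
[Editor's note: the three flags mayankkapoor lists belong in the monasca-agent libvirt plugin configuration. A minimal sketch of how they might be set; the file path and surrounding structure follow the usual conf.d conventions and are assumptions here, only the three flag names come from the discussion:]

    # /etc/monasca/agent/conf.d/libvirt.yaml (path assumed)
    init_config:
      ping_check: false               # no hypervisor-side ping checks
      vm_ping_check_enable: false     # no per-VM ping checks
      vm_network_check_enable: false  # avoids the heavy queries against the SDN controllers
    instances:
      - {}   # the plugin discovers VMs on the local hypervisor
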
15:47:37 <witek> what OpenStack distribution do you use?
15:49:12 <mayankkapoor> The Monasca deployment is for an Ocata cloud. However, we have 7 production clouds with between 100 and 500 bare-metals each, and staging environments for each. Various versions of OpenStack. The oldest are on Liberty, the latest on Pike.
15:49:16 <mohankumar> witek: Ocata
15:49:54 <witek> which OS?
15:50:13 <mayankkapoor> Ubuntu 16.04
15:50:42 <witek> meaning, Ubuntu agent packages would be handy
15:51:32 <mayankkapoor> Yes. However, we use the Mirantis distribution of OpenStack (they provide L4 support for us). So getting these bundled in the OS is a bit challenging for us.
15:53:09 <witek> thanks for the update
15:53:21 <mayankkapoor> You're welcome, and thanks for the great work on Monasca
15:53:23 <witek> it's great to hear your feedback
15:53:46 <mayankkapoor> We'll try to figure out how we can contribute further.
15:53:57 <mohankumar> question: does current Monasca help me to get VM disk usage if I'm using Ceph storage? I can see bare-metal (compute node) disk usage, but not from the VMs. If I enable the Monasca Ceph plugin as per the document https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md#ceph, I'll get only Ceph cluster metrics, not metrics from individual VMs. Is there any way in current Monasca I can get them for each VM with Ceph storage?
15:55:15 <mohankumar> I hope this question fits the agenda item we have :)
15:55:31 <witek> I'm afraid I cannot answer that right now
15:56:04 <witek> I'll check
15:56:52 <mohankumar> witek: sure, thanks.
15:57:25 <witek> good support for Ceph is important, so if there's anything missing, we should think about closing the gaps
15:57:56 <dougsz> We've got an update for Luminous support in the pipeline
15:58:18 <dougsz> The existing plugin can't handle the newer releases
15:58:40 <witek> thanks again for the update
15:58:43 <witek> I have to finish the meeting soon
15:58:51 <witek> #topic backlog
15:58:55 <witek> short update
15:58:57 <dougsz> yep, thanks mohankumar
15:59:06 <witek> I have added two stories to the backlog
15:59:44 <witek> especially running Python 3 unit tests for monasca-agent
16:00:01 <witek> we still don't run them
16:00:14 <witek> that's all from me
16:00:19 <witek> thanks for joining
16:00:23 <witek> see you next time
16:00:24 <Dobroslaw> one more info
16:00:24 <Dobroslaw> the API Docker image is now pushed from Zuul on master; we need to wait for tagging to see if it will be pushed with the proper tag, and then I will replace the first image on github.com/monasca/monasca-docker with this one
16:00:26 <witek> #endmeeting
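
[Editor's note: for context on mohankumar's question, the Ceph plugin linked above is configured per cluster and reports cluster-level metrics, which matches the limitation he describes. A minimal sketch, assuming the usual conf.d layout; the option names follow the linked Plugins.md document but should be treated as illustrative:]

    # /etc/monasca/agent/conf.d/ceph.yaml (path assumed)
    init_config:
    instances:
      - cluster_name: ceph            # attached as a dimension to each metric
        collect_usage_metrics: true   # cluster capacity and usage
        collect_stats_metrics: true   # cluster throughput and operations
        collect_mon_metrics: true
        collect_osd_metrics: true
    # Per-VM disk usage for Ceph-backed volumes is not covered by this plugin; it
    # would likely have to come from per-instance checks on the hypervisor instead.
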