15:01:33 #startmeeting monasca
15:01:33 Meeting started Wed Feb 20 15:01:33 2019 UTC and is due to finish in 60 minutes. The chair is witek. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:36 The meeting name has been set to 'monasca'
15:01:45 hello everyone
15:01:48 hello
15:01:54 hi koji_n
15:02:37 Hi
15:02:38 hi everyone
15:02:50 hi Dobroslaw and mohankumar
15:03:12 Courtesy Monasca meeting reminder in #openstack-monasca: witek, jayahn, iurygregory, ezpz, igorn, haad, sc, joadavis, akiraY, tobiajo, dougsz_, fouadben, amofakhar, aagate, haruki, kaiokmo, pandiyan, charana, guilhermesp, chaconpiza, toabctl
15:03:24 hi all
15:03:26 here I am
15:03:34 agenda:
15:03:38 https://etherpad.openstack.org/p/monasca-team-meeting-agenda
15:03:46 #topic reviews
15:03:58 I have just one new one:
15:04:03 https://review.openstack.org/636337
15:04:50 more tests are appreciated. I will have to read that one
15:05:47 I have also updated the alembic DevStack change to install the missing jira dependency for monasca-notification
15:05:53 https://review.openstack.org/622361
15:06:40 any other reviews to draw attention to?
15:07:23 There was this one: https://review.openstack.org/#/c/637190/
15:08:01 oh yes, thanks dougsz
15:08:23 dougsz: thanks for the comments
15:08:35 michal is still thinking about it
15:09:19 he said that the API gladly accepts metrics with old timestamps, but the persister is not so happy about them
15:09:29 he will look into it more
15:09:36 Sounds good, great to have another contributor.
15:09:52 thanks Dobroslaw
15:10:06 yep, he is learning quite fast
15:10:33 the persister will drop messages based on the retention period
15:12:05 shouldn't they be dropped already in the API?
15:13:26 witek: quick question, does the persister have a retention policy, or are you referring to the InfluxDB retention policies?
15:13:37 we don't control the retention policy in the API
15:13:54 mohankumar: the InfluxDB retention period
15:14:11 witek: okay
15:14:26 ok
15:14:52 Dobroslaw: we could do that, but that's additional logic which would have to run for every message
15:15:12 hmmm, yeah
15:16:40 let's discuss it further in the review
15:16:45 ok
15:17:01 can we move on to the next topic?
15:17:28 #topic Reliance Jio deployment
15:17:41 Hi all
15:17:45 hi mayankkapoor
15:17:45 Sorry it's been a while
15:17:49 Wanted to give a quick update
15:18:03 I've mentioned the status of the deployment in the meeting agenda
15:18:32 thanks, great to hear that
15:18:45 Deployed across 352 bare-metal nodes at the moment (single OpenStack cluster), working fine, a few issues we're working through as they come up
15:19:16 Any specific items I should talk about?
15:19:49 Are there any Monasca metrics you are finding particularly useful?
15:20:01 you've deployed with Docker Swarm, would it be possible to share the configuration?
15:20:47 @joadavis: We've started with CPU and RAM mainly. We have built a custom console for our users, and we're showing these in that UI.
15:20:52 weizj proposed openstack/python-monascaclient master: Update hacking version https://review.openstack.org/627725
15:21:50 @witek: Sure, no problem. How do you recommend we share the config? github.com?
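An aside on the retention exchange earlier in this topic: as witek notes, retention is enforced by InfluxDB itself rather than by the Monasca API or persister. A minimal sketch of configuring it with the influxdb Python client; the database name, policy name and duration here are illustrative assumptions, not values from this deployment:

    # Sketch: set the InfluxDB retention period that ultimately decides
    # when old measurements are dropped. Names and values are assumptions.
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host='localhost', port=8086, database='mon')
    client.create_retention_policy(
        name='monasca_default',   # hypothetical policy name
        duration='90d',           # keep measurements for 90 days
        replication=1,            # single-node setup
        database='mon',
        default=True,
    )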
15:22:14 yes, github would be great
15:22:28 Ok, let me upload it to github.com and send the link here
15:22:37 great, thanks
15:22:38 mayankkapoor: are any Monasca components working slower or more unexpectedly than others?
15:22:56 have you managed to improve persister performance?
15:23:51 @Dobroslaw: We had some issues with timeouts on InfluxDB, but that was mainly due to bombarding InfluxDB with large batch sizes (50k) and lots of persisters (5).
15:24:25 Now we're using a 5k batch size and 10 persisters. mohankumar works with me and can confirm the latest config
15:25:02 @witek: Things are working fine with the persisters and InfluxDB now with 5k batches and 10 persisters
15:25:16 Still early days though
15:25:39 mayankkapoor: How many metrics a second do you ingest?
15:25:54 We've also built our own monitoring scripts for the errors/warnings we got, which we'll share in the github.com repo
15:26:42 witek: hi, I'm concerned about DB write speed. We use the API to post writes into the DB; a clustered DB would help to scale the DB and improve performance
15:27:01 @dougsz: I think I will need some help with how to calculate the exact number. So, 352 bare-metals with monasca-agent sending libvirt plugin data every 30 seconds. Roughly 5 VMs on each bare-metal.
15:27:21 If I had more expertise, I'd love to share the Cassandra setup with the Docker-based installs. :/
15:28:54 we use Cassandra in a nicely clustered setup, but we still install using our own methods
15:29:17 mohankumar: InfluxDB can ingest >1,000,000 measurements/s; the bottleneck is the persister
15:29:31 in particular the old Kafka consumer
15:29:55 I hope I can provide the new implementation soon
15:30:00 witek: I'm getting an InfluxDB timeout error if I increase the batch size
15:30:10 @dougsz: We can share some data from our Kafka admin page to give you some idea about TPS
15:31:19 thanks mayankkapoor, I'm always interested to hear about performance at scale
15:31:46 The main thing we did for this setup was use GlusterFS for HA across three Docker worker VMs
15:31:53 This was a huge risk
15:32:12 witek: just to add, the persister gives an InfluxDB timeout error if I increase the batch size
15:32:17 However, we reasoned that we're not running active-active containers, so it might be ok
15:33:29 So when a stateful container dies, it respawns on another node and has access to the same data it had previously
15:34:31 We tested each component (MySQL, InfluxDB and the Monasca containers) individually for HA with GlusterFS, then proceeded with the prod deployment.
15:34:39 so you add HA to InfluxDB that way
15:34:43 Yup
15:34:50 do you know what the performance impact is?
15:35:18 cool
15:35:27 Hmm, no we don't yet; we haven't gotten around to testing a setup without GlusterFS and comparing
15:36:00 mayankkapoor: Are you using the rdma transport for the Gluster share?
15:36:43 @dougsz: Hmm, need to check. We're using a GlusterFS replicated volume with 3 replicas, and we haven't changed any of the defaults.
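To put a rough number on the ingest question above, here is a back-of-envelope estimate in Python; the per-host and per-VM metric counts are assumptions for illustration, not figures from the meeting:

    # Rough ingest estimate: 352 hypervisors, ~5 VMs each, the agent
    # posting every 30 seconds. Metric counts are assumed, not measured.
    hosts = 352
    vms_per_host = 5
    interval_s = 30
    metrics_per_host = 20   # assumed base system metrics per hypervisor
    metrics_per_vm = 30     # assumed libvirt metrics per VM

    per_cycle = hosts * (metrics_per_host + vms_per_host * metrics_per_vm)
    print(f"~{per_cycle / interval_s:,.0f} measurements/s")  # ~2,000/s with these assumptions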
15:37:38 Cool, there is also nufa which is quite neat: if your Gluster storage is local to the Docker workers, it can write directly to a local drive
15:38:00 This might be a useful reference, we use it for HPC activities: https://github.com/stackhpc/ansible-role-gluster-cluster
15:38:38 Based on reading the GlusterFS docs, RDMA transport needs to be enabled. So no, we haven't enabled RDMA yet.
15:39:12 Yeah, our Gluster storage is local to the worker VMs
15:39:22 Hmm wait, we're using Ceph
15:39:27 So not local
15:39:31 We used Ceph RBD
15:39:44 Ah ok
15:40:49 So we'll share our deployment doc and config on github.com for review
15:41:49 you mentioned problems with network metrics, do you mean the standard system plugin metrics?
15:43:37 @witek: We've disabled the following in the libvirt plugin: ping_check=false vm_ping_check_enable=false vm_network_check_enable=false
15:44:19 I see, thanks
15:44:27 If we enable vm_network_check_enable, it loads our OpenContrail controllers too much. We tested load balancing on our OpenContrail controllers, and that worked fine
15:45:07 Our current hypothesis, which we still need to test, is that monasca-agent gets some unexpected response from the SDN controllers and keeps querying them rapidly
15:45:20 Haven't gotten around to checking this at the moment.
15:45:48 Rather than every 30 sec, it queries faster
15:46:07 please report a bug if you can confirm that
15:46:19 sure
15:47:37 what OpenStack distribution do you use?
15:49:12 The Monasca deployment is for an Ocata cloud. However, we have 7 production clouds with between 100 and 500 bare-metals each, plus staging environments for each, on various versions of OpenStack: the oldest are on Liberty, the latest on Pike.
15:49:16 witek: Ocata
15:49:54 which OS?
15:50:13 Ubuntu 16.04
15:50:42 meaning, Ubuntu agent packages would be handy
15:51:32 Yes. However, we use the Mirantis distribution of OpenStack (they provide L4 support for us), so getting these bundled in the OS is a bit challenging for us.
15:53:09 thanks for the update
15:53:21 You're welcome, and thanks for the great work on Monasca
15:53:23 it's great to hear your feedback
15:53:46 We'll try to figure out how we can contribute further.
15:53:57 question: does current Monasca let me get VM disk usage if I'm using Ceph storage? I can see bare-metal (compute node) disk usage, but not the VMs'. If I enable the Monasca ceph plugin as per the document https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md#ceph, I'll get only Ceph cluster metrics, not per-VM ones. Is there any way in current Monasca to get disk usage from each VM with Ceph storage?
15:55:15 I hope this question fits the agenda item we have :)
15:55:31 I'm afraid I cannot answer that right now
15:56:04 I'll check
15:56:52 witek: sure, thanks.
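For reference, the three libvirt plugin options quoted above normally live in the agent's conf.d/libvirt.yaml. The fragment below is a sketch of how they might appear there; the surrounding structure is assumed from the usual monasca-agent layout, not taken from this deployment:

    # /etc/monasca/agent/conf.d/libvirt.yaml (illustrative fragment)
    init_config:
        ping_check: false               # disabled in the setup described above
        vm_ping_check_enable: false     # disabled
        vm_network_check_enable: false  # disabled; enabling it overloaded the SDN controllers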
15:57:25 good support for Ceph is important, so if there's anything missing, we should think about closing the gaps
15:57:56 We've got an update for Luminous support in the pipeline
15:58:18 The existing plugin can't handle the newer releases
15:58:40 thanks again for the update
15:58:43 I have to finish the meeting soon
15:58:51 #topic backlog
15:58:55 short update
15:58:57 yep, thanks mohankumar
15:59:06 I have added two stories to the backlog
15:59:44 especially running the Python 3 unit tests for monasca-agent
16:00:01 we still don't run them
16:00:14 that's all from me
16:00:19 thanks for joining
16:00:23 see you next time
16:00:24 one more piece of info
16:00:24 the API Docker image is now pushed from Zuul on master; we need to wait for tagging to see if it gets pushed with the proper tag, and then I will replace the first image on github.com/monasca/monasca-docker with this one
16:00:26 #endmeeting