16:00:34 <Jeffrey4l> #startmeeting kolla
16:00:35 <openstack> Meeting started Wed Nov 1 16:00:34 2017 UTC and is due to finish in 60 minutes. The chair is Jeffrey4l. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:40 <openstack> The meeting name has been set to 'kolla'
16:00:58 <Jeffrey4l> #topic roll-call
16:01:02 <duonghq> o/
16:01:08 <chason> o/
16:01:23 <Jeffrey4l> hi chason ;D
16:01:41 <chason> Jeffrey4l Hahah
16:01:43 <vhosakot> o/
16:01:47 <duonghq> hi vhosakot
16:01:50 <rfxn> \o/
16:01:51 <coolsvap> o/
16:01:59 <vhosakot> hi duonghq :)
16:02:37 <Jeffrey4l> let us wait another two minutes.
16:02:46 <Jeffrey4l> this will be a short meeting.
16:02:59 <duonghq> ya, I have 2 topics
16:03:08 <Jeffrey4l> duonghq, cool.
16:03:29 <Jeffrey4l> we have no scheduled topics today.
16:03:49 <duonghq> ya, I forgot add to our schedule
16:04:15 <Jeffrey4l> ok. let us start.
16:04:22 <Jeffrey4l> #topic Announcements
16:04:54 <Jeffrey4l> Sydney summit will be hold next week.
16:05:22 <spsurya__> 0/
16:05:26 <duonghq> ohhh, time flies too fast
16:05:37 <Jeffrey4l> so next meeting will be canceled.
16:05:45 <Jeffrey4l> duonghq, yes.
16:05:57 <Jeffrey4l> any other announcement from community?
16:06:52 <Jeffrey4l> let us start the open discuss directly.
16:07:04 <Jeffrey4l> #topic open discuss
16:07:11 <Jeffrey4l> duonghq, your call.
16:07:18 <duonghq> thank you Jeffrey4l
16:07:27 <duonghq> first one is quite simple
16:07:39 <duonghq> I get this bug many time: https://bugs.launchpad.net/kolla-ansible/+bug/1729246
16:07:40 <openstack> Launchpad bug 1729246 in kolla-ansible "MariaDB cluster fails to start after upgrade" [Undecided,New]
16:07:58 <pbourke> o/
16:08:00 <duonghq> I run on 2 nodes with 6GB memory/node
16:08:03 <duonghq> hi pbourke
16:08:21 <duonghq> can somebody help me test this upgrade again
16:08:42 <Jeffrey4l> duonghq, mariadb changed some thing recently.
16:09:05 <Jeffrey4l> our upgrade process ( ansible roles ) should fix the gap.
16:09:19 <duonghq> so, can you triaged this bug?
16:09:28 <Jeffrey4l> the safe_to_bootstrap do not exist before.
16:09:29 <Jeffrey4l> sure.
16:09:39 <duonghq> I cannot find any bug related to this issue
16:09:44 <duonghq> thank Jeffrey4l
16:10:09 <Jeffrey4l> curiosity upgrade is failed.
16:10:25 <Jeffrey4l> anyway, i will check this.
16:10:36 <Jeffrey4l> thank you for pointing this out.
16:11:09 <duonghq> :)
16:11:40 <Jeffrey4l> btw, there is another old bug report that possible data loss during mariadb recovery.
16:11:52 <Jeffrey4l> https://bugs.launchpad.net/kolla-ansible/+bug/1682153
16:11:53 <openstack> Launchpad bug 1682153 in kolla-ansible "mariadb_recovery is prone to data loss" [Critical,Confirmed]
16:12:15 <duonghq> sure, I saw that, seem that Sam proposed a fix for the bug
16:12:37 <rfxn> that bug has bitten me before, that articulate bug report helped narrow down and recover cluster
16:12:37 <Jeffrey4l> no patch right now
16:12:58 <Jeffrey4l> but he propose a possible solution way in the description.
16:13:34 <duonghq> we should implements his proposal
16:13:39 <Jeffrey4l> yes.
16:13:43 <duonghq> than let it roll for awhile
16:14:14 <Jeffrey4l> okay. please move on
16:14:30 <duonghq> sure,
16:14:45 <duonghq> so, my 2nd topic is about Kolla-ansible HA layer
16:15:13 <duonghq> do anybody know why we use haproxy/keepalived for HA layer?
16:15:23 <duonghq> but not pacemaker/corosync stack
16:15:41 <Jeffrey4l> duonghq, pacemaker/corosync is more complicated.
16:15:47 <rfxn> far more complicated
16:16:07 <Jeffrey4l> and iirc, pacemaker can not be containerized before ( now it should work )
16:16:29 <duonghq> but it provides some mechanism for react with the failure, like data plane evacuation
16:17:03 <duonghq> it is invaluable feature (IMO)
16:17:43 <Jeffrey4l> duonghq, yes. pacemaker is powerful than keepalived. but what kind of issue we are facing by using keepalived.
16:18:08 <duonghq> it just for add some functionality to our stack,
16:18:38 <duonghq> I'm thinking in implement pacemaker into Kolla and let user choose which HA stack they want to use
16:19:24 <Jeffrey4l> that will be cool. pacemaker can handle more than keepalived.
16:20:00 <duonghq> Jeffrey4l, so, I'll create a blueprint for this, is it ok?
16:20:24 <duonghq> and try to containerized pacemaker (again)
16:20:28 <Jeffrey4l> but i just afraid what it will take to Kolla. more better health check? or fail over?
16:21:03 <Jeffrey4l> sure. a blueprint is necessary for others to evaluate the possibility.
16:21:29 <duonghq> about healthcheck, I'm not sure only it can make Kolla better
16:21:48 <rfxn> pacemaker in kolla feels like a solution to a problem that doesn't yet exist; granted from a maturity standpoint eventually moving to pacemaker from keepalived is probably what needs to happen
16:21:49 <duonghq> but I'm certainly about failover
16:22:01 <Jeffrey4l> please write what you think and the benefit.
16:22:22 <duonghq> sure
16:23:50 <Jeffrey4l> rfxn, tbh, i will current keepalived is enough. ;)
16:24:22 <Jeffrey4l> i think*
16:24:27 <Jeffrey4l> but who know what duonghq will take
16:24:37 <rfxn> current, ya i think keepalived is more than enough -- time could be better spent on other areas of HA and Disaster Recovery instead of pacemaker atm
16:24:48 <duonghq> I think we can let user choose which stack they like,
16:25:04 <duonghq> rfxn, can you suggest some area?
16:25:37 <rfxn> MariaDB is treated emphemeral right now, we put it behind a galera cluster and smile
16:25:45 <rfxn> in reality, if you loose mysql data, your done
16:25:53 <rfxn> we need a reliable, backup strategy for mariadb
16:26:05 <rfxn> clustering = HA, backup = DR
16:27:08 <Jeffrey4l> rfxn, are u meaning use pacemaker as DR solution?
16:27:23 <duonghq> I'm thinking it is slightly out of scope of Kolla
16:28:48 <rfxn> Jeffrey4l, no, im all for keepalive (now) and pacemaker (later, as a maturity point); duonghq asked some area to suggest, my thought is lack of backup solution for mariadb is the largest, most volatile gap, currently
16:29:13 <Jeffrey4l> ah, got.
16:29:45 <Jeffrey4l> db backup is really necessary.
16:30:39 <Jeffrey4l> so please register a bp for this duonghq , and let us discuss base on the bp.
16:30:40 <Jeffrey4l> thanks
16:31:00 <Jeffrey4l> then any other topics?
16:31:05 <duonghq> Jeffrey4l, sure
16:31:15 <rfxn> happy to help discuss and lay out options in that bp
16:31:22 <duonghq> thank you rfxn
16:32:04 <Jeffrey4l> guess no topic.
16:32:06 <rfxn> https://blueprints.launchpad.net/kolla/+spec/database-backup-recovery <- related
16:32:33 <duonghq> rfxn, ah, it should be in kolla-ansible
16:32:36 <duonghq> how do you think, Jeffrey4l
16:32:44 <Jeffrey4l> sure.
16:32:51 <Jeffrey4l> and xtrabackup should be the best solution.
16:32:59 <Jeffrey4l> the backup should run periodic.
16:33:15 <rfxn> agreed xtrabackup/innobackupx with a few output options = win
16:33:56 <rfxn> two smaller items; updating haproxy from 1.5 to 1.7
16:33:57 <Jeffrey4l> and we can add this jobs into cron containers. and save the backups into a new docker volumes.
16:34:38 <rfxn> and letsencrypt for automagick issuance of external_fqdn ssl certs
16:35:03 <Jeffrey4l> refer to backup, there are other ways to backup. like ceph pg map, crush rule.
16:35:10 <Jeffrey4l> rfxn, what upgrade haproxy?
16:35:16 <Jeffrey4l> what/why
16:35:52 <rfxn> 1.7 offers better ssl termination, http2, more advanced acl features, far more performant
16:36:22 <Jeffrey4l> rfxn, letencrypt may be hard. because during sign new certs, it requires network connective and a public domain.
16:37:01 <Jeffrey4l> rfxn, basically, package version in kolla based on the linux distro repo.
16:37:27 <rfxn> i dont think those are blockers, most production deployments are going to be on an external fqdn w/ internet access
16:37:52 <rfxn> i have a poc with haproxy that routes letsencrypt CA challenge/response to a dedicated listener on network group systems
16:38:00 <rfxn> so it never needs to touch the horizon container
16:38:41 <rfxn> letsencrypt is a nice to have, maybe not a need to have :)
16:39:06 <Jeffrey4l> yep
16:39:08 <rfxn> but would minimize the barrier to entry on new deployments imo
16:40:14 <duonghq> rfxn, can you elaborate your point? about new deployments
16:40:18 <Jeffrey4l> and we can implement this in "kolla-ansible certifications" command.
16:42:11 <rfxn> SSL certificates, valid browser recognized CA certificates, are the norm on any production ready deployment. Right now, managing SSL certificates is a pain, prone to human error and nothing should ever be internet facing without a valid cert.
16:42:23 <rfxn> letsencrypt allows us to automate the process entirely, in very trivial way
16:43:07 <Jeffrey4l> letsencrypt is a gread service ;p
16:43:11 <duonghq> rfxn, thank you for teach me this
16:43:26 <Jeffrey4l> great*
16:43:47 <rfxn> not trying to preach, just speak aloud that imo automagick ssl certificates would set kolla apart and remove a human error prone component for the stack
16:43:59 <rfxn> e.g azure going down cause of forgetting to renew api connector certs :P
16:44:13 <Jeffrey4l> lol
16:44:43 <Jeffrey4l> rfxn, will you add such feature into kolla?
16:45:56 <rfxn> i can bp it
16:46:00 <rfxn> and we go from there
16:46:10 <Jeffrey4l> cool. thanks.
16:46:26 <rfxn> and happy to share out my ugly code as poc :)
16:46:54 <Jeffrey4l> rfxn, thanks for sharing ;D
16:47:10 <Jeffrey4l> any other topics?
16:47:23 <duonghq> rfxn, poc is always has many idea
16:48:03 <Jeffrey4l> guess no. let us end the meeting.
16:48:08 <Jeffrey4l> thanks for all coming.
16:48:13 <Jeffrey4l> #endmeeting