09:00:26 <aspiers> #startmeeting HA: automated recovery from hypervisor failure
09:00:27 <openstack> Meeting started Mon Nov 16 09:00:26 2015 UTC and is due to finish in 60 minutes.  The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:30 <openstack> The meeting name has been set to 'ha__automated_recovery_from_hypervisor_failure'
09:00:58 <bogdando> hi
09:01:04 <aspiers> Hi all, and welcome to the (first ever?) official IRC meeting about OpenStack HA!
09:01:11 <bogdando> o/
09:01:11 <_gryf> :)
09:01:16 <masahito> hi o/
09:01:20 <ddeja> hi \o
09:01:22 <aspiers> here is the etherpad we will be referring to: https://etherpad.openstack.org/p/automatic-evacuation
09:01:22 <bpiotrowski> hello
09:02:13 <aspiers> this meeting was arranged off the back of discussions in Tokyo on hypervisor HA
09:02:37 <aspiers> we have a few existing approaches within the community, and we were interested in trying to converge on one
09:03:01 <aspiers> the etherpad lists all known existing approaches, plus some ideas on new approaches
09:03:11 <bogdando> and probably put it in the HA guide as recommended :)
09:03:17 <aspiers> exactly :)
09:03:38 <aspiers> I think today we only have 30 mins or so, so I don't expect us to reach a complete plan for convergence :)
09:03:44 <aspiers> although if that happens I will not complain ;)
09:04:07 <aspiers> but it's more about setting a general direction for how work should continue
09:04:39 <aspiers> should I give a very brief history for those who are new to this area?
09:05:08 <_gryf> aspiers, please do :)
09:05:46 <aspiers> OK. The first approach on the etherpad (masakari) is from NTT and has some very nice features.
09:06:07 <aspiers> The second was born out of conversations between Red Hat and Intel, and has a similar design.
09:06:48 <aspiers> The third was originally by Red Hat, and also used by SUSE.  This one exposed some weaknesses in the design which the other two are trying to address.
09:06:53 <beekhof> ok, i'm here too
09:07:00 <aspiers> Mainly the unreliability of evacuation
09:07:09 <beekhof> too many channels :)
09:07:15 <beekhof> what are we talking about?
09:07:17 <aspiers> hey beekhof :) you didn't miss much but I guess you can catch up via the logs on the web
09:07:26 <aspiers> I'm just summarising status quo for benefit of newcomers
09:07:34 <beekhof> k
09:08:08 <aspiers> then we have an entirely different approach from AWcloud and ChinaMobile which was presented in Tokyo
09:08:26 <aspiers> it has some good ideas which a best of breed solution should probably incorporate
09:08:52 <aspiers> but it is fundamentally different (Pacemaker not used)
09:08:55 <beekhof> ChinaMobile was tristack?
09:08:59 <bogdando> could we define the evacuation term more precisely? in terms of fencing as well. Is it just STONITHing the hypervisor host node and relaunching instances in another place, which would be like a reboot from the instances' pov?
09:09:09 <aspiers> so I guess it is impossible to converge with that codebase
09:09:10 <bogdando> or is it live migration?
09:09:18 <aspiers> it is not live migration
09:09:28 <aspiers> so we have a problem with terminology unfortunately
09:09:42 <aspiers> a long time ago the nova project chose the misnomer "evacuate"
09:09:45 <_gryf> bogdando, evacuation is basically rebuilding the vm on another host. it's a post-mortem process
09:09:53 <beekhof> at the point a compute node is dead - there is nowhere to live migrate from
09:09:59 <aspiers> exactly. it's really "resurrect" not "evacuate"
09:10:01 <bogdando> okay, so it is like a reboot for its apps
09:10:22 <aspiers> in Vancouver it was proposed to fix this naming by renaming to resurrect
09:10:45 <bogdando> well, in the guide that would be just a note, so everything would be clear
09:10:47 <aspiers> but since then nothing has happened, and I spoke to Dan Smith about it who said he was -1.99 on the rename due to the impact
09:10:57 <aspiers> so it's unlikely to happen any time soon
09:11:15 <aspiers> so yes, we'll just have to be clear what we mean, especially in docs
09:11:17 <bogdando> not a problem, just to be on the same page here
09:11:37 <aspiers> bogdando: thanks for the clarification, I had forgotten to mention that :)
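For readers new to the nova call being discussed: a minimal sketch of what the misnamed "evacuate" operation looks like via python-novaclient of roughly that era. The credentials, server name and target host are placeholders, not taken from the meeting; the instance is rebuilt on another host, so from the guest's point of view it is a reboot, not a live migration.

```python
# A minimal sketch (placeholder credentials/names, not from the meeting) of the
# nova "evacuate" call via python-novaclient: the instance is rebuilt on another
# host, so from the guest's point of view it is effectively a reboot, not a
# live migration.
from novaclient import client

nova = client.Client('2', 'user', 'password', 'project', 'http://keystone:5000/v2.0')

server = nova.servers.find(name='my-pet-vm')

# With shared storage the disk is reused; without it the instance is rebuilt
# from its image. Omitting host= lets the scheduler pick a target node.
nova.servers.evacuate(server, host='target-compute-node', on_shared_storage=True)
```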
09:12:20 <bogdando> it seems all methods are about pacemaker
09:12:42 <_gryf> bogdando, not really
09:12:50 <bogdando> and I like it personally. But do we want to do R&D around its fancy alternatives?
09:13:12 <aspiers> that's a good question
09:13:30 <aspiers> what alternatives are there?
09:13:52 <aspiers> BTW I suspect most people in this meeting want to stick with Pacemaker but we can put it to the vote for sure
09:13:57 <bogdando> note, we probably want to sync with the DLM initiative
09:14:05 <beekhof> keepalived is about all (from a high level view, not the instance HA part)
09:14:16 <bogdando> if they decided to stick with zookeeper or etcd or consul
09:14:30 <aspiers> the AWcloud/ChinaMobile approach uses consul/raft/gossip
09:14:51 <bogdando> so to keep ops less frustrated it would be nice for us to use the same solution
09:14:57 <aspiers> frankly, moving away from Pacemaker would be too much rework for SUSE
09:15:09 <beekhof> fwiw, i spent most of the day starting to pull in a lot of the content from https://github.com/beekhof/osp-ha-deploy into the ha-guide
09:15:17 <aspiers> so it's pretty unlikely we'd do that
09:15:19 <bogdando> just to not introduce yet another control plane
09:15:47 <beekhof> redhat is staying with pacemaker too
09:16:08 <aspiers> also the upstream HA guide is already Pacemaker-based
09:16:21 <aspiers> and so is Masakari
09:16:24 <masahito> NTT also plan to use pacemaker.
09:16:26 <bogdando> yes, pacemaker is a part of ref. arch already
09:17:24 <aspiers> of course anyone is free to do R&D around other technologies at any time
09:17:53 <beekhof> everyone is entitled to their opinions, as long as they understand they are wrong :)
09:17:57 <aspiers> lol :)
09:18:21 <bogdando> and make sure it would not diverge from the control plane / distributed consensus solution picked by the DLM initiative, I'd say
09:18:46 <aspiers> given that 1) we already have a very good platform based on Pacemaker and 2) we are pretty confident we can implement solid hypervisor HA on top of it, I'm not sure it makes much sense for us to research other options
09:18:54 <aspiers> unless it's to steal ideas :)
09:18:58 <bogdando> I mean ops would hardly be happy to see pacemaker and consul and zookeeper in one deploy
09:19:18 <aspiers> bogdando: agreed
09:19:28 <aspiers> one cluster manager is enough :)
09:19:43 <bogdando> let's push pacemaker to the DLM topic! :)
09:19:49 <bogdando> sorry for offtopic
09:19:59 <aspiers> :)
09:20:24 <aspiers> so are we agreed to stick with Pacemaker for now?
09:20:40 <bogdando> It seems so
09:20:43 <_gryf> yup
09:21:02 <beekhof> well i'm not going to say no
09:21:03 <masahito> agree
09:21:12 <aspiers> #agreed hypervisor HA solution will be based on Pacemaker
09:21:34 <aspiers> so, most obvious path to me is to converge masakari with evacuationd
09:21:58 <_gryf> aspiers, how about mistral solution?
09:21:58 <aspiers> Nova{Compute,Evacuate} RAs work fine with known limitations, but are kind of a dead end, right?
09:22:12 <aspiers> _gryf: oh yes, sorry I forgot about that
09:22:18 <bogdando> note, there are kilometers of bash; would it be a well-maintainable solution?
09:22:28 <bogdando> I mean masakari
09:22:34 <aspiers> bogdando: IIRC both masakari and evacuationd are Python?
09:22:42 <bogdando> we may want bats tests at least, maybe
09:23:04 <bogdando> well, as I understood from the repo, it is pure bash?
09:23:09 <aspiers> I am strongly in favour of the solution being mainly in Python
09:23:20 <_gryf> evacuationd is in pure python
09:23:20 <aspiers> bogdando: ah, I didn't look at the code yet
09:23:41 <aspiers> no, masakari has a bunch of Python
09:23:46 <aspiers> 73.5% according to github
09:23:52 <bogdando> I like bash, but we should care about unit tests as well
09:23:59 <bogdando> hm, ok then
09:24:24 <bogdando> example https://github.com/ntt-sic/masakari/blob/master/masakari-hostmonitor/hostmonitor/hostmonitor.sh
09:24:24 <masahito> Only masakari's RAs are based on bash. Masakari's controller is written in Python.
09:24:46 <aspiers> I think both Masakari and evacuationD do a pretty good job of using standard OpenStack tech
09:25:15 <beekhof> bogdando: most pacemaker agents are written in bash.  they can be in python but miss out on the common library functions
09:25:32 <aspiers> true, although most RAs are very simple and quite short
09:25:35 <aspiers> so bash is tolerable
09:25:43 <beekhof> thats why you'd be seeing bash at all i imagine
09:25:55 <aspiers> right
09:26:35 <_gryf> otoh openstack is mainly in python, so interacting with nova would make much more sense if we go with python
09:26:39 <bogdando> I'm not sure if it is limited to the RAs for masakari
09:26:59 <beekhof> _gryf: agreed
09:27:30 <bogdando> bash for RAs is accepted practice, agreed
09:27:35 <bogdando> but for the rest?
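For context on what the bash hostmonitor being debated actually has to do, here is a hypothetical Python sketch of a host-level monitor: notice that a peer compute node has stopped responding and report the failure to a controller. The URL, the payload format and the crude ping-based check are invented for illustration; masakari's real hostmonitor.sh is more involved.

```python
# Hypothetical illustration only (masakari's real hostmonitor is a shell script):
# a host-level monitor notices that a peer compute node has stopped responding
# and reports the failure to a controller. The URL, the payload format and the
# crude ping-based check are invented for this sketch.
import subprocess
import time

import requests

CONTROLLER_URL = 'http://controller.example.com:15868/notification'  # invented

def host_is_alive(hostname):
    """Very crude liveness check: a single ICMP ping with a short timeout."""
    return subprocess.call(['ping', '-c', '1', '-W', '2', hostname],
                           stdout=subprocess.DEVNULL) == 0

def monitor(hosts, interval=10):
    while True:
        for host in hosts:
            if not host_is_alive(host):
                # In a real deployment the host must be fenced (STONITH) before
                # its instances are resurrected elsewhere.
                requests.post(CONTROLLER_URL,
                              json={'hostname': host, 'type': 'compute_host_down'})
        time.sleep(interval)

if __name__ == '__main__':
    monitor(['compute-1', 'compute-2'])
```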
09:27:44 <aspiers> question for the Intel guys: how attached are you to evacuationd? e.g. would you be ok with the idea of switching to masakari and then porting over any features from evacuationd which it is missing?
09:28:16 <_gryf> we have no problem with deprecating evacuationd
09:29:09 <aspiers> IIRC, masakari already persists to a database which is nice. Does it use SQLAlchemy? I can't remember
09:29:14 <beekhof> is one clearly better than the other?
09:29:32 <aspiers> there are pros and cons to both
09:29:39 <masahito> Masakari uses SQLAlchemy to access the DB.
09:29:46 <aspiers> as listed in the etherpad
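To illustrate the persistence point masahito just confirmed, a hypothetical SQLAlchemy sketch follows. This is not masakari's actual schema; table and column names are invented, and it only shows the idea of keeping recovery state in a database so the controller survives its own restarts.

```python
# Hypothetical sketch of persisting failure notifications with SQLAlchemy.
# This is NOT masakari's actual schema; it only illustrates the idea of keeping
# recovery state in a database so the controller survives its own restarts.
import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class FailureNotification(Base):
    __tablename__ = 'failure_notifications'

    id = Column(Integer, primary_key=True)
    hostname = Column(String(255), nullable=False)   # failed compute node
    notification_type = Column(String(64))           # e.g. host / process / vm
    status = Column(String(32), default='new')       # new -> fencing -> evacuating -> done
    created_at = Column(DateTime, default=datetime.datetime.utcnow)

engine = create_engine('sqlite:///notifications.db')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
```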
09:29:59 <aspiers> I think the main thing which would need fixing with masakari is to port it to use pacemaker_remote
09:30:02 <aspiers> so that it can scale
09:30:09 <aspiers> currently compute nodes are grouped into 16s
09:30:13 <bogdando> regarding masakari "corosync's scaling limits, compute nodes are grouped into 16-node clusters" - why not use pacemaker-remote?
09:30:19 <aspiers> but I doubt that would be hard
09:30:32 <beekhof> i think that would be the first thing RH would want to change :)
09:30:38 <aspiers> and SUSE :)
09:30:53 <beekhof> masahito: any problem there?
09:31:26 <bogdando> aspiers, oops, you were first
09:31:39 <masahito> I don't have any problem with changing to pacemaker remote.
09:31:43 <_gryf> other than that, there would be a need for selecting certain vms for resurrecting - we already have that in evacuationd
09:32:00 <beekhof> _gryf: good point
09:32:04 <aspiers> _gryf: sounds nice, how would that work?
09:32:25 <beekhof> aspiers: attribute on the instance in nova iirc
09:32:32 <_gryf> we had it implemented through the flavor extra specs and vm metadata
09:32:57 <beekhof> _gryf: did you get around to using a db for persistence in evacuationd?
09:32:59 <aspiers> oh, you mean something a bit like availability zones?
09:33:08 <aspiers> _gryf: e.g. selecting which VMs are pets vs. cattle?
09:33:44 <bogdando> what about the MQ? Does masakari use it?
09:33:46 <_gryf> beekhof, we didn't make any changes due to the discussion we had
09:33:53 <_gryf> aspiers, right
09:33:57 <aspiers> bogdando: I think they use HTTP requests
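A sketch of the pets-vs-cattle selection _gryf describes above (flavor extra specs plus VM metadata), using python-novaclient; the 'evacuation:auto' key and the credentials are invented for illustration, not the names evacuationd actually uses.

```python
# A sketch of the pets-vs-cattle selection described above: decide whether an
# instance should be resurrected by looking at its metadata and at its flavor's
# extra specs. The 'evacuation:auto' key and the credentials are invented for
# illustration, not the names evacuationd actually uses.
from novaclient import client

nova = client.Client('2', 'user', 'password', 'project', 'http://keystone:5000/v2.0')

def should_resurrect(server):
    # Per-instance opt-in via VM metadata...
    if server.metadata.get('evacuation:auto') == 'true':
        return True
    # ...or opt-in for a whole class of instances via flavor extra specs.
    flavor = nova.flavors.get(server.flavor['id'])
    return flavor.get_keys().get('evacuation:auto') == 'true'

failed_host = 'compute-1'
for server in nova.servers.list(search_opts={'host': failed_host, 'all_tenants': 1}):
    if should_resurrect(server):
        nova.servers.evacuate(server, on_shared_storage=True)
```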
09:34:19 <bogdando> should we also keep in mind the Mistral alternative in the list? Looks like a high-level coordination for masakari as well
09:34:27 <beekhof> yes
09:34:31 <aspiers> +1
09:34:49 <aspiers> what's the first action we could take regarding Mistral?
09:34:57 <bogdando> how would we want to proceed with a PoC?
09:35:02 <aspiers> I guess it's still in the R&D phase regarding hypervisor HA
09:35:20 <ddeja> right now I'm working on POC using mistral that will auto evacuate VMs
09:35:30 <aspiers> cool!
09:35:31 <beekhof> _gryf: how far off is that project from conceptually being able to support what we want?
09:35:32 <_gryf> we have this almost working :)
09:35:37 <beekhof> wow
09:35:39 <bogdando> ddeja, would be nice to join your efforts
09:35:40 <beekhof> awesome
09:35:43 <bogdando> if possible
09:36:04 <aspiers> ddeja: perhaps you can give a quick summary of how that works and where you are with it?
09:36:05 <beekhof> so do we park this thread until we hear how the mistral PoC went?
09:36:19 <bogdando> any action item then?
09:36:26 <ddeja> bogdando: no problem, I can share code on github
09:36:32 <aspiers> beekhof: I suggest we hear some details first before deciding
09:36:33 <bogdando> great!
09:36:42 <aspiers> yes please, github would be great
09:37:23 <aspiers> ddeja / _gryf: are you able to quickly summarise now?
09:37:28 <bogdando> aspiers, action item please?
09:37:31 <_gryf> aspiers, yup
09:37:36 <_gryf> so the idea is simple
09:37:52 <_gryf> prepare small action class (in python)
09:37:58 <beekhof> ok, i need to head out guys (it's my wedding anniversary and it's getting late here)
09:38:04 <aspiers> #action ddeja will share mistral PoC code on github
09:38:05 <_gryf> plus the workflow (which is basically a yaml file)
09:38:12 <beekhof> i'll read up on the weblog though
09:38:27 <aspiers> beekhof: whoa, run before you get in trouble ;-) congrats and thanks for attending!
09:38:32 <_gryf> and then proceed like with the other solutions - trigger it from pacemaker
09:38:50 <bogdando> beekhof, congrats!
09:38:53 <_gryf> mistral would take care of the evacuation
09:38:53 <masahito> beekhof: congrats!
09:38:58 <ddeja> _gryf: this action class is needed only for selecting which VMs should be evacuated (so resolving the pet vs cattle problem)
09:39:26 <_gryf> ddeja, right, but it's one of our assumptions
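A hypothetical sketch of the small custom action class ddeja and _gryf describe, assuming the Mistral custom-action API of that era (subclass mistral.actions.base.Action and implement run()); the credentials and metadata key are placeholders, not the PoC's actual code. The accompanying YAML workflow would iterate over the returned list and run a nova evacuate task for each instance.

```python
# Hypothetical sketch of the kind of custom Mistral action described in the PoC,
# assuming the Mistral custom-action API of that era (subclass
# mistral.actions.base.Action and implement run()). Credentials and the metadata
# key are placeholders, not the PoC's actual code.
from mistral.actions import base
from novaclient import client as nova_client


class SelectInstancesForEvacuation(base.Action):
    """Return the IDs of instances on a failed host that opted in to recovery."""

    def __init__(self, failed_host):
        self.failed_host = failed_host

    def run(self):
        nova = nova_client.Client('2', 'user', 'password', 'project',
                                  'http://keystone:5000/v2.0')
        servers = nova.servers.list(
            search_opts={'host': self.failed_host, 'all_tenants': 1})
        # Only resurrect instances that explicitly opted in ("pets").
        return [s.id for s in servers
                if s.metadata.get('evacuation:auto') == 'true']

    def test(self):
        # Dry-run mode required by the Action interface.
        return []
```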
09:40:15 <bogdando> could this also help us to build the solution with a solid logging for events? http://blog.clusterlabs.org/blog/2015/reliable-notifications/
09:40:22 <_gryf> and the main point regarding mistral is that it's quite a stable project now
09:40:51 <aspiers> that sounds cool to me, looking forward to seeing more details
09:41:46 <_gryf> as soon as we have the PoC working, I'll put a more detailed description on the etherpad.
09:42:06 <aspiers> #action _gryf will update the etherpad with more details of the mistral PoC
09:42:14 <_gryf> +1
09:42:17 <_gryf> :)
09:42:19 <aspiers> :)
09:43:07 <aspiers> so do we want to work in parallel on masakari / mistral, or hold off for a short while?
09:43:30 <bogdando> My vote is for Mistral PoC as well
09:43:43 <aspiers> masakari has three levels of monitoring which is really nice - could mistral do that too?
09:44:17 <aspiers> and can mistral itself be made HA?
09:44:25 <aspiers> if not, how much work is that?
09:44:38 <aspiers> obviously there's no point designing an HA system around a component which is a SPoF :)
09:44:59 <aspiers> just trying to figure out what's likely in the short vs. long term
09:45:19 <_gryf> aspiers, we can make it monitor all 3 levels - actually, we can make it through pacemaker
09:45:35 <bogdando> I believe the pacemaker-based HA solution for Mistral would be the same as for the rest of the OpenStack projects
09:45:52 <aspiers> bogdando: so it's basically stateless?
09:45:54 <bogdando> A/P or A/A for multiple API instances, what else?
09:46:05 <aspiers> all state in the DB I guess?
09:46:08 <bogdando> depends on whether it is stateless, yes
09:46:11 <_gryf> regarding ha - there is a priority to make mistral HA in the mitaka cycle
09:46:21 <aspiers> ok cool
09:47:05 <bogdando> _gryf, OCF_CHECK_LEVEL?
09:47:17 <bogdando> are we talking about those levels?
09:47:35 <bogdando> we could make as many of them as we want
09:47:57 <_gryf> bogdando, look at the masakari project - there are 3 levels of failure check
09:48:11 <_gryf> vm, process (libvirt/compute) and host
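To make those three levels concrete, here is a hypothetical sketch of the middle (process) level: checking that libvirtd and nova-compute are still running on the local node. The service names and the systemctl-based check are assumptions about the deployment, not code taken from masakari.

```python
# Illustration of the middle of the three monitoring levels (vm / process / host):
# a process monitor checking that libvirtd and nova-compute are still running on
# the local node. Service names and the systemctl-based check are assumptions
# about the deployment, not code taken from masakari.
import subprocess

MONITORED_SERVICES = ['libvirtd', 'openstack-nova-compute']

def service_is_active(name):
    return subprocess.call(['systemctl', 'is-active', '--quiet', name]) == 0

def check_processes():
    failed = [name for name in MONITORED_SERVICES if not service_is_active(name)]
    for name in failed:
        # In masakari this would be reported to the controller, which decides
        # whether to restart the service or escalate to host-level recovery.
        print('process failure detected: %s' % name)
    return failed

if __name__ == '__main__':
    check_processes()
```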
09:48:15 <aspiers> I'd like to know how much potential there is for convergence between Mistral PoC and Masakari, but maybe it's too early to discuss that and we should wait until more details of Mistral PoC first?
09:48:28 <bogdando> perhaps
09:48:43 <aspiers> like I said, I didn't expect us to solve everything in the first meeting ;-)
09:48:48 <_gryf> aspiers, I would wait
09:49:04 <bogdando> anyway we didn't do badly already :)
09:49:15 <aspiers> right :)
09:49:43 <aspiers> I think we can continue the architectural conversations on #openstack-ha and openstack-dev mailing list, right?
09:49:53 <_gryf> aspiers, sure
09:50:06 <aspiers> #openstack-ha is logged, so no one has to miss discussions
09:50:32 <aspiers> also, please make sure to include "[HA]" Subject: prefix for discussions on openstack-dev
09:50:58 <aspiers> ttx configured mailman for this prefix so you can even do server-side filtering for that topic now
09:51:47 <aspiers> masahito: once more details of Mistral PoC are released, would you be able to look at it and assess possibility of convergence?
09:52:02 <masahito> aspiers: yes.
09:52:17 <aspiers> great, thanks!
09:52:22 <masahito> I think we would converge both as long term goal
09:52:28 <aspiers> agreed :)
09:52:52 <masahito> btw, which would be easier for you all to push code to Masakari with: stackforge or github?
09:52:55 <aspiers> #action masahito will investigate possibility of converging masakari with Mistral PoC, once details of the latter are published
09:53:20 <aspiers> masahito: I guess gerrit/stackforge is preferred
09:53:36 <_gryf> for a poc level I think gh is alright
09:53:47 <aspiers> true, either works
09:54:09 <aspiers> if we need to set anything up on gerrit, I can help with that
09:54:15 <_gryf> then, if we decide to go either way, gerrit/stackforge will be the way :)
09:54:28 <aspiers> I already went through that process for openstack-resource-agents so I know how to do it
09:54:35 <_gryf> great
09:54:41 <masahito> aspiers: great
09:55:09 <masahito> Should I start to move the repo to stackforge?
09:55:17 <aspiers> masahito: also would you be able to investigate what work would be needed to switch masakari to pacemaker_remote?
09:55:44 <masahito> aspiers: not yet, because we haven't tried it.
09:55:59 <aspiers> masahito: I would guess no rush to move to stackforge yet, but of course you can if you want
09:56:17 <masahito> aspiers: meaning just think through the idea, but not implement it.
09:56:17 <aspiers> ok, we are approaching the end of the 60 minutes
09:56:38 <masahito> aspiers: ok. I'll wait for a suitable time :)
09:56:49 <aspiers> masahito: right. if you have ideas on pacemaker_remote implementation then please share them on IRC or mailing list
09:57:05 <aspiers> anybody want to raise anything else before we close?
09:57:32 <aspiers> otherwise let's continue on #openstack-ha and openstack-dev (with "[HA]" in Subject: line :)
09:57:38 <_gryf> it was a fruitful meeting :) thx everyone :)
09:58:00 <aspiers> yeah great first meeting, thanks a lot everyone!
09:58:46 <masahito> bye!
09:58:56 <aspiers> #agreed we'll continue discussion on #openstack-ha and openstack-dev (with "[HA]" in Subject: line :)
09:59:28 <aspiers> thanks everyone, see you same time/place next week!
09:59:37 <aspiers> #endmeeting