09:20:22 #startmeeting ha
09:20:22 Meeting started Wed Feb 1 09:20:22 2017 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:20:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:20:25 The meeting name has been set to 'ha'
09:20:34 let's just have a quick chat
09:20:51 aspiers: sure, in that case, I have to let go one of my talks, but its ok
09:20:56 #topic Boston
09:21:19 samP: I guess you would suggest another talk on compute HA?
09:21:20 aspiers: talk submit deadline is 2/6 (I think)
09:21:23 yes, it's soon
09:21:26 aspiers: howdy
09:21:31 oh hey beekhof
09:21:37 beekhof: you gonna come to Boston?
09:21:38 beekhof: hi
09:21:51 sorry, the US is on my No-Fly-To list
09:21:59 ah, you too
09:22:23 Tim and Florian too
09:22:24 aspiers: I was thinking to submit one about compute-HA but more masakari related
09:22:46 aspiers: but I can join with you..
09:22:52 organising a conference in the US right now seems borderline dumb and more than a little insensitive
09:23:09 beekhof: agreed but there is a blog explaining why
09:23:16 -ENOCARE
09:23:48 I'm definitely not gonna argue with you on that :)
09:23:53 :)
09:24:53 samP: if we could write a simple host monitor which notifies masakari for host recovery then I think doing a joint talk is justified
09:25:10 since that demonstrates the idea of a componentized architecture
09:25:11 aspiers: sure
09:25:20 and it should be easy to do I think
09:25:48 aspiers: I can do that
09:26:30 we could follow the driver idea, so that it can have a driver to notify masakari via its native mechanism, and also a driver to notify any other service via standard https
09:26:53 or if it's easy, to enhance masakari to accept the standard message
09:26:57 or just make them the same format
09:27:06 I need to learn more about how masakari currently works
09:27:44 aspiers: It depends on what is "standard message"
09:28:12 samP: well we can define it to be whatever we want :) but something which is suitable for any recovery workflow controller to handle
09:28:50 it's ok if it supports masakari extensions, but not ok if it requires masakari-specific things
09:29:18 aspiers: correct, I thought you are referring to smt already exist
09:29:47 ah, no :)
09:30:41 are there any docs for masakari yet?
09:30:42 aspiers: current masakari-monitors does not require any masakari-specific info, however there is a format for every thing
09:31:00 aspiers: sorry, docs are on the way..
09:31:07 ok
09:31:34 we need the host monitor to support different notifications anyway, so that it can notify mistral too
09:33:06 aspiers: agree, both cases we have to prepare the data with required format from the engine (mistral/masakari or etc...)
09:33:52 samP: could you do a very quick rough doc somewhere (etherpad/wiki) describing how masakari monitors send notification to the controller?
09:34:34 aspiers: sure, I will do it on wiki
09:34:43 beekhof: from our side I think the action would be to split fence_compute into two decoupled parts: the monitoring code and the recovery code
09:34:44 and will send you the link
09:35:10 beekhof: or at least to conceptually split it, if not into separate files
09:35:20 samP: great thanks!
09:35:40 #action samP to document how masakari monitors send notification to the controller
09:36:06 aspiers: sure, thanks for the action item
09:36:39 i'm not sure i follow
09:36:44 samP: so maybe I should draft a synopsis for a talk proposal
09:36:58 fence_compute only performs evacuations
09:37:14 it doesn't look for failed vms
09:37:26 aspiers: That would be great..thank you
09:37:28 beekhof: no but it sets the attribute
09:37:44 beekhof: which is akin to sending a notification to the recovery workflow controller (NovaEvacuate)
09:38:04 #action aspiers to draft a proposal for a joint talk with samP on next gen compute HA
09:38:08 in that case we agree
09:38:25 i didnt really like that it was called twice in two completely different modes
09:38:56 beekhof: ok cool, me neither.
09:39:07 beekhof: Since we have the underlying goal of componentising everything to support a modular approach, this decoupling would be a key part of that
09:39:29 beekhof: since currently the monitoring part of fence_compute only works with the recovery part of fence_compute
09:39:42 since both are dependent on attrd_updater
09:40:26 if we convert it to send / receive generic JSON notifications via https then it can work with any other approach to recovery
09:40:55 I guess we need some thought on how to do that reliably
09:41:34 e.g. what if the recovery workflow controller which receives notifications is offline when a compute host dies and gets fenced
09:42:03 sounds like fun
09:42:04 obviously https is not a stateful queue
09:42:17 whereas at least attrd can act as one
09:43:55 hmm, actually we could keep attrd acting as a queue, and then have a separate cluster process which takes items off the "queue" as soon as it successfully notifies a controller about them
09:44:02 that would probably work
09:44:22 Hi guys, I had problems with my PC...
09:44:23 aspiers: IMO, that is why the HA of recovery controller itself is important, https is stateless but we could retry
09:44:49 ddeja: hi
09:44:54 samP: we could retry, but we wouldn't be allowed to ever give up
09:45:03 and if the process running the retries died, then the notification would get lost
09:45:16 hey ddeja :)
09:46:33 beekhof: so actually, the change would be more to NovaEvacuate
09:46:36 aspiers: true
09:46:54 beekhof: that NovaEvacuate would be responsible not for recovery, but instead for notifying the recovery workflow controller
09:47:04 that could work
09:47:18 of course we'd probably want to change the name
09:47:25 but architecturally that's my idea
09:48:12 cool! that sounds like a potential strategy for starting to unify all three approaches
09:48:44 which would make migration to masakari/mistral easier
09:49:20 #topic architecture
09:50:23 #info aspiers proposed an extension to ddeja's driver-based approach which would work with not only masakari/mistral but also potentially the OCF agents
09:50:42 I'll try to capture it in the specs
09:51:01 #action aspiers to capture proposal in the specs
09:51:14 of course we can consider other approaches too
09:51:17 aspiers: what if it was a different agent
09:51:44 beekhof: what do you mean?
09:51:52 NovaEvacuate for the existing pacemaker way, and mistral-evacuate for the new one
09:52:09 absolutely
09:52:33 well, it wouldn't necessarily be mistral-evacuate. I think it should aim to support both mistral and masakari
09:52:46 then later masakari can potentially use mistral to do its work
09:52:52 json-evacuate then :)
09:52:58 right :)
09:53:26 dont much care what its called, but not trying to support json and attrd in the same agent seems like a good idea
09:53:34 +100
09:54:17 beekhof: and the cool thing about having two RAs is that you could trivially migrate a cluster from one to the other by having the two resources co-existing during the switch-over
09:54:29 you would simply turn the old one off and the new one on
09:54:36 boom - done
09:54:45 yep
09:54:47 since they could still use the same attributes
09:54:55 hooray
09:55:14 this sounds suspiciously easy
09:55:21 lets not jump the gun on upgrades though, i believe its tradition to wait until the new way is in production
09:55:28 true
09:55:56 samP: I made some updates to your spec based on ddeja's feedback
09:56:57 aspiers: thanks..
09:57:05 samP: please review when you get a chance
09:57:52 aspiers: sure I will
09:58:23 tonyb: ?
09:58:28 ddeja: I'll update the host monitor spec to suggest supporting driver plugins, with one for mistral, one for JSON, and another for masakari if it needs something different to the standard JSON one
09:58:36 ok, looks like we're out of time
09:58:48 but that was a pleasingly productive discussion
09:59:03 if you have anything else, let's just continue on #openstack-ha
09:59:09 thanks guys!
09:59:19 thank you all!
09:59:29 thanks
09:59:29 bye for now :)
09:59:54 #endmeeting
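
The queue idea raised at 09:43:55 (keep attrd acting as the queue, and take an item off only once a controller has been successfully notified), combined with the driver-plugin idea from 09:58:28, could be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from fence_compute, NovaEvacuate, masakari or mistral: PendingNotifications, make_json_driver, the send callable and the host/event fields are all hypothetical names, and an in-memory dict stands in for attrd.

```python
import json
from collections import OrderedDict
from typing import Callable, List

# Hypothetical driver interface: take one notification dict and return True
# only when the recovery workflow controller confirmed receipt. Drivers for
# masakari, mistral or generic JSON over https would all fit this shape.
Driver = Callable[[dict], bool]


class PendingNotifications:
    """In-memory stand-in for the attrd-backed "queue" (assumption)."""

    def __init__(self) -> None:
        self._items: "OrderedDict[str, dict]" = OrderedDict()

    def add(self, host: str, event: str) -> None:
        # One pending entry per host, much like one attribute per node.
        self._items[host] = {"host": host, "event": event}

    def pending(self) -> List[str]:
        return list(self._items)

    def drain(self, driver: Driver) -> int:
        """Try to deliver every pending item; return how many succeeded.

        An item is removed only after the driver reports success, so a
        controller that is offline simply leaves the item queued for the
        next attempt instead of losing it.
        """
        delivered = 0
        for host in list(self._items):
            try:
                ok = driver(self._items[host])
            except Exception:
                ok = False  # treat any driver error as "not delivered"
            if ok:
                del self._items[host]
                delivered += 1
        return delivered


def make_json_driver(send: Callable[[str], bool]) -> Driver:
    """Build a generic JSON driver around a transport callable.

    `send` is a placeholder for whatever performs the https POST; it gets
    the serialized body and returns True on confirmed delivery.
    """
    def driver(notification: dict) -> bool:
        return send(json.dumps(notification, sort_keys=True))
    return driver
```

Calling drain() repeatedly (for example from a resource agent's monitor action) would give the "retry without ever giving up" behaviour discussed at 09:44:54, because the pending state lives in the queue rather than in the process doing the retries.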