09:02:47 <aspiers> #startmeeting ha
09:02:48 <openstack> Meeting started Wed Dec 21 09:02:47 2016 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:02:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:02:52 <openstack> The meeting name has been set to 'ha'
09:03:12 <samP> hi o/
09:03:22 <aspiers> hi :) anyone else here today?
09:03:51 <aspiers> I'll try to do a better job with minutes than last time ...
09:04:54 <aspiers> OK, I guess just us
09:05:03 <aspiers> #topic specs
09:05:47 <samP> OK, I've just updated the VM recovery spec.
09:05:58 <aspiers> #info VM recovery spec is making progress
09:06:38 <aspiers> yeah I saw, thanks. I will be working on these over the vacation while it is quiet
09:06:49 <aspiers> I have had a lot of urgent customer issues recently :-(
09:07:15 <samP> And I put a comment on the compute node monitoring spec also
09:07:32 <aspiers> great thanks!
09:07:37 <aspiers> I'll reply to that ASAP
09:07:50 <aspiers> a general format makes a lot of sense
09:08:11 <samP> yeah, I would like to have more discussion on that.
09:08:33 <aspiers> my feeling is that the specs need to be quite precise about the implementation
09:08:44 <aspiers> to ensure we can have compatibility between implementations
09:09:07 <samP> aspiers: sure
09:09:42 <aspiers> but ddeja does not seem so sure
09:10:58 <aspiers> well, I guess I need to think about this more
09:11:15 <aspiers> did you see the conversation from last week?
if not, worth reading I think
09:11:32 <samP> yes, I read it
09:11:36 <aspiers> ok cool
09:12:11 <aspiers> I suggested that the libvirtd/nova-compute RAs should do normal process monitoring *and* recovery, but also send an HTTP message when the monitor fails and when starting/ending recovery
09:12:40 <haukebruno> hey
09:12:40 <aspiers> then the external process recovery component could decide its own policy
09:12:44 <aspiers> oh hey haukebruno :)
09:12:55 <samP> haukebruno: hi!
09:13:18 <samP> aspiers: I agree
09:13:36 <aspiers> samP: ok great, I think I will propose that as the preferred implementation in the spec
09:13:42 <aspiers> samP: but I will list alternatives
09:14:00 <aspiers> samP: maybe including one where Pacemaker gets enhanced
09:14:32 <samP> aspiers: I'm wondering how much further I should write in the "VM recovery" spec
09:14:33 <aspiers> and definitely including the one where we do service-disable on every stop
09:15:25 <aspiers> samP: I think the VM recovery spec needs to document the interface point
09:16:31 <samP> aspiers: if we want to put more details about implementation in the spec, I prefer to fix the notification format first
09:16:49 <aspiers> uhhh
09:16:57 <aspiers> #topic VM recovery spec
09:17:01 <aspiers> :)
09:17:11 <aspiers> samP: yes, that makes sense
09:18:13 <aspiers> samP: did you see Qiming's suggestion about versioning?
09:18:28 <aspiers> https://review.openstack.org/#/c/406659/2/specs/newton/approved/newton-instance-ha-host-monitoring-spec.rst@205
09:19:58 <aspiers> #agreed that we should decide on a standard format for all HTTP notifications, across all components
09:20:04 <samP> how about having a simple spec for the notification format, and let's reference it from each spec
09:20:32 <aspiers> samP: ok sure
09:20:36 <samP> aspiers: yes, about JSON ver?
09:21:04 <aspiers> the versioning sounds like a good idea to me
09:21:21 <aspiers> I guess there is already an oslo standard?
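Note: a minimal sketch of what a versioned, shared notification format along the lines agreed above could look like. Nothing here comes from the specs: the field names ("version", "type", "hostname", "generated_time", "payload"), the version string, and the helper names are all assumptions made for illustration only.

```python
import json
from datetime import datetime, timezone

# Hypothetical format version; the meeting only agreed that versioning is a good idea.
NOTIFICATION_VERSION = "1.0"


def build_notification(event_type, hostname, payload, generated_at=None):
    """Return one notification as a JSON string (all field names are assumptions)."""
    when = generated_at or datetime.now(timezone.utc)
    return json.dumps({
        "version": NOTIFICATION_VERSION,
        "type": event_type,
        "hostname": hostname,
        "generated_time": when.isoformat(),
        "payload": payload,
    }, sort_keys=True)


def is_supported(message, supported_major=1):
    """Accept a message only if its major version matches what the receiver supports."""
    major = int(json.loads(message)["version"].split(".")[0])
    return major == supported_major
```

With a common envelope like this, each component spec would only need to describe its own "payload" contents, and a receiver could reject messages whose major version it does not understand.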
09:23:32 <aspiers> hmm, maybe just nova
09:23:34 <aspiers> http://developer.openstack.org/api-guide/compute/microversions.html
09:23:57 <samP> I couldn't find any doc in oslo
09:24:20 <aspiers> this looks like a good place to start
09:24:31 * ddeja forgot about the meeting...
09:24:38 <ddeja> sorry guys, hello all
09:24:40 <aspiers> ddeja: haha no problem :)
09:24:49 <samP> ddeja: hi
09:25:00 <ddeja> but, I remember on Monday that there is no meeting
09:25:20 <aspiers> that's a good start anyway :)
09:25:22 <aspiers> ddeja: samP has a nice suggestion to standardize the common bits of HTTP message format across all components
09:25:43 <ddeja> that's good
09:26:12 <ddeja> whereas I'm still not sure if it's a good idea to standardize it at all, as I stated in one of the reviews
09:26:33 <aspiers> ddeja: yes I saw that
09:26:52 <aspiers> ddeja: I'm fine with your driver idea, but I think we have to specify the message format in the specs
09:27:22 <aspiers> ddeja: since the main point of the specs is to allow different implementations of each component to be compatible
09:27:30 <aspiers> so that they are interchangeable
09:27:49 <ddeja> OK
09:28:10 <ddeja> we can add an abstraction layer for compatibility reasons
09:28:31 <ddeja> I'm afraid that it may be seen as an overhead for others
09:28:37 <ddeja> and no one would use it
09:29:20 <aspiers> what kind of overhead do you mean?
09:29:27 <aspiers> performance, or implementation?
09:30:33 <ddeja> neither of them
09:30:39 <ddeja> let me show as an example
09:30:44 <aspiers> ok
09:30:56 <ddeja> let's say we have someone who wants to use Masakari
09:31:01 <ddeja> for VM ha
09:31:12 <ddeja> he would have to a) set up Pacemaker
09:31:22 <ddeja> b) set up an agent that monitors VMs
09:31:38 <ddeja> c) set up an agent that receives alarms from the monitor and informs Masakari
09:31:45 <ddeja> and d) Masakari itself
09:31:53 <ddeja> what would he think about it?
09:31:55 <aspiers> no, c) and d) are the same
09:32:04 <ddeja> OK
09:32:27 <ddeja> then, we need Masakari to be able to read the http message that the monitor would send
09:32:31 <ddeja> same for Mistral
09:32:31 <aspiers> yes
09:32:34 <ddeja> same for everything
09:33:19 * ddeja is now entering 'mistral core mode'
09:33:25 <ddeja> so, as for Mistral
09:33:27 <aspiers> sometimes it might be easier to write a shim which proxies notifications
09:33:34 <aspiers> but that's up to the individual implementation
09:33:48 <ddeja> unless that http message is openstack compliant, I see no way to implement something like that in Mistral
09:34:05 * ddeja closed 'mistral core mode' :)
09:34:12 <aspiers> ok so in the mistral case maybe a shim is required
09:34:39 <aspiers> is there a problem with that?
09:34:49 <ddeja> aspiers: for me there is no
09:35:02 <ddeja> but if we go this way
09:35:08 <ddeja> we got a, b, c and d
09:35:15 <ddeja> (as in my example)
09:35:28 <aspiers> right
09:35:38 <ddeja> and given someone may think "hm, why do I need something that just catches the alarm and sends it to the recovery service"
09:35:53 <ddeja> "why can't I just notify the recovery itself?"
09:36:11 <ddeja> and given someone would end up writing his own monitor
09:36:30 <ddeja> that's why I think driver-based architecture would be better
09:37:26 <ddeja> in the vm monitor spec, we just specify what information would be passed to the driver layer
09:37:44 <ddeja> and what to do with it is up to the driver writer/maintainer
09:37:57 <aspiers> but then we have no guarantee of compatibility
09:38:21 <ddeja> yes, but what kind of compatibility do you need at this point?
09:38:40 <ddeja> you just want to catch the alarm and send it to the recovery service of your choice
09:38:57 <aspiers> compatibility between monitors and recovery services
09:39:19 <ddeja> it would be guaranteed by the driver
09:39:52 <aspiers> the driver in the monitor?
09:39:56 <ddeja> which means, we don't need to specify what kind of information the recovery service needs to understand
09:40:18 <ddeja> we just specify that we have a driver that can notify the recovery service and it would understand it
09:41:12 <aspiers> but if that's undocumented then there is no reliable way for multiple implementations to be compatible
09:41:29 <ddeja> aspiers: agree
09:41:48 <ddeja> aspiers: taking it the other way around: why do we want it to be compatible?
09:42:08 <ddeja> as I said before: you just want to catch the alarm and send it to the recovery service of your choice
09:42:11 <aspiers> so that we can incrementally move from existing vendor implementations to a single upstream converged implementation
09:42:42 <ddeja> aspiers: so you can change each piece independently?
09:42:45 <aspiers> yes
09:43:03 <ddeja> that's one thing I didn't take into account
09:43:05 <aspiers> that was the whole point of the approach we agreed in Tokyo
09:43:31 <ddeja> OK
09:43:42 <ddeja> what if we have a driver that still talks "the old way"
09:43:49 <aspiers> because several of us have existing implementations which are in production and supported for customers
09:44:04 <ddeja> and then, when you change the recovery service, you just switch the driver?
09:44:44 <aspiers> ddeja: yes, that would be perfect. any support for communicating an "old way" is of course still allowed, but outside the scope of the spec
09:44:51 <ddeja> or, why not both?
09:44:54 <ddeja> I mean
09:44:58 <aspiers> both is fine
09:45:09 <ddeja> let's do the driver architecture
09:45:20 <ddeja> specify what information is given to it
09:45:30 <aspiers> the specs should only require support for the new way, they are not exclusive deals which prohibit the old ways :)
09:45:43 <ddeja> and give one driver that is "the" driver
09:46:01 <aspiers> for which component?
09:46:06 <ddeja> for monitor
09:46:16 <ddeja> that talks using the specified http message
09:46:17 <aspiers> VM monitor or host monitor?
09:46:24 <aspiers> or both
09:46:26 <ddeja> VM monitor
09:46:37 <ddeja> but for host monitor it should also work
09:47:09 <ddeja> then we can specify the modular architecture between monitors<->Recovery services
09:47:13 <ddeja> using standard drivers
09:47:29 <aspiers> I like the idea of drivers, but I don't see why the spec would need to require a driver architecture
09:47:51 <aspiers> any component is free to use a driver architecture, but why force all implementations to have one?
09:47:53 <ddeja> so that we can omit the point 'c' for some recovery services ;)
09:48:37 <aspiers> sure, but why does it have to be in the spec?
09:48:40 <ddeja> we just let the user choose: either your recovery service understands the standard http message
09:48:53 <ddeja> aspiers: I didn't say it needs to be in the spec :)
09:48:56 <aspiers> oh ok :)
09:49:18 <aspiers> so I think we are agreed on everything then :)
09:49:39 <aspiers> hmm
09:49:40 <ddeja> I think so
09:49:41 <aspiers> maybe not
09:49:43 <aspiers> :)
09:49:46 <ddeja> ?
09:49:47 <aspiers> I am thinking ...
09:50:08 <aspiers> OK, so let's say we implement a VM monitor with notification drivers
09:50:27 <ddeja> I just don't like the idea of writing some proxy service that translates our standard http message to mistral
09:50:31 <aspiers> then we implement a) a driver for notifying Mistral
09:50:36 <samP> sounds good, let's discuss more in the spec
09:50:44 <aspiers> b) a driver for sending standard HTTP message
09:50:48 <ddeja> aspiers: because of this -> https://xkcd.com/927/
09:51:07 * aspiers guesses that is the one about standards
09:51:18 <aspiers> right :)
09:51:21 <haukebruno> lol
09:51:46 <ddeja> aspiers: I think we need to only implement b)
09:51:59 <ddeja> as a starting point
09:52:27 <aspiers> ddeja: but driver b) requires a shim, and you want to eliminate that for mistral
09:52:46 <samP> ddeja: how do you propose to implement drivers?
09:53:00 <ddeja> aspiers: and for anything else that cannot understand the standard http message
09:53:36 <aspiers> but if you don't provide the shim, the only way for each VM monitor implementation to be compatible with mistral is by talking directly to it
09:53:51 <ddeja> aspiers: yes, I want to eliminate it, but let's make it work in one standard way, then add the next implementation
09:53:58 <ddeja> that's how I see it
09:54:30 <ddeja> samP: I think of it as some standard base class, to which we would pass a given set of input parameters
09:54:39 <ddeja> and then call some method, like 'notify' on it
09:54:44 <ddeja> that's all
09:54:53 <ddeja> the rest is on the driver writer's side
09:55:01 <aspiers> ddeja: oh, so you are saying we *would* write a shim for initial case?
09:55:15 <aspiers> ddeja: and then later optimise by adding mistral-specific driver?
09:55:23 <ddeja> aspiers: maybe
09:55:28 <aspiers> that could work
09:55:40 <ddeja> or maybe just make it work with only Masakari at first
09:56:01 <ddeja> or we can write a driver at nearly the same time, it shouldn't be a big deal
09:56:24 <samP> ddeja: OK, got it.
09:56:31 <ddeja> what is more, with drivers it may be easier to be compliant with someone's existing solution
09:56:52 <ddeja> he just uses/writes a driver that is compliant with the existing recovery workflow
09:57:02 <ddeja> service
09:57:21 <aspiers> yes I'm definitely in favour of drivers where it makes sense
09:57:37 <aspiers> I'm just trying to figure out what should be in the specs though
09:57:39 <ddeja> then switches to Masakari/Mistral (are there any other standards?)
09:57:48 <ddeja> and switches the driver
09:58:00 <ddeja> aspiers: I don't think it must be specified in the spec itself
09:58:12 <aspiers> right
09:58:15 <ddeja> maybe just information about 'modular architecture'
09:58:33 <ddeja> so it would be easy to notify various recovery services?
09:58:43 <samP> IMO, spec should contain all the info we can obtain from monitors.
Then drivers can choose what they want
09:59:00 <ddeja> samP: that's what I was thinking too :)
09:59:43 <aspiers> ok, still not sure I follow 100% but we're out of time so let's just continue working in gerrit :)
09:59:56 <aspiers> good discussion though, thanks!
10:00:10 <samP> yep. thanks ddeja
10:00:11 <ddeja> aspiers: I'll make some diagram to show my idea :)
10:00:17 <aspiers> ddeja: perfect!
10:00:19 <ddeja> thanks guys!
10:00:22 <aspiers> thanks :)
10:00:28 <aspiers> #endmeeting
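
Note: a rough sketch of the driver idea as ddeja described it in the meeting (a standard base class that is handed a set of input parameters and exposes a 'notify' method), combined with samP's point that the monitor passes along everything it knows and each driver picks what it wants. All class, function, and field names below are hypothetical, and the URL is a placeholder; this is one possible reading of the discussion, not an agreed design.

```python
import json
import urllib.request
from abc import ABC, abstractmethod


class NotificationDriver(ABC):
    """Hypothetical base class: the monitor hands every driver the same event dict."""

    @abstractmethod
    def notify(self, event):
        """Deliver one monitor event to a recovery service."""


class StandardHTTPDriver(NotificationDriver):
    """Sends the whole event as JSON over HTTP POST (the 'standard message' driver)."""

    def __init__(self, url, post=None):
        self.url = url
        # 'post' is injectable so the driver can be exercised without a network.
        self._post = post or self._http_post

    def _http_post(self, url, body):
        request = urllib.request.Request(
            url, data=body,
            headers={"Content-Type": "application/json"}, method="POST")
        with urllib.request.urlopen(request) as response:
            return response.status

    def notify(self, event):
        return self._post(self.url, json.dumps(event).encode("utf-8"))


class RecordingDriver(NotificationDriver):
    """Stand-in for a service-specific driver; it keeps only the fields it wants."""

    def __init__(self, wanted_fields):
        self.wanted_fields = wanted_fields
        self.received = []

    def notify(self, event):
        self.received.append(
            {key: event[key] for key in self.wanted_fields if key in event})


def dispatch(event, drivers):
    """Pass the full event to each configured driver."""
    for driver in drivers:
        driver.notify(event)
```

Under this reading, switching recovery services means swapping the driver passed to dispatch(), while the monitor and the event it produces stay the same.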