09:06:22 #startmeeting ha
09:06:22 Meeting started Mon Nov 28 09:06:22 2016 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:06:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:06:25 The meeting name has been set to 'ha'
09:06:29 morning :)
09:06:40 samNTT: but it is not morning for you, right? :)
09:06:51 yes, it 6pm here
09:07:15 ok, so good evening ;-)
09:07:35 ;) good evening..
09:07:41 elynn_: good to have you here
09:07:59 #topic design specs
09:08:04 I'm here to listen your plans :)
09:08:07 so I have finally written a new spec
09:08:22 I meant to submit it yesterday but I will do that now
09:08:32 it's for host monitoring
09:09:27 ah.. host monitoring
09:09:37 and it is affected by the discussion I had with samNTT in Barcelona
09:09:59 line 135 of https://etherpad.openstack.org/p/newton-instance-ha
09:10:43 Using what for host monitoring?
09:10:49 elynn_: pacemaker_remote
09:10:56 elynn_: you will see the details very shortly
09:11:11 elynn_: it's basically describing what we are doing already, with a few minor changes
09:11:11 :D
09:11:43 samNTT: do you need to keep the workflow -1 on your review?
09:12:08 samNTT: it would be good to merge that
09:12:10 sorry, I will update it soon. I have add some more info to it
09:12:15 samNTT: oh ok
09:12:41 I will add them adn remove WF-1 tomorrow
09:12:56 cool thanks!
09:13:21 next I will do libvirt and NovaCompute OCF RA specs
09:13:40 I guess they can be separate specs even though they are very similar
09:14:41 libvirt specs is some what same as instance monitor in masakari?
09:14:43 in the host monitoring spec I have given justification for choosing pacemaker_remote
09:15:02 samNTT: no, it's just an RA for libvirtd
09:15:09 ha.. OK
09:15:32 samNTT: should be very simple process monitoring
09:15:41 so.. it is the process monitor for libvirtd
09:15:48 aspiers: ok
09:16:26 so my hope is that the justification will preemptively answer some questions about design which people may have
09:16:42 especially people who did not participate in discussions/work until now
09:17:10 elynn_: are you familiar with the proposed architecture?
09:17:36 No
09:18:05 Using pacemaker to monitor host and trigger mistral workflow when host failure?
09:18:18 right
09:19:12 Will this monitor agent monitor the VM itself?
09:19:33 no, that's a separate spec
09:19:48 ah.. okay
09:19:49 it will become clear soon I hope :)
09:20:39 I'm thinking if this monitor agent can notify senlin, I might need to see your spec for more details.
09:21:13 elynn_: the monitor agent for which? the host or processes or VMs?
09:21:30 elynn_: https://etherpad.openstack.org/p/newton-instance-ha lines 115+
09:21:38 elynn_: these are the specs we are working on
09:21:53 for the hosts
09:22:26 ok, yes that could certainly notify senlin
09:22:42 ok, I see
09:22:58 it will just send a JSON message to an endpoint
09:23:11 elynn_: if senlin can process custom notifications, then I think it could be possible
09:23:33 That's great.
09:23:54 might it be required for the monitor to send to multiple endpoints?
09:24:09 e.g. would we want to notify both senlin and masakari?
09:24:36 senlin could expose a web hook to receive notifications.
09:24:55 For for custom notifications, we might need more investigation
09:25:40 it depends, if you use senlin as recovery engine, then you dont need masakari. But if you need masakari to keep the log and handle the recovery with senline then you might need to send them to both or masakari can send it to senlin
09:26:23 that's an interesting idea - have masakari re-send the notification
09:26:24 aspiers, yes, that's a good idea. Is it hard to implemented?
09:26:34 Masakari can send it senlin as we do it to mistral. But it depends on how you wants to use it
09:26:36 elynn_: the notification is extremely simple
09:26:52 If masakari do the recovery, I don't think senlin needs to do it again.
09:26:59 samNTT: I guess it should be a POST, right?
09:27:05 yes
09:27:19 I think ddeja forgot to mention the request type in his spec
09:27:24 But I'm think after the vm recovered, who will do the software deployment on it?
09:27:34 elynn_: nova
09:27:47 elynn_: it just restarts it on another host
09:28:36 So we need a shared storage for nova, right?
09:29:35 elynn_: yes, otherwise you will not get your instance back
09:30:06 I mean you have to reconfiure it
09:30:27 that make sense.
09:30:38 the only thing we cannot handle with this situation is the possibility of no fencing when it is pure cattle VMs
09:30:58 but maybe we don't need that
09:31:18 e.g. stateless HPC VMs
09:31:35 I just wanna ask about fencing questions :)
09:31:48 in that case fencing might not be mandatory, but I can't imagine why anyone would want to avoid it
09:33:16 maybe for cattle VMs, ignore the dead one and create a new one is enough?
09:33:27 elynn_: fencing details are in the host monitoring spec so you will see very soon :)
09:33:52 cool
09:33:55 elynn_: yes, ignore might be enough *technically*, but from a maintenance PoV fencing the node is probably cleaner
09:34:08 sometimes fencing will restart the node which also might fix the problem
09:34:13 e.g. if it was a kernel crash
09:35:17 How do you trigger the fencing?
09:35:22 Pacemaker does it
09:35:57 Hmm, good to know pacemaker can do that :)
09:36:45 yes that is one of the main reasons to use it
09:36:58 you might need to fence cattle VM, when nova show VM is not there or Error but, actually it it running(/partially) on the host.
09:37:43 ah ok, so in that case the fencing is to fix nova state, rather than just to fix the VM
09:38:00 There are some situations that you cannot delete the crashed VMs, and ops need to clear them manually
09:38:06 makes sense
09:38:29 samNTT: perhaps you can add details of that to my spec after I submit it?
09:38:39 aspiers: sure
09:38:42 I will submit after this meeting
09:38:58 cool
09:39:04 do we have anything else to discuss today?
09:40:04 I don't have one, looking forward to your spec :)
09:40:08 ok
09:40:19 aspiers: not form my side.. I will finish my sepcs and add reqied info to your sepcs
09:40:19 btw we are using a single "instance-ha" topic for the specs
09:40:29 samNTT: cool, thanks!
09:40:40 elynn_: https://review.openstack.org/#/q/status:open+project:openstack/openstack-resource-agents-specs+branch:master+topic:instance-ha
09:40:59 aspiers, got it
09:41:20 https://github.com/openstack/openstack-resource-agents-specs/tree/master/specs/newton/approved
09:41:24 aspiers: thanks for fixing the topic on the vm recovery sepc
09:41:32 sure :)
09:41:38 alright, I guess we are done
09:41:48 elynn_: thanks for attending! great to have someone from senlin team here :)
09:42:25 Thank you all for sharing your ideas.
09:42:43 elynn_: of course we are also on #openstack-ha
09:42:48 elynn_: so you can find us there any time
09:42:49 I will keep an eye on this topic ;)
09:42:57 in case you have more questions
09:43:10 elynn_: thank you for your time looking forward have more discussions
09:43:21 alright, thanks both and bye for now :)
09:43:40 bye.. have a nice week ahead..
09:43:43 bye
09:43:46 #endmeeting