#openstack-meeting log

09:06:22 <aspiers> #startmeeting ha
09:06:22 <openstack> Meeting started Mon Nov 28 09:06:22 2016 UTC and is due to finish in 60 minutes.  The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:06:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:06:25 <openstack> The meeting name has been set to 'ha'
09:06:29 <aspiers> morning :)
09:06:40 <aspiers> samNTT: but it is not morning for you, right? :)
09:06:51 <samNTT> yes, it's 6pm here
09:07:15 <aspiers> ok, so good evening ;-)
09:07:35 <samNTT> ;) good evening..
09:07:41 <aspiers> elynn_: good to have you here
09:07:59 <aspiers> #topic design specs
09:08:04 <elynn_> I'm here to listen to your plans :)
09:08:07 <aspiers> so I have finally written a new spec
09:08:22 <aspiers> I meant to submit it yesterday but I will do that now
09:08:32 <aspiers> it's for host monitoring
09:09:27 <samNTT> ah.. host monitoring
09:09:37 <aspiers> and it is affected by the discussion I had with samNTT in Barcelona
09:09:59 <aspiers> line 135 of https://etherpad.openstack.org/p/newton-instance-ha
09:10:43 <elynn_> Using what for host monitoring?
09:10:49 <aspiers> elynn_: pacemaker_remote
09:10:56 <aspiers> elynn_: you will see the details very shortly
09:11:11 <aspiers> elynn_: it's basically describing what we are doing already, with a few minor changes
09:11:11 <elynn_> :D
09:11:43 <aspiers> samNTT: do you need to keep the workflow -1 on your review?
09:12:08 <aspiers> samNTT: it would be good to merge that
09:12:10 <samNTT> sorry, I will update it soon. I have to add some more info to it
09:12:15 <aspiers> samNTT: oh ok
09:12:41 <samNTT> I will add them and remove WF-1 tomorrow
09:12:56 <aspiers> cool thanks!
09:13:21 <aspiers> next I will do libvirt and NovaCompute OCF RA specs
09:13:40 <aspiers> I guess they can be separate specs even though they are very similar
09:14:41 <samNTT> is the libvirt spec somewhat the same as the instance monitor in masakari?
09:14:43 <aspiers> in the host monitoring spec I have given justification for choosing pacemaker_remote
09:15:02 <aspiers> samNTT: no, it's just an RA for libvirtd
09:15:09 <samNTT> ha.. OK
09:15:32 <aspiers> samNTT: should be very simple process monitoring
09:15:41 <samNTT> so.. it is the process monitor for libvirtd
09:15:48 <samNTT> aspiers: ok
09:16:26 <aspiers> so my hope is that the justification will preemptively answer some questions about design which people may have
09:16:42 <aspiers> especially people who did not participate in discussions/work until now
09:17:10 <aspiers> elynn_: are you familiar with the proposed architecture?
09:17:36 <elynn_> No
09:18:05 <elynn_> Using pacemaker to monitor the host and trigger a mistral workflow on host failure?
09:18:18 <aspiers> right
09:19:12 <elynn_> Will this monitor agent monitor the VM itself?
09:19:33 <aspiers> no, that's a separate spec
09:19:48 <elynn_> ah.. okay
09:19:49 <aspiers> it will become clear soon I hope :)
09:20:39 <elynn_> I'm wondering whether this monitor agent can notify senlin; I might need to see your spec for more details.
09:21:13 <aspiers> elynn_: the monitor agent for which? the host or processes or VMs?
09:21:30 <aspiers> elynn_: https://etherpad.openstack.org/p/newton-instance-ha lines 115+
09:21:38 <aspiers> elynn_: these are the specs we are working on
09:21:53 <elynn_> for the hosts
09:22:26 <aspiers> ok, yes that could certainly notify senlin
09:22:42 <elynn_> ok, I see
09:22:58 <aspiers> it will just send a JSON message to an endpoint
09:23:11 <samNTT> elynn_: if senlin can process custom notifications, then I think it could be possible
09:23:33 <elynn_> That's great.
09:23:54 <aspiers> might it be required for the monitor to send to multiple endpoints?
09:24:09 <aspiers> e.g. would we want to notify both senlin and masakari?
09:24:36 <elynn_> senlin could expose a web hook to receive notifications.
09:24:55 <elynn_> For custom notifications, we might need more investigation
09:25:40 <samNTT> it depends. If you use senlin as the recovery engine, then you don't need masakari. But if you need masakari to keep the log and handle the recovery with senlin, then you might need to send them to both, or masakari can send it to senlin
09:26:23 <aspiers> that's an interesting idea - have masakari re-send the notification
09:26:24 <elynn_> aspiers, yes, that's a good idea. Is it hard to implement?
09:26:34 <samNTT> Masakari can send it to senlin as we do to mistral. But it depends on how you want to use it
09:26:36 <aspiers> elynn_: the notification is extremely simple
09:26:52 <elynn_> If masakari does the recovery, I don't think senlin needs to do it again.
09:26:59 <aspiers> samNTT: I guess it should be a POST, right?
09:27:05 <samNTT> yes
09:27:19 <aspiers> I think ddeja forgot to mention the request type in his spec
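
The notification discussed above (a simple JSON message sent by POST, possibly to more than one endpoint) could be sketched roughly as follows. This is only an illustration under stated assumptions: the payload fields, the endpoint URLs, and the helper name build_notifications are hypothetical and are not taken from any of the specs or from the masakari or senlin APIs. The sketch only builds the requests; actually sending them would be a separate step with urllib.request.urlopen.

```python
import json
import urllib.request


def build_notifications(event, endpoints):
    """Build one JSON POST request per endpoint (nothing is sent here).

    `event` and `endpoints` are hypothetical; the real message format
    would be whatever the host monitoring spec defines.
    """
    body = json.dumps(event).encode("utf-8")
    return [
        urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for url in endpoints
    ]


# Hypothetical payload and endpoints, for illustration only.
event = {"type": "host_failure", "hostname": "compute-1"}
endpoints = [
    "http://masakari.example.com/notifications",
    "http://senlin.example.com/webhooks/host-failure",
]
reqs = build_notifications(event, endpoints)
```

Keeping the endpoint list as plain data means the same monitor could notify masakari only, senlin only, or both, without changing the message itself.
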
09:27:24 <elynn_> But I'm thinking: after the VM is recovered, who will do the software deployment on it?
09:27:34 <aspiers> elynn_: nova
09:27:47 <aspiers> elynn_: it just restarts it on another host
09:28:36 <elynn_> So we need shared storage for nova, right?
09:29:35 <samNTT> elynn_: yes, otherwise you will not get your instance back
09:30:06 <samNTT> I mean you have to reconfigure it
09:30:27 <elynn_> that makes sense.
09:30:38 <aspiers> the only thing we cannot handle in this situation is the option of no fencing when the VMs are pure cattle
09:30:58 <aspiers> but maybe we don't need that
09:31:18 <aspiers> e.g. stateless HPC VMs
09:31:35 <elynn_> I just want to ask some questions about fencing :)
09:31:48 <aspiers> in that case fencing might not be mandatory, but I can't imagine why anyone would want to avoid it
09:33:16 <elynn_> maybe for cattle VMs, ignoring the dead one and creating a new one is enough?
09:33:27 <aspiers> elynn_: fencing details are in the host monitoring spec so you will see very soon :)
09:33:52 <elynn_> cool
09:33:55 <aspiers> elynn_: yes, ignore might be enough *technically*, but from a maintenance PoV fencing the node is probably cleaner
09:34:08 <aspiers> sometimes fencing will restart the node which also might fix the problem
09:34:13 <aspiers> e.g. if it was a kernel crash
09:35:17 <elynn_> How do you trigger the fencing?
09:35:22 <aspiers> Pacemaker does it
09:35:57 <elynn_> Hmm, good to know pacemaker can do that :)
09:36:45 <aspiers> yes that is one of the main reasons to use it
09:36:58 <samNTT> you might need to fence a cattle VM when nova shows the VM is not there or in Error, but actually it is running (/partially) on the host.
09:37:43 <aspiers> ah ok, so in that case the fencing is to fix nova state, rather than just to fix the VM
09:38:00 <samNTT> There are some situations where you cannot delete the crashed VMs, and ops need to clear them manually
09:38:06 <aspiers> makes sense
09:38:29 <aspiers> samNTT: perhaps you can add details of that to my spec after I submit it?
09:38:39 <samNTT> aspiers: sure
09:38:42 <aspiers> I will submit after this meeting
09:38:58 <aspiers> cool
09:39:04 <aspiers> do we have anything else to discuss today?
09:40:04 <elynn_> I don't have anything, looking forward to your spec :)
09:40:08 <aspiers> ok
09:40:19 <samNTT> aspiers: not from my side.. I will finish my specs and add the required info to your spec
09:40:19 <aspiers> btw we are using a single "instance-ha" topic for the specs
09:40:29 <aspiers> samNTT: cool, thanks!
09:40:40 <aspiers> elynn_: https://review.openstack.org/#/q/status:open+project:openstack/openstack-resource-agents-specs+branch:master+topic:instance-ha
09:40:59 <elynn_> aspiers, got it
09:41:20 <aspiers> https://github.com/openstack/openstack-resource-agents-specs/tree/master/specs/newton/approved
09:41:24 <samNTT> aspiers: thanks for fixing the topic on the vm recovery spec
09:41:32 <aspiers> sure :)
09:41:38 <aspiers> alright, I guess we are done
09:41:48 <aspiers> elynn_: thanks for attending! great to have someone from senlin team here :)
09:42:25 <elynn_> Thank you all for sharing your ideas.
09:42:43 <aspiers> elynn_: of course we are also on #openstack-ha
09:42:48 <aspiers> elynn_: so you can find us there any time
09:42:49 <elynn_> I will keep an eye on this topic ;)
09:42:57 <aspiers> in case you have more questions
09:43:10 <samNTT> elynn_: thank you for your time, looking forward to having more discussions
09:43:21 <aspiers> alright, thanks both and bye for now :)
09:43:40 <samNTT> bye.. have a nice week ahead..
09:43:43 <elynn_> bye
09:43:46 <aspiers> #endmeeting