09:08:32 <aspiers> #startmeeting ha
09:08:32 <openstack> Meeting started Wed Dec 14 09:08:32 2016 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:08:33 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:08:35 <openstack> The meeting name has been set to 'ha'
09:08:37 <aspiers> alright then
09:08:55 <aspiers> just two of us today
09:09:01 <aspiers> I guess the day change confused people
09:09:05 <ddeja> maybe
09:09:17 <ddeja> but we see not many people from Barcelona
09:09:45 <aspiers> #topic specs
09:10:04 <aspiers> so I see sampath updated his VM recovery spec
09:10:33 <ddeja> that's good
09:10:42 <ddeja> today I finally have some time to review them
09:10:48 <aspiers> oh great
09:11:03 <ddeja> I don't know how it would go however
09:11:06 <aspiers> yeah, it is still missing the important part though
09:11:19 * ddeja spend 11 hours in train yesterday insted of 6...
09:11:24 <aspiers> ouch :-(
09:11:38 <aspiers> the most important part of these specs are the interface points
09:11:44 <ddeja> aspiers: agree
09:11:45 <aspiers> to ensure compatibility between the components
09:12:50 <ddeja> OK
09:13:10 <aspiers> so that bit should be covered for compute node monitoring recovery now
09:13:29 <aspiers> since my spec and your spec agree on the format of the message to pass
09:14:02 <aspiers> I have been thinking about the libvirt and nova-compute OCF RA specs
09:14:17 <ddeja> OK
09:14:29 <aspiers> in Barcelona I agreed with sampath that these should simply send failures to an external component
09:14:38 <aspiers> which decides what to do
09:14:56 <aspiers> but the challenge is how that decision should be handled
09:15:06 <aspiers> because Pacemaker needs to handle it
09:16:21 <ddeja> aspiers: BTW, I found this last week https://review.openstack.org/#/c/389103/2
09:17:05 <aspiers> ddeja: yes I talked to Michele and Andrew about that in Barcelona
09:17:11 <ddeja> oh
09:17:14 <ddeja> OK
09:17:21 <ddeja> can you give a little update?
09:17:35 <aspiers> they said don't worry, it's just covering what we are already doing, from a triple-o PoV
09:17:48 <aspiers> so not a new solution
09:18:08 <aspiers> just about getting triple-o to automatically set up instance HA
09:18:25 <ddeja> oh, OK
09:18:37 <ddeja> so it would be using resource agents?
09:19:24 <aspiers> yes I guess so
09:19:26 <aspiers> for now
09:19:34 <aspiers> or maybe mistral
09:19:39 <aspiers> I don't know
09:19:44 <ddeja> OK
09:19:46 <aspiers> really RH are driving that
09:20:19 <ddeja> but If it's not about implementing something new, then it is OK
09:20:42 <aspiers> yeah
09:20:46 <aspiers> well that's what they said
09:21:02 <aspiers> but I would suggest pinging them and asking about mistral
09:21:18 <ddeja> sure
09:21:22 <ddeja> thanks
09:21:40 <aspiers> we should also review their spec ...
09:22:02 <ddeja> I've just added myself as a reviewer
09:22:13 <ddeja> it seems like today is the review day
09:22:29 <aspiers> cool
09:22:41 <aspiers> I'll aim for that too
09:22:46 <ddeja> OK
09:23:12 <aspiers> last month I've been forced to focus much more on customers :(
09:23:31 <aspiers> but closer to Xmas hopefully it should get quiet leaving more time for this
09:24:10 <aspiers> if you have any ideas about the process monitoring specs then please let me know
09:24:21 <ddeja> good (for openstack community, maybe not for the customers)
09:24:25 <ddeja> aspiers: of course
09:24:25 <aspiers> hehe
09:24:39 <aspiers> I don't think it can work with HTTP messages like with host recovery
09:25:00 <aspiers> since Pacemaker has to initiate the monitoring and also receive the results, synchronously
09:25:42 <ddeja> aspiers: yes, but it can also then send HTTP message
09:26:24 <aspiers> the question is whether it should send HTTP message every time
09:26:31 <aspiers> and when it should attempt recovery
09:26:45 <aspiers> only based on HTTP response, or whenever it normally would recover?
09:27:14 <aspiers> in the latter case, the recovery workflow engine could perform additional recovery
09:27:21 <ddeja> aspiers: please remind me - process recovery is about restarting libvirt/nova compute?
09:27:30 <aspiers> that's the first part
09:27:39 <aspiers> but it could take stronger action after several failures
09:27:50 <aspiers> like service-disable
09:28:05 <ddeja> but, hm
09:28:19 <ddeja> all of those could be performed inside the resource agent I guess
09:28:39 <aspiers> yes but the idea was that the spec for monitoring should leave that totally flexible
09:28:47 <aspiers> so that each cloud can decide its own policy
09:29:07 <ddeja> OK
09:29:20 <aspiers> so I am not sure exactly how it should work
09:29:34 <ddeja> but hm, still I don't see a place where we should perform any http calls?
09:29:47 <ddeja> it still can be done from the resource agent IMO
09:30:02 <aspiers> I think maybe the RA should do normal process monitoring *and* recovery, but also send HTTP message when monitor fails and when starting/ending recovery
09:30:23 <aspiers> then external engine can do any additional steps
09:30:28 <aspiers> if necessary
09:30:39 <ddeja> that should work
09:30:53 <aspiers> beekhof also suggested that every RA stop also does service-disable, and start does service-enable
09:31:31 <aspiers> the problem is that the RA doesn't know if the stop is due to failed monitor or just a clean stop
09:31:38 <aspiers> maybe it can track that internally
09:31:57 <aspiers> well, I'll try to write the spec and in the process hopefully we can figure out the best option
09:32:04 <ddeja> OK
09:32:05 <aspiers> but if you have ideas meanwhile, please tell me
09:32:10 <ddeja> sure
09:32:11 <aspiers> thanks
09:32:20 <aspiers> OK, I have to go soon - anything else from you?
09:32:24 <aspiers> e.g. mistral updates?
09:32:44 <ddeja> not really
09:32:56 <aspiers> OK no problem
09:32:58 <ddeja> last week I was focused on presentation for meetup
09:33:09 <aspiers> oh yeah, I heard it went well
09:33:16 <ddeja> and in mistral I'm working mostly on gate
09:33:24 <aspiers> ok
09:33:38 <ddeja> aspiers: yes, it went OK
09:33:57 <ddeja> and the fact that demo failed at middle step didn't broke everything ;)
09:34:02 <aspiers> cool
09:34:04 <aspiers> lol
09:34:13 <aspiers> it's not a real demo if it works ;-)
09:34:31 <ddeja> Roman told me to get a printed chicken
09:34:37 <aspiers> and rip it up before?
09:34:41 <aspiers> I have done that in the past
09:34:41 <ddeja> yes
09:34:43 <aspiers> it works ;-)
09:34:45 <ddeja> but I didn't
09:34:48 <aspiers> well
09:34:52 <aspiers> so it is your fault ;-)
09:34:53 <ddeja> so that must be a case
09:34:56 <aspiers> haha
09:34:58 <aspiers> proof!
09:35:14 <ddeja> I'll do it next time
09:35:26 <aspiers> good ;-)
09:35:34 <ddeja> also, next time I'll be jumping through commits instead of live coding ;)
09:35:41 <aspiers> lol
09:35:49 <aspiers> wow, live coding - nice :)
09:35:51 <aspiers> I did that oncde
09:35:53 <aspiers> once
09:35:55 <aspiers> it was fun
09:36:08 <aspiers> BTW when you review my host monitoring spec please let me know what you think about the section structure
09:36:13 <ddeja> OK
09:36:15 <aspiers> it is different to the others
09:36:30 <aspiers> I need to update the template to match, so that tests pass
09:36:38 <aspiers> OK I have to go now
09:36:42 <ddeja> no problem
09:36:44 <aspiers> thanks a lot!
09:36:49 <ddeja> thank you
09:37:00 <aspiers> see you on #openstack-ha, bye :)
09:37:16 <aspiers> #endmeeting
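The process-monitoring idea discussed at 09:30 (the RA keeps doing normal monitoring and recovery, notifies an external engine when a monitor fails and when recovery starts and ends, and, per beekhof's suggestion, runs service-disable on stop and service-enable on start while tracking whether a stop follows a failed monitor) can be sketched as below. This is a hypothetical illustration only. The meeting did not settle on a design, and the class, event names and callbacks are all made up. `notify` stands in for an HTTP POST to the external recovery engine, and `disable_service`/`enable_service` stand in for `nova service-disable`/`service-enable`. Also, a real OCF RA is a script that Pacemaker invokes once per action, so the in-memory flag here would in practice need to persist between invocations, for example in a state file.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProcessMonitorSketch:
    """Hypothetical model of an RA-like wrapper for libvirtd or nova-compute."""
    check: Callable[[], bool]            # True if the managed process looks healthy
    notify: Callable[[str], None]        # stand-in for an HTTP message to an external engine
    disable_service: Callable[[], None]  # stand-in for "nova service-disable"
    enable_service: Callable[[], None]   # stand-in for "nova service-enable"
    last_monitor_failed: bool = False    # the "track it internally" state

    def monitor(self) -> bool:
        healthy = self.check()
        if not healthy:
            # Remember the failure so a later stop can be told apart from a clean stop.
            self.last_monitor_failed = True
            self.notify("monitor-failed")
        return healthy

    def stop(self) -> None:
        if self.last_monitor_failed:
            # This stop is part of Pacemaker's recovery, not an administrative stop.
            self.notify("recovery-started")
        self.disable_service()  # beekhof's suggestion: every stop disables the service

    def start(self) -> None:
        self.enable_service()   # ...and every start re-enables it
        if self.last_monitor_failed:
            self.notify("recovery-finished")
            self.last_monitor_failed = False
```

With this split, Pacemaker still drives monitoring and restarts synchronously, and the external engine only receives events. It can then apply each cloud's own policy, such as stronger action after repeated failures, without the RA having to know that policy.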