08:03:55 <aspiers> #startmeeting ha
08:03:56 <openstack> Meeting started Mon Jun 27 08:03:55 2016 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:03:57 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:03:59 <openstack> The meeting name has been set to 'ha'
08:04:15 <aspiers> so let's start, maybe people will join in a bit
08:04:18 <ddeja> hello
08:04:34 <aspiers> hi :)
08:04:34 <rsjethani> o/
08:04:40 <aspiers> #topic Current status (progress, issues, roadblocks, further plans)
08:04:49 <aspiers> alright, we haven't done any status reports for a while
08:04:54 <aspiers> I'll go first
08:05:11 <aspiers> I'm finishing up preparation of openstack-resource-agents-specs
08:05:30 <samP> Hi, sorry for the delay
08:05:30 <aspiers> there's a weird Sphinx problem building the docs, but I now know how to pin it down
08:05:35 <aspiers> samP: hi, np :)
08:05:49 <aspiers> so I am hoping to submit this today
08:05:59 <aspiers> at which point it should be easy for other people to submit specs for review
08:06:26 <aspiers> also, the long discussion with beekhof and kgaillot on the pacemaker-users list finally reached consensus :)
08:06:52 <samP> yeh, it took some time for me to follow up...lol
08:06:57 <aspiers> #info openstack-resource-agents-specs should be ready for submissions later today
08:07:16 <aspiers> samP, ddeja: not sure if you are following that thread?
08:07:47 <ddeja> aspiers: I read it last week, but I haven't caught up since then
08:07:52 <aspiers> #info conclusion reached on how to stop nova scheduling VMs to failing nova-compute
08:07:55 <samP> I read the ML discussion
08:08:00 <aspiers> the conclusion was basically:
08:08:10 <aspiers> 1. use force_host_down not service-disable
08:08:23 <aspiers> 2. do it on every RA stop, and the opposite on every RA start
08:08:48 <aspiers> I think that's it :)
08:09:03 <aspiers> no Pacemaker enhancements required
08:09:03 <ddeja> aspiers: thanks for the update!
08:09:19 <aspiers> probably migration-threshold=1 and start-failure-is-fatal=False
08:09:38 <aspiers> ok, I think that's all from my side
08:09:54 <aspiers> samP: any news from your side?
08:10:04 <samP> aspiers: Thank you and I'm working on masakari to try that out. But I couldn't finish
08:10:17 <aspiers> oh ok, cool
08:10:37 <samP> I will give some details on ML or next meeting
08:10:40 <aspiers> nice
08:10:58 <aspiers> oh I remembered more things
08:11:20 <aspiers> as ddeja already saw, Intel is asking if we want to use the OSIC for testing implementations of the user story
08:12:08 <aspiers> I gave them a quick summary of where we are
08:12:11 <aspiers> and pointers to more info
08:12:28 <aspiers> but I think it's still a bit too early to test an upstream implementation :)
08:13:17 <aspiers> also I talked to the chairman of the OpenStack board and he has raised the idea of a dedicated HA track in Barcelona
08:13:24 <samP> aspiers: agree, but nice if we can use it to analyse the technical gaps, if possible
08:13:28 <aspiers> I don't know if it will happen, but at least people are thinking about it now
08:13:37 <ddeja> aspiers: cool
08:13:43 <aspiers> samP: true, although I think OSIC is probably more useful for scale testing
08:14:00 <aspiers> samP: but I'm not really sure. I guess we'll find out when they've looked at the info
08:14:38 <samP> aspiers: sure
08:14:53 <aspiers> ddeja: any news on the mistral side?
08:14:58 <ddeja> my status:
08:15:00 <ddeja> -Working on alternative RPC layer for mistral, so that we can have ACK then process pattern
08:15:02 <ddeja> -Lot of internal work last week
08:15:21 <ddeja> aspiers: RPC layer is working more or less ;)
08:15:28 <aspiers> :-O :-)
08:15:43 <ddeja> I'm hoping to have it merged in ~2 weeks
08:16:17 <ddeja> and I'm starting work on letting the user decide in which mode he would like to process his message
08:16:33 <ddeja> that's it
08:16:35 <aspiers> remind me what the aim of that is? is it so that we can track which worker is handling a workflow?
08:16:49 <ddeja> aspiers: maybe in the long term
08:16:50 <aspiers> and then make sure the workflow is reliable?
08:17:07 <ddeja> right now it is to make sure that if a given task is idempotent
08:17:17 <ddeja> it would be done at some point
08:17:28 <ddeja> it is enough for the evacuate workflow
08:18:06 <aspiers> ok
08:18:39 <aspiers> so is it supporting retry of task on different worker?
08:18:51 <ddeja> yup
08:18:53 <aspiers> or something else, I can't remember the main point of doing it
08:19:11 <ddeja> yeah, it is basically re-sending a message in case the worker dies
08:19:19 <ddeja> so another would do the job
08:19:26 <aspiers> wouldn't we need to also fence the worker?
08:19:44 <ddeja> not really
08:19:52 <aspiers> (in general, maybe not for idempotent tasks)
08:20:01 <aspiers> I mean if a task cannot be repeated safely
08:20:14 <ddeja> aspiers: your concern is right, but
08:20:23 <ddeja> in case the worker dies
08:20:32 <ddeja> we don't know what he already did
08:20:59 <samP> ddeja: is that what we discussed abt state of the job?
08:21:04 <ddeja> maybe he sent nova_boot to nova and died just before he let the mistral engine know that the action succeeded?
08:21:29 <ddeja> so in case the action is not idempotent, we just cannot do it twice, we need to fail the whole workflow
08:21:44 <ddeja> and therefore, there is no need to fence the worker
08:21:45 <aspiers> hmm yeah, difficult
08:22:00 <aspiers> so how does ACK then process help?
08:22:11 <ddeja> aspiers: process then ACK
08:22:24 <aspiers> you said the opposite earlier ;-)
08:22:25 <ddeja> oh, I wrote it wrong...
08:22:28 <aspiers> lol
08:22:38 <ddeja> so, process then ACK helps
08:22:44 <aspiers> ok, process then ACK makes more sense ;-)
08:22:49 <ddeja> yup, sorry
08:23:01 <ddeja> samP: yes, that's the story
08:23:31 <aspiers> but if process fails and there is no ACK, how do you know whether to retry or fail the whole workflow?
08:24:09 <ddeja> soo
08:24:14 <aspiers> ddeja: BTW is there a spec for this? if so please just provide the URL so I can stop asking stupid questions :)
08:24:34 <ddeja> aspiers: there is no spec right now...
08:24:38 <aspiers> if there isn't a spec, maybe there should be?
08:24:52 <aspiers> presumably you already discussed this with Renat etc.?
08:24:55 <ddeja> aspiers: well, really it is not very complicated
08:25:07 <ddeja> let me explain
08:25:30 <ddeja> so: in mistral we would support both ACK then process and process then ACK
08:25:37 <aspiers> ah
08:25:48 <ddeja> for idempotent messages we use process then ACK
08:26:04 <ddeja> and for non-idempotent ones ACK then process (the model that is used now)
08:26:17 <ddeja> + timeout for such tasks
08:26:19 <aspiers> ok
08:27:13 <aspiers> and idempotency property is set as metadata on the workflow or task, or similar?
08:27:34 <ddeja> aspiers: it would be set per task
08:27:41 <ddeja> in a workflow definition
08:27:58 <aspiers> ok, makes sense
08:29:03 <aspiers> ddeja: are there any mailing list discussions on this which I might have missed?
08:29:59 <ddeja> aspiers: there was only a discussion about adding such a feature to oslo
08:30:09 <ddeja> and there is a blueprint
08:30:21 <aspiers> an oslo blueprint?
08:30:53 <ddeja> nope, oslo rejected it
08:31:13 <ddeja> https://blueprints.launchpad.net/mistral/+spec/mistral-alternative-rpc
08:31:16 <aspiers> thanks
08:31:23 <ddeja> blueprint for alternative rpc layer
08:31:54 <aspiers> http://thread.gmane.org/gmane.comp.cloud.openstack.devel/83394/focus=86142
08:32:00 <aspiers> that was the oslo discussion
08:32:22 <ddeja> yup, that's that
08:32:42 <aspiers> ddeja: so would it be a fair summary to say you are working on making mistral more reliable?
08:32:54 <aspiers> I guess that's the one-line description :)
08:32:58 <ddeja> yes :)
08:33:09 <aspiers> #info ddeja is working on making mistral more reliable
08:33:11 <aspiers> :)
08:33:12 <samP> great :)
08:33:22 <aspiers> alright
08:33:29 <aspiers> any other topics to discuss?
08:33:55 <samP> I think we should use our ML more
08:34:03 <aspiers> openstack-dev?
08:34:07 <samP> yep
08:34:10 <aspiers> I agree
08:34:30 <aspiers> #action everyone should use openstack-dev more often for discussing HA topics
08:34:49 <radek__> I have one question, I am using TripleO to deploy my openstack and in TripleO world there is something called light HA
08:34:51 <samP> If you need to share something or need attention from our team, then just put it on the ML
08:34:54 <radek__> any idea what that could be ?
08:35:14 <aspiers> radek__: I've not heard of that
08:35:38 <radek__> it was also new to me :)
08:35:43 <radek__> ok thanks Adam
08:35:52 <aspiers> radek__: based on https://github.com/openstack/tripleo-heat-templates/blob/master/environments/puppet-pacemaker.yaml
08:36:02 <aspiers> I would guess that it is Red Hat's move from Pacemaker to systemd
08:36:11 <aspiers> as per http://blog.clusterlabs.org/blog/2016/next-openstack-ha-arch
08:36:16 <aspiers> but I could be totally wrong
08:36:21 <aspiers> you'd have to ask beekhof :)
08:36:25 <radek__> ahhh maybe
08:36:58 <radek__> want me to make him angry :)
08:37:11 <aspiers> haha
08:37:27 <aspiers> beekhof's always angry ;-) just kidding ;-)
08:37:45 <radek__> anyway going to ask on TripleO meeting next time
08:37:49 <aspiers> good idea
08:38:04 <aspiers> #topic AOB (Any Other Business)
08:38:34 <aspiers> #info aspiers is going on holiday shortly, should be back Monday July 18th
08:38:58 <aspiers> please could someone volunteer to chair HA meetings for the next 2 weeks?
08:39:04 <ddeja> aspiers: sure
08:39:07 <aspiers> thanks!
08:39:14 <samP> BTW, have we put discussed time slot?
08:39:28 <aspiers> samP: I have collected all the input and need to compare it + suggest a new time
08:39:43 <samP> OK, thank you
08:40:41 <aspiers> alright, anything else? otherwise we can finish early
08:41:31 <ddeja> I'm done
08:41:44 <samP> nothing from my side, hv a nice week ahead...
08:41:54 <aspiers> ok thanks, you too!
08:41:58 <aspiers> bye for now :)
08:42:30 <samP> bye then... thank you all
08:42:39 <aspiers> #endmeeting
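
Note on the 08:25:30 to 08:27:41 discussion: the two delivery modes ddeja describes can be sketched roughly as below. This is only an illustrative model under stated assumptions, not Mistral code and not the planned RPC layer. The names `run_task`, `WorkerDied`, `crashing_worker` and `healthy_worker` are hypothetical, the per-task `idempotent` flag stands in for whatever the workflow definition would carry, and the timeout that would detect a lost non-idempotent task is reduced to an immediate failure.

```python
class WorkerDied(Exception):
    """Simulates a worker crashing part-way through a task (hypothetical)."""


def crashing_worker(task):
    # We cannot know how far the worker got before it died.
    raise WorkerDied(task["name"])


def healthy_worker(task):
    # Pretend the action ran to completion.
    return None


def run_task(task, workers):
    """Offer `task` to each worker in turn; return (outcome, attempts)."""
    attempts = 0
    for worker in workers:
        attempts += 1
        if not task["idempotent"]:
            # ACK then process: the message is acknowledged before the work
            # starts, so it is never redelivered. If the worker dies, the
            # engine cannot tell whether the side effect happened and has
            # to fail the task (in the discussion, via a timeout).
            try:
                worker(task)
            except WorkerDied:
                return "FAILED", attempts
            return "SUCCESS", attempts
        # Process then ACK: the ACK is only sent after the work finished,
        # so a dead worker leaves the message unacknowledged and it is
        # handed to the next worker. Safe only because the task may repeat.
        try:
            worker(task)
        except WorkerDied:
            continue
        return "SUCCESS", attempts
    return "FAILED", attempts


workers = [crashing_worker, healthy_worker]
print(run_task({"name": "evacuate", "idempotent": True}, workers))
print(run_task({"name": "nova_boot", "idempotent": False}, workers))
```

In this model the idempotent task succeeds on the second worker, while the non-idempotent one fails after the first worker dies, which matches the point made at 08:21:29 that no fencing is needed because the task is simply never repeated.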