08:03:55 #startmeeting ha
08:03:56 Meeting started Mon Jun 27 08:03:55 2016 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:03:57 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:03:59 The meeting name has been set to 'ha'
08:04:15 so let's start, maybe people will join in a bit
08:04:18 hello
08:04:34 hi :)
08:04:34 o/
08:04:40 #topic Current status (progress, issues, roadblocks, further plans)
08:04:49 alright, we haven't done any status reports for a while
08:04:54 I'll go first
08:05:11 I'm finishing up preparation of openstack-resource-agents-specs
08:05:30 Hi, sorry for the delay
08:05:30 there's a weird Sphinx problem building the docs, but I now know how to pin it down
08:05:35 samP: hi, np :)
08:05:49 so I am hoping to submit this today
08:05:59 at which point it should be easy for other people to submit specs for review
08:06:26 also, the long discussion with beekhof and kgaillot on the pacemaker-users list finally reached consensus :)
08:06:52 yeh, it took some time for me to follow up...lol
08:06:57 #info openstack-resource-agents-specs should be ready for submissions later today
08:07:16 samP, ddeja: not sure if you are following that thread?
08:07:47 aspiers: I read it last week, but I haven't checked for updates since then
08:07:52 #info conclusion reached on how to stop nova scheduling VMs to failing nova-compute
08:07:55 I read the ML discussion
08:08:00 the conclusion was basically:
08:08:10 1. use force_host_down not service-disable
08:08:23 2. do it on every RA stop, and the opposite on every RA start
08:08:48 I think that's it :)
08:09:03 no Pacemaker enhancements required
08:09:03 aspiers: thanks for the update!
08:09:19 probably migration-threshold=1 and start-failure-is-fatal=False
08:09:38 ok, I think that's all from my side
08:09:54 samP: any news from your side?
08:10:04 aspiers: Thank you and I'm working on masakari to try that out. But I couldn't finish
08:10:17 oh ok, cool
08:10:37 I will give some details on ML or next meeting
08:10:40 nice
08:10:58 oh I remembered more things
08:11:20 as ddeja already saw, Intel is asking if we want to use the OSIC for testing implementations of the user story
08:12:08 I gave them a quick summary of where we are
08:12:11 and pointers to more info
08:12:28 but I think it's still a bit too early to test an upstream implementation :)
08:13:17 also I talked to the chairman of the OpenStack board and he has raised the idea of a dedicated HA track in Barcelona
08:13:24 aspiers: agree, but nice if we can use it to analyse the technical gaps, if possible
08:13:28 I don't know if it will happen, but at least people are thinking about it now
08:13:37 aspiers: cool
08:13:43 samP: true, although I think OSIC is probably more useful for scale testing
08:14:00 samP: but I'm not really sure. I guess we'll find out when they've looked at the info
08:14:38 aspiers: sure
08:14:53 ddeja: any news on the mistral side?
08:14:58 my status:
08:15:00 -Working on alternative RPC layer for mistral, so that we can have ACK then process pattern
08:15:02 -Lot of internal work last week
08:15:21 aspiers: RPC layer is working more or less ;)
08:15:28 :-O :-)
08:15:43 I'm hoping to have it merged in ~2 weeks
08:16:17 and I'm starting work on letting the user decide in which mode he would like to process his message
08:16:33 that's it
08:16:35 remind me what the aim of that is? is it so that we can track which worker is handling a workflow?
08:16:49 aspiers: maybe in the long term
08:16:50 and then make sure the workflow is reliable?
08:17:07 right now it is to make sure that if a given task is idempotent
08:17:17 it would be done at some point
08:17:28 it is enough for the evacuate workflow
08:18:06 ok
08:18:39 so is it supporting retry of task on different worker?
08:18:51 yup
08:18:53 or something else, I can't remember the main point of doing it
08:19:11 yeah, it is basically re-sending a message in case the worker dies
08:19:19 so another would do the job
08:19:26 wouldn't we need to also fence the worker?
08:19:44 not really
08:19:52 (in general, maybe not for idempotent tasks)
08:20:01 I mean if a task cannot be repeated safely
08:20:14 aspiers: your concern is right, but
08:20:23 in case the worker dies
08:20:32 we don't know what he already did
08:20:59 ddeja: is that what we discussed abt state of the job?
08:21:04 maybe he sent nova_boot to nova and died just before he let the mistral engine know that the action succeeded?
08:21:29 so in case the action is not idempotent, we just cannot do it twice, we need to fail the whole workflow
08:21:44 and therefore, there is no need to fence the worker
08:21:45 hmm yeah, difficult
08:22:00 so how does ACK then process help?
08:22:11 aspiers: process then ACK
08:22:24 you said the opposite earlier ;-)
08:22:25 oh, I wrote it wrong...
08:22:28 lol
08:22:38 so, process then ACK helps
08:22:44 ok, process then ACK makes more sense ;-)
08:22:49 yup, sorry
08:23:01 samP: yes, that's the story
08:23:31 but if process fails and there is no ACK, how do you know whether to retry or fail the whole workflow?
08:24:09 soo
08:24:14 ddeja: BTW is there a spec for this? if so please just provide the URL so I can stop asking stupid questions :)
08:24:34 aspiers: there is no spec right now...
08:24:38 if there isn't a spec, maybe there should be?
08:24:52 presumably you already discussed this with Renat etc.?
08:24:55 aspiers: well, really it is not very complicated
08:25:07 let me explain
08:25:30 so: in mistral we would support both ACK then process and process then ACK
08:25:37 ah
08:25:48 for idempotent messages we use process then ACK
08:26:04 and for non-idempotent ones ACK then process (the model that is used now)
08:26:17 + timeout for such tasks
08:26:19 ok
08:27:13 and idempotency property is set as metadata on the workflow or task, or similar?
08:27:34 aspiers: it would be set per task
08:27:41 in a workflow definition
08:27:58 ok, makes sense
08:29:03 ddeja: are there any mailing list discussions on this which I might have missed?
08:29:59 aspiers: there was only a discussion about adding such a feature to oslo
08:30:09 and there is a blueprint
08:30:21 an oslo blueprint?
08:30:53 nope, oslo rejected it
08:31:13 https://blueprints.launchpad.net/mistral/+spec/mistral-alternative-rpc
08:31:16 thanks
08:31:23 blueprint for alternative rpc layer
08:31:54 http://thread.gmane.org/gmane.comp.cloud.openstack.devel/83394/focus=86142
08:32:00 that was the oslo discussion
08:32:22 yup, that's that
08:32:42 ddeja: so would it be a fair summary to say you are working on making mistral more reliable?
08:32:54 I guess that's the one-line description :)
08:32:58 yes :)
08:33:09 #info ddeja is working on making mistral more reliable
08:33:11 :)
08:33:12 great :)
08:33:22 alright
08:33:29 any other topics to discuss?
08:33:55 I think we should use our ML more
08:34:03 openstack-dev?
08:34:07 yep
08:34:10 I agree
08:34:30 #action everyone should use openstack-dev more often for discussing HA topics
08:34:49 I have one question, I am using TripleO to deploy my openstack and in the TripleO world there is something called light HA
08:34:51 If you need to share something or need our team's attention, then just put it on the ML
08:34:54 any idea what that could be?
08:35:14 radek__: I've not heard of that
08:35:38 it was also new to me :)
08:35:43 ok thanks Adam
08:35:52 radek__: based on https://github.com/openstack/tripleo-heat-templates/blob/master/environments/puppet-pacemaker.yaml
08:36:02 I would guess that it is Red Hat's move from Pacemaker to systemd
08:36:11 as per http://blog.clusterlabs.org/blog/2016/next-openstack-ha-arch
08:36:16 but I could be totally wrong
08:36:21 you'd have to ask beekhof :)
08:36:25 ahhh maybe
08:36:58 want me to make him angry :)
08:37:11 haha
08:37:27 beekhof's always angry ;-) just kidding ;-)
08:37:45 anyway going to ask at the TripleO meeting next time
08:37:49 good idea
08:38:04 #topic AOB (Any Other Business)
08:38:34 #info aspiers is going on holiday shortly, should be back Monday July 18th
08:38:58 please could someone volunteer to chair HA meetings for the next 2 weeks?
08:39:04 aspiers: sure
08:39:07 thanks!
08:39:14 BTW, have we discussed the time slot?
08:39:28 samP: I have collected all the input and need to compare it + suggest a new time
08:39:43 OK, thank you
08:40:41 alright, anything else? otherwise we can finish early
08:41:31 I'm done
08:41:44 nothing from my side, hv a nice week ahead...
08:41:54 ok thanks, you too!
08:41:58 bye for now :)
08:42:30 bye then... thank you all
08:42:39 #endmeeting
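
Note on the ACK ordering discussed between 08:25:30 and 08:27:58: the difference between "process then ACK" (for idempotent tasks) and "ACK then process" (for non-idempotent tasks) may be easier to see as a toy simulation. This is only a rough sketch, not Mistral or oslo code. ToyQueue, consume, run, WorkerDied and the per-task "idempotent" key are all hypothetical names made up for illustration, and the timeout mentioned at 08:26:17 is reduced here to simply recording the task as failed.

```python
from collections import deque


class WorkerDied(Exception):
    """Simulates a worker crashing part-way through a task."""


class ToyQueue:
    """In-memory stand-in for a message broker (hypothetical, not a real API)."""

    def __init__(self, messages):
        self.pending = deque(messages)
        self.unacked = {}  # message id -> message delivered but not yet ACKed

    def get(self):
        msg = self.pending.popleft()
        self.unacked[msg["id"]] = msg
        return msg

    def ack(self, msg):
        del self.unacked[msg["id"]]

    def requeue_unacked(self):
        # Roughly what a broker does when a consumer disappears without ACKing.
        self.pending.extend(self.unacked.values())
        self.unacked.clear()


def consume(queue, handler):
    """Pick the ACK ordering from the (assumed) per-task idempotency flag."""
    msg = queue.get()
    if msg["idempotent"]:
        handler(msg)    # process then ACK: a crash leaves the message unACKed
        queue.ack(msg)
    else:
        queue.ack(msg)  # ACK then process: a crash loses the message
        handler(msg)


def run(messages):
    """Run every task once through a worker that dies on its first attempt."""
    queue = ToyQueue(messages)
    done, failed = [], []
    crashed = set()

    def flaky_handler(msg):
        if msg["id"] not in crashed:
            crashed.add(msg["id"])
            raise WorkerDied(msg["id"])
        done.append(msg["id"])

    while queue.pending:
        in_flight = queue.pending[0]
        try:
            consume(queue, flaky_handler)
        except WorkerDied:
            if in_flight["id"] in queue.unacked:
                queue.requeue_unacked()         # another worker retries it
            else:
                failed.append(in_flight["id"])  # already ACKed: fail the task
    return done, failed


if __name__ == "__main__":
    print(run([
        {"id": "evacuate", "idempotent": True},
        {"id": "nova_boot", "idempotent": False},
    ]))
```

In this sketch the idempotent task is redelivered and eventually completes, while the non-idempotent one is never repeated and ends up failed, which matches the reasoning at 08:21:29 that such a workflow has to fail rather than risk running the action twice.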