04:02:20 <samP> #startmeeting masakari
04:02:21 <openstack> Meeting started Tue Dec 13 04:02:20 2016 UTC and is due to finish in 60 minutes. The chair is samP. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:02:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:02:24 <openstack> The meeting name has been set to 'masakari'
04:02:30 <samP> Hi all
04:03:02 <samP> Thank you all for attending our first masakari IRC meeting
04:03:52 <samP> First, I would like a quick intro from everyone
04:04:03 <tpatil> I think we should add a weekly agenda to the masakari wiki page
04:04:12 <tpatil> #link https://wiki.openstack.org/wiki/Masakari
04:04:32 <abhishekk> #link https://wiki.openstack.org/wiki/Meetings/Masakari#Agenda_for_next_meeting
04:04:37 <samP> can we use the roll call?
04:05:23 <samP> tpatil: sure, I will do that
04:05:43 <samP> #action samP add weekly agenda to masakari wiki page
04:05:50 <abhishekk> samP: I have added some points for today's discussion
04:06:00 <samP> abhishekk: thank you
04:08:22 <abhishekk> shall we start with the discussion?
04:08:30 <samP> sure
04:09:15 <tpatil> Let's start the discussion as per the agenda
04:09:27 <samP> we don't have open bugs, right? can we jump straight to the next item?
04:09:32 <abhishekk> yes
04:09:36 <samP> sure
04:10:03 <tpatil> Only one issue is open
04:10:26 <samP> #topic Discussion of new features
04:10:34 <tpatil> in fact there are 3 open issues, but none is critical
04:11:20 <samP> tpatil: OK, let's do it after this discussion
04:11:35 <tpatil> samP: ok
04:11:36 <samP> First, the evacuate_all config option
04:12:38 <samP> currently we don't evacuate VMs without the HA flag; this option enables/disables the evacuation of all VMs
04:12:49 <abhishekk> yes, we are using this option for the host_failure flow
04:13:18 <abhishekk> if this option is True then we should evacuate all the instances, else only ha_enabled instances should be evacuated
04:13:34 <tpatil> #link https://review.openstack.org/#/c/407538/
04:13:55 <abhishekk> IMO we should rename this option so that we can use it for the instance_failure flow as well
04:14:38 <samP> abhishekk: agree, we have kind of the same issue there
04:15:02 <abhishekk> as of now, in instance_failure we are only processing HA-Enabled instances
04:15:14 <samP> how about rescue_all?
04:15:59 <rkmrhj> rescue is the name of another nova API.
04:16:12 <abhishekk> right, we can decide on the config name in an internal discussion
04:16:31 <samP> rkmrhj: ah, thank you
04:17:48 <samP> In the future, we are going to implement customizable rescue patterns
04:19:05 <samP> I think we need to define separate options for evacuate and instance_failure
04:19:27 <tpatil> samP: Let's add a new blueprint to describe the new feature
04:19:52 <samP> such as evacuate for all, but instance_failure only for HA-enabled VMs
04:19:57 <tpatil> also, we should add a lite spec to describe how we are going to implement it
04:20:10 <samP> tpatil: sure
04:20:42 <samP> do we need a spec repo? or just document it somewhere?
04:20:54 <tpatil> a repo is better
04:21:39 <samP> OK, a spec repo will be more useful in the future. I'll try to get one
04:21:51 <samP> #action create spec repo for masakari
04:23:09 <samP> are we going to create a BP for the "evacuate_all config option"?
04:23:41 <tpatil> I think a blueprint should be enough for this change, as design-wise it's not a big change
04:24:02 <samP> tpatil: agree
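For context on the option discussed above, here is a minimal sketch (not the current masakari code) of how a boolean config option could gate whether the host_failure flow evacuates every instance or only HA-enabled ones. The option name evacuate_all_instances and the metadata key HA_Enabled are assumptions for illustration, since the meeting left the final name to a later internal discussion.

    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opts([
        cfg.BoolOpt('evacuate_all_instances',
                    default=False,
                    help='If True, evacuate every instance on a failed host; '
                         'otherwise evacuate only HA-enabled instances.'),
    ])


    def select_instances_for_evacuation(instances):
        """Pick the instances the host_failure flow should evacuate.

        Illustrative only: the metadata key and its value format are assumed.
        """
        if CONF.evacuate_all_instances:
            return list(instances)
        return [inst for inst in instances
                if str(inst.metadata.get('HA_Enabled', '')).lower() == 'true']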
04:25:00 <tpatil> can we move to the next item?
04:25:11 <samP> any volunteers for that BP?
04:25:46 <abhishekk> I will do that
04:26:04 <samP> #action abhishekk create evacuate_all config option BP
04:26:09 <samP> abhishekk: thanks
04:26:20 <samP> OK, let's go to the next item
04:26:25 <abhishekk> samP: no problem
04:26:46 <samP> item 2: can we have one periodic task?
04:26:54 <abhishekk> OK, I will explain this
04:26:56 <abhishekk> Earlier we were planning to have two periodic tasks:
04:27:06 <abhishekk> process_error_notifications, for processing notifications which are in the error state
04:27:13 <abhishekk> process_queued_notifications, for processing notifications which have stayed in the new state for a long time because they were ignored/missed by the messaging server.
04:27:42 <abhishekk> but we can club these into one, as in both tasks we are going to execute the workflow again
04:27:54 <abhishekk> this way we can eliminate the duplicate code
04:29:05 <tpatil> the question is whether we can process both of these in a single periodic task
04:29:56 <tpatil> the only difference is when the notification status is new: if the periodic task fails to execute the workflow, should the status be set to "failed" or "error"?
04:30:45 <tpatil> abhishekk: can you please explain the status transitions that take place while processing notifications
04:30:53 <abhishekk> ok
04:31:04 <abhishekk> process_error_notifications:
04:31:14 <abhishekk> Error flow: error >> running >> error >> failed
04:31:22 <abhishekk> Success flow: error >> running >> finished
04:31:33 <abhishekk> for process_queued_notifications:
04:31:40 <abhishekk> Error flow: new >> running >> error
04:31:47 <abhishekk> Success flow: new >> running >> finished
04:32:35 <abhishekk> In the case of the second periodic task, if we set the status to error then that notification will again be picked up for execution by process_error_notifications
04:32:51 <abhishekk> so we can club these and have a common flow like:
04:32:59 <abhishekk> Error flow: new/error >> running >> error >> failed
04:33:06 <abhishekk> Success flow: new/error >> running >> finished
04:33:31 <samP> Is there any flag to stop it at some point?
04:34:07 <tpatil> abhishekk: Let's add a lite spec to explain all these possible cases
04:34:08 <abhishekk> no, these periodic tasks will run at a regular interval
04:34:38 <abhishekk> ok
04:35:01 <samP> OK, let's discuss this further in the spec
04:35:28 <samP> abhishekk: can I assign this spec to you?
04:35:45 <abhishekk> samP: yes
04:36:20 <samP> #action abhishekk create spec for merging periodic tasks
04:36:25 <samP> abhishekk: thank you
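A minimal sketch of the merged periodic task described above, assuming a hypothetical db helper and workflow engine (the actual masakari code differs). It follows the combined transitions discussed in the meeting: new/error >> running, then finished on success; on failure a fresh notification falls back to error so it can be retried, while one that had already errored is marked failed. The retry policy is simplified for illustration.

    RETRYABLE_STATUSES = ('new', 'error')


    def process_unfinished_notifications(db, engine):
        """Re-run the recovery workflow for notifications stuck in new/error."""
        for notification in db.get_notifications_by_status(RETRYABLE_STATUSES):
            previous_status = notification.status
            notification.status = 'running'
            db.save(notification)
            try:
                engine.execute_workflow(notification)
            except Exception:
                # A notification that had already errored once is given up on;
                # a fresh one is sent back to 'error' so a later run retries it.
                notification.status = ('failed' if previous_status == 'error'
                                       else 'error')
            else:
                notification.status = 'finished'
            db.save(notification)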
04:36:34 <samP> shall we move to the next item then?
04:37:02 <samP> item 3: configurable workflow
04:37:37 <abhishekk> this is a new requirement
04:37:45 <samP> is this configurable recovery patterns or something else?
04:37:52 <abhishekk> yes
04:37:58 <samP> abhishekk: ok
04:38:16 <tpatil> samP: configurable recovery patterns
04:38:34 <samP> tpatil: thanks
04:38:43 <tpatil> I think Kajinami explained to you the problems we are having with the current design
04:39:26 <samP> tpatil: actually, I couldn't yet; we are going to meet tomorrow
04:39:44 <tpatil> samP: Ok
04:40:14 <tpatil> After that discussion, let's finalize the new requirement before we go ahead and add a new blueprint for it
04:41:12 <samP> tpatil: sure, I will discuss this on the ML with kajinami
04:41:34 <samP> tpatil: we can have a more detailed discussion at the next meeting
04:41:54 <tpatil> samP: Sure
04:42:28 <abhishekk> samP: we have one more item for discussion
04:42:40 <samP> abhishekk: sure
04:42:44 <abhishekk> Dinesh_Bhor will explain that to you
04:42:58 <Dinesh_Bhor> ok, I have a question: should the workflow be executed synchronously or asynchronously?
04:43:31 <samP> a specific workflow or all of them?
04:44:31 <abhishekk> Particularly host_failure
04:44:31 <Dinesh_Bhor> The problem is that we want to mark the used reserved_hosts as reserved=False after the execution of the host failure workflow
04:45:17 <Dinesh_Bhor> For this we are passing the reserved_host_list dictionary to the workflow for further execution.
04:45:21 <samP> ah.. got it
04:45:33 <Dinesh_Bhor> When a reserved_host is taken for evacuation, we set reserved_host['reserved'] = False. As the dictionary is mutable, we get the updated dictionary after the execution of the workflow.
04:45:59 <Dinesh_Bhor> After the execution of the whole workflow we loop through the reserved_host_list in manager.py, and if a reserved_host is marked as False then we get the related object and mark it as reserved=False.
04:46:25 <Dinesh_Bhor> The above solution is based on the assumption that we are executing the workflow synchronously.
04:47:01 <tpatil> In the future, if someone wants to contribute another driver, say Mistral, then the workflow might execute asynchronously and you might not get the results of the workflow execution in the engine, right?
04:47:33 <Dinesh_Bhor> tpatil: yes, correct
04:47:33 <tpatil> where you would call db apis to update the reserved_host flag to False
04:49:38 <tpatil> the currently supported driver runs on the local machine where the engine is running, but in the future anyone can contribute a new driver and we don't know whether it will return results or not.
04:49:39 <samP> as tpatil said, if someone brings another driver to call this workflow, we cannot do this synchronously
04:50:31 <tpatil> so the main question is how to set the reserved_host flag to False after the instances are evacuated from the failover segment.
04:51:55 <tpatil> let's discuss the design offline, but one thing is sure: we cannot assume the workflow will return results
04:52:04 <tpatil> samP: Do you agree?
04:52:38 <samP> tpatil: yes, I am thinking about some kind of locking or an intermediate state for it
04:53:19 <samP> tpatil: agree, shall we raise a spec for this?
04:53:29 <tpatil> samP: yes
04:53:38 <samP> tpatil: thanks
04:53:54 <samP> Dinesh_Bhor: may I assign this spec to you?
04:54:04 <Dinesh_Bhor> samP: yes
04:55:01 <samP> #action Dinesh_Bhor spec for synchronous/asynchronous workflows
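For reference, a minimal sketch of the pattern Dinesh_Bhor describes, with illustrative names (handle_host_failure, driver, and db are not the actual masakari interfaces): the engine passes the reserved hosts into the host_failure workflow, the workflow flips reserved_host['reserved'] to False for the host it used, and the engine persists that change afterwards. This only works while the driver executes synchronously and in-process; an asynchronous driver such as a Mistral-based one would not hand the mutated dictionaries back, which is the problem raised above.

    def handle_host_failure(driver, db, failed_host, reserved_hosts):
        # reserved_hosts: list of dicts like {'name': ..., 'reserved': True}
        driver.execute_host_failure_workflow(failed_host, reserved_hosts)

        # Relies on synchronous, in-process execution: by the time we get here
        # the workflow has already mutated the dictionaries it consumed.
        for host in reserved_hosts:
            if not host['reserved']:
                host_obj = db.get_host_by_name(host['name'])
                host_obj.reserved = False
                host_obj.save()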
04:56:06 <samP> any other discussion topics? if not, let's move to #any_other_topics
04:56:45 <samP> #topic AOB
04:57:20 <samP> I will update the masakari wiki with our release schedule.
04:58:30 <samP> In our initial plan, we had milestone b1 on 12/9
04:59:33 <samP> since we have new topics to discuss, I would like to extend this to 12/16
05:00:35 <rkmrhj> Sure.
05:00:44 <samP> ok then.
05:00:51 <tpatil> I think we should use the LP milestone feature to lay out the details of each milestone
05:01:07 <samP> tpatil: sure
05:01:15 <abhishekk> thank you all
05:01:29 <tpatil> samP: Thank you
05:01:29 <samP> OK then, it's almost time
05:01:40 <Dinesh_Bhor> yes, thanks all
05:02:00 <samP> please use the ML, openstack-dev [masakari], for further discussions
05:02:10 <samP> Thank you all
05:02:13 <tpatil> Sure
05:02:15 <tpatil> bye
05:02:16 <samP> #endmeeting