04:00:34 <samP> #startmeeting masakari
04:00:34 <openstack> Meeting started Tue May 23 04:00:34 2017 UTC and is due to finish in 60 minutes.  The chair is samP. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:00:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:00:37 <openstack> The meeting name has been set to 'masakari'
04:00:48 <samP> Hi all..o/
04:00:50 <rkmrHonjo> hi
04:00:54 <Dinesh_Bhor> Hi all
04:00:56 <samP> Dinesh_Bhor: hi
04:00:58 <samP> rkmrHonjo: hi
04:01:02 <abhishek_k> o/
04:01:05 <samP> abhishek_k: hi
04:01:15 <abhishek_k> samP: hi
04:01:22 <samP> Let's start with bugs..
04:01:31 <samP> #topic Critical Bugs
04:01:33 <sagara> Hi
04:01:58 <abhishek_k> just to inform all Tushar san will not be able to join today's meeting
04:02:16 <samP> abhishek_k: thanks...
04:02:24 <Dinesh_Bhor> #link https://bugs.launchpad.net/masakari/+bug/1690768
04:02:25 <openstack> Launchpad bug 1690768 in masakari "Notification status will be "error" if recovered instance was "resized"." [High,New] - Assigned to Dinesh Bhor (dinesh-bhor)
04:02:38 <samP> Dinesh_Bhor: yes..
04:02:52 <samP> that is one of the bugs..
04:03:24 <samP> we have several reports with same topic
04:04:23 <samP> Which is, how to rescue a VM whose state is one of {'shelved', 'paused', 'rescued', etc...}
04:04:58 <abhishek_k> samP: we can reset them to error state and then they will be evacuated
04:05:27 <abhishek_k> samP: but the problem is, do we need to maintain its state when it is evacuated? Because that will be overhead
04:05:29 <samP> abhishek_k: sure, in order to evacuate we have to set the state=error
04:06:05 <samP> abhishek_k: That is my point, do we have to put it back to the same state as before...?
04:06:31 <abhishek_k> samP: for example if instance is shelved then after evacuating it will be active on target host, then if we need to maintain previous state then we need to make additional call to shelve it again
04:06:41 <samP> #link https://bugs.launchpad.net/masakari/+bug/1690768/comments/5
04:06:42 <openstack> Launchpad bug 1690768 in masakari "Notification status will be "error" if recovered instance was "resized"." [High,New] - Assigned to Dinesh Bhor (dinesh-bhor)
04:07:51 <abhishek_k> samP: right
04:08:11 <samP> abhishek_k: in that case, what we want is not to start it after evacuate.
04:08:52 <abhishek_k> samP: IMO then for all such cases we can make an additional call to stop that instance, does that sound good?
04:08:56 <samP> However, with current evacuate, error instances will be active after evacuate.
04:09:20 <abhishek_k> samP: right
04:09:59 <samP> abhishek_k: In that case, instance will be at active state till we make it stop via API, right?
04:10:11 <abhishek_k> samP: yes
04:10:54 <samP> for some apps, that may be acceptable. But not for all of them
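Editor's sketch of the flow discussed above (reset to error, evacuate, then an additional call to restore the earlier state). All names here are hypothetical and are not taken from masakari or nova; states not listed are assumed to be evacuable without a reset.

```python
# Hypothetical sketch: none of these names come from masakari or nova.
from typing import Dict, List

# Follow-up call that would return an evacuated (now active) instance
# to the state it had before the host failed.
FOLLOW_UP_ACTION: Dict[str, str] = {
    "shelved": "shelve",
    "paused": "pause",
    "rescued": "rescue",
}


def plan_recovery(previous_vm_state: str) -> List[str]:
    """Return the ordered recovery steps for one instance."""
    action = FOLLOW_UP_ACTION.get(previous_vm_state)
    if action is None:
        # Assumption: this state can be evacuated as it is.
        return ["evacuate"]
    # The instance is active between "evacuate" and the follow-up call.
    return ["reset_state_to_error", "evacuate", action]


if __name__ == "__main__":
    for state in ("active", "shelved", "paused"):
        print(state, "->", plan_recovery(state))
```

The gap between the "evacuate" step and the follow-up step is the window samP is concerned about: the instance runs on the target host until the extra call completes.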
04:12:11 <abhishek_k> samP: imo if we can set those instances to error state then stop them before evacuating
04:12:37 <samP> abhishek_k: can we use reset state API to stop them?
04:13:05 <abhishek_k> samP: with reset we can only set to error or active
04:13:45 <samP> abhishek_k: yes, same understanding here
04:14:15 <samP> so, how can we set it to stop before evacuate?..
04:14:45 <abhishek_k> samP: after resetting the instance to error state we can call the stop API (need to check though)
04:15:23 <samP> I don't think simply calling the stop API is going to work, because nova-compute is not there... (agree: need to check)
04:16:01 <abhishek_k> samP: you are right
04:16:15 <samP> rkmrHonjo: You said you have some doc for Masakari recovery patterns?
04:16:53 <samP> abhishek_k: lets check it and comment to https://bugs.launchpad.net/masakari/+bug/1690768
04:16:54 <openstack> Launchpad bug 1690768 in masakari "Notification status will be "error" if recovered instance was "resized"." [High,New] - Assigned to Dinesh Bhor (dinesh-bhor)
04:17:00 <abhishek_k> samP: maybe we can do this before disabling the compute node (it requires rearranging current tasks)
04:17:04 <abhishek_k> samP: ok
04:17:53 <samP> abhishek_k: the problem is that the compute node is not there; disabling is a DB thing and stop needs to be done in virt
04:18:18 <samP> abhishek_k: anyway let's check that...
04:18:20 <abhishek_k> samP: correct
04:19:10 <abhishek_k> samP: instance remains in powering-off state if compute node is not there
04:20:08 <samP> if I remember correctly, rkmrHonjo told me that he has some doc about how masakari reacts to each state of VMs.
04:20:19 <samP> If rkmrHonjo can contribute that doc to the community, that would be really nice..
04:20:26 <samP> abhishek_k: great..
04:20:44 <rkmrHonjo> samP: OK, I'll contribute it.
04:20:50 <samP> rkmrHonjo: thank you.
04:21:15 <Dinesh_Bhor> rkmrHonjo: thanks
04:21:23 <rkmrHonjo> samP: Can I put it in document directory of masakari repository?
04:21:30 <samP> abhishek_k: BTW, powering-off is the power state, right?
04:21:38 <abhishek_k> samP: yes
04:22:04 <abhishek_k> samP: vm state remains as it is
04:23:22 <Dinesh_Bhor> vm_state is previous state, task_state is powering-off, power_state is 1
04:25:35 <samP> rkmrHonjo: sure, you may put them under, doc/masakari_features/ or some suitable name
04:25:49 <samP> sorry for delay.. cut off from the net...;)
04:25:53 <rkmrHonjo> samP: thanks. I get it.
04:27:07 <samP> Dinesh_Bhor: abhishek_k, thanks... possible states are state=error and task_state=powering-off, power_state=1 ?
04:28:08 <abhishek_k> samP: need to check what power_state = 1 means
04:28:28 <samP> I think we still have to check whether the state of the ^^above VMs can be set to STOP.
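Editor's note on the question above: to the best of my knowledge, nova/compute/power_state.py defines the integer codes shown below, so power_state=1 would mean RUNNING. This is a sketch to be verified against the Nova release in use; the describe() helper is hypothetical.

```python
# Integer codes as I understand them from nova/compute/power_state.py.
# Verify against the deployed Nova release before relying on them.
POWER_STATE_NAMES = {
    0: "NOSTATE",
    1: "RUNNING",
    3: "PAUSED",
    4: "SHUTDOWN",
    6: "CRASHED",
    7: "SUSPENDED",
}


def describe(vm_state: str, task_state: str, power_state: int) -> str:
    """Hypothetical helper: render the three state fields on one line."""
    name = POWER_STATE_NAMES.get(power_state, "UNKNOWN")
    return (f"vm_state={vm_state} task_state={task_state} "
            f"power_state={power_state} ({name})")


if __name__ == "__main__":
    # The combination Dinesh_Bhor reported for a stop request
    # issued while the compute node is down.
    print(describe("active", "powering-off", 1))
```

If that mapping holds, the value 1 is presumably just the last power state recorded in the database, since no nova-compute service is left on the failed host to update it.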
04:28:37 <samP> abhishek_k: sure, lets check them.
04:28:51 <abhishek_k> samP: yes
04:29:16 <samP> could someone take a quick check and update https://bugs.launchpad.net/masakari/+bug/1690768?
04:29:17 <openstack> Launchpad bug 1690768 in masakari "Notification status will be "error" if recovered instance was "resized"." [High,New] - Assigned to Dinesh Bhor (dinesh-bhor)
04:30:54 <Dinesh_Bhor> I am working on this
04:31:04 <samP> Dinesh_Bhor: thank you...
04:31:11 <Dinesh_Bhor> will push patch soon
04:31:16 <rkmrHonjo> Dinesh_Bhor: Thanks.
04:31:23 <samP> Dinesh_Bhor: thanks...
04:31:54 <samP> OK then, are there other Bugs to discuss?
04:32:00 <Dinesh_Bhor> One more question: does billing happen for a 'shelved' instance?
04:32:59 <Dinesh_Bhor> I mean instance releases resources when in 'shelved' state so will user get charged for those resources?
04:33:18 <samP> Dinesh_Bhor: for the (shelved_offload_time != 0) cases?
04:33:47 <Dinesh_Bhor> when shelved_offload_time = -1
04:34:11 <samP> Dinesh_Bhor: it depends, but normally user will not get charged for shelved VMs.
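Editor's sketch of how the shelved_offload_time values mentioned above are interpreted, based on my understanding of the Nova configuration option (-1 never offloads, 0 offloads immediately, a positive value is a delay in seconds). The function name is hypothetical.

```python
def offload_policy(shelved_offload_time: int) -> str:
    """Hypothetical helper describing Nova's shelved_offload_time setting."""
    if shelved_offload_time < -1:
        raise ValueError("shelved_offload_time must be -1 or greater")
    if shelved_offload_time == -1:
        # The shelved instance keeps its place on the original host.
        return "never offloaded"
    if shelved_offload_time == 0:
        return "offloaded immediately on shelve"
    return f"offloaded after {shelved_offload_time} seconds"


if __name__ == "__main__":
    for value in (-1, 0, 3600):
        print(value, "->", offload_policy(value))
```

With -1, which is the case Dinesh_Bhor raises, a shelved instance stays associated with its host, so it is still affected when that host fails.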
04:34:57 <abhishek_k> samP: hmm, then after evacuation if the instance goes to active state then the user will be charged in this case
04:35:12 <samP> abhishek_k: yes..
04:35:53 <samP> that's why I prefer error or (stop?) state after rescue
04:36:21 <abhishek_k> samP: even if the instance is stopped, resources will be consumed, right?
04:36:33 <samP> abhishek_k: right
04:37:11 <samP> abhishek_k: but that depends on how you charge..
04:37:19 <abhishek_k> samP: ok
04:37:32 <Dinesh_Bhor> samP: ok, thanks
04:37:50 <abhishek_k> samP: let me see how we can tackle this
04:38:34 <samP> abhishek_k: sure, let's discuss further on https://bugs.launchpad.net/masakari/+bug/1690768
04:38:35 <openstack> Launchpad bug 1690768 in masakari "Notification status will be "error" if recovered instance was "resized"." [High,New] - Assigned to Dinesh Bhor (dinesh-bhor)
04:38:56 <samP> Any other Bugs?
04:39:36 <abhishek_k> samP: no
04:39:49 <samP> abhishek_k: thanks
04:40:01 <abhishek_k> samP: I have added one point but we can discuss that after the Pike work items discussion
04:40:22 <samP> abhishek_k: sure.. we will
04:40:33 <samP> #topic Discussion points
04:41:04 <samP> I have added some comments for recovery method customization
04:41:10 <samP> abhishek_k: please check them
04:41:30 <abhishek_k> samP: I have checked them, will push new specs soon
04:41:35 <abhishek_k> samP: I have added some points in the etherpad which will point out problems in adding mistral driver in masakari
04:41:35 <samP> abhishek_k: thanks
04:41:49 <abhishek_k> samP: https://etherpad.openstack.org/p/masakari-recovery-method-customization
04:41:57 <abhishek_k> #link https://etherpad.openstack.org/p/masakari-recovery-method-customization
04:43:09 <samP> abhishek_k: thanks...I will take a look and if it is possible lets put them in the spec.
04:43:24 <abhishek_k> if anyone has any suggestion for those or identifies any other problem then please update it in the etherpad
04:43:33 <abhishek_k> samP: sure, I will
04:44:15 <samP> #action ALL review and put your comments for https://etherpad.openstack.org/p/masakari-recovery-method-customization
04:44:30 <samP> abhishek_k: thank you.
04:44:53 <samP> Update for Pike work Items:
04:45:42 <samP> we are going to skip "Improve the masakari-hostmonitor's implementation about detecting split-brain", because there is no better idea than the current one
04:46:08 <rkmrHonjo> samP: yes, I'd like to skip it. sorry.
04:46:45 <Dinesh_Bhor> ok,
04:46:54 <samP> rkmrHonjo: NP, lets skip it in pike
04:47:04 <rkmrHonjo> thanks.
04:47:10 <samP> Other than that, not much update on Pike work items.
04:48:00 <samP> Lets move to abhishek_k's item.
04:48:13 <abhishek_k> yes
04:48:15 <samP> #topic Host recovery flow, after evacuation instance is deleted then notification will be marked as failure
04:48:38 <samP> abhishek_k: could you please explain this a bit?
04:49:01 <abhishek_k> suppose we are having 1000 instances on host A which is failed
04:49:11 <abhishek_k> and we are evacuating those instances to host B
04:49:48 <abhishek_k> as per current implementation, evacuation task will evacuate all 1000 instances and then pass control to confirm evacuation task
04:51:08 <abhishek_k> so if, before the evacuation is confirmed, the user deletes the instance or performs any other operation such as stop, shelve etc., then it will fail to confirm that evacuation is done for that instance and it will mark that notification as error
04:52:23 <samP> abhishek_k: should we confirm them at a more atomic level? or lock the instances?
04:53:10 <abhishek_k> samP: if we lock that instance then till evacuation confirmation user will not be able to perform any action
04:53:19 <samP> abhishek_k: correct
04:53:53 <abhishek_k> samP: in production there might be more than 10000 instances so locking will not be efficient
04:54:17 <samP> abhishek_k: agree
04:54:48 <rkmrHonjo> or, should we change notification status to ignore if instance is changed to the status other than expected?
04:55:32 <rkmrHonjo> I am afraid of the race condition between masakari's operation and user's operation.
04:56:00 <abhishek_k> rkmrHonjo: there might be the case that while evacuating the instance goes to error state and if we set the notification to ignore then periodic task will not pick that notification for processing
04:56:37 <samP> abhishek_k: we can break down the confirm evacuation to per VM and lock it till Masakari confirms it.
04:56:54 <abhishek_k> one suggestion is, can we combine the evacuate and confirm evacuate tasks
04:57:20 <abhishek_k> samP: yes, I was saying same thing
04:57:25 <samP> abhishek_k: yep..
04:57:46 <abhishek_k> samP: we will identify pros and cons for this and let you know
04:58:20 <samP> abhishek_k: sure, please consider the race condition between masakari's operation and user's operation too
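Editor's sketch of the two options discussed above: evacuating every instance and confirming afterwards, versus evacuating, locking, and confirming one instance at a time. This is a toy simulation with hypothetical names (Instance, evacuate, confirm, user_stop); it does not call masakari or nova.

```python
# Hypothetical simulation of the race between recovery and a user request.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    name: str
    host: str
    vm_state: str = "error"
    locked: bool = False


def evacuate(inst: Instance, target: str) -> None:
    inst.host = target
    inst.vm_state = "active"


def confirm(inst: Instance, target: str) -> bool:
    return inst.host == target and inst.vm_state == "active"


def user_stop(inst: Instance) -> bool:
    """A user request; rejected while the instance is locked."""
    if inst.locked:
        return False
    inst.vm_state = "stopped"
    return True


def batch_flow(instances: List[Instance], target: str,
               user_action: Callable[[], object]) -> bool:
    for inst in instances:
        evacuate(inst, target)
    user_action()  # arrives in the window between the two tasks
    return all(confirm(inst, target) for inst in instances)


def per_instance_flow(instances: List[Instance], target: str,
                      user_action: Callable[[], object]) -> bool:
    ok = True
    for inst in instances:
        inst.locked = True  # held only for this one instance
        evacuate(inst, target)
        user_action()  # the same request arriving mid-recovery
        ok = confirm(inst, target) and ok
        inst.locked = False
    return ok


if __name__ == "__main__":
    a = [Instance(f"vm-{i}", "host-A") for i in range(3)]
    print("batch:", batch_flow(a, "host-B", lambda: user_stop(a[0])))
    b = [Instance(f"vm-{i}", "host-A") for i in range(3)]
    print("per-instance:", per_instance_flow(b, "host-B", lambda: user_stop(b[0])))
```

In this toy model the batch flow reports failure because one instance was stopped before confirmation, while the per-instance flow holds each lock only briefly, which may address the efficiency concern about locking thousands of instances at once.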
04:58:35 <samP> we are running out of time...
04:58:44 <abhishek_k> samP: yes
04:58:49 <rkmrHonjo> abhishek_k: thanks.
04:58:50 <samP> No update from other topics
04:59:10 <samP> #topic AOB
04:59:31 <abhishek_k> rkmrHonjo: no problem
04:59:50 <samP> Please bring other topics to #openstack-masakari or openstack-dev ML with [masakari]
04:59:57 <abhishek_k> yes
05:00:04 <samP> lets end the meeting...
05:00:12 <abhishek_k> thank you
05:00:12 <samP> thank you all ...
05:00:16 <samP> #endmeeting