04:00:34 <samP> #startmeeting masakari
04:00:34 <openstack> Meeting started Tue May 23 04:00:34 2017 UTC and is due to finish in 60 minutes. The chair is samP. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:00:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:00:37 <openstack> The meeting name has been set to 'masakari'
04:00:48 <samP> Hi all..o/
04:00:50 <rkmrHonjo> hi
04:00:54 <Dinesh_Bhor> Hi all
04:00:56 <samP> Dinesh_Bhor: hi
04:00:58 <samP> rkmrHonjo: hi
04:01:02 <abhishek_k> o/
04:01:05 <samP> abhishek_k: hi
04:01:15 <abhishek_k> samP: hi
04:01:22 <samP> Let's start with bugs..
04:01:31 <samP> #topic Critical Bugs
04:01:33 <sagara> Hi
04:01:58 <abhishek_k> just to inform all, Tushar san will not be able to join today's meeting
04:02:16 <samP> abhishek_k: thanks...
04:02:24 <Dinesh_Bhor> #link https://bugs.launchpad.net/masakari/+bug/1690768
04:02:25 <openstack> Launchpad bug 1690768 in masakari "Notification status will be "error" if recovered instance was "resized"." [High,New] - Assigned to Dinesh Bhor (dinesh-bhor)
04:02:38 <samP> Dinesh_Bhor: yes..
04:02:52 <samP> that is one of the bugs..
04:03:24 <samP> we have several reports on the same topic
04:04:23 <samP> Which is, how to rescue VMs with state = {'shelved', 'paused', 'rescued', etc...}
04:04:58 <abhishek_k> samP: we can reset them to error state and then they will be evacuated
04:05:27 <abhishek_k> samP: but the problem is, do we need to maintain its state when it is evacuated? because that will be overhead
04:05:29 <samP> abhishek_k: sure, in order to evacuate we have to set the state=error
04:06:05 <samP> abhishek_k: That is my point, do we have to put it back to the same state as before...?
04:06:31 <abhishek_k> samP: for example, if an instance is shelved, then after evacuating it will be active on the target host; if we need to maintain the previous state, we need to make an additional call to shelve it again
04:06:41 <samP> #link https://bugs.launchpad.net/masakari/+bug/1690768/comments/5
04:06:42 <openstack> Launchpad bug 1690768 in masakari "Notification status will be "error" if recovered instance was "resized"." [High,New] - Assigned to Dinesh Bhor (dinesh-bhor)
04:07:51 <abhishek_k> samP: right
04:08:11 <samP> abhishek_k: in that case, what we want is not to start it after evacuation.
04:08:52 <abhishek_k> samP: IMO, for all such cases we can make an additional call to stop that instance, does that sound good?
04:08:56 <samP> However, with the current evacuate, error instances will be active after evacuation.
04:09:20 <abhishek_k> samP: right
04:09:59 <samP> abhishek_k: In that case, the instance will be in active state until we stop it via the API, right?
04:10:11 <abhishek_k> samP: yesz
04:10:17 <abhishek_k> s/yesz/yes
04:10:54 <samP> for some apps, that may be acceptable. But not all of the
04:11:01 <samP> s/the/them
04:12:11 <abhishek_k> samP: imo, if we can set those instances to error state, then we can stop them before evacuating
04:12:37 <samP> abhishek_k: can we use the reset state API to stop them?
04:13:05 <abhishek_k> samP: with reset we can only set to error or active
04:13:45 <samP> abhishek_k: yes, same understanding here
04:14:15 <samP> so, how can we set it to stop before evacuation?..
04:14:45 <abhishek_k> samP: after resetting the instance to error state we can call the stop API (need to check though)
04:15:23 <samP> I don't think simply calling the stop API is gonna work, because nova-compute is not there... (agree: need to check)
04:16:01 <abhishek_k> samP: you are right
04:16:15 <samP> rkmrHonjo: You said you have some doc on Masakari recovery patterns?
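The option discussed above (reset a non-active instance to error so it can be evacuated, then make one extra call so it does not stay active on the target host) can be sketched as a pure planning function. This is a minimal sketch under stated assumptions, not Masakari's implementation: the action names, the `NEEDS_RESET` set and the `restore_previous_state` flag are hypothetical. It follows abhishek_k's suggestion of stopping the instance after evacuation rather than restoring the exact previous state (re-shelving, re-pausing, etc.), which would need a state-specific call each time.

```python
from typing import List

# Hypothetical, illustrative set: vm_states named in the discussion that
# must be reset to "error" before they can be evacuated.
NEEDS_RESET = {"shelved", "paused", "rescued", "resized"}


def plan_recovery(vm_state: str, restore_previous_state: bool) -> List[str]:
    """Return hypothetical action names for recovering one instance."""
    actions: List[str] = []
    if vm_state in NEEDS_RESET:
        # Evacuation needs state=error, so reset the instance first.
        actions.append("reset_state:error")
    actions.append("evacuate")
    if vm_state in NEEDS_RESET and restore_previous_state:
        # An evacuated error instance comes up active; stop it so it does
        # not keep running (and, possibly, being billed) on the target host.
        actions.append("stop")
    return actions
```

For example, `plan_recovery("paused", True)` yields `["reset_state:error", "evacuate", "stop"]`, while an already active instance only needs `["evacuate"]`. Whether the stop could instead be issued before evacuation was left open in the meeting, so the sketch issues it afterwards.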
04:16:53 <samP> abhishek_k: let's check it and comment on https://bugs.launchpad.net/masakari/+bug/1690768
04:16:54 <openstack> Launchpad bug 1690768 in masakari "Notification status will be "error" if recovered instance was "resized"." [High,New] - Assigned to Dinesh Bhor (dinesh-bhor)
04:17:00 <abhishek_k> samP: maybe we can do this before disabling the compute node (it requires rearranging the current tasks)
04:17:04 <abhishek_k> samP: ok
04:17:53 <samP> abhishek_k: the problem is the compute node is not there; disabling is a DB thing, and stop needs to be done in virt
04:18:18 <samP> abhishek_k: anyway, let's check that...
04:18:20 <abhishek_k> samP: correct
04:19:10 <abhishek_k> samP: the instance remains in powering-off state if the compute node is not there
04:20:08 <samP> If I remember correctly, rkmrHonjo told me that he has some doc about how masakari reacts to each VM state.
04:20:19 <samP> If rkmrHonjo can contribute that doc to the community, that would be really nice..
04:20:26 <samP> abhishek_k: great..
04:20:44 <rkmrHonjo> samP: OK, I'll contribute it.
04:20:50 <samP> rkmrHonjo: thank you.
04:21:15 <Dinesh_Bhor> rkmrHonjo: thanks
04:21:23 <rkmrHonjo> samP: Can I put it in the doc directory of the masakari repository?
04:21:30 <samP> abhishek_k: BTW, powering-off is the power state, right?
04:21:38 <abhishek_k> samP: yes
04:22:04 <abhishek_k> samP: the vm state remains as it is
04:23:22 <Dinesh_Bhor> vm_state is the previous state, task_state is powering-off, power_state is 1
04:25:35 <samP> rkmrHonjo: sure, you may put them under doc/masakari_features/ or some suitable name
04:25:49 <samP> sorry for the delay.. got cut off from the net...;)
04:25:53 <rkmrHonjo> samP: thanks. I got it.
04:27:07 <samP> Dinesh_Bhor: abhishek_k: thanks... so the possible states are state=error, task_state=powering-off, power_state=1?
04:28:08 <abhishek_k> samP: need to check what power_state = 1 means
04:28:28 <samP> I think we still have to check whether the state of the ^^above VMs can be set to STOP.
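On the open question of what `power_state = 1` means: Nova stores power states as small integers defined in its `nova.compute.power_state` module, where 1 is RUNNING. The combination Dinesh_Bhor reported (task_state `powering-off`, power_state 1) therefore describes a stop that was requested but never carried out, because no nova-compute was there to act on it. The snippet copies those constants (rather than importing Nova, so it runs standalone); `stuck_powering_off` is a hypothetical helper that only illustrates how such an instance could be recognised.

```python
from typing import Optional

# Values as defined in Nova's nova.compute.power_state module.
NOSTATE = 0x00
RUNNING = 0x01
PAUSED = 0x03
SHUTDOWN = 0x04
CRASHED = 0x06
SUSPENDED = 0x07

POWER_STATE_NAMES = {
    NOSTATE: "NOSTATE",
    RUNNING: "RUNNING",
    PAUSED: "PAUSED",
    SHUTDOWN: "SHUTDOWN",
    CRASHED: "CRASHED",
    SUSPENDED: "SUSPENDED",
}


def stuck_powering_off(task_state: Optional[str], power_state: int) -> bool:
    """Hypothetical check: a stop was requested (task_state "powering-off"),
    but the last recorded power state is still RUNNING, e.g. because the
    compute node that should perform the stop is down."""
    return task_state == "powering-off" and power_state == RUNNING
```

Note that "powering-off" is a task_state, not a power state; the power state stays at the last value recorded in the database.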
04:28:37 <samP> abhishek_k: sure, let's check them.
04:28:51 <abhishek_k> samP: yes
04:29:16 <samP> could someone take a quick look and update https://bugs.launchpad.net/masakari/+bug/1690768?
04:29:17 <openstack> Launchpad bug 1690768 in masakari "Notification status will be "error" if recovered instance was "resized"." [High,New] - Assigned to Dinesh Bhor (dinesh-bhor)
04:30:54 <Dinesh_Bhor> I am working on this
04:31:04 <samP> Dinesh_Bhor: thank you...
04:31:11 <Dinesh_Bhor> will push a patch soon
04:31:16 <rkmrHonjo> Dinesh_Bhor: Thanks.
04:31:23 <samP> Dinesh_Bhor: thanks...
04:31:54 <samP> OK then, are there any other bugs to discuss?
04:32:00 <Dinesh_Bhor> One more question: does billing happen for a 'shelved' instance?
04:32:59 <Dinesh_Bhor> I mean, the instance releases resources when in 'shelved' state, so will the user get charged for those resources?
04:33:18 <samP> Dinesh_Bhor: in the (shelved_offload_time != 0) case?
04:33:47 <Dinesh_Bhor> when shelved_offload_time = -1
04:34:11 <samP> Dinesh_Bhor: it depends, but normally the user will not get charged for shelved VMs.
04:34:57 <abhishek_k> samP: hmm, then after evacuation, if the instance goes to active state, the user will be charged in this case
04:35:12 <samP> abhishek_k: yes..
04:35:53 <samP> that's why I prefer the error (or stop?) state after rescue
04:36:21 <abhishek_k> samP: even if the instance is stopped, resources will be consumed, right?
04:36:33 <samP> abhishek_k: right
04:37:11 <samP> abhishek_k: but that depends on how you charge..
04:37:19 <abhishek_k> samP: ok
04:37:32 <Dinesh_Bhor> samP: ok, thanks
04:37:50 <abhishek_k> samP: let me see how we can tackle this
04:38:34 <samP> abhishek_k: sure, let's discuss further on https://bugs.launchpad.net/masakari/+bug/1690768
04:38:35 <openstack> Launchpad bug 1690768 in masakari "Notification status will be "error" if recovered instance was "resized"." [High,New] - Assigned to Dinesh Bhor (dinesh-bhor)
04:38:56 <samP> Any other bugs?
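Some context for the `shelved_offload_time` question: in Nova this option sets how long, in seconds, a shelved instance stays on its host before it becomes eligible for offloading; `-1` means never offload and `0` means offload immediately. Only an offloaded instance (vm_state `shelved_offloaded`) releases its host resources, so with `-1` the instance keeps holding them while shelved. Whether that is billed is deployment-specific, as samP said. The helper below is a hypothetical simplification: for a positive value Nova offloads from a periodic task, so the actual offload happens some time after the threshold rather than exactly at it.

```python
from typing import Tuple


def shelve_outcome(shelved_offload_time: int,
                   seconds_since_shelve: float) -> Tuple[str, bool]:
    """Hypothetical helper: return (vm_state, holds_host_resources) for a
    shelved instance under a given shelved_offload_time setting."""
    if shelved_offload_time == -1:
        # Never offloaded: stays "shelved" on its host, resources still held.
        return ("shelved", True)
    if seconds_since_shelve >= shelved_offload_time:
        # 0 means offload right away; a positive value means "after N seconds"
        # (simplified: Nova's periodic task does this some time afterwards).
        return ("shelved_offloaded", False)
    return ("shelved", True)
```

Under this model, the case Dinesh_Bhor asked about (`-1`) is exactly the one where the shelved instance still occupies the failed host, and evacuating it to an active state would start consuming resources on the target host as well.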
04:39:36 <abhishek_k> samP: no
04:39:49 <samP> abhishek_k: thanks
04:40:01 <abhishek_k> samP: I have added one point, but we can discuss that after the Pike work items discussion
04:40:22 <samP> abhishek_k: sure.. we will
04:40:33 <samP> #topic Discussion points
04:41:04 <samP> I have added some comments on recovery method customization
04:41:10 <samP> abhishek_k: please check them
04:41:30 <abhishek_k> samP: I have checked them, will push new specs soon
04:41:35 <abhishek_k> samP: I have added some points in the etherpad that point out problems in adding a mistral driver to masakari
04:41:35 <samP> abhishek_k: thanks
04:41:49 <abhishek_k> samP: https://etherpad.openstack.org/p/masakari-recovery-method-customization
04:41:57 <abhishek_k> #link https://etherpad.openstack.org/p/masakari-recovery-method-customization
04:43:09 <samP> abhishek_k: thanks... I will take a look, and if possible let's put them in the spec.
04:43:24 <abhishek_k> if anyone has any suggestions for those or identifies any other problem, please update the etherpad
04:43:33 <abhishek_k> samP: sure, I will
04:44:15 <samP> #action ALL review and put your comments for https://etherpad.openstack.org/p/masakari-recovery-method-customization
04:44:30 <samP> abhishek_k: thank you.
04:44:53 <samP> Update on Pike work items:
04:45:42 <samP> we're gonna skip "Improve the masakari-hostmonitor's implementation about detecting split-brain", because there is no better idea than the current one
04:46:08 <rkmrHonjo> samP: yes, I'd like to skip it. sorry.
04:46:45 <Dinesh_Bhor> ok,
04:46:54 <samP> rkmrHonjo: NP, let's skip it in Pike
04:47:04 <rkmrHonjo> thanks.
04:47:10 <samP> Other than that, on much update on Pike work items.
04:47:48 <samP> s/on much/no much/
04:48:00 <samP> Let's move to abhishek_k's item.
04:48:13 <abhishek_k> yes
04:48:15 <samP> #topic Host recovery flow, after evacuation instance is deleted then notification will be marked as failure
04:48:38 <samP> abhishek_k: could you please explain this a bit?
04:49:01 <abhishek_k> suppose we have 1000 instances on host A, which has failed
04:49:11 <abhishek_k> and we are evacuating those instances to host B
04:49:48 <abhishek_k> as per the current implementation, the evacuation task will evacuate all 1000 instances and then pass control to the confirm evacuation task
04:51:08 <abhishek_k> so if, before the evacuation is confirmed, the user deletes an instance or performs any other operation such as stop, shelve, etc., then it will fail to confirm that evacuation is done for that instance and it will mark that notification as error
04:52:23 <samP> abhishek_k: should we confirm them at a more atomic level? or lock the instances?
04:53:10 <abhishek_k> samP: if we lock that instance, then until the evacuation is confirmed the user will not be able to perform any action
04:53:19 <samP> abhishek_k: correct
04:53:53 <abhishek_k> samP: in production there might be mre than 10000 instances, so locking will not be efficient
04:54:12 <abhishek_k> s/mre/more
04:54:17 <samP> abhishek_k: agree
04:54:48 <rkmrHonjo> or, should we change the notification status to ignore if the instance is changed to a status other than the expected one?
04:55:32 <rkmrHonjo> I'm afraid of the race condition between masakari's operations and the user's operations.
04:56:00 <abhishek_k> rkmrHonjo: there might be a case where the instance goes to error state while evacuating, and if we set the notification to ignore, then the periodic task will not pick up that notification for processing
04:56:37 <samP> abhishek_k: we can break down the evacuation confirmation per VM and lock each VM until Masakari confirms it.
04:56:54 <abhishek_k> one suggestion: can we combine the evacuate and confirm evacuation tasks?
04:57:20 <abhishek_k> samP: yes, I was saying the same thing
04:57:25 <samP> abhishek_k: yep..
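The per-VM alternative discussed above (evacuate each instance and confirm it immediately, optionally locking it until Masakari confirms, instead of evacuating all instances and confirming them in a separate task) can be sketched as follows. This is a minimal sketch, not Masakari's actual task flow: `evacuate`, `get_status`, `lock` and `unlock` are hypothetical callables standing in for the corresponding Nova API calls, `get_status` is assumed to return `None` once an instance has been deleted, and not counting user-deleted instances as failures is one possible policy, not something agreed in the meeting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional

StatusFn = Callable[[str], Optional[str]]
ActionFn = Callable[[str], None]


@dataclass
class Outcome:
    evacuated: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)  # removed by the user mid-recovery
    failed: List[str] = field(default_factory=list)

    @property
    def notification_status(self) -> str:
        # Assumed policy: user-deleted instances do not fail the notification.
        return "error" if self.failed else "finished"


def _evacuate_one(iid: str, evacuate: ActionFn, get_status: StatusFn,
                  expected: str, max_polls: int, interval: float,
                  sleep: Callable[[float], None]) -> str:
    evacuate(iid)
    # Assumes get_status already reflects the rebuild (e.g. "REBUILD")
    # once evacuate() returns, so a stale "ERROR" is not misread.
    for _ in range(max_polls):
        status = get_status(iid)
        if status is None:
            return "deleted"
        if status == expected:
            return "evacuated"
        if status == "ERROR":
            return "failed"
        sleep(interval)
    return "failed"  # timed out waiting for the expected status


def evacuate_and_confirm(instance_ids: List[str], evacuate: ActionFn,
                         get_status: StatusFn, expected: str = "ACTIVE",
                         max_polls: int = 30, interval: float = 5.0,
                         lock: Optional[ActionFn] = None,
                         unlock: Optional[ActionFn] = None,
                         sleep: Callable[[float], None] = time.sleep) -> Outcome:
    """Evacuate and confirm one instance at a time instead of in two bulk tasks."""
    outcome = Outcome()
    for iid in instance_ids:
        if get_status(iid) is None:  # already deleted before we started
            outcome.deleted.append(iid)
            continue
        if lock is not None:
            lock(iid)  # keep the user from acting until this VM is confirmed
        result = "failed"
        try:
            result = _evacuate_one(iid, evacuate, get_status, expected,
                                   max_polls, interval, sleep)
        finally:
            # A deleted instance cannot be unlocked, so skip it.
            if unlock is not None and result != "deleted":
                unlock(iid)
        getattr(outcome, result).append(iid)
    return outcome
```

Confirming right after each evacuation narrows, but does not remove, the race window rkmrHonjo raised; with locking, each instance is held only for its own evacuation rather than for the whole batch, which addresses part of abhishek_k's concern about locking 10000 instances at once.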
04:57:46 <abhishek_k> samP: we will identify the pros and cons for this and let you know
04:58:20 <samP> abhishek_k: sure, please consider the race condition between masakari's operations and the user's operations too
04:58:35 <samP> we are running out of time...
04:58:44 <abhishek_k> samP: yes
04:58:49 <rkmrHonjo> abhishek_k: thanks.
04:58:50 <samP> No update from other topicd
04:59:02 <samP> s/topicd/topics
04:59:10 <samP> #topic AOB
04:59:31 <abhishek_k> rkmrHonjo: no problem
04:59:50 <samP> Please bring other topics to #openstack-masakari or the openstack-dev ML with the [masakari] tag
04:59:57 <abhishek_k> yes
05:00:04 <samP> let's end the meeting...
05:00:12 <abhishek_k> thank you
05:00:12 <samP> thank you all ...
05:00:16 <samP> #endmeeting