04:00:25 <samP> #startmeeting masakari
04:00:26 <openstack> Meeting started Tue Jul 11 04:00:25 2017 UTC and is due to finish in 60 minutes. The chair is samP. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:00:29 <openstack> The meeting name has been set to 'masakari'
04:00:36 <samP> hi all o/
04:00:42 <Dinesh_Bhor> hi all
04:00:44 <abhishekk> o/
04:00:48 <rkmrHonjo> hi
04:00:57 <samP> let's start..
04:01:13 <samP> #topic Critical Bugs
04:01:20 <samP> any bugs to discuss?
04:01:51 <samP> BTW, review request is done..
04:02:23 <rkmrHonjo> Thanks!
04:02:50 <samP> If no bugs to discuss, then let's move to discussion points.
04:03:03 <abhishekk> samP: ok
04:03:16 <rkmrHonjo> ok.
04:03:18 <samP> #topic Discussion points
04:03:41 <samP> 1. Make nova on_shared_storage configurable
04:04:34 <rkmrHonjo> We discussed about this topic last week.
04:05:02 <samP> ah..
04:05:07 <Dinesh_Bhor> In last meeting it is decided to make "on_shared_storage" option configurable. Will submit patch soon.
04:05:22 <samP> Agenda is not uptodate then..:)
04:05:34 <samP> Dinesh_Bhor: rkmrHonjo thanks..
04:06:09 <samP> sorry, I should hv update it..
04:06:36 <rkmrHonjo> Dinesh_Bhor: Do you want to discuss about this now?
04:06:55 <Dinesh_Bhor> Just have one question related to this: will there be a situation like half of the instances are on shared-storage and half not in real deployment?
04:08:00 <tpatil> example: host aggregate, some host aggregate on shared storage and others on non shared storage
04:09:14 <samP> First, in order to use Masakari, instance must be on shared storage.
04:09:56 <samP> if instance in not in shared storage evacuate = rebuild and can not rescue it.
04:11:42 <tpatil> Then there is no point of making this option configurable
04:11:46 <samP> Can you pass on_shared_storage option through nova API?
04:11:57 <tpatil> Yes
04:12:06 <rkmrHonjo> samP: How do you think about user who uses boot-from-volume(and non shared storage)?
04:12:30 <tpatil> You can evacuate instance if it's not on shared storage
04:13:09 <samP> rkmrHonjo: in that case, on_shared_storage does not make any diff.
04:13:46 <samP> rkmrHonjo: because, it will be volume detach and re-attach after evacuate. right?
04:14:36 <tpatil> Dinesh : please update our findings about on shared storage parameter true and false
04:14:53 <rkmrHonjo> samP: I got it.
04:15:02 <Dinesh_Bhor> tpatil: yes
04:15:35 <Dinesh_Bhor> if on_shared_storage option isTrue and the instance files are not on shared storage actually then evacuate calls fails
04:15:43 <samP> I remember I fixed some thing related to this in past..
04:15:48 <samP> #link https://review.openstack.org/#/c/320231/
04:16:32 <Dinesh_Bhor> whereas if the on_shared_storage option is False then there is no issue
04:18:59 <samP> Dinesh_Bhor: true
04:19:27 <samP> #link https://developer.openstack.org/api-ref/compute/?expanded=evacuate-server-evacuate-action-detail
04:19:34 <samP> Please see ^^
04:19:43 <samP> Starting since version 2.14, Nova automatically detects whether the server is on shared storage or not.
04:19:52 <samP> Therefore this parameter was removed.
04:20:31 <samP> Am I going in wrong direction?
04:20:55 <rkmrHonjo> samP: Should we bump nova API version over 2.14 in masakari? Current version is 2.9.
04:21:11 <rkmrHonjo> #link https://github.com/openstack/masakari/blob/master/masakari/compute/nova.py#L42
04:21:12 <tpatil> in which micro version this parameter was removed
04:22:44 <Dinesh_Bhor> #link https://github.com/openstack/nova/blob/master/releasenotes/notes/remove-on-shared-storage-flag-from-evacuate-api-76a3d58616479fe9.yaml
04:23:03 <Dinesh_Bhor> it is removed in microversion 2.14
04:23:39 <tpatil> What if the instances are booted from imaages?
04:24:48 <samP> tpatil: glance images?
04:25:29 <Dinesh_Bhor> yes
04:26:06 <samP> sorry, Still cant see the problem..
04:27:26 <samP> for image booted instance, if ephemeral disk is in shared_storage then, it works in normal way, right?
04:28:01 <tpatil_> samP: you said the on_shared_storage parameter is removed, what if the instances are booted from images. THe data storage on the instances will be lost after evacuation, right?
04:28:56 <tpatil_> samP: if instances path is not using shared_storage
04:29:00 <samP> nova automatically detects whether it is in shared storage or not..
04:29:09 <samP> tpatil_: in that case, yes
04:29:31 <tpatil_> samP: If it's not on shared storage, does it fail to evacuate if instance is booted from image
04:30:20 <samP> tpatil_: I have to check, but I think it will not fail. instance will still evacuated (= rebuild)
04:30:44 <tpatil_> samP: ok, do we want to allow masakari to evacuate instance in such cases
04:31:14 <Dinesh_Bhor> samP: I have checked this. It evacuates = rebuilds
04:31:21 <abhishekk> samP: i.e. if instance_path is not on shared storage
04:33:26 <samP> tpatil_: if operator define non shared storage cluster in masakari, then masakari will evacuate those instances..
04:33:41 <samP> tpatil_: which operator should not do..
04:34:44 <samP> if we can prevent that by set on_shared_storage option, then that would be good.
04:35:15 <tpatil_> samP: in future, we might need to use nova version 2.14 above so making this option configurable is not a good solution
04:35:15 <samP> However, my point is we can not control it after API v2.14
04:35:57 <tpatil_> samP: so let's not make on_shared_storage option configurable
04:36:12 <samP> tpatil_: agree.
04:36:31 <rkmrHonjo> tpatil_: +1
04:36:57 <samP> do we have bug report for this issue?
04:37:07 <tpatil_> samP: but the questions remains if instances are booted from image, then there will be data loss. I don't think it's acceptable to the users
04:37:46 <tpatil_> samP: I think we can document saying use masakari only if the instance path are on shared_storage
04:38:18 <samP> tpatil_: understand.. but instances booted from image on non shared storage... how do we rescue it when compute node no longer there..
04:39:41 <tpatil_> samP: I understand that we don't have any control, but I just wanted to point from users perspective that there will be data loss
04:39:56 <samP> tpatil_: agree. we should put this in README.rst
04:40:07 <tpatil_> samP: sure
04:41:45 <samP> tpatil_: that is a good point.. all the presentations we did in the past, this was an unspoken agreement..
04:42:24 <rkmrHonjo> tpatil: Will you modify masakari codes? I think that we should bump nova API version to 2.14 and remove on_shared_stroage parameter from nova.py.
04:43:08 <tpatil_> rkmrHonjo: Sure, we will submit a patch soon
04:43:19 <abhishekk> 2. Instance gets auto-confirmed(uses new flavor) if masakari evacuates an instance which was partially resized(resize-confirm is not performed)
04:43:22 <rkmrHonjo> tpatil: thanks.
04:43:23 <samP> tpatil_: thanks..
04:43:35 <abhishekk> samP: regarding second point, I have sent mail to operators mailing list but haven't got any constructive feedback yet, #link http://lists.openstack.org/pipermail/openstack-operators/2017-July/013905.html
04:43:43 <samP> abhishekk: thanks..
04:44:22 <samP> I asked some ops to take a look at this..
04:44:28 <abhishekk> samP: I am also discussing same on operators IRC channel, but tow pepoples said they havent got this situation where they need to evacuate resized instance
04:44:34 <samP> currently only Saverio has replied ..
04:44:44 <abhishekk> s/tow/two
04:45:20 <samP> It is a very rare situation
04:45:57 <samP> But still critical to us..
04:46:52 <abhishekk> samP: #link http://eavesdrop.openstack.org/irclogs/%23openstack-operators/%23openstack-operators.2017-07-10.log.html#t2017-07-10T08:37:13
04:47:01 <samP> On the other hand, most people do not use evacuate ..
04:47:23 <Dinesh_Bhor> In the meantime until nova makes changes to evacuate api to address this issue in masakari
04:47:24 <Dinesh_Bhor> From masakari side I have updated the patch which evacuates and stops the resized instance after evacuation on the basis of power_state of instance: https://review.openstack.org/#/c/469029/
04:48:18 <tpatil_> samP: IMO, we should fix this issue in masakari first and keep following up with nova community to address this issue in nova
04:48:26 <samP> Dinesh_Bhor: Thanks
04:48:44 <samP> tpatil_: agree.. fix nova will take time
04:49:24 <tpatil_> samP: Ok, I will review Dinesh's patch taking this point into consideration
04:49:39 <samP> Let's review and merge this..
04:49:43 <samP> tpatil_: thanks..
04:49:48 <tpatil_> samP: Sure
04:50:16 <samP> 3. Remove ERROR instances from recovery targets when host failure happen
04:50:28 <rkmrHonjo> I want to add a configurable option. Error instances will be remove from recovery targets if the option is set.
04:50:33 <rkmrHonjo> Because some users don't want to launch error instances after recoverying.
04:50:40 <rkmrHonjo> Ofcourse there is a possibility that following patch resolve this issue, but that will take time.
04:50:47 <rkmrHonjo> #link https://review.openstack.org/#/c/469029/
04:52:03 <samP> rkmrHonjo: do you want to remove only "ERROR" instances?
04:52:55 <rkmrHonjo> samP: Yes. But I think that there is another solution. Writing the rescuable statuses in masakari.conf.
04:53:17 <Dinesh_Bhor> With the above patch error instance will be stopped after evacuation
04:54:05 <Dinesh_Bhor> In the master code error instances will be evacuated and the final state will be active
04:54:27 <samP> Dinesh_Bhor: rkmrHonjo's proposal is not to evacaute error instance, which is slightly different from stop after evacuate, right?
04:54:44 <Dinesh_Bhor> yes
04:55:03 <rkmrHonjo> samP: yes.
04:55:35 <samP> rkmrHonjo: "rescuable statues" in config is much like Recovery method customization
04:55:40 <abhishekk> so even isntance which is in error state and marked as HA_Enabled True will be ignored in this case right?
04:55:51 <samP> abhishekk: right
04:56:24 <abhishekk> samP: thanls
04:56:28 <rkmrHonjo> abhishekk: yes, that is my wish.
04:56:33 <abhishekk> s/thanls/thanks
04:57:14 <abhishekk> 3 minutes left
04:57:16 <samP> I do not think it is a good idea to list down all rescuable statues in config, where we have spec for "Recovery method customization"
04:57:23 <samP> abhishekk: yep
04:57:36 <samP> Let's continue this discussion in ML
04:57:38 <abhishekk> I will fix comment given by rkmrHonjo on API specification patch
04:57:56 <rkmrHonjo> samP: OK. I'll send a mail.
04:58:03 <samP> I will send a mail with my thoughts..
04:58:07 <rkmrHonjo> abhishekk: thanks a lot! I'll check it.
04:58:14 <samP> ah...ok
04:58:20 <samP> rkmrHonjo: please..
04:58:24 <abhishekk> I have submitted updated specs for recovery method customization #link https://review.openstack.org/458023
04:58:31 <samP> abhishekk: thanks..
04:58:43 <abhishekk> in this specs I have mentioned which actions we need to add in mistral
04:58:49 <abhishekk> please have a look at it
04:58:58 <samP> abhishekk: great.. I will review this..
04:59:04 <samP> 1m left
04:59:06 <abhishekk> samP: thank you
04:59:34 <samP> please offload to #openstack-masakari or ML with [masakari] for further discussions..
04:59:46 <samP> thank you all
04:59:51 <rkmrHonjo> thank you.
04:59:55 <samP> #endmeeting
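Note on discussion point 1: the outcome (bump the nova API version from 2.9 to 2.14 and drop the on_shared_storage parameter) can be illustrated with a minimal sketch of how the evacuate request body might differ across microversions. This is not masakari's code. The function names `parse_microversion` and `build_evacuate_body` are hypothetical, and the sketch assumes the request key is spelled `onSharedStorage`, as in the compute API reference linked in the log.

```python
from typing import Any, Dict, Optional, Tuple


def parse_microversion(version: str) -> Tuple[int, int]:
    """Split 'major.minor' into integers.

    Comparing as integers matters: as floats or strings,
    "2.9" would wrongly sort after "2.14".
    """
    major, minor = version.split(".")
    return int(major), int(minor)


def build_evacuate_body(
    api_version: str,
    host: Optional[str] = None,
    on_shared_storage: bool = False,
) -> Dict[str, Any]:
    """Build a body for the server 'evacuate' action (hypothetical helper)."""
    evacuate: Dict[str, Any] = {}
    if host is not None:
        evacuate["host"] = host
    # Before 2.14 the caller states whether the instance is on shared
    # storage; from 2.14 on, nova detects this itself and the flag is gone.
    if parse_microversion(api_version) < (2, 14):
        evacuate["onSharedStorage"] = on_shared_storage
    return {"evacuate": evacuate}
```

With "2.9" the body still carries the flag; with "2.14" it does not, which is why a configurable option would have no effect once the version is bumped.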