04:02:54 <tpatil> #startmeeting Masakari
04:02:55 <openstack> Meeting started Tue May 19 04:02:54 2020 UTC and is due to finish in 60 minutes. The chair is tpatil. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:02:56 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:02:58 <tpatil> Hi All
04:02:59 <openstack> The meeting name has been set to 'masakari'
04:03:16 <tpatil> Sampath san should be joining any time
04:03:24 <tpatil> Roll call?
04:03:52 <suzhengwei> hi
04:03:58 <samP> Hi all, sorry I'm late
04:04:04 <tpatil> suzhengwei: Hi
04:04:09 <tpatil> samP: Hi
04:04:16 <noonedeadpunk> o/
04:05:41 <samP> meeting started, right?
04:05:44 <tpatil> samP: Yes
04:05:55 <samP> tpatil: Thanks
04:06:14 <tpatil> samP: Please go ahead and start the topic discussions
04:06:19 <samP> sure
04:06:52 <samP> #topic Victoria work items
04:07:08 <samP> First let's take a look at what's left for Victoria
04:07:19 <tpatil> #link : https://etherpad.opendev.org/p/masakari-victoria-workitems
04:07:32 <tpatil> I have moved the work items from Ussuri to the above etherpad
04:07:39 <samP> tpatil: thanks
04:08:31 <suzhengwei> nice
04:09:55 <suzhengwei> I have added "promotion for large scale host failure".
04:10:10 <samP> spec: Evacuate non-recovery ('HA_enabled = False') instances in shutoff status at host failure except specified tenant
04:10:32 <noonedeadpunk> This https://review.opendev.org/#/c/714615/1 looks like a pretty useful one
04:11:03 <samP> spec lgtm. I think we can review and merge the code soon.
04:11:33 <samP> noonedeadpunk: thanks, that is the code for the above spec, right?
04:11:40 <noonedeadpunk> yep
04:12:17 <tpatil> samP: I will review the spec and code this week
04:12:48 <samP> suzhengwei: Thanks for adding promotion for large scale host failure. Is this a feature or a problem statement? Could you please add more details on this?
04:14:24 <suzhengwei_> I will give a spec about it later.
04:14:43 <samP> suzhengwei_: Thanks..
04:14:57 <samP> (2) Modify masakari-hostmonitor in order to run it inside container
04:15:39 <tpatil> There is one finding about the systemctl command that's used in the corosync command as well
04:16:42 <noonedeadpunk> sorry for small offtopic, I think I just missed a feature, but how does a failed host go to reserved state after recovery? by setting post in host_rh_failure_recovery_tasks?
04:18:45 <tpatil> noonedeadpunk: Are you asking how recovery_method='reserved_host' works?
04:18:50 <hyunsikyang> kst
04:19:20 <noonedeadpunk> not really. just in spec there is "When a failed host goes back to system as a reserved host"
04:19:22 <samP> noonedeadpunk: your question is, how does a failed host become a reserved host?
04:19:52 <noonedeadpunk> yep. as by default it does not become a reserved one after recovery
04:21:36 <suzhengwei_> It should have finished the recovery workflow and rejoined the compute resource pool, then it can be updated to reserved by api.
04:22:10 <noonedeadpunk> Ok, but it's not smth masakari does at the moment?
04:22:38 <samP> masakari doesn't have that feature now.
04:22:56 <suzhengwei_> reserved hosts are set by the operator.
04:23:15 <noonedeadpunk> ah, ok I see. Just from the spec it seemed that it has one
04:23:24 <suzhengwei_> yes
04:23:36 <noonedeadpunk> Btw it might be pretty useful and not really hard to implement I think..
04:24:10 <tpatil> You are allowed to set on_maintenance and reserved of the failed host. Once the VMs are evacuated from the failed host, you can update the host to change the on_maintenance and reserved parameters.
04:24:46 <tpatil> This doesn't happen automatically, the operator will need to do it manually.
04:25:00 <tpatil> Are you saying this procedure should be automated?
04:25:26 <noonedeadpunk> oh, I see your point now. Like as you need to update on_maintenance manually anyway it's useless to set it to reserved
04:25:33 <samP> It is a nice feature to have. However, before you bring the failed host back to the cluster again, you have to make sure it won't break again.
04:26:04 <noonedeadpunk> yeah, ok, it's fair
04:26:22 <samP> In current operations, we leave that part to the operator.
04:27:06 <tpatil> Moving back to running hostmonitor inside a container
04:27:34 <samP> noonedeadpunk: if you have comments or questions, please feel free to add them to the spec
04:27:38 <tpatil> corosync itself uses the systemctl command. I'm not able to find the link to the source code.
04:27:38 <samP> tpatil: sure
04:27:49 <tpatil> I will add it to the etherpad after this meeting
04:28:01 <samP> tpatil: sure, thanks
04:28:34 <samP> we do not have any bug or gerrit code review related to this, right?
04:29:31 <tpatil> No
04:31:19 <samP> tpatil: OK thanks. Let's see how we can proceed with this in the next meetings.
04:31:30 <samP> About: Command parameter support to segment list command to filter out segments based on host input parameter
04:31:50 <tpatil> samP: Sure. I will post all details in the etherpad this week.
04:31:51 <samP> we discussed this in past meetings and agreed on how to proceed.
04:33:21 <samP> Enable/Disable evacuation segment wise
04:34:05 <samP> suzhengwei_: Thanks, and sorry for the review delay.
04:34:07 <tpatil> so we should add a new REST API GET /hosts?host=xyz which shouldn't include segment_id, is that correct?
04:34:16 <suzhengwei_> I would like to move this spec to victoria.
04:34:17 <samP> tpatil: correct.
04:34:28 <tpatil> samP: Ok
04:35:26 <suzhengwei_> Add the victoria cycle spec first. https://review.opendev.org/#/c/723297/
04:35:36 <samP> suzhengwei_: sure, let's do the review and merge this in early Victoria. Then hopefully we can finish this feature in V
04:36:28 <samP> suzhengwei_: Thanks, I will merge this first..
04:36:43 <suzhengwei_> OK
04:36:58 <tpatil> I have one question, what value should be set to "state"?
04:37:15 <suzhengwei_> I want to split the implementation commit for easy review.
04:37:36 <samP> suzhengwei_: That would be really helpful.
04:37:39 <tpatil> IMO, it should be either ENABLED/DISABLED or ACTIVATED/DEACTIVATED, instead of False/True?
04:37:51 <suzhengwei_> we use 'enable' already. If we use 'state', we would have an update problem.
04:38:19 <noonedeadpunk> imo boolean is better as a value
04:38:38 <noonedeadpunk> since it would have only 2 states
04:38:58 <suzhengwei_> yes, a boolean value.
04:39:46 <tpatil> sorry, it's enable not state.
04:40:30 <tpatil> should it be called state/status, with the value as stated above?
04:42:00 <noonedeadpunk> Like I really see no reason here to invent some naming for a really boolean value. Also bool takes less space in db storage compared to varchar :p
04:42:19 <suzhengwei_> this is not an important conflict. But for me, if changed to 'state', my cloud would have to change its api and db table.
04:43:16 <suzhengwei_> I would have to do some extra meaningless work.
04:43:50 <tpatil> enable should be changed to enabled at least
04:44:05 <noonedeadpunk> +1
04:44:31 <tpatil> Anyway, I will post this comment on the spec and later you guys can comment on it.
04:45:51 <suzhengwei_> ok
04:47:24 <samP> both options are good for me. Let's discuss further on the spec.
04:48:26 <samP> suzhengwei_: please add more details for "promotion for large scale host failure", so we can discuss this in upcoming meetings.
04:48:43 <suzhengwei_> ok.
04:49:26 <samP> #topic List of patches waiting for Victoria
04:50:10 <samP> Please add any patches you need to get merged in V.
04:50:18 <tpatil> I have reviewed a couple of patches and posted my comments
04:50:34 <noonedeadpunk> So I originally joined the meeting to talk about https://review.opendev.org/#/c/728629/
04:51:07 <tpatil> Today I have posted my comments. did you check these comments?
04:51:09 <noonedeadpunk> I read your comment tpatil just during the meeting
04:51:13 <tpatil> ok
04:51:17 <samP> noonedeadpunk: thanks for the patch
04:51:35 <suzhengwei_> I will review it later.
04:51:59 <noonedeadpunk> So the thing is, that hypervisors are used in the code only once and for checking hosts while adding them
04:52:20 <noonedeadpunk> And all further operations are provided with compute api
04:53:41 <noonedeadpunk> So like even if ppl configure pacemaker to use hypervisor names and add them to masakari - masakari would just fail while completing the action
04:54:11 <noonedeadpunk> As it won't be able to find the corresponding service to disable and evacuate
04:54:16 <tpatil> noonedeadpunk: I'm ok with your changes. just that, we need a migration command in masakari-manage to update the host from hypervisor_name to host
04:54:37 <tpatil> this is required for operators who were already using masakari in production env.
04:54:56 <noonedeadpunk> Yeah, I see, was just answering first comment:)
04:55:07 <noonedeadpunk> s/comment/question
04:55:36 <suzhengwei_> time is running out, let's be quick.
04:55:52 <tpatil> noonedeadpunk: So operators basically use nova.services.host when they add a node in a pacemaker cluster, is that correct?
04:56:01 <noonedeadpunk> Just might need some help with writing the migration thing
04:56:14 <noonedeadpunk> tpatil: they are supposed to, at least
04:56:34 <noonedeadpunk> otherwise things do not work, in my experience
04:56:46 <samP> noonedeadpunk: sounds reasonable to me. Let's discuss further on the gerrit.
04:56:53 <noonedeadpunk> ok
04:56:56 <tpatil> noonedeadpunk: Ok, I will help you to write a migration command in the masakari-manage tool.
04:57:24 <noonedeadpunk> tpatil: like let me try doing it:) If I get stuck I'll ping you?:)
04:57:38 <tpatil> noonedeadpunk: Sure
04:58:16 <samP> sorry I was disconnected..
04:58:24 <suzhengwei__> me, too.
04:59:08 <noonedeadpunk> So I originally tried to not touch much code to make the patch backportable
04:59:12 <noonedeadpunk> but yeah
04:59:22 <suzhengwei__> samP: not wait when instance evacuate error, I have read your comment.
05:00:45 <samP> noonedeadpunk: I will review and add my comments on the patch. You will get the help you need for migration scripts
05:01:01 <samP> suzhengwei_: I think that is tpatil's comment, right?
05:01:07 <samP> Anyway no more time left.
05:01:15 <tpatil> suzhengwei_: yes
05:01:23 <suzhengwei__> sorry :)
05:01:35 <samP> Let's discuss further on ML and gerrit. Thank you all for joining the meeting today
05:01:47 <samP> tpatil: could you please end the meeting.
05:02:03 <tpatil> suzhengwei_: NP, if you have any questions, I'm available on openstack-masakari IRC
05:02:09 <tpatil> samP: Sure
05:02:18 <tpatil> Thank you all for joining this meeting
05:02:26 <tpatil> Take care, Bye
05:02:31 <tpatil> #endmeeting