04:02:54 <tpatil> #startmeeting Masakari
04:02:55 <openstack> Meeting started Tue May 19 04:02:54 2020 UTC and is due to finish in 60 minutes.  The chair is tpatil. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:02:56 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:02:58 <tpatil> Hi All
04:02:59 <openstack> The meeting name has been set to 'masakari'
04:03:16 <tpatil> Sampath san should be joining any time
04:03:24 <tpatil> Roll call?
04:03:52 <suzhengwei> hi
04:03:58 <samP> Hi all, sorry I'm late
04:04:04 <tpatil> suzhengwei: Hi
04:04:09 <tpatil> samP: Hi
04:04:16 <noonedeadpunk> o/
04:05:41 <samP> meeting started, right?
04:05:44 <tpatil> samP: Yes
04:05:55 <samP> tpatil: Thanks
04:06:14 <tpatil> samP: Please go ahead and start the topic discussions
04:06:19 <samP> sure
04:06:52 <samP> #topic Victoria work items
04:07:08 <samP> First, let's take a look at what is left for Victoria
04:07:19 <tpatil> #link : https://etherpad.opendev.org/p/masakari-victoria-workitems
04:07:32 <tpatil> I have moved the work items from Ussuri to above etherpad
04:07:39 <samP> tpatil: thanks
04:08:31 <suzhengwei> nice
04:09:55 <suzhengwei> I have added "promotion for large scale host failure".
04:10:10 <samP> spec: Evacuate non-recovery (’HA_enabled = False’) instances in shutoff status at host failure except specified tenant
04:10:32 <noonedeadpunk> This one looks pretty useful: https://review.opendev.org/#/c/714615/1
04:11:03 <samP> spec lgtm. I think we can review and merge the code soon.
04:11:33 <samP> noonedeadpunk: thanks, that is the code for above spec. right?
04:11:40 <noonedeadpunk> yep
04:12:17 <tpatil> samP: I will review the spec and code this week
04:12:48 <samP> suzhengwei: Thanks for adding "promotion for large scale host failure". Is this a feature or a problem statement? Could you please add more details on this?
04:14:24 <suzhengwei_> I will write a spec for it later.
04:14:43 <samP> suzhengwei_: Thanks..
04:14:57 <samP> (2) Modify masakari-hostmonitor in order to run it inside container
04:15:39 <tpatil> There is one finding about the systemctl command: it is used by corosync as well
04:16:42 <noonedeadpunk> sorry for a small off-topic question, I think I just missed a feature, but how does a failed host go to the reserved state after recovery? By setting post in host_rh_failure_recovery_tasks?
04:18:45 <tpatil> noonedeadpunk: Are you asking how recovery_method='reserved_host' works?
04:18:50 <hyunsikyang> kst
04:19:20 <noonedeadpunk> not really. just in spec there is "When a failed host goes back to system as a reserved host"
04:19:22 <samP> noonedeadpunk: your question is, how a failed host become a reserved host?
04:19:52 <noonedeadpunk> yep, since by default it does not become a reserved one after recovery
04:21:36 <suzhengwei_> It should finish the recovery workflow and rejoin the compute resource pool; then it can be updated to reserved via the API.
04:22:10 <noonedeadpunk> Ok, but it's not something masakari does at the moment?
04:22:38 <samP> masakari doesn't have that feature now.
04:22:56 <suzhengwei_> Reserved hosts are set by the operator.
04:23:15 <noonedeadpunk> ah, ok I see. It's just that from the spec it seemed that it had one
04:23:24 <suzhengwei_> yes
04:23:36 <noonedeadpunk> Btw it might be pretty useful and not really hard to implement I think..
04:24:10 <tpatil> You are allowed to set on_maintenance and reserved on the failed host. Once the VMs are evacuated from the failed host, you can update the host to change the on_maintenance and reserved parameters.
04:24:46 <tpatil> This doesn't happen automatically; the operator will need to do it manually.
04:25:00 <tpatil> Are you saying this procedure should be automated?
04:25:26 <noonedeadpunk> oh, I see your point now. Since you need to update on_maintenance manually anyway, it's pointless to set it to reserved automatically
04:25:33 <samP> It is a nice feature to have. However, before you bring the failed host back into the cluster, you have to make sure it won't break again.
04:26:04 <noonedeadpunk> yeah, ok, it's fair
04:26:22 <samP> In current operations, we leave that part to the operator.
04:27:06 <tpatil> Moving back to running hostmonitor inside a container
04:27:34 <samP> noonedeadpunk: if you have comments or questions, please feel free to add them to spec
04:27:38 <tpatil> corosync itself uses the systemctl command. I'm not able to find the link to the source code.
04:27:38 <samP> tpatil: sure
04:27:49 <tpatil> I will add it to the etherpad after this meeting
04:28:01 <samP> tpatil: sure, thanks
04:28:34 <samP> we do not have any bug or gerrit code review related to this, right?
04:29:31 <tpatil> No
04:31:19 <samP> tpatil: OK thanks. Let's see how we can proceed with this in the next meetings.
04:31:30 <samP> About: command parameter support for the segment list command, to filter segments based on a host input parameter
04:31:50 <tpatil> samP: Sure. I will post all details in the etherpad this week.
04:31:51 <samP> we discussed this in past meetings and agreed on how to proceed.
04:33:21 <samP> Enable/Disable evacuation segment wise
04:34:05 <samP> suzhengwei_: Thanks, and sorry for the review delay.
04:34:07 <tpatil> so we should add a new REST API GET /hosts?host=xyz which shouldn't include segment_id, is that correct?
04:34:16 <suzhengwei_> I would like to move this spec to victoria.
04:34:17 <samP> tpatil: correct.
04:34:28 <tpatil> samP: Ok
04:35:26 <suzhengwei_> I have added the Victoria cycle spec first: https://review.opendev.org/#/c/723297/
04:35:36 <samP> suzhengwei_: sure, let's review and merge this early in Victoria. Then hopefully we can finish this feature in V
04:36:28 <samP> suzhengwei_: Thanks, I will merge this first..
04:36:43 <suzhengwei_> OK
04:36:58 <tpatil> I have one question, what value should be set to "state"?
04:37:15 <suzhengwei_> I want to split the implementation into several commits for easier review.
04:37:36 <samP> suzhengwei_: That would be really helpful.
04:37:39 <tpatil> IMO, it should be either ENABLED/DISABLED or ACTIVATED/DEACTIVATED, instead of False/True?
04:37:51 <suzhengwei_> We use 'enable' already. If we use 'state', we would have an update problem.
04:38:19 <noonedeadpunk> imo boolean is better as a value
04:38:38 <noonedeadpunk> since it would have only 2 states
04:38:58 <suzhengwei_> yes, a boolean value.
04:39:46 <tpatil> sorry, it's enable not state.
04:40:30 <tpatil> should it be called state/status, with the values as stated above?
04:42:00 <noonedeadpunk> I really see no reason here to invent some naming for what is really a boolean value. Also a bool takes less space in DB storage compared to varchar :p
04:42:19 <suzhengwei_> This is not an important conflict. But for me, if it is changed to 'state', my cloud will have to change its API and DB table.
04:43:16 <suzhengwei_> I would have to do some extra meaningless work.
04:43:50 <tpatil> enable should be changed to enabled at least
04:44:05 <noonedeadpunk> +1
04:44:31 <tpatil> Anyway, I will post this comment on the spec and later you guys can comment on it.
04:45:51 <suzhengwei_> ok
04:47:24 <samP> both options are good for me. Let's discuss further on spec.
04:48:26 <samP> suzhengwei_: please add more details for "promotion for large scale host failure", so we can discuss this in upcoming meetings.
04:48:43 <suzhengwei_> ok.
04:49:26 <samP> #topic List of patches waiting for Victoria
04:50:10 <samP> Please add any patches you need to get merged in V.
04:50:18 <tpatil> I have reviewed a couple of patches and posted my comments
04:50:34 <noonedeadpunk> So I originally joined the meeting to talk about https://review.opendev.org/#/c/728629/
04:51:07 <tpatil> I posted my comments today. Did you check them?
04:51:09 <noonedeadpunk> I read your comment tpatil just during the meeting
04:51:13 <tpatil> ok
04:51:17 <samP> noonedeadpunk: thanks for the patch
04:51:35 <suzhengwei_> I will review it later.
04:51:59 <noonedeadpunk> So the thing is that hypervisors are used in the code only once, for checking hosts while adding them
04:52:20 <noonedeadpunk> And all further operations are done through the compute API
04:53:41 <noonedeadpunk> So even if people configure pacemaker to use hypervisor names and add them to masakari, masakari would just fail while completing the action
04:54:11 <noonedeadpunk> as it won't be able to find the corresponding service to disable and evacuate
04:54:16 <tpatil> noonedeadpunk: I'm ok with your changes. It's just that we need a migration command in masakari-manage to update the host from hypervisor_name to host
04:54:37 <tpatil> this is required for operators who are already using masakari in production environments.
04:54:56 <noonedeadpunk> Yeah, I see, I was just answering the first comment :)
04:55:07 <noonedeadpunk> s/comment/question
04:55:36 <suzhengwei_> time is running out, let's be quick.
04:55:52 <tpatil> noonedeadpunk: So operators basically use nova.services.host when they add a node to the pacemaker cluster, is that correct?
04:56:01 <noonedeadpunk> I just might need some help with writing the migration
04:56:14 <noonedeadpunk> tpatil: they are supposed to, at least
04:56:34 <noonedeadpunk> otherwise things do not work, in my experience
04:56:46 <samP> noonedeadpunk: sounds reasonable to me.  Let's discuss further on the gerrit.
04:56:53 <noonedeadpunk> ok
04:56:56 <tpatil> noonedeadpunk: Ok, I will help you to write a migration command in masakari-manage tool.
04:57:24 <noonedeadpunk> tpatil: let me try doing it :) If I get stuck, I'll ping you? :)
04:57:38 <tpatil> noonedeadpunk: Sure
04:58:16 <samP> sorry I was disconnected..
04:58:24 <suzhengwei__> me, too.
04:59:08 <noonedeadpunk> So I originally tried not to touch much code, to keep the patch backportable
04:59:12 <noonedeadpunk> but yeah
04:59:22 <suzhengwei__> samP: about "not wait when instance evacuate error", I have read your comment.
05:00:45 <samP> noonedeadpunk: I will review and add my comments on the patch. You will get the help you need for the migration scripts
05:01:01 <samP> suzhengwei_: I think that is tpatil's comment, right?
05:01:07 <samP> Anyway no more time left.
05:01:15 <tpatil> suzhengwei_: yes
05:01:23 <suzhengwei__> sorry :)
05:01:35 <samP> Let's discuss further on the ML and gerrit. Thank you all for joining the meeting today
05:01:47 <samP> tpatil: could you please end the meeting?
05:02:03 <tpatil> suzhengwei_: NP, if you have any questions, I'm available on openstack-masakari IRC
05:02:09 <tpatil> samP: Sure
05:02:18 <tpatil> Thank you all for joining this meeting
05:02:26 <tpatil> Take care, Bye
05:02:31 <tpatil> #endmeeting