#openstack-meeting log

13:01:14 <Qiming> #startmeeting senlin
13:01:15 <openstack> Meeting started Tue Aug 29 13:01:14 2017 UTC and is due to finish in 60 minutes.  The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:19 <openstack> The meeting name has been set to 'senlin'
13:02:47 <ruijie_> evening Qiming
13:03:31 <Qiming> evening
13:11:12 <ruijie_> em.. Qiming, there is a question about how to recover/terminate the actions when the engines are restarted
13:11:25 <Qiming> yes
13:12:31 <ruijie_> if the engine which holds the cluster action goes down, will the node actions be messed up?
13:12:33 <Qiming> saw your comments
13:14:28 <Qiming> good question, I don't see a reliable way to recover the situation
13:14:49 <Qiming> unless we introduce transactions of actions
13:16:28 <ruijie_> em, yes Qiming. But we may want the engine not to process the sub-actions if the cluster action has been marked as cancelled or failed
13:18:33 <Qiming> okay
13:20:49 <ruijie_> Qiming, about the transactions, can you introduce the idea :)
13:21:22 <Qiming> one feature of transactions is that they can be rolled back
13:22:53 <Qiming> that would be very complicated
13:23:14 <Qiming> say for a CLUSTER_CREATE action
13:23:41 <Qiming> if it fails due to whatever reason (engine crash for example), we roll back the whole action
13:23:46 <Qiming> deleting all nodes created
13:24:05 <Qiming> that would be a super clean way for transactions
13:24:56 <Qiming> however, regarding the problem you want to solve right now, there could be a lightweight solution
13:26:02 <Qiming> CLUSTER_ actions always impose CLUSTER_SCOPE locks first, and later, node locks when necessary
13:26:11 <ruijie_> that would be great if it could be solved step by step
13:26:40 <Qiming> NODE_ actions *may* impose NODE_SCOPE locks first, then later node locks
13:26:55 <ruijie_> yes Qiming
13:27:37 <Qiming> without those locks being released, no other engine will be able to grab the actions for execution, even if we mark them as READY
13:28:06 <Qiming> the key point here is when breaking locks, we have to do it in the reverse order
13:28:36 <Qiming> maybe that is something we can do at the moment
13:29:56 <ruijie_> so, we can keep the cluster locks there until we have finished the GC work
13:30:14 <Qiming> yes
13:31:17 <ruijie_> yes Qiming, that could be a workaround
13:31:30 <Qiming> great
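
The lock ordering discussed above can be illustrated with a minimal sketch. It assumes a purely in-memory lock table; `LockTable`, `acquire` and `break_locks` are hypothetical names, not Senlin's API, and the CLUSTER_SCOPE/NODE_SCOPE distinction is ignored. It only shows the idea that a dead engine's locks are broken in the reverse of the order they were taken, so the cluster lock goes last and no other engine can grab the cluster while cleanup is still in progress.

```python
class LockTable:
    """Hypothetical in-memory lock table (illustration only)."""

    def __init__(self):
        self.cluster_locks = {}  # cluster_id -> engine_id
        self.node_locks = {}     # node_id -> engine_id
        self.held = {}           # engine_id -> [(kind, target_id)] in acquisition order

    def _table(self, kind):
        return self.cluster_locks if kind == "cluster" else self.node_locks

    def acquire(self, engine_id, kind, target_id):
        table = self._table(kind)
        # Refuse if another engine already holds the lock.
        if table.get(target_id, engine_id) != engine_id:
            return False
        if target_id not in table:
            table[target_id] = engine_id
            self.held.setdefault(engine_id, []).append((kind, target_id))
        return True

    def break_locks(self, dead_engine_id):
        """Release a dead engine's locks in reverse acquisition order."""
        released = []
        for kind, target_id in reversed(self.held.pop(dead_engine_id, [])):
            del self._table(kind)[target_id]
            released.append((kind, target_id))
        return released


locks = LockTable()
locks.acquire("engine-1", "cluster", "c1")  # cluster lock first
locks.acquire("engine-1", "node", "n1")     # node locks later
locks.acquire("engine-1", "node", "n2")
order = locks.break_locks("engine-1")       # node locks first, cluster lock last
```

Until `break_locks` has removed the cluster lock, an `acquire` on `c1` by any other engine keeps returning `False`, which is the property the workaround relies on.
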
13:32:16 <ruijie_> but we may push the work load/code to db layer
13:32:36 <ruijie_> we are now doing this in db.api
13:33:44 <ruijie_> sorry, I need to look into it further, I think that can solve the current problem
13:35:19 <Qiming> difference between "push to db layer" and "do this in db.api" ?
13:36:50 <ruijie_> the real logic for breaking locks and erasing dependents is all in the db layer
13:37:15 <Qiming> yes
13:38:30 <ruijie_> so, we can now update the status of the action directly for example
13:39:10 <ruijie_> or we can use the "control" field to control it before we process the actions
13:41:23 <ruijie_> this is about another problem that we have dirty data :)
13:45:11 <ruijie_> 1. we reverse the GC process so that we can make sure the cluster will not be triggered again before we finish GC
13:45:58 <ruijie_> 2. we may want to handle the actions whose status is RUNNING etc., which should not be processed anymore
13:48:23 <Qiming> at the end of the day, no action should be left RUNNING
13:48:50 <Qiming> if the engine holding their locks is dead, we should mark them as failed
13:49:34 <Qiming> definitely, we don't have logic to grab a RUNNING action for execution, right?
13:50:23 <ruijie_> but what if it has already been processed by one thread.. we do not check the node actions' status when processing
13:50:42 <Qiming> we check them all
13:51:06 <Qiming> we get "first" READY or "random" READY actions for execution, not RUNNING ones
13:52:46 <XueFeng> yes
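
The two rules stated above (no action is left RUNNING once its engine is dead, and only READY actions are picked up for execution) might look roughly like the following sketch. The `Action` class and both function names are hypothetical and work on a plain list; this is not Senlin's actual db.api code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    id: str
    status: str                  # "READY", "RUNNING", "FAILED", ...
    owner: Optional[str] = None  # engine currently executing the action


def fail_orphaned(actions, live_engines):
    """Mark RUNNING actions whose owning engine is dead as FAILED."""
    orphaned = []
    for action in actions:
        if action.status == "RUNNING" and action.owner not in live_engines:
            action.status = "FAILED"
            action.owner = None
            orphaned.append(action.id)
    return orphaned


def acquire_first_ready(actions, engine_id):
    """Hand the first READY action to an engine; RUNNING ones are never picked."""
    for action in actions:
        if action.status == "READY":
            action.status = "RUNNING"
            action.owner = engine_id
            return action
    return None


actions = [
    Action("a1", "RUNNING", "engine-1"),  # engine-1 has died
    Action("a2", "READY"),
    Action("a3", "RUNNING", "engine-2"),
]
orphaned = fail_orphaned(actions, live_engines={"engine-2"})
picked = acquire_first_ready(actions, "engine-2")
```

In this sketch `a3` is never handed out again, because it is still RUNNING on a live engine, while `a1` ends up FAILED instead of staying RUNNING forever.
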
13:54:05 <Qiming> I'm gonna cut final release for pike late tomorrow
13:54:29 <Qiming> please help confirm all important patches are already reviewed and merged
13:54:47 <ruijie_> sure Qiming
13:54:50 <Qiming> note, this is only about stable/pike branch
13:54:52 <Qiming> not master
13:54:53 <XueFeng> OK
13:54:59 <ruijie_> will think more about this problem
13:55:03 <Qiming> okay
13:55:51 <Qiming> any other issues I'm missing?
13:56:52 <XueFeng> Two problems to discuss
13:57:48 <XueFeng> 1. Some MANO clients want to use senlin to storage VMs
13:58:18 <Qiming> XueFeng, we are running out of time
13:58:32 <XueFeng> ok:)
13:58:45 <Qiming> you could have brought these up early in the meeting
13:58:57 <XueFeng> Ok
13:59:08 <Qiming> or, you can post them to #senlin channel later
13:59:17 <XueFeng> I joined at 13:00 UTC
13:59:35 <Qiming> thanks for joining, I guess you have long stories :)
13:59:38 <XueFeng> but my network had some problems
13:59:40 <Qiming> #endmeeting