13:01:14 #startmeeting senlin
13:01:15 Meeting started Tue Aug 29 13:01:14 2017 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:19 The meeting name has been set to 'senlin'
13:02:47 evening Qiming
13:03:31 evening
13:11:12 em.. Qiming, there is a question about how to recover/terminate actions when the engines are restarted
13:11:25 yes
13:12:31 if the engine which holds the cluster action goes down, will the node actions be messed up?
13:12:33 saw your comments
13:14:28 good question, I don't see a reliable way to recover the situation
13:14:49 unless we introduce transactions for actions
13:16:28 em, yes Qiming. But we may want the engine to not process the sub-actions if the cluster action has been marked as cancelled or failed
13:18:33 okay
13:20:49 Qiming, about the transactions, can you introduce the idea :)
13:21:22 one feature of transactions is that they can be rolled back
13:22:53 that would be very complicated
13:23:14 say for a CLUSTER_CREATE action
13:23:41 if it fails for whatever reason (engine crash for example), we roll back the whole action
13:23:46 deleting all nodes created
13:24:05 that would be a super clean way to do transactions
13:24:56 however, regarding the problem you want to solve right now, there could be a lightweight solution
13:26:02 CLUSTER_ actions always impose CLUSTER_SCOPE locks first, and later, node locks when necessary
13:26:11 it would be great if this could be solved step by step
13:26:40 NODE_ actions *may* impose NODE_SCOPE locks first, then later node locks
13:26:55 yes Qiming
13:27:37 without having those locks released, no other engine will be able to grab them for execution, even if we are marking them as READY
13:28:06 the key point here is that when breaking locks, we have to do it in the reverse order
13:28:36 maybe that is something we can do at the moment
13:29:56 so, we can keep the cluster locks in place until we have finished the GC work
13:30:14 yes
13:31:17 yes Qiming, that could be a workaround
13:31:30 great
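[Editor's note: the "reverse order" lock breaking agreed on above can be illustrated with a short sketch. This is a minimal, self-contained illustration under assumptions, not Senlin's actual db.api code; the LockTable structure and the gc_dead_engine function are hypothetical stand-ins for the real lock tables.]

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LockTable:
    """In-memory stand-in (hypothetical) for the cluster/node lock tables."""
    cluster_locks: Dict[str, str] = field(default_factory=dict)  # cluster_id -> engine_id
    node_locks: Dict[str, str] = field(default_factory=dict)     # node_id -> engine_id


def gc_dead_engine(locks: LockTable, dead_engine: str) -> None:
    """Release locks held by a dead engine, innermost scope first.

    Node locks are broken before the cluster lock, so no other engine
    can grab the cluster for a new action while the per-node cleanup
    is still unfinished.
    """
    # 1. Break the node-level locks owned by the dead engine.
    for node_id, owner in list(locks.node_locks.items()):
        if owner == dead_engine:
            del locks.node_locks[node_id]

    # 2. Only then break its cluster-level locks, making the cluster
    #    eligible for execution again.
    for cluster_id, owner in list(locks.cluster_locks.items()):
        if owner == dead_engine:
            del locks.cluster_locks[cluster_id]
```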
13:32:16 but we may push the workload/code to the db layer
13:32:36 we are now doing this in db.api
13:33:44 sorry, need to look further into it; I think that can solve the current problem
13:35:19 what's the difference between "push to db layer" and "do this in db.api"?
13:36:50 the real logic for breaking locks and erasing dependents is all in the db layer
13:37:15 yes
13:38:30 so, we can now update the status of the action directly, for example
13:39:10 or we can use the "control" field to control it before we process the actions
13:41:23 this is about another problem, that we have dirty data :)
13:45:11 1. we reverse the GC process so that we can make sure the cluster will not be triggered again before we finish GC
13:45:58 2. we may want to handle the actions whose status is RUNNING etc. which should not be processed anymore
13:48:23 at the end of the day, no action should be left RUNNING
13:48:50 if the engine holding its lock has died, we should mark it as failed
13:49:34 definitely. we don't have logic to grab a RUNNING action for execution, right?
13:50:23 but what if it's already been processed by one thread.. we do not check the node actions' status when processing
13:50:42 we check them all
13:51:06 we get the "first" READY or a "random" READY action for execution, not RUNNING ones
13:52:46 yes
13:54:05 I'm gonna cut the final release for pike late tomorrow
13:54:29 please help confirm all important patches are already reviewed and merged
13:54:47 sure Qiming
13:54:50 note, this is only about the stable/pike branch
13:54:52 not master
13:54:53 OK
13:54:59 will think more about this problem
13:55:03 okay
13:55:51 any other issues I'm missing?
13:56:52 two problems to discuss
13:57:48 1. some MANO client wants to use senlin to store VMs
13:58:18 XueFeng, we are running out of time
13:58:32 ok :)
13:58:45 you could have brought these up earlier in the meeting
13:58:57 OK
13:59:08 or, you can post them to the #senlin channel later
13:59:17 I joined at 13 UTC
13:59:35 thanks for joining, I guess you have long stories :)
13:59:38 but my network had some problems
13:59:40 #endmeeting
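[Editor's note: the invariant stated in the discussion above, that no action should be left RUNNING once its owning engine is dead, and that engines only ever grab READY actions, can be sketched as below. The Action class and both helpers are illustrative assumptions for this log, not Senlin's actual implementation.]

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

READY, RUNNING, FAILED = 'READY', 'RUNNING', 'FAILED'


@dataclass
class Action:
    """Hypothetical stand-in for an action record."""
    id: str
    status: str
    owner: Optional[str] = None  # engine currently holding the action, if any


def fail_orphaned_actions(actions: Iterable[Action], dead_engines: Set[str]) -> None:
    """Mark RUNNING actions owned by a dead engine as FAILED."""
    for action in actions:
        if action.status == RUNNING and action.owner in dead_engines:
            action.status = FAILED
            action.owner = None


def acquire_ready_action(actions: Iterable[Action], engine_id: str) -> Optional[Action]:
    """Dispatch rule: engines only grab READY actions, never RUNNING ones.

    A recovered action must therefore be re-marked READY (or FAILED)
    before any engine will touch it again.
    """
    for action in actions:
        if action.status == READY and action.owner is None:
            action.status = RUNNING
            action.owner = engine_id
            return action
    return None
```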