13:01:14 <Qiming> #startmeeting senlin
13:01:15 <openstack> Meeting started Tue Aug 29 13:01:14 2017 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:19 <openstack> The meeting name has been set to 'senlin'
13:02:47 <ruijie_> evening Qiming
13:03:31 <Qiming> evening
13:11:12 <ruijie_> em.. Qiming. there is a question about how to recover/terminate the actions when the engines restarted
13:11:25 <Qiming> yes
13:12:31 <ruijie_> if the engine which hold the cluster action went down, the node actions will be messed?
13:12:33 <Qiming> saw you comments
13:14:28 <Qiming> good question, I don't see a reliable way to recover the situation
13:14:49 <Qiming> unless we introduce transactions of actions
13:16:28 <ruijie_> em, yes Qiming. But we may want the engine do not process the sub-actions if the cluster action been marked as cancel or failed
13:18:33 <Qiming> okay
13:20:49 <ruijie_> Qiming, about the transactions, can you introduce the idea :)
13:21:22 <Qiming> one feature of transactions are that they can be rolled back
13:22:53 <Qiming> that would be very complicated
13:23:14 <Qiming> say for a CLUSTER_CREATE action
13:23:41 <Qiming> if it fails due to whatever reason (engine crash for example), we roll back the whole action
13:23:46 <Qiming> deleting all nodes created
13:24:05 <Qiming> that would be a super clean way for transactions
13:24:56 <Qiming> however, regarding the problem you want to solve right now, there could be a lightweight solution
13:26:02 <Qiming> CLUSTER_ actions always impose CLUSTER_SCOPE locks first, and later, node locks when necessary
13:26:11 <ruijie_> that would be great if it could be solved step by step
13:26:40 <Qiming> NODE_ actions *may* impose NODE_SCOPE locks first, then later node locks
13:26:55 <ruijie_> yes Qiming
13:27:37 <Qiming> without having those locks released, no other engines will be able to grab it for execution, even if we are marking them as READY
13:28:06 <Qiming> the key point here is when breaking locks, we have to do it in the reverse order
13:28:36 <Qiming> maybe that is something we can do at the moment
13:29:56 <ruijie_> so, we can keep the cluster locks there before we finished GC work
13:30:14 <Qiming> yes
13:31:17 <ruijie_> yes Qiming, that could be a workaround
13:31:30 <Qiming> great
13:32:16 <ruijie_> but we may push the work load/code to db layer
13:32:36 <ruijie_> we are now doing this in db.api
13:33:44 <ruijie_> sorry, need look forward to it, I think that can solve the current problem
13:35:19 <Qiming> difference between "push to db layer" and "do this in db.api" ?
13:36:50 <ruijie_> the real logic for breaking locks and erasing dependents are all in db layer
13:37:15 <Qiming> yes
13:38:30 <ruijie_> so, we can now update the status of the action directly for example
13:39:10 <ruijie_> or we can use the "control" field to control it before we process the actions
13:41:23 <ruijie_> this is about another problem that we have dirty data :)
13:45:11 <ruijie_> 1. we reverse the GC process so that we can make sure the cluster will not be triggered again before we finishing GC
13:45:58 <ruijie_> 2. we may want to process the actions which status are RUNNING .etc which should not be process anymore
13:48:23 <Qiming> at the end of the day, no action should be left RUNNING
13:48:50 <Qiming> if the engine holding its lock was dead, we should mark them as failed
13:49:34 <Qiming> definitely, we don't have logic to grab a RUNNING action for execution, right?
13:50:23 <ruijie_> but what if it's already been processed by one thread.. we do not check the nodes actions' status when processing
13:50:42 <Qiming> we check them all
13:51:06 <Qiming> we get "first" READY or "random" READY actions for execution, not RUNNING ones
13:52:46 <XueFeng> yes
13:54:05 <Qiming> I'm gonna cut final release for pike late tomorrow
13:54:29 <Qiming> please help confirm all important patches are already reviewed and merged
13:54:47 <ruijie_> sure Qiming
13:54:50 <Qiming> note, this is only about stable/pike branch
13:54:52 <Qiming> not master
13:54:53 <XueFeng> OK
13:54:59 <ruijie_> will think more about this problem
13:55:03 <Qiming> okay
13:55:51 <Qiming> any other issues I'm missing?
13:56:52 <XueFeng> Two problems to discuss
13:57:48 <XueFeng> 1.Some MANO Client want to use senlin to storage VMs
13:58:18 <Qiming> XueFeng, we are running out of time
13:58:32 <XueFeng> ok:)
13:58:45 <Qiming> you could have brought these up early in the meeting
13:58:57 <XueFeng> Ok
13:59:08 <Qiming> or, you can post them to #senlin channel later
13:59:17 <XueFeng> I entry in 13utc
13:59:35 <Qiming> thanks for joining, I guess you have long stories :)
13:59:38 <XueFeng> but my network has some problem
13:59:40 <Qiming> #endmeeting
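
Note on the lock-ordering workaround discussed in the meeting: the sketch below illustrates the idea as stated in the log, namely that a dead engine's RUNNING actions are marked failed, its node locks are broken first, and the cluster lock is broken last, so that no other engine can pick up a READY action on that cluster before cleanup finishes. It is a minimal in-memory model, not Senlin's db.api code. The names `Action`, `Store`, `acquire_ready` and `gc_dead_engine` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

READY, RUNNING, FAILED = "READY", "RUNNING", "FAILED"


@dataclass
class Action:
    id: str
    target: str                   # ID of the cluster or node the action works on
    status: str = READY
    owner: Optional[str] = None   # engine currently executing the action


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the action and lock tables."""
    actions: dict = field(default_factory=dict)
    cluster_locks: dict = field(default_factory=dict)  # cluster_id -> engine_id
    node_locks: dict = field(default_factory=dict)     # node_id -> engine_id
    released: list = field(default_factory=list)       # order in which locks were broken

    def _locked(self, target):
        return target in self.cluster_locks or target in self.node_locks

    def acquire_ready(self, engine_id):
        """Claim the first READY action whose target is not locked.

        RUNNING actions are never candidates, matching the remark that
        only READY actions are picked for execution.
        """
        for action in self.actions.values():
            if action.status == READY and not self._locked(action.target):
                action.status, action.owner = RUNNING, engine_id
                return action
        return None

    def gc_dead_engine(self, engine_id):
        """Clean up after a dead engine, breaking locks in reverse order."""
        # 1. No action should be left RUNNING under a dead engine.
        for action in self.actions.values():
            if action.status == RUNNING and action.owner == engine_id:
                action.status = FAILED
        # 2. Node locks were taken last, so they are broken first.
        for node_id in [n for n, e in self.node_locks.items() if e == engine_id]:
            del self.node_locks[node_id]
            self.released.append(("node", node_id))
        # 3. Cluster locks were taken first, so they are broken last.
        for cluster_id in [c for c, e in self.cluster_locks.items() if e == engine_id]:
            del self.cluster_locks[cluster_id]
            self.released.append(("cluster", cluster_id))
```

In this model a READY node action stays untouchable while the dead engine's locks are still in place, and becomes eligible only after `gc_dead_engine` has run. How this maps onto the real lock tables and the "control" field mentioned in the log is left open here.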