08:00:43 <rakhmerov> #startmeeting Mistral
08:00:44 <openstack> Meeting started Wed Sep 11 08:00:43 2019 UTC and is due to finish in 60 minutes. The chair is rakhmerov. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:00:46 <rakhmerov> hi all
08:00:47 <openstack> The meeting name has been set to 'mistral'
08:00:51 <vgvoleg> hi!
08:01:03 <rakhmerov> if there's anyone here for the meeting reveal yourself! )
08:01:08 <rakhmerov> vgvoleg: hi
08:01:40 <rakhmerov> eyalb: ^
08:02:16 <vgvoleg> I'd like to discuss something
08:02:36 <rakhmerov> guys, I have to step away urgently for 20-30 mins. Oleg, please start writing your topics/questions, I'll join later
08:02:48 <vgvoleg> First of all, I've found that notifier base is not moved to mistral_lib
08:03:08 <vgvoleg> so to write custom publisher we should import mistral
08:03:57 <vgvoleg> I guess that it is not OK and I'm going to move it
08:04:03 <vgvoleg> Nod if you agree :D
08:04:51 <rakhmerov> Nodding.. )
08:04:57 <vgvoleg> Secondly, I've found that PUT operations in our api are not safe at all
08:05:06 <rakhmerov> ?
08:06:12 <vgvoleg> We can break anything if we send multiple put requests to one execution
08:06:22 <vgvoleg> no check mechanisms
08:06:32 <vgvoleg> no locks
08:07:48 <vgvoleg> we can send contradictory commands to engine at the same time
08:07:55 <vgvoleg> it is not ok
08:08:29 <vgvoleg> And, IMO, it should be fixed on the api side
08:08:50 <vgvoleg> But I don't know how to do it nicely
08:09:33 <vgvoleg> And the third topic I'd like to discuss is our ERROR state
08:10:17 <vgvoleg> in our state machine, SUCCESS is truly terminal, we can't do anything with execution if it was completed successfully
08:10:40 <vgvoleg> but ERROR is not truly terminal - we can rerun it, for example
08:11:44 <vgvoleg> and I think that it is a gap, that we don't have truly terminal state to indicate error
08:12:44 <vgvoleg> Something that moves ERROR execution to read-only
08:13:21 <vgvoleg> Something that means "OK, we don't care that it is failed and we are not going to do anything with it"
08:14:40 <vgvoleg> I think we can use CANCELLED state for it, but current implementation does not support this transition
08:15:24 <vgvoleg> maybe we should add any additional state for this
08:15:34 <openstackgerrit> yatin proposed openstack/mistral master: moved generic util functions from mistral to mistral-lib https://review.opendev.org/676373
08:15:44 <vgvoleg> I don't know tbh
08:16:03 <vgvoleg> So I'll be glad to listen to your opinion
08:22:06 <rakhmerov> I'm here
08:22:29 <rakhmerov> reading..
08:22:51 <rakhmerov> vgvoleg: on your 2nd thing, can you give an example?
08:24:31 <vgvoleg> rakhmerov: sure, we can send two PUT requests to /v2/executions, the first request should pause execution, the second one should cancel it
08:24:32 <rakhmerov> on #3 I don't see why it is a real problem. It's all kind of relative, terminal or non-terminal. ERROR is really considered terminal from perspective of the running workflow
08:24:54 <rakhmerov> rerun mechanism is not a regular part of the execution process
08:25:13 <rakhmerov> vgvoleg: so what can happen?
08:25:16 <vgvoleg> and API will send two commands to the engine
08:25:22 <vgvoleg> which is not ok
08:25:22 <rakhmerov> so?
08:25:29 <rakhmerov> what bad is going to happen?
08:26:04 <rakhmerov> what's going to break?
08:26:48 <vgvoleg> it's a dice roll
08:26:55 <rakhmerov> why?
08:27:08 <rakhmerov> Oleg, it is something that a user can legally do
08:27:16 <rakhmerov> it's out of our control
08:27:25 <rakhmerov> yes, they can do it virtually simultaneously
08:27:34 <rakhmerov> but what will be broken in Mistral?
08:28:02 <vgvoleg> ok, I don't have the concrete example right now :D
08:28:27 <rakhmerov> if the CANCEL request comes first it will win, the PAUSE request will fail because it'll see that the execution is not in a proper state
08:29:01 <rakhmerov> if we'll have an opposite order I don't see any issues as well
08:29:19 <rakhmerov> as far as I remember we can legally cancel workflows that are in PAUSE state
08:29:39 <vgvoleg> ok ok ok I'll find a bug for sure
08:30:06 <vgvoleg> about #3
08:30:24 <rakhmerov> remember that in both cases it will be one DB TX
08:30:48 <vgvoleg> we should strictly separate terminal states and not-terminal states
08:30:57 <rakhmerov> it will either fail w/o changing anything in DB or will succeed and not let the other one make changes
08:31:15 <rakhmerov> vgvoleg: what does "strictly" mean here?
08:31:18 <vgvoleg> since we have rerun mechanism, ERROR is not terminal state
08:31:42 <rakhmerov> it is a terminal state from perspective of the certain scenario
08:31:44 <vgvoleg> so we can't work with it like it is a read-only
08:31:54 <vgvoleg> we can't cache it and so on
08:32:06 <rakhmerov> ok, what's the practical task you're trying to solve? )
08:32:27 <rakhmerov> we can but we need to do a better job when caching
08:32:31 <rakhmerov> invalidating etc.
08:33:04 <vgvoleg> I want to have a state that means ERROR, but will be read only
08:33:45 <rakhmerov> what will be the difference?
08:33:50 <rakhmerov> from the regular ERROR?
08:33:51 <vgvoleg> so that I'm sure that it will not change
08:33:54 <rakhmerov> I don't understand
08:34:07 <vgvoleg> the current ERROR execution could be changed
08:34:18 <rakhmerov> what's the point? Why can't we rerun a workflow in that state too?
08:34:55 <vgvoleg> because it is read only
08:35:07 <rakhmerov> I mean logically what will be the difference?
08:35:08 <vgvoleg> it is a terminal state, that means it will be not changed
08:35:30 <rakhmerov> how are we going to explain user why we can rerun one ERROR state and can't rerun another kind of ERROR state?
08:36:19 <vgvoleg> so there will be a transition (something like) TEMPORARY ERROR -> ERROR
08:36:44 <vgvoleg> and this will be human-initiated operation
08:37:04 <rakhmerov> nope
08:37:22 <rakhmerov> I fail to understand this..
08:37:32 <vgvoleg> Renat right now we call ERROR state as a terminal
08:37:44 <vgvoleg> but it is not terminal state
08:37:51 <rakhmerov> so all workflows have to be moved to that state only if a human says to do so?
08:38:07 <vgvoleg> so I want to have the FINAL_ERROR state
08:38:10 <rakhmerov> Oleg, again: in a certain scenario it is a terminal state
08:38:32 <rakhmerov> caching is a completely different problem
08:38:43 <rakhmerov> it's an implementation issue that we need to solve
08:38:44 <vgvoleg> caching was my stupid example
08:38:54 <rakhmerov> w/o letting users know about it
08:39:02 <vgvoleg> of use cases of read only objects
08:39:27 <vgvoleg> I told about external caching
08:39:30 <vgvoleg> not in Mistral
08:39:37 <rakhmerov> I don't see any point in having one more state for ERROR, really. What would be an explanation for users?
08:40:20 <rakhmerov> imagine someone coming here and asking "Guys, why did you add one more state? I lived just fine w/o it."
08:40:29 <rakhmerov> what are you going to answer?
08:40:38 <vgvoleg> Because we want to be honest with our clients
08:40:43 <vgvoleg> ERROR is not terminal
08:40:45 <rakhmerov> "The new state is terminal, the old one is not" ?
08:40:49 <vgvoleg> yes
08:40:52 <vgvoleg> :)
08:41:08 <rakhmerov> vgvoleg: Oleg, lots of clients don't care at all whether something is terminal or not :)
08:41:28 <vgvoleg> because right now we say that ERROR is a terminal state
08:41:48 <rakhmerov> in a certain (most common) scenario it's 100% true
08:42:35 <vgvoleg> yes
08:42:36 <vgvoleg> sure
08:42:52 <rakhmerov> let's not bother users with such mathematical kind of terms at all. It's not what they care about
08:43:11 <vgvoleg> until people find out that mistral has an amazing feature like rerun
08:43:46 <rakhmerov> ERROR is a truly terminal state meaning that a system doesn't have an automatic algorithm that can change this state to something else
08:43:59 <rakhmerov> only a human reasonably can
08:44:34 <rakhmerov> but in this case they know what they are doing and they don't care if it's not considered globally terminal anymore
08:44:44 <rakhmerov> because it's their decision to change it
08:44:59 <rakhmerov> but if we're talking about automatic processing it's 100% terminal
08:45:28 <vgvoleg> while we say that this is a terminal state, users can assume that such a state does not change, they can build their logic around this
08:46:59 <rakhmerov> again: they make a decision to change this state themselves :)
08:47:14 <rakhmerov> they know that it can change
08:47:17 <vgvoleg> ok I got what you mean
08:47:18 <rakhmerov> but only if they want to
08:47:41 <rakhmerov> automatically it can never ever change to something else
08:49:54 <vgvoleg> probably I'll return with this discussion when there will be more people :D
08:50:01 <vgvoleg> that's all, thank you!
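[Editor's sketch] The two readings of "terminal" argued over above can be written down side by side. This is only an illustration under stated assumptions: the transition tables and the helper names `is_terminal_for_engine` and `is_read_only` are hypothetical and are not taken from Mistral's code. Only the state names and the ERROR rerun come from the discussion.

```python
from enum import Enum


class State(Enum):
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    SUCCESS = "SUCCESS"
    ERROR = "ERROR"
    CANCELLED = "CANCELLED"


# Hypothetical tables for illustration only, not Mistral's real rules.
# Transitions the engine may perform on its own.
AUTOMATIC = {
    State.RUNNING: {State.SUCCESS, State.ERROR},
}

# Transitions that only an explicit user request may trigger.
USER_INITIATED = {
    State.RUNNING: {State.PAUSED, State.CANCELLED},
    State.PAUSED: {State.RUNNING, State.CANCELLED},
    State.ERROR: {State.RUNNING},  # rerun, as mentioned in the meeting
}


def is_terminal_for_engine(state):
    """rakhmerov's sense: no automatic transition leaves this state."""
    return not AUTOMATIC.get(state)


def is_read_only(state):
    """vgvoleg's sense: neither the engine nor a user can leave this state."""
    return not AUTOMATIC.get(state) and not USER_INITIATED.get(state)
```

Under these assumed tables, ERROR satisfies `is_terminal_for_engine` but not `is_read_only`, while SUCCESS satisfies both, which is the gap vgvoleg describes and the distinction rakhmerov considers sufficient.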
08:51:30 <rakhmerov> ok :)
08:51:39 <boxiang> rakhmerov: https://review.opendev.org/#/c/680858/ can not fix my issue.
08:55:04 <rakhmerov> ok, let's wrap for now
08:55:08 <rakhmerov> #endmeeting
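[Editor's sketch] The PAUSE/CANCEL exchange earlier in the meeting rests on the claim that each state change runs in one DB transaction and is rejected if the execution is not in a suitable state. A minimal way to picture that is a conditional UPDATE. Everything here is an assumption for illustration: Mistral does not use `sqlite3` this way, and the `executions` table layout, the `ALLOWED_FROM` rules and the function names are hypothetical.

```python
import sqlite3

# Hypothetical rules: which current states each requested state may replace.
ALLOWED_FROM = {
    "PAUSED": ("RUNNING",),
    "CANCELLED": ("RUNNING", "PAUSED"),
}


def request_state(conn, execution_id, new_state):
    """Try to change state in one transaction; return True if it took effect."""
    allowed = ALLOWED_FROM[new_state]
    placeholders = ",".join("?" for _ in allowed)
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE executions SET state = ? "
            f"WHERE id = ? AND state IN ({placeholders})",
            (new_state, execution_id, *allowed),
        )
    # Zero rows updated means the state check failed and nothing changed.
    return cur.rowcount == 1


def current_state(conn, execution_id):
    row = conn.execute(
        "SELECT state FROM executions WHERE id = ?", (execution_id,)
    ).fetchone()
    return row[0]


def new_db():
    """Create an in-memory database with one RUNNING execution."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE executions (id TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO executions VALUES ('ex-1', 'RUNNING')")
    conn.commit()
    return conn
```

With these assumed rules, calling `request_state` with CANCELLED and then PAUSED leaves the execution CANCELLED and the second call returns False. The opposite order succeeds twice and also ends in CANCELLED, which matches rakhmerov's argument that neither ordering corrupts anything.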