16:00:50 <rakhmerov> #startmeeting Mistral
16:00:51 <openstack> Meeting started Mon Jan 9 16:00:50 2017 UTC and is due to finish in 60 minutes. The chair is rakhmerov. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:52 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:55 <openstack> The meeting name has been set to 'mistral'
16:01:27 <rakhmerov> hi
16:01:32 <sharatss> hi
16:01:37 <d0ugal> Hello!
16:01:44 <rakhmerov> hello-hello )
16:01:48 <ddeja> o/
16:02:25 <rakhmerov> ok, let's begin
16:02:57 <rakhmerov> we haven't had meetings for at least a month, I guess
16:03:46 <rakhmerov> so, essentially I would like to touch base with you a little bit after the long holidays
16:04:06 <rakhmerov> we didn't communicate for about two weeks
16:04:52 <rakhmerov> #topic General syncup after long holidays
16:04:58 <thrash> o/
16:05:02 <rakhmerov> hi
16:05:20 <rakhmerov> so, any news that you'd like to share?
16:05:25 <rakhmerov> anything?
16:05:30 <rakhmerov> d0ugal, ddeja, sharatss
16:05:42 <ddeja> I was focused on gate fixing
16:05:51 <d0ugal> Nothing much from me, with the holidays it has been very quiet.
16:05:51 <rakhmerov> ok
16:06:03 <sharatss> nothing from me too
16:06:09 <d0ugal> I do hope to spend some time with rbrady looking at the custom actions work
16:06:09 <rakhmerov> ddeja: how is it going with the gate?
16:06:11 <sharatss> became active only today
16:06:12 <ddeja> and for the last 3 weeks I'd like to focus on multi-thread support in the kombu driver
16:06:26 <d0ugal> He put a patch up that needs reviews: https://review.openstack.org/#/c/411412/
16:06:27 <ddeja> rakhmerov: apart from the sshd_proxied action it is OK now
16:06:37 <ddeja> oh, yes, thanks d0ugal
16:06:40 <rakhmerov> ddeja: yes, it is important
16:07:18 <rbrady> hello
16:07:21 <d0ugal> rbrady: we are just doing a quick catchup and I mentioned that I plan to help you with custom actions
16:07:25 <rakhmerov> ddeja: ok, then we can probably disable this test for now and make the gate voting?
16:07:25 <d0ugal> rbrady: and I linked to your review.
16:07:26 <ddeja> d0ugal: oh, I was thinking it is a link to another patch...
16:07:28 <rakhmerov> rbrady: hi
16:07:37 <rbrady> thanks d0ugal
16:07:41 <ddeja> rakhmerov: the regular one - yes
16:07:44 <d0ugal> ddeja: hah, I am curious which one :)
16:08:01 <rakhmerov> ddeja: I'm just afraid that if we make it voting it may give us huge troubles once in a while
16:08:17 <rakhmerov> if we are ready to take this risk then it's ok
16:08:19 <ddeja> rakhmerov: the kombu one, I'd like to wait until the end of this development cycle
16:08:27 <ddeja> rakhmerov: HM
16:08:32 <rakhmerov> yes, that's understandable
16:08:35 <ddeja> I think maybe we can wait till Pike?
16:08:37 <d0ugal> rakhmerov: the risk isn't that big, we can always make it non-voting :)
16:08:38 <ddeja> with both
16:08:51 <rakhmerov> yes
16:08:56 <rakhmerov> I would go down this path
16:09:08 <ddeja> and start the new cycle with a new gate
16:09:19 <rakhmerov> d0ugal: it's a pretty big patch
16:09:21 <ddeja> maybe I would also be able to fix the sshd_proxied test by then
16:09:30 <d0ugal> rakhmerov: Which one?
16:09:33 <rakhmerov> #action rakhmerov: review https://review.openstack.org/#/c/411412/
16:09:42 <rakhmerov> d0ugal: https://review.openstack.org/#/c/411412/
16:09:50 <rakhmerov> :)
16:09:51 <d0ugal> rakhmerov: yeah, that one. I think much of it is copied from Mistral but rbrady can explain it more.
16:09:58 <d0ugal> I have only reviewed it
16:10:39 <rakhmerov> ok
16:10:56 <rbrady> that patch is WIP, but more feedback and discussion would probably be good. I started splitting some files up, which may or may not be a good thing
16:11:04 <rakhmerov> so I guess it's pointless now to open our regular topic "Current status"
16:11:12 <rakhmerov> it's pretty much clear who is doing what
16:11:25 <rakhmerov> rbrady: ok
16:11:57 <rakhmerov> I'd like to take a look at it first, but if you can give some general tips that could help review it, please do
16:12:30 <rakhmerov> ddeja: btw, Dawid, there's another 'rerun' test that fails sometimes
16:12:36 <rakhmerov> for direct workflow
16:12:39 <ddeja> rakhmerov: Oh, ok
16:12:42 <rakhmerov> yep
16:12:42 <ddeja> I'll take a look
16:12:50 <rbrady> the namespaces have been changed a bit from what was laid out in the spec, following more of a pythonic approach.
16:12:50 <rakhmerov> I saw it failing pretty often
16:13:09 <rakhmerov> rbrady: ok
16:13:25 <rakhmerov> btw, guys, do you already know if you're going to the PTG?
16:13:36 <rbrady> I am...booking hotel and flight today
16:14:00 * d0ugal isn't going
16:14:04 <rakhmerov> rbrady: planning to join our sessions? At least partially
16:14:18 <rbrady> yes
16:14:27 <rakhmerov> d0ugal: I see, that's bad :(
16:14:31 <rakhmerov> rbrady: ok
16:14:38 * ddeja doesn't know yet...
16:14:54 <rakhmerov> ddeja: it would be cool if you could do that
16:15:17 <d0ugal> rakhmerov: yeah, it is a shame. I shall try and add my input remotely if I can :)
16:15:41 <rbrady> d0ugal: maybe we can have you there via hangouts or something
16:16:02 <rakhmerov> as far as this actions stuff goes, it's definitely a topic for the PTG, I want to set a goal of figuring out all the principal things related to it
16:16:03 <d0ugal> rbrady: hah, that could be cool - or I can just read the notes and add questions for you all to answer ;)
16:16:12 * rbrady carries around virtual d0ugal at PTG
16:16:25 <rakhmerov> :)
16:17:03 <ddeja> http://www.conowego.pl/uploads/pics/DoubleRobot-660x440.jpg
16:17:20 <d0ugal> I think somebody used one of those at a previous summit...
16:17:28 <d0ugal> but now we are just getting off topic :-D
16:17:35 <rakhmerov> yeah, I saw those folks :)
16:17:45 <rakhmerov> it's pretty cool
16:18:21 <rakhmerov> in case you didn't see it on the ML, this is an etherpad for the PTG plans: https://etherpad.openstack.org/p/mistral-ptg-pike
16:19:11 <rakhmerov> let's focus on gathering in this etherpad all the challenges that we need to discuss
16:19:28 <ddeja> OK
16:20:34 <rakhmerov> ddeja: btw, could you please remind me of something? Those periodic kombu gate failures are caused exactly by the fact that our kombu RPC server is not thread-safe, right?
16:20:39 <rakhmerov> or are you not sure?
16:20:56 <rakhmerov> I remember we were discussing it but don't remember the conclusion
16:21:26 <ddeja> rakhmerov: yes, they are related to the fact that it is not thread-safe - and here is the fix https://review.openstack.org/#/c/414533
16:21:57 <ddeja> (another gate failure is due to the sshd_proxied action...)
16:22:11 <rakhmerov> ooh, awesome
16:22:22 <rakhmerov> so do you think it's finished? Ready to be reviewed?
16:22:31 <ddeja> I think so
16:22:34 * rakhmerov needs to review so much..
16:22:47 <ddeja> I spent 2 days looking into the o.m code
16:22:53 <ddeja> and made mine similar
16:22:59 <rakhmerov> :))
16:23:01 <rakhmerov> I see
16:23:04 <rakhmerov> great
16:23:23 <tuan__> hi guys
16:23:25 <rakhmerov> d0ugal: how about your time thing?
16:23:32 <tuan__> hope not to disturb you
16:23:39 <rakhmerov> do you think you found the solution?
16:23:50 <rakhmerov> tuan__: chime in, np
16:23:56 <tuan__> :)
16:24:06 <d0ugal> rakhmerov: ddeja found an issue with it, I understand it now I think - so I just need to update it again.
16:24:10 <tuan__> okes, can i say something about our problem
16:24:22 <rakhmerov> d0ugal: ok
16:24:31 <rakhmerov> tuan__: sure, go ahead
16:24:35 <tuan__> cool
16:24:42 <tuan__> so, like this:
16:24:50 <ddeja> d0ugal, rakhmerov: yes, IMO the old code was OK, just the tests need to be rewritten, but I may be wrong
16:24:59 <tuan__> when we have very sophisticated actions
16:25:20 <rakhmerov> ddeja: is there a patch for this already or did you only find the reason?
16:25:34 <d0ugal> ddeja: I think the old code is okay, but I think it is confusing. I don't think we should be passing around local datetimes anywhere.
16:25:35 <tuan__> if they fail, mistral will return a huge error with the input parameters and the root cause of the action
16:25:36 <rakhmerov> tuan__: ok, like what?
16:26:06 <tuan__> the problem is that with the very long error returned by mistral, it is hard for the operator to understand the root cause
16:26:17 <tuan__> it is based on the ETSI requirements in telco
16:26:33 <d0ugal> I've noticed this before. Tracking down where the problem is can be tricky
16:26:34 <ddeja> rakhmerov: there is a patch from d0ugal, it fixes it on his env (Scotland, so UTC0), but it breaks things on mine (Poland, so UTC+1)
16:26:42 <tuan__> stack trace must be human-readable
16:27:01 <rakhmerov> ddeja: :)
16:27:03 <rakhmerov> ok
16:27:09 <d0ugal> ddeja: the real question is why it behaves differently on my machine and in CI :)
16:27:09 <tuan__> e.g.: mistral will return something like this: Action failed....blabla
16:27:21 <tuan__> and then after that the root cause of the action
16:27:31 <ddeja> rakhmerov, tuan__ I've also noticed it, sometimes errors can even be very misleading
16:27:44 <ddeja> and lead to another action by mistake
16:27:50 <rakhmerov> tuan__: quick question
16:27:56 <ddeja> I guess it's a perfect topic for the PTG
16:27:56 <tuan__> go ahead
16:28:06 <tuan__> rakhmerov: i am listening
16:28:22 <tuan__> ddeja: That is what we have to deal with
16:28:28 <rakhmerov> ddeja: yes, moreover, we already discussed this with some folks from Nokia and others
16:28:29 <tuan__> to make it human-readable
16:28:49 <rakhmerov> tuan__: can you give a very specific example?
16:28:58 <tuan__> with very sophisticated actions, the error returned is quite long
16:28:58 <rakhmerov> what action, what kind of failure?
16:29:03 <rakhmerov> and what Mistral returns
16:29:17 <tuan__> well, it is a custom action
16:29:17 <rakhmerov> sophisticated like what?
16:29:30 <rakhmerov> yep, just a second..
16:29:43 <tuan__> i am so sorry that i could not tell more since it is the policy
16:29:57 <rakhmerov> the reason I'm asking is that actions are allowed to return a structured result even in case of a failure
16:30:31 <tuan__> you can imagine that we want mistral to run an action that deploys hundreds of vms for a specific vnf
16:30:47 <tuan__> therefore the input parameter of mistral is quite long
16:30:53 <rakhmerov> tuan__: ok, np, I understand. Then I'd like to ask you to come up with an example which is to some extent similar to your real one
16:30:55 <tuan__> when this action fails
16:31:12 <rakhmerov> tuan__: not necessary now (we don't have too much time)
16:31:21 <tuan__> mistral returns an error like: Failed action ....
16:31:43 <tuan__> then at the end of the error is the root cause, something like: Javascript fails
16:31:45 <d0ugal> I guess we need a bug report and a way to reproduce it
16:32:08 <rakhmerov> d0ugal: +1, that's what I'd like to achieve
16:32:10 <tuan__> but the operator has difficulty finding the root cause of this, the line: Javascript fails
16:32:40 <tuan__> IMHO it is not a bug, it is just the hierarchy of the error
16:32:50 <rakhmerov> tuan__: yes, I see. Can you please file a ticket in our Launchpad?
16:32:52 <tuan__> or the return information from mistral
16:33:03 <rakhmerov> with as much info as possible
16:33:07 <tuan__> OK, i will try to do it
16:33:11 <tuan__> thanks renat
16:33:11 <rakhmerov> yeah
16:33:46 <rakhmerov> please try to be very specific, like, for example, you say "Mistral returns.."
16:33:59 <rakhmerov> what do you mean by that? Returns where?
16:34:25 <rakhmerov> are you talking about logs or fields of some objects stored in the DB
16:34:27 <rakhmerov> etc
16:35:24 <rakhmerov> the point I'm trying to make is that we might be able to solve this problem (at least partially) by writing the actions themselves in a certain way
16:35:46 <rakhmerov> so that they return something more or less readable
16:35:53 <rakhmerov> even in case of an error
16:36:04 <rakhmerov> there's a protocol for that
16:36:15 <rakhmerov> maybe it's not documented well, then it's a different issue
16:36:29 <rakhmerov> that's why I'd like to see some example
16:37:03 <rakhmerov> tuan__: once you file a ticket please just join our IRC channel any time and we will discuss it
16:37:34 <tuan__> sure
16:37:40 <rakhmerov> ok
16:37:43 <tuan__> otherwise
16:38:01 <tuan__> https://bugs.launchpad.net/mistral/+bug/1624284
16:38:01 <openstack> Launchpad bug 1624284 in Mistral "MessagingTimeout when executing mistral actions" [Critical,In progress] - Assigned to Dawid Deja (dawid-deja-0)
16:38:13 <rakhmerov> yep
16:38:14 <tuan__> why is this bug partially
16:38:17 <tuan__> ?
16:38:20 <ddeja> tuan__: Mostly fixed
16:38:30 <tuan__> i also think that
16:38:30 <ddeja> there is one corner case that is not fixed yet
16:38:46 <rakhmerov> yes, it's 99% fixed, just one very crazy corner case is left
16:39:03 <rakhmerov> :)
16:39:09 <ddeja> but unless you do something like: 'mistral run-action `mistral_run_action`' it works fine
16:39:20 <d0ugal> lol
16:39:29 <rakhmerov> ddeja: maybe it makes sense to close this bug and open another one which is more specific?
16:39:37 <rakhmerov> describing that specific corner case?
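rakhmerov's point that an action can hand back a readable, structured result even when it fails can be illustrated with a minimal Python sketch. The `Result` class and `deploy_vms_action` function below are hypothetical stand-ins, not Mistral's actual action API; the sketch only assumes that an action may return either data or a compact error payload, instead of raising an exception that drags its whole (very long) input along with it.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Result:
    """Hypothetical stand-in for an action result: either data or an error."""
    data: Any = None
    error: Any = None

    def is_error(self) -> bool:
        return self.error is not None


def deploy_vms_action(vm_specs):
    """Hypothetical action that validates many VM specs before deploying.

    On failure it returns a short, structured error that leads with the
    root cause and points at the offending item, rather than echoing the
    full list of input parameters back to the operator.
    """
    for index, spec in enumerate(vm_specs):
        if "image" not in spec:
            return Result(error={
                "root_cause": "missing 'image' in VM spec",
                "failed_item": index,
                "total_items": len(vm_specs),
            })
    return Result(data={"deployed": len(vm_specs)})
```

With hundreds of VM specs as input, an operator reading `deploy_vms_action(specs).error["root_cause"]` sees the actual cause immediately, which is the kind of readability tuan__ asked for; how such a payload surfaces in Mistral's own logs or DB objects is outside what this sketch assumes.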
16:39:48 <ddeja> in other words: the only thing that is not fixed is when you use mistral run-action to start another action or workflow
16:39:53 <ddeja> rakhmerov: OK, I'll do it tomorrow
16:40:48 <rakhmerov> #action ddeja: close https://bugs.launchpad.net/mistral/+bug/1624284 and file a new more specific bug with the description of a corner case which is not yet fixed
16:40:48 <openstack> Launchpad bug 1624284 in Mistral "MessagingTimeout when executing mistral actions" [Critical,In progress] - Assigned to Dawid Deja (dawid-deja-0)
16:41:13 <rakhmerov> tuan__: did you come across something similar?
16:41:21 <ddeja> It wouldn't take long - I guess my comment would make a good bug report ;)
16:41:48 <tuan__> rakhmerov: you mean the problem i have reported
16:41:48 <tuan__> ?
16:41:52 <rakhmerov> yes
16:42:07 <rakhmerov> this timeout bug
16:42:27 <tuan__> ahha, the bug of ddeja
16:42:29 <tuan__> okes
16:42:37 <rakhmerov> yep
16:42:52 <rakhmerov> I mean, I'm curious why you're asking about it
16:42:52 <tuan__> well, we have something related to timeouts but im not sure it is because of this bug
16:43:03 <tuan__> let me try to describe it
16:43:09 <rakhmerov> yes
16:44:41 <rakhmerov> meanwhile, I'd like to share some info
16:45:17 <rakhmerov> just FYI: it seems like I was able to make the necessary changes so that we can run multiple Mistral engines safely
16:45:26 <ddeja> \o/
16:45:33 <rakhmerov> there's still a bunch of testing ahead, for sure
16:45:56 <d0ugal> Nice
16:46:00 <rakhmerov> but I did a lot already on my local env and found no issues yet
16:46:03 <rakhmerov> yeah
16:47:05 <rakhmerov> my further plan regarding this is to create a gate where we could start multiple Mistral engines and run our Rally scenarios against Mistral
16:47:19 <rakhmerov> theoretically it should not be that hard
16:47:42 <ddeja> but then theory would meet her sister - actual work ;)
16:47:52 <rakhmerov> btw, I'm wondering if it's possible to create a gate that runs multiple VMs?
16:48:03 <ddeja> rakhmerov: I think so
16:48:04 <rakhmerov> ddeja: yes :)
16:48:13 <ddeja> Nova does so for testing upgrades
16:48:19 <rakhmerov> ddeja: do you know any examples?
16:48:23 <rakhmerov> hah...
16:48:26 <rakhmerov> interesting
16:48:28 <rakhmerov> ok
16:48:30 <ddeja> more specifically
16:48:51 <ddeja> they have a gate for testing whether Live Migration of VMs is possible between computes running different versions
16:49:17 <ddeja> so there must be at least 2 VMs
16:49:18 <rakhmerov> #info Nova uses multiple VMs on their gate to test Live Migration
16:49:28 <rakhmerov> I see
16:49:32 <rakhmerov> cool
16:50:00 <rakhmerov> yeah, ideally I'd like to be able to use more than one VM with Mistral components
16:50:44 <rakhmerov> tuan__: would you like to describe your issue now or separately?
16:50:52 <rakhmerov> you can also just file a bug
16:51:03 <rakhmerov> anything works
16:51:13 <tuan__> rakhmerov: I am trying to find some logs for that
16:51:21 <rakhmerov> ok
16:51:24 <tuan__> it is just the report back to us
16:51:35 <tuan__> we did not have enough info
16:51:36 <rakhmerov> we have about 8 mins
16:51:44 <tuan__> oh yeah, im sorry
16:51:54 <tuan__> then i think i will send it later
16:52:13 <rakhmerov> that's ok, that's why I said that you can come to our channel and discuss it separately
16:52:17 <rakhmerov> yeah
16:52:19 <rakhmerov> ok
16:52:55 <rakhmerov> if it's something related to stability (timeouts, locks etc.) I'd love to know more about it
16:54:32 <rakhmerov> ddeja, d0ugal, rbrady, sharatss: guys, can you please mention again (in the conclusion of the meeting) what you're planning to do next?
16:54:47 <rakhmerov> what bothers you, your priorities etc.
16:55:12 <rakhmerov> or something that you'd like to be working on but can't for some reason
16:55:41 <ddeja> My plans: 1. Finish gate fixing 2. Focus on kombu driver multi-threading
16:55:51 <rakhmerov> ok
16:55:53 <rakhmerov> sounds good
16:56:06 <ddeja> and if I have any time before O-3, I'll get back to preconditions
16:56:49 <rakhmerov> ddeja: with those RPC fixes we can start using Kombu RPC more actively
16:56:57 <rakhmerov> I'm planning to do that
16:57:06 <ddeja> great!
16:57:09 <d0ugal> For me, custom actions. I'd like to know enough so that by the PTG it is easier for rbrady/you all to have a discussion about it.
16:57:10 <rbrady> custom actions: I want to complete the initial patch and incorporate mistral-lib as a dependency of mistral in an iterative way to ensure CI doesn't break. then look to expand/add features as necessary to ensure the data required in a custom action is injected into the context at execution.
16:57:20 <rakhmerov> theoretically it should perform better than o.m
16:58:07 <rakhmerov> rbrady, d0ugal: ok, sounds good
16:58:58 <rakhmerov> along with testing multiple engines this is going to be one of my top priorities moving forward too
16:59:18 <rakhmerov> ok, thanks a lot
16:59:29 <rakhmerov> I think it's time to close the meeting
16:59:56 <rakhmerov> thanks for coming, have a great week )
17:00:03 <rakhmerov> bye
17:00:18 <rakhmerov> #endmeeting