16:00:50 <rakhmerov> #startmeeting Mistral
16:00:51 <openstack> Meeting started Mon Jan  9 16:00:50 2017 UTC and is due to finish in 60 minutes.  The chair is rakhmerov. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:52 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:55 <openstack> The meeting name has been set to 'mistral'
16:01:27 <rakhmerov> hi
16:01:32 <sharatss> hi
16:01:37 <d0ugal> Hello!
16:01:44 <rakhmerov> hello-hello )
16:01:48 <ddeja> o/
16:02:25 <rakhmerov> ok, let's begin
16:02:57 <rakhmerov> I guess we haven't had a meeting for at least a month
16:03:46 <rakhmerov> so, essentially I would like to touch base with you a little bit after the long holidays
16:04:06 <rakhmerov> we didn't communicate for about two weeks
16:04:52 <rakhmerov> #topic General syncup after long holidays
16:04:58 <thrash> o/
16:05:02 <rakhmerov> hi
16:05:20 <rakhmerov> so, any news that you'd like to share?
16:05:25 <rakhmerov> anything?
16:05:30 <rakhmerov> d0ugal, ddeja, sharatss
16:05:42 <ddeja> I was focused on gate fixing
16:05:51 <d0ugal> Nothing much from me, with the holidays it has been very quiet.
16:05:51 <rakhmerov> ok
16:06:03 <sharatss> nothing from me too
16:06:09 <d0ugal> I do hope to spend some time with rbrady looking at the custom actions work
16:06:09 <rakhmerov> ddeja: how is it going with the gate?
16:06:11 <sharatss> became active only today
16:06:12 <ddeja> and for the last 3 weeks I'd like to focus on adding multi-thread support to the kombu driver
16:06:26 <d0ugal> He put a patch up that needs reviews: https://review.openstack.org/#/c/411412/
16:06:27 <ddeja> rakhmerov: apart from the sshd_proxied action, it is OK now
16:06:37 <ddeja> oh, yes, thanks d0ugal
16:06:40 <rakhmerov> ddeja: yes, it is important
16:07:18 <rbrady> hello
16:07:21 <d0ugal> rbrady: we are just doing a quick catchup and I mentioned that I plan to help you with custom actions
16:07:25 <rakhmerov> ddeja: ok, then we can probably disable this test for now and make the gate voting?
16:07:25 <d0ugal> rbrady: and I linked to your review.
16:07:26 <ddeja> d0ugal: oh, I was thinking it is a link to another patch...
16:07:28 <rakhmerov> rbrady: hi
16:07:37 <rbrady> thanks d0ugal
16:07:41 <ddeja> rakhmerov: the regular one - yes
16:07:44 <d0ugal> ddeja: hah, I am curious which one :)
16:08:01 <rakhmerov> ddeja: I'm just afraid that if we make it voting it may give us huge troubles once in a while
16:08:17 <rakhmerov> if we are ready to take this risk then it's ok
16:08:19 <ddeja> rakhmerov: the kombu one, I'd like to wait until the end of this development cycle
16:08:27 <ddeja> rakhmerov: Hm
16:08:32 <rakhmerov> yes, that's understandable
16:08:35 <ddeja> I think maybe we can wait till Pike?
16:08:37 <d0ugal> rakhmerov: the risk isn't that big, we can always make it non-voting :)
16:08:38 <ddeja> with both
16:08:51 <rakhmerov> yes
16:08:56 <rakhmerov> I would go this path
16:09:08 <ddeja> and start new cycle with new gate
16:09:19 <rakhmerov> d0ugal: it's a pretty big patch
16:09:21 <ddeja> maybe I would also be able to fix the sshd_proxied test by then
16:09:30 <d0ugal> rakhmerov: Which one?
16:09:33 <rakhmerov> #action rakhmerov: review https://review.openstack.org/#/c/411412/
16:09:42 <rakhmerov> d0ugal: https://review.openstack.org/#/c/411412/
16:09:50 <rakhmerov> :)
16:09:51 <d0ugal> rakhmerov: yeah, that one. I think much of it is copied from Mistral but rbrady can explain it more.
16:09:58 <d0ugal> I have only reviewed it
16:10:39 <rakhmerov> ok
16:10:56 <rbrady> that patch is WIP, but more feedback and discussion would probably be good.  I started splitting some files up that may or may not be a good thing
16:11:04 <rakhmerov> so I guess it's pointless now to open our regular topic "Current status"
16:11:12 <rakhmerov> it's pretty much clear who is doing what
16:11:25 <rakhmerov> rbrady: ok
16:11:57 <rakhmerov> I'd like to take a look at it first but if you can give some general tips that could help review it please do
16:12:30 <rakhmerov> ddeja: btw, Dawid, there's another 'rerun' test that fails sometimes
16:12:36 <rakhmerov> for direct workflow
16:12:39 <ddeja> rakhmerov: Oh, ok
16:12:42 <rakhmerov> yep
16:12:42 <ddeja> I'll take a look
16:12:50 <rbrady> the namespaces have been changed a bit from what was laid out in the spec.  following more of a pythonic approach.
16:12:50 <rakhmerov> I saw it failing pretty often
16:13:09 <rakhmerov> rbrady: ok
16:13:25 <rakhmerov> btw, guys, do you already know if you're going to the PTG?
16:13:36 <rbrady> I am...booking hotel and flight today
16:14:00 * d0ugal isn't going
16:14:04 <rakhmerov> rbrady: planning to join our sessions? At least partially
16:14:18 <rbrady> yes
16:14:27 <rakhmerov> d0ugal: I see, it's bad :(
16:14:31 <rakhmerov> rbrady: ok
16:14:38 * ddeja doesn't know yet...
16:14:54 <rakhmerov> ddeja: it would be cool if you could do that
16:15:17 <d0ugal> rakhmerov: yeah, it is a shame. I shall try and add my input remotely if I can :)
16:15:41 <rbrady> d0ugal: maybe we can have you there via hangouts or something
16:16:02 <rakhmerov> as far as this actions stuff, it's definitely a topic for the PTG, I want to define a goal to figure out all principal things related to it
16:16:03 <d0ugal> rbrady: hah, that could be cool - or I can just read the notes and add questions for you all to answer ;)
16:16:12 * rbrady carries around virtual d0ugal at PTG
16:16:25 <rakhmerov> :)
16:17:03 <ddeja> http://www.conowego.pl/uploads/pics/DoubleRobot-660x440.jpg
16:17:20 <d0ugal> I think somebody used one of those at a previous summit...
16:17:28 <d0ugal> but now we are just getting off topic :-D
16:17:35 <rakhmerov> yeah, I saw those folks :)
16:17:45 <rakhmerov> it's pretty cool
16:18:21 <rakhmerov> in case you didn't see it in ML, this is an etherpad for the PTG plans: https://etherpad.openstack.org/p/mistral-ptg-pike
16:19:11 <rakhmerov> let's focus on gathering all challenges that we need to discuss in this etherpad
16:19:28 <ddeja> OK
16:20:34 <rakhmerov> ddeja: btw, could you please remind me of something? Those periodic kombu gate failures are related exactly to the fact that our kombu RPC server is not thread-safe, right?
16:20:39 <rakhmerov> or you are not sure?
16:20:56 <rakhmerov> I remember we were discussing it but don't remember the conclusion
16:21:26 <ddeja> rakhmerov: yes, they are related to the fact that it is not thread safe - and here is the fix https://review.openstack.org/#/c/414533
16:21:57 <ddeja> (another gate failure is due to the sshd_proxied action...)
16:22:11 <rakhmerov> ooh, awesome
16:22:22 <rakhmerov> so do you think it's finished? Ready to be reviewed?
16:22:31 <ddeja> I think so
16:22:34 * rakhmerov needs to review so much..
16:22:47 <ddeja> I spent 2 days looking into the o.m (oslo.messaging) code
16:22:53 <ddeja> and made mine similar
16:22:59 <rakhmerov> :))
16:23:01 <rakhmerov> I see
16:23:04 <rakhmerov> great
16:23:23 <tuan__> hi guys
16:23:25 <rakhmerov> d0ugal: how about your time thing?
16:23:32 <tuan__> hope not to disturb you
16:23:39 <rakhmerov> do you think you found the solution?
16:23:50 <rakhmerov> tuan__: chime in, np
16:23:56 <tuan__> :)
16:24:06 <d0ugal> rakhmerov: ddeja found an issue with it, I understand it now I think - so I just need to update it again.
16:24:10 <tuan__> okes, can i say something about our problem
16:24:22 <rakhmerov> d0ugal: ok
16:24:31 <rakhmerov> tuan__: sure, go ahead
16:24:35 <tuan__> cool
16:24:42 <tuan__> so, like this:
16:24:50 <ddeja> d0ugal, rakhmerov: yes, IMO the old code was OK, just the tests need to be rewritten, but I may be wrong
16:24:59 <tuan__> when we have very sophisticated actions
16:25:20 <rakhmerov> ddeja: is there a patch for this already or you only found a reason?
16:25:34 <d0ugal> ddeja: I think the old code is okay, but I think it is confusing. I don't think we should be passing around local datetimes anywhere.
16:25:35 <tuan__> if they fail, mistral will return a huge error with the input parameters and the root cause of the action
16:25:36 <rakhmerov> tuan__: ok, like what?
16:26:06 <tuan__> the problem is that with the very long error returned by mistral, it is hard for the operator to understand the root cause
16:26:17 <tuan__> it is based on an ETSI requirement in telco
16:26:33 <d0ugal> I've noticed this before. Tracking down where the problem is can be tricky
16:26:34 <ddeja> rakhmerov: there is a patch from d0ugal, it fixes it on his env (Scotland, so UTC+0), but it breaks things on mine (Poland, so UTC+1)
16:26:42 <tuan__> stack trace must be human-readable
16:27:01 <rakhmerov> ddeja: :)
16:27:03 <rakhmerov> ok
16:27:09 <d0ugal> ddeja: the real question is why does it behave differently on my machine and CI :)
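The UTC+0 vs UTC+1 difference above is the classic symptom of naive (local) datetimes, which d0ugal says should not be passed around. A minimal Python sketch of that convention, keeping every datetime timezone-aware in UTC; `to_utc` is a hypothetical helper for illustration, not code from d0ugal's patch.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt: datetime) -> datetime:
    """Return dt as a timezone-aware UTC datetime (hypothetical helper).

    Naive datetimes are rejected instead of being silently interpreted in
    the host's local zone, which is what lets results differ between a
    UTC+0 machine and a UTC+1 machine.
    """
    if dt.tzinfo is None:
        raise ValueError("naive datetime; attach a timezone first")
    return dt.astimezone(timezone.utc)


# Prefer an aware "now" over datetime.now(), which returns naive local time.
now_utc = datetime.now(timezone.utc)

# The same instant written in UTC+1 (Poland, winter) and in UTC (Scotland, winter).
poland = datetime(2017, 1, 9, 17, 0, tzinfo=timezone(timedelta(hours=1)))
scotland = datetime(2017, 1, 9, 16, 0, tzinfo=timezone.utc)

print(to_utc(poland) == scotland)  # True
```

With this convention, tests compare instants rather than wall-clock values, so they should behave the same on a developer machine and on CI regardless of the host zone.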
16:27:09 <tuan__> e.g.: mistral will return something like this: Action failed....blabla
16:27:21 <tuan__> and then after that the root cause of the action
16:27:31 <ddeja> rakhmerov, tuan__ I've also noticed it, sometimes even errors can be very misleading
16:27:44 <ddeja> and lead to another action by mistake
16:27:50 <rakhmerov> tuan__: quick question
16:27:56 <ddeja> I guess it's a perfect topic for PTG
16:27:56 <tuan__> go ahead
16:28:06 <tuan__> rakhmerov: i am listening
16:28:22 <tuan__> ddeja: That is what we have to deal with
16:28:28 <rakhmerov> ddeja: yes, moreover, we already discussed this with some folks from Nokia and others
16:28:29 <tuan__> to make it human-readable
16:28:49 <rakhmerov> tuan__: can you give a very specific example?
16:28:58 <tuan__> with very sophisticated actions, the returned error is quite long
16:28:58 <rakhmerov> what action, what kind of failure?
16:29:03 <rakhmerov> and what Mistral returns
16:29:17 <tuan__> well, it is a custom action
16:29:17 <rakhmerov> sophisticated like what?
16:29:30 <rakhmerov> yep, just a second..
16:29:43 <tuan__> i am so sorry that i could not tell more because of our policy
16:29:57 <rakhmerov> the reason I'm asking is that actions are allowed to return a structured result even in case of a failure
16:30:31 <tuan__> you can imagine that we want mistral to run an action that deploys hundreds of VMs for a specific VNF
16:30:47 <tuan__> therefore the input parameters passed to mistral are quite long
16:30:53 <rakhmerov> tuan__: ok, np, I understand. Then I'd like to ask you to come up with an example which is to some extent similar to your real one
16:30:55 <tuan__> when this action fails
16:31:12 <rakhmerov> tuan__: not necessary now (we don't have too much time)
16:31:21 <tuan__> mistral returns an error like: Failed action ....
16:31:43 <tuan__> then at the end of the error is the root cause, something like: Javascript fails
16:31:45 <d0ugal> I guess we need a bug report and a way to reproduce it
16:32:08 <rakhmerov> d0ugal: +1, that's what I'd like to achieve
16:32:10 <tuan__> but the operator has difficulty finding the root cause, i.e. the line: Javascript fails
16:32:40 <tuan__> IMHO it is not a bug, it is just the hierarchy of the error
16:32:50 <rakhmerov> tuan__: yes, I see. Can you please file a ticket in our Launchpad?
16:32:52 <tuan__> or the return information from mistral
16:33:03 <rakhmerov> with as much info as possible
16:33:07 <tuan__> OK, i will try to do it
16:33:11 <tuan__> thanks renat
16:33:11 <rakhmerov> yeah
16:33:46 <rakhmerov> please try to be very specific, like, for example, you say "Mistral returns.."
16:33:59 <rakhmerov> what do you mean by that? Returns where?
16:34:25 <rakhmerov> are you talking about logs or fields of some objects stored in DB
16:34:27 <rakhmerov> etc
16:35:24 <rakhmerov> the point I'm trying to make is that we might be able to solve this problem (at least partially) by writing actions themselves in a certain way
16:35:46 <rakhmerov> so that they return something more or less readable
16:35:53 <rakhmerov> even in case of an error
16:36:04 <rakhmerov> there's a protocol for that
16:36:15 <rakhmerov> maybe it's not documented well, then it's a different issue
16:36:29 <rakhmerov> that's why I'd like to see some example
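rakhmerov's point is that an action can report a failure as a structured result instead of a raw exception, so operators see a short root cause rather than the whole input echoed back. The plain-Python sketch below only illustrates that idea; `ActionResult`, `run_steps`, the error keys and the example steps are hypothetical names chosen for illustration, not Mistral's actual classes or its documented result protocol.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class ActionResult:
    """Hypothetical stand-in for an action result holding data or an error."""
    data: Any = None
    error: Optional[dict] = None

    def is_error(self) -> bool:
        return self.error is not None


def run_steps(steps: List[Tuple[str, Callable[[], Any]]]) -> ActionResult:
    """Hypothetical custom action: run named steps, stop at the first failure.

    Instead of letting the exception propagate (and the full input be
    echoed back), return a short error naming the failed step and its cause.
    """
    outputs = {}
    for name, step in steps:
        try:
            outputs[name] = step()
        except Exception as exc:
            return ActionResult(error={
                "failed_step": name,
                "root_cause": f"{type(exc).__name__}: {exc}",
                "completed_steps": list(outputs),
            })
    return ActionResult(data=outputs)


def create_network():
    return "net-1"


def boot_vms():
    raise RuntimeError("quota exceeded for instances")


result = run_steps([("create_network", create_network), ("boot_vms", boot_vms)])
print(result.error)
# {'failed_step': 'boot_vms', 'root_cause': 'RuntimeError: quota exceeded for instances', 'completed_steps': ['create_network']}
```

The design choice is to put the root cause in a small, fixed-shape error payload that a workflow or an operator can read directly, which is the kind of "more or less readable" result rakhmerov describes.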
16:37:03 <rakhmerov> tuan__: once you file a ticket please just join our IRC channel any time and we will discuss it
16:37:34 <tuan__> sure
16:37:40 <rakhmerov> ok
16:37:43 <tuan__> otherwise
16:38:01 <tuan__> https://bugs.launchpad.net/mistral/+bug/1624284
16:38:01 <openstack> Launchpad bug 1624284 in Mistral "MessagingTimeout when executing mistral actions" [Critical,In progress] - Assigned to Dawid Deja (dawid-deja-0)
16:38:13 <rakhmerov> yep
16:38:14 <tuan__> why is this bug only partially fixed
16:38:17 <tuan__> ?
16:38:20 <ddeja> tuan__: Mostly fixed
16:38:30 <tuan__> i also think that
16:38:30 <ddeja> there is one corner case that is not fixed yet
16:38:46 <rakhmerov> yes, it's 99% fixed, just one very crazy corner case is left
16:39:03 <rakhmerov> :)
16:39:09 <ddeja> but unless you do something like: 'mistral run-action `mistral_run_action`' it works fine
16:39:20 <d0ugal> lol
16:39:29 <rakhmerov> ddeja: maybe it makes sense to close this bug and open another one which is more specific?
16:39:37 <rakhmerov> describing that specific corner case?
16:39:48 <ddeja> in other words: the only thing that is not fixed is when you use mistral run-action to start another action or workflow
16:39:53 <ddeja> rakhmerov: OK, I'll do it tomorrow
16:40:48 <rakhmerov> #action ddeja: close https://bugs.launchpad.net/mistral/+bug/1624284 and file a new more specific bug with the description of a corner case which is not yet fixed
16:40:48 <openstack> Launchpad bug 1624284 in Mistral "MessagingTimeout when executing mistral actions" [Critical,In progress] - Assigned to Dawid Deja (dawid-deja-0)
16:41:13 <rakhmerov> tuan__: did you come across something similar?
16:41:21 <ddeja> It wouldn't take long - I guess my comment would make a good bug report ;)
16:41:48 <tuan__> rakhmerov: you mean the problem i have reported
16:41:48 <tuan__> ?
16:41:52 <rakhmerov> yes
16:42:07 <rakhmerov> this timeout bug
16:42:27 <tuan__> ahha, the bug of ddeja
16:42:29 <tuan__> okes
16:42:37 <rakhmerov> yep
16:42:52 <rakhmerov> I mean, I'm curious why you're asking about it
16:42:52 <tuan__> well, we have something related to timeouts but I'm not sure it is caused by this bug
16:43:03 <tuan__> let me try to describe it
16:43:09 <rakhmerov> yes
16:44:41 <rakhmerov> meanwhile, I'd like to share some info
16:45:17 <rakhmerov> just FYI: it seems I was able to make the necessary changes so that we could run multiple Mistral engines safely
16:45:26 <ddeja> \o/
16:45:33 <rakhmerov> there's still a bunch of testing ahead, for sure
16:45:56 <d0ugal> Nice
16:46:00 <rakhmerov> but I did a lot already on my local env and found no issues yet
16:46:03 <rakhmerov> yeah
16:47:05 <rakhmerov> my further plan regarding this is to create a gate where we could start multiple Mistral engines and run our Rally scenarios against Mistral
16:47:19 <rakhmerov> theoretically it should not be that hard
16:47:42 <ddeja> but then theory would meet her sister - actual work ;)
16:47:52 <rakhmerov> btw, I'm wondering if it's possible to create a gate that runs multiple VMs?
16:48:03 <ddeja> rakhmerov: I think so
16:48:04 <rakhmerov> ddeja: yes :)
16:48:13 <ddeja> Nova does so for testing upgrades
16:48:19 <rakhmerov> ddeja: do you know any examples?
16:48:23 <rakhmerov> hah...
16:48:26 <rakhmerov> interesting
16:48:28 <rakhmerov> ok
16:48:30 <ddeja> more specifically
16:48:51 <ddeja> they have a gate for testing whether Live Migration of VMs is possible between computes running different versions
16:49:17 <ddeja> so there must be at least 2 VMs
16:49:18 <rakhmerov> #info Nova uses multiple VMs on their gate to test Live Migration
16:49:28 <rakhmerov> I see
16:49:32 <rakhmerov> cool
16:50:00 <rakhmerov> yeah, ideally I'd like to be able to use more than one VM with Mistral components
16:50:44 <rakhmerov> tuan__: would you like to describe your issue now or separately?
16:50:52 <rakhmerov> you can also just file a bug
16:51:03 <rakhmerov> anything works
16:51:13 <tuan__> rakhmerov: I am trying to find some logs for that
16:51:21 <rakhmerov> ok
16:51:24 <tuan__> it is just the report back to us
16:51:35 <tuan__> we did not have enough info
16:51:36 <rakhmerov> we have about 8 mins
16:51:44 <tuan__> oh yeah, im sorry
16:51:54 <tuan__> then i think i will send it later
16:52:13 <rakhmerov> that's ok, that's why I said that you can come to our channel and discuss it separately
16:52:17 <rakhmerov> yeah
16:52:19 <rakhmerov> ok
16:52:55 <rakhmerov> if it's something related to stability (timeouts, locks etc.) I'd love to know more about it
16:54:32 <rakhmerov> ddeja, d0ugal, rbrady, sharatss: guys, can you please mention again (in the conclusion of the meeting) what you're planning to do next?
16:54:47 <rakhmerov> what bothers you, your priorities etc.
16:55:12 <rakhmerov> or something that you'd like to be working on but can't for some reason
16:55:41 <ddeja> My plans: 1. Finish gate fixing 2. Focus on kombu driver multi-threading
16:55:51 <rakhmerov> ok
16:55:53 <rakhmerov> sounds good
16:56:06 <ddeja> and if I have any time before the O-3, I'll get back to preconditions
16:56:49 <rakhmerov> ddeja: with those RPC fixes we can start using Kombu RPC more actively
16:56:57 <rakhmerov> I'm planning to do that
16:57:06 <ddeja> great!
16:57:09 <d0ugal> For me, custom actions. I'd like to know enough so that by the PTG it is easier for rbrady/you all to have a discussion about it.
16:57:10 <rbrady> custom actions: I want to complete the initial patch and incorporate mistral-lib as a dependency to mistral in an iterative way to ensure CI doesn't break.  then look to expand/add features as necessary to ensure the data required in a custom action is injected into the context at execution.
16:57:20 <rakhmerov> theoretically it should perform better than o.m
16:58:07 <rakhmerov> rbrady, d0ugal: ok, sounds good
16:58:58 <rakhmerov> along with testing multiple engines this is going to be one of my top priorities moving forward too
16:59:18 <rakhmerov> ok, thanks a lot
16:59:29 <rakhmerov> I think it's time to close the meeting
16:59:56 <rakhmerov> thanks for coming, have a great week )
17:00:03 <rakhmerov> bye
17:00:18 <rakhmerov> #endmeeting