#openstack-meeting-alt log

17:59:54 <SergeyLukjanov> #startmeeting sahara
17:59:54 <openstack> Meeting started Thu Jul  9 17:59:54 2015 UTC and is due to finish in 60 minutes.  The chair is SergeyLukjanov. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:59:55 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:59:58 <openstack> The meeting name has been set to 'sahara'
18:00:00 <alazarev> o/
18:00:00 <elmiko> heyo/
18:00:15 <SergeyLukjanov> #link https://wiki.openstack.org/wiki/Meetings/SaharaAgenda
18:00:25 <weiting> Hi
18:00:28 <SergeyLukjanov> let's wait for a few mins
18:00:30 <alazarev> SergeyLukjanov, will you chair?
18:00:32 <SergeyLukjanov> for other folks
18:00:56 <SergeyLukjanov> alazarev, yeah, the PTLs overview will be one hour later ;)
18:01:02 <SergeyLukjanov> #chair alazarev
18:01:03 <openstack> Current chairs: SergeyLukjanov alazarev
18:01:08 <tosky> o/
18:01:14 <SergeyLukjanov> to be sure that someone will end the meeting :)
18:01:27 <alazarev> SergeyLukjanov :)
18:01:43 <egafford> \o
18:01:47 <pino|work> o/
18:01:50 <elmiko> updated the agenda to remove last week's item about spark
18:01:59 <SergeyLukjanov> elmiko, yeah, thx
18:02:20 <SergeyLukjanov> crobertsrh, NikitaKonovalov ping
18:02:24 <SergeyLukjanov> #topic sahara@horizon status (crobertsrh, NikitaKonovalov)
18:02:30 <crobertsrh> howdy
18:02:34 <SergeyLukjanov> #link https://etherpad.openstack.org/p/sahara-reviews-in-horizon
18:02:38 <SergeyLukjanov> hey :)
18:03:03 <crobertsrh> We've had a few changes go through, not a ton of progress
18:03:09 <crobertsrh> The move to contrib patch is up
18:03:30 <crobertsrh> hopefully, once the move is done, we can get patches through a bit quicker (after we rebase them, of course)
18:03:41 <SergeyLukjanov> heh, yeah, hopefully
18:03:53 <SergeyLukjanov> anything else re horizon?
18:04:43 <tosky> crobertsrh: so is the sahara part fully separated, from the source point of view? Interesting
18:05:06 <tosky> it seems that the existing (two) selenium integration tests are still happy
18:05:07 <SergeyLukjanov> tosky, it seems still really mixed
18:05:15 <tosky> oh, I see
18:05:15 <crobertsrh> It's mostly still the same imho
18:05:20 <SergeyLukjanov> crobertsrh, ++
18:05:28 <SergeyLukjanov> but it's logically separated
18:06:21 <crobertsrh> The move to contrib patch is here...
18:06:23 <crobertsrh> #link https://review.openstack.org/#/c/197363/
18:06:35 <SergeyLukjanov> okay, thx
18:06:45 <SergeyLukjanov> #topic News / updates
18:06:52 <sreshetnyak> o/
18:06:57 <egafford> crobertsrh: "blueprint plugin-sanity" is excellent.
18:07:25 <vgridnev> i am working on several bugs and ntp support in plugins
18:07:27 <crobertsrh> how could anyone -1 a bp like that
18:07:34 <elmiko> i'm working on the conversion to use keystone sessions for authentication. also, writing up an abstract for tokyo on using spark with sahara to process streaming logs from openstack services.
18:08:00 <sreshetnyak> no updates from me
18:08:21 <egafford> Working on a few changes in parallel (specs for two stages of manila integration for binary storage, trusts for long-running clusters to permit cleanup.) On the latter, question: is alazarev still on leave?
18:08:27 <esikachev> i am working on the cluster-verification checks
18:08:32 <alazarev> I'm back from parental leave starting from today
18:08:47 <egafford> alazarev: Cool; I have a question for you then.
18:08:54 <egafford> (Will wait for discussion.)
18:09:00 <tosky> working on the small change on scenario tests (configuration files) now the spec is approved, it will be ready soooon
18:09:14 <huichun> working on recurrence schedule edp
18:09:21 <alazarev> egafford, yes, I still remember something about work :)
18:09:24 <SergeyLukjanov> tosky, cool, looking forward to the scenario templates
18:09:32 <huichun> and also with suspend resume edp jobs
18:10:05 <SergeyLukjanov> huichun, great, with this stuff we'll have some kind of additional job lifecycle management
18:10:17 <SergeyLukjanov> #topic Open discussion
18:10:37 <huichun> SergeyLukjanov: Hi Sergey, do we need suspend and resume edp job feature?
18:10:41 <tmckay> so no updates for me, I have been on PTO for a while :)
18:11:01 <NikitaKonovalov> o/ I've been working on HDP 2.2 plugin
18:11:08 <tmckay> time to catch up on reviews
18:11:15 <NikitaKonovalov> mainly on HA and EDP stuff
18:11:16 <SergeyLukjanov> huichun, it's a good q., do you have the use case for it?
18:11:18 <huichun> tmckay:  lots of edp enhancement spec needs your review ^_^
18:11:36 <tmckay> huichun, noted, I'll try hard to review them
18:11:48 <tmckay> long vacation
18:11:51 <alazarev> huichun, I'll join the review too
18:12:14 <egafford> alazarev: I note that in your spec for trusts to enable cluster cleanup, you suggest that trust ids should be stored in memory on the context. However, in a distributed Sahara install, this will mean that only one server will be able to cleanup any one cluster. That means, in turn, that we will always need to run that cleanup job on every node, or we'll need to find a solution for distributing the trust ids among engine nodes
18:12:57 <huichun> SergeyLukjanov: for example, if one job has many steps, the user may want to suspend the job when the first step finishes, check whether the log or data is right, then resume the job
18:13:17 <huichun> alazarev:  thx ^_^
18:13:49 <alazarev> egafford, it was for "create/scale cluster" operation. The whole task is done by one engine now.
18:15:03 <alazarev> egafford, for clean we need to store it in DB, what spec are you referencing?
18:15:40 <egafford> alazarev: (Aggregating links)
18:16:44 <egafford> alazarev: https://bugs.launchpad.net/sahara/+bug/1468722 covers a bug with periodic cluster cleanup (no-trust clusters cannot be cleaned up.)
18:16:44 <openstack> Launchpad bug 1468722 in Sahara "Periodic cleanup of non-final clusters moves the cluster into Error instead of removing it" [High,New] - Assigned to Ethan Gafford (egafford)
18:17:45 <egafford> alazarev: When we discussed this, you pointed me to the spec: http://specs.openstack.org/openstack/sahara-specs/specs/kilo/cluster-creation-with-trust.html
18:18:20 <egafford> The line "Trust is stored in memory only (probably context is the good place to put it). No serialization to DB." is a bit of a problem for the periodic cluster cleanup job.
18:18:42 <alazarev> egafford, I see, this spec is for long operation inside one engine, it will not work "as is" for clean up
18:19:23 <alazarev> egafford, but they both can use the same mechanism (e.g. with trust stored in DB)
18:20:12 <alazarev> egafford, or can use different (e.g. because long operation doesn't need trust in DB), need to think more
18:20:39 <alazarev> egafford, do you have thoughts about it?
18:21:37 <egafford> alazarev: Right; that was my thought. I'll submit a patch to the spec for review, then, and we can talk about it there. I've got an impl nearly complete. Well, I don't actually see a great deal of sec difference between storing a trust for a transient cluster and storing a temporary trust for a long-running cluster. Either could contain extremely sensitive data.
18:22:25 <elmiko> i don't think storing the trust id is necessarily a sec concern
18:22:37 <elmiko> you still need a valid auth token to do anything useful
18:22:57 <egafford> I think if we're okay with one, the other makes sense. I can see an argument the other way (a long-running cluster trust allows a malicious user to create a backdoor for attack for longer,) but. elmiko: That makes sense.
18:23:38 <egafford> Okay, I have enough information to keep working. Thanks alazarev, elmiko.
18:24:55 <elmiko> the other option would be for each server instance to generate a trust with similar permissions
18:25:48 <elmiko> then each instance would have separate permissions to remove the cluster, or whatever operation is needed
18:26:11 <egafford> Sure; just create a trust on demand if you need one but don't have it.
18:26:17 <alazarev> elmiko, what do you mean by "server"?
18:26:29 <elmiko> i meant, -engine server
18:26:58 <elmiko> does that make sense?
18:26:59 <alazarev> to create a trust you need a valid token
18:27:15 <elmiko> yea
18:27:27 <alazarev> you can't create at "clean up" time
18:27:38 <egafford> Right, and we only have a valid token for the tenant plane while we're creating the nodes.
18:28:00 <elmiko> i meant more that each sahara instance could create a trust, then they each would have permissions on the cluster
18:28:18 <elmiko> that way there is no need to share a trust between sahara instances
18:28:42 <SergeyLukjanov> elmiko, what if a new -engine is added after the last op on a cluster?
18:28:44 <elmiko> each could contain a separate trust id in their context
18:29:00 <SergeyLukjanov> (added == reloaded for example)
18:29:04 <elmiko> SergeyLukjanov: good point, that could get tricky coordinating the actions
18:29:17 <elmiko> maybe this is not a good way to approach it
18:29:56 <egafford> elmiko: Yeah, I don't think we have any other fanout-type messaging tricks, and we probably don't want them unless we *really* need them.
18:30:08 <elmiko> right
18:30:30 <elmiko> probably storing to db is the easiest solution
18:30:39 <alazarev> elmiko, api doesn't know how many engines we have, it is not possible to create trusts for all of them
18:30:51 <egafford> elmiko: So it really comes down to "is it safe to store trust_ids in the database?" If the answer is broadly yes, then all is pretty well.
18:31:05 <elmiko> alazarev: ah, interesting. i did not know that
18:31:32 <elmiko> egafford: i think it is, but i can research a little more.
18:31:43 <egafford> alazarev: There are AMQP tricks that could conceivably deal with that (fanout messages) but they're usually troublesome.
18:31:45 <elmiko> iirc we store the proxy domain trusts in the db temporarily
18:32:04 <egafford> alazarev: Haven't looked into oslo_messaging support of that featureset.
18:32:15 <egafford> alazarev: (And I hope not to. :) )
18:32:36 <elmiko> another option would be to use barbican for external secret storage
18:32:52 <elmiko> (if needed)
18:32:57 <egafford> elmiko: That is not a bad idea at all if we need it, yeah.
18:33:21 <alazarev> barbican could be optional only, I think
18:33:25 <elmiko> but i think it's generally safe to store a trust id
18:33:35 <alazarev> we don't want to depend on barbican for now
18:33:45 <elmiko> alazarev: yea, we'd have to use the castellan approach
18:33:46 <egafford> alazarev: +1 optional Barbican.
18:34:42 <elmiko> it would be similar to what we're proposing for secret storage now. use the db as a default with the option to improve to barbican
18:34:59 <egafford> elmiko: Right; we could piggyback on the same interface.
18:35:09 <elmiko> yea
18:35:41 <elmiko> speaking of which, #link https://review.openstack.org/#/c/179393/ :cough:
18:35:55 <elmiko> =)
18:36:22 <alazarev> elmiko, will review
18:36:26 <elmiko> thanks
18:36:41 <tmckay> elmiko, you should take something for that cough ;-)
18:36:57 <elmiko> the only cure now is more +1/+2 ;)
18:37:06 <egafford> Okay, so it sounds like the sensible plan is: 1) I propose the spec change to allow trust storage for long-running clusters, 2) elmiko researches whether trust_ids can be stored in the DB, 3) someone writes up a spec to alter our schema to store all trust ids (transient and long-running) through the improved secret storage module, 4) profit.
18:37:21 <egafford> Usually profit wants to be 3, but sometimes it takes a while.
18:37:28 <elmiko> egafford: sounds good to me
18:37:43 <egafford> elmiko: Cool. Thanks again.
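
[Editor's sketch, not part of the meeting] The plan discussed above (keep trust ids in the shared DB by default, with an optional external secret store behind the same interface) could look roughly like the following. This is a minimal sketch under stated assumptions, not Sahara's implementation: `TrustStore`, `DbTrustStore`, `ExternalTrustStore`, and `cleanup_cluster` are hypothetical names, and plain dicts stand in for the database table and for an external secret service such as Barbican. No real Keystone, Barbican, or Castellan calls are used.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Dict, Optional


class TrustStore(ABC):
    """Hypothetical interface for persisting one trust id per cluster."""

    @abstractmethod
    def put(self, cluster_id: str, trust_id: str) -> None: ...

    @abstractmethod
    def get(self, cluster_id: str) -> Optional[str]: ...

    @abstractmethod
    def delete(self, cluster_id: str) -> None: ...


class DbTrustStore(TrustStore):
    """Default backend: the trust id lives in the shared table (a dict here)."""

    def __init__(self, table: Dict[str, str]) -> None:
        self._table = table

    def put(self, cluster_id: str, trust_id: str) -> None:
        self._table[cluster_id] = trust_id

    def get(self, cluster_id: str) -> Optional[str]:
        return self._table.get(cluster_id)

    def delete(self, cluster_id: str) -> None:
        self._table.pop(cluster_id, None)


class ExternalTrustStore(TrustStore):
    """Optional backend: the table keeps only an opaque reference, and the
    trust id itself lives in an external store (a second dict here)."""

    def __init__(self, table: Dict[str, str], vault: Dict[str, str]) -> None:
        self._table = table
        self._vault = vault

    def put(self, cluster_id: str, trust_id: str) -> None:
        ref = uuid.uuid4().hex
        self._vault[ref] = trust_id
        self._table[cluster_id] = ref

    def get(self, cluster_id: str) -> Optional[str]:
        ref = self._table.get(cluster_id)
        return self._vault.get(ref) if ref is not None else None

    def delete(self, cluster_id: str) -> None:
        ref = self._table.pop(cluster_id, None)
        if ref is not None:
            self._vault.pop(ref, None)


def cleanup_cluster(store: TrustStore, cluster_id: str) -> bool:
    """Return True if this engine found a trust and could act on the cluster."""
    trust_id = store.get(cluster_id)
    if trust_id is None:
        return False
    # A real engine would authenticate with the trust and remove the cluster here.
    store.delete(cluster_id)
    return True
```

Because the trust id sits in shared storage rather than in one engine's in-memory context, any engine (including one started after the cluster was created) can run the periodic cleanup, which is the limitation raised about the in-memory approach.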
18:40:01 <SergeyLukjanov> anything else to chat today?
18:40:07 <alazarev> egafford, sounds good
18:40:18 <egafford> alazarev: Excellent.
18:40:57 <pino|work> SergeyLukjanov: a look at pending reviews? ;)
18:41:16 <huichun> egafford: hi Ethan, recurrence edp spec has been updated according to your last comment, waiting for your review ^_^
18:41:47 <SergeyLukjanov> pino|work, in my backlog, should actively review tomorrow morning (canceled mostly all meetings :) )
18:41:53 <egafford> huichun: I saw; I thought about reviewing, but then thought "you know, other people really need to review these too; they're incredibly important."
18:42:38 <egafford> huichun: It sounds like some other folks (tmckay, alazarev) have signed up to review as well; I'll come back to it once it gets some additional eyes (which I hope it does soon; it's incredibly important. :) )
18:43:04 <pino|work> SergeyLukjanov: thanks!
18:43:06 <tmckay> yes, very interested, but I was away!
18:43:08 <huichun> nice^_^
18:43:28 <egafford> tmckay: No judgment at all.
18:43:29 <elmiko> keystone session spec could use more eyes as well ;)
18:43:36 * tmckay makes todo list in email, marks urgent
18:48:40 <huichun> tmckay: oh, one more thing, do we need to update the current Oozie client call from v1 to v2?
18:48:59 <tmckay> huichun, hmmm, I am unaware of the implications
18:49:50 <alazarev> huichun, why not? I think it should be pretty easy, right?
18:50:01 <huichun> yes,easy
18:50:57 <huichun> and there are new features that can be added to the Sahara oozie engine with v2
18:51:12 <tmckay> huichun, okay, sounds good to me
18:51:28 <huichun> alazarev: i will write a spec to do this
18:51:38 <tmckay> huichun, maybe we don't need a blueprint, but a wishlist bug outlining what the new benefits/features are
18:51:45 <tmckay> or a spec :)
18:52:04 <alazarev> +1 on spec, this is not a bug
18:52:15 <tmckay> that's what wishlist is for :)
18:52:24 <huichun> tmckay:  agree
18:52:32 * SergeyLukjanov disappearing for the ptl overview recording
18:52:42 <SergeyLukjanov> alazarev, don't forget to end the meeting please
18:52:53 <alazarev> SergeyLukjanov, ok
18:53:07 <alazarev> do we have anything more to discuss?
18:53:27 <huichun> alazarev:  waiting for your review comments on recurrence edp and suspend resume edp ^_^
18:54:03 <huichun> so that I can start up with coding work ^_^
18:54:34 <tmckay> huichun, heh
18:54:59 * tmckay sometimes starts coding before spec is done, ssshhh
18:55:34 * egafford is both shocked and appalled at tmckay, and almost always starts coding before spec is done.
18:55:44 <huichun> ^_^
18:56:02 <tmckay> iterative refinement, it's good
18:56:15 <huichun> nice
18:56:45 <egafford> In huichun's case, admittedly, there are more ways he can go than on some specs. Having community signoff first is nice.
18:57:07 <elmiko> yea, this recurrence spec could be implemented a couple different ways
18:57:50 <tmckay> like, set versus list?
18:58:04 <tmckay> or you mean at a higher level? ;-)
18:58:16 <elmiko> yea, higher level is definitely a possibility
18:58:17 <huichun> yes, so either way may mean rewriting code, not just refinement ^_^
18:58:31 <elmiko> i guess we'll discuss it on the review though
18:58:37 <alazarev> ok, it looks like we are done for today
18:58:43 <alazarev> #endmeeting