18:00:30 <SergeyLukjanov> #startmeeting sahara
18:00:31 <openstack> Meeting started Thu Feb 18 18:00:30 2016 UTC and is due to finish in 60 minutes. The chair is SergeyLukjanov. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:34 <openstack> The meeting name has been set to 'sahara'
18:00:35 <huichun> 😀
18:00:36 <crobertsrh> hello/
18:00:38 <NikitaKonovalov> o/
18:00:45 <elmiko> heyo/
18:00:47 <esikachev> hi!
18:00:52 <vgridnev> hi
18:00:55 <mionkin> hello :)
18:01:00 <SergeyLukjanov> #link https://wiki.openstack.org/wiki/Meetings/SaharaAgenda
18:01:08 <SergeyLukjanov> #topic News / updates
18:01:16 <SergeyLukjanov> client release is on review
18:01:22 <apavlov> hi
18:01:59 <SergeyLukjanov> https://review.openstack.org/#/c/280893/
18:02:13 <vgridnev> SergeyLukjanov, are we going to have an additional one for health checks?
18:02:17 <crobertsrh> UI rework is still up for review. As vgridnev pointed out this morning, the integration tests are broken in those patches. I may need a bit of help to sort out what tweaks are required.
18:02:32 <SergeyLukjanov> vgridnev yup
18:02:39 <elmiko> i've been adding more to the api v2 work, and also looking into a few more security issues
18:02:55 <elmiko> speaking of which, SergeyLukjanov did you ever see that security bug i logged?
18:03:08 <vgridnev> SergeyLukjanov, great
18:03:18 <SergeyLukjanov> elmiko not yet :) could you please re-send the link?
18:03:25 <elmiko> yup, 1sec
18:03:52 <elmiko> SergeyLukjanov: https://bugs.launchpad.net/sahara/+bug/1541122
18:03:53 <openstack> elmiko: Error: malone bug 1541122 not found
18:04:26 <SergeyLukjanov> elmiko sad
18:04:42 <elmiko> SergeyLukjanov: what's sad is that i have a few more of these to post :/
18:04:51 <SergeyLukjanov> yup
18:05:10 <elmiko> but, we need to fix em =)
18:05:12 <SergeyLukjanov> I think we should fix it as part of adding castellan-based secure store to sahara
18:05:28 <SergeyLukjanov> and ensure that all of this stuff is configurable through UI
18:05:31 <SergeyLukjanov> API
18:05:41 <SergeyLukjanov> to make users able to specify some password
18:05:49 <elmiko> interesting thought
18:06:06 <elmiko> and i agree, it will help operators to have more access to these values
18:06:47 <elmiko> but, to begin with, we can fix up their hardcoded usage
18:06:58 <SergeyLukjanov> agree
18:08:40 <SergeyLukjanov> any other news?
18:08:48 <SergeyLukjanov> #topic API v2 progress
18:08:54 <SergeyLukjanov> elmiko your turn :)
18:08:59 <SergeyLukjanov> #link https://review.openstack.org/#/c/273316/
18:09:03 <SergeyLukjanov> #link https://wiki.openstack.org/wiki/Sahara/api-v2
18:09:09 <elmiko> thanks
18:09:15 <elmiko> we need more reviews on the initial commit for v2
18:09:31 <elmiko> also, i am adding more items to the wiki
18:09:43 <elmiko> i have some local patches that depend on the initial commit, should i start posting these?
18:09:55 <elmiko> i thought maybe i should wait until the first merges
18:10:33 <elmiko> if there are no opinions, i'm just going to start posting them =D
18:11:23 <SergeyLukjanov> yeah, probably wait for the first merges
18:11:27 <SergeyLukjanov> to avoid rebase hell
18:11:35 <elmiko> yup, that was my thought too
18:11:49 <elmiko> so, everyone go review https://review.openstack.org/#/c/273316/ !!!
18:11:54 <elmiko> please =D
18:12:05 <elmiko> that's all from me
18:12:19 <SergeyLukjanov> elmiko thx!
18:12:27 <SergeyLukjanov> #topic Open discussion
18:12:52 <rickflare> good afternoon everyone
18:13:25 <rickflare> I wanted to simply start out by thanking everyone who has continued to provide support for me!
18:13:33 <vgridnev> Also I would appreciate it if someone would review the health checks stuff https://review.openstack.org/#/q/status:open++branch:master+topic:bp/cluster-verification
18:13:39 <huichun> vgridnev thx for your helpful review comments about is_engine_implement fun, never noticed before, really wanted
18:13:54 <huichun> I will update suspend EDP patch
18:13:56 <vgridnev> huichun, np
18:14:09 <vgridnev> ok, thanks!
18:14:17 <rickflare> I have had several items that I feel could be very helpful to this project
18:14:17 <crobertsrh> vgridnev: I will give them a review today or tomorrow at the latest.
18:15:05 <huichun> SergeyLukjanov: I have an idea to replace the current Oozie engine with Luigi
18:15:23 <huichun> Guys, what do you think about Oozie?
18:16:08 <huichun> https://github.com/spotify/luigi
18:16:48 <huichun> Oozie needs tomcat, writing XML, and extra jar files for running jobs
18:17:11 <elmiko> huichun: i have not used luigi, but i don't see why we wouldn't consider a proposal to add it as well as the OozieEngine
18:17:27 <elmiko> oh, you want to replace with luigi?
18:17:41 <crobertsrh> I have not experienced Luigi either
18:18:06 <NikitaKonovalov> need to have a look at it
18:18:10 <elmiko> seems nice, and it's python 1
18:18:12 <elmiko> er +1
18:18:23 <huichun> elmiko: you mean add Luigi as an EDP engine like Oozie?
18:18:28 <huichun> Luigi is Python
18:18:40 <elmiko> huichun: that's what i thought, but i missed that you said "replace" oozie
18:18:52 <NikitaKonovalov> huichun: which Job types are supported?
18:19:00 <huichun> Yes, my first thought is replacing
18:20:35 <elmiko> i don't have a strong opinion either way, but i can see a few courses of action:
18:20:37 <huichun> NikitaKonovalov: Luigi supports all that Oozie can do: batch jobs and workflow management with dependency resolution
18:20:45 <elmiko> 1. write a spec, so we can debate on review
18:21:00 <elmiko> 2. make an option to allow either oozie or luigi, so we can keep backwards compat
18:21:10 <elmiko> 3. plan a future migration away from oozie, if we desire
18:21:32 <elmiko> does that sound reasonable?
18:21:46 <vgridnev> agreed with elmiko
18:21:46 <crobertsrh> +1 elmiko: fine plan
18:22:05 <huichun> elmiko: yes, it's my original idea currently
18:22:12 <elmiko> huichun: great!
18:23:07 <huichun> elmiko: just working on the EDP engine parts in Sahara, and did lots of research on other workflow engines and had this idea
18:23:26 <elmiko> cool, thanks for bringing it up =)
18:24:07 <elmiko> huichun: i think it's great if we can keep up to date with new technologies that might improve our experience.
18:24:07 <SergeyLukjanov> I've not seen luigi either
18:24:10 * tosky reads Luigi
18:24:16 <tosky> what is this thing called like me?
18:24:22 <elmiko> tosky: https://github.com/spotify/luigi
18:24:28 <elmiko> hehe
18:24:30 <tosky> oh, thanks, I missed that
18:24:37 <elmiko> np
18:24:38 <crobertsrh> tosky, we plan to just have you launch jobs for all of our users. We figured you have the spare time.
18:24:45 <elmiko> yup, what crobertsrh said
18:24:59 <elmiko> just tosky operating on 1000s of nodes, manually running jobs
18:25:09 <tosky> '-_-
18:25:14 <elmiko> LOL
18:25:15 <rickflare> so guys i have stood up a few clusters
18:25:21 <rickflare> close to 100 or so nodes
18:25:22 <huichun> lol
18:25:34 <rickflare> and one of the biggest issues I see is cluster persistence
18:26:12 <rickflare> when running hadoop it is not uncommon to have data nodes go down for a number of possible reasons
18:26:38 <rickflare> the horizon interface really needs a method to restart services
18:27:04 <rickflare> very much in the manner in which cloudera manager and custom dev ops tools would
18:27:21 <SergeyLukjanov> sorry folks, I need to go earlier today
18:27:24 <SergeyLukjanov> #chair elmiko
18:27:25 <openstack> Current chairs: SergeyLukjanov elmiko
18:27:36 <elmiko> i think this is an interesting idea, makes me wonder if we will have much overlap with reproducing cloudera manager functionality
18:27:40 <elmiko> SergeyLukjanov: no worries
18:27:53 <rickflare> elmiko you will somewhat
18:28:13 <elmiko> also, vgridnev, is this something that we might lead to eventually with the cluster health checks?
18:28:20 <crobertsrh> I think it might be a nice addition since we're already adding health checks, service restarts might be a nice thing to have
18:28:21 <rickflare> however for the vanilla distributions it would be awesome
18:28:28 <elmiko> rickflare: agreed
18:28:40 <rickflare> restarts would be amazing
18:28:42 <elmiko> vanilla is tough though, because we are the only support mechanism for it
18:28:55 <rickflare> Taz and I are willing to help with that
18:29:03 <elmiko> cool
18:29:08 <rickflare> we have tons and I do mean tons of experience managing hadoop
18:29:11 <crobertsrh> great
18:29:22 <elmiko> i'd say take a look at the cluster health check stuff and maybe propose a spec about doing service restarts
18:29:35 <rickflare> ok
18:29:37 <elmiko> we can certainly help fill in the details about the specifics of how sahara works
18:29:42 <elmiko> it sounds like a fine idea
18:29:50 <rickflare> this kind of leads into my second idea
18:29:57 <rickflare> that is centered around security
18:30:21 <vgridnev> elmiko, I think that we can do some kind of auto scaling ideas or / and some kind of restarts
18:30:36 <rickflare> some clusters are going to need updates, e.g. kernel patching, glibc patches, etc
18:30:46 <elmiko> vgridnev: yea, i think there might be nice integration between health checks and service restarts
18:30:57 <rickflare> I would like to propose a hardened version of some of the images
18:31:07 <elmiko> rickflare: sounds great
18:31:16 <rickflare> like a vanilla that uses hadoop with kerberos
18:31:18 <elmiko> like, building on the centos images?
18:31:25 <elmiko> ooh, my favorite topic
18:31:27 <rickflare> elmiko exactly
18:31:33 <elmiko> we have talked about kerb integration before
18:31:54 <elmiko> i have a document i should share with you, we had a big session about it in vancouver (i think, or was that paris)
18:32:06 <rickflare> i'd like to see openscap integrated and perhaps even drop in dev op tools like saltstack or puppet into the clusters
18:32:10 <elmiko> there are several questions surrounding kerb integration though
18:32:22 <elmiko> that's an interesting thought
18:32:27 <rickflare> this way if users need to customize anything in the cluster they can
18:32:34 <elmiko> hmm
18:32:51 <elmiko> you may want to propose this as an idea on the ML, it's a big topic
18:32:52 <rickflare> for us the biggest hurdle for pushing sahara will be our ability to control and push security
18:32:57 <elmiko> right
18:32:58 <rickflare> i know
18:33:23 <elmiko> but, if you create your own images, you could certainly run puppet outside the cluster and update images in the cluster with it, no?
18:33:24 <rickflare> and I figured I'd start here because you guys have been so receptive
18:33:44 <rickflare> yes you can
18:33:48 <rickflare> and we would be fine with that
18:33:54 <elmiko> but...
18:33:56 <elmiko> ;)
18:34:06 <rickflare> but customizing the images is really what we are aiming for
18:34:14 <elmiko> hmm
18:34:14 <rickflare> at least providing more secure instances
18:34:20 <rickflare> than what we have now
18:34:22 <elmiko> look at the work done on the image validation specs
18:34:23 <rickflare> like using ssl
18:34:30 <rickflare> within hadoop
18:34:33 <rickflare> etc
18:34:38 <elmiko> we are in the process of improving how we create images and deploy them
18:34:39 <rickflare> for the status pages
18:34:42 <elmiko> you may find it interesting
18:34:53 <rickflare> i've been looking at sahara image elements
18:34:58 <elmiko> also, for ssl/kerb within the cluster we have a few options
18:35:00 <crobertsrh> Yeah, the new image creation bits might simplify things a bit
18:35:03 <rickflare> if that is what you are referring to
18:35:26 <elmiko> we could do something like adding a KDC such as dogtag/ipa into the cluster and allow it to handle all kerb and tls stuff
18:35:34 <rickflare> YES
18:35:37 <rickflare> freeipa
18:35:40 <rickflare> would be amazing
18:35:46 <elmiko> but, we need to have sahara controlling the internal kdc to add users as necessary
18:35:49 <elmiko> OR
18:36:10 <elmiko> we could use something like apache knox to segregate a cluster, and use an external kdc to do authN
18:36:10 <rickflare> yes and keystone should control the internal kdc
18:36:13 <rickflare> in most cases
18:36:15 <elmiko> no
18:36:22 <elmiko> i disagree here
18:36:25 <rickflare> ok
18:36:48 <elmiko> i'm not sure we want to cross the streams of a kerb-backed keystone with users in the sahara cluster
18:37:00 <elmiko> i mean we *could* but i'm not convinced it's the best method
18:37:11 <rickflare> so either have an external kdc or just have users manage the internal one that gets created
18:37:24 <elmiko> not users, we would let sahara manage the internal kdcs
18:37:36 <rickflare> k
18:37:39 <rickflare> k
18:37:48 <elmiko> it would be like an ephemeral kdc
18:37:56 <elmiko> living as long as the cluster
18:38:29 <elmiko> now, otoh, if we want to do something like an external kdc managed by the operator and that is also backing keystone, we might want to investigate using apache know
18:38:32 <elmiko> *knox
18:39:01 <rickflare> ok
18:39:05 <elmiko> i just think the identity management will get really unmanageable if you need to back keystone and have it control access to sahara clusters
18:39:09 <rickflare> i think having both would be great
18:39:15 <elmiko> it just seems like that will be complicated
18:39:22 <rickflare> this way it can plug into existing domains if needed
18:39:58 <rickflare> I think to start then having it external makes the most sense
18:40:03 <crobertsrh> This is sounding familiarly complex
18:40:19 <elmiko> yup
18:40:21 <elmiko> it is complex
18:40:27 <rickflare> because most environments will have some form of ldap or kdc
18:40:41 <rickflare> just being able to plug in would be a great start IMO
18:41:05 <elmiko> yea, it would be awesome
18:41:19 <elmiko> but it's tough to wrangle what sahara knows about identity with what the kdc will want
18:41:22 <rickflare> but to start off the service control is by far the biggest issue in the short term
18:41:28 <elmiko> remember, we only know what keystone tells us
18:42:00 <rickflare> having my cluster crap out and the only solution being a rebuild will be a tough sell for folks who cannot regenerate data that has been ingested
18:42:27 <elmiko> i can see that
18:42:48 <elmiko> rickflare: pm me your email address, i'll send you a slide deck i made on secure sahara ideas
18:42:57 <rickflare> the ability to quickly restart all services will be a massive improvement
18:43:03 <elmiko> if anyone is interested pm me as well
18:43:09 <elmiko> agred
18:43:15 <elmiko> agreed, even
18:43:24 <rickflare> Taz and ysm
18:43:32 <rickflare> please message elmiko
18:43:47 <rickflare> sorry ryusk
18:43:47 <NikitaKonovalov> elmiko, please forward that to me
18:44:26 <rickflare> also after working with tmckay on spark for some time
18:44:42 <rickflare> log reporting from the batch jobs could be improved
18:44:49 <elmiko> NikitaKonovalov: sent
18:45:02 <elmiko> yea, we've been talking about how to improve logging
18:45:06 <rickflare> we spent quite a bit of time troubleshooting main class path errors
18:45:22 <elmiko> imo, i'd like to see something where we use zaqar to publish logs from the cluster nodes
18:45:39 <rickflare> might be cool to have logstash or elastic search stood up in an image that is a part of the cluster
18:45:45 <rickflare> that one can monitor output
18:45:58 <rickflare> zabbix or ganglia also come to mind
18:46:24 <elmiko> right, or for ultimate dogfooding
18:46:38 <elmiko> imagine sahara using a sahara cluster to process its own log data
18:46:41 <elmiko> *BAM*
18:46:57 <crobertsrh> trippy man
18:47:01 <elmiko> hehe
18:47:37 <elmiko> rickflare, huichun, thanks for bringing up all these new ideas
18:47:47 <rickflare> absolutely
18:47:49 <elmiko> i hope there are some posts to the ML for us to argue over ;)
18:47:53 <crobertsrh> Yeah, good stuff for sure
18:47:54 <rickflare> thank you guys for always being so awesome
18:47:56 <rickflare> and helpful
18:48:02 * elmiko blushes
18:48:07 <rickflare> really makes working on this fun
18:48:10 <rickflare> honestly
18:48:16 <elmiko> we like to have fun =D
18:48:46 <elmiko> ok, anything else? or should we gain 10 minutes of our day back?
18:49:05 <rickflare> just looked at the slides and at first glance
18:49:12 <rickflare> this is exactly what i'm talking about
18:49:15 <elmiko> =D
18:49:20 <crobertsrh> nice
18:49:22 <crobertsrh> +1 for #endmeeting
18:49:29 <elmiko> going once
18:49:31 <elmiko> ...
18:49:33 <rickflare> sold!
18:49:33 <elmiko> twice
18:49:35 <elmiko> ...
18:49:41 <elmiko> sold!
18:49:44 <elmiko> thanks everyone!
18:49:47 <elmiko> #endmeeting