18:00:30 #startmeeting sahara
18:00:31 Meeting started Thu Feb 18 18:00:30 2016 UTC and is due to finish in 60 minutes. The chair is SergeyLukjanov. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:32 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:34 The meeting name has been set to 'sahara'
18:00:35 😀
18:00:36 hello/
18:00:38 o/
18:00:45 heyo/
18:00:47 hi!
18:00:52 hi
18:00:55 hello :)
18:01:00 #link https://wiki.openstack.org/wiki/Meetings/SaharaAgenda
18:01:08 #topic News / updates
18:01:16 client release is on review
18:01:22 hi
18:01:59 https://review.openstack.org/#/c/280893/
18:02:13 SergeyLukjanov, are we going to have an additional one for health checks?
18:02:17 UI rework is still up for review. As vgridnev pointed out this morning, the integration tests are broken in those patches. I may need a bit of help to sort out what tweaks are required.
18:02:32 vgridnev yup
18:02:39 i've been adding more to the api v2 work, and also looking into a few more security issues
18:02:55 speaking of which, SergeyLukjanov did you ever see that security bug i logged?
18:03:08 SergeyLukjanov, great
18:03:18 elmiko not yet :) could you please re-send link?
18:03:25 yup, 1sec
18:03:52 SergeyLukjanov: https://bugs.launchpad.net/sahara/+bug/1541122
18:03:53 elmiko: Error: malone bug 1541122 not found
18:04:26 elmiko sad
18:04:42 SergeyLukjanov: what's sad is that i have a few more of these to post :/
18:04:51 yup
18:05:10 but, we need to fix em =)
18:05:12 I think we should fix it as part of adding castellan based secure store to sahara
18:05:28 and ensure that all of this stuff is configurable through UI
18:05:31 API
18:05:41 to make users able to specify some password
18:05:49 interesting thought
18:06:06 and i agree, it will help operators to have more access to these values
18:06:47 but, to begin with, we can fix up their hardcoded usage
18:06:58 agree
18:08:40 any other news?
18:08:48 #topic API v2 progress
18:08:54 elmiko your turn :)
18:08:59 #link https://review.openstack.org/#/c/273316/
18:09:03 #link https://wiki.openstack.org/wiki/Sahara/api-v2
18:09:09 thanks
18:09:15 we need more reviews on the initial commit for v2
18:09:31 also, i am adding more items to the wiki
18:09:43 i have some local patches that depend on the initial commit, should i start posting these?
18:09:55 i thought maybe i should wait until the first merges
18:10:33 if there are no opinions, i'm just going to start posting them =D
18:11:23 yeah, probably wait for the first merges
18:11:27 to avoid rebase hell
18:11:35 yup, that was my thought too
18:11:49 so, everyone go review https://review.openstack.org/#/c/273316/ !!!
18:11:54 please =D
18:12:05 that's all from me
18:12:19 elmiko thx!
18:12:27 #topic Open discussion
18:12:52 good afternoon everyone
18:13:25 I wanted to simply start out by thanking everyone who has continued to provide support for me!
18:13:33 Also I would appreciate it if someone would review the health checks stuff https://review.openstack.org/#/q/status:open++branch:master+topic:bp/cluster-verification
18:13:39 vgridnev thx for your helpful review comments about is_engine_implement fun, never noticed before, really wanted
18:13:54 I will update suspend EDP patch
18:13:56 huichun, np
18:14:09 ok, thanks!
18:14:17 I have had several items that I feel could be very helpful to this project
18:14:17 vgridnev: I will give them a review today or tomorrow at the latest.
18:15:05 SergeyLukjanov: I have an idea to replace the current Oozie engine with Luigi
18:15:23 Guys, what do you think about Oozie?
18:16:08 https://github.com/spotify/luigi
18:16:48 Oozie needs tomcat, writing XML, and an extra jar file when running jobs
18:17:11 huichun: i have not used luigi, but i don't see why we wouldn't consider a proposal to add it as well as the OozieEngine
18:17:27 oh, you want to replace with luigi?
18:17:41 I have no experience with Luigi either
18:18:06 need to have a look at it
18:18:10 seems nice, and it's python 1
18:18:12 er +1
18:18:23 elmiko: you mean add Luigi as an EDP engine like Oozie?
18:18:28 Luigi is Python
18:18:40 huichun: that's what i thought, but i missed that you said "replace" oozie
18:18:52 huichun: which Job types are supported?
18:19:00 Yes, my first thought is replacing
18:20:35 i don't have a strong opinion either way, but i can see a few courses of action;
18:20:37 NikitaKonovalov: Luigi supports everything Oozie can do: batch jobs, and workflow management with dependency resolution
18:20:45 1. write a spec, so we can debate on review
18:21:00 2. make an option to allow either oozie or luigi, so we can keep backwards compat
18:21:10 3. plan a future migration away from oozie, if we desire
18:21:32 does that sound reasonable?
18:21:46 agreed with elmiko
18:21:46 +1 elmiko: fine plan
18:22:05 elmiko: yes, it's my original idea currently
18:22:12 huichun: great!
18:23:07 elmiko: just working on EDP engine parts in Sahara, did lots of research on other workflow engines, and had this idea
18:23:26 cool, thanks for bringing it up =)
18:24:07 huichun: i think it's great if we can keep up to date with new technologies that might improve our experience.
18:24:07 I've not seen luigi either
18:24:10 * tosky reads Luigi
18:24:16 what is this thing called like me?
18:24:22 tosky: https://github.com/spotify/luigi
18:24:28 hehe
18:24:30 oh, thanks, I missed that
18:24:37 np
18:24:38 tosky, we plan to just have you launch jobs for all of our users. We figured you have the spare time.
18:24:45 yup, what crobertsrh said
18:24:59 just tosky operating on 1000s of nodes, manually running jobs
18:25:09 '-_-
18:25:14 LOL
18:25:15 so guys i have stood up a few clusters
18:25:21 close to 100 or so nodes
18:25:22 lol
18:25:34 and one of the biggest issues I see is cluster persistence
18:26:12 when running hadoop it is not uncommon to have data nodes go down for a number of possible reasons
18:26:38 the horizon interface really needs a method to restart services
18:27:04 very much in the manner in which cloudera manager and custom dev ops tools would
18:27:21 sorry folks, I need to go earlier today
18:27:24 #chair elmiko
18:27:25 Current chairs: SergeyLukjanov elmiko
18:27:36 i think this is an interesting idea, makes me wonder if we will have much overlap with reproducing cloudera manager functionality
18:27:40 SergeyLukjanov: no worries
18:27:53 elmiko you will somewhat
18:28:13 also, vgridnev, is this something that we might lead to eventually with the cluster health checks?
18:28:20 I think it might be a nice addition since we're already adding health checks, service restarts might be a nice thing to have
18:28:21 however for the vanilla distributions it would be awesome
18:28:28 rickflare: agreed
18:28:40 restarts would be amazing
18:28:42 vanilla is tough though, because we are the only support mechanism for it
18:28:55 Taz and I are willing to help with that
18:29:03 cool
18:29:08 we have tons and I do mean tons of experience managing hadoop
18:29:11 great
18:29:22 i'd say take a look at the cluster health check stuff and maybe propose a spec about doing service restarts
18:29:35 ok
18:29:37 we can certainly help fill in the details about the specifics of how sahara works
18:29:42 it sounds like a fine idea
18:29:50 this kind of leads into my second idea
18:29:57 that is centered around security
18:30:21 elmiko, I think that we can do some kind of auto scaling ideas or / and some kind of restarts
18:30:36 some clusters are going to need updates, e.g. kernel patching, glibc patches etc
18:30:46 vgridnev: yea, i think there might be nice integration between health checks and service restarts
18:30:57 I would like to propose a hardened version of some of the images
18:31:07 rickflare: sounds great
18:31:16 like a vanilla that uses hadoop with kerberos
18:31:18 like, building on the centos images?
18:31:25 ooh, my favorite topic
18:31:27 elmiko exactly
18:31:33 we have talked about kerb integration before
18:31:54 i have a document i should share with you, we had a big session about it in vancouver (i think, or was that paris)
18:32:06 id like to see openscap integrated and perhaps even drop in dev op tools like saltstack or puppet into the clusters
18:32:10 there are several questions surrounding kerb integration though
18:32:22 that's an interesting thought
18:32:27 this way if users need to customize anything in the cluster they can
18:32:34 hmm
18:32:51 you may want to propose this as an idea on the ML, it's a big topic
18:32:52 for us the biggest hurdle for pushing sahara will be our ability to control and push security
18:32:57 right
18:32:58 i know
18:33:23 but, if you create your own images, you could certainly run puppet outside the cluster and update images in the cluster with it, no?
18:33:24 and I figured I'd start here because you guys have been so receptive
18:33:44 yes you can
18:33:48 and we would be fine with that
18:33:54 but...
18:33:56 ;)
18:34:06 but customizing the images is really what we are aiming for
18:34:14 hmm
18:34:14 at least providing more secure instances
18:34:20 than what we have now
18:34:22 look at the work done on the image validation specs
18:34:23 like using ssl
18:34:30 within hadoop
18:34:33 etc
18:34:38 we are in the process of improving how we create images and deploy them
18:34:39 for the status pages
18:34:42 you may find it interesting
18:34:53 ive been looking at sahara image elements
18:34:58 also, for ssl/kerb within the cluster we have a few options
18:35:00 Yeah, the new image creation bits might simplify things a bit
18:35:03 if that is what you are referring to
18:35:26 we could do something like adding a KDC such as dogtag/ipa into the cluster and allow it to handle all kerb and tls stuff
18:35:34 YES
18:35:37 freeipa
18:35:40 would be amazing
18:35:46 but, we need to have sahara controlling the internal kdc to add users as necessary
18:35:49 OR
18:36:10 we could use something like apache knox to segregate a cluster, and use an external kdc to do authN
18:36:10 yes and keystone should control the internal kdc
18:36:13 in most cases
18:36:15 no
18:36:22 i disagree here
18:36:25 ok
18:36:48 i'm not sure we want to cross the streams of a kerb-backed keystone with users in the sahara cluster
18:37:00 i mean we *could* but i'm not convinced its the best method
18:37:11 so either have an external kdc or just have users manage the internal one that gets created
18:37:24 not users, we would let sahara manage the internal kdcs
18:37:36 k
18:37:39 k
18:37:48 it would be like an ephemeral kdc
18:37:56 living as long as the cluster
18:38:29 now, otoh, if we want to do something like an external kdc managed by the operator and that is also backing keystone, we might want to investigate using apache know
18:38:32 *knox
18:39:01 ok
18:39:05 i just think the identity management will get really unmanageable if you need to back keystone and have it control access to sahara clusters
18:39:09 i think having both would be great
18:39:15 it just seems like that will be complicated
18:39:22 this way it can plug into existing domains if needed
18:39:58 I think to start then having it external makes the most sense
18:40:03 This is sounding familiarly complex
18:40:19 yup
18:40:21 it is complex
18:40:27 because most environments will have some form of ldap or kdc
18:40:41 just being able to plug in would be a great start IMO
18:41:05 yea, it would be awesome
18:41:19 but it's tough to wrangle what sahara knows about identity with what the kdc will want
18:41:22 but to start off the service control is by far the biggest issue in the short term
18:41:28 remember, we only know what keystone tells us
18:42:00 having my cluster crap out and the only solution being to rebuild will be a tough sell for folks who cannot regenerate data that has been ingested
18:42:27 i can see that
18:42:48 rickflare: pm me your email address, i'll send you a slide deck i made on secure sahara ideas
18:42:57 the ability to quickly restart all services will be a massive improvement
18:43:03 if anyone is interested pm me as well
18:43:09 agred
18:43:15 agreed, even
18:43:24 Taz and ysm
18:43:32 please message elmiko
18:43:47 sorry ryusk
18:43:47 elmiko, please forward that to me
18:44:26 also after working with tmckay on spark for some time
18:44:42 log reporting from the batch jobs could be improved
18:44:49 NikitaKonovalov: sent
18:45:02 yea, we've been talking about how to improve logging
18:45:06 we spent quite a bit of time troubleshooting main class path errors
18:45:22 imo, i'd like to see something where we use zaqar to publish logs from the cluster nodes
18:45:39 might be cool to have logstash or elastic search stood up in an image that is a part of the cluster
18:45:45 that one can monitor output
18:45:58 zabbix or ganglia also come to mind
18:46:24 right, or for ultimate dogfooding
18:46:38 imagine sahara using a sahara cluster to process its own log data
18:46:41 *BAM*
18:46:57 trippy man
18:47:01 hehe
18:47:37 rickflare, huichun, thanks for bringing up all these new ideas
18:47:47 absolutely
18:47:49 i hope there are some posts to the ML for us to argue over ;)
18:47:53 Yeah, good stuff for sure
18:47:54 thank you guys for always being so awesome
18:47:56 and helpful
18:48:02 * elmiko blushes
18:48:07 really makes working on this fun
18:48:10 honestly
18:48:16 we like to have fun =D
18:48:46 ok, anything else? or should we gain 10 minutes of our day back?
18:49:05 just looked at the slides and at first glance
18:49:12 this is exactly what im talking about
18:49:15 =D
18:49:20 nice
18:49:22 +1 for #endmeeting
18:49:29 going once
18:49:31 ...
18:49:33 sold!
18:49:33 twice
18:49:35 ...
18:49:41 sold!
18:49:44 thanks everyone!
18:49:47 #endmeeting