14:00:15 #startmeeting sahara
14:00:16 Meeting started Thu Jan 11 14:00:15 2018 UTC and is due to finish in 60 minutes. The chair is tellesnobrega. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:19 The meeting name has been set to 'sahara'
14:00:24 o/
14:00:56 o/
14:01:52 o/
14:01:57 we are all here
14:01:58 Hi guys
14:02:03 that is new :)
14:02:05 hey all
14:02:11 let's get started
14:02:16 #topic News/Updates
14:03:25 I'm doing small fixes on the images, not much left to do there. Now I'm focusing on fixing some of the bugs we talked about in Denver (file transfer is the first). I also want to start testing S3 integration as well as APIv2
14:04:24 still checking the images produced by sahara-image-pack; finally fixed the CLI tests (the patch for tempest just landed); starting to tackle sahara-extra and the upload of oozie tarballs
14:04:26 Business travel took up half of my week. I got back yesterday and started working on solving a problem in the vanilla upgrade
14:04:53 (and nagging and complaining about mass-doc changes, but that touched sahara only partially)
14:04:53 i haven't had much time to focus on upstream... but i have (finally) fixing s3_hadoop on my mind, and also apiv2 in the client
14:05:46 re apiv2 in the client, i think i will post a patch doing version discovery the wrong way, then have monty fix it (the patch isn't just that, it will also have all the other v1->v2 changes reflected)
14:06:12 jeremyfreudberg, we need to start testing APIv2 asap, when are you sending the patch?
14:06:25 a rough estimate is ok
14:07:10 late tonight
14:07:13 hopefully
14:07:17 perfect
14:07:18 thanks
14:07:29 np
14:07:38 a wrong patch forcing monty to fix it, I like the strategy :)
14:08:00 yes, my version of cunningham's law
14:08:09 what could go wrong using that?
14:09:18 more updates?
14:10:06 well, about force delete
14:10:40 i have a patch that you guys can review BUT actually i am thinking about it some more and i think stack abandon might not be the best choice
14:11:04 what is the choice you are thinking of?
14:11:38 just do a stack delete, but don't wait for deletion. then, unhide the stack
14:12:10 let's discuss the details now or let's switch topic?
14:12:24 tosky, yes, probably switch topic if no more updates
14:12:29 jumped the gun a bit
14:12:33 let me switch topic so we can discuss it
14:12:44 #topic Force Delete details
14:12:53 you have the floor jeremyfreudberg
14:13:39 the current version of the patch does force delete like this: 1) put stack in deleting state 2) immediately abandon it 3) drop from db and other cleanup
14:13:44 drop cluster, i mean
14:14:16 my new idea is: 1) put stack in deleting state, don't wait for deletion to complete 2) unhide the stack 3) drop cluster from db
14:14:54 the problem with abandon was that the user has no idea what was orphaned
14:15:15 the step that I don't understand, probably due to lack of knowledge of heat: what does unhiding the stack mean?
14:15:28 could you explain step 2
14:15:38 yes, unhiding the stack is actually a very funny thing
14:15:42 i will explain
14:16:10 sahara actually has a special arrangement with heat: all stacks with the tag data-processing-cluster are "hidden"
14:16:38 so you need to specifically query for hidden stacks to see them in the api/cli
14:17:02 yes, heat stack-list -a
14:17:07 but if i remove the tag from the stack, suddenly the user can see the stack and will know to clean stuff up
14:17:44 by user you mean cloud operator?
14:18:04 by user i mean the real user
14:18:32 ok
14:18:39 that makes sense to me
14:18:42 shuyingya_, tosky ?
14:19:37 I don't understand what we gain from being able to see the stack
14:19:59 and the "drop cluster from db" solves the issue where the heat stack is clean, but the cluster is stuck in deleting state from the sahara point of view?
14:21:15 tosky, my force delete solution fixes (by my estimate) 90%+ of cases where a cluster is stuck deleting because the heat stack won't finish deleting
14:21:41 by providing an alternate way of getting to the final "drop from db" step that doesn't wait for the heat stack delete to finish
14:22:21 your case where sahara is still deleting but the stack is already gone is not fixed by my patch, but if the stack is already gone and you issue the delete call again the delete should finish fine
14:22:41 jeremyfreudberg, I have tried to delete the stacks shown by "heat stack-list -a", but most of the time sahara can't delete them because the stack status is failed
14:23:13 I see
14:23:33 after the stack is completely gone that should be ok shuyingya
14:23:48 basically here's my point:
14:23:56 I got it.
14:24:10 * jeremyfreudberg will summarize anyway
14:24:17 please do jeremyfreudberg
14:25:28 the only way we can have a meaningful force delete option here is if we don't await stack deletion and proceed to drop the cluster from the db regardless of stack status
14:25:31 the implications of that:
14:25:45 - the user might still want to know what resources didn't end up deleted
14:26:11 - but, if the stack really was undeletable (DELETE_FAILED) then that still doesn't mean that the user will have any easier a time
14:26:31 i mean the user's manual cleanup still won't be easy
14:27:50 i think we all get the idea, i don't want to waste half the meeting on it
14:28:01 please follow up on the patch itself if you have concerns
14:28:07 jeremyfreudberg, if I remember correctly, heat has an option to configure what kind of stack should be hidden. how can you unhide it?
14:28:26 you can do a PATCH on the stack to update its tags list
14:28:33 i was planning just to remove the tag
14:28:44 the conf option stays the same
14:29:12 thanks
14:29:20 totally understand it
14:29:33 np
14:30:05 are we all settled on this topic?
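[Editor's note: a minimal sketch of the flow described above, assuming a python-heatclient Client built from a keystone session; the tag-clearing keyword and the drop_cluster_from_db() callback are assumptions for illustration, not the actual Sahara patch.]

    from heatclient import client as heat_client

    def force_delete(session, stack_id, drop_cluster_from_db):
        heat = heat_client.Client('1', session=session)

        # "unhide" the stack first (done here before the delete so the
        # PATCH is not racing the deletion): clearing the tag list removes
        # the special data-processing-cluster tag, so any leftover
        # resources show up in a plain stack listing for the user.
        # The exact keyword/format for clearing tags is an assumption.
        heat.stacks.update(stack_id, existing=True, tags='')

        # start the stack delete, but do not poll until it finishes
        heat.stacks.delete(stack_id)

        # drop the cluster from sahara's database regardless of the
        # final heat stack status
        drop_cluster_from_db()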
14:30:20 yep
14:30:24 yep
14:30:24 nope
14:30:50 shuyingya, continue
14:31:04 jeremyfreudberg, why is stack-abandon not a proper way?
14:33:46 it would work okay, but it's unnecessary. it basically means "drop stack from heat db", but why should we be hiding stuff from the user, i guess is my point
14:33:54 sorry, network broken
14:34:51 I guess the point boils down to: does the end user need to know about heat stack status when using sahara?
14:35:40 tellesnobrega, basically yes
14:37:16 i will continue to think about it, but i really think we should move on so we can use meeting time more efficiently
14:37:27 I need to take some time to think about it
14:37:35 let's move on
14:37:43 we can continue on the patch
14:37:59 #topic Dublin PTG
14:38:09 did you receive the email from me?
14:38:20 I started an etherpad so we can start putting stuff together for the PTG
14:38:31 shuyingya_, yes, saw the email
14:38:39 #link https://etherpad.openstack.org/p/sahara-rocky-ptg
14:38:57 let's continue tellesnobrega's topic
14:39:00 :)
14:39:12 yep, we can talk about oozie afterwards, I have a few questions too
14:39:12 please take some time to write down topic ideas, so we can start getting ready for the PTG
14:39:29 will do
14:39:52 also, I know tosky and I will be there, jeremyfreudberg is almost sure, how about you shuyingya_ ?
14:39:55 sure (I was thinking about copy-pasting the final TODO list from the Queens PTG for easier comparison)
14:40:27 tosky, we can certainly do that
14:40:28 I am not sure right now. maybe not
14:40:47 if you can apply to the TSP please do
14:41:02 you are a core reviewer, they will probably see you as high priority
14:41:41 I was also thinking, what do you guys think of creating a trello board so we can have a better tracking system of sahara todos, feature status and so on?
14:41:56 wait, before you answer that question
14:42:06 I will close this topic and change to Open Discussion
14:42:18 and after this we can jump into the oozie and openstack-hadoop stuff
14:42:25 #topic Open Discussion
14:42:33 what do you guys think of creating a trello board so we can have a better tracking system of sahara todos, feature status and so on?
14:43:41 trello is a decent idea (and actually i am already doing that)
14:44:05 this tool looks cool
14:44:30 tosky, shuyingya_, any objections? I know some people hate it
14:44:49 I am OK with it.
14:45:17 why not try storyboard then?
14:45:17 the one objection that someone could have is why not storyboard maybe
14:45:25 * tosky <- someone
14:45:31 yes, beat tosky by one second
14:45:39 could be, I never used storyboard
14:46:10 by trello I meant some place where we can have a better tracking system than going back to a 3-6 month old etherpad
14:46:21 I have never used storyboard or trello :(
14:46:34 in fact "migrate to storyboard" is something I'd like to propose for Rocky - it's not going to be a community goal, but we will still use it at some point
14:47:12 tosky, is it possible to do a "soft" transition, meaning keep bugs on LP for now and just do task tracking on Storyboard?
14:47:38 jeremyfreudberg: I don't know, it's something to ask to... uhm... I don't know :)
14:47:51 TC maybe?
14:47:59 probably (or they can redirect)
14:48:09 tosky, do you have a link to storyboard?
14:48:37 and I will follow up with that
14:48:53 since we mainly agree that we need something better than what we have today
14:49:10 yes, as long as we have some tracking tool then i will be happy
14:49:29 let's use the last ten minutes for shuyingya_'s oozie stuff
14:49:37 yeah,
14:49:40 http://storyboard.openstack.org/
14:49:49 thanks
14:50:53 do you have any questions about the description in the email?
14:51:07 tellesnobrega: topic!
14:51:17 oh, open discussion already, sorry
14:51:46 yes
14:51:47 shuyingya_, basically you are saying that running the job normally against swift is working, but when you use oozie to launch the job, something is missing
14:51:55 good news is that all testcases have passed.
14:52:40 jeremyfreudberg, do you mean you want to know how to start the job?
14:53:03 the command is: openstack dataprocessing job execute --cluster gogo --job-template vanilla-job-template --configs mapred.map.class:org.apache.oozie.example.SampleMapper mapred.reduce.class:org.apache.oozie.example.SampleReducer mapreduce.framework.name:yarn fs.swift.service.sahara.username:admin fs.swift.service.sahara.password:admin --input 92e6b254-66a0-4b05-a989-40993bb5d9b7 --output 0cced149-5400-4afa-9d49-32295d74065d
14:53:24 one thing that I noticed and don't understand: why is the generated jar always hadoop-openstack-3.0.0-SNAPSHOT.jar, even if we specify different values of hadoop.version?
14:53:46 tosky, i think that naming is given by something in the pom, but it doesn't mean anything
14:53:49 hardcoded somewhere
14:53:50 what is the difference then?
14:54:23 shuyingya_, what i meant was just to clarify the meaning of the email. there is the oozie way (sahara edp) and the normal way (hadoop command?)
14:54:55 tosky, https://github.com/openstack/sahara-extra/blob/master/hadoop-swiftfs/pom.xml#L22
14:54:59 I ran it with the sahara command
14:55:28 jeremyfreudberg, for us, the normal way is using oozie (the sahara way)
14:55:36 shuyingya_, i see. my mistake. i was imagining things
14:55:39 yes, hadoop-openstack-3.0.0-SNAPSHOT.jar doesn't mean anything
14:56:34 shuyingya_, i'm wondering if you can try the job with `hadoop jar ...` maybe the oozie classpath is different from the normal hadoop classpath
14:57:40 what changed so that httpclient is not present anymore? and what is the best way to put it back?
14:57:56 shuyingya_, we can also try to get this change in: https://github.com/apache/hadoop/commit/62579b69a0a294ba1ea14cf76c650b640f89f331#diff-7debb0b87e1651538db48ac873db6a9c
14:58:17 apache people just stopped importing httpclient completely
14:58:26 and rewrote it to use a different library
14:59:15 from our side? what needs httpclient? can we change it to use httpresponse?
14:59:23 jeremyfreudberg, I know what you mean and I also thought about applying the change in 3.0.0. have no time to test
14:59:30 https://github.com/openstack/sahara/blob/b3ff28609a916362fe113b2f9fb9b48db33aac09/sahara/plugins/vanilla/hadoop2/run_scripts.py#L179
14:59:40 :) I will try it out tomorrow
15:00:05 if you say uploading the jar fixes it, we can just do that too
15:00:30 a quick first fix until we come up with something better
15:00:35 we are at the top of the hour
15:00:41 we have to close it now
15:00:50 yes
15:00:57 thanks guys, shuyingya_ update us on the progress on this and on what you think will be the best approach for now
15:01:08 jeremyfreudberg, if you need help with APIv2 let me know
15:01:14 if you guys are OK with it, I will post the patch with the workaround method tomorrow
15:01:15 tosky, do your thing :)
15:01:25 sounds good shuyingya_
15:01:29 O.o
15:01:29 thanks all
15:01:31 o/
15:01:40 #endmeeting
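[Editor's note: for the Oozie/httpclient discussion above, a rough illustration of what the "upload the jar" workaround could look like on the Oozie node; the jar path, the libext location, and the remote helper are assumptions, not the patch shuyingya_ plans to post.]

    def add_httpclient_to_oozie(r, jar_path='/tmp/httpclient.jar'):
        # 'r' is assumed to be a sahara-style remote for the Oozie node,
        # exposing execute_command(). Copy an httpclient jar into Oozie's
        # libext directory so swift jobs launched through Oozie can find
        # the classes that newer hadoop-openstack builds no longer bundle.
        r.execute_command('sudo cp %s /opt/oozie/libext/' % jar_path)
        # rebuild the Oozie war so the extra jar ends up on its classpath
        r.execute_command('sudo /opt/oozie/bin/oozie-setup.sh prepare-war')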