Tuesday, 2019-11-12

[03:00] *** rcernin_ has joined #openstack-sahara
[03:03] *** rcernin has quit IRC
[03:26] *** rcernin_ has quit IRC
[03:26] *** rcernin has joined #openstack-sahara
[07:22] *** rcernin has quit IRC
[07:52] *** tosky has joined #openstack-sahara
[08:11] *** tesseract has joined #openstack-sahara
[08:32] *** tosky has quit IRC
[08:37] *** tosky has joined #openstack-sahara
[09:39] *** irclogbot_2 has quit IRC
[09:41] *** irclogbot_0 has joined #openstack-sahara
[09:41] *** tosky has quit IRC
[09:42] *** tosky has joined #openstack-sahara
[09:45] *** rcernin has joined #openstack-sahara
[09:52] *** tosky_ has joined #openstack-sahara
[09:54] *** tosky has quit IRC
[09:58] *** tosky has joined #openstack-sahara
[10:01] *** tosky_ has quit IRC
[10:05] *** tosky_ has joined #openstack-sahara
[10:08] *** tosky has quit IRC
[10:11] *** rcernin has quit IRC
[10:14] *** tosky has joined #openstack-sahara
[10:17] *** tosky_ has quit IRC
[10:21] *** tosky_ has joined #openstack-sahara
[10:24] *** tosky has quit IRC
[10:34] *** tosky has joined #openstack-sahara
[10:36] *** tosky_ has quit IRC
[10:43] *** tosky_ has joined #openstack-sahara
[10:45] *** tosky has quit IRC
[10:46] *** tosky_ has quit IRC
[10:46] *** tosky has joined #openstack-sahara
[10:50] *** rcernin has joined #openstack-sahara
[11:01] *** tosky has quit IRC
[11:01] *** tosky has joined #openstack-sahara
[11:05] *** tosky_ has joined #openstack-sahara
[11:06] *** tosky has quit IRC
[11:09] *** tosky_ has quit IRC
[11:09] *** tosky has joined #openstack-sahara
[11:17] *** tosky has quit IRC
[11:23] *** tosky has joined #openstack-sahara
[11:26] *** rcernin has quit IRC
[11:47] *** tosky_ has joined #openstack-sahara
[11:49] *** tosky has quit IRC
[11:50] *** tosky has joined #openstack-sahara
[11:53] *** tosky_ has quit IRC
[12:20] *** dave-mccowan has joined #openstack-sahara
[12:41] *** openstackgerrit has quit IRC
[14:32] *** sapd1 has joined #openstack-sahara
[15:00] *** tosky_ has joined #openstack-sahara
[15:00] <sapd1> Hi everyone, I'm trying to create a job with a datasource from S3. I'm using the spark plugin (2.3) and minio as the S3-like store.
[15:00] *** tosky has quit IRC
[15:01] <sapd1> I put the job binaries and datasource on S3. I have checked the sahara-engine log and it could load the job binary from S3. But the job failed. I think the problem is related to the datasource.
[15:01] <sapd1> I'm using this example too: https://opendev.org/openstack/sahara-tests/src/branch/master/sahara_tests/scenario/defaults/edp-examples/edp-spark
[15:01] *** tosky_ is now known as tosky
[15:02] <tosky> I tested the S3 example some time ago (but the code hasn't changed)
[15:02] <tosky> but I tested mostly with the real S3
[15:13] <sapd1> I got the stdout log from the spark job: http://paste.openstack.org/show/785976/
[15:15] <jeremy__bouncer> i don't see much in that log - do you have stderr?
[15:15] <tosky> unfortunately I don't know that S3 provider; maybe there are some compatibility quirks
[15:18] <sapd1> tosky, how can I set the 'edp.java.main_class' variable in the job execute command line?
[15:19] <sapd1> jeremy__bouncer, There is nothing in stderr
[15:23] <jeremy__bouncer> i believe it's openstack dataprocessing job execute ... --configs key:value
[15:24] <tosky> uh, I usually used some wrapper scripts which call the API directly, or the UI; but the main class may be openstack dataprocessing job --mains...
[15:25] <sapd1> tosky, jeremy__bouncer Thanks. The correct option is `configs`
[15:25] <sapd1> I tried this command: openstack dataprocessing job execute --input input-example --output output --job-template wordcount2 --cluster aaaaa --configs edp.java.main_class:sahara.edp.spark.SparkWordCount
[15:26] <sapd1> Is it correct?
[15:26] <jeremy__bouncer> tosky: --mains is how you refer to a job binary in a job template (at least for certain plugins)
[15:26] <tosky> oh, right
[15:27] <jeremy__bouncer> sapd1: it looks okay to me
[15:27] <jeremy__bouncer> i haven't used the cli for a while though
[15:28] *** pcaruana has joined #openstack-sahara
[15:28] <sapd1> jeremy__bouncer, the stderr log: http://paste.openstack.org/show/785982/
[15:28] <sapd1> Which plugin are you using? vanilla or spark.
[15:30] <jeremy__bouncer> sapd1: currently i don't have any deployed sahara. in the past i've used both vanilla and spark for spark jobs
[15:30] <jeremy__bouncer> anyway, you are getting that error because you need to specify what file to count words for https://github.com/openstack/sahara-tests/blob/master/sahara_tests/scenario/defaults/edp-examples/edp-spark/wordcountapp/src/main/scala/sahara/edp/spark/SparkWordCount.scala#L28
[15:31] *** tosky has quit IRC
[15:32] *** tosky has joined #openstack-sahara
[15:33] <sapd1> jeremy__bouncer, So maybe the problem is that the job execute command is not correct.
[15:33] <jeremy__bouncer> yeah, i guess you need --args
[15:34] <jeremy__bouncer> ah, i know what it is
[15:34] <jeremy__bouncer> spark edp jobs do not take an input and output (whereas mapreduce edp and some other types take it)
[15:34] <jeremy__bouncer> everything for spark is done through args
[15:35] <jeremy__bouncer> one sec, gotta find the doc that explains this and explains how to reference a datasource in args
[15:35] <jeremy__bouncer> (this is all much clearer in ui, btw)
[15:36] <jeremy__bouncer> https://docs.openstack.org/sahara/queens/user/edp.html#using-data-source-references-as-arguments
[15:37] <jeremy__bouncer> edp.substitute_data_source_for_name or edp.substitute_data_source_for_uuid should be true in configs
[15:37] <jeremy__bouncer> and then args can contain stuff like datasource://name
[15:38] <sapd1> I see
[15:38] <sapd1> The args like: s3://bigdata/input-example.txt
[15:41] <jeremy__bouncer> if you put s3a:// stuff directly into args you will have to specify fs.s3a.* creds/configs manually
[15:41] <jeremy__bouncer> whereas if you put datasource:// into args all that stuff will be taken care of
[15:41] <jeremy__bouncer> taken care of, in the definition of the data source, i mean
[15:43] <sapd1> Ah. thank you. It succeeded
[15:44] <jeremy__bouncer> cool
[15:48] <sapd1> How can I set the edp.substitute_data_source_for_name option in the command line?
[15:49] <sapd1> It's not params, not configs and not args.
[15:49] <sapd1> I don't know how to add this option.
[15:53] <jeremy__bouncer> it should be configs
[15:54] <sapd1> 'Exception in thread "main" java.io.IOException: No FileSystem for scheme: datasource' It does not work.
[15:59] <jeremy__bouncer> hmm...
[15:59] <jeremy__bouncer> can you try adding edp.spark.adapt_for_swift to configs also?
[15:59] <jeremy__bouncer> as true
[16:00] <jeremy__bouncer> (i know s3 is not swift)
[16:00] <sapd1> jeremy__bouncer, I have tried it in horizon, and it worked.
[16:00] <sapd1> I want to try with the command line.
[16:00] <sapd1> My command is: openstack dataprocessing job execute --args datasource://input-example datasource://output --job-template wordcount2 --cluster aaaaa --configs edp.substitute_data_source_for_name:True --configs edp.java.main_class:sahara.edp.spark.SparkWordCount
[16:00] <sapd1> But it does not work.
[16:00] <tosky> "datasource" is a placeholder name; it should be replaced by the type of the datasource (s3a in your case)
[16:01] <jeremy__bouncer> tosky, nope
[16:01] <tosky> jeremy__bouncer: or did I misread everything? :)
[16:01] <jeremy__bouncer> it should actually work with datasource://
[16:01] <tosky> I probably forgot
[16:01] <jeremy__bouncer> elise came up with that
[16:01] <sapd1> tosky, It worked with datasource://
[16:01] <sapd1> It worked on the UI :D
[16:01] <tosky> oook
[16:01] <jeremy__bouncer> sapd1: if it worked in horizon, then you should be able to view the details of the succeeded job execution and see what configs are present
[16:02] <jeremy__bouncer> and then replicate those configs in cli
[16:02] <jeremy__bouncer> anyway i think in cli it should not be --configs multiple times, it should be like --configs k1:v1 k2:v2
[16:02] <jeremy__bouncer> that's how openstackclient usually likes things, i think
[16:09] <sapd1> jeremy__bouncer, Thanks
[16:09] <sapd1> The correct command is: openstack dataprocessing job execute --args datasource://input-example datasource://output --job-template wordcount2 --cluster aaaaa --configs edp.substitute_data_source_for_name:True edp.java.main_class:sahara.edp.spark.SparkWordCount edp.spark.adapt_for_swift:True
[16:10] <jeremy__bouncer> sapd1: so that command worked?
[16:10] <sapd1> Yes.
[16:10] <sapd1> I need to define the option adapt_for_swift too.
[16:10] <jeremy__bouncer> sapd1: awesome, good to know
[16:11] <sapd1> thanks for your help, you guys.
[16:11] <jeremy__bouncer> no problem, sorry it was not so straightforward
[16:32] *** tesseract has quit IRC
[17:03] *** pcaruana has quit IRC
[21:17] *** rcernin has joined #openstack-sahara
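
Editorial note: the command that finally worked in the log above passes a single `--configs` flag followed by every key:value pair, and references the data sources by name through `datasource://` in `--args`. The sketch below only assembles that same argument list so the shape is easy to see; it does not contact Sahara. The helper name `build_job_execute_argv` is hypothetical, and the template, cluster, and data source names are simply the ones sapd1 used.

```python
import shlex


def build_job_execute_argv(job_template, cluster, args, configs):
    """Hypothetical helper: assemble the argv of the working command from the log.

    It only builds the list; running it requires a configured openstack CLI
    with the dataprocessing (Sahara) plugin, which is assumed, not checked.
    """
    argv = ["openstack", "dataprocessing", "job", "execute"]
    # Spark EDP jobs take their input/output through args, not --input/--output.
    argv += ["--args", *args]
    argv += ["--job-template", job_template]
    argv += ["--cluster", cluster]
    # One --configs flag followed by all key:value pairs, as in the log,
    # rather than repeating --configs once per pair.
    argv += ["--configs", *[f"{key}:{value}" for key, value in configs.items()]]
    return argv


argv = build_job_execute_argv(
    job_template="wordcount2",
    cluster="aaaaa",
    args=["datasource://input-example", "datasource://output"],
    configs={
        "edp.substitute_data_source_for_name": "True",
        "edp.java.main_class": "sahara.edp.spark.SparkWordCount",
        "edp.spark.adapt_for_swift": "True",
    },
)

# Print a shell-quoted form of the command for inspection (Python 3.8+).
print(shlex.join(argv))
```

To actually submit the job, the list could be handed to `subprocess.run(argv, check=True)` on a host where the CLI is installed and authenticated.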
