Thursday, 2014-12-04

*** hdd has quit IRC00:39
*** tellesnobrega_ has quit IRC01:10
*** hdd has joined #openstack-sahara01:16
*** tellesnobrega_ has joined #openstack-sahara01:32
*** hdd has quit IRC01:49
*** hdd has joined #openstack-sahara02:10
*** Networkn3rd has joined #openstack-sahara02:18
*** zhidong has joined #openstack-sahara02:21
*** zhidong has quit IRC02:23
*** tellesnobrega_ has quit IRC03:10
openstackgerritAndrew Lazarev proposed openstack/sahara: Disabled requiretty in cloud-init script  https://review.openstack.org/13894203:12
openstackgerritAndrew Lazarev proposed openstack/sahara: Fixed Fake plugin for Fedora image  https://review.openstack.org/13894303:21
*** hdd has quit IRC03:26
*** Networkn3rd has quit IRC04:44
*** Networkn3rd has joined #openstack-sahara04:52
*** Networkn3rd has quit IRC04:57
*** hdd has joined #openstack-sahara05:06
*** samuelms has quit IRC05:10
*** samuelms has joined #openstack-sahara05:11
*** Poornima has joined #openstack-sahara05:12
openstackgerritKazuki OIKAWA proposed openstack/sahara: Add edp.java.adapt_for_oozie config for Java Action  https://review.openstack.org/11588405:21
*** Longgeek has joined #openstack-sahara05:32
*** hdd has quit IRC05:40
*** tnovacik has joined #openstack-sahara06:25
*** k4n0 has joined #openstack-sahara06:54
*** tnovacik_ has joined #openstack-sahara07:15
openstackgerritSergey Reshetnyak proposed openstack/sahara-image-elements: Fix bashate errors  https://review.openstack.org/13897608:00
*** witlessb has joined #openstack-sahara08:26
*** stannie has joined #openstack-sahara08:28
*** skolekonov has joined #openstack-sahara10:02
*** tellesnobrega_ has joined #openstack-sahara10:34
*** stannie has quit IRC10:50
*** tellesnobrega_ has quit IRC11:04
*** tnovacik_ has quit IRC11:17
*** tellesnobrega_ has joined #openstack-sahara11:28
openstackgerritTelles Mota Vidal Nóbrega proposed openstack/sahara: Storm integration  https://review.openstack.org/13769911:29
*** tnovacik_ has joined #openstack-sahara11:32
*** Poornima has quit IRC11:48
SergeyLukjanov_crobertsrh, tmckay, +2 from me to the client bump in global requirements12:06
openstackgerritSergey Reshetnyak proposed openstack/sahara: Minor refactoring integration tests  https://review.openstack.org/13335012:24
*** tnovacik_ has quit IRC12:27
openstackgerritSergey Reshetnyak proposed openstack/sahara: Minor refactoring integration tests  https://review.openstack.org/13335012:27
openstackgerritSergey Reshetnyak proposed openstack/sahara: Minor refactoring integration tests  https://review.openstack.org/13335012:30
openstackgerritSergey Reshetnyak proposed openstack/sahara: Run 3 transient cluster in parallel mode  https://review.openstack.org/13876512:36
openstackgerritSergey Reshetnyak proposed openstack/sahara: Minor refactoring integration tests  https://review.openstack.org/13335012:36
*** IvanBerezovskiy has quit IRC12:42
*** IvanBerezovskiy has joined #openstack-sahara12:43
*** zhidong has joined #openstack-sahara12:48
*** hdd has joined #openstack-sahara13:14
openstackgerritSergey Lukjanov proposed openstack/sahara: Update sahara.conf.sample after oslo.msg release  https://review.openstack.org/13904613:27
*** tellesnobrega_ has quit IRC13:35
*** weiting has joined #openstack-sahara13:37
*** _mattf is now known as mattf13:44
*** tnovacik has quit IRC13:47
*** tellesnobrega_ has joined #openstack-sahara13:48
*** _crobertsrh is now known as crobertsrh13:59
aignatovmeeting?14:00
elmikoit is 1400utc14:00
*** tmckay has joined #openstack-sahara14:01
*** Longgeek has quit IRC14:01
SergeyLukjanovyeah14:01
*** egafford has joined #openstack-sahara14:03
tmckaycrobertsrh, elmiko, I was thinking about data_source args some more this morning (exciting life, I know :) )14:04
crobertsrhOk.  You'll have to enlighten us :)14:04
tmckayI'm going to tweak the spec to say "name or uuid" instead of just name, and I think I'm going to allow it to work across configs, params, and args (all three instead of just args)14:05
tmckaycrobertsrh, that way, if you wanted to construct a pig job for instance that took N inputs, you could do it easily (params)14:06
*** miqui_ has joined #openstack-sahara14:06
crobertsrhGood idea14:06
elmikotmckay: meeting in openstack-meeting-314:06
tmckayelmiko, k.  crobertsrh, or if you used a mapreduce class that supports multiple files (there are some, I skimmed them) you could specify them all through multiple mapred.blah configs14:07
tmckayforgot about the new meeting :)14:07
egaffordtmckay: Nice; that should help a lot with the many-to-many input-output case.14:09
tmckayyeah, theoretically it's possible with this to deprecate those embedded input id and output id fields completely14:10
egaffordtmckay: Might be better to, really, if it's dead set on being a single value field.14:11
*** Longgeek has joined #openstack-sahara14:12
tmckaythe only possible hiccup here is that we potentially need some mechanism to mark an arg/param/config value as literal.  The case where it is a string that happens to match the name of a data_source14:13
tmckayedge case14:13
tmckaysome syntactic sugar, maybe14:14
* tmckay has to change locations14:14
egaffordtmckay: Input and output sources are fundamental enough that I can see the value of a dedicated API field, but it's certainly true that params/configs/args map much better to  the jobs themselves. Indeed; that's a tricky one. Is there any parallel case in OpenStack to date (escaping a ref by name?)14:15
tmckaynot aware14:15
egaffordtmckay: Wait, you can't effortlessly disprove universal negatives?14:16
egaffordtmckay: You've gotta get that upgrade. :)14:17
*** tellesnobrega_ has quit IRC14:21
*** crobertsrh has quit IRC14:21
*** tellesnobrega_ has joined #openstack-sahara14:24
*** tmckay has quit IRC14:25
*** crobertsrh has joined #openstack-sahara14:32
*** k4n0 has quit IRC14:33
*** miqui_ has quit IRC14:36
*** tellesnobrega_ has quit IRC14:38
*** hdd has quit IRC14:38
*** miqui_ has joined #openstack-sahara14:47
*** tmckay has joined #openstack-sahara14:47
tmckayegafford, there is a way to mark literals unambiguously, but I'm not sure I like it.  They can simply be listed.  If the feature is on to treat all values as potential data_source references, then separate edp.whatever configs can be added that list keys of configs/params and positions of args that should be absolutely literal.14:57
*** Networkn3rd has joined #openstack-sahara14:57
tmckayfor the UI, this would be transparent14:57
tmckayIt seems like a lot of effort for an edge case, but it solves the problem14:58
tmckayyou could do the inverse, too -- list the keys/arg positions that should be translated14:58
tmckayavoids the problem of coming up with unambiguous syntax14:58
*** Networkn3rd has quit IRC14:59
tmckayin most cases, there would be no need to list any literals.15:00
egaffordThe addition of a separate field specifically for literal tracking does seem unfortunate, but not necessarily worse than (similarly collision-prone) punctuation spaghetti on the values themselves.15:00
*** weiting has quit IRC15:00
tmckaywell, it's not exactly a separate field.  We already have a catch-all for configs, and a "edp." prefix for things that should be consumed by Sahara rather than Oozie, for example15:01
tmckayso, no json change15:01
*** Networkn3rd has joined #openstack-sahara15:01
tmckaycompletely backward compatible15:01
tmckaywe just optionally throw a config in there15:01
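A minimal sketch of the literal-list idea discussed above. The config keys `edp.literal_configs` and `edp.literal_args` are hypothetical (the discussion only says "edp.whatever"), list-valued configs are an assumption, and data sources are modelled as a plain name-to-URL dict rather than anything Sahara actually exposes.

```python
# Hypothetical keys: the log only mentions "edp.whatever" configs.
LITERAL_CONFIGS = "edp.literal_configs"  # config keys whose values stay literal
LITERAL_ARGS = "edp.literal_args"        # arg positions whose values stay literal


def substitute(job_configs, data_sources):
    """Replace values that match a data source name, except listed literals."""
    configs = job_configs.get("configs", {})
    args = job_configs.get("args", [])
    literal_keys = set(configs.get(LITERAL_CONFIGS, []))
    literal_positions = set(configs.get(LITERAL_ARGS, []))

    new_configs = {}
    for key, value in configs.items():
        # "edp." configs are consumed by Sahara itself, so they are never replaced.
        if key.startswith("edp.") or key in literal_keys:
            new_configs[key] = value
        else:
            new_configs[key] = data_sources.get(value, value)

    new_args = [
        arg if pos in literal_positions else data_sources.get(arg, arg)
        for pos, arg in enumerate(args)
    ]
    return {"configs": new_configs, "args": new_args}
```

Because the list travels inside the existing configs catch-all, the payload shape does not change, which is the backward-compatibility point made above.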
SergeyLukjanovmattf, interesting point about bug fixed in Ceilometer, it was approved by my wife :)15:02
egaffordSure, there's no change at the jsonschema level. Granted, that makes things a bit more opaque for those who need the feature, but in this case I can definitely see that opacity for the edge case is better than confusion for the main case.15:03
rharwoodpackaging question: a quick look suggests that sahara is packaged in Debian experimental, and not at all in Ubuntu.  Is this correct?15:04
mattfSergeyLukjanov, cheers to her!15:05
*** samuelms is now known as samuelms-away15:05
tmckaythe other possibility is to do this all at the UI level -- replace args/values before it's ever submitted.  But that doesn't improve the situation at the CLI or client level (you still have to set up data_source paths by hand)15:06
SergeyLukjanovrharwood, we have no folks who'd like to maintain packages in ubuntu15:06
egaffordtmckay: Indeed, and I can personally attest that for automators, forcing those additional uuid insertions is a bit more of a pain than is needed.15:06
SergeyLukjanovrharwood, if you want to volunteer :)15:07
tmckayegafford :) yeah.  I don't speak uuid15:07
rharwoodSergeyLukjanov: thanks.  Maybe in the future... my plate is full right now rewriting the puppet module and doing the packstack integration15:08
egaffordtmckay: I mean, uuids are kinda great; don't get me wrong. Still, being able to just post reliably referential payloads without templating is also kinda great.15:09
tmckayegafford, hmm, maybe there is a simple syntactic mechanism. let15:10
tmckaylet's say there is some prefix, "literal." for the sake of discussion.  you could prepend that.  Sahara could strip it off and ignore.  If you really needed a "literal.foo" in your arg list at hadoop run time, you just double it15:11
elmikothat sounds messy15:11
tmckayelmiko, trying to cover the corner case where there is a name collision between a data_source reference and an app argument15:12
tmckayunlikely, but I hate holes15:12
tmckayyou would probably never need it15:12
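The prefix variant can be sketched in a few lines. This assumes the "literal." prefix floated above for the sake of discussion and a plain name-to-URL dict of data sources; the function name is made up for illustration.

```python
PREFIX = "literal."  # placeholder prefix from the discussion, not a real Sahara setting


def resolve_value(value, data_sources):
    """Strip one leading prefix and keep the rest literal; otherwise try a lookup."""
    if value.startswith(PREFIX):
        # Only one prefix is stripped, so "literal.literal.foo" reaches the
        # job as "literal.foo" (the doubling rule).
        return value[len(PREFIX):]
    # No prefix: replace on a match, leave the value alone otherwise.
    return data_sources.get(value, value)
```

With a data source named `my_input`, the bare value is replaced by its URL while `literal.my_input` is passed through as the string `my_input`.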
elmikotmckay: yea, i've been following. it's a tough problem to solve15:12
egaffordtmckay: Yeah, you could do a prefix syntax (like C#'s @'thing'), but that actually seems messier to me than having the dedicated field list.15:13
openstackgerritMatthew Farrellee proposed openstack/sahara-image-elements: Simplification: wget+rpm -> rpm  https://review.openstack.org/13908315:13
openstackgerritMatthew Farrellee proposed openstack/sahara-image-elements: Use Fedora url to get EPEL  https://review.openstack.org/13908415:13
tmckayeasier to construct for the user, though15:13
SergeyLukjanovrharwood, thanks!15:13
tmckayIf your job fails, you'll know why.  Run it again, add "@" to the front of the offending fields15:13
tmckayfields == args15:14
SergeyLukjanovmattf, it sounds like it could fix the job15:14
tmckayit may be the best option15:14
mattfSergeyLukjanov, it'll simplify things and make it easier to track down15:14
tmckaywe can't get around this without some amount of mess15:14
*** hdd has joined #openstack-sahara15:15
egaffordtmckay: True, but invites infinite turtle collisions in deep corner cases, and looks more odd to the untrained eye. A neophyte walking into the payload with the literal config setting would likely have a better time parsing that than any reasonably non-colliding escape sequence.15:15
SergeyLukjanovmattf, thx15:15
egaffordtmckay, elmiko: Definitely arguments in both directions.15:15
tmckayI could go with the literal list.  User has choices:15:16
tmckaydon't use the convenience feature, list your paths and manually add auth configs if you need to.  State of the art today15:16
tmckayif specifying the literals in the odd collision case is easier than specifying the paths, then use the list15:17
tmckayooo, I had one more idea that limits the corner cases15:17
tmckayI was going to support replacement based on name or uuid match.  It could be one, or the other, or both.  A user could choose replacement only by uuid15:18
tmckayno collision on name, because, well, you're not using names15:18
egaffordSure, have a single enumerated setting that determines order of precedence (uuid-only, name-only, uuid-first, name-first). Optional, defaults to name-first.15:20
egaffordGives the user all the options they need.15:20
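A rough sketch of the enumerated precedence setting, using the four mode names suggested above. Data sources are assumed to be dicts with "id", "name" and "url" keys, and the function name is hypothetical.

```python
# Lookup order for each mode; the mode names come from the suggestion above.
MODES = {
    "uuid-only": ("id",),
    "name-only": ("name",),
    "uuid-first": ("id", "name"),
    "name-first": ("name", "id"),
}


def find_data_source(value, data_sources, mode="name-first"):
    """Return the first data source matching value in the mode's field order."""
    for field in MODES[mode]:
        for source in data_sources:
            if source[field] == value:
                return source
    return None  # no match: the caller leaves the value untouched
```

A user who picks "uuid-only" can never hit a name collision, since names are not consulted at all.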
elmikoi need to see this written up or something, i'm starting to have a difficult time following the process15:20
tmckayelmiko, absolutely, just thinking out loud.  I believe all this is why I didn't tackle this in Juno :)15:21
egaffordtmckay: Is that what you were thinking?15:21
tmckaysomething like that.15:21
elmikotmckay: like, at this point are you talking about doing away with DataSource objects and just embedding the URIs into the commands sent to the cluster?15:22
openstackgerritSergey Reshetnyak proposed openstack/sahara-image-elements: Migrate to OpenJDK  https://review.openstack.org/13875215:22
tmckayelmiko, that already happens, from Sahara -> hadoop/spark/storm etc15:22
tmckayelmiko, this is from CLI/client/UI -> Sahara15:23
elmikotmckay: right, but are you talking about just letting the user input the URIs instead of having them create DataSource objects?15:23
tmckayelmiko, it bothers me that we have datasource objects encoding locations, but we can't use them in all situations15:23
tmckaythat's broken15:23
tmckayelmiko, no15:23
tmckaydata source references15:23
tmckayelmiko, in actuality, they already can use only URIs.15:24
tmckaywith the small workaround that they have to make Sahara happy with dummy data sources for certain jobs.  I believe the specific user configs will override the autogenerated stuff15:25
elmikowouldn't that be a custom config at that point though?15:25
tmckayelmiko, so I could write a Pig job today for instance that is driven strictly by params/configs15:25
tmckayyes15:25
elmikoright, but i would think you are in power user land at that point15:25
tmckaybut no, my intention is not to push users toward raw URIs.  I think data sources are a good idea, they just have to be usable everywhere, preferably by name15:26
elmikoi dunno, i think uuid is better than name15:27
tmckayalternatively by id15:27
tmckayname is a unique constraint in the data_source table15:27
tmckayand it's immediately apparent in a dump of the job exec15:27
elmikosure, but uuid is really easy to validate on input15:27
egaffordelmiko: Allowing name to serve as the reference allows automators to have a set of payloads that will reference one another without templating, which is nice.15:28
tmckaybut in this case, replacement is only going to happen if a database search matches a data_source15:28
tmckayif it doesn't, the value is left alone.  you must have meant something else15:29
tmckayname also doesn't get stale if you delete/recreate the data source object.  Which could be argued as a benefit, or not15:30
tmckaywhich is why having the user option to use either as a lookup is nice :)  Decide for yourself15:30
elmikook, i need to read the original spec again, this is getting very confusing for me15:30
elmikoright, but it sounds like, as you talk through it, that validating the wide range of options for name input is getting sticky15:31
tmckaywell, the way the spec is currently written, if you have a collision (a literal arg matches a data_source name) then you just don't use the feature.15:32
tmckaytrying to find a simple way to avoid "just don't use the feature"15:32
tmckayI think it's highly unlikely that case comes up15:32
tmckaymaybe "just don't use it" is enough15:32
*** tnovacik has joined #openstack-sahara15:33
tmckayhmmm, actually -- you could always make another copy of your data source with duplicated path info, new name, and reference that15:34
tmckaythat's a user workaround.15:34
tmckayalright, that's probably enough then.  I don't think we need a mechanism to escape literals15:35
tmckayif we do, we know at least two ways.  1) a list and 2) prefix15:36
elmikoyea, i would need to think about this more before adding anything useful, sorry :/15:37
tmckayelmiko, egafford, thanks for indulging me15:37
elmikoi like the spec as written, but you are bringing up issues that are making me think twice about it15:37
tmckayelmiko, :)  I'm convinced that's why it takes longer to do things the more experience you have.  Counter-intuitive, you would think it would go faster.15:38
egaffordtmckay: Seems to've been useful.15:38
tmckayelmiko, but the longer we all do this, the better we become at poking holes in our own solutions.15:39
tmckayah, to be blissfully unaware of corner cases, like college15:40
tmckay"This is bullet proof!"15:40
elmikotmckay: well, the thing is, now you have me thinking about why not change the job execution json validation to allow lists in the input_id and output_id fields, and then just use those for all jobs15:41
elmikoit's a much bigger change, but maybe more appropriate15:41
*** zhidong has quit IRC15:41
tmckayelmiko, may have some utility, but the trouble is that the arg list for Java/Spark is unconstrained15:42
tmckayHow do we know the order in which to pass the data sources?15:42
tmckayelmiko, MyCrazyWordcount takes ... what?15:42
elmikotmckay: yea, that's the confusing part lol15:43
tmckaya recipe for muffins, number of slices, input, blog entry, output15:43
tmckaycould be15:43
elmikotmckay: what if we just start with implementing the spec as written, and then patch as we discover new cases?15:44
tmckayelmiko, yeah, I think that's where I ended up.  The literal case is the corner case, and there are (slightly unseemly) ways to solve that15:44
elmikoi mean, currently it seems simple. parse the arg list, then replace values, if error then raise15:44
tmckayyep15:45
egaffordtmckay: For Pig, as well, the $INPUT field is only convention, really; it's just a kwarg in the end, and could be anything else in theory. Often is in complex scripts.15:45
tmckayon error, rename your data sources, change your app, or turn off the feature15:45
elmikotmckay: to start, yea.15:45
tmckayegafford, absolutely.  I filed a bug about that a long time ago.15:45
tmckayWe need customizable param names15:45
egaffordtmckay, elmiko: We can demand convention for "sahara-compliant" scripts, but beyond that, it gets very difficult to actually inject inputs reliably in any case.15:46
tmckayegafford, there is a workaround, though.15:46
tmckaycrobertsrh (wizard man) has been strangely silent during all this15:46
egaffordtmckay: I'd be interested to hear it. Well, wizard men are a notoriously mysterious people.15:47
tmckaythis affects you directly you know, pal15:47
crobertsrhsorry, wasn't reading along15:47
tmckay:) np, just joking15:47
tmckayegafford, oh, workaround is satisfy Sahara with dummy data sources then add custom param values to the job submission.  Oozie will pass all to the Pig app on the commandline, app will ignore ones it doesn't need.  I had to do this.15:48
tmckayapp used $input and $output15:48
tmckaylowercase15:48
tmckayyou can do this from the UI now15:49
elmikotmckay: i like the simplicity and flexibility of the current spec. imo, if we were to add a prefix for the DataSource objects i would think something along the lines of "sahara://" might make sense, but i think we should burn that bridge when we get to it.15:50
tmckaycross that bridge?15:50
tmckayor just burn it outright15:51
elmikoi meant what i said!  ;)15:51
tmckayextreme programming paradigm.  Just burn it.15:51
*** Poornima has joined #openstack-sahara15:52
elmikojoking aside though, currently a user might use something like "hdfs://datasource" or "swift://container/datasource" in their command line, is that accurate?15:52
tmckaywell, they can supply the url15:53
elmikoso they could also use "http://datasource" ?15:53
tmckayfor Java wordcount, that is what you have to do15:53
tmckayonly if http://datasource is a literal path15:54
elmikook, they could also use "/some/path/to/datasource" ?15:54
tmckayright now, in Sahara, that would be legal if the datasource type is hdfs and it would be interpreted in hadoop as a relative path in the hadoop user's hdfs directory15:56
elmikook15:56
elmikohmm15:57
tmckaywell, actually, an absolute path in the local hdfs15:57
tmckayrelative if leading / is missing15:57
elmikoyea15:57
elmikoi dunno, now i'm almost thinking that "sahara://datasource_name" makes a certain amount of sense15:57
tmckayso the spec is talking about just replacing "my_input" with (pseudo) select path from data_sources where name == my_input15:58
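The pseudo-select above, written out against an in-memory SQLite table as a sketch. The table and column names follow the pseudo query and are not Sahara's real schema; the UNIQUE constraint mirrors the earlier remark that name is unique in the data source table.

```python
import sqlite3


def resolve(conn, value):
    """Return the stored path if value names a data source, else value unchanged."""
    row = conn.execute(
        "SELECT path FROM data_sources WHERE name = ?", (value,)
    ).fetchone()
    return row[0] if row is not None else value


# Illustrative setup only: one data source named my_input.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_sources (name TEXT UNIQUE, path TEXT)")
conn.execute(
    "INSERT INTO data_sources VALUES (?, ?)",
    ("my_input", "swift://container/input"),
)
```

An argument that matches no row falls through unchanged, which is the "value is left alone" behaviour described earlier in the discussion.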
tmckaythat's the prefix idea, in general.  I wouldn't use an http schema, it implies that there could be node/user stuff in there15:59
tmckaybut "sahara." or "sahara:" I could see15:59
tmckayit's the opposite of marking literals.  mark the items to be interpreted15:59
elmikoi suggest the URI schema because it follows the others like swift:// or hdfs://16:00
tmckayyeah, or internal_db://16:00
tmckayI think we use that16:00
tmckaythat's actually what it is16:00
elmikoright, so sahara:// makes sense to me16:00
tmckayI mean, it's literally in the internal db16:01
tmckaysmall chance that would ever be a literal arg to a hadoop job16:01
elmikoplus, if you ever needed some sort of specific info per data source you could use a known pattern like sahara://datasource?extrainfo=foo or some such16:01
tmckaytrue16:02
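A small sketch of how a "sahara://" reference with optional extra info might be picked apart using Python's standard urllib.parse. The scheme is only a suggestion from this discussion, the function name is hypothetical, and data source names containing characters that are special in URLs would need extra care.

```python
from urllib.parse import parse_qs, urlsplit


def parse_reference(value):
    """Return (name, extras) for a sahara:// reference, or None for anything else."""
    parts = urlsplit(value)
    if parts.scheme != "sahara":
        # hdfs://, swift://, plain paths and ordinary args are not references.
        return None
    # netloc keeps the name's original case; the query string carries the extras.
    return parts.netloc, parse_qs(parts.query)
```

Marking references this way is the inverse of marking literals: only values carrying the scheme are interpreted, and everything else is passed through as-is.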
elmikoi dunno, i'm just spit balling. like i said, i like the current spec for its simplicity16:02
tmckaygood ideas16:02
elmikok, api wg meeting start, i might be distracted16:03
tmckayI like the tweak of allowing name, uuid, or both as the lookup key and changing the param from just a boolean to a string16:04
tmckayleave everything else the same16:04
tmckayelmiko, thanks, I've got enough to go on16:04
*** Poornima has quit IRC16:07
*** tellesnobrega_ has joined #openstack-sahara16:07
*** miqui__ has joined #openstack-sahara16:15
*** tmckay has quit IRC16:16
*** clds_ has joined #openstack-sahara16:16
*** tnovacik has quit IRC16:18
*** miqui_ has quit IRC16:18
*** clds has quit IRC16:18
*** tnovacik has joined #openstack-sahara16:19
*** mattf is now known as _mattf16:24
*** tnovacik has quit IRC16:28
SergeyLukjanovelmiko, _mattf any good news about oslo sync? :)16:32
elmikoSergeyLukjanov: i didn't look any further, i thought _mattf might be getting it. i'll make sure to talk with him today though16:33
SergeyLukjanovelmiko, thx!16:33
*** skolekonov has quit IRC16:48
*** IvanBerezovskiy has quit IRC16:51
*** IvanBerezovskiy1 has joined #openstack-sahara16:51
*** tmckay has joined #openstack-sahara16:51
*** IvanBerezovskiy1 has left #openstack-sahara16:56
*** Longgeek_ has joined #openstack-sahara16:57
*** tellesnobrega_ has quit IRC17:11
*** Longgeek has quit IRC17:12
*** tellesnobrega_ has joined #openstack-sahara17:25
*** samuelms-away is now known as samuelms17:33
tellesnobregathe meeting was earlier today right?17:39
elmikotellesnobrega: yea, 1400utc17:39
tellesnobregaelmiko, :( missed it17:40
elmiko:(17:40
tellesnobregagot a little busy and forgot about the time change17:40
tellesnobreganext week i will be there17:40
elmikocool, next week is the later time17:40
*** tellesnobrega_ has quit IRC17:44
*** tosky has joined #openstack-sahara17:54
jodahwow, 6am meeting time now18:05
elmikojodah: every other week18:06
jodahok18:06
toskyargh, I didn't realize it was moved earlier today18:13
openstackgerritMatthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator strutils  https://review.openstack.org/13914018:28
openstackgerritMatthew Farrellee proposed openstack/python-saharaclient: Updating oslo-incubator  https://review.openstack.org/13914118:28
openstackgerritMatthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator cliutils  https://review.openstack.org/13914218:28
openstackgerritMatthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator apiclient.exceptions  https://review.openstack.org/13914318:28
openstackgerritMatthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator importutils  https://review.openstack.org/13914418:28
openstackgerritMatthew Farrellee proposed openstack/sahara: Removed _i18n module, it is not used directly  https://review.openstack.org/13914618:42
openstackgerritMatthew Farrellee proposed openstack/sahara: Update oslo-incubator lockutils  https://review.openstack.org/13914718:42
openstackgerritMatthew Farrellee proposed openstack/sahara: Update oslo-incubator log  https://review.openstack.org/13914818:42
openstackgerritMatthew Farrellee proposed openstack/sahara: Update oslo-incubator policy  https://review.openstack.org/13914918:42
openstackgerritMatthew Farrellee proposed openstack/sahara: Update oslo-incubator threadgroup  https://review.openstack.org/13915018:42
openstackgerritMatthew Farrellee proposed openstack/sahara: Update oslo-incubator periodic_task  https://review.openstack.org/13915118:42
*** _mattf is now known as mattf18:49
*** openstackgerrit has quit IRC18:50
*** openstackgerrit has joined #openstack-sahara18:50
* mattf flexes and pins sahara ci to the ground18:50
* mattf tries to remember why we use oslo logging instead of python logging18:50
mattfat least one incubating module switched to python logging to remove a dep this time around18:51
*** Networkn3rd has quit IRC18:52
*** witlessb has quit IRC18:56
elmikomattf: thanks =)18:56
*** witlessb has joined #openstack-sahara18:59
mattfelmiko, at your service19:04
elmikowoot!19:04
*** tellesnobrega_ has joined #openstack-sahara19:06
openstackgerritMerged openstack/sahara-image-elements: Fix bashate errors  https://review.openstack.org/13897619:12
openstackgerritMatthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator strutils  https://review.openstack.org/13914019:17
openstackgerritMatthew Farrellee proposed openstack/python-saharaclient: Updating oslo-incubator  https://review.openstack.org/13914119:17
openstackgerritMatthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator cliutils  https://review.openstack.org/13914219:17
openstackgerritMatthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator apiclient.exceptions  https://review.openstack.org/13914319:18
openstackgerritMatthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator importutils  https://review.openstack.org/13914419:18
*** Longgeek_ has quit IRC19:25
openstackgerritTrevor McKay proposed openstack/sahara-specs: [EDP] Add options supporting DataSource identifiers in job_configs  https://review.openstack.org/13880919:34
tmckayelmiko, egafford, okay I generalized a bit and rewrote the spec.  No reason it shouldn't apply to all job types, and all subitems of job_configs, and allow name or uuid19:35
tmckaydarn it, didn't change the bp ref19:36
openstackgerritTrevor McKay proposed openstack/sahara-specs: [EDP] Add options supporting DataSource identifiers in job_configs  https://review.openstack.org/13880919:37
*** tellesnobrega_ has quit IRC19:38
elmikotmckay: cool, i'll take a look19:39
tmckayblah, whitespace19:40
tmckaygerrit, come on, make it green19:40
tmckaytox doesn't catch that19:40
*** tellesnobrega_ has joined #openstack-sahara19:42
elmikodon't worry, i got a few things to get at before i can brandish the -1 sword of justice ;)19:44
tmckayelmiko, I'll wait and push a single patch with any other fixes19:45
elmikocool19:45
egaffordI imagine that we're calling the "the user has decided to name their data source a UUID and it collides with another data source's random UUID" case absurd enough not to mention, given that it is absurd enough not to mention.19:48
egaffordHe says, mentioning it.19:48
elmikolol19:49
elmikoi think if a user is bold enough to use UUIDs for names then they probably know what they are doing, or are a robot =)19:50
egaffordOr a particularly dedicated QA engineer, but yeah. You'd pretty much have to want that bug.19:51
elmikoseriously dedicated...19:51
egaffordA massive tangent to this spec: is there a reason why this sort of flexible mapping should only apply to data_sources? job_binary_internal, job_binary, job, and even cluster could all be sanely referenced by name using a similar strategy, with similar benefits.19:55
* mattf curses the convoluted tox mess on fedora19:56
elmikoi dunno, it makes sense to me in terms of sending custom command lines to the processing engines. but in a more general sense i think it breaks OpenStack convention to not use the IDs in the REST API.19:56
elmikoor did i miss your point?19:57
egaffordNot that this spec can't stand without those changes, of course, even if they were deemed desirable. elmiko: Cool, argument by convention is a good argument. No, not at all.19:57
openstackgerritMatthew Farrellee proposed openstack/sahara: Update oslo-incubator log  https://review.openstack.org/13914819:58
openstackgerritMatthew Farrellee proposed openstack/sahara: Update oslo-incubator policy  https://review.openstack.org/13914919:58
openstackgerritMatthew Farrellee proposed openstack/sahara: Update oslo-incubator threadgroup  https://review.openstack.org/13915019:58
openstackgerritMatthew Farrellee proposed openstack/sahara: Update oslo-incubator periodic_task  https://review.openstack.org/13915119:58
openstackgerritMatthew Farrellee proposed openstack/sahara: Removed _i18n module, it is not used directly  https://review.openstack.org/13914619:58
openstackgerritMatthew Farrellee proposed openstack/sahara: Update oslo-incubator lockutils  https://review.openstack.org/13914719:58
egaffordAllowing name over id increases risk and provides a little ease of use. I can see why you'd only want to allow that at the most transient and repeated point of use.19:58
openstackgerritMichael McCune proposed openstack/sahara-specs: Adding security guidelines documentation spec  https://review.openstack.org/13917019:59
elmikoegafford: well, and are we talking about using name in a json template for a POST operation?20:00
tmckayI'll read back20:00
tmckayI'm starting to wonder if there is a better way -- as we alluded to, allow data sources to be lists and then map the elements to parameters, configs, or arg positions20:01
tmckaythis is another example of stuff growing from a first case (mapreduce) where single input/output was the norm20:02
*** mattf is now known as _mattf20:02
egaffordelmiko: Yup, that's what I'm talking about. tmckay: That's dangerous; depending on how your script/job is written, it could get very hairy to try to infer input placement without demanding a lot of conventions.20:02
elmikotmckay: yea, but you pointed out a good issue, namely the need for something more verbose than just a list20:02
*** openstackgerrit has quit IRC20:04
tmckayit would be a map, no inference20:04
*** openstackgerrit has joined #openstack-sahara20:04
elmikoegafford: i suppose as long as you could ensure that names were unique it would be ok, but i think decoupling a name from a unique id is generally a win20:04
tmckayyou give me all your stuff, and tell me where it goes20:04
egaffordtmckay: Maps are viable.20:04
tmckayinference is part of the trouble we have now ($INPUT for all pig/hive, for example)20:04
tmckayI would like to get this right, once.  Maybe I'll think of a general overhaul a bit.20:05
elmikomy issue with a map is that we add a condition like we have with the configs where it can become very difficult to validate the map20:05
egaffordtmckay: Absolutely; we're effectively demanding both convention and artificial limitation now, so I'm not arguing against the effort by any means.20:05
tmckayI think the spec is an improvement, but I'm not convinced that it can't be better still20:06
tmckaybut, it needs to fall within the scope for kilo too :)20:06
*** navesta has joined #openstack-sahara20:06
elmikomaybe you could keep it to lists if they were ordered or something, but that places limitations on the application writers to conform to how Sahara lays out the args20:06
egaffordelmiko: It's difficult to validate, but it's a much harder problem to determine where solely positional input arguments should be placed.20:06
tmckaya map structure could be something like this:20:07
elmikoegafford: agreed, that's kinda why i like the config substitution method, it avoids the complication of interpolating lists or something...20:07
tmckaydata source container is a list or a dict, value for each item holds the id of the data source and either an integer or a string20:08
tmckayinteger is an arg position, string is a parameter or a config name20:09
tmckayvery rough, but something like that20:09
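One possible reading of the map structure, as a sketch only. Each entry pairs a data source id with a target: an integer is treated as an arg position and a string as a param or config name. Inserting at the position (rather than replacing what is there) is an assumption, and the key names "data_source" and "target" are invented for illustration.

```python
def apply_mapping(mapping, data_sources, args):
    """Place data source URLs into args by position, or into a dict by name."""
    new_args = list(args)
    named = {}
    positional = [m for m in mapping if isinstance(m["target"], int)]
    # Insert in ascending order so each URL ends up at its requested index.
    for item in sorted(positional, key=lambda m: m["target"]):
        new_args.insert(item["target"], data_sources[item["data_source"]])
    for item in mapping:
        if isinstance(item["target"], str):
            named[item["target"]] = data_sources[item["data_source"]]
    return new_args, named
```

Nothing is inferred here: the user supplies every data source and says where it goes, at the cost of having to reason about how positions merge with the other args.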
egaffordThe integer value will intersect with the args list in what way? Slice notation?20:09
tmckayI was thinking just a position.  But you need some sane way to merge that with other args20:10
tmckaymaybe we should make the user do everything20:11
tmckaymaybe trying to help at all is too much20:11
elmikotmckay: i like what you're proposing, but it smells complicated20:11
tmckaymaybe the UI should simply allow you to pop open a data source and copy/paste strings20:11
tmckayeverything else is up to you20:11
tmckaywizard be darned20:12
elmikotmckay: +1 about letting the user control, that's why i like the current spec with substitution. it gives the user a lot of freedom.20:12
egaffordtmckay, elmiko: I agree with elmiko here. I think that in the map case, it works, but that in the positional case, it's too error-prone and power-user.20:12
crobertsrhKeep taking the wizard in vain, will you?20:15
* crobertsrh adds a smite button to UI20:16
tmckay:)  My head is hurting trying to be all things to all people20:16
tmckaysomewhere there is a line20:16
egaffordHonestly, it seems to me that trying to provide any field definition for inputs and outputs is trying to assume a level of consistent abstraction that just doesn't exist across the engines we're trying to support. Referencing data sources as persistent sahara constructs is great, but it seems to me as though config/arg/param substitution is the only way to be all the things.20:17
elmikoegafford: +120:18
tmckayagreed.  So then the next question is, if we have this substitution mechanism, what do we do with the simple case that we have today?  Pig job, input, output, Sahara maps it for you20:18
tmckayscrap it, and the wizard can ask you questions, I suppose20:18
elmikotmckay: that's a good question20:18
tmckayPig Wizard (I like that)20:18
egaffordpig wizard: +120:19
tmckayPig Wizard says, "oh, I see you selected a data source.  What parameter name would you like that to map to in your pig script?"20:19
tmckaybecause that's the real question20:19
tmckayif it can ask that ^^ we are all set20:19
tmckayUI just sends down a parameter list with the pig script var names and the data source references20:19
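A parameter list of pig script var names mapped to data source references, resolved by substitution, might be sketched as follows. The `datasource://` prefix, the `resolve_references` helper, and the example names are assumptions made for illustration; the log does not say what reference syntax the spec uses.

```python
from typing import Dict

# Hypothetical marker for "this value names a data source".
PREFIX = "datasource://"


def resolve_references(
    values: Dict[str, str], data_sources: Dict[str, str]
) -> Dict[str, str]:
    """Return a copy of values with data source references replaced by URLs.

    values maps a script variable name to either a literal string or a
    reference such as "datasource://raw_logs". data_sources maps a data
    source name to its URL. Literal values pass through untouched.
    """
    resolved = {}
    for key, value in values.items():
        if value.startswith(PREFIX):
            name = value[len(PREFIX):]
            if name not in data_sources:
                raise KeyError("unknown data source: " + name)
            resolved[key] = data_sources[name]
        else:
            resolved[key] = value
    return resolved
```

With this shape the user picks the variable names, so any number of data sources can be mapped and nothing depends on a fixed $INPUT/$OUTPUT pair.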
tmckaythe whole "one input one output" thing is toast20:20
tmckaycrobertsrh, ^^ okay read this one20:20
tmckay;-)20:20
elmikotmckay: ooph, talk about can of worms... that sounds like it could get crazy complicated in little to no time.20:20
elmikoi think we almost have to err on the side of allowing the user to control all those mappings20:21
tmckayelmiko, I think the complication is on the wizard side.  The mechanism is simple.20:21
elmikotmckay: that's kinda what i meant20:21
tmckayelmiko, sure.  If you want to go wizardless, you can do it all.  But the wizard is supposed to hold your hand.20:21
tmckayright?20:21
egaffordA decent first step would be to just request the pig arg name as a string and trust the user to map it, if the complexity is seen as too high.20:21
elmikotmckay: well, maybe the wizard can only hold your hand if you need 1 input/1 output?20:22
elmikoi think it's perfectly acceptable to assume that the wizard can only perform certain types of actions20:22
tmckayelmiko, hmmm.  If you know the script you're running, you know the expected var names.  Sahara actually doesn't, it just guesses.20:23
tmckayif you can specify 1 data source, why not N?  The only difference is that currently we assume $INPUT and $OUTPUT for the standard job20:24
crobertsrhOh dear.  All things "wizard" are completely theoretical at this point.  I say design something that is usable and reasonably flexible for now.  The UI will require magic no matter which way we go.20:24
tmckaywell, you *might* know the var names.  If you don't, you could look :)20:24
tmckayI'm searching for the grand unified theory of data sources20:25
tmckaytoo lofty20:25
*** navesta has quit IRC20:25
crobertsrhOf course the grand unified data source theory according to mattf is "get rid of them"20:25
elmikohmm20:25
elmikolol20:25
tmckayyeah, that goes back to the "fill in the box with the url, dude" model20:26
tmckayless is more20:26
elmikotmckay: +1 to less is more20:27
tmckayI've spammed openstack-sahara quite a bit today.  Sorry20:27
elmikotmckay: the grand unified solution would be awesome, but i think we should go with the simple straightforward solution to start with20:27
elmikowhy apologize, it's usually so quiet in here you could hear a mouse...20:27
egaffordI actually really like the data sources as an option. It makes sense to me that they exist, and can be managed and encapsulated. Lets you change out your entire data store and change one thing.20:28
crobertsrhIt's stuff we need to really figure out20:28
elmikocrobertsrh: +120:28
tmckaycrobertsrh, what if we made data sources not required for Pig/Hive/MapReduce?20:28
crobertsrhAnd just use args?20:29
crobertsrhor params/configs?20:29
egaffordProjects have spent tons of money to make changing out their entire data store possible, with much less likelihood of that happening, and with much less success than we already have.20:29
tmckaylifting that restriction, coupled with substitution would add some flexibility20:29
crobertsrhIf it helps things, sure.20:30
tmckayI'll noodle that some more.  I20:30
elmikotmckay: that might be nice, shouldn't be too tough to mock up and try out20:30
tmckayI'm thinking of a split between "hey let me do this for you" and "I am Hadoop! Leave me alone"20:30
tmckayyou choose20:30
tmckaytoday, if you want to do it all, for PigHiveMapreduce you still have to pass dummy data_sources.  Blah.20:31
elmikoimo, in regards to the grand unified thingie, it would be really cool to allow the input_id and output_id fields to allow lists, and then just make it explicit how Sahara will generate the args so that users can always expect what will happen.20:31
openstackgerritAndrew Lazarev proposed openstack/sahara: Update sahara.conf.sample after oslo.msg release  https://review.openstack.org/13904620:31
tmckayyes20:31
elmikotmckay: agreed on the blah...20:31
tmckaymy dog is completely unaware of this conversation20:32
egaffordtmckay: So at the moment, are we driving toward removing the data_source fields on the job_execution post, and only providing config/arg/param substitution, or some kind of map-based payload field / job config hybrid?20:32
elmikotmckay: lol same here =)20:32
egaffordYour dog has a much fuller and emptier life than we do, depending on your metrics.20:33
tmckayegafford, we might be.  I like the spec, I think we agree on it.  Next question is whether or not there is something else we can do to ease usage and increase consistency20:33
elmikoegafford: lol, totally20:33
tmckaywithout breaking backward compatibility20:34
tmckayie, if we do do something, is there an alembic migration for it, etc etc20:34
elmikotmckay: i think taking out the mandatory inclusion of data sources for pig/hive/mr might be a nice spec after the substitution gets implemented20:34
tmckayProbably worth spending another half a day thinking about20:34
egaffordtmckay: +1 for incrementalism.20:34
tmckayelmiko, yeah, just relaxing the constraint helps20:35
tmckaybaby steps20:35
elmikotmckay: +1 to baby steps20:35
tmckayalright guys, thanks, I'll shut up now, eat a snack, and write some cod20:35
tmckaywell, code20:35
egaffordDon't fix that... too late.20:35
tmckayI don't write cod.  They squirm20:35
elmikoon a different topic, check this out https://github.com/stackforge/anchor20:35
tmckayooo20:36
elmikoAnchor is a test project to provide ephemeral PKI for openstack, pretty cool20:36
tmckaysounds like Summit20:36
tmckayis this the talk we saw?20:36
elmikoyea20:36
tmckayI was just thinking about Barbican today20:36
elmikoit's an implementation of it20:36
elmikothe sec guys are talking about possibly working Anchor into Barbican somehow20:36
elmikoor letting Barb leverage Anchor20:37
tmckaystill mulling how to make Oozie more secure in the back of my mind.  Hadoop -> Barbican ... ??20:37
elmikoright20:37
elmikoHadoop -> proxied kerb (Barb or Keystone)20:37
tmckaycould we issue one set of creds to Hadoop, to allow access to Barbican, and then pull a key or cert per job?20:37
elmikostill trying to figure that one out20:37
tmckaythat would be ideal20:38
elmikowhat i'd like to see is something like this...20:38
elmikosahara generates a secret for the proxy user and stores it in Barb20:38
elmikothen the nodes use one of their keys(known to the controller) to access the secret in Barb20:39
elmikothat way sahara distributes nothing20:39
tmckaythe only way you can break that is if you have a key to one of the nodes20:39
tmckaybut if you have that, you can do anything -- replace hadoop binaries with your own stuff, etc20:39
elmikoright20:39
tmckayso you're no worse off20:39
elmikoexactly20:40
tmckaysounds awesome20:40
elmikoi brought this up at summit with the Barb guys, they were actually kinda interested in our use case20:40
tmckaygets rid of the oozie issues, solves it for spark too20:40
tmckaywell, anything that uses the hadoop plugin20:40
elmikoyea that's what i was hoping, but it does add complexity to the hadoop-openstack component20:41
tmckaythat's okay, imho20:41
elmikoit just stresses the need for us to maintain and release that though20:41
tmckayyes, I think so20:42
tmckaybrb20:42
crobertsrhOk.  Editing of node group templates is done on the UI side :)  Doing the UI stuff before the backend is done (or started) kinda feels like 1) UI work 2) yada yada yada 3) done.20:48
tmckayheh20:50
elmikocrobertsrh: unfortunately the "yada yada yada" step contains the really tough stuff =(20:50
crobertsrhExactly :)20:51
tmckayyou know, one more wrinkle in this data source thing I didn't mention20:51
tmckaydeletion constraints20:51
elmikoalso, nice Seinfeld reference lol20:51
crobertsrhI'm trying to ice away as much of the UI stuff for Kilo as possible before the wizzerful wizard work takes off.20:51
elmikocrobertsrh: smart thinking20:51
tmckayyou can't delete a datasource referenced by a jobexec, I believe20:51
crobertsrhty20:51
elmikotmckay: i think that's correct20:52
crobertsrhty on the Seinfeld ref.....actually ty on the "smart thinking" bit too :)20:52
tmckayso if we leave name or uuid references in the jobexec, that restriction is not enforced.20:52
elmikocrobertsrh: i feel like all the UI stuff from Juno was a big learning lesson20:52
tmckayunless we add a column that stores a list of ids20:52
crobertsrhOh....while we're thinking of editing things and data sources....any chance that we need data source edit?20:52
elmikotmckay: i would think that if we leave the non-deletion behavior in place that we might need to do the substitution check on deletion or something?20:53
crobertsrhUI for Juno was close to == the merge process.......no shortage of learning took place20:53
elmikocrobertsrh: i would think yes, edit all the things!20:53
*** tnovacik has joined #openstack-sahara20:54
tmckaymaybe.  I was thinking the refs would stay in the job exec, but maybe if we wrote the changes through --- there wouldn't be any need to retain the data source20:54
tmckaydeletion wouldn't matter20:54
elmikotmckay: maybe we need to think about another level of indirection here. like a database object that models a relationship between a job execution and a data source?20:54
tmckayit's not like you can edit a data source now (can you?)20:54
tmckayooo, nice idea20:54
elmikotmckay: then we could have several relationship objects that model how they relate to a single execution. plus then we can perform ops on the relations20:55
elmikoi dunno, might be too complicated. again, just spitballing20:55
crobertsrhextra complicated if you start allowing data sources to be edited20:56
elmikowell, the relationship wouldn't need to change, just the data source20:56
crobertsrhI think my head is starting to hurt a bit.  I should stick to UI stuff :)20:57
elmikolol20:58
elmikono ducking out now ;)20:58
tmckaywell, if I write the changes through (I said I wouldn't) then there should be no expectation of a deletion constraint.  But we're back to the inconsistency with the fixed input/output fields20:59
crobertsrhanything I can't figure out in the UI is clearly "an API problem"20:59
tmckayI bet there is a sqlalchemy way to store a list of foreign ids20:59
tmckayand have the constraint enforced20:59
elmikocrobertsrh: lol, nice!20:59
elmikotmckay: i think so, some sort of many-to-one relation?21:00
tmckayyeah21:00
tmckaysecondary problem, I guess21:00
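The relationship-object idea, with the deletion constraint enforced by the database, can be sketched with a link table. This uses the standard library's sqlite3 and plain SQL rather than SQLAlchemy, and every table and column name is hypothetical; it only illustrates how a foreign key with ON DELETE RESTRICT would block deleting a data source that a job execution still references.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a job execution / data source link table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE data_sources (id TEXT PRIMARY KEY, url TEXT NOT NULL);
        CREATE TABLE job_executions (id TEXT PRIMARY KEY);
        -- One row per (job execution, data source) relation.
        CREATE TABLE job_execution_data_sources (
            job_execution_id TEXT NOT NULL
                REFERENCES job_executions(id) ON DELETE CASCADE,
            data_source_id TEXT NOT NULL
                REFERENCES data_sources(id) ON DELETE RESTRICT,
            role TEXT NOT NULL,
            PRIMARY KEY (job_execution_id, data_source_id, role)
        );
        """
    )
    return conn


def delete_data_source(conn: sqlite3.Connection, ds_id: str) -> bool:
    """Try to delete a data source; return False if a job execution still uses it."""
    try:
        conn.execute("DELETE FROM data_sources WHERE id = ?", (ds_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the relation lives in its own table, one job execution can reference any number of data sources, and editing a data source's URL would not touch the relation rows.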
*** tellesnobrega_ has quit IRC21:14
*** ViswaV has joined #openstack-sahara21:20
openstackgerritMerged openstack/sahara-image-elements: Simplification: wget+rpm -> rpm  https://review.openstack.org/13908321:33
openstackgerritOpenStack Proposal Bot proposed openstack/sahara: Updated from global requirements  https://review.openstack.org/13920921:34
openstackgerritMerged openstack/sahara-image-elements: Use Fedora url to get EPEL  https://review.openstack.org/13908421:35
*** crobertsrh is now known as _crobertsrh22:07
*** tellesnobrega_ has joined #openstack-sahara22:16
*** hdd has quit IRC22:19
*** hdd has joined #openstack-sahara22:20
openstackgerritMichael McCune proposed openstack/sahara-specs: Adding security guidelines documentation spec  https://review.openstack.org/13917022:22
*** Longgeek has joined #openstack-sahara22:25
*** egafford has quit IRC22:30
*** Longgeek has quit IRC22:30
*** ViswaV has quit IRC22:37
openstackgerritOpenStack Proposal Bot proposed openstack/sahara: Updated from global requirements  https://review.openstack.org/13920922:43
*** ViswaV has joined #openstack-sahara22:49
*** ViswaV_ has joined #openstack-sahara22:51
*** ViswaV has quit IRC22:54
*** miqui__ has quit IRC22:54
openstackgerritAndrew Lazarev proposed openstack/sahara: Disabled requiretty in cloud-init script  https://review.openstack.org/13894223:18
*** jamielennox has joined #openstack-sahara23:24
jamielennoxhey all, 3 +2s on https://review.openstack.org/#/c/138211/ - can someone kick it off for me?23:25
*** Wenjie has joined #openstack-sahara23:36
*** witlessb has quit IRC23:38
*** Networkn3rd has joined #openstack-sahara23:54
