Thursday, 2014-12-04

*** hdd has quit IRC		00:39
*** tellesnobrega_ has quit IRC		01:10
*** hdd has joined #openstack-sahara		01:16
*** tellesnobrega_ has joined #openstack-sahara		01:32
*** hdd has quit IRC		01:49
*** hdd has joined #openstack-sahara		02:10
*** Networkn3rd has joined #openstack-sahara		02:18
*** zhidong has joined #openstack-sahara		02:21
*** zhidong has quit IRC		02:23
*** tellesnobrega_ has quit IRC		03:10
openstackgerrit	Andrew Lazarev proposed openstack/sahara: Disabled requiretty in cloud-init script https://review.openstack.org/138942	03:12
openstackgerrit	Andrew Lazarev proposed openstack/sahara: Fixed Fake plugin for Fedora image https://review.openstack.org/138943	03:21
*** hdd has quit IRC		03:26
*** Networkn3rd has quit IRC		04:44
*** Networkn3rd has joined #openstack-sahara		04:52
*** Networkn3rd has quit IRC		04:57
*** hdd has joined #openstack-sahara		05:06
*** samuelms has quit IRC		05:10
*** samuelms has joined #openstack-sahara		05:11
*** Poornima has joined #openstack-sahara		05:12
openstackgerrit	Kazuki OIKAWA proposed openstack/sahara: Add edp.java.adapt_for_oozie config for Java Action https://review.openstack.org/115884	05:21
*** Longgeek has joined #openstack-sahara		05:32
*** hdd has quit IRC		05:40
*** tnovacik has joined #openstack-sahara		06:25
*** k4n0 has joined #openstack-sahara		06:54
*** tnovacik_ has joined #openstack-sahara		07:15
openstackgerrit	Sergey Reshetnyak proposed openstack/sahara-image-elements: Fix bashate errors https://review.openstack.org/138976	08:00
*** witlessb has joined #openstack-sahara		08:26
*** stannie has joined #openstack-sahara		08:28
*** skolekonov has joined #openstack-sahara		10:02
*** tellesnobrega_ has joined #openstack-sahara		10:34
*** stannie has quit IRC		10:50
*** tellesnobrega_ has quit IRC		11:04
*** tnovacik_ has quit IRC		11:17
*** tellesnobrega_ has joined #openstack-sahara		11:28
openstackgerrit	Telles Mota Vidal Nóbrega proposed openstack/sahara: Storm integration https://review.openstack.org/137699	11:29
*** tnovacik_ has joined #openstack-sahara		11:32
*** Poornima has quit IRC		11:48
SergeyLukjanov	_crobertsrh, tmckay, +2 from me to the client bump in global requirements	12:06
openstackgerrit	Sergey Reshetnyak proposed openstack/sahara: Minor refactoring integration tests https://review.openstack.org/133350	12:24
*** tnovacik_ has quit IRC		12:27
openstackgerrit	Sergey Reshetnyak proposed openstack/sahara: Minor refactoring integration tests https://review.openstack.org/133350	12:27
openstackgerrit	Sergey Reshetnyak proposed openstack/sahara: Minor refactoring integration tests https://review.openstack.org/133350	12:30
openstackgerrit	Sergey Reshetnyak proposed openstack/sahara: Run 3 transient cluster in parallel mode https://review.openstack.org/138765	12:36
openstackgerrit	Sergey Reshetnyak proposed openstack/sahara: Minor refactoring integration tests https://review.openstack.org/133350	12:36
*** IvanBerezovskiy has quit IRC		12:42
*** IvanBerezovskiy has joined #openstack-sahara		12:43
*** zhidong has joined #openstack-sahara		12:48
*** hdd has joined #openstack-sahara		13:14
openstackgerrit	Sergey Lukjanov proposed openstack/sahara: Update sahara.conf.sample after oslo.msg release https://review.openstack.org/139046	13:27
*** tellesnobrega_ has quit IRC		13:35
*** weiting has joined #openstack-sahara		13:37
*** _mattf is now known as mattf		13:44
*** tnovacik has quit IRC		13:47
*** tellesnobrega_ has joined #openstack-sahara		13:48
*** _crobertsrh is now known as crobertsrh		13:59
aignatov	meeting?	14:00
elmiko	it is 1400utc	14:00
*** tmckay has joined #openstack-sahara		14:01
*** Longgeek has quit IRC		14:01
SergeyLukjanov	yeah	14:01
*** egafford has joined #openstack-sahara		14:03
tmckay	crobertsrh, elmiko, I was thinking about data_source args some more this morning (exciting life, I know :) )	14:04
crobertsrh	Ok. You'll have to enlighten us :)	14:04
tmckay	I'm going to tweak the spec to say "name or uuid" instead of just name, and I think I'm going to allow it to work across configs, params, and args (all three instead of just args)	14:05
tmckay	crobertsrh, that way, if you wanted to construct a pig job for instance that took N inputs, you could do it easily (params)	14:06
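The substitution tmckay describes can be sketched roughly as follows. This is a minimal illustration, not Sahara's implementation. The in-memory `DATA_SOURCES` table and the `lookup`, `resolve` and `resolve_job_configs` helpers are hypothetical stand-ins for a query against the data_source table. The sketch walks the `configs`, `params` and `args` sections of a `job_configs` dict. Any string value that matches a data source name or uuid is replaced with that data source's URL, and unmatched values are left alone.

```python
import copy

# Hypothetical stand-in for the data_source table: uuid -> (name, url).
DATA_SOURCES = {
    "6a1b0f4e-0000-4000-8000-000000000001": ("my_input", "swift://demo.sahara/input"),
    "6a1b0f4e-0000-4000-8000-000000000002": ("my_output", "swift://demo.sahara/output"),
}


def lookup(value):
    """Return the URL of the data source whose uuid or name equals value, else None."""
    if value in DATA_SOURCES:
        return DATA_SOURCES[value][1]
    for name, url in DATA_SOURCES.values():
        if name == value:
            return url
    return None


def resolve(value):
    """Replace a matching reference with its URL; leave everything else untouched."""
    if not isinstance(value, str):
        return value
    url = lookup(value)
    return value if url is None else url


def resolve_job_configs(job_configs):
    """Return a copy of job_configs with references in configs, params and args resolved."""
    resolved = copy.deepcopy(job_configs)
    for section in ("configs", "params"):
        resolved[section] = {k: resolve(v) for k, v in resolved.get(section, {}).items()}
    resolved["args"] = [resolve(a) for a in resolved.get("args", [])]
    return resolved


job_configs = {
    "configs": {"mapred.input.dir": "my_input"},
    "params": {"OUTPUT": "6a1b0f4e-0000-4000-8000-000000000002"},
    "args": ["my_input", "10"],
}
print(resolve_job_configs(job_configs)["args"])
# ['swift://demo.sahara/input', '10']
```

Because a value that matches nothing passes through as a literal, a genuine application argument that happens to equal a data source name becomes ambiguous.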
*** miqui_ has joined #openstack-sahara		14:06
crobertsrh	Good idea	14:06
elmiko	tmckay: meeting in openstack-meeting-3	14:06
tmckay	elmiko, k. crobertsrh, or if you used a mapreduce class that supports multiple files (there are some, I skimmed them) you could specify them all through multiple mapred.blah configs	14:07
tmckay	forgot about the new meeting :)	14:07
egafford	tmckay: Nice; that should help a lot with the many-to-many input-output case.	14:09
tmckay	yeah, theoretically it's possible with this to deprecate that embedded input id and output id fields completely	14:10
egafford	tmckay: Might be better to, really, if it's dead set on being a single value field.	14:11
*** Longgeek has joined #openstack-sahara		14:12
tmckay	the only possible hiccup here is that we potentially need some mechanism to mark an arg/param/config value as literal. The case where it is a string that happens to match the name of a data_source	14:13
tmckay	edge case	14:13
tmckay	some syntactic sugar, maybe	14:14
* tmckay has to change locations		14:14
egafford	tmckay: Input and output sources are fundamental enough that I can see the value of a dedicated API field, but it's certainly true that params/configs/args map much better to the jobs themselves. Indeed; that's a tricky one. Is there any parallel case in OpenStack to date (escaping a ref by name?)	14:15
tmckay	not aware	14:15
egafford	tmckay: Wait, you can't effortlessly disprove universal negatives?	14:16
egafford	tmckay: You've gotta get that upgrade. :)	14:17
*** tellesnobrega_ has quit IRC		14:21
*** crobertsrh has quit IRC		14:21
*** tellesnobrega_ has joined #openstack-sahara		14:24
*** tmckay has quit IRC		14:25
*** crobertsrh has joined #openstack-sahara		14:32
*** k4n0 has quit IRC		14:33
*** miqui_ has quit IRC		14:36
*** tellesnobrega_ has quit IRC		14:38
*** hdd has quit IRC		14:38
*** miqui_ has joined #openstack-sahara		14:47
*** tmckay has joined #openstack-sahara		14:47
tmckay	egafford, there is a way to mark literals unambiguously, but I'm not sure I like it. They can simply be listed. If the feature is on to treat all values as potential data_source references, then separate edp.whatever configs can be added that list keys of configs/params and positions of args that should be absolutely literal.	14:57
*** Networkn3rd has joined #openstack-sahara		14:57
tmckay	for the UI, this would be transparent	14:57
tmckay	It seems like a lot of effort for an edge case, but it solves the problem	14:58
tmckay	you could do the inverse, too -- list the keys/arg positions that should be translated	14:58
tmckay	avoids the problem of coming up with unambiguous syntax	14:58
*** Networkn3rd has quit IRC		14:59
tmckay	in most cases, there would be no need to list any literals.	15:00
egafford	The addition of a separate field specifically for literal tracking does seem unfortunate, but not necessarily worse than (similarly collision-prone) punctuation spaghetti on the values themselves.	15:00
*** weiting has quit IRC		15:00
tmckay	well, it's not exactly a separate field. We already have a catch-all for configs, and a "edp." prefix for things that should be consumed by Sahara rather than Oozie, for example	15:01
tmckay	so, no json change	15:01
*** Networkn3rd has joined #openstack-sahara		15:01
tmckay	completely backward compatible	15:01
tmckay	we just optionally throw a config in there	15:01
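Here is a rough sketch of the literal-list idea. It assumes two hypothetical `edp.`-prefixed configs that list the config/param keys and the arg positions to pass through untouched. The names `edp.substitution.literal_keys` and `edp.substitution.literal_args` are invented for illustration and are not existing Sahara options. Because they ride along in the existing `configs` dict, the JSON schema does not change. `lookup` is any callable that maps a reference to a URL or `None`.

```python
# Hypothetical config names, invented for this sketch.
LITERAL_KEYS = "edp.substitution.literal_keys"
LITERAL_ARGS = "edp.substitution.literal_args"


def resolve_with_literals(job_configs, lookup):
    """Resolve data source references except where the user marked a value literal."""
    configs = job_configs.get("configs", {})
    literal_keys = set(configs.get(LITERAL_KEYS, []))
    literal_args = set(configs.get(LITERAL_ARGS, []))

    def maybe_resolve(value, literal):
        if literal or not isinstance(value, str):
            return value
        url = lookup(value)
        return value if url is None else url

    result = {}
    for section in ("configs", "params"):
        # edp.* keys are consumed by Sahara itself, so never substitute them.
        result[section] = {
            k: maybe_resolve(v, k in literal_keys or k.startswith("edp."))
            for k, v in job_configs.get(section, {}).items()
        }
    result["args"] = [
        maybe_resolve(a, i in literal_args)
        for i, a in enumerate(job_configs.get("args", []))
    ]
    return result


sources = {"my_input": "swift://demo.sahara/input"}
job_configs = {
    "configs": {LITERAL_ARGS: [1]},
    "args": ["my_input", "my_input"],  # the second one is a real app argument
}
print(resolve_with_literals(job_configs, sources.get)["args"])
# ['swift://demo.sahara/input', 'my_input']
```

The inverse that tmckay mentions lists only the keys and positions that should be translated. It would simply flip the `literal` test.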
SergeyLukjanov	mattf, interesting point about bug fixed in Ceilometer, it was approved by my wife :)	15:02
egafford	Sure, there's no change at the jsonschema level. Granted, that makes things a bit more opaque for those who need the feature, but in this case I can definitely see that opacity for the edge case is better than confusion for the main case.	15:03
rharwood	packaging question: a quick look suggests that sahara is packaged in Debian experimental, and not at all in Ubuntu. Is this correct?	15:04
mattf	SergeyLukjanov, cheers to her!	15:05
*** samuelms is now known as samuelms-away		15:05
tmckay	the other possibility is to do this all at the UI level -- replace args/values before it's ever submitted. But that doesn't improve the situation at the CLI or client level (you still have to set up data_source paths by hand)	15:06
SergeyLukjanov	rharwood, we have no folks who'd like to maintain packages in ubuntu	15:06
egafford	tmckay: Indeed, and I can personally attest that for automators, forcing those additional uuid insertions is a bit more of a pain than is needed.	15:06
SergeyLukjanov	rharwood, if you want to volunteer :)	15:07
tmckay	egafford :) yeah. I don't speak uuid	15:07
rharwood	SergeyLukjanov: thanks. Maybe in the future... my plate is full right now rewriting the puppet module and doing the packstack integration	15:08
egafford	tmckay: I mean, uuids are kinda great; don't get me wrong. Still, being able to just post reliably referential payloads without templating is also kinda great.	15:09
tmckay	egafford, hmm, maybe there is a simple syntactic mechanism. let	15:10
tmckay	let's say there is some prefix, "literal." for the sake of discussion. you could prepend that. Sahara could strip it off and ignore. If you really needed a "literal.foo" in your arg list at hadoop run time, you just double it	15:11
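The prefix alternative could look like this. `literal.` is only the placeholder prefix from the discussion, and `resolve_arg` is a hypothetical helper. One leading prefix is stripped and the remainder is never substituted. An argument that really must be `literal.foo` at run time is therefore written as `literal.literal.foo`.

```python
PREFIX = "literal."  # placeholder prefix from the discussion, not a real Sahara syntax


def resolve_arg(arg, lookup):
    """Strip one escape prefix, or else substitute a data source URL if one matches."""
    if arg.startswith(PREFIX):
        # Escaped: remove exactly one prefix and pass the rest through verbatim.
        return arg[len(PREFIX):]
    url = lookup(arg)
    return arg if url is None else url


sources = {"my_input": "swift://demo.sahara/input"}
for arg in ["my_input", "literal.my_input", "literal.literal.foo", "10"]:
    print(resolve_arg(arg, sources.get))
```

Unprefixed values behave exactly as they would without the escape mechanism, so jobs that never hit a collision are unaffected.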
elmiko	that sounds messy	15:11
tmckay	elmiko, trying to cover the corner case where there is a name collision between a data_source reference and an app argument	15:12
tmckay	unlikely, but I hate holes	15:12
tmckay	you would probably never need it	15:12
elmiko	tmckay: yea, i've been following. it's a tough problem to solve	15:12
egafford	tmckay: Yeah, you could do a prefix syntax (like C#'s @'thing'), but that actually seems messier to me than having the dedicated field list.	15:13
openstackgerrit	Matthew Farrellee proposed openstack/sahara-image-elements: Simplification: wget+rpm -> rpm https://review.openstack.org/139083	15:13
openstackgerrit	Matthew Farrellee proposed openstack/sahara-image-elements: Use Fedora url to get EPEL https://review.openstack.org/139084	15:13
tmckay	easier to construct for the user, though	15:13
SergeyLukjanov	rharwood, thanks!	15:13
tmckay	If your job fails, you'll know why. Run it again, add "@" to the front of the offending fields	15:13
tmckay	fields == args	15:14
SergeyLukjanov	mattf, it sounds like it could fix the job	15:14
tmckay	it may be the best option	15:14
mattf	SergeyLukjanov, it'll simplify things and make it easier to track down	15:14
tmckay	we can't get around this without some amount of mess	15:14
*** hdd has joined #openstack-sahara		15:15
egafford	tmckay: True, but invites infinite turtle collisions in deep corner cases, and looks more odd to the untrained eye. A neophyte walking into the payload with the literal config setting would likely have a better time parsing that than any reasonably non-colliding escape sequence.	15:15
SergeyLukjanov	mattf, thx	15:15
egafford	tmckay, elmiko: Definitely arguments in both directions.	15:15
tmckay	I could go with the literal list. User has choices:	15:16
tmckay	don't use the convenience feature, list your paths and manually add auth configs if you need to. State of the art today	15:16
tmckay	if specifying the literals in the odd collision case is easier than specifying the paths, then use the list	15:17
tmckay	ooo, I had one more idea that limits the corner cases	15:17
tmckay	I was going to support replacement based on name or uuid match. It could be one, or the other, or both. A user could choose replacement only by uuid	15:18
tmckay	no collision on name, because, well, you're not using names	15:18
egafford	Sure, have a single enumerated setting that determines order of precedence (uuid-only, name-only, uuid-first, name-first). Optional, defaults to name-first.	15:20
egafford	Gives the user all the options they need.	15:20
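egafford's single enumerated setting might be modeled as below. The mode strings come from the discussion. `LookupMode`, `find_data_source`, and the two dict "tables" are hypothetical. The table order for each mode decides precedence, so the uuid-only mode sidesteps name collisions entirely.

```python
from enum import Enum


class LookupMode(Enum):
    UUID_ONLY = "uuid-only"
    NAME_ONLY = "name-only"
    UUID_FIRST = "uuid-first"
    NAME_FIRST = "name-first"


def find_data_source(value, by_uuid, by_name, mode=LookupMode.NAME_FIRST):
    """Look value up in the tables in the order the mode dictates; None if no match."""
    order = {
        LookupMode.UUID_ONLY: (by_uuid,),
        LookupMode.NAME_ONLY: (by_name,),
        LookupMode.UUID_FIRST: (by_uuid, by_name),
        LookupMode.NAME_FIRST: (by_name, by_uuid),
    }[mode]
    for table in order:
        if value in table:
            return table[value]
    return None


# The mode would arrive as a plain string config and be parsed by value.
mode = LookupMode("uuid-only")
by_uuid = {"6a1b0f4e-0000-4000-8000-000000000001": "swift://demo.sahara/input"}
by_name = {"my_input": "swift://demo.sahara/input"}
print(find_data_source("my_input", by_uuid, by_name, mode))  # None
```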
elmiko	i need to see this written up or something, i'm starting to have a difficult time following the process	15:20
tmckay	elmiko, absolutely, just thinking out loud. I believe all this is why I didn't tackle this in Juno :)	15:21
egafford	tmckay: Is that what you were thinking?	15:21
tmckay	something like that.	15:21
elmiko	tmckay: like, at this point are you talking about doing away with DataSource objects and just embedding the URIs into the commands sent to the cluster?	15:22
openstackgerrit	Sergey Reshetnyak proposed openstack/sahara-image-elements: Migrate to OpenJDK https://review.openstack.org/138752	15:22
tmckay	elmiko, that already happens, from Sahara -> hadoop/spark/storm etc	15:22
tmckay	elmiko, this is from CLI/client/UI -> Sahara	15:23
elmiko	tmckay: right, but are you talking about just letting the user input the URIs instead of having them create DataSource objects?	15:23
tmckay	elmiko, it bothers me that we have datasource objects encoding locations, but we can't use them in all situations	15:23
tmckay	that's broken	15:23
tmckay	elmiko, no	15:23
tmckay	data source references	15:23
tmckay	elmiko, in actuality, they already can use only URIs.	15:24
tmckay	with the small workaround that they have to make Sahara happy with dummy data sources for certain jobs. I believe the specific user configs will override the autogenerated stuff	15:25
elmiko	wouldn't that be a custom config at that point though?	15:25
tmckay	elmiko, so I could write a Pig job today for instance that is driven strictly by params/configs	15:25
tmckay	yes	15:25
elmiko	right, but i would think you are in power user land at that point	15:25
tmckay	but no, my intention is not to push users toward raw URIs. I think data sources are a good idea, they just have to be usable everywhere, preferably by name	15:26
elmiko	i dunno, i think uuid is better than name	15:27
tmckay	alternatively by id	15:27
tmckay	name is a unique constraint in the data_source table	15:27
tmckay	and it's immediately apparent in a dump of the job exec	15:27
elmiko	sure, but uuid is really easy to validate on input	15:27
egafford	elmiko: Allowing name to serve as the reference allows automators to have a set of payloads that will reference one another without templating, which is nice.	15:28
tmckay	but in this case, replacement is only going to happen if a database search matches a data_source	15:28
tmckay	if it doesn't, the value is left alone. you must have meant something else	15:29
tmckay	name also doesn't get stale if you delete/recreate the data source object. Which could be argued as a benefit, or not	15:30
tmckay	which is why having the user option to use either as a lookup is nice :) Decide for yourself	15:30
elmiko	ok, i need to read the original spec again, this is getting very confusing for me	15:30
elmiko	right, but it sounds like, as you talk through it, that validating the wide range of options for name input is getting sticky	15:31
tmckay	well, the way the spec is currently written, if you have a collision (a literal arg matches a data_source name) then you just don't use the feature.	15:32
tmckay	trying to find a simple way to avoid "just don't use the feature"	15:32
tmckay	I think it's highly unlikely that case comes up	15:32
tmckay	maybe "just don't use it" is enough	15:32
*** tnovacik has joined #openstack-sahara		15:33
tmckay	hmmm, actually -- you could always make another copy of your data source with duplicated path info, new name, and reference that	15:34
tmckay	that's a user workaround.	15:34
tmckay	alright, that's probably enough then. I don't think we need a mechanism to escape literals	15:35
tmckay	if we do, we know at least two ways. 1) a list and 2) prefix	15:36
elmiko	yea, i would need to think about this more before adding anything useful, sorry :/	15:37
tmckay	elmiko, egafford, thanks for indulging me	15:37
elmiko	i like the spec as written, but you are bringing up issues that are making me think twice about it	15:37
tmckay	elmiko, :) I'm convinced that's why it takes longer to do things the more experience you have. Counter-intuitive, you would think it would go faster.	15:38
egafford	tmckay: Seems to've been useful.	15:38
tmckay	elmiko, but the longer we all do this, the better we become at poking holes in our own solutions.	15:39
tmckay	ah, to be blissfully unaware of corner cases, like college	15:40
tmckay	"This is bullet proof!"	15:40
elmiko	tmckay: well, the thing is, now you have me thinking about why not change the job execution json validation to allow lists in the input_id and output_id fields, and then just use those for all jobs	15:41
elmiko	it's a much bigger change, but maybe more appropriate	15:41
*** zhidong has quit IRC		15:41
tmckay	elmiko, may have some utility, but the trouble is that the arg list for Java/Spark is unconstrained	15:42
tmckay	How do we know the order in which to pass the data sources?	15:42
tmckay	elmiko, MyCrazyWordcount takes ... what?	15:42
elmiko	tmckay: yea, that's the confusing part lol	15:43
tmckay	a recipe for muffins, number of slices, input, blog entry, output	15:43
tmckay	could be	15:43
elmiko	tmckay: what if we just start with implementing the spec as written, and then patch as we discover new cases?	15:44
tmckay	elmiko, yeah, I think that's where I ended up. The literal case is the corner case, and there are (slightly unseemly) ways to solve that	15:44
elmiko	i mean, currently it seems simple. parse the arg list, then replace values, if error then raise	15:44
tmckay	yep	15:45
egafford	tmckay: For Pig, as well, the $INPUT field is only convention, really; it's just a kwarg in the end, and could be anything else in theory. Often is in complex scripts.	15:45
tmckay	on error, rename your data sources, change your app, or turn off the feature	15:45
elmiko	tmckay: to start, yea.	15:45
tmckay	egafford, absolutely. I filed a bug about that a long time ago.	15:45
tmckay	We need customizable param names	15:45
egafford	tmckay, elmiko: We can demand convention for "sahara-compliant" scripts, but beyond that, it gets very difficult to actually inject inputs reliably in any case.	15:46
tmckay	egafford, there is a workaround, though.	15:46
tmckay	crobertsrh (wizard man) has been strangely silent during all this	15:46
egafford	tmckay: I'd be interested to hear it. Well, wizard men are a notoriously mysterious people.	15:47
tmckay	this affects you directly you know, pal	15:47
crobertsrh	sorry, wasn't reading along	15:47
tmckay	:) np, just joking	15:47
tmckay	egafford, oh, workaround is satisfy Sahara with dummy data sources then add custom param values to the job submission. Oozie will pass all to the Pig app on the commandline, app will ignore ones it doesn't need. I had to do this.	15:48
tmckay	app used $input and $output	15:48
tmckay	lowercase	15:48
tmckay	you can do this from the UI now	15:49
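Here is the workaround, roughly, as a job execution request body. It assumes the EDP execute payload shape, with `cluster_id`, `input_id`, `output_id` and a `job_configs` dict of `configs`/`params`/`args`. All ids and Swift paths are placeholders, and `build_pig_request` is a hypothetical helper. The dummy data sources exist only to satisfy Sahara. The lowercase `input`/`output` params carry the real paths, which the Pig script reads as `$input` and `$output`.

```python
def build_pig_request(cluster_id, dummy_input_id, dummy_output_id, real_input, real_output):
    """Build an execute payload whose real paths travel as custom Pig params."""
    return {
        "cluster_id": cluster_id,
        # Dummy data sources: present only so the request passes validation.
        "input_id": dummy_input_id,
        "output_id": dummy_output_id,
        "job_configs": {
            "configs": {},
            # Passed through to the Pig script, which reads $input and $output
            # and ignores the $INPUT/$OUTPUT derived from the dummy sources.
            "params": {"input": real_input, "output": real_output},
            "args": [],
        },
    }


request = build_pig_request(
    "cluster-uuid", "dummy-in-uuid", "dummy-out-uuid",
    "swift://demo.sahara/real-input", "swift://demo.sahara/real-output",
)
print(request["job_configs"]["params"])
```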
elmiko	tmckay: i like the simplicity and flexibility of the current spec. imo, if we were to add a prefix for the DataSource objects i would think something along the lines of "sahara://" might make sense, but i think we should burn that bridge when we get to it.	15:50
tmckay	cross that bridge?	15:50
tmckay	or just burn it outright	15:51
elmiko	i meant what i said! ;)	15:51
tmckay	extreme programming paradigm. Just burn it.	15:51
*** Poornima has joined #openstack-sahara		15:52
elmiko	joking aside though, currently a user might use something like "hdfs://datasource" or "swift://container/datasource" in their command line, is that accurate?	15:52
tmckay	well, they can supply the url	15:53
elmiko	so they could also use "http://datasource" ?	15:53
tmckay	for Java wordcount, that is what you have to do	15:53
tmckay	only if http://datasource is a literal path	15:54
elmiko	ok, they could also use "/some/path/to/datasource" ?	15:54
tmckay	right now, in Sahara, that would be legal if the datasource type is hdfs and it would be interpreted in hadoop as a relative path in the hadoop user's hdfs directory	15:56
elmiko	ok	15:56
elmiko	hmm	15:57
tmckay	well, actually, an absolute path in the local hdfs	15:57
tmckay	relative if the leading / is missing	15:57
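This small illustration shows the absolute-versus-relative distinction. A path with a leading `/` is taken from the HDFS root. A path without one is resolved against the user's HDFS home directory, which by default is `/user/<username>`. `qualify_hdfs_path` is a hypothetical helper and the namenode address is a placeholder.

```python
import posixpath


def qualify_hdfs_path(path, user, default_fs="hdfs://namenode:8020"):
    """Turn a bare HDFS path into a full URI the way Hadoop would interpret it."""
    if "://" in path:
        return path  # already a full URI, e.g. swift:// or hdfs://
    if not path.startswith("/"):
        # Relative paths are resolved against the user's home directory.
        path = posixpath.join("/user", user, path)
    return default_fs + path


print(qualify_hdfs_path("/some/path/to/datasource", "hadoop"))
print(qualify_hdfs_path("some/path/to/datasource", "hadoop"))
```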
elmiko	yea	15:57
elmiko	i dunno, now i'm almost thinking that "sahara://datasource_name" makes a certain amount of sense	15:57
tmckay	so the spec is talking about just replacing "my_input" with (pseudo) select path from data_sources where name == my_input	15:58
tmckay	that's the prefix idea, in general. I wouldn't use an http schema, it implies that there could be node/user stuff in there	15:59
tmckay	but "sahara." or "sahara:" I could see	15:59
tmckay	it's the opposite of marking literals. mark the items to be interpreted	15:59
elmiko	i suggest the URI schema because it follows the others like swift:// or hdfs://	16:00
tmckay	yeah, or internal_db://	16:00
tmckay	I think we use that	16:00
tmckay	that's actually what it is	16:00
elmiko	right, so sahara:// makes sense to me	16:00
tmckay	I mean, it's literally in the internal db	16:01
tmckay	small chance that would ever be a literal arg to a hadoop job	16:01
elmiko	plus, if you ever needed some sort of specific info per data source you could use a known pattern like sahara://datasource?extrainfo=foo or some such	16:01
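elmiko's `sahara://` idea marks the values to interpret rather than the literals. Here is a rough sketch using only the standard library's `urllib.parse`. The scheme, the `extrainfo` query key, and the `parse_sahara_ref`/`resolve_marked` helpers are hypothetical. Because the reference is explicit, an unknown name can reasonably be treated as an error instead of being passed through.

```python
from urllib.parse import parse_qs, urlsplit


def parse_sahara_ref(value):
    """Return (data source name, extra info dict) for sahara:// values, else None."""
    parts = urlsplit(value)
    if parts.scheme != "sahara":
        return None
    extras = {k: v[-1] for k, v in parse_qs(parts.query).items()}
    return parts.netloc, extras


def resolve_marked(value, lookup):
    """Substitute explicitly marked references; leave every other value alone."""
    ref = parse_sahara_ref(value)
    if ref is None:
        return value
    name, _extras = ref
    url = lookup(name)
    if url is None:
        raise ValueError("unknown data source: %s" % name)
    return url


sources = {"my_input": "swift://demo.sahara/input"}
print(parse_sahara_ref("sahara://my_input?extrainfo=foo"))
print(resolve_marked("sahara://my_input", sources.get))
print(resolve_marked("my_input", sources.get))
```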
tmckay	true	16:02
elmiko	i dunno, i'm just spit balling. like i said, i like the current spec for its simplicity	16:02
tmckay	good ideas	16:02
elmiko	k, api wg meeting start, i might be distracted	16:03
tmckay	I like the tweak of allowing name, uuid, or both as the lookup key and changing the param from just a boolean to a string	16:04
tmckay	leave everything else the same	16:04
tmckay	elmiko, thanks, I've got enough to go on	16:04
*** Poornima has quit IRC		16:07
*** tellesnobrega_ has joined #openstack-sahara		16:07
*** miqui__ has joined #openstack-sahara		16:15
*** tmckay has quit IRC		16:16
*** clds_ has joined #openstack-sahara		16:16
*** tnovacik has quit IRC		16:18
*** miqui_ has quit IRC		16:18
*** clds has quit IRC		16:18
*** tnovacik has joined #openstack-sahara		16:19
*** mattf is now known as _mattf		16:24
*** tnovacik has quit IRC		16:28
SergeyLukjanov	elmiko, _mattf any good news about oslo sync? :)	16:32
elmiko	SergeyLukjanov: i didn't look any further, i thought _mattf might be getting it. i'll make sure to talk with him today though	16:33
SergeyLukjanov	elmiko, thx!	16:33
*** skolekonov has quit IRC		16:48
*** IvanBerezovskiy has quit IRC		16:51
*** IvanBerezovskiy1 has joined #openstack-sahara		16:51
*** tmckay has joined #openstack-sahara		16:51
*** IvanBerezovskiy1 has left #openstack-sahara		16:56
*** Longgeek_ has joined #openstack-sahara		16:57
*** tellesnobrega_ has quit IRC		17:11
*** Longgeek has quit IRC		17:12
*** tellesnobrega_ has joined #openstack-sahara		17:25
*** samuelms-away is now known as samuelms		17:33
tellesnobrega	the meeting was earlier today right?	17:39
elmiko	tellesnobrega: yea, 1400utc	17:39
tellesnobrega	elmiko, :( missed it	17:40
elmiko	:(	17:40
tellesnobrega	got a little busy and forgot about the time change	17:40
tellesnobrega	next week i will be there	17:40
elmiko	cool, next week is the later time	17:40
*** tellesnobrega_ has quit IRC		17:44
*** tosky has joined #openstack-sahara		17:54
jodah	wow, 6am meeting time now	18:05
elmiko	jodah: every other week	18:06
jodah	ok	18:06
tosky	argh, I didn't realize it was moved earlier today	18:13
openstackgerrit	Matthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator strutils https://review.openstack.org/139140	18:28
openstackgerrit	Matthew Farrellee proposed openstack/python-saharaclient: Updating oslo-incubator https://review.openstack.org/139141	18:28
openstackgerrit	Matthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator cliutils https://review.openstack.org/139142	18:28
openstackgerrit	Matthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator apiclient.exceptions https://review.openstack.org/139143	18:28
openstackgerrit	Matthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator importutils https://review.openstack.org/139144	18:28
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Removed _i18n module, it is not used directly https://review.openstack.org/139146	18:42
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Update oslo-incubator lockutils https://review.openstack.org/139147	18:42
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Update oslo-incubator log https://review.openstack.org/139148	18:42
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Update oslo-incubator policy https://review.openstack.org/139149	18:42
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Update oslo-incubator threadgroup https://review.openstack.org/139150	18:42
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Update oslo-incubator periodic_task https://review.openstack.org/139151	18:42
*** _mattf is now known as mattf		18:49
*** openstackgerrit has quit IRC		18:50
*** openstackgerrit has joined #openstack-sahara		18:50
* mattf flexes and pins sahara ci to the ground		18:50
* mattf tries to remember why we use oslo logging instead of python logging		18:50
mattf	at least one incubating module switched to python logging to remove a dep this time around	18:51
*** Networkn3rd has quit IRC		18:52
*** witlessb has quit IRC		18:56
elmiko	mattf: thanks =)	18:56
*** witlessb has joined #openstack-sahara		18:59
mattf	elmiko, at your service	19:04
elmiko	woot!	19:04
*** tellesnobrega_ has joined #openstack-sahara		19:06
openstackgerrit	Merged openstack/sahara-image-elements: Fix bashate errors https://review.openstack.org/138976	19:12
openstackgerrit	Matthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator strutils https://review.openstack.org/139140	19:17
openstackgerrit	Matthew Farrellee proposed openstack/python-saharaclient: Updating oslo-incubator https://review.openstack.org/139141	19:17
openstackgerrit	Matthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator cliutils https://review.openstack.org/139142	19:17
openstackgerrit	Matthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator apiclient.exceptions https://review.openstack.org/139143	19:18
openstackgerrit	Matthew Farrellee proposed openstack/python-saharaclient: Update oslo-incubator importutils https://review.openstack.org/139144	19:18
*** Longgeek_ has quit IRC		19:25
openstackgerrit	Trevor McKay proposed openstack/sahara-specs: [EDP] Add options supporting DataSource identifiers in job_configs https://review.openstack.org/138809	19:34
tmckay	elmiko, egafford, okay I generalized a bit and rewrote the spec. No reason it shouldn't apply to all job types, and all subitems of job_configs, and allow name or uuid	19:35
tmckay	darn it, didn't change the bp ref	19:36
openstackgerrit	Trevor McKay proposed openstack/sahara-specs: [EDP] Add options supporting DataSource identifiers in job_configs https://review.openstack.org/138809	19:37
*** tellesnobrega_ has quit IRC		19:38
elmiko	tmckay: cool, i'll take a look	19:39
tmckay	blah, whitespace	19:40
tmckay	gerrit, come on, make it green	19:40
tmckay	tox doesn't catch that	19:40
*** tellesnobrega_ has joined #openstack-sahara		19:42
elmiko	don't worry, i got a few things to get at before i can brandish the -1 sword of justice ;)	19:44
tmckay	elmiko, I'll wait and push a single patch with any other fixes	19:45
elmiko	cool	19:45
egafford	I imagine that we're calling the "the user has decided to name their data source a UUID and it collides with another data source's random UUID" case absurd enough not to mention, given that it is absurd enough not to mention.	19:48
egafford	He says, mentioning it.	19:48
elmiko	lol	19:49
elmiko	i think if a user is bold enough to use UUIDs for names then they probably know what they are doing, or are a robot =)	19:50
egafford	Or a particularly dedicated QA engineer, but yeah. You'd pretty much have to want that bug.	19:51
elmiko	seriously dedicated...	19:51
egafford	A massive tangent to this spec: is there a reason why this sort of flexible mapping should only apply to data_sources? job_binary_internal, job_binary, job, and even cluster could all be sanely referenced by name using a similar strategy, with similar benefits.	19:55
* mattf curses the convoluted tox mess on fedora		19:56
elmiko	i dunno, it makes sense to me in terms of sending custom command lines to the processing engines. but in a more general sense i think it breaks OpenStack convention to not use the IDs in the REST API.	19:56
elmiko	or did i miss your point?	19:57
egafford	Not that this spec can't stand without those changes, of course, even if they were deemed desirable. elmiko: Cool, argument by convention is a good argument. No, not at all.	19:57
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Update oslo-incubator log https://review.openstack.org/139148	19:58
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Update oslo-incubator policy https://review.openstack.org/139149	19:58
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Update oslo-incubator threadgroup https://review.openstack.org/139150	19:58
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Update oslo-incubator periodic_task https://review.openstack.org/139151	19:58
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Removed _i18n module, it is not used directly https://review.openstack.org/139146	19:58
openstackgerrit	Matthew Farrellee proposed openstack/sahara: Update oslo-incubator lockutils https://review.openstack.org/139147	19:58
egafford	Allowing name over id increases risk and provides a little ease of use. I can see why you'd only want to allow that at the most transient and repeated point of use.	19:58
openstackgerrit	Michael McCune proposed openstack/sahara-specs: Adding security guidelines documentation spec https://review.openstack.org/139170	19:59
elmiko	egafford: well, and are we talking about using name in a json template for a POST operation?	20:00
tmckay	I'll read back	20:00
tmckay	I'm starting to wonder if there is a better way -- as we alluded to, allow data sources to be lists and then map the elements to parameters, configs, or arg positions	20:01
tmckay	this is another example of stuff growing from a first case (mapreduce) where single input/output was the norm	20:02
*** mattf is now known as _mattf		20:02
egafford	elmiko: Yup, that's what I'm talking about. tmckay: That's dangerous; depending on how your script/job is written, it could get very hairy to try to infer input placement without demanding a lot of conventions.	20:02
elmiko	tmckay: yea, but you pointed out a good issue, namely the need for something more verbose than just a list	20:02
*** openstackgerrit has quit IRC		20:04
tmckay	it would be a map, no inference	20:04
*** openstackgerrit has joined #openstack-sahara		20:04
elmiko	egafford: i suppose as long as you could ensure that names were unique it would be ok, but i think decoupling a name from a unique id is generally a win	20:04
tmckay	you give me all your stuff, and tell me where it goes	20:04
egafford	tmckay: Maps are viable.	20:04
tmckay	inference is part of the trouble we have now ($INPUT for all pig/hive, for example)	20:04
tmckay	I would like to get this right, once. Maybe I'll think of a general overhaul a bit.	20:05
elmiko	my issue with a map is that we add a condition like we have with the configs where it can become very difficult to validate the map	20:05
egafford	tmckay: Absolutely; we're effectively demanding both convention and artificial limitation now, so I'm not arguing against the effort by any means.	20:05
tmckay	I think the spec is an improvement, but I'm not convinced that it can't be better still	20:06
tmckay	but, it needs to fall within the scope for kilo too :)	20:06
*** navesta has joined #openstack-sahara		20:06
elmiko	maybe you could keep it to lists if they were ordered or something, but that places limitations on the application writers to conform to how Sahara lays out the args	20:06
egafford	elmiko: It's difficult to validate, but it's a much harder problem to determine where solely positional input arguments should be placed.	20:06
tmckay	a map structure could be something like this:	20:07
elmiko	egafford: agreed, that's kinda why i like the config substitution method, it avoids the complication of interpolating lists or something...	20:07
tmckay	data source container is a list or a dict, value for each item holds the id of the data source and either an integer or a string	20:08
tmckay	integer is an arg position, string is a parameter or a config name	20:09
tmckay	very rough, but something like that	20:09
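The rough map structure proposed above can be sketched as a routing step. This is a minimal illustration under stated assumptions, not Sahara code. The function name, the (data source id, target) tuple shape, and the rule that string targets become params for Pig/Hive but configs for other job types are all hypothetical. Integer targets are treated as indexes into the final merged args list, which is one possible answer to the merge question raised just below.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical entry shape: (data source id, target). An int target is a
# position in the final args list; a str target is a param or config name.
Target = Union[int, str]


def apply_data_source_map(
    job_type: str,
    entries: List[Tuple[str, Target]],
    urls: Dict[str, str],
    args: List[str],
) -> Tuple[List[str], Dict[str, str], Dict[str, str]]:
    """Route each data source URL to an arg slot, a param, or a config."""
    merged_args = list(args)
    params: Dict[str, str] = {}
    configs: Dict[str, str] = {}

    # Insert in ascending position order so that, for distinct in-range
    # positions, each int is the index the URL ends up at in the merged list.
    positional = sorted(
        (target, ds_id) for ds_id, target in entries if isinstance(target, int)
    )
    for position, ds_id in positional:
        merged_args.insert(position, urls[ds_id])

    # Assumption: Pig/Hive take named params; other job types take configs.
    for ds_id, target in entries:
        if isinstance(target, str):
            bucket = params if job_type in ("Pig", "Hive") else configs
            bucket[target] = urls[ds_id]

    return merged_args, params, configs


# Usage with placeholder ids and URLs.
urls = {
    "ds-in": "hdfs:///user/demo/in",
    "ds-out": "hdfs:///user/demo/out",
    "ds-lookup": "hdfs:///user/demo/lookup",
}
args, params, configs = apply_data_source_map(
    "Pig",
    [("ds-in", "INPUT"), ("ds-out", "OUTPUT"), ("ds-lookup", 0)],
    urls,
    ["-verbose"],
)
```

Nothing is inferred here: every data source carries an explicit destination, which is the "you give me all your stuff, and tell me where it goes" model.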
egafford	The integer value will intersect with the args list in what way? Slice notation?	20:09
tmckay	I was thinking just a position. But you need some sane way to merge that with other args	20:10
tmckay	maybe we should make the user do everything	20:11
tmckay	maybe trying to help at all is too much	20:11
elmiko	tmckay: i like what you're proposing, but it smells complicated	20:11
tmckay	maybe the UI should simply allow you to pop open a data source and copy/paste strings	20:11
tmckay	everything else is up to you	20:11
tmckay	wizard be darned	20:12
elmiko	tmckay: +1 about letting the user control, that's why i like the current spec with substitution. it gives the user a lot of freedom.	20:12
egafford	tmckay, elmiko: I agree with elmiko here. I think that in the map case, it works, but that in the positional case, it's too error-prone and power-user.	20:12
crobertsrh	Keep taking the wizard in vane will you?	20:15
* crobertsrh adds a smite button to UI		20:16
tmckay	:) My head is hurting trying to be all things to all people	20:16
tmckay	somewhere there is a line	20:16
egafford	Honestly, it seems to me that trying to provide any field definition for inputs and outputs is trying to assume a level of consistent abstraction that just doesn't exist across the engines we're trying to support. Referencing data sources as persistent sahara constructs is great, but it seems to me as though config/arg/param substitution is the only way to be all the things.	20:17
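The config/arg/param substitution approach described above can be sketched as a single pass over user-supplied values. Assumptions: the `datasource://` prefix, the function names, and whole-value matching (no references embedded mid-string) are hypothetical choices for illustration, not a confirmed Sahara convention. The point of the design is that Sahara only resolves references; where each value lands stays entirely in the user's hands.

```python
from typing import Dict, List, Tuple

# Hypothetical reference prefix; not a confirmed Sahara convention.
REF_PREFIX = "datasource://"


def substitute(value: str, urls_by_name: Dict[str, str]) -> str:
    """Replace a whole-value reference like 'datasource://logs' with its URL."""
    if value.startswith(REF_PREFIX):
        name = value[len(REF_PREFIX):]
        if name not in urls_by_name:
            raise KeyError(f"unknown data source: {name}")
        return urls_by_name[name]
    # Values without the prefix pass through untouched.
    return value


def substitute_all(
    configs: Dict[str, str],
    params: Dict[str, str],
    args: List[str],
    urls_by_name: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, str], List[str]]:
    """Apply the same substitution uniformly to configs, params, and args."""
    return (
        {k: substitute(v, urls_by_name) for k, v in configs.items()},
        {k: substitute(v, urls_by_name) for k, v in params.items()},
        [substitute(a, urls_by_name) for a in args],
    )
```

Because the same rule applies to all three places, it works the same way for any engine, without Sahara needing to know how a given engine consumes its inputs.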
elmiko	egafford: +1	20:18
tmckay	agreed. So then the next question is, if we have this substitution mechanism, what do we do with the simple case that we have today? Pig job, input, output, Sahara maps it for you	20:18
tmckay	scrap it, and the wizard can ask you questions, I suppose	20:18
elmiko	tmckay: that's a good question	20:18
tmckay	Pig Wizard (I like that)	20:18
egafford	pig wizard: +1	20:19
tmckay	Pig Wizard says, "oh, I see you selected a data source. What parameter name would you like that to map to in your pig script?"	20:19
tmckay	because that's the real question	20:19
tmckay	if it can ask that ^^ we are all set	20:19
tmckay	UI just sends down a parameter list with the pig script var names and the data source references	20:19
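Here is a minimal sketch of what the backend might do with the Pig Wizard's output, assuming the UI sends a hypothetical mapping of Pig script variable name to data source id. It produces `NAME=VALUE` strings, the form Pig's `-param` command-line option accepts. The function names and input shapes are illustrative only.

```python
from typing import Dict, List


def pig_params(var_to_source: Dict[str, str], urls: Dict[str, str]) -> List[str]:
    """Turn a hypothetical {pig_var: data_source_id} mapping into NAME=URL strings."""
    # Sorted by variable name so the output is deterministic.
    return [f"{var}={urls[ds_id]}" for var, ds_id in sorted(var_to_source.items())]


def pig_cli_args(var_to_source: Dict[str, str], urls: Dict[str, str]) -> List[str]:
    """Prefix each NAME=URL with -param, as on a `pig` command line."""
    out: List[str] = []
    for param in pig_params(var_to_source, urls):
        out.extend(["-param", param])
    return out
```

Today's fixed $INPUT/$OUTPUT behavior is just the two-entry mapping `{"INPUT": ..., "OUTPUT": ...}`. Supporting N data sources is the same call with more keys.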
tmckay	the whole "one input one output" thing is toast	20:20
tmckay	crobertsrh, ^^ okay read this one	20:20
tmckay	;-)	20:20
elmiko	tmckay: ooph, talk about can of worms... that sounds like it could get crazy complicated in little to no time.	20:20
elmiko	i think we almost have to err on the side of allowing the user to control all those mappings	20:21
tmckay	elmiko, I think the complication is on the wizard side. The mechanism is simple.	20:21
elmiko	tmckay: that's kinda what i meant	20:21
tmckay	elmiko, sure. If you want to go wizardless, you can do it all. But the wizard is supposed to hold your hand.	20:21
tmckay	right?	20:21
egafford	A decent first step would be to just request the pig arg name as a string and trust the user to map it, if the complexity is seen as too high.	20:21
elmiko	tmckay: well, maybe the wizard can only hold your hand if you need 1 input/1 output?	20:22
elmiko	i think it's perfectly acceptable to assume that the wizard can only perform certain types of actions	20:22
tmckay	elmiko, hmmm. If you know the script you're running, you know the expected var names. Sahara actually doesn't, it just guesses.	20:23
tmckay	if you can specify 1 data source, why not N? The only difference is that currently we assume $INPUT and $OUTPUT for the standard job	20:24
crobertsrh	Oh dear. All things "wizard" are completely theoretical at this point. I say design something that is usable and reasonably flexible for now. The UI will require magic no matter which way we go.	20:24
tmckay	well, you might know the var names. If you don't, you could look :)	20:24
tmckay	I'm searching for the grand unified theory of data sources	20:25
tmckay	too lofty	20:25
*** navesta has quit IRC		20:25
crobertsrh	Of course the grand unified data source theory according to mattf is "get rid of them"	20:25
elmiko	hmm	20:25
elmiko	lol	20:25
tmckay	yeah, that goes back to the "fill in the box with the url, dude" model	20:26
tmckay	less is more	20:26
elmiko	tmckay: +1 to less is more	20:27
tmckay	I've spammed openstack-sahara quite a bit today. Sorry	20:27
elmiko	tmckay: the grand unified solution would be awesome, but i think we should go with the simple, straightforward solution to start with	20:27
elmiko	why apologize, it's usually so quiet in here you could hear a mouse...	20:27
egafford	I actually really like the data sources as an option. It makes sense to me that they exist, and can be managed and encapsulated. Lets you change out your entire data store by changing one thing.	20:28
crobertsrh	It's stuff we need to really figure out	20:28
elmiko	crobertsrh: +1	20:28
tmckay	crobertsrh, what if we made data sources not required for Pig/Hive/MapReduce?	20:28
crobertsrh	And just use args?	20:29
crobertsrh	or params/configs?	20:29
egafford	Projects have spent tons of money to make changing out their entire data store possible, with much less likelihood of that happening, and with much less success than we already have.	20:29
tmckay	lifting that restriction, coupled with substitution would add some flexibility	20:29
crobertsrh	If it helps things, sure.	20:30
tmckay	I'll noodle that some more. I	20:30
elmiko	tmckay: that might be nice, shouldn't be too tough to mock up and try out	20:30
tmckay	I'm thinking of a split between "hey let me do this for you" and "I am Hadoop! Leave me alone"	20:30
tmckay	you choose	20:30
tmckay	today, if you want to do it all, for Pig/Hive/MapReduce you still have to pass dummy data_sources. Blah.	20:31
elmiko	imo, in regards to the grand unified thingie, it would be really cool to allow the input_id and output_id fields to allow lists, and then just make it explicit how Sahara will generate the args so that users can always expect what will happen.	20:31
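The list-valued input_id/output_id idea hinges on a fixed, documented rule for generating args. Here is a sketch under a hypothetical rule: user args first, then input URLs, then output URLs, each in the order given. It also accepts today's single id, so existing requests keep working. The names and the ordering rule are assumptions, not current Sahara behavior.

```python
from typing import Dict, List, Union

IdOrIds = Union[str, List[str], None]


def as_list(value: IdOrIds) -> List[str]:
    """Accept today's single id, a list of ids, or nothing."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def build_args(
    user_args: List[str],
    input_id: IdOrIds,
    output_id: IdOrIds,
    urls: Dict[str, str],
) -> List[str]:
    """Hypothetical fixed rule: user args, then inputs, then outputs."""
    inputs = [urls[i] for i in as_list(input_id)]
    outputs = [urls[o] for o in as_list(output_id)]
    return [*user_args, *inputs, *outputs]
```

The value of a fixed rule is predictability: a job author can read the rule once and know exactly which argv index each data source lands at.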
openstackgerrit	Andrew Lazarev proposed openstack/sahara: Update sahara.conf.sample after oslo.msg release https://review.openstack.org/139046	20:31
tmckay	yes	20:31
elmiko	tmckay: agreed on the blah...	20:31
tmckay	my dog is completely unaware of this conversation	20:32
egafford	tmckay: So at the moment, are we driving toward removing the data_source fields on the job_execution post, and only providing config/arg/param substitution, or some kind of map-based payload field / job config hybrid?	20:32
elmiko	tmckay: lol same here =)	20:32
egafford	Your dog has a much fuller and emptier life than we do, depending on your metrics.	20:33
tmckay	egafford, we might be. I like the spec, I think we agree on it. Next question is whether or not there is something else we can do to ease usage and increase consistency	20:33
elmiko	egafford: lol, totally	20:33
tmckay	without breaking backward compatibility	20:34
tmckay	ie, if we do do something, is there an alembic migration for it, etc etc	20:34
elmiko	tmckay: i think taking out the mandatory inclusion of data sources for pig/hive/mr might be a nice spec after the substitution gets implemented	20:34
tmckay	Probably worth spending another half a day thinking about	20:34
egafford	tmckay: +1 for incrementalism.	20:34
tmckay	elmiko, yeah, just relaxing the constraint helps	20:35
tmckay	baby steps	20:35
elmiko	tmckay: +1 to baby steps	20:35
tmckay	alright guys, thanks, I'll shut up now, eat a snack, and write some cod	20:35
tmckay	well, code	20:35
egafford	Don't fix that... too late.	20:35
tmckay	I don't write cod. They squirm	20:35
elmiko	on a different topic, check this out https://github.com/stackforge/anchor	20:35
tmckay	ooo	20:36
elmiko	Anchor is a test project to provide ephemeral PKI for openstack, pretty cool	20:36
tmckay	sounds like Summit	20:36
tmckay	is this the talk we saw?	20:36
elmiko	yea	20:36
tmckay	I was just thinking about Barbican today	20:36
elmiko	it's an implementation of it	20:36
elmiko	the sec guys are talking about possibly working Anchor into Barbican somehow	20:36
elmiko	or letting Barb leverage Anchor	20:37
tmckay	still mulling how to make Oozie more secure in the back of my mind. Hadoop -> Barbican ... ??	20:37
elmiko	right	20:37
elmiko	Hadoop -> proxied kerb (Barb or Keystone)	20:37
tmckay	could we issue one set of creds to Hadoop, to allow access to Barbican, and then pull a key or cert per job?	20:37
elmiko	still trying to figure that one out	20:37
tmckay	that would be ideal	20:38
elmiko	what i'd like to see is something like this...	20:38
elmiko	sahara generates a secret for the proxy user and stores it in Barb	20:38
elmiko	then the nodes use one of their keys (known to the controller) to access the secret in Barb	20:39
elmiko	that way sahara distributes nothing	20:39
tmckay	the only way you can break that is if you have a key to one of the nodes	20:39
tmckay	but if you have that, you can do anything -- replace hadoop binaries with your own stuff, etc	20:39
elmiko	right	20:39
tmckay	so you're no worse off	20:39
elmiko	exactly	20:40
tmckay	sounds awesome	20:40
elmiko	i brought this up at summit with the Barb guys, they were actually kinda interested in our use case	20:40
tmckay	gets rid of the oozie issues, solves it for spark too	20:40
tmckay	well, anything that uses the hadoop plugin	20:40
elmiko	yea that's what i was hoping, but it does add complexity to the hadoop-openstack component	20:41
tmckay	that's okay, imho	20:41
elmiko	it just stresses the need for us to maintain and release that though	20:41
tmckay	yes, I think so	20:42
tmckay	brb	20:42
crobertsrh	Ok. Editing of node group templates is done on the UI side :) Doing the UI stuff before the backend is done (or started) kinda feels like 1) UI work 2) yada yada yada 3) done.	20:48
tmckay	heh	20:50
elmiko	crobertsrh: unfortunately the "yada yada yada" step contains the really tough stuff =(	20:50
crobertsrh	Exactly :)	20:51
tmckay	you know, one more wrinkle in this data source thing I didn't mention	20:51
tmckay	deletion constraints	20:51
elmiko	also, nice Seinfeld reference lol	20:51
crobertsrh	I'm trying to ice away as much of the UI stuff for Kilo as possible before the wizzerful wizard work takes off.	20:51
elmiko	crobertsrh: smart thinking	20:51
tmckay	you can't delete a datasource referenced by a jobexec, I believe	20:51
crobertsrh	ty	20:51
elmiko	tmckay: i think that's correct	20:52
crobertsrh	ty on the Seinfeld ref.....actually ty on the "smart thinking" bit too :)	20:52
tmckay	so if we leave name or uuid references in the jobexec, that restriction is not enforced.	20:52
elmiko	crobertsrh: i feel like all the UI stuff from Juno was a big learning lesson	20:52
tmckay	unless we add a column that stores a list of ids	20:52
crobertsrh	Oh....while we're thinking of editing things and data sources....any chance that we need data source edit?	20:52
elmiko	tmckay: i would think that if we leave the non-deletion behavior in place that we might need to do the substitution check on deletion or something?	20:53
crobertsrh	UI for Juno was close to == the merge process.......no shortage of learning took place	20:53
elmiko	crobertsrh: i would think yes, edit all the things!	20:53
*** tnovacik has joined #openstack-sahara		20:54
tmckay	maybe. I was thinking the refs would stay in the job exec, but maybe if we wrote the changes through --- there wouldn't be any need to retain the data source	20:54
tmckay	deletion wouldn't matter	20:54
elmiko	tmckay: maybe we need to think about another level of indirection here. like a database object that models a relationship between a job execution and a data source?	20:54
tmckay	it's not like you can edit a data source now (can you?)	20:54
tmckay	ooo, nice idea	20:54
elmiko	tmckay: then we could have several relationship objects that model how they relate to a single execution. plus then we can perform ops on the relations	20:55
elmiko	i dunno, might be too complicated. again, just spitballing	20:55
crobertsrh	extra complicated if you start allowing data sources to be edited	20:56
elmiko	well, the relationship wouldn't need to change, just the data source	20:56
crobertsrh	I think my head is starting to hurt a bit. I should stick to UI stuff :)	20:57
elmiko	lol	20:58
elmiko	no ducking out now ;)	20:58
tmckay	well, if I write the changes through (I said I wouldn't) then there should be no expectation of a deletion constraint. But we're back to the inconsistency with the fixed input/output fields	20:59
crobertsrh	anything I can't figure out in the UI is clearly "an API problem"	20:59
tmckay	I bet there is a sqlalchemy way to store a list of foreign ids	20:59
tmckay	and have the constraint enforced	20:59
elmiko	crobertsrh: lol, nice!	20:59
elmiko	tmckay: i think so, some sort of many-to-one relation?	21:00
tmckay	yeah	21:00
tmckay	secondary problem, I guess	21:00
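One way to store a "list of foreign ids" and still have the constraint enforced is an association table. It also models the relationship object suggested above. Strictly, it is a many-to-many link, since one data source can be shared by several job executions. The sketch below uses the standard library's sqlite3 with a hypothetical schema (not Sahara's). In SQLAlchemy the rough equivalent would be a secondary table whose `ForeignKey` columns set `ondelete="RESTRICT"`, with the database doing the enforcement. SQLite, for instance, only enforces foreign keys once they are switched on.

```python
import sqlite3

# Hypothetical schema, not Sahara's: one row per (job execution, data source) link.
SCHEMA = """
CREATE TABLE data_sources (id TEXT PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE job_executions (id TEXT PRIMARY KEY);
CREATE TABLE job_execution_data_sources (
    job_execution_id TEXT NOT NULL
        REFERENCES job_executions(id) ON DELETE CASCADE,
    data_source_id TEXT NOT NULL
        REFERENCES data_sources(id) ON DELETE RESTRICT,
    PRIMARY KEY (job_execution_id, data_source_id)
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def delete_data_source(conn: sqlite3.Connection, ds_id: str) -> bool:
    """Return True if deleted, False if a job execution still references it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM data_sources WHERE id = ?", (ds_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Deleting a job execution cascades to its link rows. After that, the data source it referenced can be removed.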
*** tellesnobrega_ has quit IRC		21:14
*** ViswaV has joined #openstack-sahara		21:20
openstackgerrit	Merged openstack/sahara-image-elements: Simplification: wget+rpm -> rpm https://review.openstack.org/139083	21:33
openstackgerrit	OpenStack Proposal Bot proposed openstack/sahara: Updated from global requirements https://review.openstack.org/139209	21:34
openstackgerrit	Merged openstack/sahara-image-elements: Use Fedora url to get EPEL https://review.openstack.org/139084	21:35
*** crobertsrh is now known as _crobertsrh		22:07
*** tellesnobrega_ has joined #openstack-sahara		22:16
*** hdd has quit IRC		22:19
*** hdd has joined #openstack-sahara		22:20
openstackgerrit	Michael McCune proposed openstack/sahara-specs: Adding security guidelines documentation spec https://review.openstack.org/139170	22:22
*** Longgeek has joined #openstack-sahara		22:25
*** egafford has quit IRC		22:30
*** Longgeek has quit IRC		22:30
*** ViswaV has quit IRC		22:37
openstackgerrit	OpenStack Proposal Bot proposed openstack/sahara: Updated from global requirements https://review.openstack.org/139209	22:43
*** ViswaV has joined #openstack-sahara		22:49
*** ViswaV_ has joined #openstack-sahara		22:51
*** ViswaV has quit IRC		22:54
*** miqui__ has quit IRC		22:54
openstackgerrit	Andrew Lazarev proposed openstack/sahara: Disabled requiretty in cloud-init script https://review.openstack.org/138942	23:18
*** jamielennox has joined #openstack-sahara		23:24
jamielennox	hey all, 3 +2s on https://review.openstack.org/#/c/138211/ - can someone kick it off for me?	23:25
*** Wenjie has joined #openstack-sahara		23:36
*** witlessb has quit IRC		23:38
*** Networkn3rd has joined #openstack-sahara		23:54

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!