Truncates long streaming job confs #1773
Merged
Description
Adds
stream.jobconf.truncate.limit=20000
to the jobconfs of streaming jobs. The 20000 limit is configurable via the jobconf_truncate task property.
Motivation and Context
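As a hypothetical illustration of overriding the default (the runner section name and option placement here are assumptions, not taken from this PR), an mrjob.conf snippet might look like:

```yaml
# mrjob.conf -- hypothetical example; exact placement of the option is assumed
runners:
  hadoop:
    jobconf_truncate: 40000   # raise the per-value truncation limit from 20000
```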
When running a streaming job with many inputs, the job conf gets passed to
each mapper. This can cause an "Argument list too long" exception from passing
too many arguments to the mapper. The full job conf is not actually needed by
the mapper, so it's safe to truncate. 20000 is recommended as a safe value for
this truncation in
http://aajisaka.github.io/hadoop-project/hadoop-streaming/HadoopStreaming.html#What_do_I_do_if_I_get_a_error7_Argument_list_too_long
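The effect of the truncation limit can be sketched in Python: each jobconf value is capped at the limit before being passed along, so oversized values (e.g. a huge comma-separated input-path list) no longer blow out the mapper's argument list. This is a minimal sketch of the idea, not the actual Hadoop or mrjob implementation; the function name is made up.

```python
TRUNCATE_LIMIT = 20000  # safe value recommended by the Hadoop streaming FAQ


def truncated_jobconf(jobconf, limit=TRUNCATE_LIMIT):
    """Return a copy of a jobconf dict with each string value capped at
    `limit` characters, mirroring what stream.jobconf.truncate.limit does
    for streaming jobs (hypothetical helper for illustration only)."""
    return {key: value[:limit] if isinstance(value, str) else value
            for key, value in jobconf.items()}
```

With this cap in place, a 30000-character input-path value would be cut to 20000 characters, which is why jobs that previously failed with "Argument list too long" can run.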
Have you tested this? If so, how?
Ran an MR job that failed without this change; it now runs fine.