-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-25200][YARN] Allow specifying HADOOP_CONF_DIR as spark property #22289
Closed
Closed
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
f285081
read hadoop config dir from spark property
ae41a66
Merge remote-tracking branch 'upstream/master'
503c985
renames
5622511
test
d60736a
remove SPARK_TEST_HADOOP_CONF_DIR
d690b26
move conf.dir to end of classpath
bed3f44
use right prop in test
f8e828c
rename to hadoop.conf.dir
8bea68b
Revert "rename to hadoop.conf.dir"
3680c12
rename
df45ff9
flatten + fix test
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering how do we update the classpath to change to another hadoop confs with InProcessLauncher? Seems the classpath here is not changeable after JVM is launched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On another note, is this meant to extend to other resource-managers? As Kubernetes assumes only the ENV
HADOOP_CONF_DIR
, but if such a change is desirable this would cause a slight re-work of the current Hadoop Conf Files mounting logic.i.e.
"spark.mesos.conf.dir"
and"spark.kubernetes.conf.dir"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jerryshao My understanding is that this method is not used by the InProcessLauncher. So instead, the caller of InProcessLauncher has to make sure that the conf files are available to hadoop's Configuration class in the YarnClusterApplication. For example, by adding the config files to the calling thread's context class loader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saisai's question about the classpath configuration is actually the most complicated part of this feature. I haven't fully thought about how they would play out, but I really don't think it's as simple as appending this new config to the classpath.
e.g. what is the expectation if you run "spark-shell" with this option? Do you end up using the config from the env variable or from the config? If you have both, and you reference a file in
--files
that is on an HDFS namespace declared in thehdfs-site.xml
from the config, what will happen? (Answer: it will be ignored, since that is being masked by thehdfs-site.xml
from the env variable.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is quite tricky. My expectation is that it would behave the same way as if you pointed
HADOOP_CONF_DIR
andYARN_CONF_DIR
to different directories that both containhdfs-site.xml
. Files inHADOOP_CONF_DIR
would take precedence (as far as I know, nothing prevents this from happening). So with this new config, the order of preference would beHADOOP_CONF_DIR
,YARN_CONF_DIR
, thenspark.yarn.conf.dir
.Perhaps I could clarify this in the docs, but let me know what you think about it, I'm happy to implement other resolutions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vanzin Did you have time to think about how this config should work?