[WIP] Add Hadoop configuration files to driver pod. #190
Conversation
The idea is here, but it is still missing adding the configuration files to the executors, as well as tests.

@kimoonkim do you often see credentials in your …

@foxish are k8s config maps a safe enough way to send these files to the driver pod (pre-launch), or, if these contain sensitive materials, should we be using secrets instead?
.filter { file =>
  val isFile = file.isFile
  // We could theoretically make a way to load directories, but this isn't supported by YARN
  // and simplifies the logic greatly. Might be worth considering in a future iteration.
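To illustrate the filter being reviewed here, a minimal self-contained sketch (object and method names are hypothetical, not from this PR) that keeps only regular files from a conf directory and drops subdirectories:

```scala
import java.io.File
import java.nio.file.Files

object ConfFileLister {
  // Keep only regular files; directories and anything else are dropped,
  // mirroring the `file.isFile` filter in the diff above.
  def regularFilesIn(dir: File): Seq[File] =
    Option(dir.listFiles()).getOrElse(Array.empty[File]).filter(_.isFile).toSeq

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("hadoop-conf").toFile
    Files.createFile(new File(dir, "core-site.xml").toPath)
    new File(dir, "subdir").mkdir()
    // prints core-site.xml -- the subdirectory is filtered out
    println(regularFilesIn(dir).map(_.getName).mkString(","))
  }
}
```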
I think adding `/etc/hadoop/conf` to the classpath makes `new Configuration(true)` load defaults from files in that directory. Would have to verify that.
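The classpath behavior described above can be checked with plain JDK machinery. This sketch (names are hypothetical) shows that a file becomes discoverable as a classpath resource only once its directory is on a loader's classpath, which is the same lookup mechanism Hadoop's `Configuration` uses to find `core-site.xml` and friends:

```scala
import java.io.File
import java.net.URLClassLoader
import java.nio.file.Files

object ClasspathConfCheck {
  // A resource is visible to a ClassLoader only if its directory is on
  // the classpath; null parent keeps the check isolated to `dir`.
  def resourceVisible(dir: File, resource: String): Boolean = {
    val loader = new URLClassLoader(Array(dir.toURI.toURL), null)
    loader.getResource(resource) != null
  }

  def main(args: Array[String]): Unit = {
    val confDir = Files.createTempDirectory("conf").toFile
    Files.createFile(new File(confDir, "core-site.xml").toPath)
    println(resourceVisible(confDir, "core-site.xml"))  // true
  }
}
```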
    mountPath: String): MountedHadoopConfiguration = {
  val confFilesContents: Map[String, String] = sys.env.get(confEnvVar)
    .map(new File(_))
    .filter(_.isDirectory)
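The shape of this lookup can be sketched end to end. In the real diff the path comes from `sys.env.get(confEnvVar)`; here it is a parameter so the sketch is self-contained, and the object name is hypothetical:

```scala
import java.io.File
import scala.io.Source

object ConfDirReader {
  // Resolve an optional path (e.g. the value of HADOOP_CONF_DIR) to a
  // directory, then read each regular file into a name -> contents map,
  // the shape a ConfigMap's `data` field expects.
  def confFileContents(path: Option[String]): Map[String, String] =
    path
      .map(new File(_))
      .filter(_.isDirectory)
      .map { dir =>
        Option(dir.listFiles()).getOrElse(Array.empty[File])
          .filter(_.isFile)
          .map { f =>
            val src = Source.fromFile(f)
            try f.getName -> src.mkString
            finally src.close()
          }
          .toMap
      }
      .getOrElse(Map.empty)
}
```

If the env var is unset, or points at something that isn't a directory, the result is simply an empty map.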
Would you mind popping an INFO log line in here somewhere saying something along the lines of `Sending the contents of /path/to/dir as HADOOP_CONF_DIR to the driver`?
@ash211 The majority of the time, YARN uses them for storing cluster-wide information, and I think that's the context here as well. If so, I don't think we see a lot of sensitive information in these files. (@ssuchter would know more.) Some uneducated users might mistake these for storing job-specific information, which could potentially be sensitive, but I am not sure what we can do about that.
I agree. The *-site.xml files generally don't have sensitive information. Sometimes they contain pointers to other files (with more restricted permissions) that do contain sensitive information. However, that use case doesn't really apply to us here, because of the difference between short-lived Docker containers and long-lived hosts.

The job configs sometimes have semi-sensitive information, and sometimes credentials, but good sites avoid that.
Based on that, I think it's safe to defer on confidentially transferring these files to the pods -- in practical terms, config maps are probably the right choice here, and k8s secrets or more intricate mechanisms are not worth the effort.
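For illustration only (none of these names come from this PR), a ConfigMap carrying the contents of a Hadoop conf dir might look like the following; for credential-bearing files the same `data` shape with `kind: Secret` would apply instead:

```yaml
# Illustrative sketch -- names and values are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-app-hadoop-conf
data:
  core-site.xml: |
    <configuration>...</configuration>
  hdfs-site.xml: |
    <configuration>...</configuration>
```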
getMountedHadoopConfiguration(
  confEnvVar = "HADOOP_CONF_DIR",
  configMapName = s"$kubernetesAppId-hadoopConfDir",
  mountPath = "/etc/hadoop/hadoopConfDir"),
Why not `/etc/hadoop/conf`? I think it's the default Hadoop conf dir for most Hadoop distros.
  logWarning(s"Failed to list files from $confEnvVar at ${files._1.getAbsolutePath}")
}
isNull
}.flatMap(_._2.toSeq)
Got a Scala compilation error here.
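One way to make this kind of null check compile cleanly is a hedged sketch along these lines (helper name is hypothetical): `File.listFiles()` returns `null` when the directory can't be read, so wrapping it in `Option` avoids testing a raw null inside a filter chain:

```scala
import java.io.File

object SafeListing {
  // `listFiles()` returns null for unreadable or nonexistent directories;
  // Option(...) turns that into None so the warn-and-skip path is explicit.
  def filesOrWarn(dir: File, warn: String => Unit): Seq[File] =
    Option(dir.listFiles()) match {
      case Some(files) => files.toSeq
      case None =>
        warn(s"Failed to list files at ${dir.getAbsolutePath}")
        Seq.empty
    }
}
```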
Neither of them is really opaque, but I'd go with secrets rather than config maps for credentials.
Will re-do this on top of submission v2.
Closes #130