Reading Config from SymLink #2

Open
bradleyfay opened this issue Dec 1, 2016 · 3 comments

Comments

@bradleyfay

I'm working with the Sparklyr package and trying to write a custom config.yml file.

As part of my workflow, I work across multiple devices on a Linux system. To share a single config.yml file, I'm trying to symlink it from one drive to another. Once the symlink is in place, the config package is not able to read the file through the symlink.

Has anyone else run into this problem?

@bradleyfay

Also, I'm pretty sure this isn't traversing the directory correctly.

I tested this by putting an actual file, not a symlink, in a parent directory. The package is unable to find that file either.

@manselmi

manselmi commented Dec 1, 2016

@bradleyfay: If you pass sparklyr::spark_config either an absolute path or a path relative to R's current working directory (more on this below), then the underlying config::get call should be able to find and read your configuration file, provided the file (or a symlink to the actual file) exists at the specified location.
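One way to make sure the path handed over really exists is to resolve it up front. The following is a minimal sketch using only base R; `checked_config_path` is a hypothetical helper, not part of sparklyr or config, and the final `spark_config(file = ...)` call is shown only as assumed usage.

```r
# Hypothetical helper (not part of sparklyr or config): fail loudly when the
# requested configuration file is missing instead of relying on any fallback.
checked_config_path <- function(path) {
  # mustWork = TRUE raises an error if `path` does not exist. Symlinks are
  # followed, so a link that points at a real file is accepted.
  normalizePath(path, mustWork = TRUE)
}

# Throwaway demo directory with one config file in it.
demo_dir <- tempfile("cfg-check-")
dir.create(demo_dir)
cfg <- file.path(demo_dir, "config.yml")
writeLines("default:\n  answer: 42", cfg)

resolved <- checked_config_path(cfg)

# A missing file produces an error, which is turned into NULL here.
absent <- tryCatch(
  checked_config_path(file.path(demo_dir, "nope.yml")),
  error = function(e) NULL
)

# Assumed usage, not run here: sparklyr::spark_config(file = resolved)
```

Because the check happens before the path reaches sparklyr, a typo in the path surfaces as an error rather than as a different file being loaded.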

However, I am surprised that sparklyr::spark_config relies on config::get to locate the configuration file. For example, if you have a config file at /tmp/config.yml, do not have one at /tmp/subdir/config.yml, and call config::get("/tmp/subdir/config.yml"), then /tmp/config.yml is loaded with no error or warning shown to the user. This behavior of config::get is documented, so it isn't a config::get bug. In my view it is a sparklyr::spark_config bug, because sparklyr does not warn users about this surprising behavior. @bradleyfay, I suggest you open an issue about this on sparklyr's issue tracker.
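The silent fallback can be illustrated without the config package. This is a rough sketch in base R of a "look in parent directories" search; `find_config_upward` is a hypothetical function written for illustration and is not the package's implementation. A temporary directory stands in for /tmp.

```r
# Hypothetical helper: look for `file`, then for a file of the same name in
# each parent directory. It imitates the documented fallback in spirit only.
find_config_upward <- function(file) {
  dir <- dirname(normalizePath(file, mustWork = FALSE))
  name <- basename(file)
  repeat {
    candidate <- file.path(dir, name)
    if (file.exists(candidate)) return(candidate)
    parent <- dirname(dir)
    if (identical(parent, dir)) return(NULL)  # reached the filesystem root
    dir <- parent
  }
}

# Layout: <root>/config.yml exists, <root>/subdir/config.yml does not.
root <- tempfile("cfg-demo-")
dir.create(file.path(root, "subdir"), recursive = TRUE)
writeLines("default:\n  answer: 42", file.path(root, "config.yml"))

requested <- file.path(root, "subdir", "config.yml")
found <- find_config_upward(requested)

print(file.exists(requested))  # the requested file itself is absent
print(found)                   # the copy one level up is returned instead
```

Nothing in the return value tells the caller that `found` differs from `requested`, which is the kind of surprise described above.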

All that said, taking a brief look at the code in config::get responsible for locating a config file, I see a few issues.

  1. Let's take the file and directory structure from the example in paragraph 2 above. Ensure that R's current working directory is /tmp/subdir, and invoke config::get("config.yml"). config::get is unable to ascend to the parent directory because both normalizePath calls return "config.yml" (line 31, line 35). As a result, file_dir <- dirname(file) assigns "." to file_dir, parent_dir <- dirname(file_dir) assigns "." to parent_dir, and the while loop exits.
  2. If the config file is located in the root directory, it's possible for parent_dir to equal "/", in which case the assignment file <- file.path(parent_dir, basename(file)) will assign "//config.yml" to file, an invalid path that normalizePath can't fix.
  3. Assuming no symlinks are involved, one solution to (1) would be to change file_dir <- dirname(file) to file_dir <- normalizePath(dirname(file), mustWork = FALSE). However, this solution is fundamentally broken because it fails to account for symlinked directories. For example, assume that /tmp is a regular directory containing a regular file config.yml, but that /tmp/subdir is now a symlink that ultimately resolves to a directory that does not have /tmp anywhere above it in the directory hierarchy; say, /home/user/subdir. Then the modified assignment file_dir <- normalizePath(dirname(file), mustWork = FALSE) will assign /home/user/subdir to file_dir, and the search will traverse /home/user/subdir, /home/user, /home, and /. Either no config file will be found or, even worse, a different config.yml file may be unexpectedly loaded. Users would probably expect the behavior that would result from replacing the calls to normalizePath with something like Python's os.path.abspath, but as an R newbie, I don't know if R has such a function.
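The path arithmetic behind items 1 and 2 can be checked in base R. This is a small sketch, not the package's code. The dirname and file.path results are deterministic; the normalizePath result for a relative path that does not exist is platform-dependent (the thread reports "config.yml" on Linux), so it is printed without a claimed value.

```r
# Deterministic part: what dirname() and file.path() do with these inputs.
file_dir <- dirname("config.yml")          # "."
parent_dir <- dirname(file_dir)            # "." again, so the search cannot ascend
root_file <- file.path("/", "config.yml")  # "//config.yml"

# Platform-dependent part: normalizePath() on a relative path that does not
# exist, evaluated from a fresh, empty working directory.
work <- tempfile("subdir-")
dir.create(work)
old_wd <- setwd(work)
normalized <- normalizePath("config.yml", mustWork = FALSE)
setwd(old_wd)

print(c(file_dir = file_dir, parent_dir = parent_dir, root_file = root_file))
print(normalized)
```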

@manselmi

manselmi commented Dec 2, 2016

I just noticed that when R is launched from within or under a symlinked directory, R's getwd() resolves the current working directory to an absolute path without any symlinks. I imagine this is what's causing issue (3) above, making my suggested improvement of something like Python's os.path.abspath unworkable.

Surprising that R does this!
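The symlink scenario from item 3 and the getwd() observation can be reproduced with a throwaway directory tree. This is a sketch for Unix-alike systems, where file.symlink is expected to work; the directory names "real" and "tmp" are made up for the demo and stand in for /home/user and /tmp.

```r
# Build <base>/real/subdir (a real directory) and <base>/tmp/subdir (a symlink to it).
base <- tempfile("links-")
dir.create(file.path(base, "real", "subdir"), recursive = TRUE)
dir.create(file.path(base, "tmp"))
base <- normalizePath(base)  # canonical form of the demo root

link <- file.path(base, "tmp", "subdir")
ok <- file.symlink(file.path(base, "real", "subdir"), link)
stopifnot(ok)

# Purely lexical parent of the link versus the parent after symlink resolution.
lexical_parent <- basename(dirname(link))                  # "tmp"
resolved_parent <- basename(dirname(normalizePath(link)))  # "real"

# Enter the directory through the symlink and ask R where it is.
old_wd <- setwd(link)
wd_parent <- basename(dirname(getwd()))
setwd(old_wd)

print(c(lexical = lexical_parent, resolved = resolved_parent, getwd = wd_parent))
```

If getwd() reports the resolved location, as described in this comment, then any upward search that starts from the working directory walks the real directory's ancestors rather than the symlink's.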
