-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move default location of data to the user home folder #652
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're still in the early stages of the project, I think it's a good thing to use the more modern tools like pathlib
instead of os.path
.
Co-authored-by: Jérémie du Boisberranger <[email protected]>
skrub/datasets/tests/test_utils.py
Outdated
@mock.patch("os.path.dirname") | ||
def test_get_data_dir(mock_os_path_dirname): | ||
|
||
def test_get_data_dir(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_get_data_dir(): | |
@pytest.mark.parametrize("data_home_type", ["string", "path"]) | |
@pytest.mark.parametrize("subdirectory_name", [None, "tests"]) | |
def test_get_data_dir_existing_folder(data_home_type, subdirectory_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can parametrize with subdirectory_name
and it will avoid repeating code below
skrub/datasets/tests/test_utils.py
Outdated
with tempfile.TemporaryDirectory() as tmpdirname: | ||
tmpdirpath = Path(tmpdirname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with tempfile.TemporaryDirectory() as tmpdirname: | |
tmpdirpath = Path(tmpdirname) | |
with tempfile.TemporaryDirectory() as data_home: | |
directory = data_home if data_home_type == "string" else Path(data_home) |
skrub/datasets/tests/test_utils.py
Outdated
from skrub.datasets._utils import get_data_dir | ||
with tempfile.TemporaryDirectory() as tmpdirname: | ||
tmpdirpath = Path(tmpdirname) | ||
assert get_data_dir(data_home=tmpdirpath) == tmpdirpath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you can adapt the test with the variable names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
Here's a review for the implementation, I still have to review the tests :)
Edit: sorry for the interlacing reviews, might be annoying to filter 😅
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Lilian <[email protected]>
Co-authored-by: Lilian <[email protected]>
Co-authored-by: Lilian <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a few more comments!
I'll commit the type hints right away, it's a minor issue.
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Lilian <[email protected]>
The suggestions of @glemaitre are relevent and they are a bit too big for me to do right now. Also, please add the new function to the API reference (file: doc/api.rst) |
I'm trying to finish this off. I've merged with main (resolving conflicts). Will implement the changes |
OK, I have refactored the tests. Unless someone sees something problematic, I will merge soon (hopefully the CI will be green) |
This was a real bug! Thanks to @glemaitre |
Closes #641
Add a
get_data_home
function that returns the root data directory path, and create it if it does not already exist. By default, it is set to~/skrub_data
.Modify the
get_data_dir
function that returns the data directory where a given loader stores data. The output directory is a subdirectory of the root data directory.Implement tests for these two functions.