-
Notifications
You must be signed in to change notification settings - Fork 793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: make to_json & to_csv transformers have deterministic filenames #862
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor things, but this is really fantastic!
""" | ||
Write the data model to a .json file and return a url based data model. | ||
""" | ||
data_json = _data_to_json_string(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the logic locally and it works as expected. In particular I made multiple plots of the same dataset (one file) and then mutated the data frame (second file). Very nice!
altair/utils/data.py
Outdated
def _data_to_csv_string(data): | ||
"""return a CSV string representation of the input data""" | ||
check_data_type(data) | ||
if isinstance(data, pd.DataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking - shouldn't the csv transformer also work with dict/values as the json transformer does? I know the original code didn't work this way, but I don't see any reason to not add the same logic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good – I'll add that.
altair/utils/data.py
Outdated
data = sanitize_dataframe(data) | ||
return data.to_csv(index=False) | ||
else: | ||
raise NotImplementedError('to_csv only works with Pandas DataFrame objects.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this error message would need to change...
altair/utils/tests/test_data.py
Outdated
@@ -63,3 +66,39 @@ def test_type_error(): | |||
for f in (sample, limit_rows, to_values): | |||
with pytest.raises(TypeError): | |||
pipe(0, f) | |||
|
|||
|
|||
def test_to_json(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also a simple test that covers the dict/values path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good idea.
I added support for dict input in to_csv, and also added tests of dict input for to_json and to_csv. |
Great, thanks! |
Addresses #857
The idea here is that when someone uses the
json
orcsv
data transformer, the dataframes are stored on disk with a filename that is determined from the hash of the data contents. This prevents the proliferation of temporary files when doing a lot of plotting.