Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: make to_json & to_csv transformers have deterministic filenames #862

Merged
merged 4 commits into from
May 22, 2018

Conversation

jakevdp
Copy link
Collaborator

@jakevdp jakevdp commented May 17, 2018

Addresses #857

  • update functions
  • add tests

The idea here is that when someone uses the json or csv data transformer, the dataframes are stored on disk with a filename that is determined from the hash of the data contents. This prevents the proliferation of temporary files when doing a lot of plotting.

@jakevdp jakevdp requested a review from ellisonbg May 17, 2018 18:59
@jakevdp jakevdp added this to the 2.1 milestone May 18, 2018
Copy link
Collaborator

@ellisonbg ellisonbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor things, but this is really fantastic!

"""
Write the data model to a .json file and return a url based data model.
"""
data_json = _data_to_json_string(data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the logic locally and it works as expected. In particular I made multiple plots of the same dataset (one file) and then mutated the data frame (second file). Very nice!

def _data_to_csv_string(data):
"""return a CSV string representation of the input data"""
check_data_type(data)
if isinstance(data, pd.DataFrame):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking - shouldn't the csv transformer also work with dict/values as the json transformer does? I know the original code didn't work this way, but I don't see any reason to not add the same logic here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good – I'll add that.

data = sanitize_dataframe(data)
return data.to_csv(index=False)
else:
raise NotImplementedError('to_csv only works with Pandas DataFrame objects.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this error message would need to change...

@@ -63,3 +66,39 @@ def test_type_error():
for f in (sample, limit_rows, to_values):
with pytest.raises(TypeError):
pipe(0, f)


def test_to_json():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also a simple test that covers the dict/values path?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good idea.

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 21, 2018

I added support for dict input in to_csv, and also added tests of dict input for to_json and to_csv.

@FlorianGD
Copy link

Great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants