Support categorical variables with CSVs #10153

Closed
esafak opened this issue May 16, 2015 · 4 comments
Labels
API Design IO CSV read_csv, to_csv
Comments

@esafak

esafak commented May 16, 2015

It would be nice to be able to read CSVs with categorical variables using read_csv's dtype parameter instead of casting the columns after the fact.
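For reference, "casting the columns after the fact" can be sketched as below. This is only an illustration: the inline CSV text and the column names `id` and `color` are made up for the example and do not come from the issue.

```python
import io

import pandas as pd

# Hypothetical data standing in for a CSV file on disk.
csv_text = "id,color\n1,red\n2,blue\n3,red\n"

# Step 1: parse normally; the string column comes back with a string/object dtype.
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: cast the column to categorical once parsing has finished.
df["color"] = df["color"].astype("category")

print(df.dtypes)
```

The request is to fold the second step into the `read_csv` call itself.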

@TomAugspurger TomAugspurger added API Design IO CSV read_csv, to_csv labels May 16, 2015
@TomAugspurger
Contributor

I'm not opposed to this in principle, but I think the API will necessarily be clunky. Would we require (or allow) the user to specify all categories in the call to read_csv?

@esafak we do support categoricals in read/write_hdf if that's an option for you (it may not be).

@esafak
Author

esafak commented May 16, 2015

Can't we already declare the dtypes of selected columns? I thought the problem was limited to categoricals, but if not, please expand my request to all dtypes.

@TomAugspurger
Contributor

You can specify the types. I was just thinking

pd.read_csv('file.csv', dtype={'A': np.int64, 'B': pd.CategoricalDtype(['cat1', 'cat2', 'cat3'])})

which means you'd need to know all the categories up front. Or we infer them and you'll need to check that there aren't any surprising categories.
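The two options can be compared in a short sketch. It assumes a later pandas release in which `read_csv` accepts both `pd.CategoricalDtype` and the string `"category"` in `dtype`, which was not available when this thread was written. The sample data, including the stray value `cat9`, is invented for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content; "cat9" is not among the declared categories.
csv_text = "A,B\n1,cat1\n2,cat2\n3,cat9\n"

# Option 1: declare every category up front.
known = pd.CategoricalDtype(["cat1", "cat2", "cat3"])
declared = pd.read_csv(io.StringIO(csv_text), dtype={"A": np.int64, "B": known})
# Values outside the declared categories are read as missing.
print(declared["B"].isna().tolist())

# Option 2: let the parser infer the categories, then check for surprises.
inferred = pd.read_csv(io.StringIO(csv_text), dtype={"A": np.int64, "B": "category"})
surprising = set(inferred["B"].cat.categories) - set(known.categories)
print(surprising)
```

With declared categories, an unexpected value silently becomes missing data; with inferred categories it survives as its own category, so the check falls to the caller.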

@sinhrks
Member

sinhrks commented Jul 11, 2015

Nice workaround, but I think it would still be good to support a category argument.

As a first step, how about converting the specified columns to Categorical after parsing? Though it is very nice to have optimized IO logic...
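A minimal sketch of that first step, written as a wrapper rather than a change to the parser. The function name `read_csv_categorical` and its `categorical` argument are hypothetical and are not part of pandas; the sample data is invented.

```python
import io

import pandas as pd


def read_csv_categorical(source, categorical=(), **kwargs):
    """Hypothetical helper: parse with read_csv, then convert the named columns."""
    df = pd.read_csv(source, **kwargs)
    for column in categorical:
        # Conversion happens after parsing, so there is no IO-level optimization.
        df[column] = df[column].astype("category")
    return df


# Usage with made-up data.
result = read_csv_categorical(io.StringIO("x,y\n1,a\n2,b\n"), categorical=["y"])
print(result.dtypes)
```

This keeps the call site tidy, but the column is still fully materialized in its parsed form before conversion, which is the cost that optimized IO logic would avoid.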

4 participants