Added UTF-8 support in corpus #216

Merged
merged 2 commits into from
Jul 23, 2016

@mbarisa mbarisa commented Jul 22, 2016

Diacritics did not work with the Croatian corpus, and I think this change should help with other languages as well. I am also planning to open a pull request for a Croatian corpus in the next few days.

@@ -29,7 +29,7 @@ def read_corpus(self, file_name):
         """
         import json
 
-        with open(file_name) as data_file:
+        with open(file_name, encoding='utf-8') as data_file:
Contributor

You probably want to use io.open for Python 2 support.

Author

OK, I switched to io.open, and the CI checks pass now.

Thanks
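For reference, a minimal sketch of the io.open variant discussed above. The full body of read_corpus is not shown in the diff, so the standalone function and its return value here are assumptions made for illustration, not the merged code.

```python
import io
import json


def read_corpus(file_name):
    """Load a JSON corpus file, decoding it as UTF-8.

    Hypothetical standalone version: in the pull request, read_corpus is a
    method and only its `with open(...)` line appears in the diff.
    """
    # io.open accepts the `encoding` argument on both Python 2 and Python 3,
    # whereas the built-in open() only accepts it on Python 3.
    with io.open(file_name, encoding='utf-8') as data_file:
        return json.load(data_file)
```

Passing the encoding explicitly means the file is decoded the same way regardless of the platform's default locale encoding, which is presumably why diacritics failed before the change.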

@gunthercox
Owner

Looks good 👍

@gunthercox gunthercox merged commit 2c250f1 into gunthercox:master Jul 23, 2016