Added UTF-8 support in corpus #216

mbarisa · 2016-07-22T23:22:03Z

Diacritics did not work with Croatian corpus and I think it should help with other languages as well. I am planning to create a pull request for Croatian corpus as well these days.

kevin-brown · 2016-07-23T00:10:55Z

chatterbot/corpus/corpus.py

@@ -29,7 +29,7 @@ def read_corpus(self, file_name):
        """
        import json

-        with open(file_name) as data_file:
+        with open(file_name, encoding='utf-8') as data_file:


You probably want to use io.open for Python 2 support.

Ok I used io.open now, now CI checks passed.

Thanks

gunthercox · 2016-07-23T11:53:54Z

Looks good 👍

Added UTF-8 support in corpus

9ea48ea

kevin-brown reviewed Jul 23, 2016
View reviewed changes

Added UTF-8 support for python 2.7 as well

f3f061c

mbarisa mentioned this pull request Jul 23, 2016

Chinese Training using List Trainer #215

Closed

gunthercox merged commit 2c250f1 into gunthercox:master Jul 23, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added UTF-8 support in corpus #216

Added UTF-8 support in corpus #216

mbarisa commented Jul 22, 2016 •

edited

Loading

kevin-brown Jul 23, 2016

mbarisa Jul 23, 2016

gunthercox commented Jul 23, 2016

Added UTF-8 support in corpus #216

Added UTF-8 support in corpus #216

Conversation

mbarisa commented Jul 22, 2016 • edited Loading

kevin-brown Jul 23, 2016

Choose a reason for hiding this comment

mbarisa Jul 23, 2016

Choose a reason for hiding this comment

gunthercox commented Jul 23, 2016

mbarisa commented Jul 22, 2016 •

edited

Loading