Bizarre predictions #12

Open
RuABraun opened this issue Mar 12, 2019 · 3 comments

@RuABraun

RuABraun commented Mar 12, 2019

I think the results below speak for themselves. It would be helpful to know what sort of normalisation is expected, since WTL does not seem to do any. Newlines appear to be the main thing influencing the results.

from whatthelang import WhatTheLang as WTL
p = WTL()
s = 'Chapter 14: Person of Interest\n Chapter 16: The Debt\n'
p.predict_lang(s)  # fr
s = 'to actual persons, living or dead, \n business establishments, events,\n'
p.predict_lang(s)  # es
s = 'Chapter 11: Freedom\n Chapter 12: Everyday Life for Five Years\n'
p.predict_lang(s)  # fr
s = 'Prologue\n They were two beautiful people. Both strong and healthy, exactly what she was looking for.\n'
p.predict_lang(s)  # fr
s = '###\n By Heather Graham\n'
p.predict_lang(s)  # de
s = 'Raymond Stocker – Owner/operator of Nicoll’s Island amusement park\n Jasmine Stocker – wife of Raymond\n'
p.predict_lang(s)  # de
s = 'Prologue\n Flirtation lasts the brief flutter of a butterfly’s wing.\n'
p.predict_lang(s)  # fr
s = 'Any resemblance to places or actual persons,\n living or dead is entirely coincidental.\n'
p.predict_lang(s)  # es
s = 'Dream of the Fir Bolg\n Eochaidh mac Eirc, Ard Ri, High King of Ireland and leader of the Fir Bolg, stood on the cliff edge and looked out to sea. The night was clear, the water was calm, and the moon and stars observed the scene like many bright eyes.\n'
p.predict_lang(s)  # af
@whiletruelearn
Contributor

whiletruelearn commented Mar 15, 2019

@RuABraun We don't do any preprocessing inside the library to remove \n. Thanks for bringing this up. Do you see the correct results once \n is removed?

Since you found this, would you be interested in sending a PR to fix it? You would have to update here

cc : @manojlds

@RuABraun
Author

Removing \n and : fixes the predictions. I can do the PR. I would remove all newline and punctuation characters, in addition to the numbers that are already removed. Does that sound good to you?
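
A rough sketch of the kind of normalisation meant here, not the library's actual code. The helper name `normalise` is hypothetical, and replacing the stripped characters with a space (rather than deleting them) is an assumption, made so that something like Owner/operator does not get fused into one word:

```python
import unicodedata


def normalise(text: str) -> str:
    """Hypothetical helper (not part of whatthelang): strip punctuation,
    digits and newlines before language prediction."""
    cleaned = []
    for ch in text:
        category = unicodedata.category(ch)
        # "P*" covers ASCII and Unicode punctuation (':', '#', '–', '’'),
        # "N*" covers digits and other numeric characters.
        if category.startswith(("P", "N")):
            cleaned.append(" ")
        else:
            cleaned.append(ch)
    # str.split() with no arguments splits on any whitespace, including
    # '\n', so joining the pieces collapses newlines and repeated spaces.
    return " ".join("".join(cleaned).split())


print(normalise("Chapter 14: Person of Interest\n Chapter 16: The Debt\n"))
```

The cleaned string would then be what gets passed to p.predict_lang. Letters with accents are left alone, since they are useful signal for telling languages apart.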

@whiletruelearn
Contributor

Yeah that sounds good.
