-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for issue #1009 #1018
Fix for issue #1009 #1018
Conversation
@@ -258,7 +258,7 @@ class WikiCorpus(TextCorpus): | |||
>>> MmCorpus.serialize('wiki_en_vocab200k.mm', wiki) # another 8h, creates a file in MatrixMarket format plus file with id->word | |||
|
|||
""" | |||
def __init__(self, fname, processes=None, lemmatize=utils.has_pattern(), dictionary=None, filter_namespaces=('0',)): | |||
def __init__(self, fname, processes=None, lemmatize=False, dictionary=None, filter_namespaces=('0',)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason that default can't be True? True would be better
@@ -274,7 +274,16 @@ def __init__(self, fname, processes=None, lemmatize=utils.has_pattern(), diction | |||
if processes is None: | |||
processes = max(1, multiprocessing.cpu_count() - 1) | |||
self.processes = processes | |||
self.lemmatize = lemmatize | |||
# if the user has set lemmatize flag as True, check if installed | |||
if lemmatize == True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if lemmatize
is enough in python
self.lemmatize = lemmatize | ||
# if the user has set lemmatize flag as True, check if installed | ||
if lemmatize == True: | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason for this code to be duplicated here and in lemmatize
?
Just leaving it in lemmatize
is enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's correct. So the lemmatize in the constructor should be set to default value True then, and pattern related checks will be handled in lemmatize method, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the duplicate pattern library check code.
try: | ||
from pattern.en import parse | ||
except ImportError: | ||
warnings.warn("Pattern library is not installed, lemmatization won't be available.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it cause an unhandled exception later on if parse
not imported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it will create an unhandled expection later in the method where parse is used, since only a warning is created. I'll change that.
return pattern | ||
import pattern | ||
return True | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just catch ImportError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is just for testing if pattern is present or not, later to be called in wikicorpus method. Will adding ImportError make any difference?
Sorry I am not very good with exception handling. We hardly use them in our academic setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please only check import in one place and catch only ImportError
raise ImportError("Pattern library is not installed. Pattern library is needed in order \ | ||
to use lemmatize function") | ||
from pattern.en import parse | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please only check import in one place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Please review.
try: | ||
from pattern.en import parse | ||
pattern = True | ||
import pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep from pattern.en import parse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will change it. Are there any changes required in the exception handling part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have fixed it. During testing there was some bug in Travis's python2.7 and python3.3 initialization. Can you please check that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests seems to be passing now. Can it be merged ?
Thanks for the PR! |
This PR is to address the issue #1009. Removed has_pattern method from utils and added pattern library's installation check to lemmatize method. Warning will only appear if the user is using the lemmatize method and pattern library is not installed.
Also removed has_pattern method's call from corpora/wikicorpus's constructor.