Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue #1009 #1018

Merged
merged 9 commits into from
Nov 16, 2016
Merged

Fix for issue #1009 #1018

merged 9 commits into from
Nov 16, 2016

Conversation

shashankg7
Copy link
Contributor

@shashankg7 shashankg7 commented Nov 14, 2016

This PR is to address the issue #1009. Removed has_pattern method from utils and added pattern library's installation check to lemmatize method. Warning will only appear if the user is using the lemmatize method and pattern library is not installed.

Also removed has_pattern method's call from corpora/wikicorpus's constructor.

@shashankg7 shashankg7 changed the title has_pattern removed from utils, pattern check moved to utils.lemmatize Fix #1009 Nov 14, 2016
@shashankg7 shashankg7 changed the title Fix #1009 Fix for issue #1009 Nov 14, 2016
@@ -258,7 +258,7 @@ class WikiCorpus(TextCorpus):
>>> MmCorpus.serialize('wiki_en_vocab200k.mm', wiki) # another 8h, creates a file in MatrixMarket format plus file with id->word

"""
def __init__(self, fname, processes=None, lemmatize=utils.has_pattern(), dictionary=None, filter_namespaces=('0',)):
def __init__(self, fname, processes=None, lemmatize=False, dictionary=None, filter_namespaces=('0',)):
Copy link
Contributor

@tmylk tmylk Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that default can't be True? True would be better

@@ -274,7 +274,16 @@ def __init__(self, fname, processes=None, lemmatize=utils.has_pattern(), diction
if processes is None:
processes = max(1, multiprocessing.cpu_count() - 1)
self.processes = processes
self.lemmatize = lemmatize
# if the user has set lemmatize flag as True, check if installed
if lemmatize == True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if lemmatize

is enough in python

self.lemmatize = lemmatize
# if the user has set lemmatize flag as True, check if installed
if lemmatize == True:
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason for this code to be duplicated here and in lemmatize?
Just leaving it in lemmatize is enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's correct. So the lemmatize in the constructor should be set to default value True then, and pattern related checks will be handled in lemmatize method, is that right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the duplicate pattern library check code.

try:
from pattern.en import parse
except ImportError:
warnings.warn("Pattern library is not installed, lemmatization won't be available.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it cause an unhandled exception later on if parse not imported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it will create an unhandled expection later in the method where parse is used, since only a warning is created. I'll change that.

return pattern
import pattern
return True
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just catch ImportError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is just for testing if pattern is present or not, later to be called in wikicorpus method. Will adding ImportError make any difference?

Sorry I am not very good with exception handling. We hardly use them in our academic setting.

Copy link
Contributor

@tmylk tmylk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please only check import in one place and catch only ImportError

raise ImportError("Pattern library is not installed. Pattern library is needed in order \
to use lemmatize function")
from pattern.en import parse
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please only check import in one place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Please review.

try:
from pattern.en import parse
pattern = True
import pattern
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep from pattern.en import parse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will change it. Are there any changes required in the exception handling part?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed it. During testing there was some bug in Travis's python2.7 and python3.3 initialization. Can you please check that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests seems to be passing now. Can it be merged ?

@tmylk tmylk merged commit 80e9c98 into piskvorky:develop Nov 16, 2016
@tmylk
Copy link
Contributor

tmylk commented Nov 16, 2016

Thanks for the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants