-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Warning if vocabulary is entirely single character elements in doc2vec #692
Comments
I'll take this up 👍 |
Yes! The test could be Other sanity checks based on frequent issues are also possible. A few ideas:
|
What about just checking the type of the argument passed ? I can imagine at the beginning of
|
|
Left to implement:
|
Working on fixing this but I'm not fully certain of how one might go about "calling |
@Dust0x Class methods and instance methods are different in python, so you can raise warning on one of them. |
@tmylk yep, I see that. Could something go wrong when calling What kind of warning should one be looking for here? A call to load by instance seems to be working fine for me. Am I missing something here? |
Do you see the difference in behaviour of the two methods? |
Even though calling-on-an-instance might superficially succeed, often the user doing so doesn't realize they'll be getting a newly-loaded instance back as the return value. (Rather, their mental model is that a load is happening into the supplied instance.) To make this clear, and emphasize the intended/correct way to load, it'd be best if load-on-instance throws an error with helpful message that must be corrected. |
Could it be easier then to not inherit from |
Sharing save/load utilities via callouts/composition, rather than inheritance, might be a better approach. But, that'd be a pretty big change at this point (and contrast with practices elsewhere in gensim). So, it could require a lot more thought/refactoring and perhaps synchronization with a major-revision-number increment. |
For Raise warning if mismatch in expected count for train() and actual., correct me if I am wrong.
We have to add an |
I believe both the check-mark to-dos added by @tmylk when re-opening this issue have or are already being addressed elsewhere: warn if load()-on-instance: #889 warn if mismatch in provided/expected example counts: aa8a9cd Since the issue-in-the-title (warn-if-single-characters) is fixed, I'd consider this issue closed, and the other ideas spun out to elsewhere. |
Closing as resolved. |
Many users on the mailing list pass strings to doc2vec instead of lists.
It results in poor results as vocab is single chars.
A warning will help detect that.
The text was updated successfully, but these errors were encountered: