corpus: why not update self.length after iterating all #3
Usually len() is needed before iter(), so caching the length inside iter wouldn't help. But I'll add length caching to IndexedCorpus (see our Google Groups discussion), so it doesn't matter anyway :) Killing two birds with one stone...
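For comparison, here is a rough sketch of length caching in an indexed corpus. It assumes the index is a list with one offset per document; the class `IndexedCorpusSketch` and its `documents`, `index` and `length` attributes are hypothetical illustrations, not gensim's actual `IndexedCorpus` API. With an index, `len()` is free; without one, it pays for a single counting pass and remembers the result.

```python
class IndexedCorpusSketch:
    """Hypothetical corpus with an optional per-document offset index."""

    def __init__(self, documents, index=None):
        # `documents` stands in for a re-iterable stream (e.g. a corpus file).
        self.documents = documents
        # `index` stands in for per-document offsets saved next to the corpus.
        self.index = index
        self.length = None  # cached document count, unknown at first

    def __iter__(self):
        return iter(self.documents)

    def __len__(self):
        if self.index is not None:
            # One index entry per document: the length costs nothing.
            return len(self.index)
        if self.length is None:
            # No index: do one full (expensive) pass, then remember the count.
            self.length = sum(1 for _ in self)
        return self.length
```

The trade-off discussed below is whether building and storing an index is worth it when `len()` is its only consumer.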
not in my case :)
It does. Your codebase explicitly supports "the old way" of having just a streaming corpus without an index. I think A is quite expensive (building and storing the index structure only to use it for len()), so I would go with B. But of course, it's your decision.
OK. I still think determining the length of your input data conceptually belongs elsewhere (i.e., not in gensim at all), but on the other hand, it's just 3 lines of code and I finally want to see how pull requests work on GitHub :) Can you please open a pull request? EDIT: (to
#4
Note that GitHub automatically creates an issue for every pull request.
remove import error which was used for testing.
Hi,
why not do something like this in every corpus:
This reduces the chance of having to run the very expensive full iteration inside len() solely to return the length.
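A minimal sketch of the idea in the issue title, assuming a plain Python class (`StreamedCorpus` and its `documents` attribute are hypothetical names, not gensim's code): count documents while streaming, store the count in `self.length` only once a complete pass has finished, and let `__len__` fall back to one counting pass if nothing is cached yet.

```python
class StreamedCorpus:
    """Hypothetical streaming corpus that caches its length after a full pass."""

    def __init__(self, documents):
        # `documents` stands in for any re-iterable source (e.g. lines of a file).
        self.documents = documents
        self.length = None  # unknown until one complete iteration has finished

    def __iter__(self):
        count = 0
        for doc in self.documents:
            count += 1
            yield doc
        # Reached only when the caller consumed the whole stream; a caller that
        # breaks out early never gets here, so a partial count is never cached.
        self.length = count

    def __len__(self):
        if self.length is None:
            # Nothing cached yet: one expensive counting pass (which also
            # sets self.length via __iter__).
            self.length = sum(1 for _ in self)
        return self.length
```

Usage: after `for doc in corpus: ...` runs to completion, `len(corpus)` returns the cached count without another pass. As pointed out in the thread, this does not help when len() is called before the first iteration; that case still costs one full pass.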