
💫 Improve model saving and loading #1046

Closed
honnibal opened this issue May 7, 2017 · 4 comments
Labels: enhancement (Feature requests and improvements) · 🌙 nightly (Discussion and contributions related to nightly builds) · ⚠️ wip (Work in progress)

Comments

honnibal (Member) commented May 7, 2017

The APIs for saving and loading model files in spaCy 1.0 will be consolidated and made more consistent in spaCy 2.0. These changes will affect the following model classes:

  • Language (and its subclasses)
  • Vocab
  • StringStore
  • Morphology
  • Lemmatizer
  • Tokenizer
  • Tagger
  • Parser
  • Matcher

The following methods will be supported:

Pickle – __reduce__

All model classes will support the pickle protocol.
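
For example, once __reduce__ is implemented, round-tripping a loaded pipeline through pickle should look something like the sketch below (the model name is only an example):

```python
import pickle

import spacy

nlp = spacy.load("en")             # any of the model classes; here the Language object
data = pickle.dumps(nlp)           # relies on the __reduce__ support described above
nlp_restored = pickle.loads(data)
```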

to_binary() / from_binary(bytes)

These methods will serialize the model to, and deserialize it from, a byte string. They will not perform any I/O, so they can be used to send the model over the wire rather than writing it to disk.
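
As an illustration, a sketch of the byte-level round trip. The method names below are the ones this proposal eventually shipped under in spaCy 2.x (to_bytes() / from_bytes()); the model name is only an example:

```python
import spacy
from spacy.vocab import Vocab

nlp = spacy.load("en_core_web_sm")

# Serialize the vocab to a bytes object without touching the file system,
# e.g. to send it over a socket or store it in a database.
data = nlp.vocab.to_bytes()

# Restore it into a fresh Vocab on the receiving side.
vocab = Vocab().from_bytes(data)
```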

to_disk(path, format=None) / from_disk(path, format=None)

These methods will save the models to, or load them from, the file system. An optional format argument will be supported, with its interpretation varying by class. These methods will prioritise convenience and simplicity.
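
A corresponding sketch for the disk methods, which did ship under these names in spaCy 2.x (the optional format argument is omitted; the path and model name are only examples):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.to_disk("/tmp/my_model")        # save the whole pipeline to a directory

# Load it back into a blank pipeline of the same language.
nlp_restored = spacy.blank("en").from_disk("/tmp/my_model")
```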

Deprecated methods

The new methods will replace the following existing saving and loading methods (a short migration sketch follows the list):

  • StringStore.save(), StringStore.load()
  • Tokenizer.load()
  • Tagger.model.dump(), Tagger.model.load(), Tagger.load()
  • Parser.model.dump(), Parser.model.load(), Parser.load()
  • Vocab.load_lexemes(), Vocab.load(), Vocab.load_vectors(), Vocab.load_vectors_from_bin_loc(), Vocab.dump(), Vocab.dump_vectors()
  • Language.load(), Language.save_to_directory()
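
To make the change concrete, a before/after sketch for the Language pair above. Exact signatures varied across spaCy 1.x releases, so treat the old call as illustrative; the model name and path are only examples:

```python
import spacy

nlp = spacy.load("en")

# spaCy 1.x (deprecated):
# nlp.save_to_directory("/tmp/my_model")

# spaCy 2.x (this proposal):
nlp.to_disk("/tmp/my_model")
```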

Related issues

christian-storm commented

May I make a small documentation suggestion that might save others the time I spent looking through old issues to find a solution? Please document the keyword arguments to the loading functions that let you set parser, ner, etc. to False to speed up loading, e.g. when you only need tokenization.
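
For reference, a sketch of the kind of usage being requested. The snippet below uses the disable argument as it exists in spaCy 2.x; spaCy 1.x used per-component keyword arguments for the same purpose, and the model name is only an example:

```python
import spacy

# Skip the parser and named entity recognizer to speed up loading when only
# tokenization (and tagging) is needed.
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

doc = nlp("Only the remaining pipeline components run on this text.")
print([token.text for token in doc])
```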

honnibal (Member, Author) commented May 9, 2017

Thanks, will definitely have that documented. Actually there should probably also be another load function, for the "slimmer" version.

ines (Member) commented Jun 5, 2017

See the v2.0.0 alpha release notes and #1105 🎉

ines closed this as completed Jun 5, 2017
lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 8, 2018