
💫 Improve model saving and loading #1046

Closed
honnibal opened this issue May 7, 2017 · 4 comments
Labels: enhancement (Feature requests and improvements) · 🌙 nightly (Discussion and contributions related to nightly builds) · ⚠️ wip (Work in progress)

Comments

honnibal (Member) commented May 7, 2017

The APIs for saving and loading model files in spaCy 1.0 will be consolidated and made more consistent in spaCy 2.0. These changes will affect the following model classes:

  • Language (and its subclasses)
  • Vocab
  • StringStore
  • Morphology
  • Lemmatizer
  • Tokenizer
  • Tagger
  • Parser
  • Matcher

The following methods will be supported:

Pickle – __reduce__

All model classes will support the pickle protocol.
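
For example, once __reduce__ is implemented, round-tripping a loaded pipeline through pickle should look something like the sketch below (the model name is only an example):

```python
import pickle

import spacy

nlp = spacy.load("en")             # any of the model classes; here the Language object
data = pickle.dumps(nlp)           # relies on the __reduce__ support described above
nlp_restored = pickle.loads(data)
```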

to_binary() / from_binary(bytes)

These methods will serialize the model to, and deserialize it from, a byte string. They will not perform any I/O, so they can be used to send the model over the wire rather than writing it to disk.
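
As an illustration, a sketch of the byte-level round trip. The method names below are the ones this proposal eventually shipped under in spaCy 2.x (to_bytes() / from_bytes()); the model name is only an example:

```python
import spacy
from spacy.vocab import Vocab

nlp = spacy.load("en_core_web_sm")

# Serialize the vocab to a bytes object without touching the file system,
# e.g. to send it over a socket or store it in a database.
data = nlp.vocab.to_bytes()

# Restore it into a fresh Vocab on the receiving side.
vocab = Vocab().from_bytes(data)
```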

to_disk(path, format=None) / from_disk(path, format=None)

These methods will save the models to, or load them from, the file system. An optional format argument will be supported, with its interpretation varying by class. These methods will prioritise convenience and simplicity.
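
A corresponding sketch for the disk methods, which did ship under these names in spaCy 2.x (the optional format argument is omitted; the path and model name are only examples):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.to_disk("/tmp/my_model")        # save the whole pipeline to a directory

# Load it back into a blank pipeline of the same language.
nlp_restored = spacy.blank("en").from_disk("/tmp/my_model")
```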

Deprecated methods

The new methods will replace the following existing saving and loading methods (a short migration sketch follows the list):

  • StringStore.save(), StringStore.load()
  • Tokenizer.load()
  • Tagger.model.dump(), Tagger.model.load(), Tagger.load()
  • Parser.model.dump(), Parser.model.load(), Parser.load()
  • Vocab.load_lexemes(), Vocab.load(), Vocab.load_vectors(), Vocab.load_vectors_from_bin_loc(), Vocab.dump(), Vocab.dump_vectors()
  • Language.load(), Language.save_to_directory()
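
To make the change concrete, a before/after sketch for the Language pair above. Exact signatures varied across spaCy 1.x releases, so treat the old call as illustrative; the model name and path are only examples:

```python
import spacy

nlp = spacy.load("en")

# spaCy 1.x (deprecated):
# nlp.save_to_directory("/tmp/my_model")

# spaCy 2.x (this proposal):
nlp.to_disk("/tmp/my_model")
```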

Related issues

christian-storm commented

May I make a small documentation suggestion that might save others the time I spent looking through old issues to find a solution? Please document the keyword arguments to the loading functions that let you set parser, ner, etc. to False to speed up loading, e.g. when you only need tokenization.
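
For reference, a sketch of the kind of usage being requested. The snippet below uses the disable argument as it exists in spaCy 2.x; spaCy 1.x used per-component keyword arguments for the same purpose, and the model name is only an example:

```python
import spacy

# Skip the parser and named entity recognizer to speed up loading when only
# tokenization (and tagging) is needed.
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

doc = nlp("Only the remaining pipeline components run on this text.")
print([token.text for token in doc])
```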

honnibal (Member, Author) commented May 9, 2017

Thanks, will definitely have that documented. Actually there should probably also be another load function, for the "slimmer" version.

ines (Member) commented Jun 5, 2017

See the v2.0.0 alpha release notes and #1105 🎉

ines closed this as completed Jun 5, 2017
lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 8, 2018