Save Japanese NER model by using nlp.to_disk #1557
Comments
Thanks for the report! The reason this happens is that the Japanese tokenizer is a custom implementation via the Janome library and doesn't use spaCy's serializable tokenizer API. Possible solutions for now:
We should probably allow disabling the tokenizer via the `disable` keyword argument. Btw, curious to hear about your results on training Japanese NER – sounds very exciting!
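Until that lands, another way around the error is to give the custom tokenizer the serialization hooks that `nlp.to_disk` expects of a component. The sketch below is illustrative only: `SerializableTokenizerWrapper` and its `settings` dict are hypothetical, not part of spaCy or Janome; it only shows the shape of the `to_disk`/`from_disk` contract.

```python
import json
import tempfile
from pathlib import Path

class SerializableTokenizerWrapper:
    """Toy wrapper sketching the to_disk/from_disk hooks that
    nlp.to_disk expects of a pipeline component (illustrative only)."""

    def __init__(self, settings=None):
        # Hypothetical config needed to rebuild the underlying
        # tokenizer (e.g. Janome) on load.
        self.settings = settings or {"mode": "default"}

    def to_disk(self, path, **kwargs):
        # Persist only the rebuild config, not the tokenizer object.
        Path(path).write_text(json.dumps(self.settings))

    def from_disk(self, path, **kwargs):
        self.settings = json.loads(Path(path).read_text())
        return self

# Round-trip check
tok = SerializableTokenizerWrapper({"mode": "wakati"})
with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "tokenizer.json"
    tok.to_disk(target)
    restored = SerializableTokenizerWrapper().from_disk(target)

print(restored.settings)  # -> {'mode': 'wakati'}
```

The point of the pattern is that only the *config* needed to rebuild the tokenizer is written to disk; the tokenizer object itself is reconstructed on load.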
@ines Thanks so much for your quick reply. I'll try your solution and give you my feedback on training Japanese NER :)
Hi @ines,
Hmm, this is strange! I think the difference between Japanese and Thai/Chinese is that the Japanese one provides a custom tokenizer class. What happens if you don't use pickle and instead use the regular

```python
nlp.to_disk('/path/to/model', disable=['tokenizer'])
```

If this works, the only problem here is that you'll also need to set the tokenizer manually after loading the model back. We'll think about a good way to solve this in the future. When saving out a model, spaCy should probably check if the tokenizer is serializable and, if not, show a warning but serialize anyway. Nice to hear that Chinese and Thai worked well – this is really cool!
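The "warn but serialize anyway" behaviour suggested here can be reduced to a small check. This is a sketch, not spaCy's actual code; `save_pipeline` and `JanomeLikeTokenizer` are made-up stand-ins:

```python
import warnings

def save_pipeline(components, tokenizer, path):
    """Sketch of the suggested behaviour: warn about a tokenizer that
    cannot be serialized, but save the rest of the pipeline anyway."""
    tokenizer_saved = hasattr(tokenizer, "to_disk")
    if not tokenizer_saved:
        warnings.warn(
            "Tokenizer has no to_disk method and will not be saved; "
            "reattach it manually after loading."
        )
    # ... here the real code would write `components` (and the
    # tokenizer, if serializable) out to `path` ...
    return tokenizer_saved

class JanomeLikeTokenizer:
    """Stand-in for a tokenizer without serialization hooks."""
    def __call__(self, text):
        return text.split()

saved = save_pipeline([], JanomeLikeTokenizer(), "/tmp/model")
print(saved)  # -> False (a UserWarning is emitted)
```

The duck-typing check (`hasattr(..., "to_disk")`) mirrors how the save call fails today: the `AttributeError` in the original report is exactly a missing `to_disk` on `JapaneseTokenizer`.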
@ines
Just pushed a fix. Just tested it locally, and both to/from disk and to/from bytes now work correctly. This means you should also be able to package your Japanese model as a Python package using the `spacy package` command.
I have a similar problem that I could not fix: I've trained a custom NER model that I'd like to save to disk, and since I'm using a custom tokenizer I don't want to save the tokenizer. Here's what I did:

```python
import spacy

nlp = spacy.load("en")
nlp.tokenizer = some_custom_tokenizer

# Train the NER model...

nlp.tokenizer = None
nlp.to_disk('/tmp/my_model', disable=['tokenizer'])
```

(Due to this thread I did not package the model.) Loading it back:

```python
nlp = spacy.blank('en').from_disk('/tmp/model', disable=['tokenizer'])
```

I need to load the model without the tokenizer but with the full pipeline. Any ideas? Thanks.
More about this issue: when I tried to load the model like this:

```python
loaded_nlp = spacy.load('/model/directory', disable=['tokenizer'])
```

I got an error:
I looked at the code of `spacy.load`, which ends with:

```python
return nlp.from_disk(model_path)
```

If the `disable` argument were forwarded, i.e.

```python
return nlp.from_disk(model_path, disable)
```

then this would work.
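The bug described here, reduced to its essence: a keyword argument accepted at the top level but never passed down. The functions below are toy stand-ins, not spaCy's source; they only model which pipeline components end up loaded.

```python
def from_disk(model_path, disable=()):
    # Stand-in for nlp.from_disk: returns the components it would load.
    pipeline = ["tokenizer", "tagger", "parser", "ner"]
    return [name for name in pipeline if name not in disable]

def load_broken(model_path, disable=()):
    # Mirrors the reported bug: `disable` is accepted but never forwarded.
    return from_disk(model_path)

def load_fixed(model_path, disable=()):
    # The suggested one-line fix: forward `disable` to from_disk.
    return from_disk(model_path, disable)

print(load_broken("/model/directory", disable=["tokenizer"]))
# -> ['tokenizer', 'tagger', 'parser', 'ner']  (tokenizer still loaded)
print(load_fixed("/model/directory", disable=["tokenizer"]))
# -> ['tagger', 'parser', 'ner']
```

This is also why the `spacy.blank('en').from_disk(..., disable=['tokenizer'])` route in the earlier comment behaves differently from `spacy.load(..., disable=['tokenizer'])`: the former passes `disable` to `from_disk` directly.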
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I got this error "AttributeError: 'JapaneseTokenizer' object has no attribute 'to_disk'" when trying to save Japanese NER model in spaCy 2.0.2. Can you guys help me to fix this error? Thanks so much