NER Transfer learning #351
Conversation
Hi @gawy, thank you for your interest in contributing to Stanza. The code generally looks good to me! I've changed back some data structures to ensure model backward compatibility. A few questions about the details:
@yuhui-zh15 thank you for your feedback, and happy to help. I'll post answers to your questions in several replies.

Q1: How do you solve it when the model trains on a dataset which contains different NER labels? I believe building a new TagVocab and modifying the model architecture are necessary. Can you add the related code to your code?

Answer: You can look at the code I used to modify the model structure (classifier) and build the new label set in the gist here: https://gist.github.com/gawy/2fec736e6278db6e6a083c26d3ec745b. As I mentioned in the original description, I loaded the existing model, modified the classifier for the new label set, and saved it to a file. This way I could start training with minimal changes to the Stanza code. The reason I made these modifications outside of the Stanza code base, instead of somehow integrating the whole thing, was based on my assumptions about how TL could be used by other people.

In case #1, the most user-friendly way to implement TL would be to derive the tag set, as well as the configuration of the classifier layer, from a data set (the way it is done for a new model); more modifications in Stanza would be required (mainly in the way the Vocabulary is initialized, similar to how init_vocab in data.py functions). As I'm not sure how popular this feature will be, with sample code people can do whatever model and tag-set modifications they want and proceed to training. I'll post an answer to Q2 later. A rough sketch of the classifier swap follows.
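For convenience, here is a minimal sketch of the idea from the gist, assuming a PyTorch-style checkpoint: load the saved model, re-initialize the layers that depend on the tag set, and save it back. The checkpoint keys used here ('model', 'vocab', 'tag_clf.*', 'crit._transitions'), the file names, and the label list are illustrative assumptions, not the exact Stanza layout; inspect an actual checkpoint before relying on them.

```python
import torch

# Placeholder label set for the new task (BIO scheme assumed).
NEW_TAGS = ['O', 'B-CLASS1', 'I-CLASS1', 'B-CLASS2', 'I-CLASS2']

# Load the existing NER checkpoint (file name is illustrative).
checkpoint = torch.load('uk_languk_nertagger.pt', map_location='cpu')
state = checkpoint['model']  # assumed key for the model state dict

# Re-initialize the classifier projection for the new number of tags;
# 'tag_clf' follows the layer-name convention mentioned in this PR.
hidden_dim = state['tag_clf.weight'].shape[1]
state['tag_clf.weight'] = torch.randn(len(NEW_TAGS), hidden_dim) * 0.02
state['tag_clf.bias'] = torch.zeros(len(NEW_TAGS))

# The CRF criterion ('crit') holds tag-transition parameters that also
# depend on the tag set; the key name below is an assumption.
if 'crit._transitions' in state:
    state['crit._transitions'] = torch.zeros(len(NEW_TAGS), len(NEW_TAGS))

# Rebuild the tag vocabulary so it agrees with the classifier size.
# (In Stanza this is a TagVocab object, not a plain list; simplified here.)
checkpoint['vocab']['tag'] = NEW_TAGS

torch.save(checkpoint, 'uk_languk_nertagger.finetune.pt')
```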
Training data set: train, 1567 examples (detailed sample data omitted).
Device: CPU (MacBook Pro with Intel i5).

Q2.1: Performance when training from scratch (result table omitted):

** could have been slightly longer; I initially stopped the run when training results stopped improving.

Overall, end-to-end training shows much better results, but the training time is dramatically different. Training just the classifier allows experiments to run much faster while data sets are still small.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has been automatically closed due to inactivity.
I think if this has already been inspected once we should hopefully be able to merge it, right?
Conflicts resolved; it should be possible to merge now.
Description
With the current code it is possible to train an NER model end to end. But in cases where data sets are limited and there is a need to train NER with custom classes, transfer learning can come in very handy, as it did in my case.
I have patched the current Stanza code to allow for that, with a bit of manipulation of the model classifier.
Summary of modifications:
Approach to inserting a new classifier
My assumption for the TL process was the following:
Whoever wants to use it will probably have a good enough background to mess with the network architecture. So my decision was to make all necessary network modifications outside of the Stanza code base; maybe someone else would like to use several FC layers for the classifier (see the sketch after the gist link below). This means that within the Stanza code it is just required to load the model with the modified architecture and proceed with the normal training process.
Here is an example of the code that I used to update the model classifier and define new classes for the NER model. Potentially this could be included somewhere in the documentation or examples within Stanza:
https://gist.github.com/gawy/2fec736e6278db6e6a083c26d3ec745b
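As a hedged illustration of the "several FC layers" option mentioned above (not code from the gist or from Stanza itself), a replacement classifier head could be built like this; the dimensions and the dropout rate are made-up examples:

```python
import torch.nn as nn

# Illustrative sizes: the tagger's hidden dimension and the new tag count.
hidden_dim, num_new_tags = 256, 7

# A small MLP head in place of the usual single linear classifier. In a
# real model you would assign this to the classifier attribute (the layer
# whose name contains 'tag_clf') before saving the modified checkpoint.
mlp_head = nn.Sequential(
    nn.Linear(hidden_dim, hidden_dim // 2),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(hidden_dim // 2, num_new_tags),
)
```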
Example usage:
scripts/run_ner.sh Ukrainian-languk --finetune --train_classifier_only
Flags and reasoning behind them
finetune
- tells ner_tagger to load an existing model from file instead of creating a new model from scratch for training. Potentially this has a second use case in fine-tuning the model; that is why the name.

train_classifier_only
- ner_tagger will stop gradients from propagating to all layers above the classifier (the code disables gradients for all layers except those with names containing ['tag_clf', 'crit']); a sketch of this freezing logic follows below.
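A minimal sketch of the freezing logic this flag implies, assuming `model` is the loaded NER tagger; the optimizer type and learning rate are placeholders, not the values Stanza actually uses:

```python
import torch

# Freeze every parameter whose name does not contain one of the trainable
# substrings quoted above ('tag_clf' for the classifier, 'crit' for the CRF).
TRAINABLE_KEYS = ('tag_clf', 'crit')

for name, param in model.named_parameters():
    param.requires_grad = any(key in name for key in TRAINABLE_KEYS)

# Hand only the still-trainable parameters to the optimizer so the frozen
# layers are never updated (SGD and the learning rate are placeholder choices).
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01)
```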
Experimental results
I started with my own trained NER model on 4 standard classes (Ukrainian-languk) with an F1 score around 84. Any other language model can equally be used in the same way.
The model was modified to have a new classifier with 2 new classes and trained on a data set that had roughly 200 and 150 examples of the two classes respectively.
The initial NER model had an F1 score of about 84.
The newly trained model showed decent results:

| Prec. | Rec. | F1 |
| --- | --- | --- |
| 81.40 | 77.78 | 79.55 |
Manual inspection in my case also showed nice results: good enough to be used in practice and improved further.
Fixes Issues
None, as far as I can see.
Unit test coverage
Existing NER unit tests run successfully. No additional tests were created.
NER training was tested in end-to-end mode as well as in transfer learning mode.
Known breaking changes/behaviors
None; this change just adds new features.