
NER Transfer learning #351

Merged (9 commits, Jan 6, 2021)

Conversation

@gawy (Contributor) commented Jun 15, 2020

Description

With the current code it is possible to train a NER model end to end. But in cases where data sets are limited and there is a need to train NER with custom classes, transfer learning may come in very handy, as it did in my case.

I have patched the current Stanza code to allow for that, with a bit of manipulation on the model classifier.

Summary of modifications:

  1. 2 flags were added to ner_tagger.py
  2. a minor update was made to the DataLoader and NERTagger classes - only the necessary object properties are passed to constructors instead of the full objects themselves, which simplifies using those objects in other contexts

Approach to inserting a new classifier
My assumption for the TL process was the following:

Whoever uses it will probably have a good enough background to mess with the network architecture. So my decision was to make all the necessary network modifications outside of the Stanza code base. Maybe someone else would like to use several FC layers for the classifier. This means that within the Stanza code it is only required to load a model with the modified architecture and proceed with the normal training process.

Here is an example of the code that I used to update the model classifier and define new classes for the NER model. Potentially this could be included somewhere in the documentation or examples within Stanza.
https://gist.github.com/gawy/2fec736e6278db6e6a083c26d3ec745b
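Conceptually, the classifier swap described above looks like the following self-contained sketch. Nested lists stand in for tensors so it runs anywhere, and the parameter names (`word_emb`, `tag_clf`) and tag sets are illustrative stand-ins based on this PR's description, not the exact Stanza internals:

```python
import random

# Toy stand-in for a saved checkpoint's state dict: parameter name -> matrix.
# The real flow is the same: load the trained model, resize only the
# classifier entries for the new tag set, save, then train with --finetune.
old_tags = ['O', 'B-PER', 'I-PER', 'E-PER', 'S-PER']
new_tags = ['O', 'B-INT_REF', 'I-INT_REF', 'E-INT_REF',
            'B-EXT_REF', 'I-EXT_REF', 'E-EXT_REF']
hidden_dim = 4

state = {
    'word_emb.weight': [[0.0] * hidden_dim for _ in range(100)],
    'tag_clf.weight': [[0.1] * hidden_dim for _ in range(len(old_tags))],
    'tag_clf.bias': [0.0] * len(old_tags),
}

# Replace the classifier rows with freshly initialized ones, one per new tag;
# every other parameter keeps its trained values untouched.
state['tag_clf.weight'] = [[random.gauss(0.0, 0.02) for _ in range(hidden_dim)]
                           for _ in new_tags]
state['tag_clf.bias'] = [0.0] * len(new_tags)

print(len(state['tag_clf.weight']))  # 7: one output row per new tag
```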

Example usage:
scripts/run_ner.sh Ukrainian-languk --finetune --train_classifier_only

Flags and reasoning behind them

  • finetune - tells ner_tagger to load an existing model from file instead of creating a new model from scratch for training. Potentially this has a second use case in fine-tuning the model - hence the name.
  • train_classifier_only - ner_tagger will stop gradients from propagating to all layers above the classifier (the code disables gradients for all layers except those with names containing ['tag_clf', 'crit'])
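The freezing rule behind --train_classifier_only can be sketched as a name test: a parameter stays trainable only if its name contains one of the listed substrings. The parameter names below are illustrative; in the tagger the same test would be applied over the model's named parameters, setting requires_grad = False on the frozen ones:

```python
# Substrings that mark the layers left trainable, per the PR description.
TRAINABLE = ('tag_clf', 'crit')

def is_trainable(param_name):
    """Return True if this parameter should keep receiving gradients."""
    return any(key in param_name for key in TRAINABLE)

# Illustrative parameter names; a real model exposes them via
# model.named_parameters().
names = ['word_emb.weight', 'charlstm.weight_ih_l0',
         'tag_clf.weight', 'tag_clf.bias', 'crit._transitions']
frozen = [n for n in names if not is_trainable(n)]
print(frozen)  # ['word_emb.weight', 'charlstm.weight_ih_l0']
```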

Experimental results

I started with my own trained NER model on the 4 standard classes (Ukrainian-languk) with an F1 score around 84. Any other language model can equally be used in the same way.

The model was modified to have a new classifier with 2 new classes and trained on a data set that had roughly 200 and 150 examples of each class.

The initial NER model had an F1 score of about 84. The newly trained model showed decent results:

Prec.  Rec.   F1
81.40  77.78  79.55

Manual inspection in my case also showed nice results - something good enough to be used in practice and further improved.

Fixes Issues

none as far as I can see

Unit test coverage

Existing NER unit tests run successfully. No additional tests were created.
NER training was tested in end-to-end mode as well as in transfer learning mode.

Known breaking changes/behaviors

none; this just adds new features

@gawy gawy marked this pull request as draft June 15, 2020 14:27
@gawy gawy marked this pull request as ready for review June 15, 2020 16:08
@yuhui-zh15 (Member) commented:

Hi @gawy, thank you for your interest in contributing to Stanza. The code generally looks good to me! I've changed back some data structures to ensure model backward compatibility.

Questions about some details:

  1. How do you solve it when model trains on a dataset which contains different NER labels? I believe building a new TagVocab and modifying the model architecture are necessary. Can you add the related code to your code?

  2. Can you make it clearer by filling the following information?

  • Performance when training from scratch:
  • Performance when finetuning from existing models (allow to update all parameters):
  • Performance when finetuning from existing models (only allow to update final layer):

@gawy (Contributor, Author) commented Jun 22, 2020

@yuhui-zh15 thank you for your feedback and happy to help

I'll post answers to your questions in several replies:

Q1: How do you solve it when model trains on a dataset which contains different NER labels? I believe building a new TagVocab and modifying the model architecture are necessary. Can you add the related code to your code?

Answer:
That's exactly the case, and the purpose of the whole thing in my situation:
I needed a model that would produce completely different NER labeling compared to stock.

You can look at the code I used to modify the model structure (classifier) and build the new label set in the gist here: https://gist.github.com/gawy/2fec736e6278db6e6a083c26d3ec745b

As I mentioned in the original description, I loaded the existing model, modified the classifier for the new label set, and saved it to a file. This way I could start training with minimal changes in the Stanza code.

The reason why I made these modifications outside of the Stanza code base, instead of somehow integrating the whole thing, was based on my assumptions about how TL could be used by other people.
There are a couple of ways TL can be used (as I see it):

  1. a simple case where the classifier simply has a different tag set to be trained on
  2. a more complicated case where someone might want to change the classifier a bit more radically: from 1 FC layer (as it is now) to, let's say, 2 FC layers - which might be handy with a large number of tags.

In case #1, the most user-friendly way to implement TL would be to derive the tag set as well as the configuration of the classifier layer from a data set (the way it is done for a new model) - more modifications in Stanza would be required (mainly in the way the Vocabulary is initialized, similar to how init_vocab in data.py functions). As I'm not sure how popular this feature will be, with the sample code people can make whatever model and tag-set modifications they want and proceed to training.
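Deriving the tag set from a data set, as case #1 suggests, can be sketched as a simple scan over (token, BIOES-tag) pairs. The sample pairs below are made up for illustration; the real code would read them from the training file the way init_vocab in data.py does:

```python
# Hypothetical sample of a BIOES-tagged training set as (token, tag) pairs.
sample = [('see', 'O'), ('section', 'B-INT_REF'), ('3.2', 'E-INT_REF'),
          ('of', 'O'), ('RFC', 'B-EXT_REF'), ('2616', 'E-EXT_REF')]

# Collect the distinct tags; their count gives the new classifier's
# output dimension, and the sorted list seeds the new tag vocabulary.
tags = sorted({tag for _, tag in sample})
print(tags)      # ['B-EXT_REF', 'B-INT_REF', 'E-EXT_REF', 'E-INT_REF', 'O']
print(len(tags)) # 5: classifier output dimension for this sample
```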

I'll post the answer to Q2 below later.

@gawy (Contributor, Author) commented Jun 22, 2020

Training data set
2 custom tags (dimension of classifier = 13): INT_REF, EXT_REF

train - 1567 examples
dev - 641 examples
test - 1813 examples

Detailed sample data:
dev: Counter({'O': 16882, 'I-EXT_REF': 138, 'I-INT_REF': 107, 'B-INT_REF': 28, 'E-INT_REF': 28, 'B-EXT_REF': 17, 'E-EXT_REF': 17})
test: Counter({'O': 44615, 'I-INT_REF': 361, 'I-EXT_REF': 283, 'B-INT_REF': 114, 'E-INT_REF': 114, 'B-EXT_REF': 40, 'E-EXT_REF': 40})
train: Counter({'O': 40515, 'I-EXT_REF': 555, 'I-INT_REF': 253, 'B-INT_REF': 82, 'E-INT_REF': 82, 'B-EXT_REF': 66, 'E-EXT_REF': 66})

Device: CPU (MacBook Pro with Intel i5)
GPU: Google Colab Tesla P100 16GB

Q2.1: Performance when training from scratch:
Q2.2: Performance when finetuning from existing models (allow to update all parameters):
Q2.3: Performance when finetuning from existing models (only allow to update final layer):

mode                        Prec   Rec    F1     Time (cpu)    Time (gpu)
from scratch (Q2.1)         95.90  75.97  84.78  --            60 min
finetune end-to-end (Q2.2)  91.54  77.27  83.80  over 2 hours  53 min
final layer only (Q2.3)     86.99  69.48  77.26  ~ 20 min **   --

** - could have been slightly longer, as I initially stopped it early when the training results stopped improving

Overall, end-to-end training shows much better results, but the training time is dramatically different. Training just the classifier allows running experiments much faster while data sets are still small.

stale bot commented Dec 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 29, 2020

stale bot commented Jan 5, 2021

This issue has been automatically closed due to inactivity.

@stale stale bot closed this Jan 5, 2021
@AngledLuffa AngledLuffa reopened this Jan 5, 2021
@stale stale bot removed the stale label Jan 5, 2021
@AngledLuffa (Collaborator) commented:

I think if this has already been inspected once we should hopefully be able to merge it, right?

@yuhui-zh15 (Member) left a comment:


Conflicts solved and should be able to merge now.

@yuhui-zh15 yuhui-zh15 merged commit 961c8c0 into stanfordnlp:dev Jan 6, 2021
@AngledLuffa AngledLuffa mentioned this pull request Jan 6, 2021
3 participants