[many questions] allennlp train config.json > python train.py & others #130
Comments
I would be happy to move these questions somewhere else if there exists a dedicated forum for that..?
This venue is fine, I've just been distracted with ACL ongoing right now. I'll respond soon.
On (1): thanks for the catch! That should be fixed now.

On (2): yeah, I'm not sure why the pooler was set to not be trainable, but I've now fixed that in the example, also. Thanks again for the catch. On why it doesn't do as well: I'm not certain; it could be a learning rate issue, or just a stability issue - BERT is known to have high variance between runs, and this is a small dataset. Masato originally wrote this section for an older version of allennlp, and I apparently didn't update it correctly when I updated it for the 1.0 release. It's possible that some of the learning rate / optimizer settings should also have changed slightly to be optimal. But the point of the guide is to show you how to use the code, not to provide optimal hyperparameters, so as long as it runs and gives reasonably close performance, I'm not too concerned.

On (3): it doesn't look like you're shuffling the data. Do you agree? If you're not shuffling the data, that would definitely explain the difference.
If you want more standard hyperparameters, you might look at some of the examples in our model library, e.g.: https://github.com/allenai/allennlp-models/blob/09395d233161859db4c11af3689a3e0bc62169d8/training_config/rc/transformer_qa.jsonnet#L28-L40.
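For context, a more standard transformer fine-tuning setup in an AllenNLP trainer section looks roughly like the sketch below. This is illustrative, not a copy of the linked file: the `huggingface_adamw` optimizer type is registered in AllenNLP, but the specific `lr` and `weight_decay` values here are placeholder assumptions.

```jsonnet
{
  "trainer": {
    "optimizer": {
      // AdamW is the usual choice for fine-tuning BERT-style models
      "type": "huggingface_adamw",
      "lr": 2e-5,            // illustrative; transformer fine-tuning typically uses ~1e-5 to 5e-5
      "weight_decay": 0.01   // illustrative
    },
    "num_epochs": 3
  }
}
```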
Re (2): Of course, makes sense 👍

Re (3): Right, but when I add the keyword

(4) bert vocab -vs- Vocabulary.from_instances

While I'm at it, I'm taking this opportunity to ask another question I had :)
Thanks a lot!
3: looks like the bucket sampler already shuffles, so, yeah, that wasn't the issue. But yes, lots of papers have pointed out how high BERT's training variance is.

4.1: This is a bit confusing, and we'd like to fix it. Currently, the vocab gets added when you index instances. I think we should probably also add that line to where we count vocab items, which would resolve this issue (PR to fix that welcome!).

4.2: see this method.
Wow! indeed I can see that you and others have been thinking about a better way to handle HF's transformers vocab for a while now... 😄 |
Hi,
(1) Update guide to support newest version
I'm going through the "Next Steps" chapter > section "Switching to pre-trained contextualizers".
First, the config file shown in the guide uses:

instead of

but if I try to use this config file, I get an error saying that the key "data_loader" is required in the config file.
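For context: in AllenNLP 1.0+, a top-level `data_loader` section replaced the older `iterator` section, so the config needs something like the sketch below (the sampler type and batch size here are illustrative, not taken from the guide):

```jsonnet
{
  // ... dataset_reader, model, trainer, etc. unchanged ...
  "data_loader": {
    "batch_sampler": {
      // the "bucket" sampler groups instances of similar length
      // and shuffles the resulting batches
      "type": "bucket",
      "batch_size": 8
    }
  }
}
```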
(2) bert-base-uncased not as good as guide baseline model?
Secondly, when I replace the `tokenizer`, `token_indexers`, `embedder` and `encoder` by the bert model as in the new config file proposed in https://guide.allennlp.org/next-steps#1, it looks like the model is not training. Training accuracy remains at 0.50 after 5 epochs.

This is my config file:
and these are the last few lines printed at the end of `allennlp train ...`:

I tried to play a little with the config and I noticed that if I replace

by

I get better performance:

but it is still not as good as with the original config:

which gets 100% training accuracy and 80% validation accuracy:
What could cause this behavior? Maybe the gigantic number of parameters in the bert model? Or maybe the bert vocab doesn't cover most of the tokens in the training file..? 🤔
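For reference, the pieces being swapped out typically look roughly like this in an AllenNLP 1.0 config. This is a sketch, not the guide's exact file; the namespace name `bert` is an assumption, and note the `train_parameters` / `requires_grad` flags, which control whether BERT and the pooler are actually fine-tuned (the pooler's trainability was the fix mentioned in the maintainer reply above):

```jsonnet
{
  "dataset_reader": {
    "tokenizer": {
      "type": "pretrained_transformer",
      "model_name": "bert-base-uncased"
    },
    "token_indexers": {
      "bert": {
        "type": "pretrained_transformer",
        "model_name": "bert-base-uncased"
      }
    }
  },
  "model": {
    "embedder": {
      "token_embedders": {
        "bert": {
          "type": "pretrained_transformer",
          "model_name": "bert-base-uncased",
          "train_parameters": true   // fine-tune BERT rather than freeze it
        }
      }
    },
    "encoder": {
      "type": "bert_pooler",
      "pretrained_model": "bert-base-uncased",
      "requires_grad": true          // the pooler must be trainable, too
    }
  }
}
```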
(3) allennlp train config > python train.py
I also noticed that running `allennlp train config.json` yields good performance (~90% train accuracy and ~80% validation accuracy) while running my own training file with `python train.py` doesn't seem to learn (training accuracy stays at 50% after 5 epochs). I specifically made sure that my config and custom script are as similar as possible.

config:
-vs- train.py:
Any idea why running this custom script doesn't yield performance similar to running the config file..?
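One difference worth checking is whether the custom script shuffles the data, which is the diagnosis the maintainer reply above points at. A stdlib-only toy illustration (not AllenNLP code) of why it matters: if the data file is sorted by label, unshuffled batches each contain a single class, which can stall training at chance accuracy:

```python
import random

# Toy dataset: all label-0 examples first, then all label-1 examples,
# mimicking a data file that is sorted by class.
data = [(i, 0) for i in range(50)] + [(i, 1) for i in range(50)]

def batches(examples, batch_size):
    """Split a list of examples into consecutive fixed-size batches."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]

# Without shuffling, every batch contains exactly one class.
unshuffled = batches(data, 10)
single_class = all(len({label for _, label in batch}) == 1 for batch in unshuffled)
print("unshuffled batches are single-class:", single_class)

# After shuffling, batches mix the classes.
rng = random.Random(0)
shuffled_data = list(data)
rng.shuffle(shuffled_data)
shuffled = batches(shuffled_data, 10)
mixed = any(len({label for _, label in batch}) > 1 for batch in shuffled)
print("shuffled batches mix classes:", mixed)
```

In AllenNLP itself the bucket batch sampler handles this for you (it adds noise to lengths and shuffles batches), which is why the config run behaves differently from a hand-rolled loop that iterates the instances in file order.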
Thanks a lot for your help :)