
BERT CoLA Example Question #102

Closed
jackn11 opened this issue Jul 8, 2022 · 1 comment

jackn11 commented Jul 8, 2022

I am trying to train several BERT classifier models in one program, but I am running out of GPU RAM because I am loading too many BERT models with `const _bert_model, wordpiece, tokenizer = pretrain"Bert-uncased_L-12_H-768_A-12"`.

I am following the CoLA example found here: https://github.com/chengchingwen/Transformers.jl/blob/master/example/BERT/cola/train.jl

I am wondering whether the `train!()` function in the example trains all of the parameters in `Flux.params(bert_model)`, or only those in `Flux.params(_bert_model.classifier)`. This matters because, if only the classifier parameters are modified rather than all of the BERT model's parameters, I can load a single `pretrain"Bert-uncased_L-12_H-768_A-12"` into RAM and then train a new classifier (`_bert_model.classifier`) for each BERT classifier I need. That saves a lot of RAM compared to loading a new full BERT model for every classifier.

Please let me know whether `train!()` trains the whole BERT model or just the classifier parameters.

Thank you,

Jack

chengchingwen (Owner) commented

The whole model is trained, because of `const ps = params(bert_model)`. You can switch to `params(bert_model.classifier)` if you only want to train the classifier. The difference between `_bert_model` and `bert_model` is that `_bert_model` is on the CPU while `bert_model` is on the GPU. Only one BERT model is loaded onto the GPU. The problem is probably that your GPU RAM is not big enough for a batch of size 4.
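
For reference, a minimal sketch of the two options. The loading line and variable names are taken from this thread; the surrounding code is an illustration of the setup, not the example's exact code:

```julia
using Flux                                   # for params and gpu
using Transformers, Transformers.Pretrain    # for the pretrain"" string macro

# Load the pretrained model once, on the CPU (line quoted in this thread).
const _bert_model, wordpiece, tokenizer = pretrain"Bert-uncased_L-12_H-768_A-12"

# Move a single copy to the GPU; this is the _bert_model / bert_model split.
const bert_model = gpu(_bert_model)

# What the example does: collect ALL parameters, so train!() updates the
# whole BERT model, classifier head included.
const ps = params(bert_model)

# To fine-tune only the classifier head, collect just its parameters instead:
# const ps = params(bert_model.classifier)
```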

Whether you can train only the classifier layer is a design decision for your own model. It is entirely doable, but it may result in different model performance (for better or worse). So that's up to you.
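
A hypothetical sketch of that design, reusing one GPU-resident backbone across several tasks. Here `hidden_size`, `num_tasks`, `heads`, and `task_loss` are illustrative names, not part of the example:

```julia
using Flux

# One shared backbone (bert_model from above), loaded onto the GPU once.
hidden_size = 768                 # the H-768 in "Bert-uncased_L-12_H-768_A-12"
num_tasks = 3
heads = [gpu(Chain(Dense(hidden_size, 2), logsoftmax)) for _ in 1:num_tasks]

opt = ADAM(1e-6)
for (i, head) in enumerate(heads)
    ps = params(head)             # gradients w.r.t. this head only;
                                  # the shared backbone stays frozen
    # gs = gradient(() -> task_loss(i, bert_model, head), ps)  # task_loss: your own loss
    # Flux.update!(opt, ps, gs)
end
```

Freezing the backbone this way trades some per-task accuracy for the memory savings of keeping a single shared BERT on the GPU.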
