activation function in BERTIntermediate #17

Merged · 9 commits · Nov 13, 2018
Conversation

lukovnikov (Contributor)

The activation was previously hardcoded to gelu because the pretrained BERT models use gelu. This change makes BERTIntermediate read its activation function from the config, accepting either a callable or one of the strings "gelu", "relu", or "swish".
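A minimal sketch of the shape of this change, assuming a config object carrying hidden_size, intermediate_size, and a hidden_act field that is either a string or a callable; the names here are illustrative rather than copied from the merged commit:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def gelu(x):
    # erf-based GELU, as used by the original BERT implementation
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

def swish(x):
    return x * torch.sigmoid(x)

# String keys from the config map to activation callables
ACT2FN = {"gelu": gelu, "relu": F.relu, "swish": swish}

class BERTIntermediate(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
        # hidden_act may be a string key ("gelu", "relu", "swish") or a callable
        if isinstance(config.hidden_act, str):
            self.intermediate_act_fn = ACT2FN[config.hidden_act]
        else:
            self.intermediate_act_fn = config.hidden_act

    def forward(self, hidden_states):
        return self.intermediate_act_fn(self.dense(hidden_states))
```

Keeping the string-to-function map in one dict means new activations can be added without touching the module itself.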

thomwolf merged commit 8513741 into huggingface:master on Nov 13, 2018
thomwolf (Member)

Looks good, thanks for that!

qwang70 pushed a commit to DRL36/pytorch-pretrained-BERT that referenced this pull request Mar 2, 2019
activation function in BERTIntermediate
@HongyanJiao HongyanJiao mentioned this pull request Sep 19, 2019
stevezheng23 added a commit to stevezheng23/transformers that referenced this pull request Mar 24, 2020
fix at issues in roberta/bert modeling
amathews-amd referenced this pull request in ROCm/transformers Aug 6, 2021
jlamypoirier added a commit to jlamypoirier/transformers that referenced this pull request Apr 4, 2023
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this pull request Jun 1, 2023
jonb377 pushed a commit to jonb377/hf-transformers that referenced this pull request Nov 3, 2023
Summary:
This pull request adds 2D SPMD sharding, sharding both weights and activations. The sharding strategy is as follows.

Let's say we have a 2D mesh (data, model) with data x model == num_devices:
1. input: (data, None, model)
2. embedding: (model, data)
3. attn QKV: (data, model)
4. attn O: (model, data)
5. mlp gate, up: (model, data)
6. mlp down: (data, model)
7. activation: (data, None, model)
Currently you can specify the model dimension with the new --spmd_2d_sharding option; the data dimension is then calculated automatically (num_devices / model). See the sketch after the TODO below.

TODO: maybe we should have another option to specify whether or not we should shard the activations/inputs or shard them differently.
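As a rough illustration of the strategy above, here is a minimal sketch using torch_xla's SPMD API. The model value and the tensors being sharded are assumptions for the example, not the commit's actual code:

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()  # enable the SPMD execution mode

num_devices = xr.global_runtime_device_count()
model = 4                      # assumed value of --spmd_2d_sharding
data = num_devices // model    # data dimension derived automatically
mesh = xs.Mesh(np.arange(num_devices), (data, model), ('data', 'model'))

device = xm.xla_device()

# Weights: e.g. an embedding table sharded (model, data)
embedding = torch.nn.Embedding(32000, 4096).to(device)
xs.mark_sharding(embedding.weight, mesh, ('model', 'data'))

# Activations: a (batch, seq, hidden) tensor sharded (data, None, model)
hidden_states = torch.zeros(8, 512, 4096, device=device)
xs.mark_sharding(hidden_states, mesh, ('data', None, 'model'))
```

Alternating the weight and activation axes this way keeps each matmul's contracting dimension sharded, so partial results are combined with collectives instead of replicating full weights.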