Applying parallel attn with ff to existing pretrained model? #12

Open
huu4ontocord opened this issue Dec 3, 2022 · 1 comment
@huu4ontocord

Hi - awesome work! I am trying to understand the parallel attention with feedforward trick. I couldn't find a paper, only a reference to https://github.com/kingoflolz/mesh-transformer-jax. Is this right? Am I understanding correctly that it basically applies the qkv and ff operations at once? Is it possible to use this trick to modify an existing pretrained model?

# parallel attention and feedforward with residual

Many thanks in advance!

Huu

@lucidrains (Owner)

@ontocord yup that's correct, it was invented by Ben Wang for GPT-J, then subsequently adopted by PaLM
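A minimal sketch of the parallel attention and feedforward block discussed above, assuming a pre-norm PyTorch layer; the class name, dimensions, and the use of nn.MultiheadAttention are illustrative and not this repo's exact implementation:

```python
import torch
from torch import nn

class ParallelBlock(nn.Module):
    """One transformer block where attention and feedforward both read the
    same normalized input and their outputs are summed into a single residual,
    instead of being applied one after the other (causal masking omitted for brevity)."""

    def __init__(self, dim, heads=8, ff_mult=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * ff_mult),
            nn.GELU(),
            nn.Linear(dim * ff_mult, dim),
        )

    def forward(self, x):
        # one shared normalization, then attention and feedforward in parallel
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        ff_out = self.ff(h)
        # parallel attention and feedforward with residual
        return x + attn_out + ff_out

# quick shape check
block = ParallelBlock(dim=512)
x = torch.randn(2, 16, 512)   # (batch, sequence, dim)
out = block(x)                # same shape as x
```

Because attention and feedforward share the same normalized input, their input projections can be fused into one larger matmul, which is where the efficiency gain in GPT-J and PaLM comes from.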
