
Add support to dropout. #29

Merged: 8 commits merged into PAIR-code:main on Oct 16, 2024

Conversation

@aliciafmachado (Collaborator) commented on Sep 8, 2024

  • Add dropout.
  • Add basic tests for dropout and trainer with dropout enabled.
  • Add example in app.
  • Add evalMode flag so that dropout can be disabled during eval.

Intended to resolve Issue: #1
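
To make the description above concrete, here is a minimal, generic sketch of inverted dropout with an evalMode switch, written against plain tf.Tensor rather than this PR's GTensor-based code (the function below is illustrative, not the repo's implementation):

import * as tf from '@tensorflow/tfjs';

// Generic inverted-dropout sketch: zero each element with probability
// dropoutRate and rescale the survivors so the expected activation is
// unchanged; evalMode disables dropout entirely.
function dropout(x: tf.Tensor, dropoutRate: number, evalMode = false): tf.Tensor {
  if (evalMode || dropoutRate === 0) {
    return x;
  }
  const keepProb = 1 - dropoutRate;
  // Bernoulli mask: 1 with probability keepProb, 0 otherwise.
  const mask = tf.randomUniform(x.shape).less(keepProb).cast('float32');
  return x.mul(mask).div(keepProb);
}

(TensorFlow.js also provides a built-in tf.dropout op; whether the PR uses it or a hand-rolled mask is not shown here.)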

@aliciafmachado changed the title from "Add more tests to dropout and pass flag to computeTransformer to disable dropout during evaluation." to "Add support to dropout." on Sep 8, 2024
@aliciafmachado (Collaborator, PR author) left a comment
There are some commits going back and forth on a few things that I did not fully understand at first, so feel free to squash the commits before merging to main to avoid confusion. Otherwise, I can recreate the pull request and fix the commit history.

I also have a few questions / discussion topics:

  1. I added support for dropout, but we need something to manage random seeds so that we can seed it properly. Should we create an issue for that?
  2. I tried to add dropout following the T5 architecture, but I decided not to add it after the FF layer or on the output. For the FF, I don't think it makes sense since we have a single layer and we already apply dropout before the residual connection after the FF network. For the output, there are no additional computations after leaving the stack, so adding another dropout there would only increase noise (I also did not see an additional dropout on the output in the Haiku implementation linked in the issue for adding dropout). See the placement sketch below.
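
As a rough illustration of the placement described in point 2 (using plain tf.Tensor and tf.dropout for brevity; the function and parameter names here are illustrative, not the repo's API):

import * as tf from '@tensorflow/tfjs';

// T5-style placement sketch: dropout is applied to a sub-layer's output just
// before the residual add, and no extra dropout is applied to the final stack
// output.
function residualSubLayer(
  x: tf.Tensor,
  subLayer: (t: tf.Tensor) => tf.Tensor,  // e.g. the attention or FF block
  dropoutRate: number,
  evalMode: boolean
): tf.Tensor {
  const y = subLayer(x);
  const yDropped = evalMode ? y : tf.dropout(y, dropoutRate);
  return x.add(yDropped);  // residual connection after dropout
}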

@aliciafmachado marked this pull request as ready for review on September 8, 2024 at 16:06
@iislucas (Collaborator) left a comment
Looks great, a few small things.

@@ -225,13 +229,20 @@ function gelu(x: tf.Tensor) {
 export function computeAttnHead(
   spec: AttnHeadComputeSpec,
   params: AttnHeadParams<TensorKind>,
-  seqInput: GTensor<'batch' | 'pos' | 'inputRep'>
+  seqInput: GTensor<'batch' | 'pos' | 'inputRep'>,
+  evalMode: boolean = false
@iislucas (Collaborator) commented:
Let's drop the evalMode flag and just depend on the spec having dropoutRate set differently at training vs. eval time.
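
For illustration, the caller-side difference would look roughly like this (the computeSpec field names are taken from the test spec shown later in this review):

// With the spec-driven approach there is no evalMode parameter: training and
// eval simply pass different compute specs, and a dropoutRate of 0 means "off".
const trainComputeSpec = { residuals: true, dropoutRate: 0.1 };
const evalComputeSpec = { residuals: true, dropoutRate: 0.0 };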

export function dropout<G extends string, D extends G>(
  dropoutRate: number,
  g: GTensor<G>,
  deterministic: boolean,
@iislucas (Collaborator) commented:
Let's remove deterministic and just check whether the rate is 0.

let unNormedSeqOuput = inputToFF
  .contract(ff.w, ['inputRepToFF'])
  .pointwiseAdd(ff.bIn)
  .applyPointWiseTfFn(gelu)
  .pointwiseAdd(ff.bOut);

// Dropout before layer norm and residual connection.
let unNormedSeqOuputAfterDropout = unNormedSeqOuput;
@iislucas (Collaborator) commented:
Let's use this as the reference for where to put it for T5: https://github.com/Shivanandroy/simpleT5. And maybe name this function computeT5AttnHead; later we can make a GPT-2 one.

@aliciafmachado (Collaborator, PR author) replied:
@iislucas (Collaborator) replied:
sg!

const layerSpec: transformer.TransformerParamLayerSpec = {
  nHeads: 1,
  hasPosEncoding: true,
  computeSpec: { residuals: true, dropoutRate: 0.1 },
@iislucas (Collaborator) commented on Sep 12, 2024:
Maybe also add a test for a dropout rate of 1, and then check that the loss doesn't decrease.
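
A hypothetical Jasmine-style sketch of such a test (the trainer helpers below are placeholders, not the repo's actual API):

// With a dropout rate of 1 every activation is dropped, so a single SGD step
// should not be able to reduce the loss.
it('does not decrease the loss when dropoutRate is 1', () => {
  const trainer = makeTrainerWithDropout(1.0);  // placeholder factory
  const lossBefore = trainer.computeLoss();     // placeholder method
  trainer.sgdStep();                            // placeholder method
  const lossAfter = trainer.computeLoss();      // placeholder method
  expect(lossAfter).toBeGreaterThanOrEqual(lossBefore);
});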

@aliciafmachado (Collaborator, PR author) replied:
Done.

@aliciafmachado (Collaborator, PR author) commented:
Will rebase once #36 is submitted and then pass a generator so that the dropout is reproducible; then you can take a second look, @iislucas.
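
As an illustration of one way seeding could work (this is just a sketch, not the generator mechanism from #36): successive dropout calls draw seeds from a shared deterministic stream, and tf.randomUniform accepts an explicit seed as its last argument.

import * as tf from '@tensorflow/tfjs';

// Illustration only: a deterministic seed stream shared by all dropout calls,
// so re-running training produces identical dropout masks.
class SeedStream {
  constructor(private nextSeed: number) {}
  next(): number {
    return this.nextSeed++;
  }
}

const seeds = new SeedStream(42);
const x = tf.ones([2, 4]);
const maskA = tf.randomUniform(x.shape, 0, 1, 'float32', seeds.next()).less(0.9);
const maskB = tf.randomUniform(x.shape, 0, 1, 'float32', seeds.next()).less(0.9);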

@iislucas (Collaborator) left a comment
Looks great, just a couple of minor things.

@aliciafmachado (Collaborator, PR author) commented:
I added the generator everywhere so that dropout can be seeded. I also addressed your comments and fixed a small bug. Please take a look.
Additionally, I realized two things:

  1. I'm not sure it makes sense to have a test with 0.99 dropout, because a single SGD step is highly sensitive: even without dropout, changing the seed means the loss does not necessarily go down. So we could either remove the test with dropout or just run one SGD step on it to make sure nothing is broken.
  2. If the dropout rate is very high (0.999), you sometimes get a NaN loss. I think it probably has to do with everything being dropped out, but I need to investigate a bit more. We won't use such a high dropout rate in any case, so I can create an issue to investigate this corner case and unblock the PR, if you agree.

@aliciafmachado (Collaborator, PR author) commented:
There's one additional thing: ideally we would pass a single dropout rate and use it everywhere, but the way it's set up we have to pass it in each layer and even outside the layers. I can try to think of a clean way to pass it only once, or I can open an issue and fix it in a future PR.
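
One possible shape for that cleanup, as a hypothetical sketch (the spec type below is simplified and is not the repo's actual TransformerParamLayerSpec):

// Copy a single top-level dropoutRate into every layer's computeSpec, so the
// rate is specified once rather than per layer.
interface SimpleLayerSpec {
  computeSpec: { residuals: boolean; dropoutRate: number };
}

function withDropoutRate(layerSpecs: SimpleLayerSpec[], dropoutRate: number): SimpleLayerSpec[] {
  return layerSpecs.map(spec => ({
    ...spec,
    computeSpec: { ...spec.computeSpec, dropoutRate },
  }));
}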

@iislucas (Collaborator) left a comment
Thanks! LGTM. We can discuss improvements more later, but this is great.

@aliciafmachado merged commit d2ccc7b into PAIR-code:main on Oct 16, 2024
1 check passed