
60 minute blitz uses stacked Dense layers with no activation function #339

Open
staticfloat opened this issue Mar 2, 2022 · 1 comment


@staticfloat
Contributor

In the 60-minute blitz tutorial, we use a sequence of stacked Dense layers, each with no activation function. This doesn't make much sense: a composition of linear operators is itself a linear operator, so the whole stack can always be collapsed into a single one:

julia> using Flux
       # Three stacked linear layers with no activation function
       model = Chain(
           Dense(200 => 120, bias=false),
           Dense(120 => 84, bias=false),
           Dense(84 => 10, bias=false),
       )

       # Fold the three weight matrices into a single linear layer
       model_condensed = Chain(
           Dense(model[3].weight * model[2].weight * model[1].weight),
       )

       x = randn(200)
       sum(abs, model(x) .- model_condensed(x))
2.4189600187907168e-6

Aside from floating-point rounding, the two models are equivalent: the stacked Dense layers add no expressive power, and they actually cost extra time, since the intermediate results have to move in and out of CPU cache between the matrix multiplications.
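The overhead is easy to measure with BenchmarkTools.jl (a rough sketch, assuming the model, model_condensed, and x from the snippet above are still in scope):

using BenchmarkTools

@btime $model($x)            # three matrix multiplications plus intermediate allocations
@btime $model_condensed($x)  # a single matrix multiplication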

It would be better to either add nonlinearities between these Dense layers to increase the model's expressive power, or replace them with a single Dense layer that maps the 200-dimensional input directly down to the 10 outputs.
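A minimal sketch of both options, using the layer sizes from the snippet above (the sizes in the actual tutorial may differ):

using Flux

# Option 1: nonlinearities between the layers, so the stack is no longer a single linear map
model_nonlinear = Chain(
    Dense(200 => 120, relu),
    Dense(120 => 84, relu),
    Dense(84 => 10),
)

# Option 2: one Dense layer mapping the 200-dimensional input straight to the 10 outputs
model_single = Dense(200 => 10)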

@ToucheSir
Member

Good point. The PyTorch version uses relu, and I can only assume it was missed when porting because the activation functions weren't included in the Dense constructors.
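To illustrate the pitfall: PyTorch's nn.Sequential takes nn.ReLU() as a separate module, whereas in Flux the activation is a positional argument to Dense, so leaving it out silently produces a purely linear layer. A minimal sketch:

using Flux

Dense(200 => 120)        # no activation argument: identity, i.e. purely linear
Dense(200 => 120, relu)  # relu applied elementwise after the affine transform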
