
Make activation applied to the output layer optional #567

Open
elcorto opened this issue Aug 13, 2024 · 2 comments
Labels
enhancement, needs discussion

Comments

elcorto (Member) commented Aug 13, 2024

I saw @nerkulec's branch, which made me realize that, by default, FeedForwardNet applies an activation to the output layer. This is somewhat unusual for regression (see e.g. here, eq. 5.1.3).

If this is done to enforce positive LDOS values, then one probably needs to fix this to ReLU only.

However, having any activation here (including ReLU) prevents using a MALA model together with the fast last-layer code path in https://github.com/aleximmer/Laplace. There is an alternative way to do the same, but this has not been benchmarked extensively for the MALA case so far.
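For context, the fast last-layer path there looks roughly like this (a sketch based on my reading of the Laplace README; model and train_loader are placeholders, and the exact calls are an assumption, not something tested with MALA):

    from laplace import Laplace

    # model: a trained network whose final module is a plain torch.nn.Linear;
    # an activation after the output layer breaks exactly this assumption.
    la = Laplace(
        model,
        "regression",
        subset_of_weights="last_layer",
        hessian_structure="kron",
    )
    la.fit(train_loader)      # train_loader: the usual torch DataLoader
    f_mu, f_var = la(x_test)  # predictive mean and variance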

In any case, I think such an activation should be opt-in (or opt-out). It should, however, not be removed, in order to keep backward compatibility with results already produced with the current code version. One could for instance introduce a bool setting parameters.network.use_output_activation. Then, the logic here

elif len(self.params.layer_activations) < self.number_of_layers:
    raise Exception("Not enough activation layers provided.")
elif len(self.params.layer_activations) > self.number_of_layers:
    printout(
        "Too many activation layers provided. The last",
        str(
            len(self.params.layer_activations) - self.number_of_layers
        ),
        "activation function(s) will be ignored.",
        min_verbosity=1,
    )

needs to be adapted as well.
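A rough sketch of how that adaptation could look (hypothetical names; I'm assuming self.params here is the network section of the parameters):

    # Sketch only: expect activations for all layers, or for all but the
    # output layer when the (hypothetical) use_output_activation switch is off.
    n_activated_layers = self.number_of_layers
    if not self.params.use_output_activation:
        n_activated_layers -= 1

    if len(self.params.layer_activations) < n_activated_layers:
        raise Exception("Not enough activation layers provided.")
    elif len(self.params.layer_activations) > n_activated_layers:
        printout(
            "Too many activation layers provided. The last",
            str(len(self.params.layer_activations) - n_activated_layers),
            "activation function(s) will be ignored.",
            min_verbosity=1,
        )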

elcorto added the enhancement and needs discussion labels on Aug 13, 2024
nerkulec commented
I created the branch after seeing that by default we have "normal" scaling plus an activation function after the last layer of the network. But then I realized that the defaults actually correspond to min-max scaling (as you noted in #482) and the sigmoid activation function, so this doesn't necessarily pose a huge problem, as both have the same value range [0, 1].
That said, I still believe that the last layer shouldn't use an activation function, but I agree it should be left as an option for the user.

elcorto (Member, Author) commented Aug 21, 2024

In regression we generally don't want to clamp the model outputs to a range, i.e. we want the identity as the output activation. The intuition is that when the model faces unseen out-of-distribution data, its outputs should be allowed to be "as wrong as possible", for instance far too large. When clamping to a range, model outputs might still be in the right ballpark, but for the wrong reasons. I haven't tested this specific case, I have to admit.

Having ReLU as the last-layer activation may be justified to enforce positive LDOS values, since it acts as the identity for positive inputs. Still, this is problematic for the reason explained above. Enforcing positive outputs could perhaps also be done with a penalty term in the loss.
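To illustrate the penalty idea (just a sketch, nothing benchmarked for MALA; the function name and weight are made up):

    import torch

    def mse_with_positivity_penalty(pred, target, weight=1.0):
        # Standard MSE plus a soft positivity constraint: relu(-pred) is zero
        # for non-negative predictions and grows linearly for negative ones,
        # so negative LDOS values are penalized without clamping the output.
        mse = torch.nn.functional.mse_loss(pred, target)
        penalty = torch.relu(-pred).mean()
        return mse + weight * penalty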

The sigmoid activation is especially problematic, I think, since (i) it clamps to a range and (ii) it adds a non-linearity at the end which doesn't make the model more expressive, since it merely modifies values without passing them on to further layers.

I have a hotfix in my dev branch which adds parameters.network.ff_use_output_activation (one cannot add a constructor keyword to FeedForwardNet due to the way __new__ does things in Network), but I haven't adapted the code snippet above, which handles the case where people supply different activations for different layers. One might also think about simplifying the API by letting people pass a torch.nn.Sequential object which gets called in Network.forward(), but that's a bit OT here :)
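To sketch the Sequential idea anyway (purely hypothetical API, layer sizes are arbitrary examples):

    import torch

    # The user assembles the stack themselves and Network.forward() would just
    # call it, so whether the output layer gets an activation is entirely up
    # to the user; here it is left linear.
    layers = torch.nn.Sequential(
        torch.nn.Linear(91, 400),
        torch.nn.Sigmoid(),
        torch.nn.Linear(400, 400),
        torch.nn.Sigmoid(),
        torch.nn.Linear(400, 250),
    )
    # network = Network(params, layers=layers)  # hypothetical constructor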
