
Make activation applied to the output layer optional #567

Open
elcorto opened this issue Aug 13, 2024 · 2 comments
Labels
enhancement, needs discussion

Comments

elcorto (Member) commented Aug 13, 2024

I saw @nerkulec's branch, which made me realize that, by default, FeedForwardNet applies an activation to the output layer. This is somewhat unusual for regression (see e.g. here, eq. 5.1.3).

If this is done to enforce positive LDOS values, then one probably needs to fix this to ReLU only.

However, having any activation here (including ReLU) prevents using a MALA model together with the fast last-layer code path in https://github.com/aleximmer/Laplace. There is an alternative way to do the same, but this has not been benchmarked extensively for the MALA case so far.
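For context, the fast last-layer path there looks roughly like this (a sketch based on my reading of the Laplace README; model and train_loader are placeholders, and the exact calls are an assumption, not something tested with MALA):

    from laplace import Laplace

    # model: a trained network whose final module is a plain torch.nn.Linear;
    # an activation after the output layer breaks exactly this assumption.
    la = Laplace(
        model,
        "regression",
        subset_of_weights="last_layer",
        hessian_structure="kron",
    )
    la.fit(train_loader)      # train_loader: the usual torch DataLoader
    f_mu, f_var = la(x_test)  # predictive mean and variance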

In any case, I think such an activation should be opt-in (or opt-out). It should, however, not be removed, in order to keep backward compatibility with results already produced with the current code version. One could for instance introduce a bool setting parameters.network.use_output_activation. Then, the logic here

elif len(self.params.layer_activations) < self.number_of_layers:
    raise Exception("Not enough activation layers provided.")
elif len(self.params.layer_activations) > self.number_of_layers:
    printout(
        "Too many activation layers provided. The last",
        str(
            len(self.params.layer_activations) - self.number_of_layers
        ),
        "activation function(s) will be ignored.",
        min_verbosity=1,
    )

needs to be adapted as well.
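A rough sketch of how that adaptation could look (hypothetical names; I'm assuming self.params here is the network section of the parameters):

    # Sketch only: expect activations for all layers, or for all but the
    # output layer when the (hypothetical) use_output_activation switch is off.
    n_activated_layers = self.number_of_layers
    if not self.params.use_output_activation:
        n_activated_layers -= 1

    if len(self.params.layer_activations) < n_activated_layers:
        raise Exception("Not enough activation layers provided.")
    elif len(self.params.layer_activations) > n_activated_layers:
        printout(
            "Too many activation layers provided. The last",
            str(len(self.params.layer_activations) - n_activated_layers),
            "activation function(s) will be ignored.",
            min_verbosity=1,
        )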

elcorto added the enhancement and needs discussion labels on Aug 13, 2024
nerkulec commented
I created the branch after seeing that by default we have "normal" scaling plus an activation function after the last layer of the network. But then I realized that the defaults actually correspond to min-max scaling (as you noted in #482) and the sigmoid activation function, so this doesn't necessarily pose a huge problem, as both have the same value range [0, 1].
That said, I still believe that the last layer shouldn't use an activation function, but I agree it should be left as an option for the user.

elcorto (Member, Author) commented Aug 21, 2024

In regression we generally don't want to clamp the model outputs to a range, i.e. we want the identity as the output activation. The intuition is that when the model faces unseen out-of-distribution data, its outputs should be allowed to be "as wrong as possible", for instance far too large. When clamping to a range, model outputs might still be in the right ballpark, but for the wrong reasons. I haven't tested this specific case, I have to admit.

Having ReLU as the last-layer activation may be justified to enforce positive LDOS values, since it acts as the identity for positive inputs. Still, this is problematic for the reason explained above. Enforcing positive outputs could perhaps also be done with a penalty term in the loss.
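To illustrate the penalty idea (just a sketch, nothing benchmarked for MALA; the function name and weight are made up):

    import torch

    def mse_with_positivity_penalty(pred, target, weight=1.0):
        # Standard MSE plus a soft positivity constraint: relu(-pred) is zero
        # for non-negative predictions and grows linearly for negative ones,
        # so negative LDOS values are penalized without clamping the output.
        mse = torch.nn.functional.mse_loss(pred, target)
        penalty = torch.relu(-pred).mean()
        return mse + weight * penalty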

The sigmoid activation is especially problematic, I think, since (i) it clamps to a range and (ii) it adds a non-linearity at the end which doesn't make the model more expressive, since it merely modifies values without passing them on to further layers.

I have a hotfix in my dev branch which adds parameters.network.ff_use_output_activation (one cannot add a constructor keyword to FeedForwardNet due to the way __new__ does things in Network), but I haven't adapted the code snippet above, which handles the case where people supply different activations for different layers. One might also think about simplifying the API by letting people pass a torch.nn.Sequential object which gets called in Network.forward(), but that's a bit OT here :)
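To sketch the Sequential idea anyway (purely hypothetical API, layer sizes are arbitrary examples):

    import torch

    # The user assembles the stack themselves and Network.forward() would just
    # call it, so whether the output layer gets an activation is entirely up
    # to the user; here it is left linear.
    layers = torch.nn.Sequential(
        torch.nn.Linear(91, 400),
        torch.nn.Sigmoid(),
        torch.nn.Linear(400, 400),
        torch.nn.Sigmoid(),
        torch.nn.Linear(400, 250),
    )
    # network = Network(params, layers=layers)  # hypothetical constructor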
