Experiment of using Instance Normalization vs Layer Normalization on Decoder #107
Hi,
In their model series (NVlabs imaginaire), the adaptive step is only in the ResBlocks, which means there is no resolution change when the conditional information is added.
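For illustration only, here is a minimal, self-contained sketch of such a block; the class name `AdaResBlock`, the layer choices, and the shapes are my own assumptions, not the imaginaire code. The point is that the convolutions use stride 1 with `'same'` padding, so the conditional gamma/beta are injected without changing the spatial resolution.

```python
# Minimal sketch (names and layer choices are assumptions, not the imaginaire code)
# of a residual block whose adaptive normalization injects conditional information
# while keeping the spatial resolution unchanged.
import tensorflow as tf
from tensorflow.keras import layers


class AdaResBlock(layers.Layer):
    def __init__(self, channels, epsilon=1e-5, **kwargs):
        super(AdaResBlock, self).__init__(**kwargs)
        self.epsilon = epsilon
        # Stride-1, 'same'-padded convs: output (H, W) equals input (H, W).
        self.conv1 = layers.Conv2D(channels, 3, strides=1, padding='same')
        self.conv2 = layers.Conv2D(channels, 3, strides=1, padding='same')
        # Each conv gets its own gamma/beta predictor pair from the style vector.
        self.to_gamma = [layers.Dense(channels) for _ in range(2)]
        self.to_beta = [layers.Dense(channels) for _ in range(2)]

    def adaptive_norm(self, x, style, i):
        # Instance-normalize, then scale/shift with style-conditioned gamma/beta.
        mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        x = (x - mean) / tf.math.sqrt(var + self.epsilon)
        gamma = self.to_gamma[i](style)[:, tf.newaxis, tf.newaxis, :]
        beta = self.to_beta[i](style)[:, tf.newaxis, tf.newaxis, :]
        return gamma * x + beta

    def call(self, x, style):
        # x: (batch, H, W, channels); style: (batch, style_dim)
        h = tf.nn.relu(self.adaptive_norm(self.conv1(x), style, 0))
        h = self.adaptive_norm(self.conv2(h), style, 1)
        return x + h  # same (batch, H, W, channels) in and out: no resolution change
```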
Thank you for your reply, especially this part:
LN normalizes the feature across all dimensions (spatial & channel). Unlike LN, IN normalizes the feature only across the spatial dimensions, which means each dimension (channel) is normalized by its own spatial statistics. That's why the correlation is destroyed. You can refer to the Gram matrix (style transfer) and U-GAT-IT (which combines instance and layer norm). (You can treat a dimension as a unit in the MLP case, which has no spatial information.)

Gamma and beta are the predicted vectors conditioned on the style feature. In the official implementation, they predict all gamma and beta with one wide MLP, i.e., style feature (batch, 8) --> (batch, 256 * 2 * 2 * 9). That means there are 9 residual blocks, each containing 2 convolutional blocks that need 2 gamma & beta pairs, and the dimension is 256. Since a deep structure can fit more complex functions than a wide one, instead of predicting all gamma & beta at once, you can equip each layer with its own pair of gamma & beta predictors, as follows.

```python
import tensorflow as tf
from tensorflow.keras import layers


class InstanceNorm(layers.Layer):
    def __init__(self, epsilon=1e-5, affine=False, **kwargs):
        super(InstanceNorm, self).__init__(**kwargs)
        self.epsilon = epsilon
        self.affine = affine

    def build(self, input_shape):
        if self.affine:
            # Learnable per-channel scale and shift, used only when affine=True.
            self.gamma = self.add_weight(name='gamma',
                                         shape=(input_shape[-1],),
                                         initializer=tf.random_normal_initializer(0, 0.02),
                                         trainable=True)
            self.beta = self.add_weight(name='beta',
                                        shape=(input_shape[-1],),
                                        initializer=tf.zeros_initializer(),
                                        trainable=True)

    def call(self, inputs, training=None):
        # Normalize each channel by its own spatial (H, W) statistics.
        mean, var = tf.nn.moments(inputs, axes=[1, 2], keepdims=True)
        x = (inputs - mean) / tf.math.sqrt(var + self.epsilon)
        if self.affine:
            return self.gamma * x + self.beta
        return x
```
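To make the "two predictors per layer" idea concrete, here is a minimal sketch building on the class above; the name `AdaptiveInstanceNorm`, the `Dense` predictors, and the tensor shapes are my own assumptions rather than the official MUNIT code.

```python
# Minimal sketch (illustrative names/shapes, not the official implementation) of an
# adaptive instance norm whose gamma/beta come from two small per-layer predictors
# instead of one wide MLP shared by all layers.
import tensorflow as tf
from tensorflow.keras import layers


class AdaptiveInstanceNorm(layers.Layer):
    def __init__(self, channels, epsilon=1e-5, **kwargs):
        super(AdaptiveInstanceNorm, self).__init__(**kwargs)
        self.epsilon = epsilon
        # One predictor for gamma and one for beta, owned by this layer only.
        self.to_gamma = layers.Dense(channels)
        self.to_beta = layers.Dense(channels)

    def call(self, content, style):
        # content: (batch, H, W, C); style: (batch, style_dim), e.g. style_dim = 8
        mean, var = tf.nn.moments(content, axes=[1, 2], keepdims=True)
        x = (content - mean) / tf.math.sqrt(var + self.epsilon)
        gamma = self.to_gamma(style)[:, tf.newaxis, tf.newaxis, :]  # (batch, 1, 1, C)
        beta = self.to_beta(style)[:, tf.newaxis, tf.newaxis, :]
        return gamma * x + beta
```

Stacking 9 residual blocks, each with two of these layers, yields the same 9 × 2 gamma/beta pairs that the single wide MLP would otherwise predict all at once.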
No, the tuning parameters gamma and beta are not used in LN. They just cite the paper to refer to the LN method.
So, this means that the LN here is a different normalization layer and is not mentioned in the paper,
Yes, you can, and you will find that the training is unstable as the magnitude of the feature values becomes large.
In terms of the computation performed by each normalization method, the MUNIT architecture can be summarized as follows.
This means that since there is no tuning of channel correlation in the upsampling layers (that tuning happens through Adaptive Instance Normalization, as in StyleGAN), if you use instance normalization during upsampling, the channel correlation tuned by the earlier blocks (ResNet + Adaptive Instance Normalization) will be destroyed.
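A tiny runnable sketch (toy shapes and values, my own assumption) makes this concrete: applying an extra instance norm on top of an AdaIN-styled feature pulls every channel back to zero mean and unit variance, so the channel statistics injected by gamma and beta are gone.

```python
# Toy demonstration (assumed shapes/values) that an extra instance norm after AdaIN
# erases the per-channel statistics that gamma/beta injected.
import tensorflow as tf


def instance_norm(x, eps=1e-5):
    mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
    return (x - mean) / tf.math.sqrt(var + eps)


x = tf.random.normal([1, 8, 8, 4])            # feature map out of the residual blocks
gamma = tf.constant([[2.0, 0.5, 3.0, 1.0]])   # per-channel scale predicted from the style
beta = tf.constant([[1.0, -1.0, 0.0, 2.0]])   # per-channel shift predicted from the style
styled = gamma * instance_norm(x) + beta      # AdaIN output: channel stats now carry the style

wiped = instance_norm(styled)                 # extra IN in the upsampling path
print(tf.reduce_mean(wiped, axis=[1, 2]))     # ~0 for every channel: the injected means are gone
print(tf.math.reduce_std(wiped, axis=[1, 2])) # ~1 for every channel: the injected scales are gone
```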