
Scaling from 128x128, to 256x256, 512x512 and 1024x1024? #95

Open
tin-sely opened this issue Feb 5, 2024 · 4 comments

tin-sely commented Feb 5, 2024

hey,

loved your paper and thanks a bunch for providing the code!

i have a quick question: how do you scale and train the network (HDiT) for increased resolutions? i saw you mentioned in #14 (comment) that you first need to build the entire network and then skip layers, but i'm not sure if this also applies to this new architecture?

many thanks!

tin-sely commented Feb 6, 2024

it looks like it's not meant for progressive scaling? i guess the best option would be to train at a lower resolution and then copy the relevant weights into a higher-res network, something like the sketch below
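for reference, a rough (untested) sketch of that weight-copying idea, assuming the checkpoints are ordinary PyTorch state dicts; copy_matching_weights and the "model" key are just placeholder names, not things from this repo:

import torch

def copy_matching_weights(low_res_ckpt_path, high_res_model):
    # untested sketch: copy every tensor whose name and shape still match
    # from a low-res checkpoint into a freshly built high-res model; any new
    # or resized layers simply keep their fresh initialization.
    state = torch.load(low_res_ckpt_path, map_location="cpu")
    low_res_state = state.get("model", state)  # adjust to the actual checkpoint layout
    target_state = high_res_model.state_dict()
    copied = 0
    for name, tensor in low_res_state.items():
        if name in target_state and target_state[name].shape == tensor.shape:
            target_state[name] = tensor
            copied += 1
    high_res_model.load_state_dict(target_state)
    print(f"copied {copied} of {len(target_state)} tensors")
    return high_res_model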

another thing i was curious about was the inputs:

def forward(self, x, sigma, aug_cond=None, class_cond=None, mapping_cond=None):

x, sigma, and class_cond are clear, but do you have any more details on aug_cond and mapping_cond?

@madebyollin

@tin-sely I believe aug_cond is for non-leaky augmentations. When an input image is augmented during training, a description of how that image was augmented is also given to the generator (as aug_cond - augmentation conditioning), so that the generator eventually learns to generate either augmented or non-augmented images depending on the value of the aug_cond input.

I believe mapping_cond is an older name for aug_cond that is used in the non-transformer model configs (the ones that use KarrasAugmentWrapper, which takes the aug_cond tensor and gives it to the model as mapping_cond).
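To illustrate the idea (a toy sketch, not the repo's actual augmentation pipeline - the real one encodes the full set of augmentation parameters, not just a flip flag, and aug_dim=9 here is only a placeholder):

import torch

def toy_augment(x, aug_dim=9):
    # toy non-leaky augmentation: randomly h-flip each image in the batch and
    # record what was done in a per-sample conditioning vector.
    flip = torch.rand(x.shape[0]) < 0.5
    x_aug = torch.where(flip.view(-1, 1, 1, 1), x.flip(-1), x)
    aug_cond = torch.zeros(x.shape[0], aug_dim)
    aug_cond[:, 0] = flip.float()
    return x_aug, aug_cond

# training: x_aug, aug_cond = toy_augment(x); then call model(x_aug, sigma, aug_cond=aug_cond)
# sampling: pass an all-zeros aug_cond to ask for non-augmented images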

tin-sely commented Feb 7, 2024

thanks a bunch @madebyollin! ✨

@mnslarcher

My understanding is that you use aug_cond when you wish to provide the model with information about the augmentations using Fourier Features:

self.aug_emb = layers.FourierFeatures(9, mapping.width)

self.aug_in_proj = Linear(mapping.width, mapping.width, bias=False)

aug_emb = self.aug_in_proj(self.aug_emb(aug_cond))

On the other hand, if you use mapping_cond, the condition will be fed directly into a linear layer, as shown here:

self.mapping_cond_in_proj = Linear(mapping_cond_dim, mapping.width, bias=False) if mapping_cond_dim else None

mapping_emb = self.mapping_cond_in_proj(mapping_cond) if self.mapping_cond_in_proj is not None else 0

These embeddings are then both fed into the MappingNetwork:

cond = self.mapping(time_emb + aug_emb + class_emb + mapping_emb)

But getting more clarity on this would definitely help!
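A condensed (untested) sketch of those two paths, using a stand-in module rather than the repo's own layers.FourierFeatures:

import math
import torch
import torch.nn as nn

class ToyFourierFeatures(nn.Module):
    # stand-in for layers.FourierFeatures: project onto fixed random
    # frequencies, then concatenate cos and sin of the result.
    def __init__(self, in_features, out_features, std=1.0):
        super().__init__()
        self.register_buffer("freqs", torch.randn(out_features // 2, in_features) * std)

    def forward(self, x):
        f = 2 * math.pi * x @ self.freqs.T
        return torch.cat([f.cos(), f.sin()], dim=-1)

width, mapping_cond_dim, batch = 256, 16, 4

aug_emb_layer = ToyFourierFeatures(9, width)                           # aug_cond path: Fourier features first...
aug_in_proj = nn.Linear(width, width, bias=False)                      # ...then a bias-free linear projection
mapping_cond_in_proj = nn.Linear(mapping_cond_dim, width, bias=False)  # mapping_cond path: a single linear projection

aug_cond = torch.rand(batch, 9)
mapping_cond = torch.rand(batch, mapping_cond_dim)

aug_emb = aug_in_proj(aug_emb_layer(aug_cond))
mapping_emb = mapping_cond_in_proj(mapping_cond)
# both embeddings then get summed with time_emb and class_emb and passed
# through the mapping network, as in the line quoted above.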
