You can train SD 2.1 with 768px images and it should work, though you will probably need to adapt the training parameters for it. One example of this is SPRIGHT, which was trained with 768px images using this dataset. cc: @sayakpaul if you have more insights.
I conducted fine-tuning experiments on two base models, SD-2-1-base (512x512 resolution) and SD-2-1 (768x768 resolution), using identical hyperparameters:
Model architecture:
An image-conditioned model with extended input channels, where the conditioning image is concatenated with the noise along the channel dimension (similar to the approach in Zero123), and the CLIP text encoder is disabled.
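The channel-extension step above can be sketched as follows. This is a minimal, hypothetical illustration (not the author's actual code), assuming a Stable-Diffusion-style UNet whose first convolution takes 4 latent channels: the conv is widened to 8 channels so a conditioning-image latent can be concatenated with the noisy latent, with the new weights zero-initialized so the pretrained behavior is preserved at the start of fine-tuning. The same surgery applies to `conv_in` of diffusers' `UNet2DConditionModel`.

```python
import torch
import torch.nn as nn

def widen_conv_in(old_conv: nn.Conv2d, extra_channels: int = 4) -> nn.Conv2d:
    """Return a copy of old_conv that accepts extra input channels.

    The pretrained weights are copied into the first in_channels slots;
    the new channels are zero-initialized, so the widened conv initially
    ignores the conditioning input.
    """
    new_conv = nn.Conv2d(
        old_conv.in_channels + extra_channels,
        old_conv.out_channels,
        kernel_size=old_conv.kernel_size,
        padding=old_conv.padding,
    )
    with torch.no_grad():
        new_conv.weight.zero_()  # conditioning channels start at zero
        new_conv.weight[:, : old_conv.in_channels] = old_conv.weight
        new_conv.bias.copy_(old_conv.bias)
    return new_conv

# Usage sketch: concatenate the conditioning-image latent with the noisy latent.
# Channel/feature sizes here are illustrative (4 latent channels, 320 features).
old = nn.Conv2d(4, 320, kernel_size=3, padding=1)
new = widen_conv_in(old)
noisy = torch.randn(1, 4, 64, 64)   # noisy latent
cond = torch.randn(1, 4, 64, 64)    # conditioning-image latent
out = new(torch.cat([noisy, cond], dim=1))
```

Because the new channels are zeroed, the widened conv initially produces the same output as the pretrained one regardless of the conditioning input, which tends to make fine-tuning more stable than random initialization.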
Dataset:
15k frames for a single person.
Results:
SD-2-1-base (512x512 resolution):
(Above: Ground Truth (GT), Below: Model Generation)
SD-2-1 (768x768 resolution):
(Above: Ground Truth (GT), Below: Model Generation)
It looks like the SD-2-1 (768x768 resolution) model suffered from model collapse.