Tiled Diffusion
Kahsolt edited this page Mar 30, 2024 · 1 revision
- The illustration shows how an image is split into tiles.
- In each step, each tile in the latent space is sent to the Stable Diffusion UNet.
- The tiles are split and fused over and over again until all steps are completed.
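The split-and-fuse loop described above can be sketched in plain Python. This is a minimal illustration, not the extension's actual code: `tile_boxes`, `denoise_step`, and the `denoise_tile` callback are hypothetical names, the latent is a plain 2D list instead of a tensor, and overlapping regions are fused by a simple average (the MultiDiffusion idea; Mixture of Diffusers would use Gaussian weights instead).

```python
def tile_boxes(height, width, tile, overlap):
    """Return (top, left, bottom, right) boxes that cover a height x width grid."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile sits flush with the edge
        return positions

    return [(top, left, min(top + tile, height), min(left + tile, width))
            for top in starts(height) for left in starts(width)]


def denoise_step(latent, tile, overlap, denoise_tile):
    """Run one step: split into tiles, process each tile, fuse by averaging."""
    height, width = len(latent), len(latent[0])
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, bottom, right in tile_boxes(height, width, tile, overlap):
        patch = [row[left:right] for row in latent[top:bottom]]
        patch = denoise_tile(patch)  # stand-in for the UNet call (assumption)
        for dy, row in enumerate(patch):
            for dx, value in enumerate(row):
                total[top + dy][left + dx] += value
                count[top + dy][left + dx] += 1
    # Every cell is covered by at least one tile, so count is never zero.
    return [[total[y][x] / count[y][x] for x in range(width)]
            for y in range(height)]


# Toy usage: an identity "denoiser" must give back the input unchanged.
latent = [[float(y * 10 + x) for x in range(10)] for y in range(6)]
fused = denoise_step(latent, tile=4, overlap=2, denoise_tile=lambda p: p)
```

A full run would call `denoise_step` once per sampling step, which is why the number of tiles directly multiplies the total work.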
- What is a good tile size?
- A larger tile size will increase the speed because it produces fewer tiles.
- However, the optimal size depends on your checkpoint. The basic SD1.4 is only good at drawing 512 * 512 images (768 * 768 for SD2.1), and most checkpoints cannot generate good pictures larger than 1280 * 1280. Tile sizes are given in latent space, so divide these pixel sizes by 8 to get 64 - 160.
- Hence, you should pick a value between 64 and 160.
- Personally, I recommend 96 or 128 for fast speed.
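The pixel-to-latent arithmetic above is just an integer division by 8. A tiny sketch (the helper name `pixels_to_latent` is made up for illustration):

```python
LATENT_SCALE = 8  # the divide-by-8 factor between pixel space and latent space


def pixels_to_latent(pixels):
    """Convert a pixel dimension to the matching latent-space dimension."""
    return pixels // LATENT_SCALE


for px in (512, 768, 1280):
    print(px, "px ->", pixels_to_latent(px), "latent units")
```

By the same arithmetic, the recommended tile sizes of 96 and 128 correspond to 768 and 1024 pixels.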
- What is a good overlap?
- The overlap reduces seams in fusion. A larger overlap means fewer seams, but it significantly reduces the speed because it produces many more tiles to redraw.
- Compared to MultiDiffusion, Mixture of Diffusers requires less overlap because it uses Gaussian smoothing (and therefore can be faster).
- Personally, I recommend 32 or 48 for MultiDiffusion, and 16 or 32 for Mixture of Diffusers.
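To get a feel for how overlap drives the tile count, here is a rough estimate. It assumes tiles are placed with a stride of `tile - overlap` and that one extra tile is added to reach the edge; the extension's exact placement may differ, and `tile_count` is a hypothetical helper.

```python
import math


def tiles_along(size, tile, overlap):
    """Estimated number of tiles needed to cover one axis."""
    if size <= tile:
        return 1
    stride = tile - overlap
    return math.ceil((size - tile) / stride) + 1


def tile_count(height, width, tile, overlap):
    """Estimated number of tiles for a height x width latent."""
    return tiles_along(height, tile, overlap) * tiles_along(width, tile, overlap)


# A 2048 x 2048 pixel image is 256 x 256 in latent space.
for overlap in (16, 32, 48):
    print("overlap", overlap, "->", tile_count(256, 256, 96, overlap), "tiles")
```

With a tile size of 96, this estimate gives 9, 16, and 25 tiles for overlaps of 16, 32, and 48, so tripling the overlap nearly triples the UNet work per step.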
- The Upscaler option appears in img2img (i2i). You can select an upscaler to upscale your image in advance.
ℹ Please use simple positive prompts at the top of the page, as they will be applied to each tile.
ℹ If you want to add objects to a specific position, use regional prompt control and enable draw full canvas background.
- 22020 x 1080 ultra-wide image conversion
- Masterpiece, best quality, highres, ultra-detailed 8k unity wallpaper, bird's-eye view, trees, ancient architectures, stones, farms, crowd, pedestrians
- Before: click for the raw image
- After: click for the raw image
- ControlNet (canny edge)
Leverage Tiled Diffusion to upscale & redraw large images
- Params:
- denoise=0.4, steps=20, Sampler=Euler a, Upscaler=RealESRGAN++, Negative Prompts=EasyNegative
- Ckpt: Gf-style2 (4GB version), CFG Scale = 14, Clip Skip = 2
- method = MultiDiffusion, tile batch size = 8, tile size height = 96, tile size width = 96, overlap = 32
- Prompt = masterpiece, best quality, highres, extremely detailed 8k wallpaper, very clear, Neg prompt = EasyNegative.
- Before upscaling
- After 4x upscale, no cherry-picking. 1min12s on an NVIDIA Tesla V100. (A 2x upscale completes in 10s.)
- Recommended parameters for efficient upscaling:
- Sampler = Euler a, steps = 20, denoise = 0.35, method = Mixture of Diffusers, latent tile height & width = 128, overlap = 16, tile batch size = 8 (reduce the tile batch size if you see CUDA out of memory).
- Tiled Diffusion is compatible with masked inpainting.
- If you want to keep some parts unchanged, or Tiled Diffusion gives you weird results, just mask those areas.
- The checkpoint is crucial.
- MultiDiffusion works much like highres.fix, so it relies heavily on your checkpoint.
- A checkpoint that is good at drawing details can add amazing details to your image.
- A full checkpoint instead of a pruned one can yield much finer results.
- Don't include any concrete objects in your main prompt; otherwise the results get ruined.
- Just use something like "highres, masterpiece, best quality, ultra-detailed 8k wallpaper, extremely clear".
- And use regional prompt control for concrete objects if you like.
- You don't need a very large tile size, a large overlap, or many denoising steps; any of these can make it very slow.
- CFG scale can significantly affect the details.
- A large CFG scale (e.g., 14) gives you much more detail.
- You can control how much the original image changes by setting the denoising strength between 0.1 and 0.6.
- If your results are still not as satisfying as mine, see our discussions here.