Improve FLUX image-to-image (Trajectory Guidance) (#6900)
## Summary

This PR makes some improvements to the FLUX image-to-image and
inpainting behaviours.

Changes:
- Expand inpainting region at a cutoff timestep. This improves seam
coherence around inpainting regions.
- Add Trajectory Guidance to improve the ability to control how much an
image gets modified during image-to-image/inpainting (see the code for a
more technical explanation - it's well-documented).
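
The first change can be pictured with a small sketch. It assumes a soft inpaint mask with values in [0, 1] (1 = regenerate, 0 = keep), timesteps that run from 1.0 (pure noise) down to 0.0, and a hypothetical helper `expand_mask_at_cutoff` with an illustrative cutoff of 0.5. None of these names or values are taken from the PR's code, so treat this as one plausible reading rather than the actual implementation.

```python
from typing import List

Mask = List[List[float]]


def expand_mask_at_cutoff(mask: Mask, t: float, t_cutoff: float = 0.5, eps: float = 1e-4) -> Mask:
    """Return the inpaint mask to use at timestep t.

    Hypothetical helper for illustration only (not the PR's code).
    """
    if t > t_cutoff:
        # Early, noisy steps: use the soft mask unchanged (returned as a copy).
        return [row[:] for row in mask]
    # At or after the cutoff: promote every non-zero value to 1.0, which
    # expands the fully regenerated region to cover the soft border.
    return [[1.0 if v > eps else v for v in row] for row in mask]


soft_mask = [[0.0, 0.3, 1.0]]
print(expand_mask_at_cutoff(soft_mask, t=0.9))  # soft border kept as-is
print(expand_mask_at_cutoff(soft_mask, t=0.2))  # soft border promoted to 1.0
```

Under these assumptions, the soft border of the mask is only partially regenerated early on and is fully handed over to the model for the final steps, which is one way seams could be smoothed.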

## `trajectory_guidance_strength` Usage

- The `trajectory_guidance_strength` param has been added to the `FLUX
Denoise` invocation.
- `trajectory_guidance_strength` defaults to `0.0` and should be in the
range [0, 1].
- `trajectory_guidance_strength = 0.0` has no effect on the denoising
process.
- `trajectory_guidance_strength = 1.0` will guide strongly towards the
original image.
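
To make the two endpoints concrete, here is a deliberately simplified sketch. It is not the scheme implemented in `trajectory_guidance_extension.py`: it assumes a plain linear blend between the denoiser's output and the noised original latents, and the helpers `noised_init` and `guide_towards_init` are hypothetical names introduced only for illustration.

```python
from typing import List


def noised_init(init: List[float], noise: List[float], t: float) -> List[float]:
    """Original latents noised to timestep t (t=1.0 is pure noise, t=0.0 is the clean image)."""
    return [t * n + (1.0 - t) * x0 for x0, n in zip(init, noise)]


def guide_towards_init(
    x_pred: List[float],
    init: List[float],
    noise: List[float],
    t: float,
    strength: float,
) -> List[float]:
    """Blend the denoiser output with the noised original latents.

    Illustrative linear blend only (an assumption, not the PR's schedule).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("trajectory_guidance_strength must be in [0, 1]")
    target = noised_init(init, noise, t)
    return [(1.0 - strength) * xp + strength * xt for xp, xt in zip(x_pred, target)]


x_pred, init, noise = [10.0, 20.0], [1.0, 2.0], [3.0, 0.0]
print(guide_towards_init(x_pred, init, noise, t=0.5, strength=0.0))  # unchanged denoiser output
print(guide_towards_init(x_pred, init, noise, t=0.5, strength=1.0))  # pinned to the original's trajectory
```

With `strength=0.0` the blend returns the denoiser output untouched, matching the "no effect" endpoint; with `strength=1.0` it follows the original image's trajectory exactly, which is stronger than the "guides strongly" behaviour described above.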

## FLUX image-to-image usage tips

- As always, prompt matters a lot.
- If you are trying to make minor perturbations to an image, use
vanilla image-to-image by setting the `denoising_start` param.
- If you are trying to make significant changes to an image, using
trajectory guidance will give more control than using vanilla
image-to-image. Set `denoising_start=0.0` and adjust
`trajectory_guidance_strength` to control the amount of change in the
image.
- The 'transition point' where the image changes the most as you adjust
`trajectory_guidance_strength` or `denoising_start` varies depending on
the noise. So, set a fixed noise seed, then tune those params.
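
The last tip amounts to a parameter sweep with the seed held constant. A minimal sketch of building such a sweep is below; `build_tgs_sweep` is a hypothetical helper, and how each parameter set is submitted to the `FLUX Denoise` invocation is left out because it is not covered here. Only the field names `seed`, `denoising_start`, and `trajectory_guidance_strength` come from the invocation itself.

```python
from __future__ import annotations


def build_tgs_sweep(seed: int, num_points: int = 5) -> list[dict]:
    """Build evenly spaced trajectory_guidance_strength settings with a fixed seed.

    Hypothetical helper for illustration; not part of InvokeAI.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    return [
        {
            "seed": seed,  # fixed so the transition point does not move between runs
            "denoising_start": 0.0,  # recommended when making significant changes
            "trajectory_guidance_strength": i / (num_points - 1),
        }
        for i in range(num_points)
    ]


for params in build_tgs_sweep(seed=42, num_points=5):
    print(params)
```

Comparing the resulting images side by side should make it easier to spot where the transition point falls for that seed before fine-tuning around it.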


## QA Instructions

- [x] Vanilla image-to-image - No change in output
- [x] Vanilla inpainting - No change in output
- [x] Vanilla outpainting - No change in output
- Trajectory Guidance image-to-image
    - [x] TGS = 0.0 is identical to Vanilla case
    - [x] TGS = 1.0 guides close to the original image
      - Not as close as I'd like, but it's not broken.
    - [x] Smooth transition as TGS varies
    - [x] Smoke test: TGS with denoising_start > 0.0
- TG inpainting
    - [x] TGS = 0.0 is identical to Vanilla case
    - [x] TGS = 1.0 guides close to the original image
      - Not as close as I'd like, but it's not broken
    - [x] Smooth transition as TGS varies
    - [x] Smoke test: TGS with denoising_start > 0.0
- TG outpainting
    - [x] TGS = 0.0 is identical to Vanilla case
    - [x] Smoke test TGS outpainting
- [x] Smoke test FLUX text-to-image
- [x] Preview images look ok for all of above.

## Known issues (will be addressed in follow-up PRs)

- The current TGS scale biases towards creating more change than desired
in the image. More tuning of the TG change schedule is required.
- TGS does not work very well for outpainting right now. This _might_ be
solvable, but more likely we'll just want to discourage it in the Linear
UI.

## Merge Plan

No special instructions.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
RyanJDick committed Sep 20, 2024
2 parents 2f4a5a2 + 183a67c commit eea20f1
Showing 14 changed files with 418 additions and 131 deletions.
23 changes: 14 additions & 9 deletions invokeai/app/invocations/flux_denoise.py
@@ -20,7 +20,6 @@
 from invokeai.app.invocations.primitives import LatentsOutput
 from invokeai.app.services.shared.invocation_context import InvocationContext
 from invokeai.backend.flux.denoise import denoise
-from invokeai.backend.flux.inpaint_extension import InpaintExtension
 from invokeai.backend.flux.model import Flux
 from invokeai.backend.flux.sampling_utils import (
     clip_timestep_schedule,
@@ -30,6 +29,7 @@
     pack,
     unpack,
 )
+from invokeai.backend.flux.trajectory_guidance_extension import TrajectoryGuidanceExtension
 from invokeai.backend.lora.lora_model_raw import LoRAModelRaw
 from invokeai.backend.lora.lora_patcher import LoRAPatcher
 from invokeai.backend.model_manager.config import ModelFormat
@@ -43,7 +43,7 @@
     title="FLUX Denoise",
     tags=["image", "flux"],
     category="image",
-    version="2.0.0",
+    version="2.1.0",
     classification=Classification.Prototype,
 )
 class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
@@ -68,6 +68,12 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
         description=FieldDescriptions.denoising_start,
     )
     denoising_end: float = InputField(default=1.0, ge=0, le=1, description=FieldDescriptions.denoising_end)
+    trajectory_guidance_strength: float = InputField(
+        default=0.0,
+        ge=0.0,
+        le=1.0,
+        description="Value indicating how strongly to guide the denoising process towards the initial latents (during image-to-image). Range [0, 1]. A value of 0.0 is equivalent to vanilla image-to-image. A value of 1.0 will guide the denoising process very close to the original latents.",
+    )
     transformer: TransformerField = InputField(
         description=FieldDescriptions.flux_model,
         input=Input.Connection,
@@ -181,14 +187,13 @@ def _run_diffusion(
         # Now that we have 'packed' the latent tensors, verify that we calculated the image_seq_len correctly.
         assert image_seq_len == x.shape[1]

-        # Prepare inpaint extension.
-        inpaint_extension: InpaintExtension | None = None
-        if inpaint_mask is not None:
-            assert init_latents is not None
-            inpaint_extension = InpaintExtension(
+        # Prepare trajectory guidance extension.
+        traj_guidance_extension: TrajectoryGuidanceExtension | None = None
+        if init_latents is not None:
+            traj_guidance_extension = TrajectoryGuidanceExtension(
                 init_latents=init_latents,
                 inpaint_mask=inpaint_mask,
-                noise=noise,
+                trajectory_guidance_strength=self.trajectory_guidance_strength,
             )

         with (
@@ -236,7 +241,7 @@ def _run_diffusion(
                 timesteps=timesteps,
                 step_callback=self._build_step_callback(context),
                 guidance=self.guidance,
-                inpaint_extension=inpaint_extension,
+                traj_guidance_extension=traj_guidance_extension,
             )

         x = unpack(x.float(), self.height, self.width)
@@ -2,7 +2,7 @@
"name": "FLUX Image to Image",
"author": "InvokeAI",
"description": "A simple image-to-image workflow using a FLUX dev model. ",
"version": "1.0.4",
"version": "1.1.0",
"contact": "",
"tags": "image2image, flux, image-to-image",
"notes": "Prerequisite model downloads: T5 Encoder, CLIP-L Encoder, and FLUX VAE. Quantized and un-quantized versions can be found in the starter models tab within your Model Manager. We recommend using FLUX dev models for image-to-image workflows. The image-to-image performance with FLUX schnell models is poor.",
@@ -23,17 +23,13 @@
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "vae_model"
},
{
"nodeId": "ace0258f-67d7-4eee-a218-6fff27065214",
"fieldName": "denoising_start"
},
{
"nodeId": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"fieldName": "prompt"
},
{
"nodeId": "ace0258f-67d7-4eee-a218-6fff27065214",
"fieldName": "num_steps"
"nodeId": "2981a67c-480f-4237-9384-26b68dbf912b",
"fieldName": "image"
}
],
"meta": {
@@ -42,48 +38,18 @@
},
"nodes": [
{
"id": "2981a67c-480f-4237-9384-26b68dbf912b",
"type": "invocation",
"data": {
"id": "2981a67c-480f-4237-9384-26b68dbf912b",
"type": "flux_vae_encode",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"inputs": {
"image": {
"name": "image",
"label": "",
"value": {
"image_name": "8a5c62aa-9335-45d2-9c71-89af9fc1f8d4.png"
}
},
"vae": {
"name": "vae",
"label": ""
}
}
},
"position": {
"x": 732.7680166609682,
"y": -24.37398171806909
}
},
{
"id": "ace0258f-67d7-4eee-a218-6fff27065214",
"id": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"type": "invocation",
"data": {
"id": "ace0258f-67d7-4eee-a218-6fff27065214",
"id": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"type": "flux_denoise",
"version": "1.0.0",
"version": "2.1.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"nodePack": "invokeai",
"inputs": {
"board": {
"name": "board",
@@ -111,6 +77,11 @@
"label": "",
"value": 1
},
"trajectory_guidance_strength": {
"name": "trajectory_guidance_strength",
"label": "",
"value": 0.0
},
"transformer": {
"name": "transformer",
"label": ""
@@ -131,7 +102,7 @@
},
"num_steps": {
"name": "num_steps",
"label": "Steps (Recommend 30 for Dev, 4 for Schnell)",
"label": "",
"value": 30
},
"guidance": {
@@ -147,8 +118,36 @@
}
},
"position": {
"x": 1182.8836633018684,
"y": -251.38882958913183
"x": 1159.584057771928,
"y": -175.90561201366845
}
},
{
"id": "2981a67c-480f-4237-9384-26b68dbf912b",
"type": "invocation",
"data": {
"id": "2981a67c-480f-4237-9384-26b68dbf912b",
"type": "flux_vae_encode",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"inputs": {
"image": {
"name": "image",
"label": ""
},
"vae": {
"name": "vae",
"label": ""
}
}
},
"position": {
"x": 732.7680166609682,
"y": -24.37398171806909
}
},
{
@@ -202,18 +201,32 @@
"inputs": {
"model": {
"name": "model",
"label": "Model (dev variant recommended for Image-to-Image)"
"label": "Model (dev variant recommended for Image-to-Image)",
"value": {
"key": "b4990a6c-0899-48e9-969b-d6f3801acc6a",
"hash": "random:aad8f7bc19ce76541dfb394b62a30f77722542b66e48064a9f25453263b45fba",
"name": "FLUX Dev (Quantized)_2",
"base": "flux",
"type": "main"
}
},
"t5_encoder_model": {
"name": "t5_encoder_model",
"label": ""
"label": "",
"value": {
"key": "d18d5575-96b6-4da3-b3d8-eb58308d6705",
"hash": "random:f2f9ed74acdfb4bf6fec200e780f6c25f8dd8764a35e65d425d606912fdf573a",
"name": "t5_bnb_int8_quantized_encoder",
"base": "any",
"type": "t5_encoder"
}
},
"clip_embed_model": {
"name": "clip_embed_model",
"label": "",
"value": {
"key": "fa23a584-b623-415d-832a-21b5098ff1a1",
"hash": "blake3:17c19f0ef941c3b7609a9c94a659ca5364de0be364a91d4179f0e39ba17c3b70",
"key": "5a19d7e5-8d98-43cd-8a81-87515e4b3b4e",
"hash": "random:4bd08514c08fb6ff04088db9aeb45def3c488e8b5fd09a35f2cc4f2dc346f99f",
"name": "clip-vit-large-patch14",
"base": "any",
"type": "clip_embed"
@@ -223,8 +236,8 @@
"name": "vae_model",
"label": "",
"value": {
"key": "74fc82ba-c0a8-479d-a890-2126f82da758",
"hash": "blake3:ce21cb76364aa6e2421311cf4a4b5eb052a76c4f1cd207b50703d8978198a068",
"key": "9172beab-5c1d-43f0-b2f0-6e0b956710d9",
"hash": "random:c54dde288e5fa2e6137f1c92e9d611f598049e6f16e360207b6d96c9f5a67ba0",
"name": "FLUX.1-schnell_ae",
"base": "flux",
"type": "vae"
@@ -308,68 +321,68 @@
],
"edges": [
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bheight-ace0258f-67d7-4eee-a218-6fff27065214height",
"id": "reactflow__edge-eebd7252-0bd8-401a-bb26-2b8bc64892falatents-7e5172eb-48c1-44db-a770-8fd83e1435d1latents",
"type": "default",
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "height",
"targetHandle": "height"
"source": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"sourceHandle": "latents",
"targetHandle": "latents"
},
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bwidth-ace0258f-67d7-4eee-a218-6fff27065214width",
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-eebd7252-0bd8-401a-bb26-2b8bc64892fatransformer",
"type": "default",
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "width",
"targetHandle": "width"
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "transformer",
"targetHandle": "transformer"
},
{
"id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-eebd7252-0bd8-401a-bb26-2b8bc64892fapositive_text_conditioning",
"type": "default",
"source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "conditioning",
"targetHandle": "positive_text_conditioning"
},
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912blatents-ace0258f-67d7-4eee-a218-6fff27065214latents",
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912blatents-eebd7252-0bd8-401a-bb26-2b8bc64892falatents",
"type": "default",
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "latents",
"targetHandle": "latents"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-2981a67c-480f-4237-9384-26b68dbf912bvae",
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bwidth-eebd7252-0bd8-401a-bb26-2b8bc64892fawidth",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "2981a67c-480f-4237-9384-26b68dbf912b",
"sourceHandle": "vae",
"targetHandle": "vae"
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "width",
"targetHandle": "width"
},
{
"id": "reactflow__edge-ace0258f-67d7-4eee-a218-6fff27065214latents-7e5172eb-48c1-44db-a770-8fd83e1435d1latents",
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bheight-eebd7252-0bd8-401a-bb26-2b8bc64892faheight",
"type": "default",
"source": "ace0258f-67d7-4eee-a218-6fff27065214",
"target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"sourceHandle": "latents",
"targetHandle": "latents"
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "height",
"targetHandle": "height"
},
{
"id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-ace0258f-67d7-4eee-a218-6fff27065214seed",
"id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-eebd7252-0bd8-401a-bb26-2b8bc64892faseed",
"type": "default",
"source": "4754c534-a5f3-4ad0-9382-7887985e668c",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "value",
"targetHandle": "seed"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-ace0258f-67d7-4eee-a218-6fff27065214transformer",
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-2981a67c-480f-4237-9384-26b68dbf912bvae",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "transformer",
"targetHandle": "transformer"
},
{
"id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-ace0258f-67d7-4eee-a218-6fff27065214positive_text_conditioning",
"type": "default",
"source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "conditioning",
"targetHandle": "positive_text_conditioning"
"target": "2981a67c-480f-4237-9384-26b68dbf912b",
"sourceHandle": "vae",
"targetHandle": "vae"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-7e5172eb-48c1-44db-a770-8fd83e1435d1vae",