Improve FLUX image-to-image (Trajectory Guidance) (#6900)

## Summary

This PR makes some improvements to the FLUX image-to-image and inpainting behaviours.

Changes:

- Expand the inpainting region at a cutoff timestep. This improves seam coherence around inpainting regions.
- Add Trajectory Guidance to improve the ability to control how much an image gets modified during image-to-image/inpainting (see the code for a more technical explanation - it's well-documented).

## `trajectory_guidance_strength` Usage

- The `trajectory_guidance_strength` param has been added to the `FLUX Denoise` invocation.
- `trajectory_guidance_strength` defaults to `0.0` and should be in the range [0, 1].
- `trajectory_guidance_strength = 0.0` has no effect on the denoising process.
- `trajectory_guidance_strength = 1.0` will guide strongly towards the original image.

## FLUX image-to-image usage tips

- As always, prompt matters a lot.
- If you are trying to make minor perturbations to an image, use vanilla image-to-image by setting the `denoising_start` param.
- If you are trying to make significant changes to an image, using trajectory guidance will give more control than using vanilla image-to-image. Set `denoising_start=0.0` and adjust `trajectory_guidance_strength` to control the amount of change in the image.
- The 'transition point' where the image changes the most as you adjust `trajectory_guidance_strength` or `denoising_start` varies depending on the noise. So, set a fixed noise seed, then tune those params.

## QA Instructions

- [x] Vanilla image-to-image - No change in output
- [x] Vanilla inpainting - No change in output
- [x] Vanilla outpainting - No change in output
- Trajectory Guidance image-to-image (TGS = `trajectory_guidance_strength`)
  - [x] TGS = 0.0 is identical to Vanilla case
  - [x] TGS = 1.0 guides close to the original image - Not as close as I'd like, but it's not broken.
  - [x] Smooth transition as TGS varies
  - [x] Smoke test: TGS with `denoising_start` > 0.0
- TG inpainting
  - [x] TGS = 0.0 is identical to Vanilla case
  - [x] TGS = 1.0 guides close to the original image - Not as close as I'd like, but it's not broken
  - [x] Smooth transition as TGS varies
  - [x] Smoke test: TGS with `denoising_start` > 0.0
- TG outpainting
  - [x] TGS = 0.0 is identical to Vanilla case
  - [x] Smoke test TGS outpainting
- [x] Smoke test FLUX text-to-image
- [x] Preview images look ok for all of above.

## Known issues (will be addressed in follow-up PRs)

- The current TGS scale biases towards creating more change than desired in the image. More tuning of the TG change schedule is required.
- TGS does not work very well for outpainting right now. This _might_ be solvable, but more likely we'll just want to discourage it in the Linear UI.

## Merge Plan

No special instructions.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
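
The tuning tip from the usage section (fix the noise seed, then vary the guidance params) can be sketched as a small parameter sweep. This is only an illustration: `build_tgs_sweep` is a hypothetical helper, not part of this PR, and the dictionaries simply mirror the `seed`, `denoising_start` and `trajectory_guidance_strength` inputs of the `FLUX Denoise` invocation. How each parameter set is submitted to InvokeAI is left out.

```python
def build_tgs_sweep(seed: int, num_values: int = 5, denoising_start: float = 0.0) -> list[dict]:
    """Return one parameter set per TGS value, all sharing the same noise seed.

    Hypothetical helper for illustration; the keys mirror FLUX Denoise inputs.
    """
    if num_values < 2:
        raise ValueError("num_values must be at least 2")

    sweep = []
    for i in range(num_values):
        # Evenly spaced values covering the documented range [0, 1].
        tgs = i / (num_values - 1)
        sweep.append(
            {
                "seed": seed,  # fixed, so only the guidance strength varies
                "denoising_start": denoising_start,
                "trajectory_guidance_strength": tgs,
            }
        )
    return sweep


if __name__ == "__main__":
    for params in build_tgs_sweep(seed=123, num_values=5):
        print(params)
```

Holding the seed constant means any difference between outputs comes from `trajectory_guidance_strength` alone, which makes the 'transition point' easier to locate.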
invoke-ai · Sep 20, 2024 · eea20f1
2 parents 2f4a5a2 + 183a67c
commit eea20f1
Showing 14 changed files with 418 additions and 131 deletions.
diff --git a/invokeai/app/invocations/flux_denoise.py b/invokeai/app/invocations/flux_denoise.py
@@ -20,7 +20,6 @@
 from invokeai.app.invocations.primitives import LatentsOutput
 from invokeai.app.services.shared.invocation_context import InvocationContext
 from invokeai.backend.flux.denoise import denoise
-from invokeai.backend.flux.inpaint_extension import InpaintExtension
 from invokeai.backend.flux.model import Flux
 from invokeai.backend.flux.sampling_utils import (
     clip_timestep_schedule,
@@ -30,6 +29,7 @@
     pack,
     unpack,
 )
+from invokeai.backend.flux.trajectory_guidance_extension import TrajectoryGuidanceExtension
 from invokeai.backend.lora.lora_model_raw import LoRAModelRaw
 from invokeai.backend.lora.lora_patcher import LoRAPatcher
 from invokeai.backend.model_manager.config import ModelFormat
@@ -43,7 +43,7 @@
     title="FLUX Denoise",
     tags=["image", "flux"],
     category="image",
-    version="2.0.0",
+    version="2.1.0",
     classification=Classification.Prototype,
 )
 class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
@@ -68,6 +68,12 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
         description=FieldDescriptions.denoising_start,
     )
     denoising_end: float = InputField(default=1.0, ge=0, le=1, description=FieldDescriptions.denoising_end)
+    trajectory_guidance_strength: float = InputField(
+        default=0.0,
+        ge=0.0,
+        le=1.0,
+        description="Value indicating how strongly to guide the denoising process towards the initial latents (during image-to-image). Range [0, 1]. A value of 0.0 is equivalent to vanilla image-to-image. A value of 1.0 will guide the denoising process very close to the original latents.",
+    )
     transformer: TransformerField = InputField(
         description=FieldDescriptions.flux_model,
         input=Input.Connection,
@@ -181,14 +187,13 @@ def _run_diffusion(
         # Now that we have 'packed' the latent tensors, verify that we calculated the image_seq_len correctly.
         assert image_seq_len == x.shape[1]
 
-        # Prepare inpaint extension.
-        inpaint_extension: InpaintExtension | None = None
-        if inpaint_mask is not None:
-            assert init_latents is not None
-            inpaint_extension = InpaintExtension(
+        # Prepare trajectory guidance extension.
+        traj_guidance_extension: TrajectoryGuidanceExtension | None = None
+        if init_latents is not None:
+            traj_guidance_extension = TrajectoryGuidanceExtension(
                 init_latents=init_latents,
                 inpaint_mask=inpaint_mask,
-                noise=noise,
+                trajectory_guidance_strength=self.trajectory_guidance_strength,
             )
 
         with (
@@ -236,7 +241,7 @@ def _run_diffusion(
                 timesteps=timesteps,
                 step_callback=self._build_step_callback(context),
                 guidance=self.guidance,
-                inpaint_extension=inpaint_extension,
+                traj_guidance_extension=traj_guidance_extension,
             )
 
         x = unpack(x.float(), self.height, self.width)
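
The behavioural change in the hunks above is that the extension is now prepared whenever `init_latents` is present, rather than only when an inpaint mask is supplied. A minimal sketch of that condition, under stated assumptions: `StubTrajectoryGuidanceExtension` and `prepare_extension` are hypothetical stand-ins written for this note, not the real `TrajectoryGuidanceExtension`, and the range check here only imitates the `ge=0.0` / `le=1.0` bounds that the invocation field declares.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StubTrajectoryGuidanceExtension:
    """Stand-in for illustration only; not the real TrajectoryGuidanceExtension."""

    init_latents: Any
    inpaint_mask: Optional[Any]
    trajectory_guidance_strength: float

    def __post_init__(self) -> None:
        # Imitates the ge=0.0 / le=1.0 bounds declared on the invocation field.
        if not 0.0 <= self.trajectory_guidance_strength <= 1.0:
            raise ValueError("trajectory_guidance_strength must be in [0, 1]")


def prepare_extension(
    init_latents: Optional[Any],
    inpaint_mask: Optional[Any],
    strength: float,
) -> Optional[StubTrajectoryGuidanceExtension]:
    # Text-to-image: no initial latents, so there is nothing to guide towards.
    if init_latents is None:
        return None
    # Image-to-image (mask is None) and inpainting (mask given) both get an extension.
    return StubTrajectoryGuidanceExtension(init_latents, inpaint_mask, strength)
```

With this shape, plain image-to-image yields an extension whose `inpaint_mask` is `None`, while text-to-image yields no extension at all.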

diff --git a/invokeai/app/services/workflow_records/default_workflows/FLUX Image to Image.json b/invokeai/app/services/workflow_records/default_workflows/FLUX Image to Image.json
@@ -2,7 +2,7 @@
   "name": "FLUX Image to Image",
   "author": "InvokeAI",
   "description": "A simple image-to-image workflow using a FLUX dev model. ",
-  "version": "1.0.4",
+  "version": "1.1.0",
   "contact": "",
   "tags": "image2image, flux, image-to-image",
   "notes": "Prerequisite model downloads: T5 Encoder, CLIP-L Encoder, and FLUX VAE. Quantized and un-quantized versions can be found in the starter models tab within your Model Manager. We recommend using FLUX dev models for image-to-image workflows. The image-to-image performance with FLUX schnell models is poor.",
@@ -23,17 +23,13 @@
       "nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
       "fieldName": "vae_model"
     },
-    {
-      "nodeId": "ace0258f-67d7-4eee-a218-6fff27065214",
-      "fieldName": "denoising_start"
-    },
     {
       "nodeId": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
       "fieldName": "prompt"
     },
     {
-      "nodeId": "ace0258f-67d7-4eee-a218-6fff27065214",
-      "fieldName": "num_steps"
+      "nodeId": "2981a67c-480f-4237-9384-26b68dbf912b",
+      "fieldName": "image"
     }
   ],
   "meta": {
@@ -42,48 +38,18 @@
   },
   "nodes": [
     {
-      "id": "2981a67c-480f-4237-9384-26b68dbf912b",
-      "type": "invocation",
-      "data": {
-        "id": "2981a67c-480f-4237-9384-26b68dbf912b",
-        "type": "flux_vae_encode",
-        "version": "1.0.0",
-        "label": "",
-        "notes": "",
-        "isOpen": true,
-        "isIntermediate": true,
-        "useCache": true,
-        "inputs": {
-          "image": {
-            "name": "image",
-            "label": "",
-            "value": {
-              "image_name": "8a5c62aa-9335-45d2-9c71-89af9fc1f8d4.png"
-            }
-          },
-          "vae": {
-            "name": "vae",
-            "label": ""
-          }
-        }
-      },
-      "position": {
-        "x": 732.7680166609682,
-        "y": -24.37398171806909
-      }
-    },
-    {
-      "id": "ace0258f-67d7-4eee-a218-6fff27065214",
+      "id": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
       "type": "invocation",
       "data": {
-        "id": "ace0258f-67d7-4eee-a218-6fff27065214",
+        "id": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
         "type": "flux_denoise",
-        "version": "1.0.0",
+        "version": "2.1.0",
         "label": "",
         "notes": "",
         "isOpen": true,
         "isIntermediate": true,
         "useCache": true,
+        "nodePack": "invokeai",
         "inputs": {
           "board": {
             "name": "board",
@@ -111,6 +77,11 @@
             "label": "",
             "value": 1
           },
+          "trajectory_guidance_strength": {
+            "name": "trajectory_guidance_strength",
+            "label": "",
+            "value": 0.0
+          },
           "transformer": {
             "name": "transformer",
             "label": ""
@@ -131,7 +102,7 @@
           },
           "num_steps": {
             "name": "num_steps",
-            "label": "Steps (Recommend 30 for Dev, 4 for Schnell)",
+            "label": "",
             "value": 30
           },
           "guidance": {
@@ -147,8 +118,36 @@
         }
       },
       "position": {
-        "x": 1182.8836633018684,
-        "y": -251.38882958913183
+        "x": 1159.584057771928,
+        "y": -175.90561201366845
+      }
+    },
+    {
+      "id": "2981a67c-480f-4237-9384-26b68dbf912b",
+      "type": "invocation",
+      "data": {
+        "id": "2981a67c-480f-4237-9384-26b68dbf912b",
+        "type": "flux_vae_encode",
+        "version": "1.0.0",
+        "label": "",
+        "notes": "",
+        "isOpen": true,
+        "isIntermediate": true,
+        "useCache": true,
+        "inputs": {
+          "image": {
+            "name": "image",
+            "label": ""
+          },
+          "vae": {
+            "name": "vae",
+            "label": ""
+          }
+        }
+      },
+      "position": {
+        "x": 732.7680166609682,
+        "y": -24.37398171806909
       }
     },
     {
@@ -202,18 +201,32 @@
         "inputs": {
           "model": {
             "name": "model",
-            "label": "Model (dev variant recommended for Image-to-Image)"
+            "label": "Model (dev variant recommended for Image-to-Image)",
+            "value": {
+              "key": "b4990a6c-0899-48e9-969b-d6f3801acc6a",
+              "hash": "random:aad8f7bc19ce76541dfb394b62a30f77722542b66e48064a9f25453263b45fba",
+              "name": "FLUX Dev (Quantized)_2",
+              "base": "flux",
+              "type": "main"
+            }
           },
           "t5_encoder_model": {
             "name": "t5_encoder_model",
-            "label": ""
+            "label": "",
+            "value": {
+              "key": "d18d5575-96b6-4da3-b3d8-eb58308d6705",
+              "hash": "random:f2f9ed74acdfb4bf6fec200e780f6c25f8dd8764a35e65d425d606912fdf573a",
+              "name": "t5_bnb_int8_quantized_encoder",
+              "base": "any",
+              "type": "t5_encoder"
+            }
           },
           "clip_embed_model": {
             "name": "clip_embed_model",
             "label": "",
             "value": {
-              "key": "fa23a584-b623-415d-832a-21b5098ff1a1",
-              "hash": "blake3:17c19f0ef941c3b7609a9c94a659ca5364de0be364a91d4179f0e39ba17c3b70",
+              "key": "5a19d7e5-8d98-43cd-8a81-87515e4b3b4e",
+              "hash": "random:4bd08514c08fb6ff04088db9aeb45def3c488e8b5fd09a35f2cc4f2dc346f99f",
               "name": "clip-vit-large-patch14",
               "base": "any",
               "type": "clip_embed"
@@ -223,8 +236,8 @@
             "name": "vae_model",
             "label": "",
             "value": {
-              "key": "74fc82ba-c0a8-479d-a890-2126f82da758",
-              "hash": "blake3:ce21cb76364aa6e2421311cf4a4b5eb052a76c4f1cd207b50703d8978198a068",
+              "key": "9172beab-5c1d-43f0-b2f0-6e0b956710d9",
+              "hash": "random:c54dde288e5fa2e6137f1c92e9d611f598049e6f16e360207b6d96c9f5a67ba0",
               "name": "FLUX.1-schnell_ae",
               "base": "flux",
               "type": "vae"
@@ -308,68 +321,68 @@
   ],
   "edges": [
     {
-      "id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bheight-ace0258f-67d7-4eee-a218-6fff27065214height",
+      "id": "reactflow__edge-eebd7252-0bd8-401a-bb26-2b8bc64892falatents-7e5172eb-48c1-44db-a770-8fd83e1435d1latents",
       "type": "default",
-      "source": "2981a67c-480f-4237-9384-26b68dbf912b",
-      "target": "ace0258f-67d7-4eee-a218-6fff27065214",
-      "sourceHandle": "height",
-      "targetHandle": "height"
+      "source": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
+      "target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
+      "sourceHandle": "latents",
+      "targetHandle": "latents"
     },
     {
-      "id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bwidth-ace0258f-67d7-4eee-a218-6fff27065214width",
+      "id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-eebd7252-0bd8-401a-bb26-2b8bc64892fatransformer",
       "type": "default",
-      "source": "2981a67c-480f-4237-9384-26b68dbf912b",
-      "target": "ace0258f-67d7-4eee-a218-6fff27065214",
-      "sourceHandle": "width",
-      "targetHandle": "width"
+      "source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
+      "target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
+      "sourceHandle": "transformer",
+      "targetHandle": "transformer"
+    },
+    {
+      "id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-eebd7252-0bd8-401a-bb26-2b8bc64892fapositive_text_conditioning",
+      "type": "default",
+      "source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
+      "target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
+      "sourceHandle": "conditioning",
+      "targetHandle": "positive_text_conditioning"
     },
     {
-      "id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912blatents-ace0258f-67d7-4eee-a218-6fff27065214latents",
+      "id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912blatents-eebd7252-0bd8-401a-bb26-2b8bc64892falatents",
       "type": "default",
       "source": "2981a67c-480f-4237-9384-26b68dbf912b",
-      "target": "ace0258f-67d7-4eee-a218-6fff27065214",
+      "target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
       "sourceHandle": "latents",
       "targetHandle": "latents"
     },
     {
-      "id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-2981a67c-480f-4237-9384-26b68dbf912bvae",
+      "id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bwidth-eebd7252-0bd8-401a-bb26-2b8bc64892fawidth",
       "type": "default",
-      "source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
-      "target": "2981a67c-480f-4237-9384-26b68dbf912b",
-      "sourceHandle": "vae",
-      "targetHandle": "vae"
+      "source": "2981a67c-480f-4237-9384-26b68dbf912b",
+      "target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
+      "sourceHandle": "width",
+      "targetHandle": "width"
     },
     {
-      "id": "reactflow__edge-ace0258f-67d7-4eee-a218-6fff27065214latents-7e5172eb-48c1-44db-a770-8fd83e1435d1latents",
+      "id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bheight-eebd7252-0bd8-401a-bb26-2b8bc64892faheight",
       "type": "default",
-      "source": "ace0258f-67d7-4eee-a218-6fff27065214",
-      "target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
-      "sourceHandle": "latents",
-      "targetHandle": "latents"
+      "source": "2981a67c-480f-4237-9384-26b68dbf912b",
+      "target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
+      "sourceHandle": "height",
+      "targetHandle": "height"
     },
     {
-      "id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-ace0258f-67d7-4eee-a218-6fff27065214seed",
+      "id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-eebd7252-0bd8-401a-bb26-2b8bc64892faseed",
       "type": "default",
       "source": "4754c534-a5f3-4ad0-9382-7887985e668c",
-      "target": "ace0258f-67d7-4eee-a218-6fff27065214",
+      "target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
       "sourceHandle": "value",
       "targetHandle": "seed"
     },
     {
-      "id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-ace0258f-67d7-4eee-a218-6fff27065214transformer",
+      "id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-2981a67c-480f-4237-9384-26b68dbf912bvae",
       "type": "default",
       "source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
-      "target": "ace0258f-67d7-4eee-a218-6fff27065214",
-      "sourceHandle": "transformer",
-      "targetHandle": "transformer"
-    },
-    {
-      "id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-ace0258f-67d7-4eee-a218-6fff27065214positive_text_conditioning",
-      "type": "default",
-      "source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
-      "target": "ace0258f-67d7-4eee-a218-6fff27065214",
-      "sourceHandle": "conditioning",
-      "targetHandle": "positive_text_conditioning"
+      "target": "2981a67c-480f-4237-9384-26b68dbf912b",
+      "sourceHandle": "vae",
+      "targetHandle": "vae"
     },
     {
       "id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-7e5172eb-48c1-44db-a770-8fd83e1435d1vae",