Stable Diffusion using aot.export and external parameters #217
Conversation
aviator19941 commented Dec 2, 2023:
- Saves weights to .safetensors file
- Load weights at runtime with a "stripped" .mlir
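The externalized-weights flow above (save weights to a .safetensors file, load them at runtime against a "stripped" .mlir) relies on the safetensors container layout: a JSON header describing each tensor followed by one flat byte buffer. As a hedged illustration only, here is a minimal pure-Python writer/reader for float32 tensors that mimics that layout; it is not the code this PR uses (which goes through the safetensors library and turbine's parameter machinery):

```python
import json
import struct
from array import array

def save_safetensors(path, tensors):
    """Write {name: (shape, flat float32 values)} in a safetensors-style layout."""
    header, buffers, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        raw = array("f", values).tobytes()
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            "data_offsets": [offset, offset + len(raw)],
        }
        offset += len(raw)
        buffers.append(raw)
    head = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(head)))  # 8-byte little-endian header size
        f.write(head)
        for raw in buffers:
            f.write(raw)

def load_safetensors(path):
    """Read the file back into {name: (shape, flat float32 values)}."""
    with open(path, "rb") as f:
        (head_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(head_len).decode("utf-8"))
        buf = f.read()
    out = {}
    for name, meta in header.items():
        lo, hi = meta["data_offsets"]
        vals = array("f")
        vals.frombytes(buf[lo:hi])
        out[name] = (tuple(meta["shape"]), list(vals))
    return out
```

The stripped .mlir then only needs the tensor names and shapes; the actual weight bytes stay in the external file and are bound at load time.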
self,
sample=AbstractTensor(1, 4, 64, 64, dtype=torch.float32),
timestep=AbstractTensor(1, dtype=torch.float32),
encoder_hidden_states=AbstractTensor(2, 77, 768, dtype=torch.float32),
need to find better way to change AbstractTensor size
What do you mean? You can use None for dynamic if needed
Hey @dan-garvey, the issue is that for SD 1.4/2.1 the last dimension of encoder_hidden_states is 768/1024, respectively. When I try to use None in the AbstractTensor with the constraint set to encoder_hidden_states.dynamic_dim(2), I get this error:
RuntimeError: a and b must have same reduction dim, but got [s2*s3, s2] X [768, 320].
I see, yeah, unless the encoder hidden state is also dynamic it won't work. In this case I'd just parameterize the last dim based on which model you're instantiating.
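The suggestion above (pick the last dim from the model variant instead of making it dynamic) can be sketched as a small lookup helper. The helper name and defaults here are hypothetical, not part of the PR; the 768/1024 widths come from the discussion above:

```python
# Hypothetical helper: choose the encoder_hidden_states shape per SD version.
# 768 is the CLIP text-embedding width for SD 1.4, 1024 for SD 2.1 (per the
# thread above); batch=2 and seq_len=77 match the snippet's static shape.
SD_HIDDEN_DIMS = {"1.4": 768, "2.1": 1024}

def encoder_hidden_states_shape(version, batch=2, seq_len=77):
    if version not in SD_HIDDEN_DIMS:
        raise ValueError(f"unknown SD version: {version}")
    return (batch, seq_len, SD_HIDDEN_DIMS[version])
```

The export signature could then build its AbstractTensor from `encoder_hidden_states_shape("2.1")` rather than hard-coding 768.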
Is this ready to review?
@dan-garvey Yes, please review, thanks!
The test files all do the right thing, but I think they warrant some refactoring (we are doing the same with stateless llama, I think). Try to deduplicate the code, and also add tests to the turbine_models CI.
@@ -628,6 +628,12 @@ def _import_torch_op_overload(
    elif target == torch.ops.aten.lift_fresh_copy.out:
        node.target = target = torch.ops.aten.clone.out
        node.args = (node.args[0], None, node.args[1])
    # TODO: generalize empty memory_format in the future
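The hunk above rewrites a `lift_fresh_copy.out` node to `clone.out`, inserting a `None` memory_format between the input and out arguments. Detached from FX, the rewrite can be sketched with op overloads as plain strings (a stand-in for the real `torch.ops.aten` objects):

```python
def rewrite_overload(target, args):
    # Mirrors the importer hunk: lift_fresh_copy.out(self, out) becomes
    # clone.out(self, memory_format, out) with memory_format defaulted to None.
    if target == "aten.lift_fresh_copy.out":
        return "aten.clone.out", (args[0], None, args[1])
    return target, args
```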
Can you add an else case with an explanation?
Not sure what the else case would be, but I added an explanation for the case.
Looks like you already deduped, thanks.
Force-pushed from 922e4e0 to 49d1dfb
Looks good to me. Can you add a full e2e test (vmfb and compare to the torch result) as a follow-up?
Looks like we are compiling and running the 3 models (clip, unet, and vae) independently? This won't be able to do a full e2e inference to generate an image from a prompt, will it?
return jittable(text_encoder_model.forward)(inp)

import_to = "INPUT" if compile_to == "linalg" else "IMPORT"
inst = CompiledClip(context=Context(), import_to=import_to)
Do we want to provide the option to do quantization on the matmuls like we are for llama?
Not sure if we want to provide that option. I can add it later if needed.
Quantized (int8) SD is a popular request, but we don't have a proof of concept yet. It can be a follow-up.
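For readers unfamiliar with the int8 request discussed above: quantizing the matmul weights usually means storing int8 values plus a scale and dequantizing (or computing in int8) at runtime. The following is a generic sketch of symmetric per-tensor int8 quantization, not the llama path's actual implementation:

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization of a flat list of floats."""
    scale = max((abs(v) for v in values), default=0.0) / 127.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero for all-zero tensors
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes and a scale."""
    return [x * scale for x in q]
```

The trade-off is the usual one: roughly 4x smaller weights than float32, at the cost of quantization error bounded by about half a quantization step per element.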
No, this won't generate an image from a prompt yet. The 3 models are mainly for checking that we can compile and run them without issues, as well as verifying that the results are similar to torch's results.
Ok, cool. Is there anything holding us back from generating an image, or are we planning on doing that when integrating with the web ui?
Yep, I was planning on doing that when integrating with the web ui.
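The per-model verification mentioned above (checking compiled outputs against torch's) boils down to an element-wise tolerance comparison. Here is a hand-rolled stand-in for `torch.allclose` over flat lists of floats, shown only to make the check concrete; the test suite's actual helper may differ:

```python
def allclose(a, b, rtol=1e-4, atol=1e-4):
    # Element-wise closeness in the spirit of torch.allclose:
    # |x - y| <= atol + rtol * |y| for every pair of elements.
    return len(a) == len(b) and all(
        abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b)
    )
```

A full e2e test would run the compiled vmfb and the torch model on the same inputs and assert this kind of closeness on the flattened outputs.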
Force-pushed from 49d1dfb to 990d3c9
Looks good to me