[LoRA] fix: lora loading when using with a device_mapped model. #9449

sayakpaul · 2024-09-17T01:55:50Z

What does this PR do?

Fixes LoRA loading when it is used with a model that is sharded across multiple devices.

Minimal code

"""
Minimal example to show how to load a LoRA into the Flux transformer
that is sharded across two GPUs.

Limitations:
* Latency
* If the LoRA has text encoder layers, this needs to be revisited.
"""

from diffusers import FluxTransformer2DModel, FluxPipeline 
import torch 

ckpt_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16
transformer = FluxTransformer2DModel.from_pretrained(
    ckpt_id, 
    subfolder="transformer",
    device_map="auto",
    max_memory={0: "16GB", 1: "16GB"},
    torch_dtype=dtype
)
print(transformer.hf_device_map)
pipeline = FluxPipeline.from_pretrained(
    ckpt_id,
    text_encoder=None,
    text_encoder_2=None,
    tokenizer=None,
    tokenizer_2=None,
    vae=None,
    transformer=transformer,
    torch_dtype=dtype
)
pipeline.load_lora_weights("TheLastBen/Jon_Snow_Flux_LoRA", weight_name="jon_snow.safetensors")
# print(pipeline.transformer.hf_device_map)

# In practice, you'd pre-compute these embeddings with the text encoders.
# Reference: https://gist.github.com/sayakpaul/a9266fe2d0d510ec44a9cdc385b3dd74.
example_inputs = {
    "prompt_embeds": torch.randn(1, 512, 4096, dtype=dtype, device="cuda"),
    "pooled_projections": torch.randn(1, 768, dtype=dtype, device="cuda"),
}

_ = pipeline(
    prompt_embeds=example_inputs["prompt_embeds"],
    pooled_prompt_embeds=example_inputs["pooled_projections"],
    num_inference_steps=50,
    guidance_scale=3.5,
    height=1024,
    width=1024,
    output_type="latent",
)
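When reading the output of `print(transformer.hf_device_map)` in the example above, it can help to invert the mapping and list which modules sit on which device. This is a minimal sketch, assuming `hf_device_map` is a plain dict from module names to devices (GPU indices or strings such as "cpu"). The helper name and the `example_map` contents are hypothetical and only illustrate the shape; the real split depends on `max_memory`.

```python
from collections import defaultdict


def group_modules_by_device(hf_device_map):
    """Invert a module-name -> device mapping into device -> [module names]."""
    groups = defaultdict(list)
    for module_name, device in hf_device_map.items():
        groups[device].append(module_name)
    return dict(groups)


# Hypothetical device map for illustration only (not the actual Flux split).
example_map = {
    "transformer_blocks.0": 0,
    "transformer_blocks.1": 0,
    "single_transformer_blocks.0": 1,
    "proj_out": 1,
}

grouped = group_modules_by_device(example_map)
for device, modules in grouped.items():
    print(f"device {device}: {len(modules)} module(s) -> {modules}")
```

Comparing this grouping before and after `load_lora_weights` is one quick way to eyeball whether loading the LoRA changed the placement.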

Some internal discussions:

Cc: @philschmid for awareness as you were interested in this feature.

TODOs

Tests
Docs

Once I get a sanity review from Marc and Benjamin, I will request a review from Yiyi.

src/diffusers/loaders/lora_base.py

BenjaminBossan · 2024-09-17T10:57:33Z

Does diffusers have multi-GPU tests? If so, would it make sense to add a test there that checks that no parameter was moved to the meta device after LoRA loading?
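The check suggested here could look roughly like this. It is a minimal sketch, assuming the model exposes `named_parameters()` the way `torch.nn.Module` does and that each parameter reports `.device.type == "meta"` when it has no materialized data. `find_meta_parameters`, `assert_no_meta_parameters`, and `FakeModel` are hypothetical names, not diffusers APIs; `FakeModel` is a duck-typed stand-in so the sketch runs without torch.

```python
from types import SimpleNamespace


def find_meta_parameters(model):
    """Return names of parameters that live on the meta device (no real storage)."""
    return [
        name
        for name, param in model.named_parameters()
        if param.device.type == "meta"
    ]


def assert_no_meta_parameters(model):
    """Fail if any parameter ended up on the meta device, e.g. after LoRA loading."""
    offending = find_meta_parameters(model)
    if offending:
        raise AssertionError(
            f"{len(offending)} parameter(s) on the meta device, e.g. {offending[:5]}"
        )


class FakeModel:
    """Stand-in for an nn.Module; with torch you would pass e.g. pipeline.transformer."""

    def __init__(self, devices):
        self._params = {
            name: SimpleNamespace(device=SimpleNamespace(type=dev))
            for name, dev in devices.items()
        }

    def named_parameters(self):
        return iter(self._params.items())


healthy = FakeModel({"proj.weight": "cuda", "proj.lora_A.weight": "cuda"})
assert_no_meta_parameters(healthy)  # passes silently

broken = FakeModel({"proj.weight": "cuda", "proj.lora_B.weight": "meta"})
print(find_meta_parameters(broken))  # ['proj.lora_B.weight']
```

In a real multi-GPU test, the same assertion would run on the device-mapped transformer right after `load_lora_weights`.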

sayakpaul · 2024-09-17T11:07:55Z

That is a TODO ;)

BenjaminBossan

That is a TODO ;)

I see. In that case, I have just some nits, otherwise I'd defer to Marc as I'm not an expert on device maps.

src/diffusers/pipelines/pipeline_utils.py

sayakpaul · 2024-09-17T13:59:11Z

Does diffusers have multi GPU tests?

@BenjaminBossan yes, we do: https://github.com/search?q=repo%3Ahuggingface%2Fdiffusers%20require_torch_multi_gpu&type=code

But not for the use case described here. I will add them as part of this PR.

Co-authored-by: Benjamin Bossan <[email protected]>

sayakpaul · 2024-09-22T10:52:48Z

@SunMarc a gentle ping when you find a moment.

HuggingFaceDocBuilderDev · 2024-09-22T10:58:56Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

SunMarc

LGTM! Just a few suggestions!

src/diffusers/loaders/lora_base.py

Co-authored-by: Marc Sun <[email protected]>

sayakpaul · 2024-09-24T14:26:30Z

@yiyixuxu, can you give this an initial look? Once we agree on the approach, I will work on adding tests, docs, etc.

sayakpaul · 2024-10-02T13:48:23Z

@yiyixuxu a gentle ping for a first review as it touches pipeline_utils.py.

src/diffusers/pipelines/pipeline_utils.py

docs/source/en/training/distributed_inference.md

sayakpaul · 2024-10-19T12:38:22Z

src/diffusers/loaders/unet.py

@@ -398,9 +399,18 @@ def _optionally_disable_offloading(cls, _pipeline):
        is_model_cpu_offload = False
        is_sequential_cpu_offload = False

+        def model_has_device_map(model):


These are after-effects of running make fix-copies.

sayakpaul · 2024-10-19T12:39:00Z

src/diffusers/pipelines/pipeline_utils.py

@@ -387,6 +387,11 @@ def to(self, *args, **kwargs):

        device = device or device_arg

+        def model_has_device_map(model):


@DN6, it would make sense to make this a separate utility instead of redefining it three times. WDYT?

Yup, you can add it as a utility function inside pipeline_utils.
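A possible shape for that shared utility, as a minimal sketch: it assumes a model counts as device-mapped when it carries an `hf_device_map` dict (set when loading with `device_map`) that spans more than one distinct device. This is a hypothetical stand-in, not necessarily the exact check that landed in `pipeline_utils`.

```python
from types import SimpleNamespace


def model_has_device_map(model) -> bool:
    """Return True if `model` appears to be dispatched across more than one device.

    Assumption: models loaded with `device_map=...` expose an `hf_device_map`
    dict mapping module names to devices.
    """
    device_map = getattr(model, "hf_device_map", None)
    if not isinstance(device_map, dict) or not device_map:
        return False
    # A map such as {"": 0} places everything on a single device.
    return len(set(device_map.values())) > 1


sharded = SimpleNamespace(hf_device_map={"transformer_blocks": 0, "proj_out": 1})
single = SimpleNamespace(hf_device_map={"": 0})
print(model_has_device_map(sharded), model_has_device_map(single))  # True False
```

Counting distinct devices rather than map entries is a design choice: it treats a model placed entirely on one GPU as safe to move. Note that 0 and "cuda:0" would count as different values here, so a real implementation might normalize devices first.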

sayakpaul · 2024-10-19T12:39:41Z

tests/pipelines/test_pipelines_common.py

+    @slow
+    @nightly
+    def test_calling_to_raises_error_device_mapped_components(self):
+        if "Combined" in self.pipeline_class.__name__:


This is because we don't support device mapping for connected (Combined) pipelines in the first place.
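The behaviour exercised by test_calling_to_raises_error_device_mapped_components can be sketched as follows: calling `.to()` on a pipeline should fail loudly instead of silently undoing a component's device map. This is a minimal sketch under stated assumptions: `TinyPipeline`, its `components` dict, `_is_device_mapped`, and the error message are all hypothetical, and "device-mapped" is approximated by an `hf_device_map` spanning more than one device.

```python
from types import SimpleNamespace


def _is_device_mapped(module) -> bool:
    # Assumption: device-mapped modules carry an `hf_device_map` dict.
    device_map = getattr(module, "hf_device_map", None)
    return isinstance(device_map, dict) and len(set(device_map.values())) > 1


class TinyPipeline:
    """Hypothetical pipeline holding named components."""

    def __init__(self, **components):
        self.components = components

    def to(self, device):
        mapped = [name for name, comp in self.components.items() if _is_device_mapped(comp)]
        if mapped:
            raise ValueError(
                f"Cannot call `.to({device!r})`: {mapped} were loaded with a device map "
                "spanning multiple devices, and moving them would undo that mapping."
            )
        # Only move components that actually support `.to()`.
        for comp in self.components.values():
            if hasattr(comp, "to"):
                comp.to(device)
        return self


# Usage: a component split across GPUs 0 and 1 makes `.to()` raise.
sharded = SimpleNamespace(hf_device_map={"transformer_blocks": 0, "proj_out": 1})
pipe = TinyPipeline(transformer=sharded)
try:
    pipe.to("cuda")
except ValueError as err:
    print(err)
```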

docs/source/en/training/distributed_inference.md

BenjaminBossan

Thanks for working on this, LGTM.

Co-authored-by: Steven Liu <[email protected]>

sayakpaul · 2024-10-31T15:47:36Z

Failing tests are unrelated.

…l. (#9449)" This reverts commit 41e4779.

#9823) Revert "[LoRA] fix: lora loading when using with a device_mapped model. (#9449)" This reverts commit 41e4779.

* fix: lora loading when using with a device_mapped model. * better attibutung * empty Co-authored-by: Benjamin Bossan <[email protected]> * Apply suggestions from code review Co-authored-by: Marc Sun <[email protected]> * minors * better error messages. * fix-copies * add: tests, docs. * add hardware note. * quality * Update docs/source/en/training/distributed_inference.md Co-authored-by: Steven Liu <[email protected]> * fixes * skip properly. * fixes --------- Co-authored-by: Benjamin Bossan <[email protected]> Co-authored-by: Marc Sun <[email protected]> Co-authored-by: Steven Liu <[email protected]>


fix: lora loading when using with a device_mapped model.

dc1aee2

sayakpaul added the lora label Sep 17, 2024

sayakpaul requested review from BenjaminBossan and SunMarc September 17, 2024 01:55

sayakpaul commented Sep 17, 2024

src/diffusers/loaders/lora_base.py (outdated)

BenjaminBossan reviewed Sep 17, 2024

src/diffusers/pipelines/pipeline_utils.py (outdated)

src/diffusers/pipelines/pipeline_utils.py (outdated)

sayakpaul and others added 3 commits September 17, 2024 19:34

better attibutung

949a929

empty

64b3ad1

Co-authored-by: Benjamin Bossan <[email protected]>

Merge branch 'main' into lora-device-map

6d03c12

Merge branch 'main' into lora-device-map

d4bd94b

SunMarc approved these changes Sep 24, 2024

src/diffusers/loaders/lora_base.py (outdated)

src/diffusers/loaders/lora_base.py (outdated)

Apply suggestions from code review

5479198

Co-authored-by: Marc Sun <[email protected]>

sayakpaul requested a review from yiyixuxu September 24, 2024 14:26

sayakpaul added 3 commits September 27, 2024 09:35

Merge branch 'main' into lora-device-map

2846549

Merge branch 'main' into lora-device-map

1ed0eb0

Merge branch 'main' into lora-device-map

d2d59c3

sayakpaul added 4 commits October 6, 2024 10:00

Merge branch 'main' into lora-device-map

5f3cae2

Merge branch 'main' into lora-device-map

8f670e2

Merge branch 'main' into lora-device-map

e42ec19

Merge branch 'main' into lora-device-map

f63b04c

sayakpaul requested a review from DN6 October 15, 2024 09:51

DN6 reviewed Oct 19, 2024

src/diffusers/pipelines/pipeline_utils.py (outdated)

DN6 reviewed Oct 19, 2024

src/diffusers/pipelines/pipeline_utils.py (outdated)

sayakpaul commented Oct 19, 2024

docs/source/en/training/distributed_inference.md (outdated)

sayakpaul commented Oct 19, 2024


sayakpaul added 3 commits October 19, 2024 18:10

add hardware note.

5ea1173

Merge branch 'main' into lora-device-map

f64751e

quality

c0dee87

stevhliu reviewed Oct 21, 2024

docs/source/en/training/distributed_inference.md (outdated)

BenjaminBossan approved these changes Oct 22, 2024


sayakpaul and others added 3 commits October 22, 2024 16:00

Merge branch 'main' into lora-device-map

4b6124a

Update docs/source/en/training/distributed_inference.md

fe2cca8

Co-authored-by: Steven Liu <[email protected]>

Merge branch 'main' into lora-device-map

2db5d48

DN6 approved these changes Oct 31, 2024


sayakpaul added 5 commits October 31, 2024 18:34

Merge branch 'main' into lora-device-map

61903c8

fixes

03377b7

skip properly.

0bd40cb

fixes

a61b754

resolve conflicts.

ccd8d2a

sayakpaul merged commit 41e4779 into main Oct 31, 2024
17 of 18 checks passed

sayakpaul deleted the lora-device-map branch October 31, 2024 15:47

sayakpaul mentioned this pull request Oct 31, 2024

[device_map] fix device_map check behaviour. #9821

Closed

yiyixuxu restored the lora-device-map branch October 31, 2024 17:59

yiyixuxu added a commit that referenced this pull request Oct 31, 2024

Revert "[LoRA] fix: lora loading when using with a device_mapped mode…

cd723b0

…l. (#9449)" This reverts commit 41e4779.

yiyixuxu mentioned this pull request Oct 31, 2024

Revert "[LoRA] fix: lora loading when using with a device_mapped mode… #9823

Merged

yiyixuxu added a commit that referenced this pull request Oct 31, 2024

Revert "[LoRA] fix: lora loading when using with a device_mapped mode… (

d2e5cb3

#9823) Revert "[LoRA] fix: lora loading when using with a device_mapped model. (#9449)" This reverts commit 41e4779.

yiyixuxu deleted the lora-device-map branch October 31, 2024 18:20

sayakpaul mentioned this pull request Nov 1, 2024

[LoRA] device_map fix when loading LoRAs #9827

Open

a-r-r-o-w pushed a commit that referenced this pull request Nov 1, 2024

Revert "[LoRA] fix: lora loading when using with a device_mapped mode… (

a91e8ed

#9823) Revert "[LoRA] fix: lora loading when using with a device_mapped model. (#9449)" This reverts commit 41e4779.

