Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support InstantStyle #7586

Closed
wants to merge 4 commits into from
Closed

Support InstantStyle #7586

wants to merge 4 commits into from

Conversation

haofanwang
Copy link
Contributor

This PR supports for our recent work InstantStyle in native diffusers API. The idea has also been discussed at #7534 .

The modifications are mainly about IP-Adapter loader, allowing users to specify target blocks used for image feature injection. After merged, InstantStyle can be achieved via following way

pipe.load_ip_adapter(pretrained_model_name_or_path_or_dict="./", 
                     subfolder="sdxl_models", 
                     weight_name="ip-adapter_sdxl.bin",
                     image_encoder_folder=image_encoder_path,
                     target_blocks=["block"]
                    )

Copy link
Collaborator

@yiyixuxu yiyixuxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking great!
left one comment. Thanks!

Comment on lines 868 to 872
selected = False
for block_name in target_blocks:
if block_name in name:
selected = True
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this work?

Suggested change
selected = False
for block_name in target_blocks:
if block_name in name:
selected = True
break
selected = any( block_name in name for block_name in target_blocks)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is much clearer. Updated.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu
Copy link
Collaborator

yiyixuxu commented Apr 5, 2024

@asomoza can you do a final review and test this out?

@asomoza
Copy link
Member

asomoza commented Apr 5, 2024

Nice, it works as intended and I really like the skip functionality, thank you for your work.

But I want to give my opinion here and open a discussion on about how is this implemented. Diffusers is used as a backend for UIs, for advanced users and also not for that advanced that knows all the diffusers naming for the blocks and layers, so I think we need to take this into consideration:

  1. Doing this when loading the weights is restricting users and UIs that once the blocks are selected we can't change them, for that we have to reload the IP adapter. This is going to be specially noticeable when we use a lot of ip adapters, since even if we reload one, we need to reload all of them with the current implementation. I think we need to be able to change an IP adapter from a "full" configuration, a "style" configuration, a "composition" configuration or whatever we like on the fly without reloading.
  2. Users will have to know this kind of names "up_blocks.0.attentions.1" to be able to use it and we already have the naming that @UmerHA implemented in Implements Blockwise lora #7352 which I like a lot more.

I really like to be able to use the same dict as the LoRAs for example.:

    {
        "down": {"block_1": [0.0, 0.0], "block_2": [0.0, 0.0]},
        "mid": 0.0,
        "up": {"block_0": 1.0, "block_1": [0.0, 0.0, 0.0]},
    },

on the other side, I really like the simplicity of this PR, if we go with the other implementation it will probably make the code a lot more complex.

@haofanwang
Copy link
Contributor Author

I believe we can set different weights for each block just like LoRA dict, in this case, we don't need to reload modules. Will update soon.

@ivanprado
Copy link
Contributor

Cool @haofanwang! What would be the mechanism to subtract the CLIP text embedding from the image embedding with diffusers? This is done here in the original implementation.

@yiyixuxu
Copy link
Collaborator

yiyixuxu commented Apr 8, 2024

we will merge this one soon #7499
cool to rebase here? or is it easier the other way around?

@haofanwang
Copy link
Contributor Author

Will rebase on #7499.

@okaris
Copy link

okaris commented Apr 12, 2024

Is it possible to load multiple ip adapters at the same time?
ip-adapter + instantid
instantid + instantstyle

I would appreciate an example that shows how we could load/unload different adapters seperately or together.

Thanks!

@JY-Joy JY-Joy mentioned this pull request Apr 14, 2024
@haofanwang
Copy link
Contributor Author

@yiyixuxu @asomoza Our teammate has made a new PR. This PR will be closed soon.

@haofanwang
Copy link
Contributor Author

Is it possible to load multiple ip adapters at the same time? ip-adapter + instantid instantid + instantstyle

I would appreciate an example that shows how we could load/unload different adapters seperately or together.

Thanks!

This is another problem. We can support it later.

@haofanwang
Copy link
Contributor Author

Cool @haofanwang! What would be the mechanism to subtract the CLIP text embedding from the image embedding with diffusers? This is done here in the original implementation.

This can be natively achieved already. We will show how to do it once this PR merged.

@haofanwang haofanwang closed this Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants