
Does the IP Adapter support mounting multiple IP Adapter models simultaneously and using multiple reference images at the same time? #6318

Closed
cjt222 opened this issue Dec 25, 2023 · 38 comments
Labels: IPAdapter, stale

@cjt222

cjt222 commented Dec 25, 2023

No description provided.

@cjt222 changed the title from "Translation: Does the IP Adapter support mounting multiple IP Adapter models simultaneously and using multiple reference images at the same time?" to "Does the IP Adapter support mounting multiple IP Adapter models simultaneously and using multiple reference images at the same time?" Dec 25, 2023
@sayakpaul
Member

Are you referring to something like so?
https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/adapter#combining-multiple-adapters

I don't think we support the loading of multiple IP adapters at the moment. Cc: @yiyixuxu

Do you have any convincing results from such a pipeline? Otherwise, it'd be hard for us to prioritize it.

@asomoza
Member

asomoza commented Dec 25, 2023

I would like to comment here since this is something I'm currently looking at. Multiple IP Adapters are useless if they don't support attention masking, but with it they're really good. I don't have examples, but you can watch the video from the developer who made the ComfyUI node:

https://www.youtube.com/watch?v=vqG1VXKteQg&t=470s

This really opens up a lot of options for generation. It's also good to use them when the images are big and you don't want to lose details: you can divide the image and assign IP adapters to portions of the image to make the final generation.

Finally, it's not that important to me since it can be done outside of diffusers, but the multiple-images option is really popular right now and is better known as "instant-lora", where you feed multiple images to one IP-Adapter and they're combined in the attention layer.
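For context, a rough sketch of what an "instant-lora" style call could look like with a diffusers-like API (the list-per-adapter input and the local file names are assumptions here; diffusers did not support this at the time of this comment):

    import torch
    from diffusers import AutoPipelineForText2Image
    from diffusers.utils import load_image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
    pipe.set_ip_adapter_scale(0.6)

    # "instant-lora": several reference images feed the same adapter and are
    # combined in the cross-attention layers
    refs = [load_image(f"ref_{i}.png") for i in range(3)]  # hypothetical local files
    image = pipe("a woman on the beach", ip_adapter_image=[refs]).images[0]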

@sayakpaul
Member

Thanks for the pointers. Would be great to have some code references if you have any.

@whiterose199187

hi @sayakpaul

Here are some relevant discussions I found:

tencent-ailab/IP-Adapter#45
cubiq/ComfyUI_IPAdapter_plus#145 (comment)

@asomoza
Member

asomoza commented Dec 26, 2023

I have a couple of things to do before I can get into this, but here's the InvokeAI implementation of the multiple adapters:

https://github.com/invoke-ai/InvokeAI/pull/4818/files

and here's the code for the ip-adapter node attention masks:

cubiq/ComfyUI_IPAdapter_plus@ebd946f

@asomoza
Member

asomoza commented Dec 26, 2023

I did a test in ComfyUI to get a better understanding:

[images: Image 1 | Image 2 | Mask | Image 3]

Instant lora

Image 1 and image 2 at the same weight: [image]
More weight on image 1: [image]
More weight on image 2: [image]
All three images at the same weight: [image]

Multiple IP Adapters with attention masking

Using the mask for the people only (in ComfyUI you can assign a mask to a color) and the prompt "two women holding each other":

Same weight for each adapter produces a "two faces" effect: [image]

Lowering the weight of each IP adapter produces the desired effect: [image]

And using the third image as a background: [image]

Hope it helps to better understand how they work together.

@whiterose199187

Hello folks,

Is there any plan to support this capability in diffusers?

@patrickvonplaten
Contributor

cc @yiyixuxu

@yiyixuxu
Collaborator

I think code wise it's pretty straightforward to support multiple ip-adapters.

However, I'm trying to understand if it makes sense to support this for every single pipeline? i.e. text2img, img2img, inpaint, controlnet?

@asomoza said it does not work great without the mask - this makes me think maybe we just need one community text2img pipeline that supports multiple ip-adapter, along with the mask. Let me know what you think!

@thibaudart

I think code wise it's pretty straightforward to support multiple ip-adapters.

However, I'm trying to understand if it makes sense to support this for every single pipeline? i.e. text2img, img2img, inpaint, controlnet?

@asomoza said it does not work great without the mask - this makes me think maybe we just need one community text2img pipeline that supports multiple ip-adapter, along with the mask. Let me know what you think!

Yes, often when we work on an image we use:

  • an IP Adapter for style (with multiple reference images)
  • another one for the face
  • some controlnets too

We generate a first image, then work on it using img2img and inpainting.

@asomoza
Member

asomoza commented Jan 10, 2024

Multiple IP Adapters without masks won't do any harm though, but IMO it's more or less the same as one adapter with multiple (weighted) images.

I always thought that diffusers pipelines were just examples, so a community pipeline as an example would be sufficient and people could use it as a reference for their own. But personally I use them with controlnets and/or t2i adapters and masks all the time and almost never use them alone.

Same with the mask: I did an example with just one mask, but it would be better to be able to provide each adapter with its own mask.

The list of features that would make them useful in a community pipeline is probably:

  • Multiple IP Adapters with masks
  • Multiple weighted images with each adapter
  • Negative noise for each image (it really makes a difference)
  • Controlnet and T2I Adapters
  • Start and end in steps or % for each adapter

I don't see that much use for them in img2img or inpainting but I must admit I haven't tested them that much for those tasks.

@thibaudart

No, multiple IP adapters without masks are very helpful:
one for style
one for face (using a different checkpoint)

@asomoza
Member

asomoza commented Jan 10, 2024

No worries, it's just my experience, but in my tests the face adapter also interferes with the style, so for me it's better to use a mask for the face too.

@thibaudart

I generally use a weight of 30% for the face and 70% for the style.

@yiyixuxu
Collaborator

yiyixuxu commented Jan 10, 2024

I think we can:

  1. support MultiIPAdapter (without mask) for all the pipelines
  2. add community pipeline for the masking capability

what do you guys think?

cc @vladmandic for his insights too

@vladmandic
Contributor

@yiyixuxu thanks for looping me in - I think you're spot on.

The big value of IP adapters is their ease of use, and having multiple IP adapters does come up quite often.
(IP adapters are basically fancy embeddings, created using different CLIP models depending on the adapter, with a fixed vector count.)
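(To make the "fancy embeddings" point concrete, a minimal sketch of the CLIP side; the encoder checkpoint is illustrative, since each adapter ships against a specific CLIP vision tower:)

    import torch
    from PIL import Image
    from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

    encoder = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-large-patch14")
    processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")

    inputs = processor(images=Image.open("reference.png"), return_tensors="pt")
    with torch.no_grad():
        image_embeds = encoder(**inputs).image_embeds  # [1, 768] projected embedding
    # an IP-Adapter projects this into a fixed number of extra key/value "tokens"
    # that cross-attention sees alongside the text embeddings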

On the other hand, as soon as you have masking of any kind in the picture, the user's intent becomes a far more manual process, and it's totally fine to have a separate pipeline.

A bit off-topic: having separate pipelines in diffusers for features is somewhat cumbersome, especially since pipeline inheritance is less than ideal (e.g. StableDiffusionImg2ImgPipeline doesn't inherit from StableDiffusionPipeline, so I cannot check the current model type easily), AutoPipeline does not have full coverage, and .from_pipe even less.

IMO, we need a cleaner way to switch pipelines for an already loaded pipeline. Right now I'm instantiating it manually using the loaded pipeline's components, but that causes issues with model offloading and things like that.

Especially with community pipelines: I cannot load from scratch just to run one generation. I want to switch to one when I want to use a specific feature and then switch back.
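For reference, the component-reuse path being discussed looks roughly like this (a sketch; coverage is indeed partial):

    import torch
    from diffusers import AutoPipelineForImage2Image, AutoPipelineForText2Image

    txt2img = AutoPipelineForText2Image.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # reuse the already-loaded components instead of reloading from disk
    img2img = AutoPipelineForImage2Image.from_pipe(txt2img)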

@asomoza
Member

asomoza commented Jan 11, 2024

Yeah, I'm targeting a more professional use case (like Photoshop) rather than a creative, automated, or simple one; that's why I was just giving my opinion. Masking without a UI is not a use case I think people would use a lot unless it's done automatically, which is not the case here. Also, lowering the IP face adapter to 30% just to make it work with other adapters is not ideal for me either.

@yiyixuxu anything that could be added to core diffusers rather than the pipelines works for me. Right now, to use them I had to monkey-patch the attention processor and the unet forward method, which is not ideal; the less I have to do that, the better. Just the addition of multiple IP adapters would help a lot.

@vladmandic what you're describing is the core reason I don't use pipelines: they're too rigid to use in UIs where people need the freedom to add or remove any features they want; you would need to make a pipeline for all the possible combinations, or a huge one with everything in it. But I really like their design, since they're really easy to follow and understand as a starting point.

Just another two cents: I don't think this needs to be added to diffusers, but if you want to make it easier for people to use, you could add an automated negative noise image to the pipeline. Here's an example of the difference:

[images: source image | without noise | with noise]

Thanks for taking our opinions into account.

@yiyixuxu
Collaborator

Cool! I will put out an issue. If no one picks it up quickly, I will work on it.

Also, I just looked into the mask-related code a little bit more: cubiq/ComfyUI_IPAdapter_plus@ebd946f.

I think maybe we can allow IP-Adapter masks to be optionally passed in cross_attention_kwargs and handle them from the attention processor class. My main concern is that we do not want to over-complicate the pipelines; if we can get away with not adding any additional code to the pipelines, we are happy to support the IP adapter mask as well.
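A sketch of what the caller side could look like under that idea (IPAdapterMaskProcessor and the ip_adapter_masks key are the names this eventually landed under; pipeline, the images, and the masks are placeholders):

    from diffusers.image_processor import IPAdapterMaskProcessor

    # one binary mask per IP-Adapter image, resized/padded to the output resolution
    processor = IPAdapterMaskProcessor()
    masks = processor.preprocess([face_mask, background_mask], height=1024, width=1024)

    image = pipeline(
        prompt="two women holding each other",
        ip_adapter_image=[face_image, background_image],  # one image per adapter
        cross_attention_kwargs={"ip_adapter_masks": masks},
    ).images[0]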

@hipsterusername

FWIW, Invoke has supported both multiple IP adapters and multiple images for a while now. We implemented our support before it was in diffusers, so we aren't leveraging the diffusers pipelines, but it may be useful as a reference since we're using diffusers underneath.

@sayakpaul
Member

Links to relevant pieces of code would be much appreciated.

@asomoza
Member

asomoza commented Jan 12, 2024

I posted them before, but these are the PRs from InvokeAI:

Multi-Image IP-Adapter: https://github.com/invoke-ai/InvokeAI/pull/4882/files
Support multiple IP-Adapters (workflow editor only): https://github.com/invoke-ai/InvokeAI/pull/4818/files

I learned from them too, very cool project.

@yiyixuxu
Collaborator

I'm starting to work on this now. We opened a discussion here too #6544.

It would be very nice if any of you can provide an example that I can play with that includes:
1. IP-adapter model checkpoints you used and their respective scale weights
2. input images and other inputs needed, i.e. prompts etc.
3. expected outputs from either ComfyUI or Invoke

@thibaudart

I'm starting to work on this now. We opened a discussion here too #6544.

It would be very nice if any of you can provide an example that I can play with that includes: 1: IP-adapter model checkpoints you used and their respective scale weights 2. input images and other inputs needed, i.e. prompts etc 3. expected outputs from either ComfyUI or invoke

of course:

here's my ComfyUI workflow: https://github.com/fictions-ai/sharing-is-caring/blob/main/workflow_controlnet_ipadapter.json
an archive for style: https://github.com/thibaudart/dreambooth-768/raw/main/style_ziggy.zip

For models, I used SDXL versions:
https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter-plus-face_sdxl_vit-h.safetensors
https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter-plus_sdxl_vit-h.safetensors

@yiyixuxu
Collaborator

hi @thibaudart
In addition to the style images, would you be able to provide the face image input, the prompt, and maybe an expected output? It would be super helpful.

I started the PR here: #6573. The test example I used did not have very meaningful results.

@thibaudart

input portrait: [image]
prompt: wonderwoman
Face weight: 0.3
Style weight: 0.7
Result: [image]
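For reference, reproducing that setup with the multi-adapter loading from #6573 might look roughly like this (style_images and face_image stand in for the linked style archive and the portrait above):

    import torch
    from diffusers import AutoPipelineForText2Image
    from transformers import CLIPVisionModelWithProjection

    # the ViT-H image encoder that the "vit-h" checkpoints expect
    image_encoder = CLIPVisionModelWithProjection.from_pretrained(
        "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
    )
    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        image_encoder=image_encoder,
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.load_ip_adapter(
        "h94/IP-Adapter",
        subfolder="sdxl_models",
        weight_name=[
            "ip-adapter-plus_sdxl_vit-h.safetensors",       # style
            "ip-adapter-plus-face_sdxl_vit-h.safetensors",  # face
        ],
    )
    pipe.set_ip_adapter_scale([0.7, 0.3])  # style 0.7, face 0.3

    image = pipe(
        "wonderwoman",
        ip_adapter_image=[style_images, face_image],  # one entry per adapter
    ).images[0]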

@yiyixuxu
Collaborator

@thibaudart thanks!

so cool !! 🤩

@thibaudart

@yiyixuxu my pleasure.

@sayakpaul
Member

@yiyixuxu has done a great job of adding support for this in #6573. Look out for the merge :)

@asomoza
Member

asomoza commented Jan 31, 2024

Thank you for your hard work @yiyixuxu.

@yiyixuxu
Collaborator

yiyixuxu commented Feb 1, 2024

@asomoza I'm not too familiar with the "negative noise" feature you pointed out here.
Can you provide some reference?

@asomoza
Member

asomoza commented Feb 1, 2024

No problem @yiyixuxu. IP Adapters allow passing a negative image, which is rarely used; in fact, in the diffusers code the implementation just creates a zero-filled tensor for each image:

    uncond_image_enc_hidden_states = self.image_encoder(
        torch.zeros_like(image), output_hidden_states=True
    ).hidden_states[-2]

What I do instead, thanks to the developer (@cubiq) of the ComfyUI IPAdapterPlus node (I saw this there and nowhere else), is pass a noisy image created from the original image. For this I just use the same code:

    import torch
    import torchvision.transforms as TT

    # image: [B, H, W, C] tensor in [0, 1]; noise: float in [0, 1] set per image
    image = image.permute([0, 3, 1, 2])  # to [B, C, H, W] for torchvision
    torch.manual_seed(0)  # use a fixed seed for reproducible results
    transforms = TT.Compose([
        TT.CenterCrop(min(image.shape[2], image.shape[3])),
        TT.Resize((224, 224), interpolation=TT.InterpolationMode.BICUBIC, antialias=True),
        TT.ElasticTransform(alpha=75.0, sigma=noise * 3.5),  # shuffle the image
        TT.RandomVerticalFlip(p=1.0),  # flip the image to change the geometry even more
        TT.RandomHorizontalFlip(p=1.0),
    ])
    image = transforms(image.cpu())
    image = image.permute([0, 2, 3, 1])  # back to [B, H, W, C]
    image = image + ((0.25 * (1 - noise) + 0.05) * torch.randn_like(image))  # add further random noise

https://github.com/cubiq/ComfyUI_IPAdapter_plus/blob/46241f3ba5401f076f8d90c2aa85f2194910e1a9/IPAdapterPlus.py#L170

where noise is the parameter I control in the UI for each image in each IP Adapter. For example, in the case of just one image:

IP Adapter

[images: source | zero-filled negative | 0.05 noise | 0.2 noise | 1 noise]

IP Adapter PLUS

[images: zero-filled | 0.05 noise | 0.2 noise | 1 noise]

What this does is allow more freedom in the generation, so it can add more details, or you can change the image more with a prompt. For example, the same image with the prompt "white background" and a t2i line art adapter:

[images: zero-filled | 0.2 noise | 1 noise]

This is also good for styles. For example, taking the same "wonder woman" example we were using:

[images: zero-filled style | 1 noise style]

So in the end, it's just another parameter you can use to control the generation. IMO it makes the adapters better, but sometimes I need the details, so the zero-filled tensor also works. The ComfyUI node implements the noise at the adapter level, which means all the images of the same adapter get the same amount of noise (which makes more sense for diffusers); for more control, I implemented it per image.

I really don't know if this should be implemented in diffusers, since I think most people don't want to put too much effort into their generations, and it might become too cumbersome without a user interface.

@cubiq

cubiq commented Feb 1, 2024

You can create custom noise and send it as the negative image. It also works very well with Mandelbrot noise.

[image]

@asomoza
Member

asomoza commented Feb 1, 2024

Thanks @cubiq, I tested it with Mandelbrot noise and indeed it works nicely, especially for the normal IP Adapter. I will add it too, and also test more kinds of noise algorithms.

Just for the fun of it I linked the noise slider with the iterations.

[images: zero | 10 iterations | 50 iterations | 100 iterations]

@cubiq

cubiq commented Feb 1, 2024

Nice! You probably need to lower the CFG or use some kind of CFG rescaling strategy.

@yiyixuxu
Collaborator

@asomoza @cubiq

I think maybe we can just create a nice section in our docs about this!! No? We introduced the ip_adapter_image_embeds argument now, thanks to @sayakpaul. We can just create the image embeddings with the negative noise and pass them to the pipelines as ip_adapter_image_embeds. We don't need to add any code to diffusers this way.

let me know what you think!
cc @stevhliu here too
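To sketch the idea (patterned on the encode_image snippet quoted earlier; pipe and the preprocessed image tensor are placeholders, and the exact batching that ip_adapter_image_embeds expects is glossed over):

    import torch

    # positive branch: encode the reference image as usual
    # ("plus" adapters use the penultimate hidden states)
    image_enc_hidden_states = pipe.image_encoder(
        image, output_hidden_states=True
    ).hidden_states[-2]

    # negative branch: a noised copy of the reference instead of
    # torch.zeros_like(image)
    noise = 0.2
    uncond_image_enc_hidden_states = pipe.image_encoder(
        image + noise * torch.randn_like(image), output_hidden_states=True
    ).hidden_states[-2]

    # stacked negative + positive, these can then be fed to the pipeline via
    # ip_adapter_image_embeds instead of ip_adapter_image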

@asomoza
Member

asomoza commented Feb 19, 2024

I'm having doubts about the negative noise being useful in diffusers. You need to fiddle with it a lot to get the results you want; this is easy with UIs, but re-running the entire pipeline just to see if the output gets any better is not very practical.

I added and tested 6 types of noise, and each of them gives different results, which makes it even harder to test in a pipeline.

Maybe this would be better as a basic example in the new tips-and-tricks section you're thinking about adding. If people are interested or use it, then maybe we can expand on it.

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Mar 14, 2024
@sayakpaul
Member

Closing this as it seems we have added support for the feature from many different angles. Feel free to reopen if that's not the case.
