Does IP-Adapter support loading multiple IP-Adapter models simultaneously and using multiple reference images at the same time? #6318
Comments
Are you referring to something like this? I don't think we support loading multiple IP-Adapters at the moment. Cc: @yiyixuxu. Do you have any convincing results from such a pipeline? Otherwise, it'd be hard for us to prioritize it.
I would like to comment here since this is something I'm currently looking at. Multiple IP-Adapters are of little use without attention masking, but with it they're really good. I don't have examples, but you can watch the video by the developer who made the ComfyUI node: https://www.youtube.com/watch?v=vqG1VXKteQg&t=470s. This really opens up a lot of options for generation. It's also useful when the images are big and you don't want to lose details: you can divide the image and assign an IP-Adapter to each portion for the final generation. Finally, this isn't that important to me since it can be done outside of diffusers, but the multiple-images option is really popular right now, better known as "instant-lora", where you feed multiple images to one IP-Adapter and they're combined in the attention layer.
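For reference, a minimal sketch of that "instant-lora" flow as it could look in diffusers. The repo ID and weight name follow the h94/IP-Adapter checkpoints, and the list-of-lists convention for `ip_adapter_image` (one inner list per loaded adapter) is an assumption based on later diffusers versions, so verify against your install:

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# one IP-Adapter, several reference images combined in the attention layers
pipeline.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
)
pipeline.set_ip_adapter_scale(0.6)

refs = [load_image(url) for url in reference_urls]  # reference_urls: your own images (placeholder)
image = pipeline(
    prompt="a woman in a red dress, photorealistic",
    ip_adapter_image=[refs],  # inner list: multiple images for the single adapter
    num_inference_steps=30,
).images[0]
```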
Thanks for the pointers. It would be great to have some code references if you have any.
Hi @sayakpaul, here are some relevant discussions I found: tencent-ailab/IP-Adapter#45
I have a couple of things to do before I can get into this, but here's the InvokeAI implementation of multiple adapters: https://github.com/invoke-ai/InvokeAI/pull/4818/files and here's the code for the IP-Adapter node attention masks:
Hello folks, is there any plan to support this capability in diffusers?
cc @yiyixuxu
I think code-wise it's pretty straightforward to support multiple IP-Adapters. However, I'm trying to understand whether it makes sense to support this in every single pipeline, i.e. text2img, img2img, inpaint, controlnet. @asomoza said it does not work great without the mask, which makes me think maybe we just need one community text2img pipeline that supports multiple IP-Adapters, along with the mask. Let me know what you think!
Yes, often when we work on an image we use several of these:
we generate a first image, then we work on it using img2img and inpainting.
Multiple IP-Adapters without masks won't do any harm, though IMO that is more or less the same as one adapter with multiple (weighted) images. I always thought diffusers pipelines were just examples, so a community pipeline would be sufficient as a reference that people can adapt for their own use. Personally, though, I use them with ControlNet and/or T2I-Adapters and masks all the time, and almost never alone. The same goes for masks: I did an example with just one mask, but it would be better to be able to give each adapter its own mask. The list of features that would be useful in a community pipeline is probably:
I don't see that much use for them in img2img or inpainting, but I must admit I haven't tested them much for those tasks.
No, multiple IP-Adapters without masks are still very helpful:
No worries, it's just my experience, but in my tests the face adapter also interferes with the style, so for me it's better to use a mask for the face too.
I generally use a weight of 30% for the face and 70% for the style.
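A minimal sketch of that 30/70 split with multiple adapters loaded side by side, continuing from an already loaded SDXL pipeline like the one above. The weight names and the `image_encoder_folder` argument follow the h94/IP-Adapter repo layout and later diffusers versions, so treat them as assumptions; `style_image` and `face_image` are placeholders:

```python
# load a style adapter and a face adapter together (both use the ViT-H image encoder)
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=[
        "ip-adapter-plus_sdxl_vit-h.safetensors",       # style
        "ip-adapter-plus-face_sdxl_vit-h.safetensors",  # face
    ],
    image_encoder_folder="models/image_encoder",
)
pipeline.set_ip_adapter_scale([0.7, 0.3])  # 70% style, 30% face

image = pipeline(
    prompt="a woman sitting in a park",
    ip_adapter_image=[style_image, face_image],  # one entry per loaded adapter
    num_inference_steps=30,
).images[0]
```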
I think we can:
What do you guys think? cc @vladmandic for his insights too.
@yiyixuxu thanks for looping me in, and I think you're spot on. A big value of IP-Adapters is their ease of use, and having multiple IP-Adapters does come up quite often. On the other hand, as soon as you have masking of any kind in the picture, the user's intent becomes a far more manual process, and it's totally fine to have a separate pipeline for that. A bit off-topic: having separate pipelines in diffusers for features is somewhat cumbersome, especially since pipeline inheritance is less than ideal (e.g. why doesn't StableDiffusionImg2ImgPipeline inherit from StableDiffusionPipeline, so I could check the current model type easily?) and AutoPipeline does not have full coverage. IMO we need a cleaner way to switch pipelines for an already loaded pipeline; right now I'm instantiating one manually from the loaded pipeline's components, but that causes issues with model offloading and the like. This is especially true for community pipelines: I can't load one from scratch just to run one generation. I want to switch to it when I want to use a specific feature, and then switch back.
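As an aside on that pipeline-switching pain point: diffusers later added a `from_pipe` helper on the AutoPipeline classes that reuses already loaded components instead of re-instantiating them by hand. A brief sketch, assuming `pipeline` is a loaded text2img pipeline and `init_image` is a placeholder starting image (availability depends on your diffusers version):

```python
from diffusers import AutoPipelineForImage2Image

# reuse the loaded components; no extra memory for a second copy of the weights
img2img = AutoPipelineForImage2Image.from_pipe(pipeline)
result = img2img(
    prompt="same subject, refined details",
    image=init_image,
    strength=0.5,
).images[0]
```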
Yeah, I'm targeting a more professional use case (like Photoshop) and not so much a creative, automated, or simple one; that's why I was just giving my opinion. Masking without a UI is not something I think people would use a lot unless it's done automatically, which is not the case here; also, lowering the IP face adapter to 30% just to make it work with other adapters is not ideal for me either. @yiyixuxu anything that can be added to core diffusers rather than pipelines works for me. Right now, to use these I had to monkey-patch the attention processor and the UNet forward method, which is not ideal; the less I have to do that, the better. Just the addition of multiple IP-Adapters would help a lot. @vladmandic what you're describing is the core reason I don't use pipelines: they're too rigid for UIs where people need the freedom to add or remove any feature they want; you would need a pipeline for every possible combination, or one huge pipeline with everything in it. But I really like their design, since they're easy to follow and understand as a starting point. Just another two cents: I don't think this needs to be added to diffusers, but if you want to make it easier for people, you could add an automated negative noise image to the pipeline. Here's an example of the difference:
Thanks for taking our opinions into account.
Cool! I will put out an issue; if no one picks it up quickly, I will work on it. Also, I just looked into the mask-related code a little bit more: cubiq/ComfyUI_IPAdapter_plus@ebd946f. I think maybe we can allow IP-Adapter masks to be passed in optionally.
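That is roughly the shape of what eventually landed. A hedged sketch of per-adapter masks: `IPAdapterMaskProcessor` and the `ip_adapter_masks` key come from later diffusers versions, the exact mask nesting varies between releases, and the mask and subject images are placeholders you supply yourself:

```python
from diffusers.image_processor import IPAdapterMaskProcessor

# one binary mask per IP-Adapter image (white = region that adapter controls)
processor = IPAdapterMaskProcessor()
masks = processor.preprocess([mask_left, mask_right], height=1024, width=1024)

image = pipeline(
    prompt="two characters standing side by side",
    ip_adapter_image=[subject_one, subject_two],
    cross_attention_kwargs={"ip_adapter_masks": masks},
    num_inference_steps=30,
).images[0]
```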
FWIW, Invoke has supported both multiple IP-Adapters and multiple images for a while now. We implemented our support before it was in diffusers, so we aren't leveraging the diffusers pipeline, but it may be useful as a reference since we're using diffusers underneath.
Links to relevant pieces of code would be much appreciated.
I posted them before, but these are the PRs from InvokeAI: Multi-Image IP-Adapter: https://github.com/invoke-ai/InvokeAI/pull/4882/files I learn from them too; very cool project.
I'm starting to work on this now. We opened a discussion here too: #6544. It would be very nice if any of you could provide an example that I can play with that includes:
Of course! Here's my ComfyUI workflow: https://github.com/fictions-ai/sharing-is-caring/blob/main/workflow_controlnet_ipadapter.json For the models, I used the SDXL versions:
Hi @thibaudart, I started the PR here: #6573. The test example I used did not have very meaningful results.
@thibaudart thanks! So cool!! 🤩
@yiyixuxu my pleasure.
Thank you for your hard work @yiyixuxu.
@asomoza I'm not too familiar with the "negative noise" feature you pointed out here.
No problem @yiyixuxu. IP-Adapters allow you to pass a negative image, which is rarely used; in fact, in the diffusers code the implementation just creates a zero-filled tensor for each image (see `src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py`, lines 538 to 540 at 093a03a).
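Paraphrasing the referenced lines as a self-contained sketch (the helper name is mine, not the actual diffusers function):

```python
import torch

def encode_ip_adapter_image(image_encoder, image):
    # the positive embedding comes from the reference image
    image_embeds = image_encoder(image).image_embeds
    # the "negative" image embedding is just a zero-filled tensor of the same shape
    uncond_image_embeds = torch.zeros_like(image_embeds)
    return image_embeds, uncond_image_embeds
```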
What I do, thanks to @cubiq, the developer of the ComfyUI IPAdapterPlus node (I saw this there and nowhere else), is to instead pass a noisy image created from the original image. For this I just use the same code:

```python
import torch
import torchvision.transforms as TT  # imports added for completeness

image = image.permute([0, 3, 1, 2])  # BHWC -> BCHW for torchvision
torch.manual_seed(0)  # use a fixed seed for reproducible results
transforms = TT.Compose([
    TT.CenterCrop(min(image.shape[2], image.shape[3])),
    TT.Resize((224, 224), interpolation=TT.InterpolationMode.BICUBIC, antialias=True),
    TT.ElasticTransform(alpha=75.0, sigma=noise * 3.5),  # shuffle the image
    TT.RandomVerticalFlip(p=1.0),  # flip the image to change the geometry even more
    TT.RandomHorizontalFlip(p=1.0),
])
image = transforms(image.cpu())
image = image.permute([0, 2, 3, 1])  # back to BHWC
image = image + ((0.25 * (1 - noise) + 0.05) * torch.randn_like(image))  # add further random noise
```

where `noise` is the noise strength between 0 and 1. Results:

[Result images: IP Adapter vs. IP Adapter PLUS]
What it does is allow more freedom in the generation, so it can add more details, or you can change the image more with a prompt. For example, the same image with the prompt "white background" and a T2I line-art adapter:
This is also good for styles. For example, taking the same "wonder woman" example we were using:
So in the end, it's just another parameter you can use to control the generation. IMO it makes the results better, but sometimes I need the details, so the zero-filled tensor also works. The ComfyUI node implements the noise at the adapter level, which means all the images of the same adapter get the same amount of noise (this makes more sense for diffusers); for more control, I implemented it per image. I really don't know if this should be implemented in diffusers, since I think most people don't want to put too much effort into their generations, and it might become too cumbersome without a user interface.
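For anyone who wants to try this today, a hedged sketch of splicing the noisy image in as the negative. `encode_image` and `ip_adapter_image_embeds` are taken from recent diffusers versions, the negative-first concatenation order is an assumption, and `ref_image` / `noisy_image` are placeholders, so verify against your install:

```python
import torch

# encode the clean reference as the positive embed and the noisy one as the negative;
# output_hidden_states=True is needed for the "plus" adapter variants
pos_embeds, _ = pipeline.encode_image(ref_image, "cuda", 1, output_hidden_states=True)
neg_embeds, _ = pipeline.encode_image(noisy_image, "cuda", 1, output_hidden_states=True)

ip_embeds = torch.cat([neg_embeds, pos_embeds])  # negative first under classifier-free guidance
image = pipeline(
    prompt="white background",
    ip_adapter_image_embeds=[ip_embeds],  # one tensor per loaded adapter
).images[0]
```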
Thanks @cubiq, I tested it with Mandelbrot noise and indeed it works nicely, especially for the normal IP-Adapter. I will add it too, and also test more kinds of noise algorithms. Just for the fun of it, I linked the noise slider to the number of iterations.
Nice! You probably need to lower the CFG or use some kind of CFG rescaling strategy.
I think maybe we can just create a nice section about this in our docs, no? We introduced the … let me know what you think!
I'm having doubts about the negative noise being useful in diffusers: you need to fiddle with it a lot to get the results you want, which is easy with UIs, but re-running the entire pipeline just to see if the output gets any better is not very practical. I added and tested 6 types of noise, and each gives different results, which makes it even harder to test in a pipeline. Maybe this would be better in the new …
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Closing this as it seems we have added support for the feature from many different angles. Feel free to reopen if that's not the case.