IP Adapters: Negative image #7167

asomoza · 2024-03-01T08:25:56Z

asomoza
Mar 1, 2024
Maintainer

I'm starting this discussion to document and share some examples of this technique with IP Adapters.

First of all, this wasn't my initial idea, so thanks to @cubiq and his repository https://github.com/cubiq/ComfyUI_IPAdapter_plus. It is from here and his comments that I got started playing with this idea. AFAIK there's no mention about this in the official repository or the paper.

For this discussion I'm only applying noise and images that can be generated automatically, there's a lot more that can be done with manual intervention but those would be better with an UI.

Lets start with an initial image without prompts, I'm using the IP Adapter PLUS with a 1.0 scale, all settings are the same with a fixed seed, so this is the base where I start:

source	result

The quality isn't that great right now. I'm using the latest juggernaut model and what I'm trying to achieve is a more cinematic result and to change the initial setting of the image, so I'll add this prompts:

prompt = "cinematic photo of a cyborg in the city, 4k, high quality, intricate, highly detailed"
negative_prompt = "blurry, smooth, plastic"

The result with this prompt is like this:

without IP Adapter	With IP Adapter

The quality still isn't that great, also even with the prompt there's almost no city in the image. So now I'll start with the negative image of IP Adapters to try to make it better, first I'll start with mandelbrot noise, one normal and one inverted.

mandelbrot	mandelbrot inverted

normal result	inverted result

I like it better the result with the inverted mandelbrot, but still it doesn't have that much of a city so I had to lower the scale of the IP Adapter to 0.5, but with that and without controlnet I lose the composition position and pose of the cyborg.

Even without that I still think it looks good so here's the result:

To compare the results with another type of noise, I did gaussian:

Results:

So all this got me thinking, how about if I start feeding it other types of images as negatives, for example a blurred image:

blurred image	result scale 0.5	result scale 1.0

So I got a really sharp image of the cyborg, with a scale of 0.5 it didn't have any of the original image left so I used a scale of 1.0.

This again got me thinking, if a blurred image makes it sharper, what about colors, so I tested it with passing the isolated color channels of the image as negatives:

red	green	blue

That wasn't the result I was expecting but I like the blue and green ones, so as a final test, since I liked those two and I wanted a sharper image, I did a mix of those three:

blue + green	low gaussian blur	final negative image

With this I got the image I was looking for, still need some inpainting to fix details, but IMO it looks really good to be generated with just a single IP Adapter Image:

Without masks and controlnet, the use of the IP Adapter here is like a creative start (instead of using prompts you use an image to feed the model with your initial idea). If you want to do the same but preserving more of the initial image you'll need to use them.

asomoza · 2024-03-01T09:35:29Z

asomoza
Mar 1, 2024
Maintainer Author

For reference, this is the code I used:

import torch

from diffusers import AutoencoderKL, DPMSolverMultistepScheduler, StableDiffusionXLPipeline
from diffusers.models import ImageProjection
from diffusers.utils import load_image


def encode_image(
    image_encoder,
    feature_extractor,
    image,
    device,
    num_images_per_prompt,
    output_hidden_states=None,
    negative_image=None,
):
    dtype = next(image_encoder.parameters()).dtype

    if not isinstance(image, torch.Tensor):
        image = feature_extractor(image, return_tensors="pt").pixel_values

    image = image.to(device=device, dtype=dtype)
    if output_hidden_states:
        image_enc_hidden_states = image_encoder(image, output_hidden_states=True).hidden_states[-2]
        image_enc_hidden_states = image_enc_hidden_states.repeat_interleave(num_images_per_prompt, dim=0)

        if negative_image is None:
            uncond_image_enc_hidden_states = image_encoder(
                torch.zeros_like(image), output_hidden_states=True
            ).hidden_states[-2]
        else:
            if not isinstance(negative_image, torch.Tensor):
                negative_image = feature_extractor(negative_image, return_tensors="pt").pixel_values
            negative_image = negative_image.to(device=device, dtype=dtype)
            uncond_image_enc_hidden_states = image_encoder(negative_image, output_hidden_states=True).hidden_states[-2]

        uncond_image_enc_hidden_states = uncond_image_enc_hidden_states.repeat_interleave(num_images_per_prompt, dim=0)
        return image_enc_hidden_states, uncond_image_enc_hidden_states
    else:
        image_embeds = image_encoder(image).image_embeds
        image_embeds = image_embeds.repeat_interleave(num_images_per_prompt, dim=0)
        uncond_image_embeds = torch.zeros_like(image_embeds)

        return image_embeds, uncond_image_embeds


@torch.no_grad()
def prepare_ip_adapter_image_embeds(
    unet,
    image_encoder,
    feature_extractor,
    ip_adapter_image,
    do_classifier_free_guidance,
    device,
    num_images_per_prompt,
    ip_adapter_negative_image=None,
):
    if not isinstance(ip_adapter_image, list):
        ip_adapter_image = [ip_adapter_image]

    if len(ip_adapter_image) != len(unet.encoder_hid_proj.image_projection_layers):
        raise ValueError(
            f"`ip_adapter_image` must have same length as the number of IP Adapters. Got {len(ip_adapter_image)} images and {len(unet.encoder_hid_proj.image_projection_layers)} IP Adapters."
        )

    image_embeds = []
    for single_ip_adapter_image, image_proj_layer in zip(
        ip_adapter_image, unet.encoder_hid_proj.image_projection_layers
    ):
        output_hidden_state = not isinstance(image_proj_layer, ImageProjection)
        single_image_embeds, single_negative_image_embeds = encode_image(
            image_encoder,
            feature_extractor,
            single_ip_adapter_image,
            device,
            1,
            output_hidden_state,
            negative_image=ip_adapter_negative_image,
        )
        single_image_embeds = torch.stack([single_image_embeds] * num_images_per_prompt, dim=0)
        single_negative_image_embeds = torch.stack([single_negative_image_embeds] * num_images_per_prompt, dim=0)

        if do_classifier_free_guidance:
            single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
            single_image_embeds = single_image_embeds.to(device)

        image_embeds.append(single_image_embeds)

    return image_embeds


vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
).to("cuda")

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "RunDiffusion/Juggernaut-XL-v9",
    torch_dtype=torch.float16,
    vae=vae,
    variant="fp16",
).to("cuda")

pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline.scheduler.config.use_karras_sigmas = True

pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors",
    image_encoder_folder="models/image_encoder",
)
pipeline.set_ip_adapter_scale(0.7)

ip_image = load_image("source.png")
negative_ip_image = load_image("noise.png")

image_embeds = prepare_ip_adapter_image_embeds(
    unet=pipeline.unet,
    image_encoder=pipeline.image_encoder,
    feature_extractor=pipeline.feature_extractor,
    ip_adapter_image=[[ip_image]],
    do_classifier_free_guidance=True,
    device="cuda",
    num_images_per_prompt=1,
    ip_adapter_negative_image=negative_ip_image,
)


prompt = "cinematic photo of a cyborg in the city, 4k, high quality, intricate, highly detailed"
negative_prompt = "blurry, smooth, plastic"

image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    ip_adapter_image_embeds=image_embeds,
    guidance_scale=6.0,
    num_inference_steps=25,
    generator=torch.Generator(device="cpu").manual_seed(1556265306),
).images[0]

image.save("result.png")

0 replies

yiyixuxu · 2024-03-01T18:42:15Z

yiyixuxu
Mar 1, 2024
Maintainer

this is awesome :)

0 replies

cubiq · 2024-03-01T18:54:31Z

cubiq
Mar 1, 2024

monochrome noise doesn't work very well in my tests. An interesting way of adding noise might be as follow

factor = 0.5 # 0.0....1.0
mask = (torch.rand_like(image) < factor).float()
noise = torch.rand_like(image)
noise = torch.zeros_like(image) * (1-mask) + noise * mask

you can further calibrate the effect by multiplying the noise by the factor again at the end for an even more subtle effect.
This way you have a very gradual conditioning where at 0.01 you get a very light effect and it starts to get more impactful as you increase the noise factor.

PS: it is important to remember that the noise generation needs to be seeded if you want repeatable results

1 reply

asomoza Mar 1, 2024
Maintainer Author

yeah, I forgot to add that note. I normally do this with the tensors instead of pillow images and as parameters instead of images, so for example the type of noise and strength (forgot the seed thank you) as arguments.

I was toying with the idea of a factor or maybe add opacity if they're images. I will try your example and CFG strategies that work best with this.

yiyixuxu · 2024-03-02T18:34:58Z

yiyixuxu
Mar 2, 2024
Maintainer

what a great post! thank you for sharing this

I'm adding the link to the initial discussion on the IP adapter negative image here #6318 (comment)

to summarize based on my understanding: the negative image allows you to generate images that have more variation from the original ip-adapter image; you can also do that by lowering the `ip_adapter_scale, but with the negative image you can have more control over the generation e.g. you can preserve more of the composition of the original image ad only lose the details you want to modify etc

super neat technique :)

10 replies

asomoza Mar 5, 2024
Maintainer Author

Just a follow up, if I use an UI with all of this plus t2i adapters, ip adapters masking and prompts, this is the result:

cubiq Mar 5, 2024

if you are going to use just noise you can try to send directly noisy uncond embeds. that saves you encoding the image. in which case you can use the code I posted earlier

asomoza Mar 5, 2024
Maintainer Author

oh yeah, thanks, I totally missed that lol. I'm going to test now with that code and playing with multiplying the factor after and maybe do a combination with gradually lowering the CFG.

cubiq Mar 5, 2024

does diffusers have a CFG-rescale strategy? if so, go for it, otherwise you need to drastically lower the CFG

sayakpaul Mar 5, 2024
Collaborator

We do:

diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py

Line 59 in 687bc27

def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IP Adapters: Negative image #7167

{{title}}

Replies: 4 comments 11 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

IP Adapters: Negative image #7167

asomoza Mar 1, 2024 Maintainer

Replies: 4 comments · 11 replies

asomoza Mar 1, 2024 Maintainer Author

yiyixuxu Mar 1, 2024 Maintainer

cubiq Mar 1, 2024

asomoza Mar 1, 2024 Maintainer Author

yiyixuxu Mar 2, 2024 Maintainer

asomoza Mar 5, 2024 Maintainer Author

cubiq Mar 5, 2024

asomoza Mar 5, 2024 Maintainer Author

cubiq Mar 5, 2024

sayakpaul Mar 5, 2024 Collaborator

asomoza
Mar 1, 2024
Maintainer

Replies: 4 comments 11 replies

asomoza
Mar 1, 2024
Maintainer Author

yiyixuxu
Mar 1, 2024
Maintainer

cubiq
Mar 1, 2024

asomoza Mar 1, 2024
Maintainer Author

yiyixuxu
Mar 2, 2024
Maintainer

asomoza Mar 5, 2024
Maintainer Author

asomoza Mar 5, 2024
Maintainer Author

sayakpaul Mar 5, 2024
Collaborator