
SDXL Fooocus Inpaint #6529

Open · WaterKnight1998 opened this issue Jan 11, 2024 · 50 comments

Labels
contributions-welcome · help wanted (Extra attention is needed) · inpainting issues/questions related (related to inpainting/outpainting) · stale (Issues that haven't received updates)

Comments

@WaterKnight1998

Is your feature request related to a problem? Please describe.
I have seen that the diffusers StableDiffusionXLInpaintPipeline generates worse results than the SD 1.5 pipeline.

Describe the solution you'd like.
Include the Fooocus inpaint patch; it could be exposed through a new loader.
The weights are already available on the Hub:
https://huggingface.co/lllyasviel/fooocus_inpaint

@Laidawang

Laidawang commented Jan 12, 2024

They also seem to use fooocus_inpaint_head.pth. I'm not quite sure what it does; I read the code and it looks like an additional patch for the UNet. [image]

The inpaint_v26.fooocus.patch is more like a LoRA: the first 50% of the steps run base_model + lora, and the last 50% run base_model alone.
There is no doubt that Fooocus has the best inpainting quality and diffusers has the best speed; it would be perfect if they could be combined.

@asomoza
Member

asomoza commented Jan 12, 2024

Actually it seems more like a controlnet, something more like this one: https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl.

They also use a custom sampler for the inpainting, but I agree, it would be nice to be able to use those in diffusers.

You can read about it here: lllyasviel/Fooocus#414

@WaterKnight1998 changed the title from SDXL Focuus Inpaint to SDXL Fooocus Inpaint on Jan 12, 2024
@WaterKnight1998
Author

The inpaint_v26.fooocus.patch is more like a LoRA: the first 50% of the steps run base_model + lora, and the last 50% run base_model alone. There is no doubt that Fooocus has the best inpainting quality and diffusers has the best speed; it would be perfect if they could be combined.

I was reading the code and they download the model here: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/modules/config.py#L398-L399

That function is called here: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/modules/async_worker.py#L301 You can see that inpaint_patch_model_path is passed to base_model_additional_loras. They have some unusual code for applying the LoRA.

After the model is loaded, you can see in the following tabs that they apply the head on top of the result of applying the LoRA.

@Laidawang

Actually it seems more like a controlnet, something more like this one: https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl.

They also use a custom sampler for the inpainting, but I agree, it would be nice to be able to use those in diffusers.

You can read about it here: lllyasviel/Fooocus#414

I have read and compared how Fooocus and ComfyUI load the LoRA, and I think they are basically the same.
COMFY: https://github.com/comfyanonymous/ComfyUI/blob/53c8a99e6c00b5e20425100f6680cd9ea2652218/comfy/lora.py#L13
FOOOCUS: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/ldm_patched/modules/lora.py#L13

This can also be confirmed from the code provided by @WaterKnight1998: it just defines different key names to ensure that only Fooocus can load it correctly.

@WaterKnight1998
Author

Yup, that's the problem I saw. I had a difficult time trying to load it in diffusers; I didn't manage to map the layer keys into the format diffusers expects :(

@Laidawang

https://github.com/lllyasviel/Fooocus/blob/main/modules/inpaint_worker.py#L187 Another thing worth considering is how to implement this patch for the inpaint head model.

@WaterKnight1998
Author

Actually it seems more like a controlnet, something more like this one: https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl.
They also use a custom sampler for the inpainting, but I agree, it would be nice to be able to use those in diffusers.
You can read about it here: lllyasviel/Fooocus#414

I have read and compared how Fooocus and ComfyUI load the LoRA, and I think they are basically the same. COMFY: https://github.com/comfyanonymous/ComfyUI/blob/53c8a99e6c00b5e20425100f6680cd9ea2652218/comfy/lora.py#L13 FOOOCUS: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/ldm_patched/modules/lora.py#L13

Ok, both codes are the same. Is it possible to load ComfyUI weights in diffusers?

@WaterKnight1998
Author

https://github.com/lllyasviel/Fooocus/blob/main/modules/inpaint_worker.py#L187 Another thing worth considering is how to implement this patch for the inpaint head model.

But the code is just updating the first conv, no?

@Laidawang

https://github.com/lllyasviel/Fooocus/blob/main/modules/inpaint_worker.py#L187 Another thing worth considering is how to implement this patch for the inpaint head model.

But the code is just updating the first conv, no?

You are right, but we also need a way to feed its output into the diffusers UNet to begin with.

@Laidawang

Maybe consider loading it in comfy and saving it as overall weights and then using it in diffusers?

@Laidawang

But as I saw in fooocus, the base model will still be used in the second stage, so the most elegant way is to load and unload it freely.

@WaterKnight1998
Author

But as I saw in fooocus, the base model will still be used in the second stage, so the most elegant way is to load and unload it freely.

What do you mean by this?

@Laidawang

For example, in Fooocus inpainting, assuming 30 sampling steps, xl_base_model + inpainting_model is used for the first 15 steps, and then it switches to xl_base_model alone for the last 15 steps.
See here: https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307
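For reference, this split maps fairly naturally onto the base/refiner pattern diffusers already supports through denoising_end / denoising_start. A minimal sketch, assuming a hypothetical unet_patched (a UNet that already has the Fooocus inpaint patch merged in, which is exactly the open problem here) and a stock unet_base; it only covers the model switch, not the inpaint head or the patch itself:

import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

# first pipeline: the patched UNet handles the first half of the schedule
pipe_patched = StableDiffusionXLInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet_patched, torch_dtype=torch.float16
).to("cuda")

# second pipeline: reuse every component except the UNet
pipe_base = StableDiffusionXLInpaintPipeline(
    vae=pipe_patched.vae,
    text_encoder=pipe_patched.text_encoder,
    text_encoder_2=pipe_patched.text_encoder_2,
    tokenizer=pipe_patched.tokenizer,
    tokenizer_2=pipe_patched.tokenizer_2,
    unet=unet_base,
    scheduler=pipe_patched.scheduler,
).to("cuda")

image = load_image("source.png")
mask = load_image("mask.png")
prompt = "a wolf playing basketball"

# steps 1-15 of 30 with base + patch, handed over as latents
latents = pipe_patched(
    prompt=prompt, image=image, mask_image=mask,
    num_inference_steps=30, denoising_end=0.5, output_type="latent",
).images

# steps 16-30 with the plain base model, picking up where the first pass stopped
result = pipe_base(
    prompt=prompt, image=latents, mask_image=mask,
    num_inference_steps=30, denoising_start=0.5,
).images[0]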

@asomoza
Member

asomoza commented Jan 12, 2024

Yeah, I saw it afterwards; they switched to a custom model for inpainting. How good is the inpainting? Can any of you post an example? If it's really good, maybe I can try, or even better, someone from the diffusers team can, but they'll probably need solid proof to work on it.

@Laidawang

Laidawang commented Jan 15, 2024

before: [image]
after: [image]
I tried outpainting and it was amazingly realistic. [image]
For inpainting, it blends well with the background.

@WaterKnight1998
Author

Maybe consider loading it in comfy and saving it as overall weights and then using it in diffusers?

I tested this today; after exporting, I am not able to load it with this:

from diffusers import AutoPipelineForInpainting, StableDiffusionXLInpaintPipeline,StableDiffusionInpaintPipeline, DPMSolverMultistepScheduler, AutoencoderKL
import torch
from diffusers.utils import load_image, make_image_grid
pipeline = StableDiffusionXLInpaintPipeline.from_single_file("https://huggingface.co/WaterKnight/fooocus-inpaint/blob/main/fooocus_inpaint_unet.safetensors", torch_dtype=torch.float16).to("cuda")
generator = torch.Generator(device="cuda").manual_seed(33)

Error:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    115 if check_use_auth_token:
    116     kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/diffusers/loaders/single_file.py:263, in FromSingleFileMixin.from_single_file(cls, pretrained_model_link_or_path, **kwargs)
    249         file_path = file_path[len("main/") :]
    251     pretrained_model_link_or_path = hf_hub_download(
    252         repo_id,
    253         filename=file_path,
   (...)
    260         force_download=force_download,
    261     )
--> 263 pipe = download_from_original_stable_diffusion_ckpt(
    264     pretrained_model_link_or_path,
    265     pipeline_class=cls,
    266     model_type=model_type,
    267     stable_unclip=stable_unclip,
    268     controlnet=controlnet,
    269     adapter=adapter,
    270     from_safetensors=from_safetensors,
    271     extract_ema=extract_ema,
    272     image_size=image_size,
    273     scheduler_type=scheduler_type,
    274     num_in_channels=num_in_channels,
    275     upcast_attention=upcast_attention,
    276     load_safety_checker=load_safety_checker,
    277     prediction_type=prediction_type,
    278     text_encoder=text_encoder,
    279     text_encoder_2=text_encoder_2,
    280     vae=vae,
    281     tokenizer=tokenizer,
    282     tokenizer_2=tokenizer_2,
    283     original_config_file=original_config_file,
    284     config_files=config_files,
    285     local_files_only=local_files_only,
    286 )
    288 if torch_dtype is not None:
    289     pipe.to(dtype=torch_dtype)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py:1445, in download_from_original_stable_diffusion_ckpt(checkpoint_path_or_dict, original_config_file, image_size, prediction_type, model_type, extract_ema, scheduler_type, num_in_channels, upcast_attention, device, from_safetensors, stable_unclip, stable_unclip_prior, clip_stats_path, controlnet, adapter, load_safety_checker, pipeline_class, local_files_only, vae_path, vae, text_encoder, text_encoder_2, tokenizer, tokenizer_2, config_files)
   1442 unet_config["upcast_attention"] = upcast_attention
   1444 path = checkpoint_path_or_dict if isinstance(checkpoint_path_or_dict, str) else ""
-> 1445 converted_unet_checkpoint = convert_ldm_unet_checkpoint(
   1446     checkpoint, unet_config, path=path, extract_ema=extract_ema
   1447 )
   1449 ctx = init_empty_weights if is_accelerate_available() else nullcontext
   1450 with ctx():

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py:426, in convert_ldm_unet_checkpoint(checkpoint, config, path, extract_ema, controlnet, skip_extract_state_dict)
    422                 unet_state_dict[key.replace(unet_key, "")] = checkpoint.pop(key)
    424 new_checkpoint = {}
--> 426 new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
    427 new_checkpoint["time_embedding.linear_1.bias"] = unet_state_dict["time_embed.0.bias"]
    428 new_checkpoint["time_embedding.linear_2.weight"] = unet_state_dict["time_embed.2.weight"]

@Laidawang

There seems to be something wrong with the size of the weights. If you only saved unet, you cannot load it through from_single_file.

@lawsonxwl

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

I have converted all the LoRAs and the inpaint patch in Fooocus into diffusers-style format by matching keys; the inpaint head is also included. Using the realisticstockphotov1.0 diffusers checkpoint on HF, the inpainting task works fine.

But comparing my result with the Fooocus Gradio UI result, I found that my result's quality is worse; it has less detail. I'm sure that I have removed almost all the tricks in Fooocus, including prompt expansion, sharpness, ADM guidance, etc. Also, the images and masks used in both pipelines are the same... any advice?

@lawsonxwl

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

A question, why do you think that the inpaint patch is only used in the first 50% of the sampling?

@WaterKnight1998
Author

WaterKnight1998 commented Feb 5, 2024

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

I have converted all loras and the inpaint patch in fooocus into diffusers style format by matching keys, inpaint head is also included, by using realisticstockphotov1.0 diffusers checkpoint on HF, it's ok to do the inpainting task.

But comparing my result with fooocus gradio ui result, I found that my result's quality is worse than fooocus ui, it has less detail, I'm sure that I have removed almost all the tricks in fooocus, including prompt expansion, sharpness, ADM guidance... etc, also, the images and the masks used in both pipeline are the same... any advice??

Could you share this, please?

@yiyixuxu added the inpainting issues/questions related (related to inpainting/outpainting) label on Feb 5, 2024
@yiyixuxu
Collaborator

yiyixuxu commented Feb 5, 2024

Interesting! I'm keeping my eyes on this :)
do share your results and findings with us

@WaterKnight1998
Author

@lawsonxwl any news???

@Laidawang

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

A question, why do you think that the inpaint patch is only used in the first 50% of the sampling?

I have read the code and I'm sure of this; also, when generating, it prints this in the console.

@lawsonxwl

lawsonxwl commented Feb 18, 2024

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

I have converted all loras and the inpaint patch in fooocus into diffusers style format by matching keys, inpaint head is also included, by using realisticstockphotov1.0 diffusers checkpoint on HF, it's ok to do the inpainting task.
But comparing my result with fooocus gradio ui result, I found that my result's quality is worse than fooocus ui, it has less detail, I'm sure that I have removed almost all the tricks in fooocus, including prompt expansion, sharpness, ADM guidance... etc, also, the images and the masks used in both pipeline are the same... any advice??

Could you share this, please?

Sorry, as it is restricted by regulation, I cannot share the code with you. If you want to migrate Fooocus to diffusers, you have to check almost all the code in the Fooocus project... really overwhelming.
After several rounds of optimization, the quality of my pipeline's results is quite close to the Fooocus web UI (in my personal view).

@lawsonxwl

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

A question, why do you think that the inpaint patch is only used in the first 50% of the sampling?

I have read the code and I'm sure of this, and also when generating, it will also have a print in the console.

Yes, you are absolutely right. Do you mind leaving your WeChat? We can talk about this.

@Laidawang

Yes, you are absolutely right. Do you mind leaving your wechat? we can talk about this

laidawang233

@asomoza
Member

asomoza commented Feb 20, 2024

if you want another resource to look at:

https://github.com/Acly/comfyui-inpaint-nodes

Adds two nodes which allow using Fooocus inpaint model. It's a small and flexible patch which can be applied to any SDXL checkpoint and will transform it into an inpaint model. This model can then be used like other inpaint models, and provides the same benefits.

It also has other cool stuff for inpainting; I will try those too, and I think that, combined with this: #7038, the inpainting would be really good now.

@yiyixuxu
Collaborator

@asomoza
keep us updated!

@quark-toon

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

I have converted all loras and the inpaint patch in fooocus into diffusers style format by matching keys, inpaint head is also included, by using realisticstockphotov1.0 diffusers checkpoint on HF, it's ok to do the inpainting task.
But comparing my result with fooocus gradio ui result, I found that my result's quality is worse than fooocus ui, it has less detail, I'm sure that I have removed almost all the tricks in fooocus, including prompt expansion, sharpness, ADM guidance... etc, also, the images and the masks used in both pipeline are the same... any advice??

Could you share this, please?

sorry, as is restricted by the regulation, I cannot share you the code. If you want to migrate fooocus to diffusers,you have to check almost all the code in fooocus project... really overwhelming. After several rounds of optimization, the quality of my pipeline result can be quite close to fooocus webui(In my personal view).

@lawsonxwl @WaterKnight1998 @yiyixuxu

Hi, so we at Dashtoon are also working on a custom diffusers pipeline to get the best out of SDXL inpainting. I have also been going through the Fooocus codebase to merge Fooocus's inpaint patch model into the HF diffusers UNet layers. So far, I have managed to include the inpaint head module in the UNet and merge the inpaint patch model layers into the HF UNet layers by matching keys, as @lawsonxwl also mentioned. And yes, it is quite overwhelming to navigate the Fooocus codebase..!

One thing to note is that it is not exactly a LoRA. It basically replaces the original pretrained weight tensor of the UNet (let's say w_orig) for a given key (from the set of keys whose weights need to be updated) with a new weight tensor w_new. This w_new is calculated from three tensors w1, w_max, w_min, which you get from the inpaint patch model dict (inpaint_v26.fooocus.patch), where the key is the UNet key (to be mapped to the diffusers UNet) and the value is a tuple of those three tensors.
So w_new = w_orig + (w1 / 255.0) * (w_max - w_min) + w_min. If w_orig has shape (320, 320, 3, 3), then w1 has the same shape as w_orig, while w_max and w_min both have shape (320, 1, 3, 3), which makes sense: it really is a shift-and-scale operation, as in the formula above.
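To make that concrete, here is a minimal sketch of that update applied to a diffusers UNet state dict. The key_map (Fooocus key -> diffusers parameter name) and how the patch dict is loaded are assumptions/placeholders here, since building that mapping is the actual hard part:

import torch

def apply_fooocus_patch(unet, patch, key_map):
    # patch:   dict of fooocus_key -> (w1 uint8, w_max fp16, w_min fp16), as described above
    # key_map: hypothetical mapping of fooocus_key -> diffusers parameter name
    state_dict = unet.state_dict()
    for fooocus_key, (w1, w_max, w_min) in patch.items():
        diffusers_key = key_map[fooocus_key]
        w_orig = state_dict[diffusers_key]
        # dequantize the uint8 tensor with the stored min/max (the shift-and-scale above)
        delta = (w1.float() / 255.0) * (w_max.float() - w_min.float()) + w_min.float()
        state_dict[diffusers_key] = (w_orig.float() + delta).to(w_orig.dtype)
    unet.load_state_dict(state_dict)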

But the problem is that when I tested with the default params of the SDXL inpaint pipeline and just the inpaint head, I get something like this in the generated result (first is the input image, second is the mask, third is generated with the default SDXL inpaint pipeline without the Fooocus inpaint head, fourth is with the Fooocus inpaint head): [image]

Also, if I use just the inpaint patch model, I currently get something like this: [image]

Prompt used in both the cases for inpainting was "Young Female, Blue Eyes, Brown Long Hair"

I haven't implemented any other change from Fooocus yet.

@lawsonxwl any idea why this might be happening in both cases, especially when using the Fooocus inpaint patch model? What could I possibly be missing?

@bonlime
Contributor

bonlime commented Mar 8, 2024

@quark-toon I believe you forgot to disable passing the extra inpaint_features to the UNet after you've unloaded the Fooocus lora/patch. Also make sure you add the inpaint_features right after the conv_in.

You can also message me on Telegram at bonlime if you want to debug this together.
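For anyone attempting this in diffusers, one possible (unofficial) way to inject the feature right after conv_in is a forward hook. This is just a sketch: inpaint_head, latent_mask and latent_pixels are assumptions (the small model from fooocus_inpaint_head.pth and the Fooocus-style mask/masked-latent inputs), not something diffusers provides:

import torch

# feed the inpaint head with the concatenated mask and masked latents, as Fooocus does
feed = torch.cat([latent_mask, latent_pixels], dim=1)
inpaint_head_feature = inpaint_head(feed)

def add_inpaint_feature(module, args, output):
    # returning a value from a forward hook replaces conv_in's output
    return output + inpaint_head_feature.to(output)

hook = pipe.unet.conv_in.register_forward_hook(add_inpaint_feature)
# ... run the first (patched) part of the denoising ...
hook.remove()  # stop injecting the feature once the patch is unloaded, as noted above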

@yiyixuxu added the contributions-welcome and help wanted (Extra attention is needed) labels on Mar 8, 2024
@quark-toon

@quark-toon I believe you forgot to disable passing extra inpaint_features to Unet after you've unloaded the Fooocus lora/patch. Also make sure you add the inpaint_features right after the conv_in

you can also message me in Telegram at bonlime if you want to debug this together

Yeah, I did add the inpaint_features right after the conv_in layer, and then the rest of the flow happens. For the first thing, no, I did not; I will check.
Sure, I will ping you on Telegram.

@Laidawang

Laidawang commented Mar 11, 2024

@quark-toon I think you've almost got the result right. One more thing to consider is that Fooocus only uses the Fooocus inpaint patch for the first 12 out of 24 steps.
Refiner at step 12: [image]

No refiner: [image]

which is close to your results: [image]

@WaterKnight1998
Author

@quark-toon @Laidawang @bonlime @asomoza do you have some code that we could refactor for a PR? Or is everything private?

@asomoza
Member

asomoza commented Mar 13, 2024

@WaterKnight1998

I don’t understand why some people keep their code or models secret. They’re using an open rail model that’s also used in a very popular open-source app. Even ComfyUI has it. So, it’s not as if they’re doing something unique that could give them a significant financial gain or a competitive edge over others. But in the end, to each their own.

In my case, I converted the model but I have three issues about this:

  1. The creator of Fooocus is the same person behind ControlNet and even Layer Diffusion, and he made both of those very public and open source, but not the inpaint part of his app, so maybe he wanted to keep it there as a unique feature, and I probably want to respect that.
  2. I don't find the results (Fooocus, ComfyUI and diffusers) using this model that good, or even better than conventional inpainting in some cases.
  3. This would be difficult to do in a PR. If you want to respect what Fooocus does, you have to apply a patch (1.3 GB) to the UNet, then also patch the head into the model at inference, and then use the original model again to finish the details; that's a lot of work for something that IMO is not even better than other, simpler solutions.

As an example, this code in ComfyUI:

        inpaint_head_model, inpaint_lora = patch
        # run the inpaint head on the concatenated mask + masked latents
        feed = torch.cat([latent_mask, latent_pixels], dim=1)
        inpaint_head_model.to(device=feed.device, dtype=feed.dtype)
        inpaint_head_feature = inpaint_head_model(feed)

        # add the head's feature to the hidden states right after the first input block
        def input_block_patch(h, transformer_options):
            if transformer_options["block"][1] == 0:
                h = h + inpaint_head_feature.to(h)
            return h

        # clone the model, register the patch and apply the "lora" weights on top
        m = model.clone()
        m.set_model_input_block_patch(input_block_patch)
        patched = m.add_patches(loaded_lora, 1.0)

This isn't that straightforward to do in diffusers; in Fooocus, Forge and ComfyUI, patching the UNet is very easy.

If someone can provide a real example, with prompt, source image and mask, where the Fooocus model is better than the original inpainting, soft inpainting or differential diffusion, I can continue, but as it is right now, IMO it is not worth it.

What I think makes people find it good and better than diffusers is that Fooocus does a lot more under the hood; you have to do the same in diffusers to get good results, not just pass the image and the mask and expect the same kind of quality.

@bonlime
Contributor

bonlime commented Mar 13, 2024

@asomoza I would argue the Fooocus inpaint patch produces better results than the current approach in diffusers. The reason is that the current implementation uses the so-called "combine-noise" approach, so the model has zero insight into the known areas of the image and attempts to harmonise them later during generation. But if the first few predictions are incorrect or very different from the image you're trying to inpaint, it may fail to harmonise the regions. For example, in the 2girls example above, the current approach may start to generate girls at different locations and therefore fail to inpaint correctly. The Fooocus inpaint patch actually only does one thing: with it, the model almost perfectly predicts the known areas of the image, which allows much better prediction of the unknown parts.

But IMO this approach also requires too many changes (like a new extra arg in the UNet forward) and is not worth adding to an already huge diffusers code base. Anyone interested could write their own custom pipeline with this inpainting.

@Laidawang

Laidawang commented Mar 18, 2024

@quark-toon
Based on the image you provided, did you pass in text in the stage-1 pipeline?
I think that's probably normal, because it's referencing your other areas, and maybe that's just how the style is supposed to come out. Maybe try using a mask crop to reduce the reference area (maybe do some crop and paste back).
BTW, like all of you, I am also waiting for a stable and good diffusers Fooocus inpaint pipeline. I'm sure it's worth doing.

@asomoza
Member

asomoza commented Mar 18, 2024

@bonlime

the model has zero insights about known areas of the image and attempts to harmonise them later during generation.

This is only true in diffusers if you use the padding_mask_crop argument; if you don't use it, the pipeline takes the whole image as context and just replaces the masked part. If you use an inpainting model, they're even trained to do this and have extra input channels just for that. Also, you can use other means to feed it even more information, like an IP Adapter or an inpaint ControlNet.
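For reference, a minimal sketch of that argument in use (the model id and image paths are just placeholders):

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
).to("cuda")

image = load_image("source.png")
mask = load_image("mask.png")

# With padding_mask_crop, only a crop around the mask (plus the given margin in pixels)
# is used as context and the result is pasted back; without it, the whole image is the
# context and only the masked pixels are replaced.
result = pipe(
    prompt="a thick and cozy beige wool rug",
    image=image,
    mask_image=mask,
    padding_mask_crop=32,
    strength=0.99,
).images[0]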

I'm not a close-minded person, and it also seems that I'm the only one willing to do this as an open-source solution; it's just that I'm not motivated to do it when I can't find a clear benefit in the results. We can discuss it with words for a long time, but a clear example of a good inpainting in Fooocus that I can't replicate or improve on in diffusers is the fastest way; that's all that I'm asking for to get some motivation.

Just seeing the image of the "alien" that was deleted is a clear indication that the bad quality in the inpainting is not a diffusers problem but more of an implementation problem. As I stated before, diffusers is just a library/tool; the quality of the results depends on what you do with it. Fooocus is a complete solution that does a lot under the hood, and that's why people like it: it is a zero-effort quick generation tool.

@Laidawang

Laidawang commented Mar 18, 2024

@asomoza I'll give you a few examples that you can look at, which I think are not easy to do in diffusers.
Input (prompt: 'A thick and cozy beige wool rug'): [image] [image]
Result: [image]
Input (prompt: 'Indonesia girl, white skin, hijab, beautifull, smile'): [image] [image]
Result: [image]

@asomoza
Member

asomoza commented Mar 18, 2024

@Laidawang

Thank you for the examples. At first glance I thought the rug one was good, but it has a couple of bad details that IMO make it unusable:

For example, it made part of the table blurry: [image]

and there's something weird going on here: [image]

Even so, with the same mask you posted (blurred, because diff-diff works with gradients), I get these results: [image] [image]

Is the Fooocus one better than these? Also, we have to take into account that I didn't use any prompt enhancer or negatives (Fooocus uses them), and I could have made a better and more precise mask, because diff-diff benefits a lot from it.

So if I put a little more effort into the mask and use a prompt enhancer, I get these results: [image] [image]

There is still room for improvement: I can use LaMa to first fill the inpainting area and a ControlNet to prevent some minor modifications. Also, this is with just the normal Juggernaut, not even the inpainting one.

For the second image, these are my results: [image] [image]

With the woman I was struggling a bit until I realized that in yours she even has her eyes open, so I could probably get away with just masking the whole face, but still, these are without that.

I can get these results without any model patching or the need to modify the forward method, and this is not all you can do with diffusers. To get really good results I'd have to put in some more effort.

I don't want to make this an "inpainting competition", but IMO, at the very least, the diffusers results aren't worse than the Fooocus ones.

@Laidawang

Laidawang commented Mar 19, 2024

For the blurring, that comes from my own code, which gives a diff-diff-like effect, but not as good as diff-diff's.
How about outpainting? If you can show that diff-diff is as good as Fooocus there, then I agree it's not worth implementing.
Prompt: a wolf playing basketball [image] [image]

For the following two examples, I used BLIP-2 to do prompt interrogation.
[image] [image]

[image] [image]
In addition, I would like to ask: what speed do you get when inferring 20 steps with diff-diff?

@asomoza
Member

asomoza commented Mar 19, 2024

Nice, I like the challenge. Let me get back to you soon, since I still haven't done any outpainting with diffusers and I don't think there's a pipeline or workflow for that yet.

I plan to do a guide/example/tutorial for inpainting and outpainting soon. I'll work on an outpainting solution so I can tackle this first, but IMO it is the same thing: you just need to solve the math for expanding the "canvas" and probably need to fill it with something first, not just noise.
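As a rough illustration of that canvas math (just a sketch; the plain grey fill below is a stand-in for whatever pre-fill, e.g. LaMa, ends up working best):

from PIL import Image

def expand_canvas(image, left=0, right=256, top=0, bottom=0, fill=(128, 128, 128)):
    # paste the source onto a larger canvas...
    new_w, new_h = image.width + left + right, image.height + top + bottom
    canvas = Image.new("RGB", (new_w, new_h), fill)
    canvas.paste(image, (left, top))

    # ...and build a mask: white (255) where new content should be generated,
    # black (0) over the original pixels we want to keep
    mask = Image.new("L", (new_w, new_h), 255)
    mask.paste(Image.new("L", image.size, 0), (left, top))
    return canvas, mask

canvas, mask = expand_canvas(Image.open("wolf.png"), right=512)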

In addition, I would like to ask, what is the speed of using diff-diff to infer 20 steps?

For the woman, this is the speed I get with a 3090: [image]

In this comment: exx8/differential-diffusion#17 (comment), the author says it is just a 0.25% penalty.

But I'll also do it with normal inpainting because those results are also good. I like to use diff-diff, but normal inpainting is not bad; the trick is the image area we use as context and how we merge the inpainted part back.

@Laidawang

Laidawang commented Mar 19, 2024

I agree with you: the original image is crucial to the generation process; diffusion models are trained that way. So for the outpainting here, I would use LaMa first to fill the outpainted area.

@Laidawang

Laidawang commented Mar 19, 2024

I tested it in ComfyUI with two methods, 24 steps in both cases:
M1: base model + inpaint patch for steps 0-12, then base model for steps 12-24, following the Fooocus source code.
Since base model + inpaint patch ≈ an inpainting model, the second way is:
M2: an inpainting base model (I chose the real3.0 inpaint model, which can be found here: https://civitai.com/models/139562?modelVersionId=297320) for steps 0-12, then the base model for steps 12-24.
For both methods I used diff-diff, since it does not conflict. Here are some results (left: M1, Fooocus; right: M2).
a wolf in a swimming pool [image]
a wolf playing football [image]
a wolf playing football on the beach [image]
some food placed on the table [image]
My test case: testing.zip
In some scenes I think they're pretty close, but I think Fooocus is slightly better, for example the bowl in the last picture.
One more thing worth noting is that in M2 I loaded two sets of weights (5 GB each), whereas in M1 I only have to load an additional patch (1.23 GB), which means that method uses less GPU memory.

@asomoza
Member

asomoza commented Mar 19, 2024

I always thought that the model patching Fooocus does is just converting a regular model into an inpainting one; you can do the same by merging the difference between the inpaint model trained by the diffusers team and the base model, which is what most people did with SD 1.5.
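That "add difference" merge is simple to express over the UNet state dicts; a minimal sketch, assuming the three state dicts are already loaded and key-aligned:

import torch

def add_difference(custom_base_sd, official_base_sd, official_inpaint_sd):
    # merged = custom_base + (official_inpaint - official_base), the classic SD 1.5 trick
    merged = {}
    for key, w_inpaint in official_inpaint_sd.items():
        if key in custom_base_sd and custom_base_sd[key].shape == w_inpaint.shape:
            merged[key] = custom_base_sd[key] + (w_inpaint - official_base_sd[key])
        else:
            # e.g. conv_in.weight, which has extra mask/masked-image input channels
            merged[key] = w_inpaint.clone()
    return merged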

The first "inpainting" of fooocus was a controlnet too, so I don't know if there's something else in the patch or if he trained it from scratch or used the diffusers model, so I left it better as an "unknown" patch.

Thank you for doing more tests; now I have more to compare my results against. This time, I'm putting in more effort to do the best I can, instead of just replicating Fooocus or ComfyUI.

Edit: the VRAM and RAM can be managed. I remember that Fooocus has to unload and load the model, so it probably clones the base model (taking more RAM). Also, I think ComfyUI manages memory better than Fooocus, since ComfyUI can run on a potato PC, so it should unload the model it isn't using. In diffusers you can practically do whatever you want if you have the knowledge.

@asomoza
Member

asomoza commented Mar 19, 2024

Just to have a baseline, I tested the wolf one with just the ControlNet inpaint. I used my app for this, but it can be done with just code: [image]

I don't think it is that much worse, but if I want to make it better, I can use the prompt enhancer and fix the composition with a T2I adapter (I fixed a bit of the tail by painting on the canny-preprocessed image): [image]

Not bad for a quick inference. I'm going to do this with just diffusers code as well. [image]

@asomoza
Member

asomoza commented Mar 26, 2024

Hi @Laidawang, I just posted a guide on outpainting in the discussions; I did an intermediate step without changing the prompt so you can compare it to the Fooocus result.

I'll use your other images with the other methods I know, because they are more suited to them. Let me know if you still think Fooocus is better, but IMO these are of the same quality or better.

@viperyl

viperyl commented Apr 12, 2024

Thanks for the sharing. I found that the Fooocus inpaint LoRA weights contain (uint8, fp16, fp16) data; can anyone explain the uint8 weights here?

@bonlime
Contributor

bonlime commented Apr 21, 2024

@viperyl If I remember correctly, they quantized the main matrices to uint8 to take less space and then use the min/max stored in fp16 to scale them back. IMO a very good idea with negligible loss of information.
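A rough sketch of that round trip (an assumption about how the patch could have been produced, not code taken from Fooocus), using the (320, 320, 3, 3) / (320, 1, 3, 3) shapes mentioned earlier in the thread:

import torch

delta = torch.randn(320, 320, 3, 3)              # stand-in for the full-precision weight delta
w_min = delta.amin(dim=1, keepdim=True)          # (320, 1, 3, 3), stored in fp16
w_max = delta.amax(dim=1, keepdim=True)

# quantize: map each value into 0..255 relative to its channel's min/max
w1 = ((delta - w_min) / (w_max - w_min) * 255.0).round().to(torch.uint8)

# dequantize at load time, same formula as quoted above
recovered = (w1.float() / 255.0) * (w_max - w_min) + w_min
print((recovered - delta).abs().max())           # the quantization error stays small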

@viperyl

viperyl commented Apr 22, 2024

@viperyl if i remember correctly they quantized the main matrices to uint8 to take less space and then use min/max stored in fp16 to scale them back. IMO very good idea with negligible loss of information

Yes, I debugged it and found the uint8 quantization; that's what confused me. The uint8 checkpoint needs 1.4 GB of disk space, while the fp16 version would need 2.5 GB. Considering that quantization potentially damages the result, quantizing here just to save 1.1 GB of disk space doesn't look like a good idea.

@github-actions bot

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot added the stale (Issues that haven't received updates) label on May 16, 2024