SDXL Fooocus Inpaint #6529
Comments
They also seem to use fooocus_inpaint_head.pth. I'm not quite sure what it does; I read the code and it looks like an additional patch for the unet. The inpaint_v26.fooocus.patch is more similar to a lora: the first 50% of the steps run base_model + lora, and the last 50% run base_model alone.
Actually it seems more like a controlnet, something like this one: https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl. They also use a custom sampler for the inpainting, but I agree, it would be nice to be able to use those in diffusers. You can read about it here: lllyasviel/Fooocus#414

I was reading the code and they download the model here: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/modules/config.py#L398-L399 This function is called here: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/modules/async_worker.py#L301 After the model is loaded, you can see in the following tabs that they apply the head on top of the result of applying the lora.
I have read the comparison between Fooocus and comfyui for loading the lora. I think they are basically the same; this can also be confirmed from the code provided by @WaterKnight1998. It just defines different names to ensure that only fooocus can load it correctly.

Yup, that's the problem I saw. I had a difficult time trying to load it in diffusers; I didn't manage to map the layer keys into the format diffusers expects :(

https://github.com/lllyasviel/Fooocus/blob/main/modules/inpaint_worker.py#L187 Another thing worth considering is how to implement this patch for the inpaint head model.

Ok, both pieces of code are the same. Is it possible to load ComfyUI weights in diffusers?

But the code is just updating the first conv, no?

You are right, but we also need to use it in diffusers as an input to start with.

Maybe consider loading it in comfy, saving the merged weights, and then using them in diffusers?
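A minimal sketch of the diffusers side of that suggestion, assuming the merged UNet was saved from ComfyUI in LDM key layout to a file named fooocus_inpaint_unet.safetensors (hypothetical name); convert_ldm_unet_checkpoint is diffusers' own key-conversion helper:

```python
import safetensors.torch
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.pipelines.stable_diffusion.convert_from_ckpt import convert_ldm_unet_checkpoint

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

# load the merged UNet saved from ComfyUI (hypothetical filename)
sd = safetensors.torch.load_file("fooocus_inpaint_unet.safetensors")
# the converter expects LDM-style keys under "model.diffusion_model.";
# skip this rename if the keys already carry that prefix
sd = {f"model.diffusion_model.{k}": v for k, v in sd.items()}

converted = convert_ldm_unet_checkpoint(sd, pipe.unet.config)
pipe.unet.load_state_dict(converted)
```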
But as I saw in fooocus, the base model will still be used in the second stage, so the most elegant way is to load and unload it freely.

What do you mean by this?

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model is used for the first 15 steps, and then it switches to xl_base_model alone for the last 15 steps.
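A minimal sketch of that two-stage schedule using the denoising_end / denoising_start split that diffusers' SDXL pipelines support; patched_pipe (a pipeline whose UNet carries the Fooocus patch), base_pipe (same components with an unpatched UNet), prompt, image and mask are all assumed to be set up beforehand:

```python
# first 15 of 30 steps with base + inpaint patch, keeping the latents
latents = patched_pipe(
    prompt=prompt, image=image, mask_image=mask,
    num_inference_steps=30, denoising_end=0.5, output_type="latent",
).images

# last 15 steps with the plain base model, resuming from those latents
result = base_pipe(
    prompt=prompt, image=latents, mask_image=mask,
    num_inference_steps=30, denoising_start=0.5,
).images[0]
```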
Yeah, I saw it afterwards; they switched to a custom model for inpainting. How good is the inpainting? Can any of you post an example? If it's really good maybe I can try, or even better, someone from the diffusers team, but they'll probably need solid proof to work on it.
I tested this today; after export I am not able to load it with this:

```python
from diffusers import AutoPipelineForInpainting, StableDiffusionXLInpaintPipeline, StableDiffusionInpaintPipeline, DPMSolverMultistepScheduler, AutoencoderKL
import torch
from diffusers.utils import load_image, make_image_grid

pipeline = StableDiffusionXLInpaintPipeline.from_single_file("https://huggingface.co/WaterKnight/fooocus-inpaint/blob/main/fooocus_inpaint_unet.safetensors", torch_dtype=torch.float16).to("cuda")
generator = torch.Generator(device="cuda").manual_seed(33)
```

Error:
There seems to be something wrong with the size of the weights. If you only saved the unet, you cannot load it through from_single_file.

I have converted all the loras and the inpaint patch in fooocus into diffusers-style format by matching keys; the inpaint head is also included. Using the realisticstockphotov1.0 diffusers checkpoint on HF, it works for the inpainting task. But comparing my result with the fooocus gradio UI result, I found that my result's quality is worse: it has less detail. I'm sure that I have removed almost all the tricks in fooocus, including prompt expansion, sharpness, ADM guidance, etc., and the images and the masks used in both pipelines are the same... any advice??
A question: why do you think that the inpaint patch is only used in the first 50% of the sampling?

Could you share this, please?

Interesting! I'm keeping my eyes on this :)

@lawsonxwl any news???

I have read the code and I'm sure of this; when generating, it also prints this in the console.

Sorry, I'm restricted by regulations and cannot share the code with you. If you want to migrate fooocus to diffusers, you have to check almost all the code in the fooocus project... really overwhelming.

Yes, you are absolutely right. Do you mind leaving your wechat? We can talk about this.

laidawang233
If you want another resource to look at: https://github.com/Acly/comfyui-inpaint-nodes
It also has other cool stuff for inpainting. I will try them too, and I think that combined with this: #7038 the inpainting would be really good now.

@asomoza
@lawsonxwl @WaterKnight1998 @yiyixuxu Hi, we at Dashtoon are also working on a custom diffusers pipeline to get the best out of inpainting using sdxl inpaint. I have also been going through the fooocus codebase to merge fooocus's inpaint patch model into the hf diffusers unet layers. So far I have managed to include the inpaint head module in the unet and merge the inpaint patch model layers into the hf unet layers, by matching keys as @lawsonxwl also mentioned. And yes, it is quite overwhelming to navigate the fooocus codebase..!

One thing to note is that it is not exactly a lora. It basically replaces the original pretrained weight tensor (let's say w_orig) of the unet for a given key (from a set of keys for which the weight needs to be updated) with a new weight tensor w_new. This w_new is calculated using three weight tensors w1, w_max, w_min, which you get from the inpaint patch model dict (inpaint_v26.fooocus.patch), where the key is the unet key (to be mapped to the diffusers unet) and the value is a tuple of those three tensors (see the sketch after this comment).

But the problem is that when I tested using the default params of the sdxl inpaint pipeline with just the inpaint head, I get something like this in the generated result (first is the input image, 2nd is the mask, 3rd is generated using the default sdxl inpaint pipeline without the fooocus inpaint head, 4th is using the fooocus inpaint head): Also, if I use just the inpaint patch model, I currently get something like below: The prompt used in both cases for inpainting was "Young Female, Blue Eyes, Brown Long Hair". I haven't implemented any other change from fooocus yet. @lawsonxwl any idea why this might be happening in both cases, especially when using the fooocus inpaint patch model? What could I possibly be missing?
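A minimal sketch of that weight-patch calculation, following the additive dequantization used by the ComfyUI implementation linked earlier in this thread (comfyui-inpaint-nodes); w1 is the uint8 tensor, w_min/w_max are the fp16 scaling bounds, and mapping the patch keys to diffusers parameter names is assumed to have happened already:

```python
import torch

def apply_fooocus_patch(unet: torch.nn.Module, patch: dict, strength: float = 1.0):
    """Add the dequantized Fooocus inpaint patch onto matching UNet weights.

    `patch` maps (already diffusers-mapped) parameter names to tuples
    (w1 uint8, w_min fp16, w_max fp16).
    """
    state = unet.state_dict()
    for key, (w1, w_min, w_max) in patch.items():
        if key not in state:
            continue
        w_orig = state[key]
        # scale the uint8 matrix back into its fp16 range
        w = (w1.to(torch.float32) / 255.0) * (
            w_max.to(torch.float32) - w_min.to(torch.float32)
        ) + w_min.to(torch.float32)
        state[key] = w_orig + strength * w.to(w_orig.dtype)
    unet.load_state_dict(state)
```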
@quark-toon I believe you forgot to disable passing extra
You can also message me in Telegram at

Yeah, I did add the inpaint_features right after the conv_in layer and then the rest of the flow happens. For the first thing, yeah, I did not. Will check once.

@quark-toon I think you've almost got the result right; one more thing to consider is that fooocus only uses the fooocus inpaint patch for the first 12 out of 24 steps.

@quark-toon @Laidawang @bonlime @asomoza do you have some code that we could refactor for a PR? Or is everything private?
I don't understand why some people keep their code or models secret. They're using an OpenRAIL model that's also used in a very popular open-source app; even ComfyUI has it. So it's not as if they're doing something unique that could give them a significant financial gain or a competitive edge over others. But in the end, to each their own. In my case, I converted the model but I have three issues with this:
As an example, this code in comfyui:

```python
inpaint_head_model, inpaint_lora = patch
feed = torch.cat([latent_mask, latent_pixels], dim=1)
inpaint_head_model.to(device=feed.device, dtype=feed.dtype)
inpaint_head_feature = inpaint_head_model(feed)

def input_block_patch(h, transformer_options):
    if transformer_options["block"][1] == 0:
        h = h + inpaint_head_feature.to(h)
    return h

m = model.clone()
m.set_model_input_block_patch(input_block_patch)
patched = m.add_patches(loaded_lora, 1.0)
```

isn't that straightforward to do in diffusers; in fooocus, forge and comfyui, patching the unet is very easy. If someone can provide a real example with prompt, source image and mask where the fooocus model is better than the original inpaint, soft inpainting or differential diffusion, I can continue, but as it is right now, IMO it's not worth it. What I think is that people find it good and better than diffusers because fooocus does a lot more under the hood, and you have to do the same in diffusers to get good results, not just pass the image and the mask and expect the same kind of quality.
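For comparison, a rough sketch of one way to reproduce that input-block patch in diffusers using a forward hook on conv_in; pipe, inpaint_head_model, latent_mask and latent_pixels are assumed to be prepared as in the ComfyUI snippet, and injecting right after conv_in is an assumption that mirrors block index 0 above:

```python
import torch

feed = torch.cat([latent_mask, latent_pixels], dim=1)
inpaint_head_feature = inpaint_head_model(feed)

def add_inpaint_feature(module, inputs, output):
    # returning a value from a forward hook replaces the module's output,
    # so this adds the head feature right after conv_in
    return output + inpaint_head_feature.to(output)

hook = pipe.unet.conv_in.register_forward_hook(add_inpaint_feature)
# ... run the pipeline ...
hook.remove()
```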
@asomoza I would argue the fooocus inpaint patch produces better results than the current approach in diffusers. The reason is that the current implementation uses a so-called "combine-noise" approach, where the model has zero insight into the known areas of the image and attempts to harmonise them later during generation. But if the first few predictions were incorrect or very different from the image you're trying to inpaint, it may fail to harmonise the regions. For example, in the 2girls example above, the current approach may start to generate girls at different locations and therefore fail to inpaint correctly. The fooocus inpaint patch actually only does one thing: with it, the model almost perfectly predicts the known areas of the image, which allows much better prediction of the unknown parts. But IMO this approach also requires too many changes (like a new extra arg in the unet forward) and is not worth adding to an already huge diffusers code base. Anyone interested could write their own custom pipeline with this inpainting.
@quark-toon

This is only true in diffusers if you use the
I'm not a closed-minded person, and it also seems that I'm the only one willing to do it as an open source solution; it's just that I'm not motivated to do it when I can't find a clear benefit in the results. We can discuss it with words for a long time, but a clear example of a good inpainting in fooocus that I can't replicate or make better in diffusers is the fastest way, that's all I'm asking for to get some motivation. Just seeing the image that was deleted, the one of the "alien", is a clear indication that the bad quality in the inpainting is not a diffusers problem but more of an implementation problem. As I stated before, diffusers is just a library/tool; the quality of the results depends on what you do with it.
@asomoza I'll give you a few examples that you can look at, which I think are not easy to do in diffusers.

For blurring, this is derived from my own code, which produces a diff-diff-like effect, but not as good as diff-diff's. For the following two examples, I used blip2 to do prompt interrogation.
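For reference, a small sketch of that kind of prompt interrogation with BLIP-2 via transformers (the model id and input filename are just illustrative choices):

```python
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("source.png").convert("RGB")  # hypothetical input
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=40)
prompt = processor.batch_decode(out, skip_special_tokens=True)[0].strip()
```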
Nice, I like the challenge. Let me get back to you soon, since I still haven't done any outpainting with diffusers and I don't think there's a pipeline or workflow for that yet. I plan to do a guide/example/tutorial for inpainting and outpainting soon. I'll work on an outpainting solution so I can tackle this first, but IMO it's the same problem: I just need to solve the math for expanding the "canvas", and probably need to fill it with something first, not just noise.

For the woman, this is the speed I get with a 3090. In this comment: exx8/differential-diffusion#17 (comment) the author says it is just a 0.25% penalty. But I'll also do it with normal inpainting because the results are also good. I like to use diff-diff, but normal inpainting is not bad; the trick to it is the image area we use as context and how we merge back the inpainted part.
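A minimal sketch of that "expand the canvas" step, pre-filling the new area with something neutral rather than leaving raw noise (the pad size and fill color are arbitrary choices):

```python
from PIL import Image

def expand_canvas(image: Image.Image, pad: int = 256, fill=(128, 128, 128)):
    # paste the source image onto a larger, neutrally filled canvas
    canvas = Image.new("RGB", (image.width + 2 * pad, image.height + 2 * pad), fill)
    canvas.paste(image, (pad, pad))
    # mask: white = area to outpaint, black = area to keep
    mask = Image.new("L", canvas.size, 255)
    mask.paste(Image.new("L", image.size, 0), (pad, pad))
    return canvas, mask
```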
I agree with you, the original image is crucial to the generation process; diffusion models are trained to do that. So for the outpainting here, I would use lama first to fix the outpainted area.

I tested it in comfy, with 2 methods: 24 steps for all.
I always thought that the model patching fooocus does is just to convert a regular model into an inpainting one; you can do the same by merging the difference of the inpaint model trained by the diffusers team into the base model, which is what most people did with SD 1.5 (see the sketch after this comment). The first "inpainting" of fooocus was a controlnet too, so I don't know if there's something else in the patch, or if he trained it from scratch or used the diffusers model, so I'd rather leave it as an "unknown" patch. Thank you for doing more tests, now I have more to compare my results with. This time, I'm putting more effort into doing the best I can, instead of just simply replicating fooocus or comfyui. Edit: the VRAM and RAM can be managed. I remember that fooocus has to unload and load the model, so it probably clones the base model (taking more RAM); also, I think comfyui manages memory better than fooocus, since comfyui can run on a potato pc, so it should unload the model it's not using. In diffusers you can practically do whatever you want if you have the knowledge.
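A rough sketch of that "merge the difference" trick for SD 1.5 (repo ids are illustrative, and "my/custom-sd15-model" is hypothetical; note that the inpaint UNet's conv_in has 9 input channels vs 4 in the base, so that layer needs special handling and is simply skipped here):

```python
import torch
from diffusers import UNet2DConditionModel

base = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
inpaint = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-inpainting", subfolder="unet")
custom = UNet2DConditionModel.from_pretrained("my/custom-sd15-model", subfolder="unet")  # hypothetical

merged = custom.state_dict()
base_sd, inpaint_sd = base.state_dict(), inpaint.state_dict()
with torch.no_grad():
    for key, w in merged.items():
        # conv_in has a different shape in the inpaint UNet; skipped here
        if w.shape == inpaint_sd[key].shape:
            merged[key] = w + (inpaint_sd[key] - base_sd[key])
custom.load_state_dict(merged)
```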
Just to have a baseline, I tested the wolf one with just the controlnet inpaint. I used my app for this, but it can be done with just code. I don't think it's that much worse, but if I want to make it better, I can use the prompt enhancer and fix the composition with a t2i adapter (I fixed a bit of the tail by painting in the canny preprocessed image). Not bad for a quick inference; I'm going to do this also with just diffusers code.

Hi @Laidawang, I just posted a guide on the discussions on outpainting. I did a middle step without changing the prompt so you can compare it to the fooocus result. I'll use your other images with the other methods I know, because they are more suited to them. Let me know if you still think fooocus is better, but IMO the results are of the same quality or better.
Thanks for your sharing. I found the Fooocus inpaint lora weights contain (uint8, fp16, fp16) data; can anyone explain the uint8 weights here?

@viperyl if I remember correctly, they quantized the main matrices to uint8 to take less space, and then use the min/max stored in fp16 to scale them back. IMO a very good idea with negligible loss of information.

Yes, I debugged it and found the uint8 quantization; that's what confused me. The uint8 checkpoint needs 1.4 GB of disk space, while the fp16 version would need 2.5 GB. Considering that the quantization potentially damages the result, quantizing here just to save 1.1 GB of disk space doesn't look like a good idea.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Is your feature request related to a problem? Please describe.
I have seen that the diffusers StableDiffusionXLInpaintPipeline generates worse results than the SD 1.5 pipeline.
Describe the solution you'd like.
Include the Fooocus inpaint patch; it could be exposed through a new loader.
Weights are available right now in hub.
https://huggingface.co/lllyasviel/fooocus_inpaint