add masked loss implementation #589

Closed
wants to merge 7 commits into from

Conversation

recris

@recris recris commented Jun 14, 2023

This is mostly a rebase of #236

Relevant differences:

  • Instead of looking for .mask files it will now look for a matching PNG file in the mask sub-directory
  • When computing MSE loss, mask values are rescaled by the mask mean value.

@kohya-ss
Owner

Thank you for this! The implementation is simple and wonderful.

However, as I mentioned here: #236 (comment) , I would like to integrate the ControlNet dataset with the mask dataset.

This is because in the future I would like to be able to handle some additional images in addition to ControlNet and masks in a generic way in the dataset. Also, it is redundant to have different processes for ControlNet and masks considering bucketing, caching (memory and disk), cropping etc...

I intend to extend these processes to mask loss after the ControlNet PR #551 is complete.

If it seems that pull request #551 will take a long time to complete, I think it's possible that I could merge this pull request first. Please give me a little time to consider. Also, please understand that there is a possibility I may close this PR without merging it.

@recris
Author

recris commented Jun 16, 2023

No worries, if you decide to merge the other PR first I will rebase again

@Elevory

Elevory commented Jun 19, 2023

I'm getting some errors with load_mask() when it encounters unconventional image sizes. I had to wrap line 727 of train_util.py in a try/except as a workaround:

try: img[..., -1] = self.load_mask(image_path)
except: pass

As well as line 1037:

try: example["masks"] = torch.stack(masks) if masks[0] is not None else None
except: example["masks"] = None

Is there a better way to ensure compatibility with bucketing?

Thanks!

@Elevory

Elevory commented Jun 19, 2023

Hi,

I moved the initialization of mask below the trim_and_resize_if_required() function that is used with bucketing. Updated copy of train_util.py is attached. It will also catch a couple errors as described in my previous post.

train_util.zip

Seems to work in my (brief) testing. EDIT: Hmm, it's blowing out the exposure on my DreamBooth models. Not sure if it's an issue with DB or my bucketing fix. Is it normal for masking to cause the model to trend toward white?

@recris
Author

recris commented Jun 20, 2023

> I'm getting some errors with load_mask() when it encounters unconventional image sizes. I had to wrap line 727 of train_util.py in a try/except as a workaround:
>
> try: img[..., -1] = self.load_mask(image_path)
> except: pass
>
> As well as line 1037:
>
> try: example["masks"] = torch.stack(masks) if masks[0] is not None else None
> except: example["masks"] = None
>
> Is there a better way to ensure compatibility with bucketing?
>
> Thanks!

I've been using this with square images and masks, of various sizes. The important part is that mask must be in grayscale format, dimensions exactly matching the corresponding training input.

@recris
Author

recris commented Jun 20, 2023

> Hi,
>
> I moved the initialization of mask below the trim_and_resize_if_required() function that is used with bucketing. Updated copy of train_util.py is attached. It will also catch a couple errors as described in my previous post.
>
> train_util.zip
>
> Seems to work in my (brief) testing. EDIT: Hmm, it's blowing out the exposure on my DreamBooth models. Not sure if it's an issue with DB or my bucketing fix. Is it normal for masking to cause the model to trend toward white?

I haven't observed anything like this, do you have an example you could share?

@Elevory

Elevory commented Jun 20, 2023

Sure, let me provide more context. :)

I am using the --enable_bucket flag, which produces the following error at the start of training:

Traceback (most recent call last):
  File "T:\code\python\kohya-ss\train_db.py", line 507, in <module>
    train(args)
  File "T:\code\python\kohya-ss\train_db.py", line 334, in train
    .reshape(
RuntimeError: shape '[1, 1, 704, 320]' is invalid for input of size 1323504

If I understand correctly, this is because on line 965 of train_util.py, you are initializing the mask variable before applying trim_and_resize_if_required() for bucketing on line 970.

But in case I've misdiagnosed the issue, here is my entire configuration:

set PROJECT="project_name"
call ./venv/Scripts/activate

accelerate launch --num_cpu_threads_per_process=12 train_db.py ^
  --pretrained_model_name_or_path="T:/code/python/automatic-stable-diffusion-webui/models/Stable-diffusion/general/reliberate_v10.safetensors" ^
  --train_data_dir="T:/code/python/kohya-ss/_in/%PROJECT%/train" ^
  --reg_data_dir="T:/code/python/kohya-ss/_in/%PROJECT%/reg" ^
  --output_dir="T:/code/python/kohya-ss/_out_db/%PROJECT%" ^
  --save_every_n_steps=100 ^
  --sample_every_n_epochs=5 ^
  --train_batch_size=8 ^
  --prior_loss_weight=1.0 ^
  --output_name=%PROJECT% ^
  --resolution=512,512 ^
  --learning_rate=1e-6 ^
  --lr_scheduler="constant" ^
  --lr_warmup_steps=0 ^
  --max_train_steps=5000 ^
  --mixed_precision="fp16" ^
  --gradient_accumulation_steps=1 ^
  --gradient_checkpointing ^
  --save_model_as=safetensors ^
  --clip_skip=1 ^
  --seed=1 ^
  --color_aug ^
  --xformers ^
  --use_8bit_adam ^
  --sample_prompts="./prompts.txt" ^
  --persistent_data_loader_workers ^
  --enable_bucket ^
  --random_crop

pause

Now, as far as the image exposure problem: I spliced your train_network.py code into train_db.py because I want to use masking with DreamBooth. I haven't actually tried this PR with Loras, so it's possible that this is specifically an issue with DB.

I can see the sample images becoming gradually whiter/brighter/more washed out over the course of training, but it's most evident if I prompt for a dark scene with the finished model. For example, here's "Will Smith at the nighttime fireworks show" in the non-masked model:

And here's the same prompt in the masked version (identical inference settings and pretty similar training settings):

Finally, here's the output from a checkpoint trained twice as long:

> I've been using this with square images and masks, of various sizes. The important part is that mask must be in grayscale format, dimensions exactly matching the corresponding training input.

I can confirm that the masks are all grayscale 8-bit PNGs. Just to be sure, I even tried generating them with your awesome masking tool, which unfortunately didn't make a difference.

@AI-Casanova
Contributor

@Elevory It is possible that --color_aug might have a greater effect with masks, since only the subject is getting backpropped, and I've noticed a tendency for human faces to react unpredictably with --color_aug in general.

Perhaps try without?

@Elevory

Elevory commented Jun 21, 2023

Hi @AI-Casanova ,

Thank you for the suggestion. Unfortunately, removing the --color_aug flag does not eliminate the issue. Non-masked:

Masked:

I may try preparing another dataset with images at a single resolution to see if bucketing is somehow responsible, but I find it unlikely.

Do you think regularization images might be at fault here? DreamBooth requires hundreds of pictures of the object class, and I haven't created masks for those. Are they perhaps having a stronger-than-intended effect on training?

EDIT: I made a couple interesting discoveries. First, if I use --masked_loss but do not include any masks in my training set, the model still becomes washed out.

Second, I found out that my custom token--which was gibberish--had a tendency to produce black-and-white images in the base model. I switched to a different token and, so far, the exposure problem is somewhat reduced. Still need to run more tests.

EDIT 2: Tried disabling some other things to no avail: --random_crop, --gradient_checkpointing, rescaling masks by mean value.

@Elevory

Elevory commented Jun 22, 2023

I think I got it working!

When I moved the initialization of the mask var below trim_and_resize_if_required(), I neglected to move the "drop alpha channel" line with it. So I was dropping the alpha channel too early.
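
The ordering issue described above can be sketched in a few lines. This is only an illustration under assumptions: it supposes the mask travels as a fourth (alpha) channel of the image array, as the `img[..., -1] = self.load_mask(image_path)` line suggests, and `center_crop` is a hypothetical stand-in for `trim_and_resize_if_required()`, not the real function. The array sizes are illustrative.

```python
import numpy as np

def center_crop(img: np.ndarray, target_hw: tuple) -> np.ndarray:
    """Hypothetical stand-in for the bucketing step: trims H and W around the centre."""
    h, w = img.shape[:2]
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return img[top:top + th, left:left + tw]

def split_after_crop(rgba: np.ndarray, target_hw: tuple):
    """Crop first, then separate the mask (alpha) from the RGB channels."""
    cropped = center_crop(rgba, target_hw)  # image and mask are trimmed together
    mask = cropped[..., -1]                 # read the mask from the cropped array
    rgb = cropped[..., :3]                  # only now drop the alpha channel
    return rgb, mask

rgba = np.zeros((900, 600, 4), dtype=np.uint8)  # illustrative source size
rgb, mask = split_after_crop(rgba, (704, 320))
```

Because the mask is taken from the already-cropped array, its height and width always match the image that reaches the training step.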

I'm attaching my updated copies of train_util.py and train_db.py below. By the way, I'm seeing a noticeable improvement in the quality of my sample images, so masking may be a gamechanger after all - great work @AI-Casanova and @recris!

masking_for_db.zip

@TingTingin
Contributor

> I think I got it working!
>
> When I moved the initialization of the mask var below trim_and_resize_if_required(), I neglected to move the "drop alpha channel" line with it. So I was dropping the alpha channel too early.
>
> I'm attaching my updated copies of train_util.py and train_db.py below. By the way, I'm seeing a noticeable improvement in the quality of my sample images, so masking may be a gamechanger after all - great work @AI-Casanova and @recris!
>
> masking_for_db.zip

any example images?

@briansemrau

briansemrau commented Aug 2, 2023

I've just reimplemented this on top of the sdxl branch: https://github.com/briansemrau/kohya-ss-sd-scripts/tree/sdxl_lora_mask

Feel free to pull it in

@TingTingin
Contributor

Should probably make a pull request

@recris recris changed the base branch from main to dev October 9, 2023 12:10
@recris
Author

recris commented Oct 9, 2023

Update:

  • Rebased PR on latest main
  • Made mask loading a bit more robust
  • Added latent masking to all training methods, not just LoRA

I've tested this with LoRA for SD 1.5 and SDXL. Have not yet tested with other methods, if someone would volunteer it would be great :)

@recris
Author

recris commented Oct 9, 2023

I have removed mask rescaling by mean value. I am concerned that it can make loss magnitude fluctuate in a hard to predict manner because the loss would be scaled differently for each image and cause an "uneven" learning process.

Rescaling would give an increase to loss magnitude but I think we can achieve a similar effect just by adjusting the learning rate.

@sashasubbbb

So what's the dataset configuration for masks?
Subfolder "mask" and greyscale .png with same file names inside?

What about multiple concepts:

dataset/
--5_obj1/
----------mask/
--10_obj2/
----------mask/

Is fine?

For some reason i'm running into this error using this configuration.

  File "B:\AIimages\sd-scripts\sd-scripts\venv\lib\site-packages\torch\utils\data\dataset.py", line 243, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "B:\AIimages\sd-scripts\sd-scripts\library\train_util.py", line 1140, in __getitem__
    masks.append(torch.tensor(mask))
UnboundLocalError: local variable 'mask' referenced before assignment

@recris
Author

recris commented Oct 13, 2023

> So what's the dataset configuration for masks? Subfolder "mask" and greyscale .png with same file names inside?

It will look for a folder named "mask" inside the dataset folder. Each mask image should have the same name as the training image, but end with .png. If no mask file is provided then it will use a default all-white mask (same as no mask).
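
The lookup rule described above could be sketched roughly as follows. `find_mask_path` is a hypothetical helper written for illustration, not the function used in this PR; it assumes the mask lives in a `mask` folder next to the image and shares the image's base name with a `.png` extension.

```python
import tempfile
from pathlib import Path
from typing import Optional

def find_mask_path(image_path: str, mask_dir_name: str = "mask") -> Optional[Path]:
    """Return the mask PNG that matches an image, or None if there is none."""
    image = Path(image_path)
    candidate = image.parent / mask_dir_name / (image.stem + ".png")
    return candidate if candidate.is_file() else None

# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    dataset = Path(root) / "5_obj1"
    (dataset / "mask").mkdir(parents=True)
    (dataset / "01.jpg").touch()
    (dataset / "02.jpg").touch()
    (dataset / "mask" / "01.png").touch()

    found = find_mask_path(str(dataset / "01.jpg"))
    missing = find_mask_path(str(dataset / "02.jpg"))
    found_name = found.name if found else None
```

When the helper returns `None`, the caller would fall back to an all-white mask, which is equivalent to training that image without masking.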

Do you have this error when using a single dataset or multiple datasets?
What training method are you using, and what parameters?

Do you see any message related to mask files in the console? If a mask fails to load it should print something.

@sashasubbbb

I somewhat figured it out: that error occurred when caching latents.

There were other errors about tensor size mismatch when I used pictures and masks of all kinds of resolutions (the resolution of each picture matched its mask), but they disappeared when I rescaled them all to 768,768. It's like it was trying to hook up a different mask and got a resolution mismatch? Maybe that one occurred because I use a batch size of 3 instead of 1?

Also, must the training resolution match the resolution of the dataset? When I tried to train at 512,512 with a 768,768 dataset, I ran into an error.

So, to start training I had to:

  1. Disable cache latents;
  2. Resize all dataset to training resolution.

@recris
Author

recris commented Oct 14, 2023

> I somewhat figured it out: that error occurred when caching latents.
>
> There were other errors about tensor size mismatch when I used pictures and masks of all kinds of resolutions (the resolution of each picture matched its mask), but they disappeared when I rescaled them all to 768,768. It's like it was trying to hook up a different mask and got a resolution mismatch? Maybe that one occurred because I use a batch size of 3 instead of 1?
>
> Also, must the training resolution match the resolution of the dataset? When I tried to train at 512,512 with a 768,768 dataset, I ran into an error.
>
> So, to start training I had to:
>
> 1. Disable cache latents;
> 2. Resize all dataset to training resolution.

Right now it does not work with disk caching of latents, but in-memory caching should work.

Also I've been testing with a dataset having images of different sizes and is working fine. When the mask does not exactly match the corresponding image then it also should resize the mask automatically.

@cheald

cheald commented Oct 16, 2023

One suggestion: when using masks, consider dividing loss by the mean of the mask. The idea is that you don't want a masked image's loss computation relative to the other samples affected by the portion of the image that is masked. For loss, you really care about how close the predicted noise got to the actual noise for only the masked pixels. Right now, the more masked an image is, the more its loss is damped, due to all the 0 MSE pixels included in the calculation.

In the current implementation, the mask "knocks out" the contribution to loss for anything but the masked pixels. This means that all else equal, images with less masking will have higher loss. By dividing the loss by the mean of the mask, you boost the observed loss by the same amount that the loss scalar is reduced due to zeroing out the masked pixels.

In my empirical tests, this vastly improved the results of my training. My implementation looks like this:

loss_div = 1.0
if args.masked_loss and batch['masks'] is not None:
    mask = get_latent_masks(batch['masks'], noise_pred.shape, noise_pred.device)
    noise_pred = noise_pred * mask
    target = target * mask
    loss_div = mask.mean()
    if loss_div == 0:
        loss_div = 1.0

loss = torch.nn.functional.mse_loss(noise_pred.float(), target.float(), reduction="none")
loss = loss.mean([1, 2, 3])

loss_weights = batch["loss_weights"]  # weight for each sample
loss = loss * loss_weights / loss_div
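
To make the scaling effect concrete, here is a small numeric sketch. It uses NumPy as a stand-in for the tensor operations so it runs without a GPU stack, and it differs from the snippet above in two labeled ways: the mask is applied to the squared error rather than to the inputs, and the division uses each sample's own mask mean rather than one mean over the whole batch. `masked_mse_per_sample` is a hypothetical name.

```python
import numpy as np

def masked_mse_per_sample(pred, target, mask, normalize=True):
    """Per-sample masked MSE; pred/target are (B, C, H, W), mask is (B, 1, H, W) in [0, 1]."""
    sq_err = (pred - target) ** 2 * mask        # zero out ignored positions
    loss = sq_err.mean(axis=(1, 2, 3))          # this mean still counts the zeroed pixels
    if normalize:
        coverage = mask.mean(axis=(1, 2, 3))    # fraction of each sample that is kept
        coverage = np.where(coverage == 0, 1.0, coverage)  # avoid division by zero
        loss = loss / coverage
    return loss

pred = np.ones((2, 4, 8, 8))
target = np.zeros((2, 4, 8, 8))   # squared error is 1 everywhere
mask = np.ones((2, 1, 8, 8))
mask[1, :, :, 4:] = 0.0           # second sample: half of the pixels ignored

raw = masked_mse_per_sample(pred, target, mask, normalize=False)
scaled = masked_mse_per_sample(pred, target, mask, normalize=True)
```

Without normalization the half-masked sample reports half the loss of the unmasked one even though the error on its kept pixels is identical; dividing by the mask mean brings the two back to the same value.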

@recris
Author

recris commented Oct 16, 2023

> One suggestion: when using masks, consider dividing loss by the mean of the mask. The idea is that you don't want a masked image's loss computation relative to the other samples affected by the portion of the image that is masked. For loss, you really care about how close the predicted noise got to the actual noise for only the masked pixels. Right now, the more masked an image is, the more its loss is damped, due to all the 0 MSE pixels included in the calculation.
>
> In the current implementation, the mask "knocks out" the contribution to loss for anything but the masked pixels. This means that all else equal, images with less masking will have higher loss. By dividing the loss by the mean of the mask, you boost the observed loss by the same amount that the loss scalar is reduced due to zeroing out the masked pixels.
>
> In my empirical tests, this vastly improved the results of my training. My implementation looks like this:
>
> loss_div = 1.0
> if args.masked_loss and batch['masks'] is not None:
>     mask = get_latent_masks(batch['masks'], noise_pred.shape, noise_pred.device)
>     noise_pred = noise_pred * mask
>     target = target * mask
>     loss_div = mask.mean()
>     if loss_div == 0:
>         loss_div = 1.0
>
> loss = torch.nn.functional.mse_loss(noise_pred.float(), target.float(), reduction="none")
> loss = loss.mean([1, 2, 3])
>
> loss_weights = batch["loss_weights"]  # weight for each sample
> loss = loss * loss_weights / loss_div

This is something I had in a previous commit, but decided to roll back. I am not entirely sure dividing by the mean is the best "ratio", I guess we need further experimentation.

@AI-Casanova
Contributor

My thought on dividing by mean:

Sure the overall loss will be lower with masks, but the goal is to speed up convergence by eliminating extraneous input, and step size can be compensated for by raising the overall learning rate.

Besides, adjusting a squared loss by a linear factor is far from perfect.

@TingTingin
Contributor

how exactly does the mask work is it like this
black = ignore
white = include
grey = ignore 50%

@AI-Casanova
Contributor

> black = ignore
> white = include
> grey = ignore 50%

Precisely

Though I now think the mask should be applied to the loss itself instead of the noise, because (0.5x -0.5y)^2 != 0.5(x-y)^2

@recris
Author

recris commented Oct 16, 2023

We can apply the mask to the noise, we just have to use sqrt(mask) for the same effect.

For gray regions we're just reducing the loss magnitude in those regions, so in practice what we have is a kind of dynamic learning rate. That is, a gray value of 0.5 means we are cutting the LR in half.
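
The algebra behind this exchange is that (m·x − m·y)² = m²·(x − y)², while (√m·x − √m·y)² = m·(x − y)². A quick numeric check, using NumPy as a stand-in for the training tensors and a uniform 50% gray mask chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 8, 8))
target = rng.normal(size=(4, 8, 8))
mask = np.full((1, 8, 8), 0.5)   # uniform 50% gray mask

# Mask applied to the squared error itself.
loss_weighted = ((pred - target) ** 2 * mask).mean()
# Mask applied to prediction and target before squaring: the weight becomes mask**2.
loss_mask_inputs = ((pred * mask - target * mask) ** 2).mean()
# sqrt(mask) applied to prediction and target: the weight is mask again.
loss_sqrt_inputs = ((pred * np.sqrt(mask) - target * np.sqrt(mask)) ** 2).mean()
```

With a gray value of 0.5, masking the inputs directly weights the loss by 0.25 rather than 0.5; either weighting the loss or using the square root of the mask gives the intended factor.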

@AI-Casanova
Contributor

Yes we can sqrt mask

But I'm looking at clean implementation.

We can just take the unreduced loss, and call a function(loss, masks) ... return loss

@recris
Author

recris commented Nov 13, 2023

> Been using this a lot this weekend and had an idea for a "masked min". This would set the masked value to a minimum value so the background can be masked freely to extract the subject but we can set a minimum value to apply.
>
> # Assuming new value `--masked_min=0.2` up to say 0.5 seems pretty good.
> if args.masked_min and torch.count_nonzero(mask) != torch.numel(noise_pred):
>     minimum = torch.tensor([args.masked_min]).to(mask.device)
>     minimum_mask = torch.gt(minimum, mask)
>     mask = mask + (minimum * minimum_mask)
>
> I'm still a little rough around the code here but hopefully it's enough to express what I mean. This helps to reduce a certain haloing effect that the trainings get around the masked subjects. It also helps that we don't need to make specific masks to enforce these minimums.
>
> I also have this little bit of code to see the masks
>
> def save_mask(mask, tag):
>     import torchvision.transforms as T
>     image = T.ToPILImage()(mask)
>     image.save(f'{tag}.mask.png')

I recommend doing this kind of mask processing outside the training code, it is easy to do with a small python script. If we are going to add a new parameter every time we need a new operation the list would grow indefinitely.
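
As a rough idea of what such an offline step might look like, the sketch below raises the floor of an 8-bit grayscale mask held in a NumPy array. `raise_mask_floor` is a hypothetical helper, and it clamps values to the minimum, which is one reading of the "masked min" idea (the quoted snippet instead adds the minimum to pixels below it). Loading and saving the PNG files with an image library is left out.

```python
import numpy as np

def raise_mask_floor(mask: np.ndarray, floor: float) -> np.ndarray:
    """Clamp an 8-bit grayscale mask so no pixel falls below `floor` (0.0 to 1.0)."""
    if not 0.0 <= floor <= 1.0:
        raise ValueError("floor must be between 0.0 and 1.0")
    floor_value = int(round(floor * 255))          # convert to the 0-255 scale
    return np.maximum(mask, floor_value).astype(np.uint8)

mask = np.array([[0, 64], [128, 255]], dtype=np.uint8)
raised = raise_mask_floor(mask, 0.2)               # 0.2 * 255 rounds to 51
```

Running this over a copy of each mask directory keeps the original masks untouched while producing a variant with the chosen minimum.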

@recris
Author

recris commented Nov 13, 2023

> Yeah, I've been thinking about how to keep some background too, @rockerBOO. I think some people won't realise that you have to keep some background, or reflective parts of your character (including just the ambient color of the light in the scene) won't be learned correctly, and the resulting learned subject won't seem to fit properly into scenes that you make with the model.
>
> Even just the scale of the foreground object needs some background for context for the learning to work at its best.

I've been testing masks with an all-black background and it seems to be working fine. However, I do provide lighting "cues" in my captions, which is probably helping.

@rockerBOO
Contributor

> I recommend doing this kind of mask processing outside the training code, it is easy to do with a small python script. If we are going to add a new parameter every time we need a new operation the list would grow indefinitely.

I do agree with this approach but in practice it's prohibitively complicated.

We'd make masks for the things we want but then would require creating new masks for the different minimums. Which requires some modifications of directories or putting originals elsewhere. It gets more complicated if you have many subset directories and need to modify each one separately. It can also cause false positive not knowing if you did all the masks correctly to the same value for testing purposes. I use upwards of 30 different dataset subsets which can make the process of individually modifying all the masks much more complicated and error prone. Scripting against my dataset_config toml is an option though but still need to manage different minimum masks for each subset.

I believe Kohya supports many options already for different things to allow a lot of experimentation to happen. If we happen along a path of indefinite growth of arguments the option of specific masked arguments --mask_args could be an option to allow the flexibility to happen more specifically in the masked direction. I do not think we should limit the options in these cases if they have good value.

In this case I'm recommending 1 option that makes a clear testable, reproducible result that reduces complication, false positives, and other mistakes that can confuse the trainer. Provided the code works properly of course.

Ultimately it's a suggestion so I'd be down with whatever is chosen. Thanks.

@recris
Author

recris commented Nov 13, 2023

I understand your concern, in fact it is something I also struggled with... that is why I am building a set of tools to make preparing and modifying training data very easy. This is something I am planning to open-source soon.

@twister77

twister77 commented Dec 3, 2023

@recris

Am i wrong or not, in a Runpod notebook my dataset is organized like this
/workspace/dataset
01.jpg
02.jpg

Does that mean that my masks should be organized like:
/workspace/dataset/mask
01.png
02.png

@recris
Author

recris commented Dec 5, 2023

> @recris
>
> Am i wrong or not, in a Runpod notebook my dataset is organized like this
> /workspace/dataset
> 01.jpg
> 02.jpg
>
> Does that mean that my masks should be organized like:
> /workspace/dataset/mask
> 01.png
> 02.png

correct

@araleza

araleza commented Dec 9, 2023

> I do agree with this approach but in practice it's prohibitively complicated.

Hello, I thought I'd throw in my opinion here. Which is easy for me to do, as I'm not the person doing any of the work. And I really appreciate that you people are actually doing the work here, so please don't mind me saying stuff.

I also think that the masks subdirectory seems like it'll be a burden to work with. I do realize that the idea of 'use the alpha channel of the training image' had difficulties when it came to knowing whether premultiplied alpha was being used in the source .png files or not.

But, premultiplied alpha does not affect any pixel that has an alpha of 0.0 or an alpha of 1.0. I doubt many people will care about alpha values that are not 0 or 1; they mostly want to just delete noisy objects from their training images. All you need is alpha=0 for the parts of the image that are to be deleted, and alpha=1 for the parts you want to keep.

If you just assumed either premult alpha was present in the .png files, or not present, either way would be good, because alpha=0 and alpha=1 works the same in both cases. Then, a command line parameter could be added to override that default if that was actually important to them, but I imagine no-one would ever care enough to use that flag.
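
A small sketch of why the premultiplication question does not matter for a binary mask, assuming 8-bit RGBA data in a NumPy array. Both helper names are hypothetical, and this is an illustration of the argument rather than code from any of the branches discussed here.

```python
import numpy as np

def alpha_to_binary_mask(rgba: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Turn the alpha channel of an 8-bit RGBA array into a 0/255 mask."""
    alpha = rgba[..., 3]
    return np.where(alpha >= threshold, 255, 0).astype(np.uint8)

def premultiply(rgba: np.ndarray) -> np.ndarray:
    """Convert straight alpha to premultiplied alpha (RGB scaled by alpha / 255)."""
    out = rgba.astype(np.float32)
    out[..., :3] *= out[..., 3:4] / 255.0
    return out.round().astype(np.uint8)

# One fully opaque pixel and one fully transparent pixel.
straight = np.array([[[200, 100, 50, 255], [200, 100, 50, 0]]], dtype=np.uint8)
premult = premultiply(straight)

mask = alpha_to_binary_mask(straight)
kept = mask == 255   # pixels that contribute to training
```

The kept pixels are identical in both encodings, and the pixels whose colour differs are exactly the ones the mask removes from the loss.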

Thank you for listening to my suggestion. And thank you for the work you've all done so far.

gesen2egee added a commit to gesen2egee/sd-scripts that referenced this pull request Feb 11, 2024
@kohya-ss
Owner

I apologize for the long delay in responding to this PR.

I've implemented the masked loss functionality in the masked-loss branch. This implementation reuses the ControlNet dataset; therefore, mask images should be placed in a separate directory. Unfortunately, the alpha channel is not supported. I believe this approach is the simplest way to implement masked loss. Your understanding is greatly appreciated.

Currently, train_network.py and sdxl_train_network.py are configured to support masked loss.

I would be happy to assist with testing.

@araleza

araleza commented Feb 27, 2024

Great to see this moving ahead, I feel it's a valuable feature for the training scripts. Thanks for getting this working. :)

Do you know if it is difficult to port it to sdxl_train.py as well, for fine tuning instead of generating a LoRA?

kohya-ss added a commit that referenced this pull request Feb 27, 2024
@kohya-ss
Owner

I've updated the branch to support masked loss with sdxl_train.py, but I have not tested it yet.

@jordoh

jordoh commented Mar 3, 2024

> I've updated the branch to support masked loss with sdxl_train.py, but I have not tested it yet.

I ran a test with SDXL, but the fine-tuned model produces (fully) black images. I'll try the same config without masking today to see if this is an issue with my config, but posting as a heads up in the meantime.

Using the following command line (formatted for easier reading):

accelerate launch --num_cpu_threads_per_process=2
    ./sdxl_train.py
    --enable_bucket --min_bucket_reso=256 --max_bucket_reso=1024
    --pretrained_model_name_or_path="C:\Users\demo\ComfyUI\models\checkpoints\juggernautXL_v9Rundiffusionphoto2.safetensors"
    --dataset_config="C:\Users\demo\dataset.toml"
    --caption_extension=".txt"
    --resolution="1024,1024"
    --output_dir="C:\Users\demo\output" --logging_dir="C:\Users\demo\log"
    --save_model_as=safetensors --full_bf16 --output_name="hta-juggernautxl9-3"
    --max_token_length=225 --max_data_loader_n_workers="0"
    --learning_rate_te1="1e-05" --learning_rate_te2="1e-05" --learning_rate="1.0"
    --lr_scheduler="cosine" --lr_scheduler_num_cycles="10"
    --train_batch_size="1" --max_train_steps="6600"
    --save_every_n_epochs="1"
    --mixed_precision="bf16" --save_precision="bf16"
    --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01
    --min_snr_gamma=5 --noise_offset=0.0
    --log_with wandb --wandb_api_key="..."
    --gradient_checkpointing --cache_latents --cache_latents_to_disk
    --masked_loss

and dataset.toml:

[general]
batch_size = 1
enable_bucket = true
bucket_reso_steps = 64


[[datasets]]
resolution = [1024, 1024]
min_bucket_reso = 512
max_bucket_reso = 1024

	[[datasets.subsets]]
	image_dir = "C:\\Users\\demo\\img\\10_hta woman"
	caption_extension = ".txt"
	conditioning_data_dir = "C:\\Users\\demo\\mask"
	num_repeats = 10

It could be that I've mis-configured the dataset.toml, as this is the first time I've used one, but training did appear to be proceeding as expected, with W&B showing a reasonable average loss at 1k steps, though there was a very high initial spike that I haven't seen before.

@jordoh

jordoh commented Mar 3, 2024

Testing today, I spotted the mix of Adafactor/DAdaptation settings I had in above command line (learning_rate, in particular). 🤦

SDXL support appears to be working well - training still running now but early checkpoints clearly showing learning of masked areas only. 👍

@araleza

araleza commented Mar 5, 2024

Hi, I'm testing it out now for SDXL fine tuning.

One thing I think might help people get used to the feature is adding something to the debug output. Maybe an INFO line of 'no mask images' or '<x> mask images found', maybe next to the line about there being no regularization images found.

Nice work getting it in. And that future ControlNet-guided training sounds interesting too.

@araleza

araleza commented Mar 6, 2024

So I gave this code a much stronger test today, and placed specific objects in a series of 10 test images, and then masked them out. And I found that they were being learned, so my masks were not working.

This was because I was not passing --masked_loss as a parameter to sdxl_train.py. But when I added this parameter, I just got an error message that didn't explain what to do:

Traceback (most recent call last):
  File "/home/zmx/m.2/Dev/sdxl/sd-scripts_mask/./sdxl_train.py", line 810, in <module>
    train(args)
  File "/home/zmx/m.2/Dev/sdxl/sd-scripts_mask/./sdxl_train.py", line 586, in train
    mask_image = batch["conditioning_images"].to(dtype=weight_dtype)[:, 0].unsqueeze(1)  # use R channel
KeyError: 'conditioning_images'

I then read @jordoh 's setup, above. It turns out that if you want to use the masked images, it seems that you must use a .toml file to set up the conditioning_data_dir, which should be set to the mask image directory. With that in place, the specific objects I had masked out stopped showing up in my sample output, so the masks do work.

I don't generally use .toml files, so would it be possible to add a command line option, --conditioning_data_dir, to achieve the same result? I don't think bmaltais's webui supports .toml files. Or, just look in the 'mask' subdirectory by default for the masks? That's how I thought it worked when I was doing training yesterday, but my masks weren't being used at all. It's easy to talk yourself into believing that they're having an effect, even when they're not. I really do think some debug INFO lines are important to clarify if masks have been found or not.

@araleza

araleza commented Mar 6, 2024

After that, I tried copying that .toml from my directory with 10 test images to my real training dataset. I updated the paths inside the .toml for the images and masks, but when I run sdxl_train.py, I just get:

Traceback (most recent call last):
  File "/home/zmx/m.2/Dev/sdxl/sd-scripts_mask/./sdxl_train.py", line 810, in <module>
    train(args)
  File "/home/zmx/m.2/Dev/sdxl/sd-scripts_mask/./sdxl_train.py", line 166, in train
    blueprint = blueprint_generator.generate(user_config, args, tokenizer=[tokenizer1, tokenizer2])
  File "/home/zmx/m.2/Dev/sdxl/sd-scripts_mask/library/config_util.py", line 395, in generate
    sanitized_user_config = self.sanitizer.sanitize_user_config(user_config)
  File "/home/zmx/m.2/Dev/sdxl/sd-scripts_mask/library/config_util.py", line 358, in sanitize_user_config
    return self.user_config_validator(user_config)
  File "/home/zmx/m.2/Dev/sdxl/sd-scripts_mask/venv/lib/python3.10/site-packages/voluptuous/schema_builder.py", line 272, in __call__
    return self._compiled([], data)
  File "/home/zmx/m.2/Dev/sdxl/sd-scripts_mask/venv/lib/python3.10/site-packages/voluptuous/schema_builder.py", line 595, in validate_dict
    return base_validate(path, iteritems(data), out)
  File "/home/zmx/m.2/Dev/sdxl/sd-scripts_mask/venv/lib/python3.10/site-packages/voluptuous/schema_builder.py", line 433, in validate_mapping
    raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: extra keys not allowed @ data['datasets'][0]['subsets'][0]['conditioning_data_dir']

I have no idea what this error means. It goes away if I delete the conditioning_data_dir = "[...]" bit from my .toml file, but then the masks won't work of course.

Any thoughts what the cause / fix for this error message is?

@cheald

cheald commented Mar 6, 2024

Can you show your dataset toml?

Here's an example of what's working for me:

# dataset.toml
[[datasets]]
  [[datasets.subsets]]
  num_repeats = 2
  caption_extension = ".txt"
  image_dir = "/path/to/images/subdir1"
  conditioning_data_dir = "/path/to/images/subdir1/masks"

  [[datasets.subsets]]
  num_repeats = 4
  caption_extension = ".txt"
  image_dir = "/path/to/images/subdir2"
  conditioning_data_dir = "/path/to/images/subdir2/masks"

You could then add --dataset_config /path/to/dataset.toml to the "Additional Parameters" field in the advanced tab.

@araleza

araleza commented Mar 6, 2024

Thanks for asking, @cheald . Here's my dataset.toml:

[[datasets]]
 
resolution = [1024, 1024]
min_bucket_reso = 32
max_bucket_reso = 1024

	[[datasets.subsets]]
	image_dir = "/home/zmx/m.2/Dev/sdxl/training/terra/kohya/img/1_terra planet/"
	caption_extension = ".txt"
	conditioning_data_dir = "/home/zmx/m.2/Dev/sdxl/training/terra/kohya/img/1_terra planet/mask/"
	num_repeats = 1

I've tried it both with and without the trailing slashes on the ends of the paths. Also, I've copied and pasted those directories into Ubuntu Dolphin, to check they definitely exist, and they do.

This same .toml worked for my test dataset of 10 images with 10 mask images. It just doesn't work for my actual dataset, which has far more images. I don't have masks for all of them yet, but that's not supposed to be necessary.

@cheald

cheald commented Mar 6, 2024

If it's not expecting conditioning_data_dir then that'd suggest to me that you don't have --masked_loss set. Are you sure you set that for your real data run? ConfigSanitizer only accepts conditioning_data_dir when support_controlnet is set, and during LoRA training that's set by the args.masked_loss flag.

The trainer will expect masks for each of your inputs (and no extras), but you'll get an error message about that once you get past the schema checks.
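The gating described above can be illustrated with a minimal standard-library sketch. This is not the actual ConfigSanitizer code (which, per the traceback, validates through a voluptuous schema); the names BASE_SUBSET_KEYS, allowed_subset_keys, and validate_subset are hypothetical, and the key set is reduced to options that appear in this thread.

```python
# Hypothetical, simplified stand-in for the schema check; not sd-scripts code.
BASE_SUBSET_KEYS = {"image_dir", "caption_extension", "num_repeats"}


def allowed_subset_keys(support_controlnet: bool) -> set:
    """Return the subset keys this toy validator accepts."""
    keys = set(BASE_SUBSET_KEYS)
    if support_controlnet:
        # Assumption: the key is only accepted when mask/ControlNet support is on.
        keys.add("conditioning_data_dir")
    return keys


def validate_subset(subset: dict, support_controlnet: bool) -> list:
    """Return one error string per key that is not in the allowed set."""
    allowed = allowed_subset_keys(support_controlnet)
    return [
        f"extra keys not allowed: {key!r}"
        for key in sorted(subset)
        if key not in allowed
    ]


subset = {
    "image_dir": "/path/to/images",
    "conditioning_data_dir": "/path/to/images/masks",
    "num_repeats": 1,
}

# Without the flag the key is rejected; with it, the same subset passes.
errors_without_flag = validate_subset(subset, support_controlnet=False)
errors_with_flag = validate_subset(subset, support_controlnet=True)
print(errors_without_flag)
print(errors_with_flag)
```

The point of the sketch is only that the same .toml can pass or fail depending on a command-line flag, which is why the error appears to come from the config file rather than from the missing argument.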

@araleza

araleza commented Mar 6, 2024

If it's not expecting conditioning_data_dir then that'd suggest to me that you don't have --masked_loss set.

Aha yep that's it. I had that --masked_loss parameter set for my 10 image test, but I forgot to copy it across to my real training set. Thanks, @cheald .

It might be worth upgrading that error message, or having --masked_loss set implicitly when people pass in a conditioning_data_dir.
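A friendlier error could look roughly like the sketch below. It is a hypothetical pre-flight check, not something that exists in sd-scripts: the function name check_masked_loss_config and the message wording are made up for illustration, and it assumes the parsed .toml is available as a plain dict. A real version would also have to allow conditioning_data_dir for ControlNet training, which this sketch ignores.

```python
# Hypothetical pre-flight check; the function name and message are illustrative only.
def check_masked_loss_config(user_config: dict, masked_loss: bool) -> None:
    """Raise a descriptive error if masks are configured but masked loss is off."""
    for d_idx, dataset in enumerate(user_config.get("datasets", [])):
        for s_idx, subset in enumerate(dataset.get("subsets", [])):
            if "conditioning_data_dir" in subset and not masked_loss:
                raise ValueError(
                    f"datasets[{d_idx}].subsets[{s_idx}] sets conditioning_data_dir, "
                    "but --masked_loss was not passed; add the flag or remove the key."
                )


config = {
    "datasets": [
        {
            "subsets": [
                {
                    "image_dir": "/path/to/images",
                    "conditioning_data_dir": "/path/to/images/masks",
                }
            ]
        }
    ]
}

# Simulate the run where the flag was forgotten and capture the message.
try:
    check_masked_loss_config(config, masked_loss=False)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

Running a check like this before schema validation would point at the missing flag directly instead of reporting an unexpected key.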

@araleza

araleza commented Mar 7, 2024

I found an issue: if the training image is a .jpg file, the mask image apparently also has to be a .jpg, which is unlikely to be what you want.

You get this error message if you have a .jpg for the training image and a .png for the mask image:

AssertionError: missing conditioning data for 1 images: ['terra_landscape_1.jpg']
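One way to avoid this would be to pair images and masks by base name instead of by full file name. The sketch below shows the idea only; find_missing_masks is a hypothetical helper, not the lookup that sd-scripts actually performs, and the second image name is invented for the example.

```python
import os


# Hypothetical helper: pair training images with masks by base name only,
# so "foo.jpg" is satisfied by "foo.png".
def find_missing_masks(image_names, mask_names):
    """Return the image file names that have no mask with the same stem."""
    mask_stems = {os.path.splitext(name)[0] for name in mask_names}
    return [
        name
        for name in image_names
        if os.path.splitext(name)[0] not in mask_stems
    ]


images = ["terra_landscape_1.jpg", "terra_landscape_2.png"]
masks = ["terra_landscape_1.png"]

# The .jpg image is matched by the .png mask; only the second image is missing one.
missing = find_missing_masks(images, masks)
print(missing)
```

Matching on the stem means a lossless .png mask can sit next to a .jpg training image, at the cost of ambiguity if two masks share a stem with different extensions.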

@kohya-ss
Owner

I found an issue: if the training image is a .jpg file, the mask image apparently also has to be a .jpg, which is unlikely to be what you want.

Thank you for letting me know. I will fix it soon.

@Serallan

Serallan commented Apr 2, 2024

Is it possible to use masked loss with a regularization subset? Ideally masks for regularization images also, but for now I'd be happy if it worked just for the dataset when there are also regularization images.

Calling the test script (from a notebook):

!accelerate launch --num_cpu_threads_per_process=2 "sd-scripts/train_network.py" \
    --dataset_config=$dataset"/dataset.toml" \
    --pretrained_model_name_or_path=$models"/dreamshaper_8.safetensors" \
    --resolution=768,768 \
    --output_dir=$output"/model" \
    --logging_dir=$output"/log" \
    --save_model_as=safetensors \
    --network_module=networks.lora \
    --xformers \
    --output_name="msk00" \
    --optimizer_type="AdamW8bit" \
    --lr_scheduler="constant" \
    --masked_loss \
    --seed=4321

Toml:

[[datasets]]
  [[datasets.subsets]]
  num_repeats = 2
  caption_extension = ".txt"
  image_dir = "/home/studio-lab-user/sagemaker-studiolab-notebooks/datasets/test_img"
  conditioning_data_dir = "/home/studio-lab-user/sagemaker-studiolab-notebooks/datasets/test_img/mask"
  
  [[datasets.subsets]]
  num_repeats = 1
  caption_extension = ".txt"
  is_reg = true
  image_dir = "/home/studio-lab-user/sagemaker-studiolab-notebooks/datasets/test_reg"
  conditioning_data_dir = "/home/studio-lab-user/sagemaker-studiolab-notebooks/datasets/test_reg/mask"

When the reg subset has a conditioning_data_dir:

raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: extra keys not allowed @ data['datasets'][0]['subsets'][1]['is_reg']

When conditioning_data_dir is removed from reg:

raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: extra keys not allowed @ data['datasets'][0]['subsets'][0]['conditioning_data_dir']

It runs when the entire reg subset is removed.

@recris
Author

recris commented Apr 2, 2024

I've done some tests using the dev branch, and it seems to be working well for me, at least for SDXL LoRA training.

I will close this PR once the feature lands in main branch.

@kohya-ss
Owner

kohya-ss commented Apr 4, 2024

Unfortunately, a dataset with conditioning_data_dir doesn't support is_reg. Please set num_repeats of each dataset so that 'number of images * num_repeats' is the same for each. This is very similar to using reg images.

@Serallan

Serallan commented Apr 5, 2024

Unfortunately, a dataset with conditioning_data_dir doesn't support is_reg. Please set num_repeats of each dataset so that 'number of images * num_repeats' is the same for each. This is very similar to using reg images.

Thanks for clearing this up! I once tried using reg images as training images, and the results were pretty different from a run with them as reg, but I didn't alter the repeats. When you say to set repeats so they're the same, do you mean that if I originally have something like:

150 images in 2_trainingImages (150 * 2 = 300)
50 images in 1_regImages (50 * 1 = 50)

To achieve the closest result (and be able to use masks!) then I must turn them into:
2_trainingImages (150 * 2 = 300)
6_regImages (now as training imgs) (50 * 6 = 300)
?

Dumb example with weird unnecessary repeats I know, I just want to make sure I understood it correctly.
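The arithmetic in the example above can be checked with a short sketch. balanced_repeats is a hypothetical helper, not part of sd-scripts; it only finds the num_repeats that brings the second folder's 'number of images * num_repeats' as close as possible to the first folder's total. Whether the resulting run behaves the same as one with real regularization images is not something this thread confirms.

```python
# Hypothetical helper for the repeat-balancing arithmetic discussed above.
def balanced_repeats(train_images: int, train_repeats: int, other_images: int) -> int:
    """Return num_repeats for the second folder so both totals roughly match."""
    if other_images <= 0:
        raise ValueError("other_images must be positive")
    target = train_images * train_repeats  # e.g. 150 * 2 = 300
    # Round to the nearest whole repeat count, but never go below 1.
    return max(1, round(target / other_images))


reg_repeats = balanced_repeats(train_images=150, train_repeats=2, other_images=50)
print(reg_repeats, 50 * reg_repeats)
```

When the totals do not divide evenly, the result is only the nearest whole number of repeats, so the two folders will be approximately rather than exactly balanced.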

@recris recris closed this Apr 7, 2024
@recris recris deleted the masked-loss-rebase branch April 7, 2024 19:51