
Fixed the issue on flux dreambooth lora training #9549

Closed
jeongiin wants to merge 5 commits

Conversation

jeongiin (Contributor)

What does this PR do?

Fixes #9548

I resolved the issue by changing the autocast context from nullcontext() to torch.autocast(accelerator.device.type, dtype=torch_dtype). This ensures validation runs in the correct mixed-precision mode, preventing the errors I encountered.
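For context, a minimal sketch of the change inside the validation helper (the function shape and argument names here are assumptions modeled on the training script, not the exact diff):

```python
import torch
from contextlib import nullcontext

def log_validation(pipeline, args, accelerator, torch_dtype):
    # Before the fix: autocast_ctx = nullcontext(), so validation ran
    # outside autocast and could hit dtype-mismatch errors.
    # After the fix: an explicit autocast with the training dtype.
    autocast_ctx = torch.autocast(accelerator.device.type, dtype=torch_dtype)

    with autocast_ctx:
        images = [
            pipeline(args.validation_prompt).images[0]
            for _ in range(args.num_validation_images)
        ]
    return images
```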

I tested with the dog images as suggested in README_flux.md, and I confirmed that the results were successfully uploaded to wandb:
[screenshot: validation results logged to wandb]


Who can review?

@sayakpaul

@linoytsaban (Collaborator)

Thanks @jeongiin! Looks like this will also fix #9476.

@linoytsaban (Collaborator) left a review comment


Looks good to me; I'm just unsure about the commented line with the condition on the last validation (also mentioned in #9476 (comment)).
@sayakpaul, if it looks OK to you, we can merge.

@jeongiin (Contributor, Author)

jeongiin commented Sep 30, 2024

Thank you for reviewing, @linoytsaban!!

I'm not sure if this will be helpful, but when I used just autocast_ctx = torch.autocast(accelerator.device.type) (without an explicit dtype) to solve the issue, a problem similar to issue #9558 occurred: while using wandb, black images were uploaded. Like this:

[screenshot: black validation images in wandb]

@icsl-Jeon (Contributor)

@jeongiin did you check that the prompt_embeds from T5 are free of NaNs?
This might be due to NaN values in your pipeline output.
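A quick way to run that check (an illustrative sketch; how you obtain prompt_embeds depends on your setup):

```python
import torch

def has_nans(t: torch.Tensor) -> bool:
    # True if any element is NaN; a NaN T5 embedding will propagate
    # through the transformer and typically yields black images.
    return bool(torch.isnan(t).any())

# Example (names assumed): check the T5 output before denoising.
# prompt_embeds, pooled_prompt_embeds, text_ids = pipeline.encode_prompt(...)
# assert not has_nans(prompt_embeds), "T5 output contains NaNs"
```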

@sayakpaul (Member)

To use autocast successfully with Flux during validation inference, we need to pre-compute the text embeddings because, otherwise, T5-xxl doesn't really work under autocast.

See how it's handled in:

# pre calculate prompt embeds, pooled prompt embeds, text ids because t5 does not support autocast
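A hedged sketch of that pattern (pipeline, accelerator, and torch_dtype are assumed to come from the surrounding training script, and encode_prompt's exact signature may differ):

```python
import torch

def run_validation(pipeline, accelerator, torch_dtype, prompt):
    # 1) Compute text embeddings OUTSIDE autocast: T5-xxl tends to
    #    produce NaNs when run under reduced-precision autocast.
    with torch.no_grad():
        prompt_embeds, pooled_prompt_embeds, _text_ids = pipeline.encode_prompt(
            prompt=prompt, prompt_2=None
        )

    # 2) Run only denoising/decoding under autocast, passing the
    #    pre-computed embeddings so the text encoders are skipped.
    with torch.autocast(accelerator.device.type, dtype=torch_dtype):
        image = pipeline(
            prompt_embeds=prompt_embeds,
            pooled_prompt_embeds=pooled_prompt_embeds,
        ).images[0]
    return image
```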

@jeongiin (Contributor, Author)

jeongiin commented Oct 2, 2024

Thank you for the good advice, @icsl-Jeon and @sayakpaul!
I will follow @sayakpaul's advice and apply it to train_dreambooth_lora_flux.py.

@sayakpaul (Member)

@jeongiin there's another adjacent PR here: #9565

cc: @icsl-Jeon

@sayakpaul (Member)

@jeongiin thank you for the changes, but as mentioned in #9549 (comment), we need to handle the autocasting a bit differently. Let me know if anything is unclear.

@jeongiin (Contributor, Author)

Hello! I apologize for the delay! @sayakpaul

If I understand correctly, you're suggesting that further revisions may be needed, referring to #9565 and the comment on #9549.

Would the issue not be resolved with just the addition of torch.autocast(accelerator.device.type, dtype=torch_dtype)?
In my testing, there didn't seem to be any problems with my modification.

Would you mind clarifying if there's something I might have overlooked?

@sayakpaul (Member)

@jeongiin thanks a lot for your contributions! Could you maybe check if #9565 solves the problems this PR is trying to address?


github-actions bot commented Nov 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale label (Issues that haven't received updates) on Nov 5, 2024
@yiyixuxu (Collaborator)

yiyixuxu commented Nov 5, 2024

Should we close this now if it is fixed by #9565? @sayakpaul @linoytsaban

@sayakpaul (Member)

Yes, this can be closed. Sorry @jeongiin for the delay on our side, but we appreciate your willingness to help us.

sayakpaul closed this on Nov 5, 2024
@jeongiin (Contributor, Author)

jeongiin commented Nov 6, 2024

I haven't had enough GPU capacity available lately, so I can't verify this. :(
It's disappointing, but I'll check it out next time I get a chance! Thank you for your help!

Labels: stale (Issues that haven't received updates)
Linked issues that merging may close: Still Issue on flux dreambooth lora training #9237
5 participants