Fixed the issue on flux dreambooth lora training #9549
Conversation
Looks good to me, just unsure about the commented line with the condition over the last validation (also mentioned in #9476 (comment)).
@sayakpaul if it looks OK to you, we can merge.
Thank you for reviewing, @linoytsaban!! I'm not sure if this will be helpful, but when I used just …
@jeongiin did you check non-NaN of …?
To use autocast successfully with Flux during validation inference, we need to pre-compute the text embeddings because, otherwise, T5-XXL doesn't really work under autocast. See how it's handled in: …
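The referenced example is not preserved above, but the idea is roughly as follows. This is a minimal sketch rather than the actual training-script code: the checkpoint name, bf16 dtype, prompt, and step count are illustrative assumptions. The key point is that `encode_prompt` runs outside the autocast context, while the denoising and decoding run inside it.

```python
import torch
from diffusers import FluxPipeline

device = "cuda"
torch_dtype = torch.bfloat16  # assumption: bf16 mixed precision

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch_dtype
).to(device)

prompt = "a photo of sks dog in a bucket"  # illustrative validation prompt

# 1) Encode the prompt outside of autocast, since the T5-XXL encoder tends to
#    misbehave (e.g. produce NaNs) when run under an autocast context.
with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds, text_ids = pipeline.encode_prompt(
        prompt=prompt, prompt_2=prompt
    )

# 2) Run the rest of the pipeline under autocast, feeding the pre-computed
#    embeddings instead of the raw prompt.
with torch.autocast(device, dtype=torch_dtype):
    image = pipeline(
        prompt_embeds=prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        num_inference_steps=25,
        generator=torch.Generator(device=device).manual_seed(0),
    ).images[0]

image.save("flux_validation.png")
```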
Thank you for the good advice, @icsl-Jeon and @sayakpaul!
@jeongiin there's another adjacent PR here: #9565 cc: @icsl-Jeon
@jeongiin thank you for the changes, but as mentioned in #9549 (comment), we need to handle the autocasting a bit differently. Let me know if anything is unclear.
Hello! I apologize for the delay! @sayakpaul If I understand correctly, you're suggesting that further revisions may be needed, referring to #9565 and the comment on #9549. Would the issue not be resolved with just the addition of …? Would you mind clarifying if there's something I might have overlooked?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Should we close this now if it is fixed by #9565? @sayakpaul @linoytsaban
Yes this can be closed. Sorry @jeongiin for the delay on our side but we appreciate your willingness to help us.
I haven't had enough GPU resources available lately, so I can't verify this. :(
What does this PR do?
Fixes #9548
I resolved the issue by changing the autocast context from `nullcontext()` to `torch.autocast(accelerator.device.type, dtype=torch_dtype)`. This adjustment ensures that the model properly uses the correct mixed-precision mode during validation, preventing the errors I encountered.

I performed the test with dog images as suggested in `README_flux.md`, and I also confirmed that the results were successfully uploaded to `wandb`.
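As a rough, self-contained illustration of the change (not the full validation code from the training script): `torch_dtype` mirrors the script's configured mixed-precision dtype, and the tiny linear layer only stands in for the pipeline call so the snippet runs on its own.

```python
from contextlib import nullcontext

import torch
from accelerate import Accelerator

accelerator = Accelerator()
torch_dtype = torch.bfloat16  # assumption: bf16 mixed precision

# Stand-in for the validation pipeline call.
model = torch.nn.Linear(8, 8).to(accelerator.device)
x = torch.randn(2, 8, device=accelerator.device)

# Previously: validation inference ran without autocast.
autocast_ctx = nullcontext()

# This PR: autocast with the accelerator's device type and the configured dtype.
autocast_ctx = torch.autocast(accelerator.device.type, dtype=torch_dtype)

with autocast_ctx, torch.no_grad():
    out = model(x)

print(out.dtype)  # torch.bfloat16 under autocast; torch.float32 with nullcontext()
```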
Before submitting
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@sayakpaul