Full fp16 training causes some instability problems.
If the GPU supports bf16, using full bf16 is the better option (bf16 weights + bf16 AMP + bf16 gradients).
The only problem is that users who want a bitsandbytes optimizer will need the newest version of bitsandbytes to use full bf16 training.
That is bad news for Windows users (or we would need to compile the newest bitsandbytes for Windows, which could be tricky).
So this is just a small start on a useful feature, and it needs more work (for example, checking whether the GPU supports bf16, whether the installed bitsandbytes supports bf16 gradients, ...).
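A minimal sketch of the two follow-up checks mentioned above, not part of this PR. It assumes PyTorch's `torch.cuda.is_bf16_supported()` for the GPU check and reads the installed bitsandbytes version from package metadata. The helper names and `MIN_BNB_VERSION` are hypothetical; the PR does not say which bitsandbytes release first handles bf16 gradients, so the threshold is a placeholder to fill in once verified.

```python
import importlib.metadata
import re

# Hypothetical placeholder: set this to the first bitsandbytes release
# that you have verified works with bf16 gradients.
MIN_BNB_VERSION = (0, 0, 0)


def parse_version(text):
    """Turn '0.41.1' or '0.41.1.post2' into a comparable 3-tuple of ints."""
    nums = [int(n) for n in re.findall(r"\d+", text)[:3]]
    nums += [0] * (3 - len(nums))  # pad short versions such as '0.41'
    return tuple(nums)


def gpu_supports_bf16():
    """True only if PyTorch is installed and reports bf16 support on CUDA."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available() and torch.cuda.is_bf16_supported()


def bnb_supports_bf16_grad(min_version=MIN_BNB_VERSION):
    """True if bitsandbytes is installed and at least min_version."""
    try:
        installed = importlib.metadata.version("bitsandbytes")
    except importlib.metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= min_version


def can_use_full_bf16(use_bnb_optimizer):
    """Combine both checks; the bitsandbytes check only matters if it is used."""
    if not gpu_supports_bf16():
        return False
    return bnb_supports_bf16_grad() if use_bnb_optimizer else True
```

A training script could call `can_use_full_bf16(use_bnb_optimizer=True)` before enabling the full bf16 option and fall back to mixed precision when it returns `False`.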