Large ops with multiprocess fail with internal error #1231
Comments
Hmm, this has nothing to do with distributed processing, unfortunately. The following also fails with the same error:
I guess since this is somewhat rare, and there isn't much, if anything, we can do in MLX, I will close this for now.
Some possible workarounds: The internal error signifies that the GPU kernel timed out. One way to fix that is to decrease the size of the operations. For example, if you are using LoRA, try
Another possible fix is to decrease the number of operations per command buffer. This may work in some cases but may also slow things down. You can do that like so:
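A minimal sketch of the second workaround, assuming MLX reads an environment variable named `MLX_MAX_OPS_PER_BUFFER` when its Metal backend starts up. The variable name is an assumption here and should be checked against the MLX version in use. The helper names `env_with_ops_limit` and `run_with_ops_limit` are hypothetical and only illustrate setting the limit for a child process.

```python
import os
import subprocess
import sys

# Assumption: MLX's Metal backend honors this variable at startup.
# Verify the name against the MLX version you are running.
OPS_ENV_VAR = "MLX_MAX_OPS_PER_BUFFER"


def env_with_ops_limit(max_ops, base_env=None):
    """Return a copy of the environment with the per-buffer op limit set."""
    if max_ops < 1:
        raise ValueError("max_ops must be a positive integer")
    env = dict(os.environ if base_env is None else base_env)
    env[OPS_ENV_VAR] = str(max_ops)
    return env


def run_with_ops_limit(command, max_ops):
    """Run `command` (a list of arguments) in a child process with the limit applied."""
    return subprocess.run(command, env=env_with_ops_limit(max_ops), check=False)
```

For instance, `run_with_ops_limit([sys.executable, "train.py"], max_ops=10)` would launch a hypothetical `train.py` with the limit set to 10. Setting the variable on the child process, rather than inside an already running script, avoids depending on when the backend reads it.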
Run the following with:
Fails on an M2 Ultra with: