which blas #85
Comments
Thank you for sharing. These are interesting. It is unclear to me what "accelerate" refers to in the benchmarks. Maybe you can elaborate on what they wrote. Finally, isuruf recommended:
Veclib is from Accelerate, while vortex and zen are from OpenBLAS. However, this is beside the point for now: we would need to show something about PyTorch in particular before moving forward. This is just additional info, as I am trying to better understand this :) Again, my inspiration was my work on mxnet, where they seemed to have a clear hierarchy. I would like to understand whether that is just for packaging purposes (avoiding shipping yet another BLAS) or for actual performance reasons. One thing is really clear, though: MKL is the choice wherever possible, so you are making the perfect call here by forcing everything to use MKL. (Obviously MKL isn't available on osx-arm64, and that is the only narrow case we would be changing.)
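For reference, here is one way to check which BLAS a given PyTorch build is actually linked against. A minimal sketch, assuming a reasonably recent torch; `torch.__config__.show()` dumps the compile-time configuration, including the `BLAS_INFO` entry:

```python
import torch

# Print the compile-time build configuration; the BLAS_INFO line
# reports which BLAS the package was built against.
print(torch.__config__.show())

# True if this build can dispatch to MKL at runtime.
print(torch.backends.mkl.is_available())
```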
Just to expand on @isuruf's comment, I believe I understand where he is coming from on two fronts:
Regarding the first point above, this will require a lot more work and may not be possible after all: I think pytorch requires setting a BLAS explicitly, and it might not work if you just switch back and forth. Again, see #82. So our option would likely be to build with explicit CMake flag settings, I believe. Regarding the second point, that is an important and valid reason to avoid Accelerate altogether.
I'm not a BLAS expert, but compiling with netlib allows for the switch. If you compile against the underlying library directly, you cannot switch; typically you need a bit of trickery. For what it's worth, pytorch has been open to PRs that allow customization like this.
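To make the "switch" concrete: with a netlib-style build, the actual BLAS implementation is resolved when the library is loaded, so you can verify at runtime which one was picked up. A minimal sketch, assuming `numpy` and `threadpoolctl` are installed (threadpoolctl is not part of this thread, just a convenient inspection tool):

```python
import numpy  # noqa: F401 -- importing numpy loads its BLAS backend
from threadpoolctl import threadpool_info

# threadpool_info() lists the native libraries actually loaded;
# internal_api will be e.g. "mkl" or "openblas", depending on which
# implementation the generic BLAS interface resolved to.
for pool in threadpool_info():
    if pool.get("user_api") == "blas":
        print(pool["internal_api"], pool.get("filepath"))
```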
Good call. I will look around and open an issue upstream to ask about it.
Closing this, as we have two other issues addressing exactly the same problem.
I'm going to refer to the comment made in #83 (comment):
I would suggest that, for this kind of optimization, you demonstrate some real gains to be had by moving to Accelerate. Maybe you can provide a benchmark showing the improvement? It is pretty easy to build things yourself and upload them to your own channel; I strongly suggest that instead of waiting too long on the community for your own work anyway.
Once we've established the gains, we can work on integrating it here.
Originally posted by @hmaarrfk in #84 (comment)
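A benchmark of the kind suggested above could be as simple as timing a large matrix multiply under each BLAS build. A rough sketch, not a rigorous methodology; the matrix size (4096) and repetition count (10) are arbitrary choices:

```python
import time
import torch

n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

# Warm up so one-time initialization is not counted.
for _ in range(3):
    a @ b

reps = 10
start = time.perf_counter()
for _ in range(reps):
    a @ b
elapsed = (time.perf_counter() - start) / reps

# A matrix multiply costs 2 * n**3 floating-point operations.
print(f"{n}x{n} gemm: {elapsed * 1e3:.1f} ms/iter, "
      f"{2 * n**3 / elapsed / 1e9:.1f} GFLOP/s")
```

Running the same script against builds linked to OpenBLAS, MKL, and Accelerate would give a first-order comparison.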