implement accelerate for osx-arm64 #88
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (…).
(I will test this locally and put the results here in a bit.)

Okay, results: essentially, the build seems to hardcode to one or the other, so in this case it just goes to Accelerate. However, the good news is that it doesn't bomb like before. Note that I am not really sure whether this is actually hardcoding --- it could just be that the … Anyway, IMO this could/should be merged.

Details with Accelerate: …
Details with OpenBLAS: …
Could you add a constraint like: … and run the tests locally? This would somewhat build up a minimal test matrix.
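The constraint snippet itself didn't survive the page scrape; as an illustration only (an assumption, not the snippet that was actually proposed), pinning conda-forge's libblas metapackage build string is the usual way to select a BLAS flavor per environment:

```bash
# Hypothetical sketch: pin the BLAS flavor via the libblas build string,
# then run the test suite in each environment to build a minimal matrix.
conda create -n test-accelerate "libblas=*=*accelerate" pytorch
conda create -n test-openblas "libblas=*=*openblas" pytorch
```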
Yes, but what's the point? We already know what this will lead to: the same exact outcome. The reason: 568f298. If we want to build a matrix, then we need to base it on the CMake flag. Happy to go that route, but I believe it's just too much work for nothing -- this is already a challenging build. I think it is better to just default osx-arm64 to Accelerate for now. But if you want me to go ahead and build a matrix for arm64, I am happy to do so --- again, it would need to be done with a control flow on the CMake flag, I believe.
To clarify: 568f298 removes instructing it to find a specific BLAS. So it goes through its list: MKL, BLIS, Accelerate, and then OpenBLAS (and maybe some others; see https://github.com/pytorch/pytorch/blob/v1.10.2/cmake/Modules/FindBLAS.cmake). So unless we specify that CMake flag, it will just repeat the same process again: MKL, BLIS, Accelerate, OpenBLAS, etc. Note: Accelerate is always part of the macOS SDK, so it will always be discovered before OpenBLAS unless instructed otherwise.
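For readers following along, a hedged sketch of what "specify that CMake flag" could look like, assuming PyTorch's BLAS selection variable in cmake/Dependencies.cmake (values such as MKL, OpenBLAS, vecLib, Generic):

```bash
# Force a specific BLAS instead of letting FindBLAS.cmake walk its
# vendor list (MKL, BLIS, Accelerate, OpenBLAS, ...). PyTorch's setup.py
# forwards the BLAS environment variable to its CMake configure step.
export BLAS=OpenBLAS   # or: vecLib (Accelerate), MKL, Generic, ...
python setup.py install
```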
Don't we want it to therefore find netlib?
Maybe we can force it to find 'generic'?
If this 'generic' trick works, we can implement it for all builds...
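A sketch of the 'generic' trick being discussed (assumptions: PyTorch's Generic BLAS option and conda-forge's netlib-compatible libblas shim; not necessarily the exact flags used in this PR):

```bash
# Bypass vendor detection entirely: tell PyTorch to link a "generic"
# BLAS and point it at conda-forge's libblas shim, whose actual backend
# (netlib, OpenBLAS, Accelerate, ...) can then be swapped at install time.
cmake -DBLAS=Generic \
      -DBLAS_LIBRARIES="${PREFIX}/lib/libblas.dylib" \
      ..
```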
@isuruf does this count as netlib?
Compare this to: …
Trying to find the way isuruf implemented this, I remember seeing …
I am going to revert back to the default since this won't work, I think. I will give it some time for people to weigh in.
Or I can build a matrix... will revisit later.
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR (…). Here's what I've got... For recipe: …
…conda-forge-pinning 2022.02.07.06.12.16
🤷
@hmaarrfk, I thought about this and did a little more looking around. Now I believe this is likely not worth it, so I am going to abandon it for now. The reason is that the benefits of defaulting to Accelerate don't really materialize for PyTorch (at least not yet). They're supposedly working on supporting the new Apple GPUs (Metal), and we can try again then. For now, I just think this shouldn't be a priority. Closing. Happy to revisit if other people really want this...
Interesting findings!
I think as part of their Metal effort they might reorganize things, so we are just better off waiting until then, especially since this is an osx-arm64-only issue/improvement. They have a big central issue upstream with a lot of annoying Apple fans asking for updates incessantly 😆
At the risk of necroposting, this approach in numpy with wrappers is interesting [1]. Note that macOS 13.3 improved netlib compatibility [2]: …
[1] numpy/numpy#24053 (cross links to: …)
A conda environment that includes pytorch forces the use of openblas rather than the up-to-10x-faster macOS 13.3 Accelerate BLAS [1]. If you don't use pytorch in your conda environment, numpy works great with the Accelerate BLAS, but then you end up having two environments: one for pytorch and the GPU, and one for numpy and no GPU. How hard/complex would it be to add pytorch builds specific to macOS 13.3+ that enable the flags to use the Accelerate BLAS? It is a bit unfortunate that Apple ties this important performance improvement to the OS version...
You can already use …
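The inline snippet was lost in scraping; presumably (an assumption based on conda-forge's standard BLAS-switching mechanism) the suggestion was along these lines:

```bash
# Switch the BLAS backing an existing environment to Accelerate by
# pinning the libblas metapackage build string.
conda install "libblas=*=*accelerate"
```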
@isuruf I thought I tried that. I just re-ran this script and got the same output as before my post, with the error below: …
Side note, this is a great read for context on these libraries: https://pypackaging-native.github.io/key-issues/native-dependencies/blas_openmp/
I didn't review the above carefully… but my recollection: we tried this before and things didn't turn out super well. I'd say we need to rebuild more carefully… and we likely need to take care of deps like numpy and scipy as well… @jkleckner if you're interested in helping, please try to follow the logic here and elsewhere and we/I can try to help.
@jkleckner, ah I thought we had #175 merged. Until that PR is merged you can do …
@isuruf Wow, thanks! That got it going. Now numpy runs fast and pytorch still uses the GPU. I used one of these benchmarks [1] to try it out. Hopefully, pytorch's fallback from GPU to CPU will reap the speed benefits. You mention that it is still LAPACK via netlib rather than Accelerate, true? This [2] suggests a 4x difference in speed when netlib is the BLAS, but if netlib LAPACK is using the underlying BLAS then it might not be so big a difference. Those benchmarks [2] don't really exercise LAPACK APIs directly.

[1] https://github.com/lucadiliello/pytorch-apple-silicon-benchmarks
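A generic way to verify which BLAS an environment actually resolved to (standard numpy tooling, not a command from the thread):

```bash
# Print numpy's build/runtime configuration; the output names the BLAS
# implementation (accelerate, openblas, netlib, ...) that numpy is using.
python -c "import numpy; numpy.show_config()"
```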
After your symlink, these are the dylibs (the symlink could be local): …
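The listing itself was lost; commands like these reproduce that kind of inspection (generic macOS/conda tooling, with library filenames that are an assumption about the env layout):

```bash
# List the BLAS/LAPACK shims in the active conda environment ...
ls -l "$CONDA_PREFIX"/lib/libblas*.dylib "$CONDA_PREFIX"/lib/liblapack*.dylib
# ... and show what a shim actually links against (Accelerate vs. openblas):
otool -L "$CONDA_PREFIX"/lib/libblas.3.dylib
```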
I can confirm that the script runs correctly after the merge of #175, with numpy executing fast and the GPU on arm64 working.
This PR only changes the netlib implementation for osx-arm64. Everything else remains the same. Skipping all builds but osx for confirmation.
Fixes #82
Checklist
- Used a personal fork of the feedstock to propose changes
- Bumped the build number (if the version is unchanged)
- Reset the build number to 0 (if the version changed)
- Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)