Choose number of OpenBLAS threads based on process affinity #55572
Concrete use case where this will be helpful: HPC. For an MPI application, for example, SLURM automatically sets the affinity for each MPI rank (Julia process). Currently, we are oversubscribing cores as demonstrated above (as @PetrKryslUCSD can tell, because he ran into this issue). If we respected the affinity mask, that'd be much better.
Interestingly, Taka had mentioned the BLAS case as well. He found that OpenBLAS respected the affinity and did the "right thing". Clearly that's not the case anymore, at least not for my test above (Julia 1.10.4). So it seems like either Julia or OpenBLAS has regressed here.
Related issue: #46226, where it's suggested to use libuv to get the number of available CPUs.
Related indeed, but note that #46226 talks about Julia threads, not BLAS threads.
Sure, but my point is that we can replace julia/stdlib/LinearAlgebra/src/LinearAlgebra.jl, lines 845 to 849 in 3d20a92
This also works with MPI (I'm using OpenMPI here):
$ mpirun -np 6 --map-by node:PE=8 ./julia -e 'println(@ccall uv_available_parallelism()::Cint)'
8
8
8
8
8
8
$ mpirun -np 4 --map-by node:PE=12 ./julia -e 'println(@ccall uv_available_parallelism()::Cint)'
12
12
12
12
$ mpirun -np 2 --map-by node:PE=24 ./julia -e 'println(@ccall uv_available_parallelism()::Cint)'
24
24
And this using Slurm:
[mosgiordano@fj-debug2 bin]$ srun -N 1 -n 6 -c 8 -t 04:00:00 -p short --pty bash
[mosgiordano@fj037 bin]$ srun ./julia -e 'println(@ccall uv_available_parallelism()::Cint)'
8
8
8
8
8
8
[mosgiordano@fj037 bin]$ exit
[mosgiordano@fj-debug2 bin]$ srun -N 1 -n 4 -c 12 -t 04:00:00 -p short --pty bash
[mosgiordano@fj037 bin]$ srun ./julia -e 'println(@ccall uv_available_parallelism()::Cint)'
12
12
12
12
[mosgiordano@fj037 bin]$ exit
[mosgiordano@fj-debug2 bin]$ srun -N 1 -n 2 -c 24 -t 04:00:00 -p short --pty bash
[mosgiordano@fj037 bin]$ srun ./julia -e 'println(@ccall uv_available_parallelism()::Cint)'
24
24
Similar to #42340 we should probably also consider the affinity of the Julia process when deciding how many BLAS threads we spawn by default. Currently, we don't:
In the latter case, despite the fact that our process is restricted to 2 hardware threads, we spawn 16 BLAS threads. That never seems to be a good choice.
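Until the default changes, one possible user-side workaround is to cap the BLAS thread count at the number of CPUs the process is actually allowed to use. The following is only a sketch, not the fix proposed for LinearAlgebra itself: `available_cpus` and `cap_blas_threads!` are hypothetical helper names, and it assumes a Julia build whose bundled libuv exports `uv_available_parallelism`, as in the `@ccall` commands shown in the comments.

```julia
using LinearAlgebra

# Number of CPUs this process may run on, as reported by libuv.
# Assumption: the bundled libuv exports `uv_available_parallelism`,
# whose C return type is `unsigned int` (hence `Cuint`).
available_cpus() = Int(@ccall uv_available_parallelism()::Cuint)

# Hypothetical helper: never run more BLAS threads than the affinity mask allows.
function cap_blas_threads!()
    n = max(1, min(BLAS.get_num_threads(), available_cpus()))
    BLAS.set_num_threads(n)
    return n
end

println("BLAS threads after capping: ", cap_blas_threads!())
```

Calling `cap_blas_threads!()` early in a script launched under `taskset`, `srun -c N`, or `mpirun --map-by node:PE=N` should keep each process from spawning more BLAS threads than it has hardware threads available.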