Build openBLAS with oneTBB parallelism support #4066

Closed
vineel96 opened this issue May 28, 2023 · 10 comments

@vineel96

vineel96 commented May 28, 2023

Hello,
I have the following questions:

  1. How can OpenBLAS be built with oneTBB parallelism support instead of OpenMP?
  2. Which combination is better to build: OpenBLAS + OpenMP, OpenBLAS + oneTBB, or OpenBLAS + pthreads?
@vineel96 vineel96 changed the title Build openblas with oneTBB parallelism support Build openBLAS with oneTBB parallelism support May 28, 2023
@martin-frbg
Collaborator

There is currently no support for TBB in OpenBLAS. What people appear to be doing (with varying success, if you search for earlier issues mentioning TBB) is to use a single-threaded build of OpenBLAS with their TBB-parallelized program.
The choice of pthreads or OpenMP depends largely on what your main program uses. At least in theory, OpenMP would offer better thread safety and easier thread affinity handling, but may incur some overhead. On many/most platforms OpenMP is implemented on top of pthreads anyway, and if your main program uses OpenMP you will want OpenBLAS to be either built with OpenMP or built single-threaded, to avoid having two sets of threads that do not know of each other and compete for resources.
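The arrangement described above (the caller owns all parallelism, each BLAS call stays single-threaded) can be sketched in C as follows. This is an illustration under stated assumptions, not OpenBLAS code: `naive_dgemm` is a hypothetical stand-in for a call such as `cblas_dgemm` into an OpenBLAS built single-threaded, `batch_gemm_parallel` is a hypothetical driver, and plain pthreads take the place that `tbb::parallel_for` would occupy in a TBB program, so the example needs neither library.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a single-threaded BLAS call (e.g. cblas_dgemm
 * from a USE_THREAD=0 build): C = A * B for row-major n x n matrices. */
static void naive_dgemm(size_t n, const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t k = 0; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

/* One caller-owned thread handles the slice [first, last) of a batch of
 * independent multiplications stored back to back. */
typedef struct {
    size_t n, first, last;
    const double *A, *B;
    double *C;
} slice_t;

static void *worker(void *p)
{
    slice_t *s = p;
    size_t sz = s->n * s->n;
    for (size_t b = s->first; b < s->last; b++)
        naive_dgemm(s->n, s->A + b * sz, s->B + b * sz, s->C + b * sz);
    return NULL;
}

/* Hypothetical driver: split the batch over nthreads (1..64) threads.
 * Returns 0 on success, -1 on bad arguments or thread-creation failure. */
int batch_gemm_parallel(size_t n, size_t batch, const double *A,
                        const double *B, double *C, size_t nthreads)
{
    pthread_t tid[64];
    slice_t sl[64];
    size_t started = 0;
    int rc = 0;
    if (nthreads == 0 || nthreads > 64) return -1;
    size_t chunk = (batch + nthreads - 1) / nthreads;
    for (size_t t = 0; t < nthreads; t++) {
        size_t first = t * chunk;
        if (first >= batch) break;
        size_t last = first + chunk < batch ? first + chunk : batch;
        sl[t] = (slice_t){n, first, last, A, B, C};
        if (pthread_create(&tid[t], NULL, worker, &sl[t]) != 0) { rc = -1; break; }
        started++;
    }
    for (size_t t = 0; t < started; t++) pthread_join(tid[t], NULL);
    return rc;
}
```

Because each worker calls a single-threaded kernel, the process never runs more than `nthreads` compute threads. If the kernel were itself multithreaded, its BLAS threads would compete with the caller's threads for the same cores, which is the situation the comment warns about.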

@vineel96
Author

Hi @martin-frbg,
Thanks for the reply.

  1. Is there a way to modify the Makefile to build OpenBLAS with oneTBB, similar to how OpenMP is offered as a build option? Or is the only way to manually write oneTBB-based code and use a single-threaded OpenBLAS?
  2. I am trying to build OpenBLAS with oneTBB so that it is algorithm/code independent, meaning that whatever algorithm runs on top of OpenBLAS would make use of oneTBB without the algorithm itself being modified. Is this the right approach, and can we expect any performance gain from it?

@martin-frbg
Collaborator

There is currently no support for this, only an unfinished concept from four years ago in PR #2255.

@brada4
Contributor

brada4 commented May 30, 2023

1/ TBB equivalents of OMP PARALLEL pragmas
2/ TBB mutexes
3/ TBB malloc in place of current sbrk/mmap allocator
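To make the first item concrete, here is a minimal C sketch of a replaceable threading backend: kernels call a `parallel_for` through a function pointer, so a serial loop, pthreads, OpenMP or TBB implementation could be plugged in without touching the kernel. Every name here (`thread_backend`, `serial_for`, `threads_for`, `axpy`) is hypothetical; the design is not taken from PR #2255 and is not an existing OpenBLAS API, and items 2 and 3 (mutexes, memory allocation) would need analogous hooks that are not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Loop body supplied by a kernel: handle indices [first, last). */
typedef void (*range_fn)(size_t first, size_t last, void *ctx);

/* Hypothetical backend interface: kernels only ever call parallel_for. */
typedef struct {
    void (*parallel_for)(size_t n, range_fn body, void *ctx);
} thread_backend;

/* Serial backend: the whole range runs on the calling thread. */
static void serial_for(size_t n, range_fn body, void *ctx) { body(0, n, ctx); }

/* pthreads backend: split [0, n) into at most NT contiguous chunks. */
enum { NT = 4 };
typedef struct { range_fn body; void *ctx; size_t first, last; } job_t;

static void *run_job(void *p)
{
    job_t *j = p;
    j->body(j->first, j->last, j->ctx);
    return NULL;
}

static void threads_for(size_t n, range_fn body, void *ctx)
{
    pthread_t tid[NT];
    job_t job[NT];
    int started[NT] = {0};
    size_t chunk = (n + NT - 1) / NT;
    for (int t = 0; t < NT; t++) {
        size_t first = (size_t)t * chunk;
        size_t last = first + chunk < n ? first + chunk : n;
        if (first >= last) continue;
        job[t] = (job_t){body, ctx, first, last};
        started[t] = pthread_create(&tid[t], NULL, run_job, &job[t]) == 0;
        if (!started[t]) run_job(&job[t]); /* fall back to running inline */
    }
    for (int t = 0; t < NT; t++)
        if (started[t]) pthread_join(tid[t], NULL);
}

const thread_backend SERIAL_BACKEND = {serial_for};
const thread_backend THREADS_BACKEND = {threads_for};

/* Example kernel, y += alpha * x, written without knowing the backend. */
typedef struct { double alpha; const double *x; double *y; } axpy_ctx;

static void axpy_range(size_t first, size_t last, void *p)
{
    axpy_ctx *c = p;
    for (size_t i = first; i < last; i++) c->y[i] += c->alpha * c->x[i];
}

void axpy(const thread_backend *be, size_t n, double alpha,
          const double *x, double *y)
{
    assert(be != NULL && be->parallel_for != NULL);
    axpy_ctx c = {alpha, x, y};
    be->parallel_for(n, axpy_range, &c);
}
```

A TBB backend would implement the same `parallel_for` signature on top of `tbb::parallel_for` in a C++ translation unit; the kernels themselves would not change.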

@vineel96
Author

vineel96 commented Aug 4, 2023

Hi @martin-frbg and @brada4,
Thanks for the comments. I have some questions:

  1. Are the file changes proposed in pull request #2255 (WIP: allow threading backend to be replaced by caller) not enough to merge it into OpenBLAS? Why was it not merged or developed further, if it improves performance?
  2. Can you suggest any room for improvement in the threading part of OpenBLAS that would increase parallelism and speed? For example, is there a GSoC project on this, or is there otherwise scope for improving the threading code?
  3. I intend to improve threading performance for the scikit-learn library, which uses OpenBLAS. Do you have any ideas or leads for improving the threading and parallelism of scikit-learn?

@vineel96 vineel96 closed this as completed Aug 4, 2023
@vineel96 vineel96 reopened this Aug 4, 2023
@martin-frbg
Collaborator

  1. The PR has a list of unfinished tasks at the top, I'd like to see at least the pthreads backend changes implemented (and some testing, of course). Some people expressed interest in the changes, but nobody got around to actually doing anything, not even during the time when it could trivially be merged into a local checkout for testing. (And even now I expect the merge conflicts that are currently flagged by git can easily be resolved again - it is just that I did not get around to that yet).
  2. No GSoC project, and probably nobody with the time to provide the associated mentoring, just plans and ongoing work to identify any remaining bottlenecks, e.g. from excessive locking or from using too many threads for a given task. OpenBLAS does not have a big team of developers behind it, and never had.
  3. I'm not familiar with scikit-learn, but I suspect it is using BLAS functions through NumPy. It would probably help to know which BLAS functions are involved and the "typical" matrix sizes in what people use scikit-learn for, and to get some fair comparison figures for OpenBLAS vs. some other library on the same kind of hardware. So far there has only been Sklearn Performance on ARM NeoverseN1 #3925, which is basically "MKL on high-performance hardware is faster than OpenBLAS on low-performance hardware", with no data that would allow reproducing the reported problem.
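For the kind of fair comparison suggested in point 3, one common convention is to report GEMM throughput in GFLOP/s, counting 2·m·n·k floating-point operations for an (m × k) by (k × n) product, and to time the same sizes on the same machine for each library. A small C sketch of that bookkeeping follows; `gemm_gflops`, `now_seconds` and `best_time` are hypothetical helper names, and the callback is where the call into whichever BLAS is being measured would go.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* GEMM throughput: an (m x k) by (k x n) product performs m*n*k
 * multiply-adds, i.e. 2*m*n*k floating-point operations. */
double gemm_gflops(size_t m, size_t n, size_t k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)m * (double)n * (double)k / seconds / 1e9;
}

/* Wall-clock time in seconds (C11 timespec_get). Not guaranteed monotonic,
 * but unlike clock() it does not add up CPU time over all threads. */
double now_seconds(void)
{
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC) return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Callback performing one m x n x k multiplication with the library under
 * test; ctx carries its matrices. */
typedef void (*gemm_call)(size_t m, size_t n, size_t k, void *ctx);

/* Best (minimum) wall time over `reps` runs; the minimum is less affected
 * by OS noise than the mean. Returns -1.0 if reps <= 0. */
double best_time(gemm_call f, size_t m, size_t n, size_t k, void *ctx, int reps)
{
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        f(m, n, k, ctx);
        double dt = now_seconds() - t0;
        if (best < 0.0 || dt < best) best = dt;
    }
    return best;
}
```

Reporting the best of several repetitions, together with the thread count and matrix sizes used, makes figures from different machines easier to reproduce and compare.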

@vineel96
Author

vineel96 commented Aug 29, 2023

Hello @martin-frbg,
Many thanks for the replies and suggestions.
3. As of now, scikit-learn uses the gemm() function from scipy.linalg.cython_blas, mainly for matrix multiplication (e.g. in k-means). In your view, would knowing the sizes of the most commonly used matrices be useful for optimizing the algorithm further?
4. Another question: is this PR (#2255) equivalent to building OpenBLAS with no threading support at all, i.e. without pthreads and without OpenMP?
5. Is it possible to build a non-threaded version of OpenBLAS from source, i.e. without pthreads and without OpenMP? If yes, what are the build instructions?
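As background to point 3: k-means spends most of its time computing squared distances between samples and centroids, and the expansion ‖x − c‖² = ‖x‖² − 2 x·c + ‖c‖² turns the bulk of that work into one matrix product X·Cᵀ. That is why the GEMM shape (samples × features × clusters) is the size information that matters. The C sketch below shows the decomposition with plain loops; `sq_distances` is a hypothetical name, and this is not a claim about how scikit-learn's own k-means code is organized (it may, for example, process samples in chunks).

```c
#include <assert.h>
#include <stddef.h>

/* Squared distances D (n x k) between samples X (n x d) and centroids
 * C (k x d), all row-major, via ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2. */
void sq_distances(size_t n, size_t k, size_t d, const double *X,
                  const double *C, double *D)
{
    assert(X != NULL && C != NULL && D != NULL);
    /* D = -2 * X * C^T: this triple loop is the GEMM-shaped hot spot. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < k; j++) {
            double dot = 0.0;
            for (size_t t = 0; t < d; t++) dot += X[i * d + t] * C[j * d + t];
            D[i * k + j] = -2.0 * dot;
        }
    /* Add the squared row norms of X: O(n*d) work, negligible by comparison. */
    for (size_t i = 0; i < n; i++) {
        double xx = 0.0;
        for (size_t t = 0; t < d; t++) xx += X[i * d + t] * X[i * d + t];
        for (size_t j = 0; j < k; j++) D[i * k + j] += xx;
    }
    /* Add the squared row norms of C: O(k*d) work. */
    for (size_t j = 0; j < k; j++) {
        double cc = 0.0;
        for (size_t t = 0; t < d; t++) cc += C[j * d + t] * C[j * d + t];
        for (size_t i = 0; i < n; i++) D[i * k + j] += cc;
    }
}
```

With a CBLAS library the triple loop would be replaced by `cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasTrans, n, k, d, -2.0, X, d, C, d, 0.0, D, k)`, leaving only the cheap norm additions in plain code.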

@brada4
Contributor

brada4 commented Aug 29, 2023

Open a new issue, since this is not related to TBB.
The build options are documented in Makefile.rule: build with USE_THREAD=0 USE_LOCKING=1.

@vineel96
Author

vineel96 commented Sep 4, 2023

Hi @brada4,
Thank you for the reply.

@martin-frbg
Collaborator

Now fixed by #4577.
