Parallelization

BLAS

TMB uses the following BLAS kernels when calculating the function value and its derivatives:

Function	Gradient
dgemm	dgemm
dsyrk	dsymm
dtrsm	dtrsm
dpotrf	dpotri

If your model spends a significant amount of time in these BLAS operations, you may benefit from an optimized BLAS library, e.g. MKL or OpenBLAS for CPU, or NVBLAS for GPU. For a good result it is critical that:

All required BLAS kernels are part of the library (this may currently not be the case for NVBLAS?).
The library does not add significant overhead for small matrices (OpenBLAS has had problems with this; is that still the case?).
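
One way to check the small-matrix criterion is to time a matrix product at a few small sizes and compare an optimized library against a plain triple loop. The following is a minimal sketch of such a baseline, not part of TMB: the names `naive_gemm`, `seconds_per_call` and `time_naive_gemm` are hypothetical helpers introduced here for illustration, and the code deliberately calls no BLAS library so that it compiles on its own.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Reference product C = A * B for n x n matrices stored column-major
// (the storage order BLAS uses). Plain triple loop, NOT a BLAS call;
// it only serves as a baseline to compare a real dgemm against.
void naive_gemm(std::size_t n, const std::vector<double>& A,
                const std::vector<double>& B, std::vector<double>& C) {
  assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < n; ++i) {
      double s = 0.0;
      for (std::size_t k = 0; k < n; ++k) s += A[i + k * n] * B[k + j * n];
      C[i + j * n] = s;
    }
  }
}

// Average wall-clock seconds per call of f over `reps` repetitions.
template <class F>
double seconds_per_call(F f, int reps) {
  auto t0 = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; ++r) f();
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count() / reps;
}

// Hypothetical helper: time the baseline product for one matrix size.
double time_naive_gemm(std::size_t n, int reps) {
  std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);
  return seconds_per_call([&] { naive_gemm(n, A, B, C); }, reps);
}
```

To compare against an optimized library, one could pass `seconds_per_call` a lambda that calls that library's `dgemm` (for example through the CBLAS interface) and link against the library. If the per-call time for small `n` is far above the baseline, the library is probably adding overhead for small matrices.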
