Fastor V0.5.1
Although tagged as a minor release, Fastor V0.5.1 includes major changes, especially in API design, performance and stability.
- `SIMDVector` has been reworked to fix the long-standing issue with fall-back to non-SIMD code for non-64-bit types. The fall-back is now always to the correct scalar type where a scalar specialisation is available, i.e. `float`, `double`, `int32_t`, `int64_t`, and to a fixed array of size 1 holding the type for other cases. The API is now a lot closer to `Vc` and `std::experimental::simd`. `SIMDVector` for floating points is now also activated at `SSE2` level, allowing any compiler that automatically defines `SSE2` without `-march=native` to vectorise Fastor's code, since all compilers these days define `__SSE2__` at `-O2/-O3` levels
- Fix a long-standing bug in network tensor contraction. Rework opmin_meta/cost models to be truly compile-time recursive in terms of depth-first search. Strided contractions have been removed completely for networks and deactivated for pairs. Tensor contraction of networks now dispatches to by-pair `einsum`, which has many specialisations, including dispatching to matmul. More than an order of magnitude performance gain in certain cases.
- Extremely fast `matmul/gemm` routines. Fastor now provides potentially the fastest `gemm` routine for small to medium sized tensors of single and double precision as far as static dispatch is concerned. Benchmarks have been added here. Many flavours of matmul implementations are now available, for different sizes and with remainder handling and mask loading/storing. `AVX512` support for single and double floats
- Better macro handling through a series of new `FASTOR_...` macros
- Accurate `timeit` function based on `rdtsc` together with memory clobber and serialisation for further accuracy
- Fastor is now Windows compatible. The whole test suite runs and passes on MSVC 2019
- Quite a few bugs and compiler warnings have been fixed along the way
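
The `SIMDVector` fall-back scheme from the first item can be pictured with a small sketch. The names below (`simd_sketch`, `sum`) are hypothetical and chosen only for illustration; this is not Fastor's actual `SIMDVector` code. It assumes a GCC- or Clang-style compiler that defines `__SSE2__`: the primary template is the size-1 array fall-back, and a `double` specialisation is switched on whenever `__SSE2__` is defined.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical wrapper for illustration; this is not Fastor's SIMDVector.
// Primary template: fall-back to a fixed array of size 1 holding the type.
template <typename T>
struct simd_sketch {
    static constexpr std::size_t size = 1;
    T value[1];

    explicit simd_sketch(T v) : value{v} {}
    simd_sketch operator+(const simd_sketch& o) const {
        return simd_sketch(static_cast<T>(value[0] + o.value[0]));
    }
    // With a single lane the horizontal sum is just the stored value.
    T sum() const { return value[0]; }
};

#if defined(__SSE2__)
// Specialisation enabled as soon as the compiler defines __SSE2__,
// without requiring -march=native: two doubles per 128-bit register.
template <>
struct simd_sketch<double> {
    static constexpr std::size_t size = 2;
    __m128d value;

    explicit simd_sketch(double v) : value(_mm_set1_pd(v)) {}  // broadcast
    explicit simd_sketch(__m128d v) : value(v) {}
    simd_sketch operator+(const simd_sketch& o) const {
        return simd_sketch(_mm_add_pd(value, o.value));
    }
    // Horizontal sum of both lanes.
    double sum() const {
        double lanes[2];
        _mm_storeu_pd(lanes, value);
        return lanes[0] + lanes[1];
    }
};
#endif
```

With this layout, code written against `simd_sketch<T>` compiles for any arithmetic `T`; only the number of lanes changes with the type and the instruction set the compiler advertises.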
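
To give a feel for what "static dispatch" means for small matrix products, here is a minimal sketch in plain C++. The names `matmul_impl` and `matmul_sketch` are hypothetical, the storage is assumed to be row-major, and nothing here reproduces Fastor's kernels (there is no explicit vectorisation, remainder handling or masking). The point is only that the sizes are template parameters, so the choice of kernel is made at compile time.

```cpp
#include <array>
#include <cstddef>

// Hypothetical illustration; not Fastor's matmul. Row-major storage.
// Generic kernel: loop bounds are compile-time constants.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
struct matmul_impl {
    static void run(const T* a, const T* b, T* c) {
        for (std::size_t i = 0; i < M; ++i)
            for (std::size_t k = 0; k < K; ++k)
                for (std::size_t j = 0; j < N; ++j)
                    c[i * N + j] += a[i * K + k] * b[k * N + j];
    }
};

// Specialised kernel selected at compile time for the 2x2 case,
// written out fully unrolled.
template <typename T>
struct matmul_impl<T, 2, 2, 2> {
    static void run(const T* a, const T* b, T* c) {
        c[0] = a[0] * b[0] + a[1] * b[2];
        c[1] = a[0] * b[1] + a[1] * b[3];
        c[2] = a[2] * b[0] + a[3] * b[2];
        c[3] = a[2] * b[1] + a[3] * b[3];
    }
};

// Front end: the sizes must be given explicitly, because M, K and N
// cannot be deduced from the flattened array lengths.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
std::array<T, M * N> matmul_sketch(const std::array<T, M * K>& a,
                                   const std::array<T, K * N>& b) {
    std::array<T, M * N> c{};  // zero-initialised accumulator
    matmul_impl<T, M, K, N>::run(a.data(), b.data(), c.data());
    return c;
}
```

A call such as `matmul_sketch<double, 2, 3, 2>(a, b)` uses the generic kernel, while `matmul_sketch<double, 2, 2, 2>(a, b)` resolves to the unrolled one with no run-time branching.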
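
The idea behind a timer built on `rdtsc` with a memory clobber and serialisation can be sketched as follows. This is an assumption-laden illustration, not Fastor's `timeit`: the helper names (`read_counter`, `clobber_memory`, `mean_ticks`) are hypothetical, the intrinsics path assumes GCC or Clang on x86 with SSE2, `lfence` is used here as the serialising fence, and other platforms fall back to `std::chrono::steady_clock`.

```cpp
#include <chrono>
#include <cstdint>
#if (defined(__GNUC__) || defined(__clang__)) && defined(__SSE2__)
#include <x86intrin.h>  // __rdtsc, _mm_lfence
#define SKETCH_HAS_RDTSC 1
#endif

// Hypothetical helpers for illustration; not Fastor's timeit.
inline std::uint64_t read_counter() {
#if SKETCH_HAS_RDTSC
    _mm_lfence();                 // let earlier instructions finish first
    std::uint64_t t = __rdtsc();  // read the time-stamp counter
    _mm_lfence();                 // keep later instructions from starting early
    return t;
#else
    // Portable fall-back: clock ticks instead of CPU cycles.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Compiler barrier: tells the optimiser that memory may have changed,
// so stores made by the timed code are not moved across this point.
inline void clobber_memory() {
#if defined(__GNUC__) || defined(__clang__)
    asm volatile("" ::: "memory");
#endif
}

// Runs f() `runs` times and returns the mean tick count per call.
template <typename F>
double mean_ticks(F&& f, int runs = 1000) {
    std::uint64_t total = 0;
    for (int i = 0; i < runs; ++i) {
        const std::uint64_t t0 = read_counter();
        f();
        clobber_memory();
        const std::uint64_t t1 = read_counter();
        total += t1 - t0;
    }
    return static_cast<double>(total) / runs;
}
```

Averaging over many runs smooths out interrupts and frequency changes; the result is in counter ticks, which only correspond to wall-clock time if the counter frequency is known.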