TFloat float and double coexistence -- working towards that goal #7

Up to now Tesseract used double for training and recognition with "best" models.

This commit replaces double by a new data type TFloat which is double by default, but float if FAST_FLOAT is defined.

Ideally this should allow faster training.

Signed-off-by: Stefan Weil <[email protected]>
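A minimal sketch of the type switch this commit message describes, assuming FAST_FLOAT is supplied by the build (for example as a `-DFAST_FLOAT` compiler flag). Only `TFloat` and `FAST_FLOAT` come from the commit message; `Lerp` and `kTFloatBytes` are hypothetical names added for illustration and are not part of Tesseract.

```cpp
#include <cstddef>

// TFloat is double by default and float when FAST_FLOAT is defined,
// as stated in the commit message.
#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

// Hypothetical helper (not from Tesseract): code written against TFloat
// compiles unchanged for either precision.
constexpr TFloat Lerp(TFloat a, TFloat b, TFloat t) {
  return a + (b - a) * t;
}

// Hypothetical constant: lets other code check the active precision
// at compile time instead of repeating the #ifdef.
constexpr std::size_t kTFloatBytes = sizeof(TFloat);
```

With this shape the precision choice is fixed at compile time, so one binary carries exactly one of the two types.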

Signed-off-by: Stefan Weil <[email protected]>

…vector (8x32) (contrasting 4 double FPs: 4*64)

…6d scale01234567 = _mm256_loadu_ps(scales)`, i.e. loading float vectors into double vector types. Extract from tesseract-ocr#3490.

Signed-off-by: Stefan Weil <[email protected]>

…aster"

This partially reverts commit 122daf1, reversing changes made to 4cd56dc.

This fixes a fatal assertion for certain images:

cell_y_.size() >= 2 && cell_x_.size() >= 2:Error:Assert failed:in file ../../../src/textord/tablerecog.cpp, line 363

Signed-off-by: Stefan Weil <[email protected]>

…ith all other Tesseract-defined types, to prevent collisions with third-party software.

…at got through while I manually extracted the template work from my mainline (warnings due to running MSVC at Level 4)

[sw]: Use different fix for blamer.cpp

Signed-off-by: Stefan Weil <[email protected]>

…g function templates for TFloat float & double implementations to co-exist in the run-time without cluttering the code with #if/#else and no run-time switches (yet).

## Observations thus far

- DRY? Check!
- The whole function template idea (let the C++ compiler do the heavy lifting) stops somewhere. Regrettably, that happens at the weightmatrix.cpp code, which calls the CPU+configuration-selected SIMD implementation via function pointer: `intSimdMatrix->matrixDotVectorFunction`. Supporting both types there would require code duplication of some kind (e.g. an FP32 callback pointer co-existing with an FP64 callback pointer in the struct, with the code picking the right one depending on the current TFloat size) and is thus deemed unsatisfactory (my opinion).
- So far, and very probably independent of any solution for the co-existence issue at higher levels in the code, this template approach works out well, with the compiler smartly picking the implementation matching the current float/double choice.
- While we have double the number of specialized SIMD implementations (obviously), these do not need #if/#else checks, as we can let the C++ compiler do its prototype matching job --> cleaner code.
- The template functions also help clean up the serialization/de-serialization code, as the `<T, ST>` dual-type approach there allows one to specify the run-time type (TFloat) and the file-storage type at the same time. Also note how this cleans up the 'Old' scales deserialization code, as the old file storage is simply 'float' instead of 'double'.
- The added cost there is a double copy of file data when T==ST, but that turned out negligible in the preliminary tests: that bit of code didn't even reach the Top20 CPU Guzzlers Chart, so that extra copy can wait for smarter C++ template writers to take care of when microtuning is called for.
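The two mechanisms described above can be sketched as follows. This is an illustration under stated assumptions, not the code from this pull request: `DotProduct` and `ReadVector` are hypothetical names, the loops are plain scalar code standing in for the SIMD kernels, and the input is a raw byte buffer rather than Tesseract's file abstraction. It shows (1) same-named float and double overloads that the compiler selects by argument type, with no #if/#else, and (2) a `<T, ST>` reader that takes the run-time type and the file-storage type together.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical float kernel; a real build would back this with SIMD code.
inline float DotProduct(const float* u, const float* v, std::size_t n) {
  assert(u != nullptr && v != nullptr);
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i) total += u[i] * v[i];
  return total;
}

// Hypothetical double kernel with the same name; overload resolution
// ("prototype matching") picks it when the arguments are double.
inline double DotProduct(const double* u, const double* v, std::size_t n) {
  assert(u != nullptr && v != nullptr);
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) total += u[i] * v[i];
  return total;
}

// Hypothetical reader: T is the run-time type (e.g. TFloat), ST is the
// type the values were stored as. Returns false if the buffer is too short.
template <typename T, typename ST>
bool ReadVector(const unsigned char* bytes, std::size_t byte_count,
                std::size_t n, std::vector<T>* out) {
  assert(out != nullptr);
  if (byte_count < n * sizeof(ST)) return false;
  std::vector<ST> staged(n);  // values in their storage type
  if (n > 0) std::memcpy(staged.data(), bytes, n * sizeof(ST));
  // Element-wise conversion ST -> T. When T == ST this is the redundant
  // second copy mentioned in the last observation.
  out->assign(staged.begin(), staged.end());
  return true;
}
```

Under these assumptions, reading old float-stored scales into a double run-time vector would be `ReadVector<double, float>(...)`, while the same call site written as `ReadVector<TFloat, float>(...)` would follow whichever precision the build selects.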


Commits on Jul 15, 2021