Tfloat patch 4: bugfixes for AVX2 FAST_FLOAT Extract8+16 implementations #3494

GerHobbelt · 2021-07-13T07:58:48Z

Extracted from #3490: bug fixes for the AVX2 Extract8+16 code, which contains lines like `__m256d scale01234567 = _mm256_loadu_ps(scales)`, i.e. code that loads float vectors into double vector types.
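A 256-bit AVX register holds eight floats (`__m256`, loaded with `_mm256_loadu_ps`) but only four doubles (`__m256d`, loaded with `_mm256_loadu_pd`). Scaling eight lanes therefore takes one register in a float build and two in a double build, and the load intrinsic must match the vector type. The following is a minimal sketch, not Tesseract's ExtractResults8/16 code. The names `ScaleEight*` and `HAVE_X86` are hypothetical. It assumes GCC or Clang (for `__attribute__((target))` and `__builtin_cpu_supports`) and falls back to a scalar loop elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1  // hypothetical helper macro for this sketch
#endif

// Scalar reference: out[i] = acc[i] * scales[i] for 8 lanes.
template <typename T>
static void ScaleEightScalar(const int32_t *acc, const T *scales, T *out) {
  for (int i = 0; i < 8; ++i) out[i] = static_cast<T>(acc[i]) * scales[i];
}

#ifdef HAVE_X86
// double: 8 lanes need two __m256d registers, each filled by
// _mm256_loadu_pd (4 doubles). _mm256_loadu_ps would yield an __m256.
__attribute__((target("avx")))
static void ScaleEightAvx(const int32_t *acc, const double *scales, double *out) {
  assert(acc != nullptr && scales != nullptr && out != nullptr);
  __m128i a_lo = _mm_loadu_si128(reinterpret_cast<const __m128i *>(acc));
  __m128i a_hi = _mm_loadu_si128(reinterpret_cast<const __m128i *>(acc + 4));
  __m256d s_lo = _mm256_loadu_pd(scales);
  __m256d s_hi = _mm256_loadu_pd(scales + 4);
  _mm256_storeu_pd(out, _mm256_mul_pd(_mm256_cvtepi32_pd(a_lo), s_lo));
  _mm256_storeu_pd(out + 4, _mm256_mul_pd(_mm256_cvtepi32_pd(a_hi), s_hi));
}

// float: all 8 lanes fit in one __m256 loaded with _mm256_loadu_ps.
__attribute__((target("avx")))
static void ScaleEightAvx(const int32_t *acc, const float *scales, float *out) {
  __m256i a = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(acc));
  __m256 s = _mm256_loadu_ps(scales);
  _mm256_storeu_ps(out, _mm256_mul_ps(_mm256_cvtepi32_ps(a), s));
}
#endif

// Dispatch: use the AVX overload matching T if the CPU supports AVX.
template <typename T>
static void ScaleEight(const int32_t *acc, const T *scales, T *out) {
#ifdef HAVE_X86
  if (__builtin_cpu_supports("avx")) {
    ScaleEightAvx(acc, scales, out);
    return;
  }
#endif
  ScaleEightScalar(acc, scales, out);
}
```

Calling `ScaleEight` with `double` arrays selects the two-register path, and with `float` arrays the one-register path; overload resolution keeps the vector type and the load intrinsic in agreement.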

Note: the next pull request is a reduced version of this one, with less code duplication, for the bleeding-edge tfloat branch.

Up to now, Tesseract used double for training and for recognition with "best" models. This commit replaces double with a new data type, TFloat, which is double by default but float if FAST_FLOAT is defined. Ideally, this should allow faster training.

Signed-off-by: Stefan Weil <[email protected]>
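The switch described in this commit message can be sketched as a compile-time type alias. This is a minimal sketch assuming the choice is made purely by the `FAST_FLOAT` preprocessor macro; `DotProductSketch` is a hypothetical helper, not one of Tesseract's DotProduct functions.

```cpp
#include <cassert>

// TFloat is double by default and float when FAST_FLOAT is defined,
// e.g. when compiling with -DFAST_FLOAT.
#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

// Hypothetical helper: code written against TFloat compiles unchanged
// for both precisions, so only the build flag decides the width.
static TFloat DotProductSketch(const TFloat *a, const TFloat *b, int n) {
  assert(n >= 0);
  TFloat total = 0;
  for (int i = 0; i < n; ++i) total += a[i] * b[i];
  return total;
}
```

With `-DFAST_FLOAT`, each value takes 4 bytes instead of 8, so a 256-bit SIMD register holds 8 lanes instead of 4, at the cost of precision.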


egorpugin · 2021-07-13T08:03:50Z

Hi,

You are doing patches wrong.

…ation: for TFloat to work, we don't need to duplicate the integer work functions, as only the ExtractResults8 and ExtractResults16 functions need different implementations for float and double. These functions are therefore common to both implementations:

```
static void PartialMatrixDotVector64(const int8_t *wi, const TFloat *scales, const int8_t *u, int num_in, TFloat *v);
static void PartialMatrixDotVector32(const int8_t *wi, const TFloat *scales, const int8_t *u, int num_in, TFloat *v);
static void PartialMatrixDotVector16(const int8_t *wi, const TFloat *scales, const int8_t *u, int num_in, TFloat *v);
static inline void PartialMatrixDotVector8(const int8_t *wi, const TFloat *scales, const int8_t *u, int num_in, TFloat *v);
static void matrixDotVector(int dim1, int dim2, const int8_t *wi, const TFloat *scales, const int8_t *u, TFloat *v);
```
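A scalar sketch of the split described above: the integer dot product works only on `int8_t` inputs and an `int32_t` accumulator, so it is the same for both precisions, while the final scaling into `TFloat` is the only precision-dependent step (in the AVX2 code, that is where the float and double extraction variants differ). All names here (`IntDotProduct`, `ExtractScaled`, `MatrixDotVectorSketch`) are hypothetical, and the sketch omits any bias handling and SIMD blocking.

```cpp
#include <cassert>
#include <cstdint>

#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

// Integer workhorse: independent of TFloat, since it only touches
// int8_t inputs and an int32_t accumulator.
static int32_t IntDotProduct(const int8_t *w, const int8_t *u, int num_in) {
  int32_t total = 0;
  for (int k = 0; k < num_in; ++k) total += int32_t(w[k]) * int32_t(u[k]);
  return total;
}

// Extraction step: the only place where the width of TFloat matters.
static inline TFloat ExtractScaled(int32_t acc, TFloat scale) {
  return static_cast<TFloat>(acc) * scale;
}

// Hypothetical row-major matrix-vector product: v[r] = (wi[r] . u) * scales[r].
static void MatrixDotVectorSketch(int rows, int cols, const int8_t *wi,
                                  const TFloat *scales, const int8_t *u,
                                  TFloat *v) {
  assert(rows >= 0 && cols >= 0);
  for (int r = 0; r < rows; ++r) {
    v[r] = ExtractScaled(IntDotProduct(wi + r * cols, u, cols), scales[r]);
  }
}
```

Defining `FAST_FLOAT` changes only `TFloat` and therefore `ExtractScaled` and the signatures that mention `TFloat`; `IntDotProduct` is untouched, which mirrors the argument for not duplicating the integer work functions.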

GerHobbelt · 2021-07-13T08:09:38Z

Note: #3495 is this PR (#3494) plus a change that applies the FAST_FLOAT condition only to the ExtractXYZ calls, as the other functions work as they are once their prototypes are changed from double to TFloat. Hence, compared with this PR, #3495 only moves code and contains no code changes. (I don't know which diff tools you use, but this PR (#3494) should be easier to diff and review; you can then verify that #3495 is only cut-and-paste work, which results in a much larger diff.)

GerHobbelt · 2021-07-13T08:26:38Z

> Hi,
>
> You are doing patches wrong.

Crap. Yep, seen it. 😊

Discard. Will re-issue.

GerHobbelt · 2021-07-13T08:30:35Z

🤔 I used the GitHub link and didn't look carefully enough to notice that the bugger targeted mainline master instead of stweil/tfloat. I checked against my own visual commit graph and these commits were correct, but submitting them against master was definitely wrong. Ugh.

GerHobbelt · 2021-07-13T08:55:48Z

Re-issued as stweil#4.

stweil and others added 11 commits July 13, 2021 07:18

Fix some compiler warnings

c64ab2e

Signed-off-by: Stefan Weil <[email protected]>

Optimize DotProductStdInnerProduct for float

78871a9

Signed-off-by: Stefan Weil <[email protected]>

Avoid double / float conversion

1b9e462

Signed-off-by: Stefan Weil <[email protected]>

Implement TFloat for IntSimdMatrix

93e9022

Signed-off-by: Stefan Weil <[email protected]>

Test more implementations of DotProduct

00e4283

Signed-off-by: Stefan Weil <[email protected]>

Add unittest for dotproduct

e2529dd

Signed-off-by: Stefan Weil <[email protected]>

Support Apple Accelerate framework for training and best models

01ae69e

Signed-off-by: Stefan Weil <[email protected]>

Fix TFloat builds for Apple M1

a09531a

Signed-off-by: Stefan Weil <[email protected]>

Fix DotProductNative for TFloat

1a59b6f

Signed-off-by: Stefan Weil <[email protected]>

bugfixing the AVX2 Extract8+16 codes, where there's lines like `__m256d scale01234567 = _mm256_loadu_ps(scales)`, i.e. loading float vectors into double vector types. Extract from tesseract-ocr#3490.

ba85ac4

GerHobbelt mentioned this pull request Jul 13, 2021

Improved #3494: AVX2 bugfixes + no code duplication for the integer workhorses in there #3495

Closed

GerHobbelt closed this Jul 13, 2021

GerHobbelt mentioned this pull request Jul 13, 2021

Improved #4 / 3494: AVX2 bugfixes + no code duplication for the integer workhorses in there stweil/tesseract#5

Closed
