TFloat (FAST_FLOAT) work done & slightly different idea used to make code easily switchable between double & float #3490

GerHobbelt · 2021-07-11T13:51:59Z

@stweil : saw your work in the TFloat branch.

This pullreq is FYI only (it's an unadulterated copy of my tesseract fork's TFloat branch and thus has way too many commit diffs to be mergeable) -- I'll ready a decent pullreq tonight or tomorrow, but I wanted to send a heads up so you can decide whether you like this or not.

The key idea here with FAST_FLOAT is to use template<T, ST> functions for Serialize and DeSerialize, so we don't have code cluttered with lots of #ifdef/ifndef/... to make it happen.

Problem being the tesseract data files: those carry data in double format (or float for 'old' scales[]), while the run-time type is dictated by the new TFloat type you came up with: that one is either float or double, depending on a compile-time define. (FAST_FLOAT)
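As a minimal sketch of that compile-time switch, assuming the alias is defined roughly like this (the exact header and spelling in the TFloat branch may differ):

```cpp
#include <cassert>
#include <type_traits>

// Assumption: FAST_FLOAT is passed in by the build system (e.g. -DFAST_FLOAT).
#ifdef FAST_FLOAT
using TFloat = float;   // smaller, faster run-time type
#else
using TFloat = double;  // default: same type as the data files store
#endif

// Whatever the build picks, the rest of the code only ever sees TFloat.
static_assert(std::is_floating_point<TFloat>::value,
              "TFloat must be float or double");
```

The data files do not follow this switch, which is why the conversion has to happen somewhere between memory and disk.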

The idea implemented here is to have the Serialize and DeSerialize functions (in TFile and elsewhere) do any necessary conversions between the run-time type and file-storage (persisted) type.

See for example this bit of code from serialis.h:

  // Serialize data.
  bool Serialize(const std::string &data);
  bool Serialize(const std::vector<char> &data);
  template <typename T>
  bool Serialize(const T *data, size_t count = 1) {
    return FWrite(data, sizeof(T), count) == static_cast<int>(count);
  }
  template <typename T, typename ST>
  bool Serialize(const T *data, size_t count = 1) {
    ST *arr = new ST[count];
    for (size_t i = 0; i < count; i++) {
      arr[i] = data[i];
    }
    bool rv = (FWrite(&arr[0], sizeof(ST), count) == static_cast<int>(count));
    delete[] arr;
    return rv;
  }
  template <typename T, typename ST>
  bool Serialize(const std::vector<T> &data) {
    std::vector<ST> arr;
    size_t len = data.size();
    arr.resize(len);
    for (size_t i = 0; i < len; i++) {
      arr[i] = data[i];
    }
    bool rv = Serialize(arr);
    return rv;
  }
  template <typename T>
  bool Serialize(const std::vector<T> &data) {
    // Serialize number of elements first.
    uint32_t size = data.size();
...

which now has the extra Serialize() members:

  template <typename T, typename ST>
  bool Serialize(const T *data, size_t count = 1)

and

  template <typename T, typename ST>
  bool Serialize(const std::vector<T>& data)

where T represents the run-time type (TFloat) and ST represents the storage type (double for new data files; float for old scales[] data).
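The reading side works the same way in reverse. Here is a rough sketch of the `<T, ST>` pair, using a hypothetical in-memory `MemFile` class as a stand-in for TFile (the class, its `FRead`/`FWrite` signatures and `size()` are assumptions made for illustration, not the actual serialis.h code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for TFile: a byte buffer with a read cursor.
class MemFile {
 public:
  // Append `count` raw elements of `size` bytes; returns elements written.
  size_t FWrite(const void *src, size_t size, size_t count) {
    const char *p = static_cast<const char *>(src);
    bytes_.insert(bytes_.end(), p, p + size * count);
    return count;
  }
  // Copy out up to `count` whole elements; returns elements actually read.
  size_t FRead(void *dst, size_t size, size_t count) {
    size_t avail = (bytes_.size() - pos_) / size;
    size_t n = count < avail ? count : avail;
    if (n > 0) std::memcpy(dst, bytes_.data() + pos_, n * size);
    pos_ += n * size;
    return n;
  }
  // T: run-time type, ST: storage type. Convert T -> ST, then write.
  template <typename T, typename ST>
  bool Serialize(const T *data, size_t count = 1) {
    std::vector<ST> tmp(data, data + count);
    return FWrite(tmp.data(), sizeof(ST), count) == count;
  }
  // Read ST elements from storage, then convert ST -> T.
  template <typename T, typename ST>
  bool DeSerialize(T *data, size_t count = 1) {
    std::vector<ST> tmp(count);
    if (FRead(tmp.data(), sizeof(ST), count) != count) return false;
    for (size_t i = 0; i < count; i++) data[i] = static_cast<T>(tmp[i]);
    return true;
  }
  size_t size() const { return bytes_.size(); }

 private:
  std::vector<char> bytes_;
  size_t pos_ = 0;
};
```

With this, a FAST_FLOAT build would call `Serialize<float, double>` and a regular build `DeSerialize<double, double>` on the same bytes: the storage type is fixed at the call site, whatever TFloat happens to be.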

Then the code can easily dictate what the output to disk is going to be and thus be cross-compatible with other builds, which have their FAST_FLOAT /un/defined, e.g. this snippet from weightmatrix.cpp:

bool WeightMatrix::Serialize(bool training, TFile *fp) const {
  // For backward compatibility, add kDoubleFlag to mode to indicate the doubles
  // format, without errs, so we can detect and read old format weight matrices.
  uint8_t mode = (int_mode_ ? kInt8Flag : 0) | (use_adam_ ? kAdamFlag : 0) | kDoubleFlag;
  if (!fp->Serialize(&mode)) {
    return false;
  }
  if (int_mode_) {
    if (!wi_.Serialize<int8_t>(fp)) {
      return false;
    }
    // The scales stored in memory have an extra factor applied to them
    // to allow faster operation. We have to remove that factor here
    // before writing to disc.
    auto scales = scales_;
    for (auto &scale : scales) {
      scale *= INT8_MAX;
    }
    uint32_t size = scales.size();
    if (!fp->Serialize(&size)) {
      return false;
    }
    if (!fp->Serialize<TFloat, double>(&scales[0], size)) {
      return false;
    }
  } else {
    if (!wf_.Serialize<double>(fp)) {
      return false;
    }
    if (training) {
      if (!updates_.Serialize<double>(fp)) {
        return false;
      }
      if (use_adam_ && !dw_sq_sum_.Serialize<double>(fp)) {
        return false;
      }
    }
  }
  return true;
}

Note the <double> (and, for the old scales[] data, <float>) usage in the function calls there: these template arguments now dictate the output format, at the place in the code where this is relevant, so the code remains easy to read: the file format can be read off the code lines without any trouble.

I hope you like it. The commit list attached to this pullreq is a mess (a cleaned-up pullreq will follow next day): disregard anything but the last couple of commits, where this work was done, including work on AVX/SSE code to cope with the new float vs double TFloat approach:

SHA-1: 31de23d = Continued work on SHA-1: d397065 --> Part 2
SHA-1: d397065 = Part 1: redesigned the TFloat approach using templates for the Serialization and Deserialization methods. Tested Deserialization with double (i.e. standard, non-optimized) layout: run-time type == storage type.

There's other work in there too, which will be filed in a separate pullreq as it's only sideways related:

SHA-1: e0a9b7c..21d5cbb

# Conflicts: # dll/i686-w64-mingw32/iconv.dll # dll/i686-w64-mingw32/icudt64.dll # dll/i686-w64-mingw32/icuin64.dll # dll/i686-w64-mingw32/icuuc64.dll # dll/i686-w64-mingw32/libarchive-13.dll # dll/i686-w64-mingw32/libbz2-1.dll # dll/i686-w64-mingw32/libcairo-2.dll # dll/i686-w64-mingw32/libcurl-4.dll # dll/i686-w64-mingw32/libeay32.dll # dll/i686-w64-mingw32/libexpat-1.dll # dll/i686-w64-mingw32/libffi-6.dll # dll/i686-w64-mingw32/libfontconfig-1.dll # dll/i686-w64-mingw32/libfreetype-6.dll # dll/i686-w64-mingw32/libgcc_s_sjlj-1.dll # dll/i686-w64-mingw32/libgif-7.dll # dll/i686-w64-mingw32/libglib-2.0-0.dll # dll/i686-w64-mingw32/libgobject-2.0-0.dll # dll/i686-w64-mingw32/libgomp-1.dll # dll/i686-w64-mingw32/libharfbuzz-0.dll # dll/i686-w64-mingw32/libintl-8.dll # dll/i686-w64-mingw32/libjbig-2.dll # dll/i686-w64-mingw32/libjpeg-8.dll # dll/i686-w64-mingw32/liblept-5.dll # dll/i686-w64-mingw32/liblz4-1.dll # dll/i686-w64-mingw32/liblzma-5.dll # dll/i686-w64-mingw32/liblzo2-2.dll # dll/i686-w64-mingw32/libnettle-6.dll # dll/i686-w64-mingw32/libnghttp2-14.dll # dll/i686-w64-mingw32/libopenjp2.dll # dll/i686-w64-mingw32/libpango-1.0-0.dll # dll/i686-w64-mingw32/libpangocairo-1.0-0.dll # dll/i686-w64-mingw32/libpangoft2-1.0-0.dll # dll/i686-w64-mingw32/libpangowin32-1.0-0.dll # dll/i686-w64-mingw32/libpcre-1.dll # dll/i686-w64-mingw32/libpixman-1-0.dll # dll/i686-w64-mingw32/libpng16-16.dll # dll/i686-w64-mingw32/libssh2-1.dll # dll/i686-w64-mingw32/libstdc++-6.dll # dll/i686-w64-mingw32/libtiff-5.dll # dll/i686-w64-mingw32/libwebp-7.dll # dll/i686-w64-mingw32/libwinpthread-1.dll # dll/i686-w64-mingw32/libxml2-2.dll # dll/i686-w64-mingw32/libzstd-1.dll # dll/i686-w64-mingw32/ssleay32.dll # dll/i686-w64-mingw32/zlib1.dll # dll/x86_64-w64-mingw32/iconv.dll # dll/x86_64-w64-mingw32/icudt64.dll # dll/x86_64-w64-mingw32/icuin64.dll # dll/x86_64-w64-mingw32/icuuc64.dll # dll/x86_64-w64-mingw32/libarchive-13.dll # dll/x86_64-w64-mingw32/libbz2-1.dll # 
dll/x86_64-w64-mingw32/libcairo-2.dll # dll/x86_64-w64-mingw32/libcurl-4.dll # dll/x86_64-w64-mingw32/libeay32.dll # dll/x86_64-w64-mingw32/libexpat-1.dll # dll/x86_64-w64-mingw32/libffi-6.dll # dll/x86_64-w64-mingw32/libfontconfig-1.dll # dll/x86_64-w64-mingw32/libfreetype-6.dll # dll/x86_64-w64-mingw32/libgcc_s_seh-1.dll # dll/x86_64-w64-mingw32/libgif-7.dll # dll/x86_64-w64-mingw32/libglib-2.0-0.dll # dll/x86_64-w64-mingw32/libgobject-2.0-0.dll # dll/x86_64-w64-mingw32/libgomp-1.dll # dll/x86_64-w64-mingw32/libharfbuzz-0.dll # dll/x86_64-w64-mingw32/libintl-8.dll # dll/x86_64-w64-mingw32/libjbig-2.dll # dll/x86_64-w64-mingw32/libjpeg-8.dll # dll/x86_64-w64-mingw32/liblept-5.dll # dll/x86_64-w64-mingw32/liblz4-1.dll # dll/x86_64-w64-mingw32/liblzma-5.dll # dll/x86_64-w64-mingw32/liblzo2-2.dll # dll/x86_64-w64-mingw32/libnettle-6.dll # dll/x86_64-w64-mingw32/libnghttp2-14.dll # dll/x86_64-w64-mingw32/libopenjp2.dll # dll/x86_64-w64-mingw32/libpango-1.0-0.dll # dll/x86_64-w64-mingw32/libpangocairo-1.0-0.dll # dll/x86_64-w64-mingw32/libpangoft2-1.0-0.dll # dll/x86_64-w64-mingw32/libpangowin32-1.0-0.dll # dll/x86_64-w64-mingw32/libpcre-1.dll # dll/x86_64-w64-mingw32/libpixman-1-0.dll # dll/x86_64-w64-mingw32/libpng16-16.dll # dll/x86_64-w64-mingw32/libssh2-1.dll # dll/x86_64-w64-mingw32/libstdc++-6.dll # dll/x86_64-w64-mingw32/libtiff-5.dll # dll/x86_64-w64-mingw32/libwebp-7.dll # dll/x86_64-w64-mingw32/libwinpthread-1.dll # dll/x86_64-w64-mingw32/libxml2-2.dll # dll/x86_64-w64-mingw32/libzstd-1.dll # dll/x86_64-w64-mingw32/ssleay32.dll # dll/x86_64-w64-mingw32/zlib1.dll # src/ccutil/errcode.h # src/ccutil/tprintf.h # src/viewer/scrollview.h

…ome NOT NICE: code repetition at another level. TODO: Better idea? --> Maybe namespaces and double kernel projects or compile via #define+#include-all-source-files hack collective source code pages? (Latter approach may become a problem when debugging, or will the compiler suite cope well? Will know only once done & tested.) At least this is about the point where the function template solution stops to be useful. The run-time switching desire between float and double is doable, but not by using #ifdef/#else throughout, nor templating all the way up the TFloat usage calltree.

@stweil

…: added that one as another enabling condition since benchmarks have shown MSVC2019's `/openmp:experimental` to deliver. :-) (See tesseract-ocr#3486 benchmark reports on @stweil's DotProductNative() implementation)

…g function templates for TFloat float & double implementations to co-exist in the run-time without cluttering the code with #if/#else and no run-time switches (yet).

## Observations thus far

- DRY? Check!
- The whole function template idea (let the C++ compiler do the heavy lifting) stops somewhere. This regrettably happens to be at the weightmatrix.cpp code, where the code calls the CPU+configuration-selected SIMD implementation via function pointer: `intSimdMatrix->matrixDotVectorFunction` -- this would require code duplication of some kind (e.g. a FP32 callback pointer co-existing with a FP64 callback ptr in the struct and then have the code pick the right one, depending on current TFloat size, for example) and is thus deemed unsatisfactory (my opinion).
- So far, and very probably independent of any solutions for the co-existence issue at higher levels in the code, this template approach works out well, with the compiler smartly picking the one matching the current float/double choice.
- While we have double the number of specialized SIMD implementations (obviously), these do not need #if/#else checks as we can let the C++ compiler do its prototype matching job --> cleaner code.
- The template functions also help clean up the serialization/de-serialization code as the `<T, ST>` dual-type approach there allows one to specify the run-time type (TFloat) and the file-storage type at the same time: also do note how this cleans up the 'Old' scales deserialization code, as the old file storage is simply 'float' instead of 'double'.
- The added cost there is a double copy of file data when T==ST, but that turned out negligible in the preliminary tests as that bit of code didn't even reach the Top20 CPU Guzzlers Chart, so that extra copy can wait for smarter C++ template writers to take care of when microtuning is called for.
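For the extra copy when T==ST mentioned in the last observation, one possible way out (a sketch only, not what the branch does) is a C++17 `if constexpr` check on the two types. The free function `AppendAs` and the `std::vector<char>` sink below are hypothetical stand-ins for the TFile write path:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: append `count` elements of run-time type T to `out`,
// stored as type ST. Assumes C++17 for `if constexpr`.
template <typename T, typename ST>
void AppendAs(std::vector<char> &out, const T *data, size_t count) {
  if constexpr (std::is_same_v<T, ST>) {
    // Same type in memory and on disk: copy the bytes directly, no temporary.
    const char *p = reinterpret_cast<const char *>(data);
    out.insert(out.end(), p, p + count * sizeof(T));
  } else {
    // Different types: convert element by element into a temporary first.
    std::vector<ST> tmp(data, data + count);
    const char *p = reinterpret_cast<const char *>(tmp.data());
    out.insert(out.end(), p, p + count * sizeof(ST));
  }
}
```

Only one of the two branches is instantiated per `<T, ST>` combination, so the call sites stay exactly as they are and the conversion copy only exists where the types really differ.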

GerHobbelt · 2021-07-13T14:55:26Z

The cleaned-up version of this now exists as stweil#7 : using C++ function templates to DRY & still have float/double (TFloat: #3486) selectable.

GerHobbelt and others added 30 commits January 15, 2021 22:46

Merge remote-tracking branch 'remotes/stweil/network-string'

cc2f5be

# Conflicts: # src/training/combine_tessdata.cpp

Merge remote-tracking branch 'remotes/UB-Mannheim/windows'

ebfb844

# Conflicts: # src/ccutil/errcode.h # src/ccutil/serialis.cpp # src/ccutil/tprintf.h # src/viewer/scrollview.h

Merge remote-tracking branch 'remotes/stweil/fuzzers'

cacad1b

# Conflicts: # Makefile.am # src/ccutil/helpers.h # src/ccutil/scanutils.h # src/ccutil/tprintf.h # unittest/Makefile.am

Merge remote-tracking branch 'remotes/tesseract-ocr/master'

81b21e4

fix build errors following the latest merges.

e30195a

Merge remote-tracking branch 'remotes/ulb-sachsen-anhalt/master'

c84f864

# Conflicts: # configure.ac

Merge remote-tracking branch 'remotes/tesseract-ocr/master'

fd58d5a

# Conflicts: # Makefile.am # unittest/Makefile.am

Merge remote-tracking branch 'remotes/tesseract-ocr/master'

1d717cc

updated Pix input format handling

62dfe0b

# Conflicts: # src/api/pdfrenderer.cpp

Merge remote-tracking branch 'remotes/tesseract-ocr/master'

d92c8c7

Merge branch 'master' into artifex

f3f83c5

Fixes required to make Tesseract build with MuPDF on Linux.

4ee6d50

Merge commit '4902e6868288b739bbbe8d9b864b0b19fb97256d'

9e725ec

Merge remote-tracking branch 'remotes/Artifex/artifex'

d632e6c

Merge commit 'cefa3e7e7edfe5081ea9feea976abe02e99eb403'

045d491

Merge remote-tracking branch 'remotes/tesseract-ocr/master'

2edbe0d

remove old mingw32 binary files in dll/ that have crept in via an old…

33e90db

…er merge

clang fix?

589c139

Merge remote-tracking branch 'remotes/tesseract-ocr/master'

34f2eb0

Merge remote-tracking branch 'remotes/stweil/master'

c570bbf

Merge remote-tracking branch 'remotes/Shreeshrii/master'

c319286

update submodules

d91dd4d

Merge remote-tracking branch 'remotes/tesseract-ocr/master'

99be81c

updated submodules

1e72f9d

Implement unpack for lstmf files

ed5e40e

Signed-off-by: Stefan Weil <[email protected]>

Support lstmf files with more than one line

5876fc4

Signed-off-by: Stefan Weil <[email protected]>

Add missing include statement for access

0995870

Signed-off-by: Stefan Weil <[email protected]>

Add info command

f439170

Signed-off-by: Stefan Weil <[email protected]>

Actions CI: 2017 Visual Studio

eb62f07

GerHobbelt and others added 14 commits July 13, 2021 14:31

bugfixing the AVX2 Extract8+16 codes, where there's lines like `__m25…

81b69b0

…6d scale01234567 = _mm256_loadu_ps(scales)`, i.e. loading float vectors into double vector types. Extract from tesseract-ocr#3490.

extracted from 3490: implements DotProductSSE() for FAST_FLOAT

d23ec1d

allowing float and double instances to co-exist as per stweil#2 (comm…

5d16bab

…ent)

bugfixing the AVX2 Extract8+16 codes, where there's lines like `__m25…

4e3c112

…6d scale01234567 = _mm256_loadu_ps(scales)`, i.e. loading float vectors into double vector types. Extract from tesseract-ocr#3490.

Reverting so we have a useful and still 'kinda clean' codebase.

603831b

Revert previous commit: "HMMM. This is where the float/double co-existence stuff starts to become NOT NICE: code repetition at another level." This reverts commit 8d40552.

Merge branch 'tfloat-AVX-SSE-etc' into TFloat

160949a

# Conflicts: # src/arch/dotproductsse.cpp

Merge branch 'tfloat-patch-4' into TFloat

02d94bc

# Conflicts: # src/arch/intsimdmatrixavx2.cpp

Improve build code for native dotproduct

15f7549

Signed-off-by: Stefan Weil <[email protected]>

Enhance unittest/dotproduct_test

3eae6d7

Signed-off-by: Stefan Weil <[email protected]>

Merge remote-tracking branch 'remotes/stweil/tfloat' into TFloat

f32b9de

# Conflicts: # src/arch/dotproduct.cpp # src/arch/dotproductsse.cpp # src/arch/intsimdmatrixavx2.cpp

Merge branch 'openmp-patch-1' into TFloat

44a8f41

GerHobbelt mentioned this pull request Jul 13, 2021

Tfloat float and double coexistance -- working towards that goal stweil/tesseract#7

Closed

GerHobbelt closed this Jul 13, 2021
