TFloat (FAST_FLOAT) work done & slightly different idea used to make code easily switchable between double & float #3490

Closed
wants to merge 486 commits into from

GerHobbelt

@stweil : saw your work in the TFloat branch.

This pullreq is FYI only (it's an unadulterated copy of my tesseract fork's TFloat branch and thus has way too many commit diffs to be mergeable) -- I'll ready a decent pullreq tonight or tomorrow, but I wanted to send a heads-up so you can decide whether you like this or not.

The key idea here with FAST_FLOAT is to use template<T, ST> functions for Serialize and DeSerialize, so we don't have code cluttered with lots of #ifdef/ifndef/... to make it happen.

The problem is the tesseract data files: those carry data in double format (or float for 'old' scales[]), while the run-time type is dictated by the new TFloat type you came up with, which is either float or double depending on a compile-time define (FAST_FLOAT).
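A minimal sketch of that compile-time switch, assuming the alias is spelled roughly like this. The two `constexpr` helpers are hypothetical and exist only to make the consequence visible; they are not part of tesseract.

```cpp
#include <type_traits>

// Assumption: FAST_FLOAT is passed on the compiler command line
// (for example -DFAST_FLOAT) to select the single-precision build.
#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

// Hypothetical helper: true when this build computes in single precision.
constexpr bool kTFloatIsSingle = std::is_same<TFloat, float>::value;

// Hypothetical helper: the data files store double, so any build whose
// TFloat is not double has to convert on every load and save.
constexpr bool kNeedsConversion = !std::is_same<TFloat, double>::value;

static_assert(kNeedsConversion == kTFloatIsSingle,
              "only the float build needs to convert");
```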

The idea implemented here is to have the Serialize and DeSerialize functions (in TFile and elsewhere) do any necessary conversions between the run-time type and file-storage (persisted) type.

See for example this bit of code from serialis.h:

  // Serialize data.
  bool Serialize(const std::string &data);
  bool Serialize(const std::vector<char> &data);
  template <typename T>
  bool Serialize(const T *data, size_t count = 1) {
    return FWrite(data, sizeof(T), count) == static_cast<int>(count);
  }
  template <typename T, typename ST>
  bool Serialize(const T *data, size_t count = 1) {
    ST *arr = new ST[count];
    for (size_t i = 0; i < count; i++) {
      arr[i] = data[i];
    }
    bool rv = (FWrite(&arr[0], sizeof(ST), count) == static_cast<int>(count));
    delete[] arr;
    return rv;
  }
  template <typename T, typename ST>
  bool Serialize(const std::vector<T> &data) {
    std::vector<ST> arr;
    size_t len = data.size();
    arr.resize(len);
    for (size_t i = 0; i < len; i++) {
      arr[i] = data[i];
    }
    bool rv = Serialize(arr);
    return rv;
  }
  template <typename T>
  bool Serialize(const std::vector<T> &data) {
    // Serialize number of elements first.
    uint32_t size = data.size();
...

which now has the extra Serialize() members:

  template <typename T, typename ST>
  bool Serialize(const T *data, size_t count = 1)

and

  template <typename T, typename ST>
  bool Serialize(const std::vector<T>& data)

where T represents the run-time type (TFloat) and ST represents the storage type (double for new data files; float for old scales[] data).
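To see the two-type overload in isolation, here is a self-contained sketch. `MemWriter` and `DemoSerialize` are hypothetical stand-ins (an in-memory buffer instead of TFile), so this shows only the overload mechanics under that assumption, not the actual serialis.h code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for TFile: appends raw bytes to an in-memory buffer.
class MemWriter {
public:
  // Same-type case: the bytes of T go out unchanged.
  template <typename T>
  bool Serialize(const T *data, std::size_t count = 1) {
    return Write(data, sizeof(T), count) == count;
  }

  // Converting case: T is the run-time type, ST the storage type.
  template <typename T, typename ST>
  bool Serialize(const T *data, std::size_t count = 1) {
    std::vector<ST> converted(count);
    for (std::size_t i = 0; i < count; i++) {
      converted[i] = static_cast<ST>(data[i]);
    }
    return Write(converted.data(), sizeof(ST), count) == count;
  }

  std::size_t size() const { return bytes_.size(); }

private:
  // Appends count elements of elem_size bytes each; returns count.
  std::size_t Write(const void *src, std::size_t elem_size, std::size_t count) {
    const unsigned char *p = static_cast<const unsigned char *>(src);
    bytes_.insert(bytes_.end(), p, p + elem_size * count);
    return count;
  }
  std::vector<unsigned char> bytes_;
};

// Usage: three float values end up on "disk" as three doubles.
inline std::size_t DemoSerialize() {
  const float weights[3] = {0.5f, -1.25f, 3.0f};
  MemWriter out;
  const bool ok = out.Serialize<float, double>(weights, 3);
  assert(ok);
  (void)ok;  // keeps NDEBUG builds free of unused-variable warnings
  return out.size();
}
```

A call with a single (or deduced) template argument selects the first overload, because ST cannot be deduced from the arguments; naming both types selects the converting one.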

The calling code can then dictate the on-disk format and thus stay cross-compatible with other builds, whether those have FAST_FLOAT defined or not, e.g. this snippet from weightmatrix.cpp:

bool WeightMatrix::Serialize(bool training, TFile *fp) const {
  // For backward compatibility, add kDoubleFlag to mode to indicate the doubles
  // format, without errs, so we can detect and read old format weight matrices.
  uint8_t mode = (int_mode_ ? kInt8Flag : 0) | (use_adam_ ? kAdamFlag : 0) | kDoubleFlag;
  if (!fp->Serialize(&mode)) {
    return false;
  }
  if (int_mode_) {
    if (!wi_.Serialize<int8_t>(fp)) {
      return false;
    }
    // The scales stored in memory have an extra factor applied to them
    // to allow faster operation. We have to remove that factor here
    // before writing to disc.
    auto scales = scales_;
    for (auto &scale : scales) {
      scale *= INT8_MAX;
    }
    uint32_t size = scales.size();
    if (!fp->Serialize(&size)) {
      return false;
    }
    if (!fp->Serialize<TFloat, double>(&scales[0], size)) {
      return false;
    }
  } else {
    if (!wf_.Serialize<double>(fp)) {
      return false;
    }
    if (training) {
      if (!updates_.Serialize<double>(fp)) {
        return false;
      }
      if (use_adam_ && !dw_sq_sum_.Serialize<double>(fp)) {
        return false;
      }
    }
  }
  return true;
}

Note the <float> and <double> usage in the function calls there: these now dictate the output format, and they do so at the place in the code where it is relevant. The code remains easy to read, as the file format can be read straight off the code lines.
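The read side is not shown in the excerpt above. A minimal sketch of how a converting `DeSerialize<T, ST>` might look, assuming a hypothetical in-memory `MemReader` in place of TFile (`PackDoubles` and `DemoDeSerialize` are likewise made up for the illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for TFile: reads raw bytes from an in-memory buffer.
class MemReader {
public:
  explicit MemReader(std::vector<unsigned char> bytes) : bytes_(std::move(bytes)) {}

  // Same-type case: copy the stored bytes straight into T.
  template <typename T>
  bool DeSerialize(T *data, std::size_t count = 1) {
    return Read(data, sizeof(T), count) == count;
  }

  // Converting case: the file holds ST, the caller wants T.
  template <typename T, typename ST>
  bool DeSerialize(T *data, std::size_t count = 1) {
    std::vector<ST> stored(count);
    if (Read(stored.data(), sizeof(ST), count) != count) {
      return false;
    }
    for (std::size_t i = 0; i < count; i++) {
      data[i] = static_cast<T>(stored[i]);
    }
    return true;
  }

private:
  // Returns count on success, 0 if fewer than count elements remain.
  std::size_t Read(void *dst, std::size_t elem_size, std::size_t count) {
    const std::size_t wanted = elem_size * count;
    if (wanted > bytes_.size() - pos_) {
      return 0;
    }
    if (wanted != 0) {
      std::memcpy(dst, bytes_.data() + pos_, wanted);
    }
    pos_ += wanted;
    return count;
  }
  std::vector<unsigned char> bytes_;
  std::size_t pos_ = 0;
};

// Packs doubles the way a file written with ST = double would hold them.
inline std::vector<unsigned char> PackDoubles(const std::vector<double> &values) {
  std::vector<unsigned char> bytes(values.size() * sizeof(double));
  if (!bytes.empty()) {
    std::memcpy(bytes.data(), values.data(), bytes.size());
  }
  return bytes;
}

// Usage: a file holding doubles is loaded into a float run-time array.
inline float DemoDeSerialize() {
  MemReader in(PackDoubles({0.5, -1.25, 3.0}));
  float weights[3] = {0.0f, 0.0f, 0.0f};
  const bool ok = in.DeSerialize<float, double>(weights, 3);
  assert(ok);
  (void)ok;  // keeps NDEBUG builds free of unused-variable warnings
  return weights[1];
}
```

Under the same assumption, reading the old float-format scales[] would only change the second template argument to `float`.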

I hope you like it. The commit list attached to this pullreq is a mess: disregard anything but the last couple of commits, where this work was done, including work on the AVX/SSE code to cope with the new float vs double TFloat approach (a cleaned-up pullreq will follow next day):

  • SHA-1: 31de23d = Continued work on SHA-1: d397065 --> Part 2
  • SHA-1: d397065 = Part 1: redesigned the TFloat approach using templates for the Serialization and Deserialization methods. Tested Deserialization with double (i.e. standard, non-optimized) layout: run-time type == storage type.

There's other work in there too, which will be filed in a separate pullreq as it's only sideways related:

GerHobbelt and others added 30 commits January 15, 2021 22:46
Signed-off-by: Stefan Weil <[email protected]>
GerHobbelt and others added 14 commits July 13, 2021 14:31
…6d scale01234567 = _mm256_loadu_ps(scales)`, i.e. loading float vectors into double vector types. Extract from tesseract-ocr#3490.
…ome NOT NICE: code repetition at another level.

TODO: Better idea? --> Maybe namespaces and double kernel projects or compile via #define+#include-all-source-files hack collective source code pages? (Latter approach may become a problem when debugging, or will the compiler suite cope well? Will know only once done & tested.)

At least this is about the point where the function template solution stops being useful. The desired run-time switching between float and double is doable, but not by using #ifdef/#else throughout, nor by templating all the way up the TFloat usage call tree.
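A small sketch of where that boundary lies, using hypothetical names throughout (`DotProduct`, `Dot`, `KernelTable`, `SelectKernels`, `DemoKernels`) and plain loops instead of SIMD intrinsics. Overloads let TFloat call sites pick the right kernel without `#if`, but a function pointer is bound to one signature, so a single table entry cannot serve float and double callers in the same binary.

```cpp
#include <cassert>
#include <cstddef>

#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

// One overload per precision; real kernels would use SIMD intrinsics.
inline float DotProduct(const float *u, const float *v, std::size_t n) {
  float total = 0.0f;
  for (std::size_t i = 0; i < n; i++) total += u[i] * v[i];
  return total;
}
inline double DotProduct(const double *u, const double *v, std::size_t n) {
  double total = 0.0;
  for (std::size_t i = 0; i < n; i++) total += u[i] * v[i];
  return total;
}

// A call site written against TFloat needs no #if: overload resolution
// picks the kernel that matches the current build.
inline TFloat Dot(const TFloat *u, const TFloat *v, std::size_t n) {
  return DotProduct(u, v, n);
}

// A function pointer has exactly one signature, fixed here to TFloat.
using DotFunction = TFloat (*)(const TFloat *, const TFloat *, std::size_t);

struct KernelTable {
  DotFunction dot;
};

inline KernelTable SelectKernels() {
  // The overload is chosen by the pointer's type at compile time.
  KernelTable table = {DotProduct};
  return table;
}

// Usage: both routes reach the same kernel within one build.
inline void DemoKernels() {
  const TFloat u[3] = {1, 2, 3};
  const TFloat v[3] = {4, 5, 6};
  assert(Dot(u, v, 3) == SelectKernels().dot(u, v, 3));
}
```

Supporting both precisions at run-time would need a second pointer member (or a second table) plus code to choose between them, which is the duplication the commit message objects to.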
Revert previous commit: "HMMM. This is where the float/double co-existence stuff starts to become NOT NICE: code repetition at another level."

This reverts commit 8d40552.
…: added that one as another enabling condition since benchmarks have shown MSVC2019's `/openmp:experimental` to deliver. :-) (See tesseract-ocr#3486 benchmark reports on @stweil's DotProductNative() implementation)
GerHobbelt added a commit to GerHobbelt/tesseract that referenced this pull request Jul 13, 2021
…g function templates for TFloat float & double implementations to co-exist in the run-time without cluttering the code with #if/#else and no run-time switches (yet).

## Observations thus far

- DRY? Check!
- the whole function template idea (letting the C++ compiler do the heavy lifting) stops somewhere. Regrettably, that is in the weightmatrix.cpp code, which calls the SIMD implementation selected for the CPU and configuration via a function pointer: `intSimdMatrix->matrixDotVectorFunction`. Going further would require code duplication of some kind (for example an FP32 callback pointer co-existing with an FP64 callback pointer in the struct, with the code picking the right one depending on the current TFloat size) and is thus deemed unsatisfactory (my opinion).
- So far, and very probably independent of any solutions for the co-existence issue at higher levels in the code, this template approach works out well, with the compiler smartly picking the one matching the current float/double choice.
- while we have double the number of specialized SIMD implementations (obviously), these do not need #if/#else checks as we can let the C++ compiler do its prototype matching job --> cleaner code.
- the template functions also help clean up the serialization/de-serialization code as the `<T, ST>` dual-type approach there allows one to specify the run-time type (TFloat) and the file-storage type at the same time: also do note how this cleans up the 'Old' scales deserialization code, as the old file storage is simply 'float' instead of 'double'.
- the added cost there is an extra copy of the file data when T==ST, but that turned out to be negligible in the preliminary tests, as that bit of code didn't even reach the Top20 CPU Guzzlers Chart. So that extra copy can wait for smarter C++ template writers to take care of when microtuning is called for.
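One possible way to drop the extra copy mentioned in the last point, sketched with C++11 tag dispatch on `std::is_same<T, ST>`. `CountingWriter` and `DemoNoExtraCopy` are hypothetical, and the `temp_copies` counter exists only to make the effect observable; this is not the approach taken in the branch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical in-memory sink standing in for TFile.
class CountingWriter {
public:
  std::vector<unsigned char> bytes;
  int temp_copies = 0;  // number of temporary conversion buffers created

  template <typename T, typename ST>
  bool Serialize(const T *data, std::size_t count = 1) {
    // The tag type selects an implementation at compile time.
    return SerializeImpl<T, ST>(data, count, std::is_same<T, ST>());
  }

private:
  // T == ST: write the caller's buffer directly, no temporary.
  template <typename T, typename ST>
  bool SerializeImpl(const T *data, std::size_t count, std::true_type) {
    return Write(data, sizeof(T), count) == count;
  }

  // T != ST: convert element-wise into a temporary, then write that.
  template <typename T, typename ST>
  bool SerializeImpl(const T *data, std::size_t count, std::false_type) {
    std::vector<ST> converted(count);
    for (std::size_t i = 0; i < count; i++) {
      converted[i] = static_cast<ST>(data[i]);
    }
    ++temp_copies;
    return Write(converted.data(), sizeof(ST), count) == count;
  }

  std::size_t Write(const void *src, std::size_t elem_size, std::size_t count) {
    const unsigned char *p = static_cast<const unsigned char *>(src);
    bytes.insert(bytes.end(), p, p + elem_size * count);
    return count;
  }
};

// Usage: a double-to-double write creates no temporary buffer.
inline int DemoNoExtraCopy() {
  const double values[2] = {1.0, 2.0};
  CountingWriter out;
  const bool ok = out.Serialize<double, double>(values, 2);
  assert(ok);
  (void)ok;  // keeps NDEBUG builds free of unused-variable warnings
  return out.temp_copies;
}
```

With C++17 the same split could be written as a single function body using `if constexpr`.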
@GerHobbelt
Contributor Author

The cleaned-up version of this now exists as stweil#7: using C++ function templates to stay DRY and still have float/double (TFloat: #3486) selectable.

@GerHobbelt GerHobbelt closed this Jul 13, 2021
stweil pushed a commit to stweil/tesseract that referenced this pull request Jul 14, 2021
…6d scale01234567 = _mm256_loadu_ps(scales)`, i.e. loading float vectors into double vector types. Extract from tesseract-ocr#3490.