Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model locale fix #1

Merged
merged 8 commits into from
Sep 15, 2020
Merged

Conversation

AlbertoEAF
Copy link

No description provided.

@AlbertoEAF AlbertoEAF merged commit 1b74a97 into feedzai:model-locale-fix Sep 15, 2020
AlbertoEAF added a commit that referenced this pull request Sep 29, 2020
Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe

If it works it should fix more LGBM builds
AlbertoEAF added a commit that referenced this pull request Nov 7, 2020
Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe

If it works it should fix more LGBM builds
AlbertoEAF added a commit that referenced this pull request Nov 9, 2020
Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe

If it works it should fix more LGBM builds
AlbertoEAF added a commit that referenced this pull request Nov 16, 2020
Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe

If it works it should fix more LGBM builds
AlbertoEAF added a commit that referenced this pull request Nov 21, 2020
Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe

If it works it should fix more LGBM builds
AlbertoEAF added a commit that referenced this pull request Nov 24, 2020
Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe

If it works it should fix more LGBM builds
AlbertoEAF added a commit that referenced this pull request Jan 16, 2021
…3405)

* Fix LightGBM models locale sensitivity and improve R/W performance.

When Java is used, the default C++ locale is broken. This is true for
Java providers that use the C API or even Python models that require JEP.

This patch solves that issue making the model reads/writes insensitive
to such settings.
To achieve it, within the model read/write codebase:
 - C++ streams are imbued with the classic locale
 - Calls to functions that are dependent on the locale are replaced
 - The default locale is not changed!

This approach means:
 - The user's locale is never tampered with, avoiding issues such as
    microsoft#2979 with the previous
    approach microsoft#2891
 - Datasets can still be read according the user's locale
 - The model file has a single format independent of locale

Changes:
 - Add CommonC namespace which provides faster locale-independent versions of Common's methods
 - Model code makes conversions through CommonC
 - Cleanup unused Common methods
 - Performance improvements. Use fast libraries for locale-agnostic conversion:
   - value->string: https://github.com/fmtlib/fmt
   - string->double: https://github.com/lemire/fast_double_parser (10x
      faster double parsing according to their benchmark)

Bugfixes:
 - microsoft#2500
 - microsoft#2890
 - ninia/jep#205 (as it is related to LGBM as well)

* Align CommonC namespace

* Add new external_libs/ to python setup

* Try fast_double_parser fix #1

Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe

If it works it should fix more LGBM builds

* CMake: Attempt to link fmt without explicit PUBLIC tag

* Exclude external_libs from linting

* Add exernal_libs to MANIFEST.in

* Set dynamic linking option for fmt.

* linting issues

* Try to fix lint includes

* Try to pass fPIC with static fmt lib

* Try CMake P_I_C option with fmt library

* [R-package] Add CMake support for R and CRAN

* Cleanup CMakeLists

* Try fmt hack to remove stdout

* Switch to header-only mode

* Add PRIVATE argument to target_link_libraries

* use fmt in header-only mode

* Remove CMakeLists comment

* Change OpenMP to PUBLIC linking in Mac

* Update fmt submodule to 7.1.2

* Use fmt in header-only-mode

* Remove fmt from CMakeLists.txt

* Upgrade fast_double_parser to v0.2.0

* Revert "Add PRIVATE argument to target_link_libraries"

This reverts commit 3dd45dd.

* Address James Lamb's comments

* Update R-package/.Rbuildignore

Co-authored-by: James Lamb <[email protected]>

* Upgrade to fast_double_parser v0.3.0 - Solaris support

* Use legacy code only in Solaris

* Fix lint issues

* Fix comment

* Address StrikerRUS's comments (solaris ifdef).

* Change header guards

Co-authored-by: James Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant