Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode conversion rework #278

Merged
merged 64 commits into from
Apr 13, 2020
Merged
Show file tree
Hide file tree
Changes from 63 commits
Commits
Show all changes
64 commits
Select commit Hold shift + click to select a range
db9a97c
Move AmortizedIStreamReader to driver/utils/amortized_istream_reader.h
traceon Mar 9, 2020
31484c9
WIP: detect and use system ICU
traceon Mar 9, 2020
0b25b4b
Move unicode_conv.h to unicode_conv_std.hpp
traceon Mar 11, 2020
459d9d8
Add unicode_conv_icu.cpp stub
traceon Mar 11, 2020
bbf54cd
Smoother ICU integration into the build system
traceon Mar 11, 2020
ef8a32b
Disable CH_ODBC_PREFER_BUNDLED_ICU by default
traceon Mar 11, 2020
9e3e339
ICU Unicode conversion drop-in implementation
traceon Mar 13, 2020
c95cfb2
Use fixed NTSBufferLength()
traceon Mar 14, 2020
3371522
Remove unused stuff
traceon Mar 14, 2020
2582b10
Fix naming
traceon Mar 14, 2020
bffba43
Use ICU
traceon Mar 14, 2020
10b4ed2
Add proper dispatching according to the expected character sizes
traceon Mar 14, 2020
ea861e0
Compilation fix
traceon Mar 14, 2020
4cdc3e1
Fix bug
traceon Mar 14, 2020
b3443dc
Naming changes
traceon Mar 21, 2020
9562c0e
Code cleanup
traceon Mar 21, 2020
aee8fac
Choose default wide-char encoding according to platform specs
traceon Mar 21, 2020
89e4495
Change to typedefed char type
traceon Mar 21, 2020
dbc16b8
WIP: debugging mac builds
traceon Mar 21, 2020
2ced750
WIP: debugging mac builds
traceon Mar 21, 2020
3acf4fa
WIP: debugging mac builds
traceon Mar 21, 2020
f840b69
Move common code to parent header
traceon Mar 21, 2020
a5f88bc
Always set ICU root path for macOS builds
traceon Mar 21, 2020
60af4c9
Fixed macOS builds
traceon Mar 21, 2020
2b9cc1e
Delete folly
traceon Mar 22, 2020
2617708
Add internal folly fork
traceon Mar 22, 2020
8adf049
wip
traceon Mar 22, 2020
181fb76
wip
traceon Mar 22, 2020
ef4746e
Add internal folly fork
traceon Mar 22, 2020
4590fc6
Bump submodule
traceon Mar 22, 2020
71e479f
Bump submodule
traceon Mar 22, 2020
c8c8827
Enable folly::resizeWithoutInitialization everywhere
traceon Mar 22, 2020
1852c82
Fix TODOs: use resize_without_initialization() everywhere
traceon Mar 22, 2020
d54c92d
Fix size checks for BOM cases
traceon Mar 22, 2020
a331ff4
Accomodate the test for variable character encodings
traceon Mar 22, 2020
d033303
Bump folly
traceon Mar 23, 2020
6bb1a3b
Implement signature/BOM detection and truncation on edges
traceon Mar 25, 2020
3058fb3
Typo fixes
traceon Mar 25, 2020
1c9e6de
Typo fix - missing argument
traceon Mar 25, 2020
b7d7174
Remove unnecessary (and invalid) mismatching char size workaround sup…
traceon Mar 26, 2020
39bcb88
Remove irrelevant cases
traceon Mar 26, 2020
f6a41b6
Avoid incorrect instantiations
traceon Mar 26, 2020
87e2ab3
Fix buffer size increment calculations
traceon Mar 27, 2020
a8cb060
WIP: refactoring of the unicode conversion code
traceon Mar 29, 2020
dba3acc
WIP: refactoring of the unicode conversion code
traceon Mar 30, 2020
882262a
Implement signature prepending
traceon Mar 30, 2020
33d0813
GCC compilation fix
traceon Mar 30, 2020
3ad37c8
Windows compilation fix
traceon Mar 30, 2020
e32793e
Add icu requirements
traceon Mar 30, 2020
24eba04
Fix signature detection
traceon Mar 30, 2020
b4a9f02
Fix typo
traceon Mar 30, 2020
ea291fc
Fix typo
traceon Mar 30, 2020
32ec45a
Fix typo
traceon Mar 30, 2020
1395df5
Fix duplicate BOM filtering
traceon Mar 30, 2020
1243651
Move convertEncoding to unicode_converter.h
traceon Mar 31, 2020
691ddee
WIP prepare for WORKAROUND_ICU_USE_EXPLICIT_PIVOTING
traceon Mar 31, 2020
79df886
Fix compilation
traceon Mar 31, 2020
7b85aec
Adjust folly temporary branch
traceon Apr 1, 2020
ed24f4b
Add comments
traceon Apr 6, 2020
009ffdf
Move code from original places to:
traceon Apr 6, 2020
c80519d
Windows compilation fix
traceon Apr 6, 2020
eed69f8
Switch to official folly repo
traceon Apr 7, 2020
1eaf95c
Tabs to spaces
traceon Apr 7, 2020
bf4f2ba
Delete UnicodeConverter copy c-rot and assignment operator
traceon Apr 13, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 15 additions & 15 deletions .gitmodules
Original file line number Diff line number Diff line change
@@ -1,20 +1,20 @@
[submodule "contrib/poco"]
path = contrib/poco
url = https://github.com/pocoproject/poco.git
branch = master
path = contrib/poco
url = https://github.com/pocoproject/poco.git
branch = master
[submodule "contrib/ssl"]
path = contrib/ssl
url = https://github.com/ClickHouse-Extras/ssl.git
branch = master
path = contrib/ssl
url = https://github.com/ClickHouse-Extras/ssl.git
branch = master
[submodule "contrib/nanodbc"]
path = contrib/nanodbc
url = https://github.com/nanodbc/nanodbc.git
branch = master
path = contrib/nanodbc
url = https://github.com/nanodbc/nanodbc.git
branch = master
[submodule "contrib/googletest"]
path = contrib/googletest
url = https://github.com/google/googletest.git
branch = master
path = contrib/googletest
url = https://github.com/google/googletest.git
branch = master
[submodule "contrib/folly"]
path = contrib/folly
url = https://github.com/facebook/folly.git
branch = master
path = contrib/folly
url = https://github.com/facebook/folly.git
branch = master
13 changes: 13 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ jobs:
packages:
- gcc
- g++
- libicu-dev
- unixodbc
- unixodbc-bin
- unixodbc-dev
Expand Down Expand Up @@ -70,6 +71,7 @@ jobs:
packages:
- gcc
- g++
- libicu-dev
- unixodbc
- unixodbc-bin
- unixodbc-dev
Expand Down Expand Up @@ -98,6 +100,7 @@ jobs:
packages:
- gcc
- g++
- libicu-dev
- unixodbc
- unixodbc-bin
- unixodbc-dev
Expand Down Expand Up @@ -129,6 +132,7 @@ jobs:
- openssl
- libssl-dev
- libpoco-dev
- libicu-dev
- googletest
- unixodbc
- unixodbc-bin
Expand Down Expand Up @@ -157,6 +161,7 @@ jobs:
key_url: 'https://apt.kitware.com/keys/kitware-archive-latest.asc'
packages:
- clang
- libicu-dev
- unixodbc
- unixodbc-bin
- unixodbc-dev
Expand Down Expand Up @@ -185,6 +190,7 @@ jobs:
packages:
- gcc
- g++
- libicu-dev
- iodbc
- libiodbc2
- libiodbc2-dev
Expand Down Expand Up @@ -212,6 +218,7 @@ jobs:
key_url: 'https://apt.kitware.com/keys/kitware-archive-latest.asc'
packages:
- clang
- libicu-dev
- iodbc
- libiodbc2
- libiodbc2-dev
Expand Down Expand Up @@ -256,6 +263,7 @@ jobs:
addons:
homebrew:
packages:
- icu4c
- libiodbc
update: true

Expand All @@ -275,6 +283,7 @@ jobs:
packages:
- openssl
- poco
- icu4c
- libiodbc
update: true

Expand All @@ -292,6 +301,7 @@ jobs:
addons:
homebrew:
packages:
- icu4c
- unixodbc
update: true

Expand Down Expand Up @@ -430,6 +440,9 @@ install: |-

.configure: &configure |-
CMAKE_CONFIGURE_ARGS="-DTEST_DSN_LIST=\"ClickHouse DSN (ANSI);ClickHouse DSN (Unicode);ClickHouse DSN (ANSI, RBWNAT)\" $CMAKE_CONFIGURE_EXTRA_ARGS"
if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then
CMAKE_CONFIGURE_ARGS="-DICU_ROOT=$(brew --prefix)/opt/icu4c $CMAKE_CONFIGURE_ARGS"
fi
if [[ "$EXTERNAL_THIRD_PARTY" == "yes" ]]; then
CMAKE_CONFIGURE_ARGS="-DCH_ODBC_PREFER_BUNDLED_THIRD_PARTIES=OFF $CMAKE_CONFIGURE_ARGS"
if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then
Expand Down
9 changes: 9 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -42,10 +42,12 @@ option (CH_ODBC_ALLOW_UNSAFE_DISPATCH "Allow unchecked handle dispatching (may s
option (CH_ODBC_ENABLE_SSL "Enable SSL (required for utilizing https:// interface, etc.)" ON)
option (CH_ODBC_ENABLE_INSTALL "Enable install targets (required for packaging)" ON)
cmake_dependent_option (CH_ODBC_ENABLE_TESTING "Enable test targets" ON "BUILD_TESTING" OFF)
cmake_dependent_option (CH_ODBC_USE_ICU "Use ICU library, instead of C++ STD, for Unicode conversions" ON "NOT MSVC" OFF)
option (CH_ODBC_PREFER_BUNDLED_THIRD_PARTIES "Prefer bundled over system variants of third party libraries" ON)
cmake_dependent_option (CH_ODBC_PREFER_BUNDLED_POCO "Prefer bundled over system variants of Poco library" ON "CH_ODBC_PREFER_BUNDLED_THIRD_PARTIES" OFF)
cmake_dependent_option (CH_ODBC_PREFER_BUNDLED_SSL "Prefer bundled over system variants of SSL library" ON "CH_ODBC_PREFER_BUNDLED_POCO" OFF)
cmake_dependent_option (CH_ODBC_PREFER_BUNDLED_FOLLY "Prefer bundled over system variants of Folly library" ON "CH_ODBC_PREFER_BUNDLED_THIRD_PARTIES" OFF)
cmake_dependent_option (CH_ODBC_PREFER_BUNDLED_ICU "Prefer bundled over system variants of ICU library" OFF "CH_ODBC_PREFER_BUNDLED_THIRD_PARTIES" OFF)
cmake_dependent_option (CH_ODBC_PREFER_BUNDLED_GOOGLETEST "Prefer bundled over system variants of Google Test library" ON "CH_ODBC_PREFER_BUNDLED_THIRD_PARTIES" OFF)
cmake_dependent_option (CH_ODBC_PREFER_BUNDLED_NANODBC "Prefer bundled over system variants of nanodbc library" ON "CH_ODBC_PREFER_BUNDLED_THIRD_PARTIES" OFF)
option (CH_ODBC_RUNTIME_LINK_STATIC "Link with compiler and language runtime statically" OFF)
Expand Down Expand Up @@ -137,6 +139,13 @@ if (NOT CH_ODBC_PREFER_BUNDLED_FOLLY)
# find_package (Folly)
endif ()

if (CH_ODBC_USE_ICU)
Copy link
Collaborator

@Enmk Enmk Apr 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, cmake wasn't able to find icu installed via homebrew on my mac. Is there any workaround/fix for that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to specify the path manually, its a known issue of FindICU + homebrew.
Use this command (as per README.md):

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DOPENSSL_ROOT_DIR=$(brew --prefix)/opt/openssl -DICU_ROOT=$(brew --prefix)/opt/icu4c ..

if (CH_ODBC_PREFER_BUNDLED_ICU)
message (WARNING "ICU: using bundled variant of the library currently not supported")
endif ()
find_package (ICU COMPONENTS uc REQUIRED)
endif ()

if (CH_ODBC_ENABLE_TESTING)
if (NOT CH_ODBC_PREFER_BUNDLED_GOOGLETEST)
find_package (GTest)
Expand Down
26 changes: 13 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -177,15 +177,15 @@ Another run-time dependecies are `C++ Redistributable for Visual Studio 2017` or
Homebrew: execute the following in the terminal (assuming you have [Homebrew](https://brew.sh/) installed):

```sh
brew install poco openssl libiodbc
brew install poco openssl icu4c libiodbc
```

#### UnixODBC <!-- omit in toc -->

Homebrew: execute the following in the terminal (assuming you have [Homebrew](https://brew.sh/) installed):

```sh
brew install poco openssl unixodbc
brew install poco openssl icu4c unixodbc
```

### Run-time dependencies: Red Hat/CentOS
Expand All @@ -195,15 +195,15 @@ brew install poco openssl unixodbc
Execute the following in the terminal:

```sh
sudo yum install openssl unixODBC
sudo yum install openssl libicu unixODBC
```

#### iODBC <!-- omit in toc -->

Execute the following in the terminal:

```sh
sudo yum install openssl libiodbc
sudo yum install openssl libicu libiodbc
```

### Run-time dependencies: Debian/Ubuntu
Expand All @@ -213,15 +213,15 @@ sudo yum install openssl libiodbc
Execute the following in the terminal:

```sh
sudo apt install openssl unixodbc
sudo apt install openssl libicu unixodbc
```

#### iODBC <!-- omit in toc -->

Execute the following in the terminal:

```sh
sudo apt install openssl libiodbc2
sudo apt install openssl libicu libiodbc2
```

### Configuration: MDAC/WDAC (Microsoft/Windows Data Access Components)
Expand Down Expand Up @@ -368,15 +368,15 @@ You will need Xcode 10 or later and Command Line Tools to be installed, as well
Homebrew: execute the following in the terminal (assuming you have [Homebrew](https://brew.sh/) installed):

```sh
brew install git cmake make poco openssl libiodbc
brew install git cmake make poco openssl icu4c libiodbc
```

#### Build-time dependencies: UnixODBC <!-- omit in toc -->

Homebrew: execute the following in the terminal (assuming you have [Homebrew](https://brew.sh/) installed):

```sh
brew install git cmake make poco openssl unixodbc
brew install git cmake make poco openssl icu4c unixodbc
```

#### Build steps <!-- omit in toc -->
Expand All @@ -397,7 +397,7 @@ cd build
# Configuration options for the project can be specified in the next command in a form of '-Dopt=val'.

# You may also add '-G Xcode' to the next command, in order to use Xcode as a build system or IDE, and generate the solution and project files instead of Makefile.
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DOPENSSL_ROOT_DIR=$(brew --prefix)/opt/openssl -DICU_ROOT=$(brew --prefix)/opt/icu4c ..
```

Build the generated solution in-place:
Expand Down Expand Up @@ -430,7 +430,7 @@ sudo yum install epel-release
sudo yum groupinstall "Development Tools"
sudo yum install centos-release-scl
sudo yum install devtoolset-8
sudo yum install git cmake3 openssl-devel unixODBC-devel
sudo yum install git cmake3 openssl-devel libicu-devel unixODBC-devel
```

#### Build-time dependencies: iODBC <!-- omit in toc -->
Expand All @@ -442,7 +442,7 @@ sudo yum install epel-release
sudo yum groupinstall "Development Tools"
sudo yum install centos-release-scl
sudo yum install devtoolset-8
sudo yum install git cmake3 openssl-devel libiodbc-devel
sudo yum install git cmake3 openssl-devel libicu-devel libiodbc-devel
```

#### Build steps <!-- omit in toc -->
Expand Down Expand Up @@ -491,15 +491,15 @@ cmake3 --build . --config RelWithDebInfo --target test
Execute the following in the terminal:

```sh
sudo apt install build-essential git cmake libpoco-dev libssl-dev unixodbc-dev
sudo apt install build-essential git cmake libpoco-dev libssl-dev libicu-dev unixodbc-dev
```

#### Build-time dependencies: iODBC <!-- omit in toc -->

Execute the following in the terminal:

```sh
sudo apt install build-essential git cmake libpoco-dev libssl-dev libiodbc2-dev
sudo apt install build-essential git cmake libpoco-dev libssl-dev libicu-dev libiodbc2-dev
```

#### Build steps <!-- omit in toc -->
Expand Down
2 changes: 1 addition & 1 deletion contrib/folly
Submodule folly updated 194 files
24 changes: 23 additions & 1 deletion driver/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ function (declare_odbc_lib_targets libname UNICODE)
# See the usage of each workaround in the code, for comments.
unset (WORKAROUND_ALLOW_UNSAFE_DISPATCH)
unset (WORKAROUND_DISABLE_SSL)
unset (WORKAROUND_USE_ICU)
unset (WORKAROUND_ENABLE_TRIM_TRAILING_NULL)
unset (WORKAROUND_ENABLE_DEFINE_SQLBindParam)
unset (WORKAROUND_ENABLE_NO_UNICODE_CHARS_IN_COLUMN_NAMES_IN_TESTS)
Expand Down Expand Up @@ -31,6 +32,10 @@ if (NOT CH_ODBC_ENABLE_SSL)
set (WORKAROUND_DISABLE_SSL 1)
endif()

if (CH_ODBC_USE_ICU)
set (WORKAROUND_USE_ICU 1)
Enmk marked this conversation as resolved.
Show resolved Hide resolved
endif ()

message (STATUS "${libname}: ${DRIVER_PREFIX}")

configure_file (
Expand All @@ -40,6 +45,7 @@ configure_file (

unset (WORKAROUND_ALLOW_UNSAFE_DISPATCH)
unset (WORKAROUND_DISABLE_SSL)
unset (WORKAROUND_USE_ICU)
unset (WORKAROUND_ENABLE_TRIM_TRAILING_NULL)
unset (WORKAROUND_ENABLE_DEFINE_SQLBindParam)
unset (WORKAROUND_ENABLE_NO_UNICODE_CHARS_IN_COLUMN_NAMES_IN_TESTS)
Expand All @@ -48,6 +54,8 @@ unset (WORKAROUND_ENABLE_NO_UNICODE_CHARS_IN_COLUMN_NAMES_IN_TESTS)
add_library (${libname}-impl STATIC
utils/type_parser.cpp
utils/type_info.cpp
utils/unicode_converter.cpp
utils/conversion_context.cpp

config/config.cpp

Expand Down Expand Up @@ -75,8 +83,16 @@ add_library (${libname}-impl STATIC
platform/platform.h

utils/utils.h
utils/unicode_conv.h
utils/iostream_debug_helpers.h
utils/amortized_istream_reader.h
utils/resize_without_initialization.h
utils/object_pool.h
utils/string_pool.h
utils/unicode_converter.h
utils/conversion_context.h
utils/conversion.h
utils/conversion_std.h
utils/conversion_icu.h
utils/type_parser.h
utils/type_info.h

Expand Down Expand Up @@ -154,6 +170,12 @@ target_link_libraries (${libname}-impl
PUBLIC Threads::Threads
)

if (CH_ODBC_USE_ICU)
target_link_libraries (${libname}-impl
PUBLIC ICU::uc
)
endif ()

add_library (${libname} SHARED
api/odbc.cpp
${WIN_SOURCES}
Expand Down
2 changes: 1 addition & 1 deletion driver/api/odbc.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -478,7 +478,7 @@ SQLRETURN SQL_API EXPORTED_FUNCTION_MAYBE_W(SQLDriverConnect)(
if (StringLength1 > 0)
out_buffer_length = StringLength1;
else if (StringLength1 == SQL_NTS)
out_buffer_length = NTSStringLength(InConnectionString) + 1; // +1 for null terminating character
out_buffer_length = stringBufferLength(InConnectionString);
else
out_buffer_length = 1024; // ...as per SQLDriverConnect() doc: "Applications should allocate at least 1,024 characters for this buffer."
}
Expand Down
2 changes: 2 additions & 0 deletions driver/config/config.h
Original file line number Diff line number Diff line change
@@ -1,8 +1,10 @@
#pragma once

#include "driver/platform/platform.h"
#include "driver/utils/utils.h"

#include <string>
#include <map>

using key_value_map_t = std::map<std::string, std::string, UTF8CaseInsensitiveCompare>;

Expand Down
1 change: 1 addition & 0 deletions driver/format/ODBCDriver2.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
#include "driver/format/ODBCDriver2.h"
#include "driver/utils/resize_without_initialization.h"

ODBCDriver2ResultSet::ODBCDriver2ResultSet(AmortizedIStreamReader & stream, std::unique_ptr<ResultMutator> && mutator)
: ResultSet(stream, std::move(mutator))
Expand Down
1 change: 1 addition & 0 deletions driver/format/RowBinaryWithNamesAndTypes.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
#include "driver/format/RowBinaryWithNamesAndTypes.h"
#include "driver/utils/resize_without_initialization.h"

#include <ctime>

Expand Down
2 changes: 2 additions & 0 deletions driver/platform/config_cmake.h.in
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@
// See the usage of each workaround in the code, for comments.
#cmakedefine WORKAROUND_ALLOW_UNSAFE_DISPATCH
#cmakedefine WORKAROUND_DISABLE_SSL
#cmakedefine WORKAROUND_USE_ICU
#cmakedefine WORKAROUND_ICU_USE_EXPLICIT_PIVOTING
#cmakedefine WORKAROUND_ENABLE_TRIM_TRAILING_NULL
#cmakedefine WORKAROUND_ENABLE_DEFINE_SQLBindParam
#cmakedefine WORKAROUND_ENABLE_NO_UNICODE_CHARS_IN_COLUMN_NAMES_IN_TESTS
1 change: 1 addition & 0 deletions driver/platform/win/setup.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
#include "driver/platform/platform.h"
#include "driver/platform/win/resource.h"
#include "driver/utils/utils.h"
#include "driver/utils/conversion.h"
#include "driver/config/config.h"
#include "driver/config/ini_defines.h"

Expand Down
2 changes: 1 addition & 1 deletion driver/result_set.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -304,7 +304,7 @@ std::unique_ptr<ResultReader> make_result_reader(const std::string & format, std
return std::make_unique<ODBCDriver2ResultReader>(raw_stream, std::move(mutator));
}
else if (format == "RowBinaryWithNamesAndTypes") {
if (!is_little_endian())
if (!isLittleEndian())
throw std::runtime_error("'" + format + "' format is supported only on little-endian platforms");

return std::make_unique<RowBinaryWithNamesAndTypesResultReader>(raw_stream, std::move(mutator));
Expand Down
Loading