Alternative wire protocols support (RowBinaryWithNamesAndTypes) #255

traceon · 2020-01-20T21:27:07Z

Closes #242.
Prepares some parts for #235.

Modified the CMake/CTest config to accept a list of DSNs, each of which is used with all tests, instead of a fixed pair of ANSI and Unicode DSNs (TEST_DSN_LIST instead of TEST_DSN and TEST_DSN_W)
Added a DSN with the ANSI driver and the RowBinaryWithNamesAndTypes format
Fixed NDEBUG and BUILD_TYPE_* macro setting
Removed driver/utils/read_helpers.{h,cpp} (functionality moved to ODBCDriver2ResultSet class)
Removed driver/utils/scope_guard.h (unused)
Moved driver/type_info.cpp to driver/utils/type_info.cpp
Rewrote the result set implementation and interface, and their usage in SQLGetData/SQLFetch/SQLFetchScroll
Factored out format parsing into ODBCDriver2ResultSet (old behavior), added new RowBinaryWithNamesAndTypesResultSet
Removed unused variables representing DSN entries
Renamed some variables representing DSN entries
Rearranged the code that processes DSN entry variables so that it follows the same order everywhere
Introduced a type id, to dispatch type-related actions effectively
Modified TypeParser to support extracting precision and scale info from DecimalXYZ types
Added thin DataSourceType<TypeID> wrapper classes that store values, to make defining overloads and dispatching easier
Fully revisited the type conversion code, added new combinations, and generalized/factored out buffer filling during conversion from the core conversion definitions
Unified the field/row representation so that it suits both formats (was std::string, now std::variant<all DataSourceType<TypeID>s...>)
Moved type/value related utility functions (like fillOutputBuffer(), etc.) from driver/utils/utils.h to driver/utils/type_info.h
Using a string/vector object pool in the result set implementation to avoid reallocations
Added several performance tests
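The unified representation described in the list above (one std::variant whose alternatives are DataSourceType<TypeID> wrappers, so that overloads can be picked per type) can be sketched roughly as follows. This is a simplified illustration under assumptions, not the driver's code: the three enum members, the data member name, and the describe() helper are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical subset of type ids; the real enum covers many more types.
enum class DataSourceTypeId { Int32, Float64, String };

// Primary template left undefined: every supported id gets its own specialization.
template <DataSourceTypeId Id>
struct DataSourceType;

template <>
struct DataSourceType<DataSourceTypeId::Int32> { std::int32_t data = 0; };

template <>
struct DataSourceType<DataSourceTypeId::Float64> { double data = 0.0; };

template <>
struct DataSourceType<DataSourceTypeId::String> { std::string data; };

// One field of a row: holds exactly one wrapper, regardless of the wire format it came from.
using Field = std::variant<
    DataSourceType<DataSourceTypeId::Int32>,
    DataSourceType<DataSourceTypeId::Float64>,
    DataSourceType<DataSourceTypeId::String>
>;

// Overloads selected at compile time by the wrapper type (hypothetical helper).
inline std::string describe(const DataSourceType<DataSourceTypeId::Int32> & v) { return "Int32:" + std::to_string(v.data); }
inline std::string describe(const DataSourceType<DataSourceTypeId::Float64> &) { return "Float64"; }
inline std::string describe(const DataSourceType<DataSourceTypeId::String> & v) { return "String:" + v.data; }

// std::visit calls the overload matching whichever alternative the field currently holds.
inline std::string describe(const Field & field) {
    return std::visit([](const auto & value) { return describe(value); }, field);
}
```

The point of the wrappers is that two ids backed by the same C++ type would still be distinct alternatives and distinct overload targets.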

traceon · 2020-01-23T11:25:06Z

For the record, here is the average output of the Release build of clickhouse-odbc-client-it 'ClickHouse DSN (ANSI)' --gtest_filter='PerformanceTest.*' with the default ODBCDriver2 format, on a reference machine at this point, with Threading = 0 in the driver's section of odbcinst.ini:

Note: Google Test filter = PerformanceTest.*
[==========] Running 9 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 9 tests from PerformanceTest
[ RUN      ] PerformanceTest.UnimplementedAPICallOverhead
Executed in:
	0.757812025 seconds
[       OK ] PerformanceTest.UnimplementedAPICallOverhead (803 ms)
[ RUN      ] PerformanceTest.NoOpAPICallOverhead
Executed in:
	0.056845678 seconds
[       OK ] PerformanceTest.NoOpAPICallOverhead (64 ms)
[ RUN      ] PerformanceTest.FetchNoExtractMultiType
Executing query:
	SELECT CAST('some not very long text', 'String') AS col1, CAST('12345', 'Int') AS col2, CAST('12.345', 'Float32') AS col3, CAST('-123.456789012345678', 'Float64') AS col4 FROM numbers(10000000)
Executed in:
	3.403966398 seconds
[       OK ] PerformanceTest.FetchNoExtractMultiType (3416 ms)
[ RUN      ] PerformanceTest.FetchGetDataMultiType
Executing query:
	SELECT CAST('some not very long text', 'String') AS col1, CAST('12345', 'Int') AS col2, CAST('12.345', 'Float32') AS col3, CAST('-123.456789012345678', 'Float64') AS col4 FROM numbers(10000000)
Executed in:
	10.668867184 seconds
[       OK ] PerformanceTest.FetchGetDataMultiType (10677 ms)
[ RUN      ] PerformanceTest.FetchBindColMultiType
Executing query:
	SELECT CAST('some not very long text', 'String') AS col1, CAST('12345', 'Int') AS col2, CAST('12.345', 'Float32') AS col3, CAST('-123.456789012345678', 'Float64') AS col4 FROM numbers(10000000)
Executed in:
	9.367474781 seconds
[       OK ] PerformanceTest.FetchBindColMultiType (9375 ms)
[ RUN      ] PerformanceTest.FetchBindColSingleType_ANSI_String
Executing query:
	SELECT CAST('some not very long text', 'String') AS col FROM numbers(10000000)
Executed in:
	2.974570919 seconds
[       OK ] PerformanceTest.FetchBindColSingleType_ANSI_String (2979 ms)
[ RUN      ] PerformanceTest.FetchBindColSingleType_Unicode_String
Executing query:
	SELECT CAST('some not very long text', 'String') AS col FROM numbers(10000000)
Executed in:
	4.030959639 seconds
[       OK ] PerformanceTest.FetchBindColSingleType_Unicode_String (4036 ms)
[ RUN      ] PerformanceTest.FetchBindColSingleType_Int
Executing query:
	SELECT CAST('12345', 'Int') AS col FROM numbers(10000000)
Executed in:
	2.441048792 seconds
[       OK ] PerformanceTest.FetchBindColSingleType_Int (2447 ms)
[ RUN      ] PerformanceTest.FetchBindColSingleType_Float64
Executing query:
	SELECT CAST('-123.456789012345678', 'Float64') AS col FROM numbers(10000000)
Executed in:
	3.783190085 seconds
[       OK ] PerformanceTest.FetchBindColSingleType_Float64 (3789 ms)
[----------] 9 tests from PerformanceTest (37587 ms total)

[----------] Global test environment tear-down
[==========] 9 tests from 1 test suite ran. (37587 ms total)
[  PASSED  ] 9 tests.
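Each query above reads 10,000,000 rows (numbers(10000000)), so throughput and per-row latency follow directly from the reported times. A small sketch of that arithmetic; the Measurement struct and measure() are illustrative names, not the test suite's reporting code.

```cpp
#include <cstdint>

// Derived figures for one timed run (illustrative names).
struct Measurement {
    double rows_per_second;
    double nanoseconds_per_row;
};

// Converts a row count and elapsed wall-clock seconds into throughput and average latency.
inline Measurement measure(std::uint64_t rows, double seconds) {
    return Measurement{
        static_cast<double>(rows) / seconds,
        seconds * 1e9 / static_cast<double>(rows)
    };
}
```

For example, measure(10000000, 3.403966398) for FetchNoExtractMultiType gives about 2.94 million rows per second and roughly 340 ns per row, while FetchGetDataMultiType (10.668867184 s) comes to about 0.94 million rows per second, roughly 1067 ns per row.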

* switch-to-variant:
  Do not attempt to install already installed packages in brew
  No more Python 2 in brew, using Python (3)
  Fix description for CentOS 7
  Roll back poco submodule
  Define BUILD_TYPE_*
  Enable perf tests only if BUILD_TYPE_Release is defined
  Bump submodules
  Update LICENSE
  Report iteration count, throughput, and latency in perf test measurements
  Rename CH_ODBC_ENABLE_SAFE_DISPATCH_ONLY to CH_ODBC_ALLOW_UNSAFE_DISPATCH
  Rename WORKAROUND_ENABLE_SAFE_DISPATCH_ONLY to WORKAROUND_ALLOW_UNSAFE_DISPATCH
  Rename WORKAROUND_ENABLE_SSL to WORKAROUND_DISABLE_SSL
  Add comments
  Fix getObjectHandleType() usage
  Remove general case implementations for getObjectHandleType() and getObjectTypeName()
  Fix int -> string attribute value extraction checks
  Fix fromString()

# Conflicts:
#   .travis.yml
#   driver/connection.cpp
#   driver/test/performance_it.cpp
#   driver/utils/utils.h

Enmk · 2020-02-25T05:32:45Z

driver/utils/type_info.h

+} // namespace value_manip
+
+template <typename T>
+struct SimpleTypeWrapper {


Is there any specific reason you need this wrapper? Why not just a type alias?

To define a to_null()'ing c-tor, and to reuse code in general. Later on, it is used as the base for most of the DataSourceType<> classes, which in turn are used for better control over types and their static dispatch/overloads.
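A minimal sketch of what such a wrapper might look like, assuming "to_null()'ing c-tor" means a default constructor that value-initializes (zeroes) the stored value. The value member name, the two DataSourceTypeId members, and kindOf() are illustrative assumptions, not the PR's definitions. Unlike a type alias, each wrapper is a distinct type, so two DataSourceType<> classes backed by the same C++ type (here ClickHouse's Date, stored as a UInt16 day count, and a plain UInt16) can still get separate overloads.

```cpp
#include <cstdint>
#include <type_traits>

template <typename T>
struct SimpleTypeWrapper {
    // "Null" constructor: value-initializes the payload (0 for arithmetic types).
    SimpleTypeWrapper() : value() {}
    explicit SimpleTypeWrapper(const T & v) : value(v) {}

    T value;
};

// Hypothetical subset of type ids.
enum class DataSourceTypeId { Date, UInt16 };

template <DataSourceTypeId Id>
struct DataSourceType;

// Both store a std::uint16_t (days since epoch vs. a plain number)...
template <>
struct DataSourceType<DataSourceTypeId::Date> : SimpleTypeWrapper<std::uint16_t> {
    using SimpleTypeWrapper::SimpleTypeWrapper;
};

template <>
struct DataSourceType<DataSourceTypeId::UInt16> : SimpleTypeWrapper<std::uint16_t> {
    using SimpleTypeWrapper::SimpleTypeWrapper;
};

// ...yet they stay distinct types. With plain aliases of std::uint16_t these overloads would collide.
inline const char * kindOf(const DataSourceType<DataSourceTypeId::Date> &) { return "Date"; }
inline const char * kindOf(const DataSourceType<DataSourceTypeId::UInt16> &) { return "UInt16"; }

static_assert(!std::is_same_v<DataSourceType<DataSourceTypeId::Date>, DataSourceType<DataSourceTypeId::UInt16>>);
```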

Enmk · 2020-02-25T05:34:23Z

driver/utils/type_info.h

+    std::string sql_type_name;
+    bool is_unsigned;
+    SQLSMALLINT sql_type;
+    int32_t column_size;


Could you please add comments explaining the difference between column_size and octet_length?

These are standard ODBC concepts, described at length here: https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/column-size-decimal-digits-transfer-octet-length-and-display-size

Ok, please add this link as a comment
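As a rough illustration of that distinction (and of the kind of comment being requested): column size counts digits or characters, while transfer octet length counts the bytes of the value's default C representation. The values below are the ones the linked appendix gives for these types; SqlKind, SizeInfo and sizeInfo() are hypothetical stand-ins used so the sketch compiles without the ODBC headers.

```cpp
#include <cstdint>
#include <stdexcept>

// Column size vs. octet length, per
// https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/column-size-decimal-digits-transfer-octet-length-and-display-size
//  - column_size:  how many digits/characters the SQL type can hold.
//  - octet_length: how many bytes are transferred for the value in its default C representation.

// Hypothetical stand-in for a few SQL_* type codes, to avoid depending on <sql.h>.
enum class SqlKind { SmallInt, Integer, Real, Double, Decimal, TypeDate };

struct SizeInfo {
    std::int32_t column_size;
    std::int32_t octet_length;
};

// precision is consulted only for Decimal; the Decimal octet length assumes an ANSI character set.
inline SizeInfo sizeInfo(SqlKind kind, std::int32_t precision = 0) {
    switch (kind) {
        case SqlKind::SmallInt: return {5, 2};                       // up to 5 digits, 2-byte integer
        case SqlKind::Integer:  return {10, 4};                      // up to 10 digits, 4-byte integer
        case SqlKind::Real:     return {7, 4};                       // 7 digits of precision, 4-byte float
        case SqlKind::Double:   return {15, 8};                      // 15 digits of precision, 8-byte double
        case SqlKind::Decimal:  return {precision, precision + 2};   // digits, plus sign and decimal point as text
        case SqlKind::TypeDate: return {10, 6};                      // "yyyy-mm-dd" vs. sizeof(SQL_DATE_STRUCT)
    }
    throw std::invalid_argument("unknown SqlKind");
}
```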

Enmk · 2020-02-25T06:27:54Z

driver/format/RowBinaryWithNamesAndTypes.h

+#include "driver/platform/platform.h"
+#include "driver/result_set.h"
+
+class RowBinaryWithNamesAndTypesResultSet


Could you please add a comment explaining the sole purpose of this ResultSet subclass and how it differs from any other ResultSet?
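For context on what sets this subclass apart: in ClickHouse's RowBinaryWithNamesAndTypes format, rows are binary-encoded and preceded by a header consisting of a LEB128-encoded column count N, then N column names, then N column type names, where every string is a LEB128 length followed by raw bytes. The old ODBCDriver2 path, per the PR description, kept every field as a std::string. A minimal sketch of decoding that header from an in-memory buffer follows; ByteReader, ColumnHeader and readHeader() are hypothetical names, not the driver's API, and a real reader would consume a stream incrementally.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

struct ColumnHeader {
    std::string name;
    std::string type;
};

// Hypothetical reader over a byte buffer; the buffer must outlive the reader.
class ByteReader {
public:
    explicit ByteReader(const std::vector<std::uint8_t> & bytes) : bytes_(bytes) {}

    // Unsigned LEB128: 7 payload bits per byte, a set high bit means "more bytes follow".
    std::uint64_t readVarUInt() {
        std::uint64_t result = 0;
        for (unsigned shift = 0; shift < 64; shift += 7) {
            const std::uint8_t byte = next();
            result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
            if ((byte & 0x80) == 0)
                return result;
        }
        throw std::runtime_error("VarUInt is too long");
    }

    // RowBinary String: VarUInt length, then that many raw bytes.
    std::string readString() {
        const std::uint64_t size = readVarUInt();
        if (size > bytes_.size() - pos_)
            throw std::runtime_error("unexpected end of data");
        std::string result(reinterpret_cast<const char *>(bytes_.data() + pos_), size);
        pos_ += size;
        return result;
    }

private:
    std::uint8_t next() {
        if (pos_ >= bytes_.size())
            throw std::runtime_error("unexpected end of data");
        return bytes_[pos_++];
    }

    const std::vector<std::uint8_t> & bytes_;
    std::size_t pos_ = 0;
};

// Header layout: N, then all N names, then all N type names.
inline std::vector<ColumnHeader> readHeader(ByteReader & in) {
    const std::uint64_t count = in.readVarUInt();
    std::vector<ColumnHeader> columns(count);
    for (auto & column : columns)
        column.name = in.readString();
    for (auto & column : columns)
        column.type = in.readString();
    return columns;
}
```

The type names read here are what a result set would then hand to a type parser to pick the matching binary decoder for each column.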

Enmk · 2020-02-25T07:03:43Z

driver/format/RowBinaryWithNamesAndTypes.h

+        dest.data = std::move(value);
+    }
+
+    void readValue(DataSourceType< DataSourceTypeId::Date        > & dest, ColumnInfo & column_info);


IMO DataSourceType< DataSourceTypeId::Date > is a bit verbose; would you consider adding type aliases like DataSourceTypeDate, etc.?

I am not a big fan of aliases. You need to decompose those aliases in your mind each time you meet them, and hold more "non-uniform" pieces of data in your head, which uses up your working memory.

I.e., while this may look like more raw data (in terms of characters):

DataSourceType< DataSourceTypeId::DateTime >
DataSourceType< DataSourceTypeId::Decimal >
DataSourceType< DataSourceTypeId::Int64 >

this is more data in terms of unique slots in your memory:

DataSourceTypeDateTime
DataSourceTypeDecimal
DataSourceTypeInt64

(but the latter is better than using some random names, obviously.)

In other words, there is a hidden structure in this naming: DataSourceTypeDateTime.

Whereas that structure is explicit in: DataSourceType< DataSourceTypeId::DateTime >
so you don't need to do extra work.

IMO, right now it looks "guts out": the 'hidden' struct here is a mere implementation detail.

Oh, it's not an underlying struct, it's a type id enum value. A very "facing out" thing.
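To make the trade-off concrete: an alias like the one proposed would name exactly the same type, so the choice is purely about readability. The spelled-out form keeps the DataSourceTypeId value visible, which is also what generic code parameterized on the id has to deal with anyway. A small sketch; the enum members, the id member and isDateLike() are hypothetical.

```cpp
#include <type_traits>

enum class DataSourceTypeId { Int64, Date, DateTime };

template <DataSourceTypeId Id>
struct DataSourceType {
    static constexpr DataSourceTypeId id = Id;
};

// The alias proposed in review: a second spelling of the very same type.
using DataSourceTypeDate = DataSourceType<DataSourceTypeId::Date>;
static_assert(std::is_same_v<DataSourceTypeDate, DataSourceType<DataSourceTypeId::Date>>);

// Generic code is written in terms of the enum value either way.
template <DataSourceTypeId Id>
constexpr bool isDateLike(const DataSourceType<Id> &) {
    return Id == DataSourceTypeId::Date || Id == DataSourceTypeId::DateTime;
}
```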

Enmk · 2020-02-25T07:37:00Z

driver/utils/type_info.h

+    void convert_via_proxy(const SourceType & src, DestinationType & dest);
+
+    template <typename SourceType>
+    struct from_value {


IMHO, from_value<X>::to_value<Y>::convert() is a bit of an overcomplication (both in the calling code and in the implementation); could that be a set of convertFromValue(const X & x, Y & y) function overloads?
Also, that would allow moving all that code into a .cpp file.

While evolving this code, I went through different configurations, including the one you mention. Having such different ways of calling the conversion code was just too non-uniform and hard to maintain at that time. Things become more complicated when you try to do SFINAE or partial specializations. So I picked this representation, because it is actually the only maintainable one.

This can be done in a simpler manner: https://godbolt.org/z/HZWv6M

Maybe... if it could be applied to the entire code, including the partial specialization cases.
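For readers following the debate, here is a rough side-by-side sketch of the two calling conventions being compared: the nested from_value<Source>::to_value<Destination>::convert() style used in the PR, and free convertFromValue() overloads as the reviewer proposes. It is a deliberately simplified illustration (two conversions, one SFINAE constraint) under assumptions, and it is not the code behind the godbolt link above.

```cpp
#include <string>
#include <type_traits>

// Style 1: nested class templates; the default conversion is a static_cast.
template <typename SourceType>
struct from_value {
    template <typename DestinationType>
    struct to_value {
        static void convert(const SourceType & src, DestinationType & dest) {
            dest = static_cast<DestinationType>(src);
        }
    };
};

// A specific (source, destination) pair is customized by explicitly specializing both levels.
template <>
template <>
struct from_value<int>::to_value<std::string> {
    static void convert(const int & src, std::string & dest) {
        dest = std::to_string(src);
    }
};

// Style 2: plain overloads; the generic case is constrained to arithmetic pairs via SFINAE.
template <typename SourceType, typename DestinationType>
std::enable_if_t<std::is_arithmetic_v<SourceType> && std::is_arithmetic_v<DestinationType>>
convertFromValue(const SourceType & src, DestinationType & dest) {
    dest = static_cast<DestinationType>(src);
}

inline void convertFromValue(const int & src, std::string & dest) {
    dest = std::to_string(src);
}
```

Both styles produce the same results for these cases; the disagreement in the thread is about how each scales once many partial specializations and defaults are involved.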

Enmk · 2020-02-25T07:39:49Z

driver/utils/type_info.h

+    };
+
+    template <typename DestinationType>
+    struct to_buffer {


The same applies to to_buffer<X>::from_value<Y>::convert; maybe convertFromValueToBuffer(const Y & y, X & x)?

Also, to_buffer breaks the pattern, going from:

from_SOMETHING<X>::to_SOMETHING_ELSE<Y>::convert(x, y)
to the opposite:
to_SOMETHING<X>::from_SOMETHING_ELSE<Y>::convert(y, x)

and that complicates things even further.

That would severely affect maintainability, as mentioned earlier.

How does it break the pattern?

to_value <OfType>
from_value <OfType>
to_buffer <OfType>
from_buffer <OfType>

OK, so to_smth and from_smth are defined and used differently, according to the way the conversions are used, and that allows defining some default conversions that cover more cases.

It interrupts the flow: most of the code here converts from the left type to the right type, and to_buffer is an exception. What I am trying to say is that this is already complicated enough; there is no need to make it even harder to grasp.

It's all about compromises. Preserving the to_* -> from_* structure may result in more actual code that does pretty much the same thing.
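One possible reading of that compromise, sketched under assumptions: because to_buffer is keyed on the destination first, a single specialization of the outer template for one buffer kind can provide a generic inner from_value<S> that covers every source type at once, whereas a from_value<S>::to_buffer<D> layout would need that case repeated under each source. CharBuffer and write() are hypothetical stand-ins for an application-provided character buffer, not the driver's types.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical output buffer descriptor (stands in for an ODBC SQL_C_CHAR buffer).
struct CharBuffer {
    char * data;
    std::size_t capacity;   // bytes available, including the terminating NUL
    std::size_t written;    // bytes written, excluding the NUL
};

template <typename DestinationType>
struct to_buffer;   // primary template: only specific buffer kinds are supported

// One specialization per destination gives a default path for every source type.
template <>
struct to_buffer<CharBuffer> {
    // Truncates to fit and always NUL-terminates; assumes capacity >= 1.
    static void write(const std::string & text, CharBuffer & dest) {
        const std::size_t n = std::min(text.size(), dest.capacity - 1);
        std::memcpy(dest.data, text.data(), n);
        dest.data[n] = '\0';
        dest.written = n;
    }

    template <typename SourceType>
    struct from_value {
        static void convert(const SourceType & src, CharBuffer & dest) {
            if constexpr (std::is_same_v<SourceType, std::string>)
                write(src, dest);
            else
                write(std::to_string(src), dest);   // arithmetic sources
        }
    };
};
```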

Enmk · 2020-02-25T07:40:43Z

driver/utils/type_info.h

+    };
+
+    template <typename SourceType>
+    struct from_buffer {


from_buffer<X>::to_value<Y>::convert() => convertFromBufferToValue(const X & x, Y & y)?

That would severely affect maintainability, as mentioned earlier.

Enmk · 2020-02-25T07:57:17Z

driver/utils/utils.h

@@ -126,6 +110,37 @@ struct UTF8CaseInsensitiveCompare {
    }
 };

+template<typename T>
+class ObjectPool {


Could you please add a comment describing what this pool is for, and document that it behaves in an LRU fashion, dropping old objects after reaching a certain size?
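A minimal sketch of the kind of pool described here, assuming a fixed maximum size: released objects are kept for reuse so that their allocated capacity (e.g. a std::string's buffer) survives between rows, and once the pool exceeds its limit the oldest retained object is dropped. The class name matches the PR's ObjectPool, but the get()/put() interface and the max_size parameter are assumptions, not its actual API.

```cpp
#include <cstddef>
#include <deque>
#include <utility>

// Keeps up to max_size released objects for reuse. put() appends the newest object;
// when the limit is exceeded, the oldest one (at the front) is discarded, LRU-style.
template <typename T>
class ObjectPool {
public:
    explicit ObjectPool(std::size_t max_size) : max_size_(max_size) {}

    // Returns the most recently released object if any, otherwise a default-constructed one.
    // Recycled objects keep their previous contents; callers are expected to reset them.
    T get() {
        if (cache_.empty())
            return T{};
        T obj = std::move(cache_.back());
        cache_.pop_back();
        return obj;
    }

    // Hands an object back to the pool, evicting the oldest one if over capacity.
    void put(T && obj) {
        cache_.push_back(std::move(obj));
        if (cache_.size() > max_size_)
            cache_.pop_front();
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::size_t max_size_;
    std::deque<T> cache_;
};
```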

Enmk

I have a few questions regarding this PR.

Also (if possible), please make sure that type_info.h is detected as moved in git. That might also help GitHub detect the move, and hence greatly simplify the review process.

Another issue: type_info.h is basically type info + data types + type traits + conversion functions + fillOutput functions smashed together into one huge lump of code, too big to be digestible by anyone but you right now (and I am afraid that in two months' time even you won't be able to promptly reason about what is going on here). Please split it into meaningful pieces.

traceon · 2020-02-25T08:13:28Z

> Also (if possible), please make sure that type_info.h is detected as moved in git. That might also help GitHub detect the move, and hence greatly simplify the review process.

The only way I know of to achieve this is to first move the file, and then make the changes in a separate commit. It cannot be done retrospectively.

Enmk · 2020-02-25T08:39:41Z

Another thing: please point me to the location of the tests for ResultSet parsing (both ODBCDriver2 and RowBinaryWithNamesAndTypes formats).

Enmk · 2020-02-25T08:42:33Z

driver/utils/type_info.h

+    struct from_value<std::string>::to_value<std::string> {
+        using DestinationType = std::string;
+
+        static inline void convert(const SourceType & src, DestinationType & dest) {


Does it make sense to add an extra overload here?

static inline void convert(SourceType & src, DestinationType & dest) { dest = std::move(src); }

This particular conversion is not used on a critical path. Also, managing storage in convert()-type functions proved to be hell.
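The later commit "Add convert()'s that accept moving-into source values" in this PR suggests such overloads were added after all. A minimal sketch of what the pair might look like; taking SourceType && (rather than the non-const SourceType & from the suggestion above) means only temporaries or explicitly std::move'd values give up their storage. The struct layout is simplified and assumed, not copied from the PR.

```cpp
#include <string>
#include <utility>

template <typename SourceType>
struct from_value {
    template <typename DestinationType>
    struct to_value;
};

template <>
template <>
struct from_value<std::string>::to_value<std::string> {
    using SourceType = std::string;
    using DestinationType = std::string;

    // Copying overload: chosen for lvalues, src is left untouched.
    static inline void convert(const SourceType & src, DestinationType & dest) {
        dest = src;
    }

    // Moving overload: chosen for rvalues, lets dest take over src's buffer instead of copying.
    static inline void convert(SourceType && src, DestinationType & dest) {
        dest = std::move(src);
    }
};
```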

traceon · 2020-02-25T08:48:27Z

> Another thing: please point me to the location of the tests for ResultSet parsing (both ODBCDriver2 and RowBinaryWithNamesAndTypes formats).

This is tested by virtually all integration tests, provided the DSN is configured to use the corresponding wire format.
In the Travis jobs, the default ODBCDriver2 format is used for the regular DSNs. For the 'ClickHouse DSN (ANSI, RBWNAT)' DSN, the RowBinaryWithNamesAndTypes format is used. In the Travis runs, all tests whose names end with -dsn-2 use the RowBinaryWithNamesAndTypesResultSet processing code.

Enmk · 2020-02-25T10:01:52Z

driver/format/RowBinaryWithNamesAndTypes.cpp

+    dest.value.day = tm.tm_mday;
+}
+
+void RowBinaryWithNamesAndTypesResultSet::readValue(DataSourceType<DataSourceTypeId::DateTime> & dest, ColumnInfo & column_info) {


Please validate that the Date and DateTime values returned by the ODBC driver are equal to the values returned by the ClickHouse client when the local timezone differs from the server's timezone.
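Background for that check, as a sketch: in ClickHouse's RowBinary encoding a Date is a UInt16 number of days since 1970-01-01 and a DateTime is a UInt32 Unix timestamp. Turning a day count into a calendar date needs no timezone, but turning a timestamp into year/month/day and hour:minute:second does, which is where a client and a server in different timezones can disagree. The sketch uses Howard Hinnant's well-known days-to-civil algorithm; the utc_offset_seconds parameter is a simplifying assumption (a fixed offset, no DST rules), and none of the names are the driver's.

```cpp
#include <cstdint>

struct CivilDateTime {
    int year;
    unsigned month;   // 1..12
    unsigned day;     // 1..31
    unsigned hour, minute, second;
};

// Days since 1970-01-01 -> proleptic Gregorian y/m/d (Howard Hinnant's civil_from_days).
inline void civilFromDays(std::int64_t z, int & y, unsigned & m, unsigned & d) {
    z += 719468;                                                                  // shift epoch to 0000-03-01
    const std::int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);                 // [0, 146096]
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;   // [0, 399]
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                 // [0, 365]
    const unsigned mp = (5 * doy + 2) / 153;                                      // [0, 11], March-based
    d = doy - (153 * mp + 2) / 5 + 1;
    m = mp < 10 ? mp + 3 : mp - 9;
    y = static_cast<int>(static_cast<std::int64_t>(yoe) + era * 400 + (m <= 2 ? 1 : 0));
}

// RowBinary Date: timezone-independent.
inline CivilDateTime decodeDate(std::uint16_t days) {
    CivilDateTime out{0, 0, 0, 0, 0, 0};
    civilFromDays(days, out.year, out.month, out.day);
    return out;
}

// RowBinary DateTime: a UTC instant; the wall-clock result depends on the (assumed fixed) offset applied.
inline CivilDateTime decodeDateTime(std::uint32_t unix_seconds, std::int32_t utc_offset_seconds) {
    const std::int64_t local = static_cast<std::int64_t>(unix_seconds) + utc_offset_seconds;
    std::int64_t days = local / 86400;
    std::int64_t secs = local % 86400;
    if (secs < 0) { secs += 86400; days -= 1; }   // floor division for pre-epoch local times
    CivilDateTime out{0, 0, 0, 0, 0, 0};
    civilFromDays(days, out.year, out.month, out.day);
    out.hour = static_cast<unsigned>(secs / 3600);
    out.minute = static_cast<unsigned>(secs % 3600 / 60);
    out.second = static_cast<unsigned>(secs % 60);
    return out;
}
```

With this split, decodeDate(18321) is 2020-02-29 everywhere, while decodeDateTime(0, offset) lands on 1970-01-01 03:00:00 at UTC+3 but on 1969-12-31 23:00:00 at UTC-1.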

Enmk · 2020-02-27T05:29:15Z

driver/utils/type_info.h

+                    dest = src;
+                }
+                else {
+                    convert_via_proxy<std::string>(src, dest);


This, and every other convert_via_proxy<std::string>, looks spooky and very sub-optimal. Luckily, these branches are never hit (traceon@cf58fff), so they can be safely removed.

Enmk

Ok, please make sure to address #269 after merging

traceon · 2020-02-28T05:24:47Z

Yes, sure. Thank you

traceon added 8 commits January 21, 2020 02:54

Remove unused keys

301cf50

Fix fromString()

f3529dd

Fix int -> string attribute value extraction checks

0ee121d

Rearrange parameters for better browsing

daa2452

Add default_format parameter

Fix renamed parameter usage

4f4a563

Use std::make_unique instead of bare "new"

Pass detected or anticipated format name to ResultSet c-tor

91bd8d0

Use std::make_unique instead of bare "new"

Fix icompare() typo

a6c03ff

Add more test cases

8ec3763

Refine existing cases

traceon added 2 commits February 4, 2020 21:11

Fix description for CentOS 7

b9fd612

Deep refactoring of data conversion and result set processing

8a9b188

traceon force-pushed the alternative-wire-protocol branch from d9d97a2 to 8a9b188 February 15, 2020 20:33

traceon added 13 commits February 16, 2020 01:32

GCC compilation fix/workaround: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85282

8a0ad2f

No more Python 2 in brew, using Python (3)

24a1387

Fix 'long' and 'unsigned long' conversion specialization conflicts on some platforms

9cfd946

Bump submodules

40f6405

Fix STOP_MEASURING_TIME_AND_REPORT macro usage in new code

8bcc22f

Fix flag setting

8962392

Fix macro name

b811cdb

Basic RowBinary deserialization

afe4e4c

UUID RowBinary deserialization

5be48b2

Store precision and scale of decimal type specs in column info, and use them during deserialization

8a1061c

Use TEST_DSN_LIST instead of separate TEST_DSN and TEST_DSN_W ("-dsn-{n}" is appended to each test command name)

a714a0a

Modify testing to support arbitrary list of DSNs
Add a DSN in travis testing config, that access server using RowBinaryWithNamesAndTypes and ANSI driver

Fix cmake invocation

85c4163

Fix vs2017 compilation

traceon force-pushed the alternative-wire-protocol branch from afcf399 to 85c4163 February 17, 2020 01:11

traceon added 3 commits February 17, 2020 06:05

Shorten DSN name to avoid problems

448d55a

Remove some cases to stay within common Decimal64 boundaries

1c708be

Make best efforts to provide meaningful display size as early as possible

3355c9d

traceon changed the title from "WIP: Alternative wire protocols support" to "Alternative wire protocols support (RowBinaryWithNamesAndTypesResultSet)" Feb 17, 2020

Enmk reviewed Feb 25, 2020

driver/utils/utils.h (resolved)

Enmk reviewed Feb 25, 2020

driver/format/ODBCDriver2.h (resolved)

Enmk reviewed Feb 25, 2020

Enmk requested changes Feb 25, 2020

Enmk reviewed Feb 25, 2020

traceon added 7 commits February 25, 2020 14:17

Add comments

d8f523c

Update brew before installing packages

758f7f5

Add "URL query string" section

5004eed

Working around weird behavior of brew

eb5cf2d

Try versioned openssl package name

b31a5c3

Set OPENSSL_ROOT_DIR explicitly

654d0d4

Add note about difference in timezone handling between formats

c685014

Enmk reviewed Feb 27, 2020

Add convert()'s that accept moving-into source values

58c068b

Clarify/fix temporaries handling Comments from @Enmk

Enmk mentioned this pull request Feb 27, 2020

Split type_info.h into manageable chunks #269 (open)

Enmk approved these changes Feb 28, 2020

traceon merged commit bd0dbd4 into ClickHouse:master Feb 28, 2020

traceon deleted the alternative-wire-protocol branch February 29, 2020 09:36
