Alternative wire protocols support (RowBinaryWithNamesAndTypes) #255

Merged
merged 39 commits into from
Feb 28, 2020

traceon (Collaborator) commented Jan 20, 2020

Closes #242.
Prepares some parts for #235.

  • Modified the CMake/CTest config to accept a list of DSNs, each of which is used with all tests, instead of separate ANSI and Unicode DSNs (TEST_DSN_LIST instead of TEST_DSN and TEST_DSN_W)
  • Added a DSN that uses the ANSI driver and the RowBinaryWithNamesAndTypes format
  • Fixed the setting of the NDEBUG and BUILD_TYPE_* macros
  • Removed driver/utils/read_helpers.{h,cpp} (functionality moved to the ODBCDriver2ResultSet class)
  • Removed driver/utils/scope_guard.h (not used)
  • Moved driver/type_info.cpp to driver/utils/type_info.cpp
  • Rewrote the result set implementation, its interface, and its usage in SQLGetData/SQLFetch/SQLFetchScroll
  • Factored out format parsing into ODBCDriver2ResultSet (old behavior), and added a new RowBinaryWithNamesAndTypesResultSet
  • Removed unused variables representing DSN entries
  • Renamed some variables representing DSN entries
  • Rearranged the code that processes the DSN-entry variables so that it follows the same order everywhere
  • Introduced a type ID to dispatch type-related actions efficiently
  • Modified TypeParser to support extracting precision and scale info from DecimalXYZ types
  • Added thin DataSourceType<TypeID> wrapper classes that store values, to make it easier to define overloads and dispatch
  • Fully revisited the type conversion code, added new combinations, and generalized/factored out buffer filling during conversion from the core conversion definitions
  • Unified the field/row representation so that it suits both formats (was std::string, now std::variant<all DataSourceType<TypeID>s...>)
  • Moved type/value-related utility functions (like fillOutputBuffer(), etc.) from driver/utils/utils.h to driver/utils/type_info.h
  • Using a string/vector object pool in the result set implementation to avoid reallocations
  • Added several performance tests
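The unified field representation described in the list (one std::variant over all DataSourceType<TypeID> alternatives, dispatched statically) can be sketched roughly as follows. This is a heavily reduced, hypothetical C++17 illustration with only three type ids; the enum values, the value member, and ToText/fieldToText() are assumptions, not the driver's actual definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical, reduced set of type ids (the real driver has many more).
enum class DataSourceTypeId { Int32, Float64, String };

// One thin value holder per type id; the primary template is intentionally left undefined.
template <DataSourceTypeId Id> struct DataSourceType;
template <> struct DataSourceType<DataSourceTypeId::Int32>   { std::int32_t value = 0; };
template <> struct DataSourceType<DataSourceTypeId::Float64> { double value = 0.0; };
template <> struct DataSourceType<DataSourceTypeId::String>  { std::string value; };

// A field can hold any supported value, whichever wire format it was parsed from.
using Field = std::variant<
    DataSourceType<DataSourceTypeId::Int32>,
    DataSourceType<DataSourceTypeId::Float64>,
    DataSourceType<DataSourceTypeId::String>
>;

// Static dispatch: std::visit calls the overload matching the alternative currently stored.
struct ToText {
    std::string operator()(const DataSourceType<DataSourceTypeId::Int32> & v) const { return std::to_string(v.value); }
    std::string operator()(const DataSourceType<DataSourceTypeId::Float64> & v) const { return std::to_string(v.value); }
    std::string operator()(const DataSourceType<DataSourceTypeId::String> & v) const { return v.value; }
};

inline std::string fieldToText(const Field & field) {
    return std::visit(ToText{}, field);
}
```

Compared with a plain std::string per field, the variant keeps the parsed binary value as-is, and adding a conversion for a new type id means adding one more overload.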

traceon (Collaborator, Author) commented Jan 23, 2020

For the record, here is the average output of the Release build of clickhouse-odbc-client-it 'ClickHouse DSN (ANSI)' --gtest_filter='PerformanceTest.*' (default ODBCDriver2 format) on a reference machine at this point, with Threading = 0 in the driver's section of odbcinst.ini:

Note: Google Test filter = PerformanceTest.*
[==========] Running 9 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 9 tests from PerformanceTest
[ RUN      ] PerformanceTest.UnimplementedAPICallOverhead
Executed in:
	0.757812025 seconds
[       OK ] PerformanceTest.UnimplementedAPICallOverhead (803 ms)
[ RUN      ] PerformanceTest.NoOpAPICallOverhead
Executed in:
	0.056845678 seconds
[       OK ] PerformanceTest.NoOpAPICallOverhead (64 ms)
[ RUN      ] PerformanceTest.FetchNoExtractMultiType
Executing query:
	SELECT CAST('some not very long text', 'String') AS col1, CAST('12345', 'Int') AS col2, CAST('12.345', 'Float32') AS col3, CAST('-123.456789012345678', 'Float64') AS col4 FROM numbers(10000000)
Executed in:
	3.403966398 seconds
[       OK ] PerformanceTest.FetchNoExtractMultiType (3416 ms)
[ RUN      ] PerformanceTest.FetchGetDataMultiType
Executing query:
	SELECT CAST('some not very long text', 'String') AS col1, CAST('12345', 'Int') AS col2, CAST('12.345', 'Float32') AS col3, CAST('-123.456789012345678', 'Float64') AS col4 FROM numbers(10000000)
Executed in:
	10.668867184 seconds
[       OK ] PerformanceTest.FetchGetDataMultiType (10677 ms)
[ RUN      ] PerformanceTest.FetchBindColMultiType
Executing query:
	SELECT CAST('some not very long text', 'String') AS col1, CAST('12345', 'Int') AS col2, CAST('12.345', 'Float32') AS col3, CAST('-123.456789012345678', 'Float64') AS col4 FROM numbers(10000000)
Executed in:
	9.367474781 seconds
[       OK ] PerformanceTest.FetchBindColMultiType (9375 ms)
[ RUN      ] PerformanceTest.FetchBindColSingleType_ANSI_String
Executing query:
	SELECT CAST('some not very long text', 'String') AS col FROM numbers(10000000)
Executed in:
	2.974570919 seconds
[       OK ] PerformanceTest.FetchBindColSingleType_ANSI_String (2979 ms)
[ RUN      ] PerformanceTest.FetchBindColSingleType_Unicode_String
Executing query:
	SELECT CAST('some not very long text', 'String') AS col FROM numbers(10000000)
Executed in:
	4.030959639 seconds
[       OK ] PerformanceTest.FetchBindColSingleType_Unicode_String (4036 ms)
[ RUN      ] PerformanceTest.FetchBindColSingleType_Int
Executing query:
	SELECT CAST('12345', 'Int') AS col FROM numbers(10000000)
Executed in:
	2.441048792 seconds
[       OK ] PerformanceTest.FetchBindColSingleType_Int (2447 ms)
[ RUN      ] PerformanceTest.FetchBindColSingleType_Float64
Executing query:
	SELECT CAST('-123.456789012345678', 'Float64') AS col FROM numbers(10000000)
Executed in:
	3.783190085 seconds
[       OK ] PerformanceTest.FetchBindColSingleType_Float64 (3789 ms)
[----------] 9 tests from PerformanceTest (37587 ms total)

[----------] Global test environment tear-down
[==========] 9 tests from 1 test suite ran. (37587 ms total)
[  PASSED  ] 9 tests.

* switch-to-variant:
  Do not attempt to install already installed packages in brew
  No more Python 2 in brew, using Python (3)
  Fix description for CentOS 7
  Roll back poco submodule
  Define BUILD_TYPE_*
  Enable perf tests only if BUILD_TYPE_Release is defined
  Bump submodules
  Update LICENSE
  Report iteration count, throughput, and latency in perf test measurements
  Rename CH_ODBC_ENABLE_SAFE_DISPATCH_ONLY to CH_ODBC_ALLOW_UNSAFE_DISPATCH
  Rename WORKAROUND_ENABLE_SAFE_DISPATCH_ONLY to WORKAROUND_ALLOW_UNSAFE_DISPATCH
  Rename WORKAROUND_ENABLE_SSL to WORKAROUND_DISABLE_SSL
  Add comments
  Fix getObjectHandleType() usage
  Remove general case implementations for getObjectHandleType() and getObjectTypeName()
  Fix int -> string attribute value extraction checks
  Fix fromString()

# Conflicts:
#	.travis.yml
#	driver/connection.cpp
#	driver/test/performance_it.cpp
#	driver/utils/utils.h
…{n}" is appended to each test command name)

Modify testing to support arbitrary list of DSNs
Add a DSN in travis testing config, that access server using RowBinaryWithNamesAndTypes and ANSI driver
Fix vs2017 compilation
traceon changed the title from "WIP: Alternative wire protocols support" to "Alternative wire protocols support (RowBinaryWithNamesAndTypesResultSet)" on Feb 17, 2020
} // namespace value_manip

template <typename T>
struct SimpleTypeWrapper {
Collaborator

Is there any specific reason you need this wrapper? Why not just a type alias?

Collaborator (Author)

To define a to_null()-ing constructor, and to reuse code in general. It is later used as the base for most of the DataSourceType<> classes, which in turn are used for better control over types and their static dispatch/overloads.
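A minimal, hypothetical C++17 sketch of the two benefits the author mentions over a plain type alias: a wrapper can carry a constructor that puts the value into a defined "null" state, and two wrappers around the same underlying type are still distinct types, so overloads can tell them apart. The to_null() body, the type ids, and describe() are assumptions for illustration, not the driver's code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace value_manip {
// Hypothetical helper: resets a value to its "null" (value-initialized) state.
template <typename T>
void to_null(T & value) {
    value = T{};
}
} // namespace value_manip

template <typename T>
struct SimpleTypeWrapper {
    // A plain alias (using X = T;) cannot carry a constructor, so a default-initialized
    // local of an aliased integer type would be indeterminate; the wrapper avoids that.
    SimpleTypeWrapper() { value_manip::to_null(value); }
    explicit SimpleTypeWrapper(const T & v) : value(v) {}

    T value;
};

// Hypothetical subset of type ids.
enum class DataSourceTypeId { UInt16, Date };

template <DataSourceTypeId Id>
struct DataSourceType;

// Both wrap std::uint16_t, yet they are distinct types, so each can get its own overloads.
template <>
struct DataSourceType<DataSourceTypeId::UInt16> : SimpleTypeWrapper<std::uint16_t> {
    using SimpleTypeWrapper<std::uint16_t>::SimpleTypeWrapper;
};

template <>
struct DataSourceType<DataSourceTypeId::Date> : SimpleTypeWrapper<std::uint16_t> {
    using SimpleTypeWrapper<std::uint16_t>::SimpleTypeWrapper;
};

inline std::string describe(const DataSourceType<DataSourceTypeId::UInt16> &) { return "UInt16"; }
inline std::string describe(const DataSourceType<DataSourceTypeId::Date> &) { return "Date"; }
```

With aliases instead (using Date = std::uint16_t; using UInt16 = std::uint16_t;), the two describe() overloads would have identical signatures and would not compile.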

std::string sql_type_name;
bool is_unsigned;
SQLSMALLINT sql_type;
int32_t column_size;
Collaborator

Could you please add comments explaining the difference between column_size and octet_length?

Collaborator (Author)

Collaborator

Ok, please add this link as a comment

Collaborator (Author)

ok
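In ODBC terms the two fields measure different things: column size is the maximum length in characters for character types (or the precision, in digits, for numeric types), while octet length is a length in bytes. A minimal sketch of how the fields might be documented; the struct layout and octetLengthForChars() are hypothetical, and the helper assumes a fixed number of bytes per character, which variable-width encodings such as UTF-8 do not guarantee.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, trimmed-down type info record.
struct TypeInfo {
    std::string sql_type_name;
    bool is_unsigned = false;

    // Column size (ODBC): for character types, the maximum length in characters;
    // for numeric types, the precision, i.e. the maximum number of digits.
    std::int32_t column_size = 0;

    // Octet length (ODBC): the maximum length in bytes of a character or binary value.
    std::int32_t octet_length = 0;
};

// For a character column the two differ by the size of one character in bytes.
// Simplifying assumption: a fixed number of bytes per character.
inline std::int32_t octetLengthForChars(std::int32_t column_size, std::int32_t bytes_per_char) {
    return column_size * bytes_per_char;
}
```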

#include "driver/platform/platform.h"
#include "driver/result_set.h"

class RowBinaryWithNamesAndTypesResultSet
Collaborator

Could you please add a comment explaining the sole purpose of this ResultSet subclass and how it differs from the other ResultSet subclasses?
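For reference, ClickHouse documents RowBinaryWithNamesAndTypes as RowBinary preceded by a header: a LEB128-encoded column count N, then N column names, then N type names, each encoded as a RowBinary String (a LEB128 byte length followed by the raw bytes). The rows that follow carry values in their binary encodings. A minimal sketch of reading that header from an in-memory buffer; ByteReader, ColumnHeader, and readRowBinaryWithNamesAndTypesHeader() are hypothetical names, not the driver's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader over an in-memory buffer (a real driver reads from the response stream).
class ByteReader {
public:
    explicit ByteReader(std::string data) : data_(std::move(data)) {}

    // Unsigned LEB128: 7 payload bits per byte, high bit set on every byte except the last.
    std::uint64_t readVarUInt() {
        std::uint64_t result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            const auto byte = static_cast<unsigned char>(readByte());
            result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
            if ((byte & 0x80) == 0)
                return result;
        }
        throw std::runtime_error("malformed varint");
    }

    // RowBinary String: varint byte length followed by the raw bytes.
    std::string readString() {
        const std::uint64_t size = readVarUInt();
        if (size > data_.size() - pos_)
            throw std::runtime_error("unexpected end of data");
        std::string result = data_.substr(pos_, static_cast<std::size_t>(size));
        pos_ += static_cast<std::size_t>(size);
        return result;
    }

private:
    char readByte() {
        if (pos_ >= data_.size())
            throw std::runtime_error("unexpected end of data");
        return data_[pos_++];
    }

    std::string data_;
    std::size_t pos_ = 0;
};

struct ColumnHeader {
    std::string name;
    std::string type;
};

// Header: column count N, then N column names, then N type names (e.g. "String").
inline std::vector<ColumnHeader> readRowBinaryWithNamesAndTypesHeader(ByteReader & reader) {
    const std::uint64_t count = reader.readVarUInt();
    std::vector<ColumnHeader> columns(static_cast<std::size_t>(count));
    for (auto & column : columns)
        column.name = reader.readString();
    for (auto & column : columns)
        column.type = reader.readString();
    return columns;
}
```

Because the header already carries the type names, a result set for this format can decide how to decode each column before reading any row.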

dest.data = std::move(value);
}

void readValue(DataSourceType< DataSourceTypeId::Date > & dest, ColumnInfo & column_info);
Collaborator

IMO DataSourceType< DataSourceTypeId::Date > is a bit verbose; would you consider adding type aliases such as DataSourceTypeDate, etc.?

Collaborator (Author)

I am not a big fan of aliases. You need to decompose those aliases in your mind each time you meet them, and keep more "non-uniform" pieces of data in your head, which consumes your working memory.

I.e., while this may look like more raw data (in terms of characters):

DataSourceType< DataSourceTypeId::DateTime >
DataSourceType< DataSourceTypeId::Decimal   >
DataSourceType< DataSourceTypeId::Int64        >

this is more data in terms of unique slots in your memory:

DataSourceTypeDateTime
DataSourceTypeDecimal
DataSourceTypeInt64

(but the latter is better than using some random names, obviously.)

Collaborator (Author)

In other words, there is a hidden structure in this naming: DataSourceTypeDateTime.

Collaborator (Author)

Whereas that structure is explicit in DataSourceType< DataSourceTypeId::DateTime >, so you don't need to do the extra work.

Collaborator

IMO, right now it looks "guts out": the 'hidden' struct here is a mere implementation detail.

traceon (Collaborator, Author), Feb 25, 2020

Oh, it's not an underlying struct, it's a type id enum value. A very "facing out" thing.
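The two spellings being debated denote exactly the same type: an alias adds a name, not a type. A small C++17 sketch (the type ids and the type_id_of trait are hypothetical) showing that equivalence, and how the explicit DataSourceType<Id> form lets generic code recover the type id through partial specialization:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical subset of type ids.
enum class DataSourceTypeId { Date, DateTime, Decimal, Int64 };

template <DataSourceTypeId Id>
struct DataSourceType {
    // Value storage omitted in this sketch.
};

// The alias proposed in review: only a second name for an existing type.
using DataSourceTypeDateTime = DataSourceType<DataSourceTypeId::DateTime>;
static_assert(std::is_same_v<DataSourceTypeDateTime, DataSourceType<DataSourceTypeId::DateTime>>);

// The explicit form exposes the type id as a template argument, so generic code
// can recover it by partial specialization (this also works through the alias).
template <typename T>
struct type_id_of;

template <DataSourceTypeId Id>
struct type_id_of<DataSourceType<Id>> {
    static constexpr DataSourceTypeId value = Id;
};
```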

void convert_via_proxy(const SourceType & src, DestinationType & dest);

template <typename SourceType>
struct from_value {
Collaborator

IMHO, from_value<X>::to_value<Y>::convert() is a bit of an overcomplication (both in the calling code and in the implementation). Could it be a set of convertFromValue(const X & x, Y & y) function overloads instead?
Also, that would allow moving all that code into a .cpp file.

Collaborator (Author)

During the evolution of this code I went through different configurations, including the one you mention. At that time, having such different ways of calling the conversion code was just too non-uniform and hard to maintain. Things get more complicated when you try to use SFINAE or partial specializations. So I picked this representation, because it is actually the only maintainable one.

Enmk (Collaborator), Feb 28, 2020

This can be done in a more simple manner: https://godbolt.org/z/HZWv6M

Collaborator (Author)

Maybe... if it could be applied to all of the code, including the partial specialization cases.
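A reduced, hypothetical C++17 sketch of the nested from_value<>::to_value<> shape under discussion: the primary member template supplies a default conversion for a whole family of type pairs, and an explicit specialization overrides one concrete pair. The language constraint behind this part of the argument is that class templates can be partially specialized, whereas function templates cannot (they can only be overloaded or fully specialized). The static_cast default below is an assumption for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical, reduced analogue of the from_value<>::to_value<> layout.
template <typename SourceType>
struct from_value {
    template <typename DestinationType>
    struct to_value {
        // Default: one definition covers every arithmetic -> arithmetic pair.
        static void convert(const SourceType & src, DestinationType & dest) {
            static_assert(std::is_arithmetic_v<SourceType> && std::is_arithmetic_v<DestinationType>,
                          "no default conversion for this pair of types");
            dest = static_cast<DestinationType>(src);
        }
    };
};

// Explicit specialization of the member template for one concrete pair.
template <>
template <>
struct from_value<std::string>::to_value<std::string> {
    static void convert(const std::string & src, std::string & dest) {
        dest = src;
    }
};
```

Call sites always read the same way, from_value<Src>::to_value<Dst>::convert(src, dest), whether the default or a specialization is selected.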

};

template <typename DestinationType>
struct to_buffer {
Collaborator

The same applies to to_buffer<X>::from_value<Y>::convert; maybe convertFromValueToBuffer(const Y & y, X & x)?

Enmk (Collaborator), Feb 25, 2020

Also, to_buffer breaks the pattern, going from:

from_SOMETHING<X>::to_SOMETHING_ELSE<Y>::convert(x, y)

to the opposite:

to_SOMETHING<X>::from_SOMETHING_ELSE<Y>::convert(y, x)

and that complicates things even further.

Collaborator (Author)

That would severely affect maintainability, as mentioned earlier.

traceon (Collaborator, Author), Feb 25, 2020

How does it break the pattern?

to_value    <OfType>
from_value  <OfType>
to_buffer   <OfType>
from_buffer <OfType>

Collaborator (Author)

OK, so to_* and from_* are defined and used differently, according to the way the conversions are used, and that allows defining some default conversions that cover more cases.

Collaborator

It interrupts the flow: most of the code here converts from the left type to the right type, and to_buffer is the exception. What I am trying to say is that this is already complicated enough; there is no need to make it even harder to grasp.

Collaborator (Author)

It's all about compromises. Preserving the to_* -> from_* structure may result in more actual code that does pretty much the same thing.
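One way to read the author's point: putting the destination first lets a single specialization for a given buffer type provide a default for every source type at once. A hypothetical C++17 sketch; CharBuffer is a stand-in for an ODBC output buffer, and routing arithmetic values through std::to_string is an assumption made for brevity.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a character output buffer (e.g. one bound as SQL_C_CHAR).
struct CharBuffer {
    std::string bytes;
};

// Primary template: declared only, so unsupported destinations fail to compile.
template <typename DestinationType>
struct to_buffer;

// Fixing the destination first lets one nested template act as the default for all sources.
template <>
struct to_buffer<CharBuffer> {
    template <typename SourceType>
    struct from_value {
        static void convert(const SourceType & src, CharBuffer & dest) {
            static_assert(std::is_arithmetic_v<SourceType>, "sketch only covers arithmetic sources");
            dest.bytes = std::to_string(src);
        }
    };
};
```

The trade-off the reviewer raises remains: the template argument order (destination, then source) is the reverse of the convert(src, dest) argument order.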

};

template <typename SourceType>
struct from_buffer {
Collaborator

from_buffer<X>::to_value<Y>::convert() => convertFromBufferToValue(const X & x, Y & y) ?

Collaborator (Author)

That would severely affect maintainability, as mentioned earlier.

@@ -126,6 +110,37 @@ struct UTF8CaseInsensitiveCompare {
}
};

template<typename T>
class ObjectPool {
Enmk (Collaborator), Feb 25, 2020

Could you please add a comment describing what this pool is for, and document that it behaves in an LRU fashion, dropping old objects after reaching a certain size?

Collaborator (Author)

ok
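A minimal sketch of the kind of bounded pool the reviewer describes, assuming that the most recently released object is reused first and that the oldest cached object is dropped once a size limit is reached. The interface (get(), put(), size()) and the eviction policy are hypothetical, not taken from the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical bounded object pool: reuses released objects (e.g. strings with their
// already-allocated capacity) and, once full, drops the oldest cached object first.
template <typename T>
class ObjectPool {
public:
    explicit ObjectPool(std::size_t max_size) : max_size_(max_size) {}

    // Returns the most recently released object, or a default-constructed one if the pool is empty.
    // The caller is responsible for resetting its contents (e.g. clear(), which keeps the capacity).
    T get() {
        if (cache_.empty())
            return T{};
        T object = std::move(cache_.back());
        cache_.pop_back();
        return object;
    }

    // Takes an object back for later reuse; evicts the oldest cached object when at capacity.
    void put(T && object) {
        if (max_size_ == 0)
            return;
        if (cache_.size() >= max_size_)
            cache_.pop_front();
        cache_.push_back(std::move(object));
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::size_t max_size_;
    std::deque<T> cache_;
};
```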

Enmk (Collaborator) left a comment

I have a few questions regarding this PR.

Also (if possible), please make sure that type_info.h is detected as moved in git. That might also help GitHub detect the move, and hence greatly simplify the review process.

Another issue: type_info.h is basically type info + data types + type traits + conversion functions + fillOutput functions smashed together into a huge lump of code, too big to be digestible by anyone except you right now (and I am afraid that in two months' time even you won't be able to reason promptly about what is going on here). Please split it into meaningful pieces.

traceon (Collaborator, Author) commented Feb 25, 2020

> Also (if it is possible), please make sure that type_info.h is detected as moved in git. That might help github to detect move also, and hence greatly simplify the review process.

The only way I know of to achieve this is to move the file first, and then make the changes in a separate commit. It cannot be done retrospectively.

Enmk (Collaborator) commented Feb 25, 2020

Another thing: please point me to a location of tests for ResultSet parsing (both ODBCv2 and RowBinaryWithNames formats)

struct from_value<std::string>::to_value<std::string> {
using DestinationType = std::string;

static inline void convert(const SourceType & src, DestinationType & dest) {
Enmk (Collaborator), Feb 25, 2020

Does it make sense to add an extra overload here?

        static inline void convert(SourceType & src, DestinationType & dest) {
            dest = std::move(src);
        }

traceon (Collaborator, Author), Feb 25, 2020

This particular conversion is not used on a critical path. Also, managing storage in convert()-type functions proved to be hell.
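A sketch of the suggested overload pair, written as hypothetical free functions. It uses an rvalue-reference parameter rather than the non-const lvalue reference in the suggestion: with a non-const lvalue-reference overload, any modifiable lvalue argument would select it and be moved from, which is one way storage handling in convert()-style functions can become surprising.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Lvalue sources are copied and left untouched.
inline void convert(const std::string & src, std::string & dest) {
    dest = src;
}

// Rvalue sources may be moved from; the caller opts in explicitly via std::move or a temporary.
inline void convert(std::string && src, std::string & dest) {
    dest = std::move(src);
}
```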

traceon (Collaborator, Author) commented Feb 25, 2020

> Another thing: please point me to a location of tests for ResultSet parsing (both ODBCv2 and RowBinaryWithNames formats)

This is tested by virtually all integration tests when the DSN is configured to use the corresponding wire format.
In the Travis jobs, the regular DSNs use the default ODBCDriver2 format, while the 'ClickHouse DSN (ANSI, RBWNAT)' DSN uses the RowBinaryWithNamesAndTypes format. In the Travis runs, all tests whose names end with -dsn-2 exercise the RowBinaryWithNamesAndTypesResultSet processing code.

dest.value.day = tm.tm_mday;
}

void RowBinaryWithNamesAndTypesResultSet::readValue(DataSourceType<DataSourceTypeId::DateTime> & dest, ColumnInfo & column_info) {
Enmk (Collaborator), Feb 25, 2020

Please validate that the Date and DateTime values returned by the ODBC driver are equal to the values returned by the ClickHouse client when the local timezone differs from the server's timezone.
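Per the ClickHouse documentation for RowBinary, a Date is sent as a UInt16 number of days since 1970-01-01 and a DateTime as a UInt32 Unix timestamp. A Date therefore maps to a calendar day without any timezone input, while turning a DateTime into calendar fields requires choosing a timezone, which is where the driver and the client can disagree. A sketch using Howard Hinnant's days-to-civil algorithm; the function names and the fixed-UTC-offset parameter are hypothetical simplifications (real timezones have DST and historical changes).

```cpp
#include <cassert>
#include <cstdint>

struct CivilDate { int year; unsigned month; unsigned day; };
struct CivilDateTime { CivilDate date; unsigned hour; unsigned minute; unsigned second; };

// Days since 1970-01-01 -> proleptic Gregorian date (Howard Hinnant's civil_from_days).
inline CivilDate civilFromDays(std::int64_t z) {
    z += 719468;
    const std::int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const std::int64_t doe = z - era * 146097;                                      // [0, 146096]
    const std::int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; // [0, 399]
    const std::int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);               // [0, 365]
    const std::int64_t mp = (5 * doy + 2) / 153;                                    // [0, 11], March-based
    const std::int64_t d = doy - (153 * mp + 2) / 5 + 1;                            // [1, 31]
    const std::int64_t m = mp < 10 ? mp + 3 : mp - 9;                               // [1, 12]
    const std::int64_t y = yoe + era * 400 + (m <= 2 ? 1 : 0);
    return {static_cast<int>(y), static_cast<unsigned>(m), static_cast<unsigned>(d)};
}

// RowBinary Date: UInt16 days since the epoch; no timezone involved.
inline CivilDate decodeDate(std::uint16_t days) {
    return civilFromDays(days);
}

// RowBinary DateTime: UInt32 Unix timestamp. The calendar fields depend on the chosen
// timezone, modelled here (hypothetically) as a fixed offset from UTC in seconds.
inline CivilDateTime decodeDateTime(std::uint32_t unix_seconds, std::int32_t utc_offset_seconds) {
    const std::int64_t local = static_cast<std::int64_t>(unix_seconds) + utc_offset_seconds;
    std::int64_t days = local / 86400;
    std::int64_t secs = local % 86400;
    if (secs < 0) {  // floor division for local instants before the epoch
        secs += 86400;
        --days;
    }
    return {civilFromDays(days),
            static_cast<unsigned>(secs / 3600),
            static_cast<unsigned>(secs % 3600 / 60),
            static_cast<unsigned>(secs % 60)};
}
```

The same timestamp 0 decodes to 1970-01-01 00:00:00 at UTC but to 1969-12-31 23:00:00 at UTC-1, which is the kind of discrepancy the requested validation would catch.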

dest = src;
}
else {
convert_via_proxy<std::string>(src, dest);
Collaborator

This, and every other convert_via_proxy<std::string> call, looks spooky and very suboptimal. Luckily, these branches are never hit (traceon@cf58fff), so they can be safely removed.
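To make the reviewer's concern concrete: a proxy conversion goes through a temporary intermediate value, so with std::string as the proxy every call formats the source as text and then parses it back. A hypothetical sketch of that two-hop shape; convertValue() and this convert_via_proxy() body are illustrative, not the driver's implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical single-step conversions used by the sketch.
inline void convertValue(int src, std::string & dest) { dest = std::to_string(src); }
inline void convertValue(const std::string & src, double & dest) { dest = std::stod(src); }

// Two-hop conversion: SourceType -> ProxyType -> DestinationType.
// With ProxyType = std::string, every call formats the source as text, possibly
// allocates, and then parses the text back: costly on a per-value hot path.
template <typename ProxyType, typename SourceType, typename DestinationType>
void convert_via_proxy(const SourceType & src, DestinationType & dest) {
    ProxyType proxy{};
    convertValue(src, proxy);
    convertValue(proxy, dest);
}
```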

Clarify/fix temporaries handling
Comments from @Enmk
Enmk (Collaborator) left a comment

Ok, please make sure to address #269 after merging

traceon (Collaborator, Author) commented Feb 28, 2020

Yes, sure. Thank you

traceon merged commit bd0dbd4 into ClickHouse:master on Feb 28, 2020
traceon deleted the alternative-wire-protocol branch on February 29, 2020 09:36
Labels
None yet
Projects
None yet
2 participants