Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use RowBinaryWithNamesAndTypes #10

Open
loyd opened this issue Apr 20, 2021 · 4 comments
Open

Use RowBinaryWithNamesAndTypes #10

loyd opened this issue Apr 20, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@loyd
Copy link
Collaborator

loyd commented Apr 20, 2021

Use RowBinaryWithNamesAndTypes instead of RowBinary as a primary format in order to support deserialize_any and, hence, serde(flatten) and type conversions & validation.

@loyd loyd added the enhancement New feature or request label Jul 23, 2021
@makorne
Copy link

makorne commented Oct 1, 2021

Hi! Any progress with this?
May be you need a beta-tester? :)

Looks like a very important enhancement.
Sometimes I get NotEnoughData on a query that works with another time bind argument, and works in clickhouse-client without any issues. And not a hint why.

@wspeirs
Copy link
Contributor

wspeirs commented Oct 18, 2023

@loyd do you have a branch for the RowBinaryWithNamesAndTypes work you can share? Happy to help contribute, but also don't want to start-from-scratch if you're already close :-)

wspeirs added a commit to wspeirs/clickhouse.rs that referenced this issue Oct 18, 2023
**SUPER HACKY**
* Allows for fetching cell-by-cell, with a different type for each
* This is a bit of a hack, but can be a solution for ClickHouse#10
@wspeirs
Copy link
Contributor

wspeirs commented Oct 18, 2023

@loyd this is a super-hack (but min number of changes) that can get us cell-by-cell fetching of data. I see a few issues with this approach (besides the hackiness, but some of it can be cleaned up):

  • The ?fields marker for the SELECT statement is no longer supported because there is no type to get the fields from. I don't think this is a hug deal, but open to feedback.
  • If you attempt to get a cell by the wrong type, it could start reading data from the next cell, or return a "not enough data" error. I don't think this is a huge deal either, because the same thing happens if you specify the wrong type now.

Going with RowBinaryWithNamesAndTypes is going to be tough because we'd have to translate the string names to the actual types... not impossible, but annoying.

I welcome any/all feedback... thanks!

@loyd
Copy link
Collaborator Author

loyd commented Jan 27, 2024

Providing support for deserializing different rows into different types is the foot gun; CH always returns specific data shapes without any variation.

Moving to RowBinaryWithNamesAndTypes is an expected improvement not only for supporting deserialize_any, but, first of all, for validation purposes to prevent a schema violation (e.g., #100).

It's not a problem to replace RowBinary with RowBinaryWithNamesAndTypes, but we need to decide in which cases is conversion is allowed and when an error should rise.

Support for SELECTs and INSERTs is different. During INSERT, the conversion is performed on the CH side. Moreover, the behavior is not specified directly, only

If setting input_format_with_types_use_header is set to 1, the types from input data will be compared with the types of the corresponding columns from the table. Otherwise, the second row will be skipped.

So, I need to check the actual behavior (and maybe check other libraries).

On the SELECTs, the conversion is performed on the client side and we have more control over it.

Also, the behavior should remain same in case of moving to TCP+Native.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants