
clickhouse-copier with slightly different schema of the target table #9159

Closed
vpanfilov opened this issue Feb 17, 2020 · 5 comments · Fixed by #23518
Labels: question

@vpanfilov

Hello.
I am trying to use clickhouse-copier to migrate some tables to a new ClickHouse cluster, and at the same time I want to alter some column types (for example, replace String with LowCardinality(String)). So I manually created all destination tables with the new types and wrote a test task for clickhouse-copier.

But the clickhouse-copier tasks are failing with this error:

DB::Exception: Block structure mismatch in RemoteBlockOutputStream stream: different types:

while ordinary INSERT INTO db.table SELECT * FROM remote(...) WHERE ... queries work fine.
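
For concreteness, a sketch of such a query (host, database, and table names are placeholders):

-- Placeholder names; ClickHouse converts compatible column types
-- (e.g. String -> LowCardinality(String)) during an ordinary INSERT ... SELECT.
INSERT INTO db.table
SELECT *
FROM remote('old-cluster-host:9000', db, table)
WHERE toYYYYMMDD(Column3) = 20201223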

Can I use clickhouse-copier with different (but compatible) column types on source and destination clusters?

Source cluster: ClickHouse 18.16.1
Destination cluster: ClickHouse 19.16.11.47

@nikitamikhaylov
Member

Could you please provide a more informative stack trace?
I think you can't copy tables with different schemas, because clickhouse-copier implements its own "custom" insert-select. But this issue will be evaluated as a feature request 😉

@vpanfilov
Author

vpanfilov commented Feb 18, 2020

@nikitamikhaylov Here is an anonymized stack trace and the table DDLs. I don't have any materialized or alias columns.

clickhouse-copier stack trace:

2020.02.17 16:50:31.645519 [ 1 ] {} <Debug> ClusterCopier: Processing /clickhouse/copytasks/test-task/tables/destination_cluster.<db>.<table>/20201223/shards/1
2020.02.17 16:50:31.646605 [ 1 ] {} <Debug> ClusterCopier: Partition 20201223 appears to be clean
2020.02.17 16:50:31.670795 [ 1 ] {} <Trace> InterpreterSelectQuery: WithMergeableState -> Complete
2020.02.17 16:50:31.670908 [ 1 ] {} <Trace> Aggregator: Reading blocks of partially aggregated data.
2020.02.17 16:50:31.673832 [ 1 ] {} <Trace> Aggregator: Read 4 blocks of partially aggregated data, total 4 rows.
2020.02.17 16:50:31.673877 [ 1 ] {} <Trace> Aggregator: Merging partially aggregated single-level data.
2020.02.17 16:50:31.673902 [ 1 ] {} <Trace> Aggregator: Merged partially aggregated single-level data.
2020.02.17 16:50:31.673927 [ 1 ] {} <Trace> Aggregator: Converting aggregated data to blocks
2020.02.17 16:50:31.673961 [ 1 ] {} <Trace> Aggregator: Converted aggregated data to blocks. 1 rows, 0.000 MiB in 0.000 sec. (158002.844 rows/sec., 1.205 MiB/sec.)
2020.02.17 16:50:31.673997 [ 1 ] {} <Trace> UnionBlockInputStream: Waiting for threads to finish
2020.02.17 16:50:31.674020 [ 1 ] {} <Trace> UnionBlockInputStream: Waited for threads to finish
2020.02.17 16:50:31.678025 [ 1 ] {} <Debug> ClusterCopier: Create destination tables. Query: CREATE TABLE IF NOT EXISTS <db>.<table> (`Column1` String, `Column2` UInt32, `Column3` Date, `Column4` DateTime, `Column5` UInt16, `Column6` String, `Column7` String, `Column8` String, `Column9` String, `Column10` String, `Column11` String, `Column12` Decimal(3, 1), `Column13` DateTime, `Column14` UInt16) ENGINE = MergeTree() PARTITION BY toYYYYMMDD(Column3) ORDER BY (Column9, Column1, Column2, Column3, Column4)
2020.02.17 16:50:31.679754 [ 1 ] {} <Debug> ClusterCopier: Destination tables <db>.<table> have been created on 4 shards of 4
2020.02.17 16:50:31.679947 [ 1 ] {} <Debug> ClusterCopier: Executing SELECT query and pull from N1 (having a replica 192.168.228.165:9000, pull table <db>.<table> of cluster source_cluster) : SELECT * FROM _local.`.read_shard_0.destination_cluster.<db>.<table>` WHERE toYYYYMMDD(Column3) = (20201223 AS partition_key)
2020.02.17 16:50:31.680011 [ 1 ] {} <Debug> ClusterCopier: Executing INSERT query: INSERT INTO _local.`.split.destination_cluster.<db>.<table>` VALUES
2020.02.17 16:50:31.680983 [ 1 ] {} <Trace> InterpreterSelectQuery: Complete -> Complete
2020.02.17 16:50:31.772476 [ 1 ] {} <Error> ClusterCopier: An error occurred during copying, partition will be marked as dirty: Code: 171, e.displayText() = DB::Exception: Block structure mismatch in RemoteBlockOutputStream stream: different types:
  Column1 String String(size = 3), Column2 UInt32 UInt32(size = 3), Column3 Date UInt16(size = 3), Column4 DateTime UInt32(size = 3), Column5 UInt16 UInt16(size = 3), Column6 String String(size = 3), Column7 String String(size = 3), Column8 String String(size = 3), Column9 String String(size = 3), Column10 String String(size = 3), Column11 String String(size = 3), Column12 Decimal(3, 1) Decimal32(size = 3), Column13 DateTime UInt32(size = 3), Column14 UInt16 UInt16(size = 3)
Column1 LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))), Column2 UInt16 UInt16(size = 0), Column3 Date UInt16(size = 0), Column4 DateTime UInt32(size = 0), Column5 UInt16 UInt16(size = 0), Column6 LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))), Column7 LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))), Column8 LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))), Column9 LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))), Column10 String String(size = 0), Column11 LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))), Column12 Decimal(3, 1) Decimal32(size = 0), Column13 DateTime UInt32(size = 0), Column14 UInt16 UInt16(size = 0): Insertion status:
Wrote 0 blocks and 0 rows on shard 0 replica 0, <server1>:9000 (average 2 ms per block)
Wrote 0 blocks and 0 rows on shard 1 replica 0, <server2>:9000 (average 2 ms per block)
Wrote 0 blocks and 0 rows on shard 2 replica 0, <server3>:9000 (average 3 ms per block)
Wrote 0 blocks and 0 rows on shard 3 replica 0, <server4>:9000 (average 3 ms per block)
, Stack trace:

0. 0x55ba5cf3f780 StackTrace::StackTrace() /usr/bin/clickhouse
1. 0x55ba5cf3f555 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
2. 0x55ba5cc30e40 ? /usr/bin/clickhouse
3. 0x55ba5cc31afb ? /usr/bin/clickhouse
4. 0x55ba608cb319 DB::RemoteBlockOutputStream::write(DB::Block const&) /usr/bin/clickhouse
5. 0x55ba60588fb9 ? /usr/bin/clickhouse
6. 0x55ba5cf8c11e ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::_List_iterator<ThreadFromGlobalPool>) /usr/bin/clickhouse
7. 0x55ba5cf8c72e ThreadFromGlobalPool::ThreadFromGlobalPool<ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#3}>(ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#3}&&)::{lambda()#1}::operator()() const /usr/bin/clickhouse
8. 0x55ba5cf89bec ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/bin/clickhouse
9. 0x55ba62caca60 ? /usr/bin/clickhouse
10. 0x7f89334124a4 start_thread /lib/x86_64-linux-gnu/libpthread-2.24.so
11. 0x7f8932d48d0f clone /lib/x86_64-linux-gnu/libc-2.24.so
(version 19.16.11.47 (official build))

Table definition on the old cluster:

CREATE TABLE <db>.<table>
(
    Column1 String,
    Column2 UInt32,
    Column3 Date,
    Column4 DateTime,
    Column5 UInt16,
    Column6 String,
    Column7 String,
    Column8 String,
    Column9 String,
    Column10 String,
    Column11 String,
    Column12 Decimal(3, 1),
    Column13 DateTime,
    Column14 UInt16
)
ENGINE = MergeTree()
PARTITION BY (toYYYYMMDD(Column3), Column3)
PRIMARY KEY (Column1, Column2, Column3, Column4, Column6, Column7, Column8, Column9)
ORDER BY (Column1, Column2, Column3, Column4, Column6, Column7, Column8, Column9)
SETTINGS index_granularity = 8192

Table definition on the new cluster:

CREATE TABLE <db>.<table> (
  Column1 LowCardinality(String) CODEC(LZ4),
  Column2 UInt16 CODEC(LZ4),
  Column3 Date CODEC(DoubleDelta, LZ4),
  Column4 DateTime CODEC(DoubleDelta, LZ4),
  Column5 UInt16 CODEC(LZ4),
  Column6 LowCardinality(String) CODEC(ZSTD),
  Column7 LowCardinality(String) CODEC(ZSTD),
  Column8 LowCardinality(String) CODEC(ZSTD),
  Column9 LowCardinality(String) CODEC(ZSTD),
  Column10 String CODEC(ZSTD(6)),
  Column11 LowCardinality(String) CODEC(LZ4),
  Column12 Decimal(3,1) CODEC(LZ4),
  Column13 DateTime CODEC(DoubleDelta, LZ4),
  Column14 UInt16 CODEC(LZ4)
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(Column3)
ORDER BY (Column9, Column1, Column2, Column3, Column4);
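
For what it's worth, a stripped-down sketch (hypothetical table and column names, mirroring the String → LowCardinality(String) and UInt32 → UInt16 changes) shows that a plain insert-select between such schemas succeeds:

-- Hypothetical minimal repro; names are made up.
CREATE TABLE src (c1 String, c2 UInt32) ENGINE = MergeTree() ORDER BY c1;
CREATE TABLE dst (c1 LowCardinality(String), c2 UInt16) ENGINE = MergeTree() ORDER BY c1;

INSERT INTO src VALUES ('a', 1), ('b', 2);
INSERT INTO dst SELECT * FROM src; -- succeeds: each column is converted to the target type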

@nikitamikhaylov
Member

@vpanfilov Try installing the clickhouse-common-static-dbg package to get a readable stack trace.

@vpanfilov
Author

vpanfilov commented Feb 20, 2020

Sorry, but I can't provide a more detailed stack trace. However, I am pretty sure that the actual error (Block structure mismatch in RemoteBlockOutputStream stream: different types) is raised here:

if (!actual.type->equals(*expected.type))
    return on_error("Block structure mismatch in " + context_description + " stream: different types:\n"
        + lhs.dumpStructure() + "\n" + rhs.dumpStructure(), ErrorCodes::BLOCKS_HAVE_DIFFERENT_STRUCTURE);

The actual column structures are shown in the clickhouse-copier log above. I created a markdown table from them:

| Column name | Type in old cluster (ClickHouse 18.16.1) | Type in new cluster (ClickHouse 19.16.11.47) |
| --- | --- | --- |
| Column1 | String String(size = 3) | LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))) |
| Column2 | UInt32 UInt32(size = 3) | UInt16 UInt16(size = 0) |
| Column3 | Date UInt16(size = 3) | Date UInt16(size = 0) |
| Column4 | DateTime UInt32(size = 3) | DateTime UInt32(size = 0) |
| Column5 | UInt16 UInt16(size = 3) | UInt16 UInt16(size = 0) |
| Column6 | String String(size = 3) | LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))) |
| Column7 | String String(size = 3) | LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))) |
| Column8 | String String(size = 3) | LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))) |
| Column9 | String String(size = 3) | LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))) |
| Column10 | String String(size = 3) | String String(size = 0) |
| Column11 | String String(size = 3) | LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))) |
| Column12 | Decimal(3, 1) Decimal32(size = 3) | Decimal(3, 1) Decimal32(size = 0) |
| Column13 | DateTime UInt32(size = 3) | DateTime UInt32(size = 0) |
| Column14 | UInt16 UInt16(size = 3) | UInt16 UInt16(size = 0) |
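
One caveat worth noting: the Column2 change (UInt32 → UInt16) is narrowing, and a plain CAST wraps modulo 2^16 rather than raising an error, while String → LowCardinality(String) is lossless. A quick sanity check (example values are made up):

SELECT
    CAST('abc' AS LowCardinality(String)), -- lossless: returns 'abc'
    CAST(70000 AS UInt16)                  -- wraps around: returns 4464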

@nikitamikhaylov
Member

This will be fixed in #23518
