
Add sharding and replication guide #34

Merged · 9 commits · Jul 27, 2021

Conversation

pavolloffay (Member)

Signed-off-by: Pavol Loffay [email protected]

Resolves #28

The "local" tables have to be created on the nodes before the distributed table.

```sql
CREATE TABLE jaeger_spans_global AS jaeger_spans ENGINE = Distributed(sharded, default, jaeger_spans, rand());
```
pavolloffay (Member Author)

It would be good to use traceID as the sharding key, but it is a string and a string cannot be used as a sharding key directly.

@chhetripradeep (Contributor), Jul 26, 2021

In that case, you can use murmurHash3_64(traceID) or cityHash64(traceID) as the sharding key.
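
For illustration, a distributed table sharded by a hash of the trace ID could look like the sketch below (table, database, and cluster names are taken from the snippet above; cityHash64 is just one of the options mentioned):

```sql
-- Sketch only: route spans of the same trace to the same shard by hashing traceID.
-- murmurHash3_64(traceID) would be used the same way.
CREATE TABLE jaeger_spans_global AS jaeger_spans
    ENGINE = Distributed(sharded, default, jaeger_spans, cityHash64(traceID));
```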

pavolloffay (Member Author)

Based on your experience with ClickHouse, is it a good idea to shard by traceID?

@chhetripradeep (Contributor), Jul 27, 2021

We use rand() as the sharding key in production, but cityHash64() is pretty fast too. We can do some quick testing to see how each performs for our use case.
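
As a rough way to compare the candidates, the sharding expressions can be timed over synthetic keys; this is only a quick sketch for clickhouse-client, not a substitute for testing real insert throughput:

```sql
-- Rough micro-benchmark sketch: compare the per-query elapsed time of each
-- sharding expression over synthetic string keys.
SELECT count() FROM numbers(10000000) WHERE rand() % 16 = 0;
SELECT count() FROM numbers(10000000) WHERE cityHash64(toString(number)) % 16 = 0;
SELECT count() FROM numbers(10000000) WHERE murmurHash3_64(toString(number)) % 16 = 0;
```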

@pavolloffay (Member Author)

Some questions: what is the ideal number of shards and replicas? For replicas, I assume it is at least 3?

@chhetripradeep (Contributor)

We run with 3 replicas, and as we need to expand the cluster we add more shards. One thing to note is that ClickHouse doesn't have any built-in data balancing feature, i.e. once data is written to a node it stays there for the lifetime of that data unless the operator moves it manually, so it's good to do capacity planning at the beginning of cluster provisioning.
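
To help with that capacity planning, one can check how much data each node currently holds. A sketch, assuming the cluster name sharded and the default database used elsewhere in this PR:

```sql
-- Sketch: per-node row count and on-disk size of the local spans table across the cluster.
SELECT
    hostName() AS host,
    sum(rows) AS rows,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM clusterAllReplicas('sharded', system.parts)
WHERE database = 'default' AND table = 'jaeger_spans' AND active
GROUP BY host;
```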

```sql
CREATE TABLE jaeger_spans_global ON CLUSTER sharded AS jaeger_spans ENGINE = Distributed(sharded, default, jaeger_spans, rand());
CREATE TABLE jaeger_index_global ON CLUSTER sharded AS jaeger_index ENGINE = Distributed(sharded, default, jaeger_index, rand());
CREATE TABLE jaeger_operations_global ON CLUSTER sharded AS jaeger_operations ENGINE = Distributed(sharded, default, jaeger_operations, rand());
```
Member

  1. It's more universal to use '{cluster}' instead of a particular cluster name;
  2. Add "IF NOT EXISTS" to the creation of the global tables;
  3. Add ';' after the creation of jaeger_operations.

```sql
CREATE TABLE IF NOT EXISTS jaeger_spans ON CLUSTER '{cluster}' (
    timestamp DateTime CODEC(Delta, ZSTD(1)),
    traceID String CODEC(ZSTD(1)),
    model String CODEC(ZSTD(3))
) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/jaeger_spans', '{replica}')
PARTITION BY toDate(timestamp)
ORDER BY traceID
SETTINGS index_granularity=1024;

CREATE TABLE IF NOT EXISTS jaeger_index ON CLUSTER '{cluster}' (
    timestamp DateTime CODEC(Delta, ZSTD(1)),
    traceID String CODEC(ZSTD(1)),
    service LowCardinality(String) CODEC(ZSTD(1)),
    operation LowCardinality(String) CODEC(ZSTD(1)),
    durationUs UInt64 CODEC(ZSTD(1)),
    tags Array(String) CODEC(ZSTD(1)),
    INDEX idx_tags tags TYPE bloom_filter(0.01) GRANULARITY 64,
    INDEX idx_duration durationUs TYPE minmax GRANULARITY 1
) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/jaeger_index', '{replica}')
PARTITION BY toDate(timestamp)
ORDER BY (service, -toUnixTimestamp(timestamp))
SETTINGS index_granularity=1024;

CREATE MATERIALIZED VIEW IF NOT EXISTS jaeger_operations ON CLUSTER '{cluster}'
ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/jaeger_operations', '{replica}')
PARTITION BY toYYYYMM(date) ORDER BY (date, service, operation)
SETTINGS index_granularity=32
POPULATE
AS SELECT
    toDate(timestamp) AS date,
    service,
    operation,
    count() AS count
FROM jaeger_index
GROUP BY date, service, operation;

CREATE TABLE IF NOT EXISTS jaeger_spans_global ON CLUSTER '{cluster}' AS jaeger_spans
    ENGINE = Distributed('{cluster}', default, jaeger_spans, rand());
CREATE TABLE IF NOT EXISTS jaeger_index_global ON CLUSTER '{cluster}' AS jaeger_index
    ENGINE = Distributed('{cluster}', default, jaeger_index, rand());
CREATE TABLE IF NOT EXISTS jaeger_operations_global ON CLUSTER '{cluster}' AS jaeger_operations
    ENGINE = Distributed('{cluster}', default, jaeger_operations, rand());
```

pavolloffay (Member Author)

> It's more universal to use '{cluster}' instead of a particular cluster name;

I like it. I will test whether the substitution works.
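
One quick way to test it is to inspect the macros and cluster definitions that each node actually exposes; a sketch:

```sql
-- Sketch: confirm that the macros backing '{cluster}', '{shard}' and '{replica}'
-- are defined on this node, and list the clusters the server knows about.
SELECT * FROM system.macros;
SELECT cluster, shard_num, replica_num, host_name FROM system.clusters;
```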

@pavolloffay (Member Author)

@EinKrebs regarding the naming of tables: most of the tutorials that I have seen use table_name_local for local tables and table_name for the distributed tables. If there are no objections, I will rename to follow this scheme.

@EinKrebs (Member)

> @EinKrebs regarding the naming of tables: most of the tutorials that I have seen use table_name_local for local tables and table_name for the distributed tables. If there are no objections, I will rename to follow this scheme.

No objections, I think it's good.
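
For reference, the rename itself could be done in place across the cluster; a sketch under the proposed naming scheme (the _local suffix and the '{cluster}' macro are carried over from this thread, and the distributed tables would afterwards need to be recreated to point at the new names):

```sql
-- Sketch: move the existing local tables to the *_local naming scheme on every node.
RENAME TABLE jaeger_spans TO jaeger_spans_local ON CLUSTER '{cluster}';
RENAME TABLE jaeger_index TO jaeger_index_local ON CLUSTER '{cluster}';
```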

pavolloffay merged commit d67bbb6 into main on Jul 27, 2021
pavolloffay deleted the sharding-replication branch on July 27, 2021 10:29
```sql
    timestamp DateTime CODEC(Delta, ZSTD(1)),
    traceID String CODEC(ZSTD(1)),
    model String CODEC(ZSTD(3))
) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/jaeger_spans', '{replica}')
```


Perhaps it might make sense to omit the ReplicatedMergeTree parameters altogether, as per the email thread discussion: the Atomic DB engine will choose a path automatically, so there won't be any conflicts when cluster nodes are re-created.
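
A sketch of what the spans table could look like with the parameters omitted (this assumes an Atomic database and the server defaults for default_replica_path and default_replica_name, so the ZooKeeper path is derived from the table UUID):

```sql
-- Sketch: rely on the default replication path instead of passing explicit
-- '/clickhouse/tables/...' arguments to ReplicatedMergeTree.
CREATE TABLE IF NOT EXISTS jaeger_spans ON CLUSTER '{cluster}' (
    timestamp DateTime CODEC(Delta, ZSTD(1)),
    traceID String CODEC(ZSTD(1)),
    model String CODEC(ZSTD(3))
) ENGINE = ReplicatedMergeTree
PARTITION BY toDate(timestamp)
ORDER BY traceID
SETTINGS index_granularity=1024;
```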


> This is a guide on how to set up sharding and replication for Jaeger data. This guide uses [clickhouse-operator](https://github.com/Altinity/clickhouse-operator) to deploy the storage.


I would suggest putting in a recommendation to use the latest LTS ClickHouse version (currently 21.3).

pavolloffay (Member Author)

New is always better :) Are there any particular features that we might use?

```sql
) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/jaeger_index', '{replica}')
PARTITION BY toDate(timestamp)
ORDER BY (service, -toUnixTimestamp(timestamp))
SETTINGS index_granularity=1024;
```


It might be useful for some to put a cap on the table size with TTL, like this:

```sql
...
ORDER BY (service, -toUnixTimestamp(timestamp))
TTL toDate(timestamp) + INTERVAL 2 MONTH DELETE
SETTINGS ttl_only_drop_parts=1 ...;
```

ttl_only_drop_parts would prevent scheduling any merges to perform a TTL cleanup and would cause the whole part to simply be dropped once the last record in it expires.
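
If the tables already exist, the same cap could be added afterwards; a sketch, reusing the 2-month retention from the example above:

```sql
-- Sketch: add the retention cap to an existing replicated table across the cluster.
ALTER TABLE jaeger_index ON CLUSTER '{cluster}'
    MODIFY TTL toDate(timestamp) + INTERVAL 2 MONTH DELETE;
ALTER TABLE jaeger_index ON CLUSTER '{cluster}'
    MODIFY SETTING ttl_only_drop_parts = 1;
```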

