-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sharding and replication guide #34
Conversation
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
sharding-and-replication.md
Outdated
The "local" tables have to be created on the nodes before the distributed table. | ||
|
||
```sql | ||
CREATE TABLE jaeger_spans_global AS jaeger_spans ENGINE = Distributed(sharded, default, jaeger_spans, rand()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to use traceID
as a sharding key, but it is string
and that cannot be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, you can use murmurHash3_64(traceID)
or cityHash64(traceID)
as sharding key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on your experience with CH, is it good to use sharding by traceID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use rand()
as the sharding key in production but cityHash64()
is pretty fast too. We can do some quick testing to see how each performs for our usecase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have booked a ticket for it https://github.com/pavolloffay/jaeger-clickhouse/issues/35
Some questions: what is the ideal number of shards and replicas? For replicas I assume it is at least 3? |
We run with 3 replica and as we need to expand the cluster we add more shards. One thing to note is clickhouse doesn't have any inbuilt data balancing feature i.e. once a data is written to a node, it will stay there throughout the lifetime of that data unless the operator moves the data manually, so it's good to do capacity planning in the beginning of cluster provisioning. |
CREATE TABLE jaeger_spans_global ON CLUSTER sharded AS jaeger_spans ENGINE = Distributed(sharded, default, jaeger_spans, rand()); | ||
CREATE TABLE jaeger_index_global ON CLUSTER sharded AS jaeger_index ENGINE = Distributed(sharded, default, jaeger_index, rand()); | ||
CREATE TABLE jaeger_operations_global on CLUSTER sharded AS jaeger_operations ENGINE = Distributed(sharded, default, jaeger_operations, rand()); | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It's more universal to use '{cluster}' instead of particular name;
- Add "IF NOT EXISTS" to creation of global tables;
- Add ';' after creation of jaeger_operations.
CREATE TABLE IF NOT EXISTS jaeger_spans ON CLUSTER '{cluster}' (
timestamp DateTime CODEC(Delta, ZSTD(1)),
traceID String CODEC(ZSTD(1)),
model String CODEC(ZSTD(3))
) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/jaeger_spans', '{replica}')
PARTITION BY toDate(timestamp)
ORDER BY traceID
SETTINGS index_granularity=1024;
CREATE TABLE IF NOT EXISTS jaeger_index ON CLUSTER '{cluster}' (
timestamp DateTime CODEC(Delta, ZSTD(1)),
traceID String CODEC(ZSTD(1)),
service LowCardinality(String) CODEC(ZSTD(1)),
operation LowCardinality(String) CODEC(ZSTD(1)),
durationUs UInt64 CODEC(ZSTD(1)),
tags Array(String) CODEC(ZSTD(1)),
INDEX idx_tags tags TYPE bloom_filter(0.01) GRANULARITY 64,
INDEX idx_duration durationUs TYPE minmax GRANULARITY 1
) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/jaeger_index', '{replica}')
PARTITION BY toDate(timestamp)
ORDER BY (service, -toUnixTimestamp(timestamp))
SETTINGS index_granularity=1024;
CREATE MATERIALIZED VIEW IF NOT EXISTS jaeger_operations ON CLUSTER '{cluster}'
ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/jaeger_operations', '{replica}')
PARTITION BY toYYYYMM(date) ORDER BY (date, service, operation)
SETTINGS index_granularity=32
POPULATE
AS SELECT
toDate(timestamp) AS date,
service,
operation,
count() as count
FROM jaeger_index
GROUP BY date, service, operation;
CREATE TABLE IF NOT EXISTS jaeger_spans_global ON CLUSTER '{cluster}' AS jaeger_spans
ENGINE = Distributed('{cluster}', default, jaeger_spans, rand());
CREATE TABLE IF NOT EXISTS jaeger_index_global ON CLUSTER '{cluster}' AS jaeger_index
ENGINE = Distributed('{cluster}', default, jaeger_index, rand());
CREATE TABLE IF NOT EXISTS jaeger_operations_global on CLUSTER '{cluster}' AS jaeger_operations
ENGINE = Distributed('{cluster}', default, jaeger_operations, rand());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more universal to use '{cluster}' instead of particular name;
I like it I will test if the substitution works.
@EinKrebs regarding the naming of tables. Most of the tutorials that I have seen use |
No objections, I think it's good |
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
timestamp DateTime CODEC(Delta, ZSTD(1)), | ||
traceID String CODEC(ZSTD(1)), | ||
model String CODEC(ZSTD(3)) | ||
) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/jaeger_spans', '{replica}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it might make sense to omit ReplicatedMergeTree parameters at all, as per email thread discussion: Atomic DB engine will choose a path automatically so there wont be any conflicts when cluster nodes are re-created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a guide how to setup sharding and replication for Jaeger data. | ||
This guide uses [clickhouse-operator](https://github.com/Altinity/clickhouse-operator) to deploy | ||
the storage. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest putting a recommendation to use latest LTS Clickhouse version (21.3 currently).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New is always better :) Are there any particular features that we might use?
) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/jaeger_index', '{replica}') | ||
PARTITION BY toDate(timestamp) | ||
ORDER BY (service, -toUnixTimestamp(timestamp)) | ||
SETTINGS index_granularity=1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful for some to put a cap on the table size with TTL, like this:
...
ORDER BY (service, -toUnixTimestamp(timestamp))
TTL toDate(timestamp) + INTERVAL 2 MONTH DELETE
SETTINGS ttl_only_drop_parts=1 ...;
ttl_only_drop_parts
would prevent scheduling any merges to perform a ttl cleanup and would cause the part to be just dropped when the last record in it expires.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have open ticket for setting TTL https://github.com/pavolloffay/jaeger-clickhouse/issues/43
Signed-off-by: Pavol Loffay [email protected]
Resolves #28