diff --git a/config.yaml b/config.yaml
index 5dfa4482..954cdc6f 100644
--- a/config.yaml
+++ b/config.yaml
@@ -18,4 +18,10 @@ password:
 # Database name. Default is "default".
 database:
 # Endpoint for scraping prometheus metrics. Default localhost:9090
-metrics_endpoint: localhost:9090
\ No newline at end of file
+metrics_endpoint: localhost:9090
+# Table with spans. Default "jaeger_spans_local".
+spans_table:
+# Span index table. Default "jaeger_index_local".
+spans_index_table:
+# Operations table. Default "jaeger_operations_local".
+operations_table:
diff --git a/sharding-and-replication.md b/sharding-and-replication.md
new file mode 100644
index 00000000..2d373e0c
--- /dev/null
+++ b/sharding-and-replication.md
@@ -0,0 +1,171 @@
+# Sharding and Replication
+
+This guide describes how to set up sharding and replication for Jaeger data.
+It uses the [clickhouse-operator](https://github.com/Altinity/clickhouse-operator) to deploy
+the storage.
+
+## Sharding
+
+Sharding splits the data across multiple ClickHouse nodes to
+increase throughput and decrease latency.
+It relies on the `Distributed` table engine, which is backed by local tables.
+A distributed table is a "virtual" table that does not store any data itself; it serves as
+an interface for inserting and querying data.
+
+To set up sharding, run the following statements on all nodes in the cluster.
+The "local" tables have to be created on the nodes before the distributed tables.
+
+```sql
+CREATE TABLE IF NOT EXISTS jaeger_spans AS jaeger_spans_local ENGINE = Distributed('{cluster}', default, jaeger_spans_local, cityHash64(traceID));
+CREATE TABLE IF NOT EXISTS jaeger_index AS jaeger_index_local ENGINE = Distributed('{cluster}', default, jaeger_index_local, cityHash64(traceID));
+CREATE TABLE IF NOT EXISTS jaeger_operations AS jaeger_operations_local ENGINE = Distributed('{cluster}', default, jaeger_operations_local, rand());
+```
+
+* The `AS` clause creates the table with the same schema as the referenced local table.
+* The `Distributed` engine takes the cluster name, database, table name, and sharding key as parameters.
+* Spans and the span index are sharded by `cityHash64(traceID)`, so all spans of a trace land on the same shard; the operations table is distributed randomly via `rand()`.
+
+If the distributed tables are not created on all ClickHouse nodes, the Jaeger query service fails to retrieve data from the storage.
+
+### Deploy ClickHouse
+
+Deploy ClickHouse with 2 shards:
+
+```yaml
+cat <