add helm chart for clickhouse-copier #1

Open

wants to merge 1 commit into base: main
4 changes: 4 additions & 0 deletions README.md
@@ -190,3 +190,7 @@ Parameters:
You don't have to. Download the binaries from the [final release](releases/tag/final).

But if you want, use the following repository snapshot https://github.com/ClickHouse/ClickHouse/tree/1179a70c21eeca88410a012a73a49180cc5e5e2e and proceed with the normal ClickHouse build. The built `clickhouse` binary will contain the copier tool.

## Running clickhouse-copier in Kubernetes

[README](./helm/clickhouse-copier/README.md)
23 changes: 23 additions & 0 deletions helm/clickhouse-copier/.helmignore
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
24 changes: 24 additions & 0 deletions helm/clickhouse-copier/Chart.yaml
@@ -0,0 +1,24 @@
apiVersion: v2
name: clickhouse-copier
description: A Helm chart for running clickhouse-copier in Kubernetes

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"
154 changes: 154 additions & 0 deletions helm/clickhouse-copier/README.md
@@ -0,0 +1,154 @@

## Introduction

Note that upstream ClickHouse is [considering removal of clickhouse-copier](https://github.com/ClickHouse/ClickHouse/issues/60734); starting with 24.3 the copier is no longer shipped with the server.

clickhouse-copier is a tool for copying data between clusters and one of the simplest ways to reshard a table. Besides running it on a local machine, you can run it on a Kubernetes cluster with this ready-made Helm chart: the values that are unique to your task (see the parameters table below) are rendered directly into the ConfigMap.

## Installing the Chart

To install the chart with the release name `my-database`, run from the chart directory:
```console
helm install my-database -n my-namespace -f values.yaml .
```

## Uninstalling the Chart

To uninstall/delete the `my-database` release:

```console
helm uninstall my-database -n my-namespace
```

The command removes all the Kubernetes components associated with the chart and deletes the release.

## Parameters

Each parameter below can be set in `values.yaml` or overridden with `--set`; a usage sketch follows the table.

| Name | Description | Value |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------- |
| `replicaCount` | Number of copier workers. Choose it to match the capacity of your cluster and the number of shards; in practice the number of workers often equals the number of shards, but it can be higher. | `"1"` |
| `host_replica_pull` | Replica host names of the source cluster. List only one replica per shard (for example, with 12 shards list 12 hosts). | `"[]"` |
| `user_pull` | User allowed to connect to the source cluster (`GRANT SELECT`). | `"user_select"` |
| `password_user_pull` | Password of the source-cluster user. | `"password"` |
| `host_replica_push` | Replica host names of the destination cluster; likewise, list only one replica per shard. | `[]` |
| `user_push` | User allowed to connect to the destination cluster (`GRANT INSERT`). | `{}` |
| `password_user_push` | Password of the destination-cluster user. | `cluster.local` |
| `zookeeper_host` | ZooKeeper host (IP address or DNS name). | `192.168.160.58` |
| `path_copier` | Task path in ZooKeeper/ClickHouse Keeper; use a separate path for each task. | `my_baza_part1` |
| `database_pull` | Source database. | `database` |
| `table_pull` | Source table. | `table` |
| `database_push` | Destination database. | `database` |
| `table_push` | Destination table. | `table` |
| `engine_table` | Table engine and settings for the generated CREATE statement. | `` |
| `sharding_key` | Sharding key; an expression such as `cityHash64(object_order_id)` works well. | `` |
| `image.tag` | Image tag used to run clickhouse-copier. Starting with 24.3 the copier is no longer shipped with the server; whether a separate image will appear is unknown. | `` |
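
As an illustration of how these parameters fit together, the sketch below overrides the most important ones directly on the command line; every host name, database, credential, and the task path are placeholders, and in practice you would normally keep them in `values.yaml` as in the install command above.

```console
# Hypothetical values: replace hosts, credentials, and names with your own.
helm install my-database . -n my-namespace \
  --set replicaCount=4 \
  --set 'host_replica_pull={ch-source-01,ch-source-02}' \
  --set user_pull=user_select,password_user_pull=secret \
  --set 'host_replica_push={ch-target-01,ch-target-02,ch-target-03}' \
  --set user_push=user_insert,password_user_push=secret \
  --set zookeeper_host=zookeeper.my-namespace.svc.cluster.local \
  --set path_copier=my_task_01 \
  --set database_pull=database,table_pull=table \
  --set database_push=database,table_push=table \
  --set sharding_key='cityHash64(object_order_id)'
```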


## My experience

On large clusters, resharding with clickhouse-copier can be slow: transferring many small tables (for example from 12 shards to 40) takes a long time, so this solution is better suited to copying data from one replica out to several shards than to large-scale resharding. Practice has also shown that listing several replicas per shard in the configuration file causes additional problems: list one replica per shard, and create the target tables yourself in ClickHouse beforehand (see the sketch below) so that replication puts the data where it needs to go.
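
A hedged sketch of that last step, assuming the destination servers define a cluster named `target_cluster` in their own `remote_servers`; the host name and the table schema here are placeholders, not part of the chart:

```console
# Pre-create the replicated destination table before starting the copier.
# If no ON CLUSTER definition exists, run the CREATE on one replica of every shard.
clickhouse-client --host ch-target-01 --query "
  CREATE TABLE database.table ON CLUSTER target_cluster
  (
      object_order_id UInt64,
      event_date      Date
  )
  ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/database/table', '{replica}')
  PARTITION BY toYYYYMM(event_date)
  ORDER BY object_order_id"
```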

Optionally, restrict the copy to specific partitions with `enabled_partitions`:
````
<sharding_key>rand()</sharding_key>
# <enabled_partitions>
# <partition>'2018-02-26'</partition>
# <partition>'2018-03-05'</partition>
# ...
# </enabled_partitions>
````

The Helm chart assumes the ClickHouse server configuration is already in place; in `values.yaml` you only specify the ZooKeeper task path (use a separate path for each task) and the database and table names.

The workflow is simple: fill out the configuration values, make ZooKeeper available, and run the copy tasks.

You can point the chart at an existing ZooKeeper or install one separately with one of the popular charts, as in the example below.
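
For example (an assumption, not part of this chart: the Bitnami ZooKeeper chart is just one popular option), a standalone ZooKeeper could be installed like this, and its in-cluster service name then goes into `zookeeper_host`:

```console
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install zookeeper bitnami/zookeeper -n my-namespace --set replicaCount=1
# the service is then reachable inside the cluster as
#   zookeeper.my-namespace.svc.cluster.local
```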


### Additionally, consider these `<settings>` parameters in the ConfigMap:

````
<settings>
<max_execution_time>10000000</max_execution_time>
<send_timeout>50000000</send_timeout>
<receive_timeout>5000000</receive_timeout>
<max_result_bytes>0</max_result_bytes>
<max_rows_to_read>0</max_rows_to_read>
<max_bytes_to_read>0</max_bytes_to_read>
<max_columns_to_read>1000</max_columns_to_read>
<max_temporary_columns>1000</max_temporary_columns>
<max_temporary_non_const_columns>1000</max_temporary_non_const_columns>
<min_execution_speed>0</min_execution_speed>
<max_result_rows>0</max_result_rows>
<connect_timeout>30</connect_timeout>
<insert_distributed_sync>1</insert_distributed_sync>
<allow_to_drop_target_partitions>1</allow_to_drop_target_partitions>
</settings>
````

### What is the maximum speed when using clickhouse-copier?

With a small number of workers, throughput stays within about 400 Mbit/s. On a test bench with 250 workers and 12 running copier processes (4 per container) we reached about 1.67 Gbit/s, but disk write latency grew to 3-4 seconds, visible in the Prometheus metric ``irate(node_disk_io_time_weighted_seconds_total{instance=~"$instance"}[5m])``.
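
To experiment with throughput, the worker count can be changed on a live release; a sketch, assuming the release and namespace names from the install example above:

```console
# raise the number of copier workers and watch the pods pick up task ranges
helm upgrade my-database . -n my-namespace -f values.yaml --set replicaCount=12
kubectl -n my-namespace get pods -w
```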

### [insert_distributed_sync](https://github.com/ClickHouse/ClickHouse/issues/20053#issuecomment-773069116)
````
insert_distributed_sync = 1 has the following advantages:

data is available to SELECT from the replica immediately;
if all replicas are unavailable, you get an error instead of buffering data on the local filesystem;
better resource usage due to less data copying;
no risk of data loss when the server with the Distributed table fails;
no buffering of data in the local filesystem - no concerns about the size of buffered data, queue size and delays;


insert_distributed_sync = 1 has the following disadvantages:

when there is a large number of shards (on the order of 100), the latency of INSERT will increase significantly;
when there is a large number of shards (on the order of 100), synchronous INSERT will be less reliable and you will see more transient errors;
does not work if insert_quorum is set but insert_quorum_parallel is disabled;
it does not batch a large number of small INSERTs - every INSERT goes to the shards immediately;
````

### SquashingTransform error

Compare the block-size settings of the two clusters; they may differ between versions (for example 22.8 vs. 23.3). Either bring these settings to the same values on both sides, or recreate the target tables with the same block settings as the source.

Check settings:

```sql
SELECT
name,
value
FROM system.settings
WHERE name LIKE '%block_size%'
```
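
A quick way to compare the two clusters (the host names below are placeholders) is to dump the output of that query from one replica of each cluster and diff the results:

```console
clickhouse-client --host ch-source-01 \
  --query "SELECT name, value FROM system.settings WHERE name LIKE '%block_size%' ORDER BY name FORMAT TSV" > source.tsv
clickhouse-client --host ch-target-01 \
  --query "SELECT name, value FROM system.settings WHERE name LIKE '%block_size%' ORDER BY name FORMAT TSV" > target.tsv
diff source.tsv target.tsv
```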

Settings at the inventory_group level for a new cluster (information for cluster administrators):

```yaml
clickhouse_merge_tree_config:
cleanup_delay_period: 300
enable_mixed_granularity_parts: 1
max_delay_to_insert: 2
merge_max_block_size: 65505
merge_selecting_sleep_ms: 600000
parts_to_delay_insert: 150
parts_to_throw_insert: 300
clickhouse_profiles_default:
default:
max_block_size: 65505
max_insert_block_size: 1.048545e+06
max_joined_block_size_rows: 65505
min_insert_block_size_bytes: 2.6842752e+08
min_insert_block_size_rows: 1.048545e+06
system:
max_block_size: 65505
max_insert_block_size: 1.048545e+06
max_joined_block_size_rows: 65505
min_insert_block_size_bytes: 2.6842752e+08
min_insert_block_size_rows: 1.048545e+06
clickhouse_version: 23.3.18.15
```



22 changes: 22 additions & 0 deletions helm/clickhouse-copier/templates/NOTES.txt
@@ -0,0 +1,22 @@
1. Get the application URL by running these commands:
{{- if .Values.ingress.enabled }}
{{- range $host := .Values.ingress.hosts }}
{{- range .paths }}
http{{ if $.Values.ingress.tls }}s{{ end }}://{{ $host.host }}{{ .path }}
{{- end }}
{{- end }}
{{- else if contains "NodePort" .Values.service.type }}
export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "clickhouse-copier.fullname" . }})
export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
{{- else if contains "LoadBalancer" .Values.service.type }}
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch its status by running 'kubectl get --namespace {{ .Release.Namespace }} svc -w {{ include "clickhouse-copier.fullname" . }}'
export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "clickhouse-copier.fullname" . }} --template "{{"{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}"}}")
echo http://$SERVICE_IP:{{ .Values.service.port }}
{{- else if contains "ClusterIP" .Values.service.type }}
export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "clickhouse-copier.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
echo "clickhouse-copier logs to stdout: kubectl --namespace {{ .Release.Namespace }} logs -f $POD_NAME"
kubectl --namespace {{ .Release.Namespace }} port-forward $POD_NAME 8080:$CONTAINER_PORT
{{- end }}
62 changes: 62 additions & 0 deletions helm/clickhouse-copier/templates/_helpers.tpl
@@ -0,0 +1,62 @@
{{/*
Expand the name of the chart.
*/}}
{{- define "clickhouse-copier.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "clickhouse-copier.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "clickhouse-copier.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Common labels
*/}}
{{- define "clickhouse-copier.labels" -}}
helm.sh/chart: {{ include "clickhouse-copier.chart" . }}
{{ include "clickhouse-copier.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels
*/}}
{{- define "clickhouse-copier.selectorLabels" -}}
app.kubernetes.io/name: {{ include "clickhouse-copier.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{/*
Create the name of the service account to use
*/}}
{{- define "clickhouse-copier.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "clickhouse-copier.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
86 changes: 86 additions & 0 deletions helm/clickhouse-copier/templates/configmap.yml
@@ -0,0 +1,86 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "clickhouse-copier.fullname" . }}-copier-config
  labels:
    {{- include "clickhouse-copier.labels" . | nindent 4 }}
data:
  task01.xml: |
    <clickhouse>
        <logger>
            <console>true</console>
            <log remove="remove"/>
            <errorlog remove="remove"/>
            <level>trace</level>
        </logger>
        <remote_servers>
            <source_cluster>
                {{- $user_select := .Values.user_pull }}
                {{- $password_user_select := .Values.password_user_pull }}
                {{- range $replicas_pull := .Values.host_replica_pull }}
                <shard>
                    <replica>
                        <host>{{ $replicas_pull }}</host>
                        <port>9000</port>
                        <user>{{ $user_select }}</user>
                        <password>{{ $password_user_select }}</password>
                    </replica>
                </shard>
                {{- end }}
            </source_cluster>
            <target_cluster>
                <!-- <secret></secret> -->
                {{- $user_insert := .Values.user_push }}
                {{- $password_user_insert := .Values.password_user_push }}
                {{- range $replicas_push := .Values.host_replica_push }}
                <shard>
                    <replica>
                        <host>{{ $replicas_push }}</host>
                        <port>9000</port>
                        <user>{{ $user_insert }}</user>
                        <password>{{ $password_user_insert }}</password>
                    </replica>
                </shard>
                {{- end }}
            </target_cluster>
        </remote_servers>
        <max_workers>100</max_workers>
        <settings_pull>
            <readonly>1</readonly>
        </settings_pull>
        <settings_push>
            <readonly>0</readonly>
        </settings_push>
        <settings>
            <connect_timeout>6</connect_timeout>
            <distributed_foreground_insert>1</distributed_foreground_insert>
            <allow_suspicious_codecs>1</allow_suspicious_codecs>
        </settings>
        <tables>
            <table_test>
                <cluster_pull>source_cluster</cluster_pull>
                <database_pull>{{ .Values.database_pull }}</database_pull>
                <table_pull>{{ .Values.table_pull }}</table_pull>
                <cluster_push>target_cluster</cluster_push>
                <database_push>{{ .Values.database_push }}</database_push>
                <table_push>{{ .Values.table_push }}</table_push>
                <engine>{{ .Values.engine_table | nindent 24 }}</engine>
                <sharding_key>{{ .Values.sharding_key }}</sharding_key>
            </table_test>
        </tables>
    </clickhouse>
  zookeeper.xml: |
    <clickhouse>
        <logger>
            <console>true</console>
            <log remove="remove"/>
            <errorlog remove="remove"/>
            <level>trace</level>
        </logger>
        <zookeeper>
            <node>
                <host>{{ .Values.zookeeper_host }}</host>
                <port>2181</port>
            </node>
        </zookeeper>
    </clickhouse>