tidb-lightning: rename tables and databases #15440

Merged
merged 21 commits into from
Nov 24, 2023
a4ba2d5
Update tidb-lightning-data-source.md
pepezzzz Nov 15, 2023
d04427c
Update tidb-lightning-data-source.md
pepezzzz Nov 17, 2023
ec8e7d8
Update tidb-lightning-data-source.md
pepezzzz Nov 20, 2023
497e17d
Update tidb-lightning-data-source.md
hfxsd Nov 21, 2023
2f30d71
added toml
hfxsd Nov 21, 2023
073fb98
Update tidb-lightning-data-source.md
hfxsd Nov 21, 2023
80053cc
Update tidb-lightning-data-source.md
hfxsd Nov 21, 2023
3694831
fix wrong regular expression
hfxsd Nov 22, 2023
624506c
Apply suggestions from code review
hfxsd Nov 22, 2023
3b5beb9
Apply suggestions from code review
hfxsd Nov 23, 2023
443c467
Update tidb-lightning/tidb-lightning-data-source.md
pepezzzz Nov 24, 2023
608a1dd
Update tidb-lightning/tidb-lightning-data-source.md
pepezzzz Nov 24, 2023
7fe32a5
Update tidb-lightning/tidb-lightning-data-source.md
pepezzzz Nov 24, 2023
3f1e1e4
Update tidb-lightning/tidb-lightning-data-source.md
pepezzzz Nov 24, 2023
0631bf2
Update tidb-lightning/tidb-lightning-data-source.md
pepezzzz Nov 24, 2023
781ec30
Update tidb-lightning/tidb-lightning-data-source.md
hfxsd Nov 24, 2023
1f78c65
Update tidb-lightning/tidb-lightning-data-source.md
hfxsd Nov 24, 2023
46c7c41
Update tidb-lightning/tidb-lightning-data-source.md
pepezzzz Nov 24, 2023
3d2bf75
Update tidb-lightning/tidb-lightning-data-source.md
pepezzzz Nov 24, 2023
0c28239
Update tidb-lightning/tidb-lightning-data-source.md
pepezzzz Nov 24, 2023
6457c5e
Update tidb-lightning/tidb-lightning-data-source.md
hfxsd Nov 24, 2023
67 changes: 67 additions & 0 deletions tidb-lightning/tidb-lightning-data-source.md
@@ -26,6 +26,73 @@ At runtime, TiDB Lightning looks in `data-source-dir` for everything matching the naming rules

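TiDB Lightning processes data in parallel as much as possible. Because each file must be read sequentially, the data-processing goroutines are concurrent at the file level (controlled by the `region-concurrency` setting). As a result, importing large files performs poorly. For the best performance, a single file size of about 256 MiB is generally recommended.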
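
Because concurrency is per file, it can help to check a data source directory for files well above the suggested size before importing. The following is a minimal sketch, not part of TiDB Lightning: the function name `oversized_files` and the constant `RECOMMENDED_MAX_BYTES` are hypothetical, and the 256 MiB threshold is simply the value recommended above.

```python
from pathlib import Path

# 256 MiB, the per-file size suggested above (an assumption used as a threshold).
RECOMMENDED_MAX_BYTES = 256 * 1024 * 1024


def oversized_files(data_source_dir, limit=RECOMMENDED_MAX_BYTES):
    """Return (name, size_in_bytes) for regular files larger than `limit`, largest first."""
    found = [
        (path.name, path.stat().st_size)
        for path in Path(data_source_dir).iterdir()
        if path.is_file() and path.stat().st_size > limit
    ]
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling `oversized_files("/some-subdir/some-database/")` lists the files that are candidates for splitting before the import.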

## Rename databases and tables

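At runtime, TiDB Lightning imports each file into the database and table indicated by the file naming rules. If the target database or table name differs from the one in the file names, you can either rename the files before importing, or use regular expressions to replace the object names online during the import.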

### Rename files in bulk

On Red Hat-like Linux distributions, you can use the following `rename` command to rename the files in the `data-source-dir` directory in bulk.

```bash
rename srcdb. tgtdb. *.sql
```

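After changing the database name in the file names, it is recommended to delete the schema-create.sql file in the `data-source-dir` directory, which contains the `CREATE DATABASE` DDL statement. If you change a table name, you also need to update the table name in the `CREATE TABLE` DDL statement in the corresponding schema.sql file.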
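
The `rename` utility behaves differently across distributions, so a dry-run plan can be useful before touching the files. The following is a rough, portable sketch of the same idea, assuming file names start with the database name followed by a dot. The function names `plan_renames` and `apply_renames` are hypothetical. Unlike `rename`, which replaces the first occurrence of the string anywhere in the name, this sketch only replaces a leading `srcdb.` prefix.

```python
from pathlib import Path


def plan_renames(data_source_dir, src_db, tgt_db, suffix=".sql"):
    """Return sorted (old_name, new_name) pairs for files named '<src_db>.*<suffix>'."""
    prefix = src_db + "."
    names = sorted(p.name for p in Path(data_source_dir).iterdir() if p.is_file())
    return [
        (name, tgt_db + "." + name[len(prefix):])
        for name in names
        if name.startswith(prefix) and name.endswith(suffix)
    ]


def apply_renames(data_source_dir, plan):
    """Rename each file in the plan; refuse to overwrite an existing file."""
    root = Path(data_source_dir)
    for old_name, new_name in plan:
        if (root / new_name).exists():
            raise FileExistsError(new_name)
        (root / old_name).rename(root / new_name)
```

Review the result of `plan_renames(...)` first, then pass it to `apply_renames(...)`. Files such as the schema-create.sql file, whose names do not start with `srcdb.`, are left untouched.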

### Replace names online using regular expressions

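Within `[[mydumper.files]]`, use `pattern` to match file names, and set `schema` and `table` to the target names. For details, see [custom file matching](/tidb-lightning/tidb-lightning-data-source.md#自定义文件匹配).
The `pattern` for data files takes the form `'^({schema_regex})\.({table_regex})\.(?:{file_serial_regex})\.(csv|parquet|sql)'`. The file serial number group is non-capturing, so the file type is the third capture group.
`schema` can be set to `'$1'`, which keeps the value matched by the first capture group `schema_regex`, or to a literal string such as `'tgtdb'`, which specifies a fixed target database.
`table` can be set to `'$2'`, which keeps the value matched by the second capture group `table_regex`, or to a literal string such as `'t1'`, which specifies a fixed target table.
`type` can be set to `'$3'` for the data file type, to `"table-schema"` for a schema.sql file, or to `"schema-schema"` for a schema-create.sql file.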

```toml
[mydumper]
data-source-dir = "/some-subdir/some-database/"
# Database schema file, for example srcdb-schema-create.sql
[[mydumper.files]]
pattern = '^(srcdb)-schema-create\.sql'
schema = 'tgtdb'
type = "schema-schema"
# Table schema files, for example srcdb.t1-schema.sql
[[mydumper.files]]
pattern = '^(srcdb)\.(.*?)-schema\.sql'
schema = 'tgtdb'
table = '$2'
type = "table-schema"
# Data files, for example srcdb.t1.0001.sql
[[mydumper.files]]
pattern = '^(srcdb)\.(.*?)\.(?:[0-9]+)\.(csv|parquet|sql)'
schema = 'tgtdb'
table = '$2'
type = '$3'
```
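
To see how the capture groups map onto `schema`, `table`, and `type`, the data file rule can be tried out against sample file names. The following sketch only mimics the `$N` substitution with Python's `re` module; it is not TiDB Lightning's implementation, which uses Go regular expressions. The function name `route` and the sample file names are hypothetical.

```python
import re


def route(file_name, pattern, schema, table, file_type):
    """Apply one rule; return (schema, table, type), or None if the pattern does not match."""
    match = re.match(pattern, file_name)
    if match is None:
        return None

    def resolve(template):
        # Replace each "$N" with capture group N; literal strings pass through unchanged.
        return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), template)

    return resolve(schema), resolve(table), resolve(file_type)


# The data file rule from the configuration above.
DATA_PATTERN = r"^(srcdb)\.(.*?)\.(?:[0-9]+)\.(csv|parquet|sql)"

print(route("srcdb.t1.0001.sql", DATA_PATTERN, "tgtdb", "$2", "$3"))
# ('tgtdb', 't1', 'sql')
print(route("otherdb.t1.0001.sql", DATA_PATTERN, "tgtdb", "$2", "$3"))
# None
```

Because the serial number group `(?:[0-9]+)` is non-capturing, the file extension is group 3, which is why the rule uses `type = '$3'`.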

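If the data files were backed up with `gzip` compression, you need to configure the compression format accordingly.
The `pattern` for data files takes the form `'^({schema_regex})\.({table_regex})\.(?:{file_serial_regex})\.(csv|parquet|sql)\.(gz)'`. The file serial number group is non-capturing, so the compression suffix is the fourth capture group.
`compression` can be set to `'$4'` for the compression format.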

```toml
[mydumper]
data-source-dir = "/some-subdir/some-database/"
# Database schema file, for example srcdb-schema-create.sql.gz
[[mydumper.files]]
pattern = '^(srcdb)-schema-create\.(sql)\.(gz)'
schema = 'tgtdb'
type = "schema-schema"
compression = '$3'
# Table schema files, for example srcdb.t1-schema.sql.gz
[[mydumper.files]]
pattern = '^(srcdb)\.(.*?)-schema\.(sql)\.(gz)'
schema = 'tgtdb'
table = '$2'
type = "table-schema"
compression = '$4'
# Data files, for example srcdb.t1.0001.sql.gz
[[mydumper.files]]
pattern = '^(srcdb)\.(.*?)\.(?:[0-9]+)\.(sql)\.(gz)'
schema = 'tgtdb'
table = '$2'
type = '$3'
compression = '$4'
```
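
When a rule gains or loses a group, the `$N` references shift, so it is worth listing the capture groups for a sample file name before relying on `compression = '$4'`. The following is a small sketch using Python's `re` module as a stand-in for the Go regular expressions that TiDB Lightning uses. The function name `capture_groups` and the sample file names are hypothetical.

```python
import re

# The compressed data file rule from the configuration above.
GZ_DATA_PATTERN = r"^(srcdb)\.(.*?)\.(?:[0-9]+)\.(sql)\.(gz)"


def capture_groups(pattern, file_name):
    """Return {"$N": text} for each capture group, or {} when the name does not match."""
    match = re.match(pattern, file_name)
    if match is None:
        return {}
    return {f"${i}": text for i, text in enumerate(match.groups(), start=1)}


print(capture_groups(GZ_DATA_PATTERN, "srcdb.t1.0001.sql.gz"))
# {'$1': 'srcdb', '$2': 't1', '$3': 'sql', '$4': 'gz'}
print(capture_groups(GZ_DATA_PATTERN, "srcdb.t1.0001.sql"))
# {}
```

The second call shows that this rule does not match uncompressed files, so a directory that mixes compressed and uncompressed files would need both sets of rules.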

## CSV

### Schema