Update Lightning import error handling #13376

Merged
merged 7 commits into from
Apr 28, 2023
Changes from 4 commits
16 changes: 14 additions & 2 deletions tidb-lightning/tidb-lightning-distributed-import.md
@@ -185,9 +185,21 @@ If one or more TiDB Lightning nodes exit abnormally during a parallel import, id

- If the error shows normal exit (for example, exit in response to a kill command) or termination by the operating system due to OOM, adjust the configuration and then restart the TiDB Lightning nodes.

- If the error has no impact on data accuracy, for example, network timeout, run `checkpoint-error-ignore` by using tidb-lightning-ctl on all failed nodes to clean errors in the checkpoint source data. Then restart these nodes to continue importing data from checkpoints. For details, see [checkpoint-error-ignore](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-error-ignore).
- If the error has no impact on data accuracy, for example, network timeout, perform the following steps:

- If the log reports errors resulting in data inaccuracy, for example, checksum mismatched, which indicates invalid data in the source file, run `checkpoint-error-destroy` by using tidb-lightning-ctl on all failed nodes to clean data imported to the failed tables as well as the checkpoint source data. For details, see [checkpoint-error-destroy](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-error-destroy). This command removes the data imported to the failed tables downstream. Therefore, you need to re-configure and import the data of the failed tables on all TiDB Lightning nodes (including those that exit normally) by using the `filters` parameter.
1. Run the [`checkpoint-error-ignore`](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-error-ignore) command on all failed nodes to clean errors in the checkpoint source data.

2. Restart these nodes to continue importing data from checkpoints.

- If you see errors in the log that result in data inaccuracies, such as a checksum mismatch indicating invalid data in the source file, you can perform the following steps to resolve this issue:

1. Run the [`checkpoint-error-destroy`](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-error-destroy) command on all Lightning nodes, including successful nodes, to clean up the imported data and the checkpoint source data.

This command removes the data imported to the failed tables downstream, the corresponding checkpoint, and the Meta table information of multiple parallel import tasks.

2. Reconfigure and import the data of failed tables by using the [`filter`](/table-filter.md) parameter on all TiDB Lightning nodes, including normally exiting nodes.

When you reconfigure the Lightning parallel import task, do not include the `checkpoint-error-destroy` command in the startup script of each Lightning node. Otherwise, the Meta table corresponding to the parallel import task is deleted multiple times, causing issues when the newly created Lightning parallel import task tries to import data.
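
As a rough illustration of the two recovery paths above, the following sketch prints the `tidb-lightning-ctl` invocation for each case instead of running it. The helper name `recovery_cmd`, the `CONFIG` variable, and the `conf/tidb-lightning.toml` path are assumptions made for this example and are not part of the change under review; check the flags against the checkpoint documentation linked above before running anything on a real cluster.

```shell
#!/bin/sh
# Hypothetical helper: prints the tidb-lightning-ctl command for a recovery mode.
# CONFIG and its default path are assumptions for illustration only.
CONFIG="${CONFIG:-conf/tidb-lightning.toml}"

recovery_cmd() {
    case "$1" in
        ignore)  flag="--checkpoint-error-ignore" ;;   # error does not affect data accuracy
        destroy) flag="--checkpoint-error-destroy" ;;  # error makes imported data inaccurate
        *) echo "usage: recovery_cmd ignore|destroy [target]" >&2; return 1 ;;
    esac
    # The target defaults to "all"; a single table can be passed instead.
    printf 'tidb-lightning-ctl --config %s %s=%s\n' "$CONFIG" "$flag" "${2:-all}"
}

recovery_cmd ignore
recovery_cmd destroy
```

Printing the command rather than executing it keeps the sketch safe to try. As the note above says, the destroy variant is meant to be run by hand and should not be placed in the startup script of each Lightning node.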

### During an import, an error "Target table is calculating checksum. Please wait until the checksum is finished and try again" is reported
