rfc: add dumpling, a data exporting tool #123

Merged on Jan 30, 2020 (2 commits)

Contributor

@kennytm kennytm commented Dec 6, 2019

This RFC is written for the PingCAP Special Week 2019 Q4 ("Tools Matter") item "MySQL Full-Export Tool Dumpling (replacing Mydumper), Integrating to DM". Tracking issue is #122.

🖼 Rendered

@kennytm kennytm requested a review from a team December 6, 2019 15:03
@ghost ghost requested review from Soline324 and winkyao and removed request for a team December 6, 2019 15:03
### Name

The initial motivation of this tool is to supplement Lightning.
We call the new tool "Dumpling" as a portmanteau of "dump" + "Lightning".

TiDumpling might help with both searchability and understanding. There are already open source projects named dumpling.

Contributor Author

Naming is hard 🙃. We could also specify the official name as TiDB Dumpling (like TiDB Lightning and TiDB Binlog).


### Programming language

We'd like to embed Dumpling into TiDB (as an `EXPORT` statement) and DM

Once the new physical backup is released, the only use case for running this against TiDB seems to be exporting data to another database (MySQL). For an immediate restore into MySQL, exporting from within TiDB probably adds no convenience, because one already has to run an external tool to load the data into MySQL.

The TiDB node that runs this might essentially need to be considered offline if the backup process uses up all its resources. The value proposition would then be that it is easier to deploy an additional TiDB node than to deploy a new tool. This holds only if the resource requirements of the backup are the same as those of TiDB. If they are bigger, this won't work. If they are smaller and the TiDB node still serves requests, wrapping the export as a subprocess could still be a good idea to better isolate the backup workload.
Unintelligent load balancing between TiDB nodes could easily cause the node doing the backup to become overworked.

In contrast, the new physical BR tool, run from inside TiDB, would only perform meta operations in TiDB, with most of the actual backup work done by TiKV; this should leave most resources available to the TiDB node.

Contributor Author

This paragraph explains why we chose Go over other languages. Even if we don't want EXPORT, we still need integration with DM.

And given that we're going to have IMPORT with Lightning, it is natural to support EXPORT as well.

The EXPORT statement is not meant to replace BR. BR will be given its own BACKUP and RESTORE statements.

Import with Lightning will have similar deployment issues, since it will use a great deal of CPU. However, one of the main use cases is importing when a cluster is first created, when no useful queries can be run until the import is complete.

Contributor Author

True. The IMPORT and EXPORT statements allow DBAs to manage logical backups via the SQL interface for familiarity. The individual executables are still available though.

Anyway, this is getting off-topic.

Contributor

@winkyao winkyao left a comment

LGTM

@winkyao winkyao added the status/LGT1 label Jan 30, 2020
Contributor

@IANTHEREAL IANTHEREAL left a comment

LGTM

@IANTHEREAL IANTHEREAL merged commit 68432bf into pingcap:master Jan 30, 2020
@kennytm kennytm deleted the dumpling branch January 30, 2020 07:26