lightning: support inject external storage when as library #33303
Code Coverage Details: https://codecov.io/github/pingcap/tidb/commit/5ac26f09ba5888a07b2345d12451172bc8a5c6b5
Signed-off-by: lance6716 <[email protected]>
Signed-off-by: lance6716 <[email protected]>
br/pkg/lightning/run_options.go
Outdated
type options struct {
	glue               glue.Glue
	externalStorage    storage.ExternalStorage
	cpNameInExtStorage string
Maybe just checkpointName is better?
br/pkg/lightning/lightning.go
Outdated
	return common.NormalizeError(errors.New("WithExternalStorage and WithGlue can't be both set"))
}
if o.cpNameInExtStorage != "" && o.glue != nil {
	return common.NormalizeError(errors.New("WithCpNameInExtStorage and WithGlue can't be both set"))
common.ErrInvalidArgument? It's not meaningful to call NormalizeError for an explicit error.
br/pkg/lightning/restore/restore.go
Outdated
// lightning via SQL will implement its glue, to let lightning use host TiDB's environment
Glue glue.Glue
// when not OwnExtStorage, checkpoint can also be saved in framework-created ExtStorage by setting this field
CpNameInExtStorage string
Lightning assumes all files in external storage are source data. I don't think it's a good idea to put the checkpoint file there, although we can use file route to ignore specific files. I suggest using a separate external storage to store the checkpoint. If you really want to use the same external storage, you can pass the same object.
This is already the behaviour of lightning in DM pingcap/tiflow#3813
How did DM work before this PR?
you can search https://github.com/pingcap/tiflow/pull/3813/files
cfg.Checkpoint.Driver = lcfg.CheckpointDriverFile
cpPath := filepath.Join(l.cfg.LoaderConfig.Dir, lightningCheckpointFileName)
cfg.Checkpoint.DSN = cpPath
It seems it just sets the checkpoint path inside the source dir, but nothing limits it from being set to another path.
Using the same external storage for source data and checkpoint carries the risk that the checkpoint file may be treated as a source file to import, so I think we shouldn't put the checkpoint in the same external storage. But if DM still wants to keep them together, two external storages can be created, e.g. source dir: file:///dump, checkpoint: file:///dump/checkpoint.
I think most of these concerns are about DM/dataflow engine needing to be more careful when building a config/environment for lightning that stores the file checkpoint in the dump folder; they are not related to lightning's inner logic. For example,
the checkpoint file may be treated as a source file to import.
In DM or dataflow engine's case, dumpling is used to create the dump files, so we can use this a priori knowledge to limit the dump file matching pattern and avoid bugs.
But if we actually store them in two places (which requires two different arguments for the two storage parameters), lightning has to handle the case that only one of them exists. And the risk is higher, since it relies on two storages being available rather than one.
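The idea of limiting the dump file matching pattern could look roughly like the sketch below. It only illustrates the filtering step: the accepted suffixes and the checkpoint file name are assumptions chosen for the example, not dumpling's actual naming contract or DM's real checkpoint file name.

```go
package main

import (
	"fmt"
	"strings"
)

// isDumpFile reports whether name looks like a dump file.
// The suffix list is an assumption for illustration only.
func isDumpFile(name string) bool {
	return strings.HasSuffix(name, ".sql") || strings.HasSuffix(name, ".csv")
}

// filterSourceFiles keeps only names that match the dump file pattern,
// so an unrelated file stored in the same directory is never imported.
func filterSourceFiles(names []string) []string {
	var out []string
	for _, n := range names {
		if isDumpFile(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// "lightning.checkpoint" is a hypothetical checkpoint file name.
	listing := []string{"db.tbl.000000000.sql", "db.tbl-schema.sql", "lightning.checkpoint"}
	fmt.Println(filterSourceFiles(listing))
}
```

An allow-list like this only protects the import as long as the checkpoint file name can never match the pattern, which is the kind of coupling the reviewers are debating.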
lightning has to handle the case that only one of them exists.
Maybe we can let the two storages required and disallow one of them is nil. Or create checkpointsDB out of NewRestoreController and pass checkpointsDB to it.
lightning has to handle the case that only one of them exists.
Maybe we can let the two storages required and disallow one of them is nil. Or create checkpointsDB out of NewRestoreController and pass checkpointsDB to it.
Oh, I mean the case where the files on one of the two storages are cleaned up. Can lightning report an error when only the checkpoint exists? Also, a lost checkpoint will cause some imports to restart, which will not happen if we store them together.
Maybe not. I think these are two concerns.
- For Lightning, I don't want the checkpoint to be storable only in the dump dir's external storage.
- For DM, if you want to avoid losing one of them, you can pass the same external storage object for both the dump dir and the checkpoint path.
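The two positions above can be reconciled in the way sketched below: the consumer takes two storage parameters, and a caller that wants everything in one place passes the same object twice. This is a toy model only; memStorage and importer are hypothetical types and do not reflect the real storage.ExternalStorage interface.

```go
package main

import (
	"fmt"
	"sort"
)

// memStorage is a hypothetical in-memory stand-in for an external storage.
type memStorage struct{ files map[string][]byte }

func newMemStorage() *memStorage { return &memStorage{files: map[string][]byte{}} }

func (m *memStorage) WriteFile(name string, data []byte) { m.files[name] = data }

// List returns the stored file names in sorted order.
func (m *memStorage) List() []string {
	names := make([]string, 0, len(m.files))
	for n := range m.files {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// importer is a hypothetical consumer that receives the two storages separately.
type importer struct {
	dump       *memStorage
	checkpoint *memStorage
}

// saveCheckpoint writes progress to whichever storage was given for checkpoints.
func (i *importer) saveCheckpoint() {
	i.checkpoint.WriteFile("checkpoint.pb", []byte("progress"))
}

func main() {
	// Separate storages: the dump listing never contains the checkpoint file.
	dump, cp := newMemStorage(), newMemStorage()
	dump.WriteFile("db.tbl.sql", nil)
	a := &importer{dump: dump, checkpoint: cp}
	a.saveCheckpoint()
	fmt.Println(dump.List(), cp.List())

	// Shared storage: the same object is passed for both roles.
	shared := newMemStorage()
	shared.WriteFile("db.tbl.sql", nil)
	b := &importer{dump: shared, checkpoint: shared}
	b.saveCheckpoint()
	fmt.Println(shared.List())
}
```

With the shared object, both files live or die together, at the cost of the checkpoint showing up in the dump listing.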
OK, I'll add two storage parameters. Lightning as a binary can still use the old config Checkpoint.DSN to specify the file checkpoint location.
Signed-off-by: lance6716 <[email protected]>
I have no other disagreement besides this checkpoint one, although DM uses the same dir to save checkpoint and data files.
How about having the checkpoint use its own storage only when there is a DSN configuration?
In the dataflow engine's case, it will not expose the S3 URI to lightning, so this will not happen. In DM's case the logic is unchanged.
Signed-off-by: lance6716 <[email protected]>
8063bd8 to a7011e6
rest LGTM
br/pkg/lightning/lightning.go
Outdated
}
if o.checkpointStorage != nil && o.glue != nil {
	return common.ErrInvalidArgument.GenWithStack("WithCheckpointStorage and WithGlue can't be both set")
}
Could you please explain why storage and glue can't be both set?
Signed-off-by: lance6716 <[email protected]>
br/pkg/lightning/lightning.go
Outdated
// pre-check about options
// glue should be set when lightning in TiDB, and storages should be set when lightning in DM/dataflow engine,
// so they should not both be set.
It seems glue and storage can technically both be set. If we implement Lightning on SQL, TiDB will also use Lightning as a library, and at that point TiDB may set both glue and storage.
Besides, Lightning is an independent component or library. All its interfaces should consider generic requirements, not just those of whoever depends on it.
For lightning on SQL, it's not a requirement for TiDB to open an external storage. So if we put this restriction in place, developers will use the lightning configuration to set the loader dir and checkpoint DSN, which reuses more logic between binary lightning and library lightning. In other words, my WithXXX options are only meant to make the smallest possible change to the API.
I would like the lightning binary and library to use the same implementation to create the RestoreController (this can be improved later). The WithXXX options indeed don't introduce many changes, but they are more like a hack, similar to failpoint.
use the same implementation to create the RestoreController.
- For binary, parse config file, create external storage and pass to RestoreController.
- For library, pass external storage to RestoreController directly.
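The two entry paths listed above can be sketched as one shared constructor with a thin config-driven wrapper for the binary. Everything here is hypothetical and simplified: Storage, openStorage, Config, controller and both constructors are illustrative names, not the real RestoreController API.

```go
package main

import (
	"errors"
	"fmt"
)

// Storage is a hypothetical, minimal stand-in for an external storage.
type Storage interface{ URI() string }

type localStorage struct{ uri string }

func (s localStorage) URI() string { return s.uri }

// openStorage is a hypothetical helper that opens a storage from a URI.
func openStorage(uri string) (Storage, error) {
	if uri == "" {
		return nil, errors.New("empty storage URI")
	}
	return localStorage{uri: uri}, nil
}

// Config is a hypothetical subset of a parsed config file.
type Config struct {
	SourceDir     string
	CheckpointDSN string
}

type controller struct {
	dump       Storage
	checkpoint Storage
}

// newController is the single shared constructor; both storages are required.
func newController(dump, checkpoint Storage) (*controller, error) {
	if dump == nil || checkpoint == nil {
		return nil, errors.New("both dump and checkpoint storages are required")
	}
	return &controller{dump: dump, checkpoint: checkpoint}, nil
}

// newControllerFromConfig is the binary path: build storages from config, then delegate.
func newControllerFromConfig(cfg Config) (*controller, error) {
	dump, err := openStorage(cfg.SourceDir)
	if err != nil {
		return nil, err
	}
	cp, err := openStorage(cfg.CheckpointDSN)
	if err != nil {
		return nil, err
	}
	return newController(dump, cp)
}

func main() {
	// Binary path: storages come from the config file.
	c1, err := newControllerFromConfig(Config{SourceDir: "file:///dump", CheckpointDSN: "file:///dump/checkpoint"})
	if err != nil {
		panic(err)
	}
	fmt.Println(c1.dump.URI(), c1.checkpoint.URI())

	// Library path: the embedding program passes storages it already owns.
	shared := localStorage{uri: "file:///dump"}
	c2, err := newController(shared, shared)
	if err != nil {
		panic(err)
	}
	fmt.Println(c2.dump.URI(), c2.checkpoint.URI())
}
```

Because both paths end in the same constructor, validation such as the non-nil check lives in one place.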
Signed-off-by: lance6716 <[email protected]>
ping @WizardXiao
/cc @kennytm
/merge
This pull request has been accepted and is ready to merge. Commit hash: 1a11964
/run-unit-test
/run-unit-tests
/run-mysql-tests
/merge
/run-mysql-tests
@lance6716: Your PR was out of date, I have automatically updated it for you. At the same time I will also trigger all tests for you: /run-all-tests If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
br/cmd/tidb-lightning/main.go
Outdated
@@ -22,12 +22,13 @@ import (
	"runtime/debug"
	"syscall"

	"go.uber.org/zap"

remove this empty line.
Signed-off-by: lance6716 <[email protected]>
/merge
This pull request has been accepted and is ready to merge. Commit hash: 5ac26f0
/merge
/hold
/unhold
/run-unit-tests
/run-unit-tests
What problem does this PR solve?
Issue Number: ref #33281
Problem Summary:
What is changed and how it works?
as title
Check List
Tests
Side effects
Documentation
Release note