Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lightning: support inject external storage when as library #33303

Merged
merged 13 commits into from
Mar 25, 2022

Conversation

lance6716
Copy link
Contributor

@lance6716 lance6716 commented Mar 22, 2022

What problem does this PR solve?

Issue Number: ref #33281

Problem Summary:

What is changed and how it works?

as title

Check List

Tests

  • Integration test

Side effects

Documentation

Release note

None

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Mar 22, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • D3Hunter
  • gozssky

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 22, 2022
@sre-bot
Copy link
Contributor

sre-bot commented Mar 22, 2022

Signed-off-by: lance6716 <[email protected]>
@ti-chi-bot ti-chi-bot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 22, 2022
@lance6716 lance6716 changed the title [WIP]lightning: support inject external storage when as library lightning: support inject external storage when as library Mar 22, 2022
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Mar 22, 2022
Signed-off-by: lance6716 <[email protected]>
type options struct {
glue glue.Glue
externalStorage storage.ExternalStorage
cpNameInExtStorage string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just checkpointName is better?

return common.NormalizeError(errors.New("WithExternalStorage and WithGlue can't be both set"))
}
if o.cpNameInExtStorage != "" && o.glue != nil {
return common.NormalizeError(errors.New("WithCpNameInExtStorage and WithGlue can't be both set"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

common.ErrInvalidArgument? It's not meaningful to call NormalizeError for an explicit error.

// lightning via SQL will implement its glue, to let lightning use host TiDB's environment
Glue glue.Glue
// when not OwnExtStorage, checkpoint can also be saved in framework-created ExtStorage by setting this field
CpNameInExtStorage string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lightning assumes all files in external storage are source data. I think it's not a good way to put checkpoint file in it, although we can use file route to ignore specific file. I suggest using a separate external storage to store checkpoint. If you really want to use the same external storage, you can pass the same object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already the behaviour of lightning in DM pingcap/tiflow#3813

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does DM works without this PR before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can search https://github.com/pingcap/tiflow/pull/3813/files

https://github.com/pingcap/tiflow/pull/3813/files#diff-dc63bb506f94f15965a3d2fed1dcf08ced7e068fb8b07488f90895dfd40e0e38R196

		cfg.Checkpoint.Driver = lcfg.CheckpointDriverFile
		cpPath := filepath.Join(l.cfg.LoaderConfig.Dir, lightningCheckpointFileName)
		cfg.Checkpoint.DSN = cpPath

Copy link
Contributor

@sleepymole sleepymole Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it just set checkpoint path inside the source dir, but it doesn't limit it setting to other path.

Using the external storage for source data and checkpoint has potential risks that the checkpoint file may be treated as a source file to import. So I think we shouldn't put checkpoint in the same external storage. But if DM still want to put them together, two external storage can be created. e.g source dir: file:///dump checkpoint: file:///dump/checkpoint.

Copy link
Contributor Author

@lance6716 lance6716 Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of concerns are about DM/dataflow engine should be more careful to build a config/environment for lightning when storing file checkpoint in dump folder, not related to lightning's inner logic. For example,

the checkpoint file may be treated as a source file to import.

In DM or dataflow engine's case, dumpling is used to create dump files, so we may limit the dump file matching pattern with this apriori knowledge to avoid bugs.

But if we actually store them at two place, (which requires two different arguments for two storage parameters), lightning has to handle the case that only one of them exists. And risk is higher since it's rely on two storages are available rather than one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lightning has to handle the case that only one of them exists.

Maybe we can let the two storages required and disallow one of them is nil. Or create checkpointsDB out of NewRestoreController and pass checkpointsDB to it.

Copy link
Contributor Author

@lance6716 lance6716 Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lightning has to handle the case that only one of them exists.

Maybe we can let the two storages required and disallow one of them is nil. Or create checkpointsDB out of NewRestoreController and pass checkpointsDB to it.

Oh I mean files on one of two storages are cleaned. Can lightning report error when only checkpoint exists? And lost checkpoint will cause some import restarted, which will not happen if we store them together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not. I think these are two concerns.

  1. For Lightning, I don't hope checkpoint can only store in dump dir's external storage.
  2. For DM, If you want to avoid "one of them loss", you can pass the same external storage object to both dump dir and checkpoint path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll add two storage parameters. Lightning as binary can still use old config Checkpoint.DSN to specify the file checkpoint location.

Signed-off-by: lance6716 <[email protected]>
Copy link
Contributor

@WizardXiao WizardXiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no other divergence besides this checkpoint divergence, although dm use the same dir to save chepoint and data files.
how about checkpoint will use self storage only when there has a dsn configuration?

@lance6716
Copy link
Contributor Author

I have no other divergence besides this checkpoint divergence, although dm use the same dir to save chepoint and data files. how about checkpoint will use self storage only when there has a dsn configuration?

For dataflow engine's case, it will not expose S3 URI to lightning, so this will not happen. For DM's case the logic is unchanged as before.

Copy link
Contributor

@sleepymole sleepymole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

}
if o.checkpointStorage != nil && o.glue != nil {
return common.ErrInvalidArgument.GenWithStack("WithCheckpointStorage and WithGlue can't be both set")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain why storage and glue can't be both set?

Signed-off-by: lance6716 <[email protected]>

// pre-check about options
// glue should be set when lightning in TiDB, and storages should be set when lightning in DM/dataflow engine,
// so they should not both be set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems glue and storage can both set technically. If we implement Lightning on SQL, TiDB also use Lightning as a library. At that time, TiDB may both set glue and storage.

Besides, Lightning is an independent component or library. All its interfaces should consider generic requirements not the one who depend on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For lightning on SQL, it's not a requirement for TiDB to open an external storage. So if we emplace this restriction developers will use the lightning configuration to set loader dir and checkpoint DSN, which will reuse more logic about binary lightning and library lightning. In other words, my WithXXX options only want to make least changes to API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like that lightning binary and library use the same implementation to create the RestoreController later (This can be improved later). WithXXX options indeed doesn't introduce much changes, but it is more like a hack that is similar to failpoint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the same implementation to create the RestoreController.

  1. For binary, parse config file, create external storage and pass to RestoreController.
  2. For library, pass external storage to RestoreController directly.

Signed-off-by: lance6716 <[email protected]>
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 23, 2022
@lance6716
Copy link
Contributor Author

ping @WizardXiao

@lance6716
Copy link
Contributor Author

/cc @kennytm

@ti-chi-bot ti-chi-bot requested a review from kennytm March 24, 2022 02:51
@lance6716
Copy link
Contributor Author

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 1a11964

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 25, 2022
@lance6716
Copy link
Contributor Author

/run-unit-test
/run-mysql-test

@lance6716
Copy link
Contributor Author

/run-unit-tests

@lance6716
Copy link
Contributor Author

/run-mysql-tests

@hawkingrei
Copy link
Member

/merge

@lance6716
Copy link
Contributor Author

/run-mysql-tests

@ti-chi-bot
Copy link
Member

@lance6716: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@@ -22,12 +22,13 @@ import (
"runtime/debug"
"syscall"

"go.uber.org/zap"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this empty line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: lance6716 <[email protected]>
@ti-chi-bot ti-chi-bot removed the status/can-merge Indicates a PR has been approved by a committer. label Mar 25, 2022
@lance6716
Copy link
Contributor Author

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 5ac26f0

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 25, 2022
@hawkingrei
Copy link
Member

/merge

@hawkingrei
Copy link
Member

/hold

@ti-chi-bot ti-chi-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2022
@hawkingrei
Copy link
Member

/unhold

@ti-chi-bot ti-chi-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2022
@lance6716
Copy link
Contributor Author

/run-unit-tests

1 similar comment
@lance6716
Copy link
Contributor Author

/run-unit-tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants