executor: LOAD DATA use lightning CSV parser #40852

Merged: 13 commits into pingcap:master on Feb 16, 2023

Conversation

Contributor

@lance6716 lance6716 commented Jan 30, 2023

What problem does this PR solve?

Issue Number: ref #40499

Problem Summary:

What is changed and how it works?

  • remove the LOAD DATA parser and its tests; use the lightning CSV parser and its tests instead
  • wrap the data stream from the MySQL client connection in an io.Reader, which simplifies the logic

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot
Member

ti-chi-bot commented Jan 30, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • buchuitoudegou
  • hawkingrei

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 30, 2023
Signed-off-by: lance6716 <[email protected]>
@ti-chi-bot ti-chi-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 30, 2023
executor/load_data.go (outdated, resolved)
@lance6716 lance6716 changed the title [WIP]executor: LOAD DATA use lightning CSV parser executor: LOAD DATA use lightning CSV parser Feb 14, 2023
@ti-chi-bot ti-chi-bot removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-linked-issue labels Feb 14, 2023
@lance6716
Contributor Author

/cc @gozssky @buchuitoudegou

Signed-off-by: lance6716 <[email protected]>
Contributor

@buchuitoudegou buchuitoudegou left a comment

Do we need some sort of integration test?

if err != nil {
return prevData, err
if err = loadDataInfo.enqOneTask(ctx); err != nil {
logutil.Logger(ctx).Error("load data process stream error", zap.Error(err))
Contributor

maybe make the error message more specific to the function (i.e., enqOneTask) to make it easier to locate the error position according to the log.

}
// rowCount will be used in fillRow(), last insert ID will be assigned according to the rowCount = 1.
// So should add first here.
e.rowCount++
e.rows = append(e.rows, e.colsToRow(ctx, cols))
e.rows = append(e.rows, e.colsToRow(ctx, parser.LastRow().Row))
e.curBatchCnt++
if e.maxRowsInBatch != 0 && e.rowCount%e.maxRowsInBatch == 0 {
Contributor

Why not e.rowCount >= e.maxRowsInBatch?

Contributor Author

rowCount is never reset; it is used as a counter to report progress.

Signed-off-by: lance6716 <[email protected]>
@lance6716
Contributor Author

Do we need some sort of integration test?

In fact, the files under loadremotetest are a bit like integration tests: they use a fake GCS server to serve the file content.

@lance6716
Contributor Author

ptal @gozssky @buchuitoudegou

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 15, 2023
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 16, 2023
@hawkingrei
Member

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: b9a37c6

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 16, 2023
@ti-chi-bot ti-chi-bot merged commit d161aa6 into pingcap:master Feb 16, 2023
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

4 participants