import: add skip-header-row parameter for csv #41070
/run-integration-br-test
I'm not sure whether we should make skipNRows a feature of the parser itself, so that LOAD DATA can reuse it.
if remainingSkipRows > 0 {
	iterPos, _ := parser.Pos()
	for remainingSkipRows > 0 && iterPos < endOffset {
		if err := parser.ReadRow(); err != nil {
In #40852 I use parser.ReadUntilTerminator, because the skipped row may contain bad syntax.
PrevRowIDMax:   prevRowIdxMax,
RowIDMax:       rowIDMax,
Columns:        columns,
SkipFirstNRows: rowsToSkip,
Is changing startOffset enough? Then there would be no need to add SkipFirstNRows.
I've considered this problem. We need SkipFirstNRows because some data formats, such as parquet, are not stored row by row. In that situation, we cannot just specify an offset to skip the first N rows.
@gozssky PTAL
Closing the PR because of the discussion in #40839 (comment).
What problem does this PR solve?
Issue Number: ref #40839
Problem Summary:
What is changed and how it works?
Add a skip-header-row parameter, to allow skipping the first row of each CSV file.
If csv.header is set to true, it implies that skip-header-row is also true.
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.