Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-1098: [C++] Support specifying type ids or column names in cpp tools #1020

Merged
merged 2 commits into from
Jan 25, 2022

Conversation

stiga-huang
Copy link
Contributor

@stiga-huang stiga-huang commented Jan 19, 2022

What changes were proposed in this pull request?

This is a follow-up task of #921. Currently we have options for the tools to work on specified top-level column fields. However, ACID ORC files usually have nested structure. We need the type ids to specify nested columns. As an extension, adding support for column names will also be helpful. So we don't need to manually convert column names to type ids. Also reports the valid values when an invalid column name is given.

This PR extracts the option parsing codes into ToolsHelper. So similiar cpp tools can share the same option set.

Why are the changes needed?

It makes the tools more useful in practice.

How was this patch tested?

Added unit tests for the new options.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the PR, @stiga-huang .

Could you review this please, @wgtmac ?

And, if possible, could you participate the on-going Apache ORC 1.6.13 vote, @stiga-huang and @wgtmac ?

@dongjoon-hyun dongjoon-hyun added this to the 1.8.0 milestone Jan 20, 2022
@stiga-huang
Copy link
Contributor Author

Thank @dongjoon-hyun! Just voted.

@dongjoon-hyun
Copy link
Member

Thank you so much for your participation, @stiga-huang .

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @stiga-huang .
Merged to main for Apache ORC 1.8.

@dongjoon-hyun dongjoon-hyun merged commit 89af2cb into apache:main Jan 25, 2022
@dongjoon-hyun
Copy link
Member

@stiga-huang . After reviewing this PR once more and testing on branch-1.7, I decided to re-target this to v1.7.3 instead of v1.8.0 because this is a safe tools only PR and the change on Reader.cc is also about ParseError message instead of a core code change. Thank you again.

dongjoon-hyun pushed a commit that referenced this pull request Jan 28, 2022
…ols (#1020)

### What changes were proposed in this pull request?

This is a follow-up task of #921. Currently we have options for the tools to work on specified top-level column fields. However, ACID ORC files usually have nested structure. We need the type ids to specify nested columns. As an extension, adding support for column names will also be helpful. So we don't need to manually convert column names to type ids. Also reports the valid values when an invalid column name is given.

This PR extracts the option parsing codes into ToolsHelper. So similiar cpp tools can share the same option set.

### Why are the changes needed?

It makes the tools more useful in practice.

### How was this patch tested?

Added unit tests for the new options.

(cherry picked from commit 89af2cb)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun dongjoon-hyun modified the milestones: 1.8.0, 1.7.3 Jan 28, 2022
cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024
…ols (apache#1020)

### What changes were proposed in this pull request?

This is a follow-up task of apache#921. Currently we have options for the tools to work on specified top-level column fields. However, ACID ORC files usually have nested structure. We need the type ids to specify nested columns. As an extension, adding support for column names will also be helpful. So we don't need to manually convert column names to type ids. Also reports the valid values when an invalid column name is given.

This PR extracts the option parsing codes into ToolsHelper. So similiar cpp tools can share the same option set.

### Why are the changes needed?

It makes the tools more useful in practice.

### How was this patch tested?

Added unit tests for the new options.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants