ORC-236: Support UNION type in Java Convert tool #1025

Merged on Feb 16, 2022 (4 commits)

@rizaon (Contributor) commented Jan 27, 2022

What changes were proposed in this pull request?

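This patch adds support for converting JSON to the UNION type in ORC files. For
example, with the schema `struct<foo:uniontype<int,string>>`, the following
JSON lines can be parsed into UNION values, where `tag` selects the branch by position (0 for `int`, 1 for `string`) and `value` holds the data.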

```
{"foo": {"tag": 0, "value": 1}}
{"foo": {"tag": 1, "value": "testing"}}
{"foo": {"tag": 0, "value": 3}}
```
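To make the mapping concrete, here is a minimal, hypothetical sketch of what each JSON line above turns into. It uses only the JDK and is loosely modeled on the row-aligned layout of Hive's `UnionColumnVector` (a `tags` array plus one child column per branch, indexed by the same row number); it is not the convert tool's actual code. The class and method names are invented for illustration, and it assumes each `{"tag": ..., "value": ...}` object has already been parsed into a tag and a Java value.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not ORC's API) of storing uniontype<int,string>
 * column-wise: tags[row] records which branch a row uses, and the value
 * lives at the same row index in that branch's child column.
 */
public class UnionSketch {
    static final int INT_TAG = 0;     // branch 0 of uniontype<int,string>
    static final int STRING_TAG = 1;  // branch 1 of uniontype<int,string>

    final int[] tags;
    final long[] intChild;      // meaningful only for rows whose tag is 0
    final String[] stringChild; // meaningful only for rows whose tag is 1

    UnionSketch(int rows) {
        tags = new int[rows];
        intChild = new long[rows];
        stringChild = new String[rows];
    }

    /** Stores one already-parsed {"tag": t, "value": v} object into the given row. */
    void set(int row, int tag, Object value) {
        switch (tag) {
            case INT_TAG:
                if (!(value instanceof Number)) {
                    throw new IllegalArgumentException("tag 0 expects an int, got " + value);
                }
                intChild[row] = ((Number) value).longValue();
                break;
            case STRING_TAG:
                if (!(value instanceof String)) {
                    throw new IllegalArgumentException("tag 1 expects a string, got " + value);
                }
                stringChild[row] = (String) value;
                break;
            default:
                throw new IllegalArgumentException("tag out of range: " + tag);
        }
        // Only record the tag once the value has been accepted.
        tags[row] = tag;
    }

    /** Reads a row back as the value of its selected branch. */
    Object get(int row) {
        return tags[row] == INT_TAG ? Long.valueOf(intChild[row]) : stringChild[row];
    }

    public static void main(String[] args) {
        UnionSketch foo = new UnionSketch(3);
        // The three example lines from the PR description, already decoded.
        foo.set(0, 0, 1);
        foo.set(1, 1, "testing");
        foo.set(2, 0, 3);
        System.out.println(Arrays.toString(foo.tags)); // [0, 1, 0]
        System.out.println(foo.get(1));                // testing
    }
}
```

This sketch rejects a value whose type doesn't match its tag, as well as an out-of-range tag; how the real tool handles such input isn't covered in this PR.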

Why are the changes needed?

This adds missing support for the UNION type to the Java convert tool.

How was this patch tested?

Manually tested against a handcrafted JSON file.

@github-actions github-actions bot added the JAVA label Jan 27, 2022
@dongjoon-hyun (Member) left a comment

Thank you, @rizaon.

@dongjoon-hyun (Member)

cc @guiyanakuang and @williamhyun , too

@dongjoon-hyun dongjoon-hyun changed the title ORC-236: Support UNION type in Java Convert tool ORC-236: Support UNION type in Java Convert tool Jan 27, 2022
@dongjoon-hyun (Member)

Is this applicable to branch-1.7, @rizaon?

@rizaon (Contributor, Author) commented Jan 27, 2022

@dongjoon-hyun If the backport is clean, it should be applicable.

@dongjoon-hyun dongjoon-hyun added this to the 1.7.3 milestone Jan 28, 2022
@dongjoon-hyun (Member)

Thank you, @rizaon. I set the milestone to v1.7.3.

@guiyanakuang (Member)

It might be nice to make Convert more generic. Some existing JSON data may not have `tag` and `value` fields, so adding an option that lets the user customize how the tag and value are obtained would be great. : )

@rizaon (Contributor, Author) commented Jan 28, 2022

Hello @guiyanakuang, thanks for your feedback!
Would it help if I add parameters to specify the JSON keys for the tag and the value? Maybe `--union_tag` and `--union_value`, like this?

```
java -jar ./java/tools/target/orc-tools-1.8.0-SNAPSHOT-uber.jar convert -o sample.orc \
  -s "struct<foo:uniontype<int,string>>" --union_tag="tag" --union_value="value" sample.json
```
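The configurable-key idea discussed here (the proposed `--union_tag` and `--union_value` options in the command above) could look roughly like the sketch below. This is a hypothetical illustration, not the option handling in the merged tool: the class name, its methods, and the use of a `java.util.Map` as a stand-in for an already-parsed JSON object are all assumptions. It uses only the JDK (Java 9+ for `Map.of`).

```java
import java.util.Map;

/**
 * Hypothetical sketch: the JSON key names that carry the union tag and
 * value are configurable instead of hard-coded. A Map stands in for an
 * already-parsed JSON object (an assumption, not the tool's parser).
 */
public class UnionKeys {
    private final String tagKey;
    private final String valueKey;

    UnionKeys(String tagKey, String valueKey) {
        this.tagKey = tagKey;
        this.valueKey = valueKey;
    }

    /** The key names used in the PR description's example input. */
    static UnionKeys defaults() {
        return new UnionKeys("tag", "value");
    }

    /** Returns the branch index stored under the configured tag key. */
    int tagFrom(Map<String, Object> obj) {
        Object tag = obj.get(tagKey);
        if (!(tag instanceof Number)) {
            throw new IllegalArgumentException("missing or non-numeric \"" + tagKey + "\" in " + obj);
        }
        return ((Number) tag).intValue();
    }

    /** Returns the payload stored under the configured value key. */
    Object valueFrom(Map<String, Object> obj) {
        if (!obj.containsKey(valueKey)) {
            throw new IllegalArgumentException("missing \"" + valueKey + "\" in " + obj);
        }
        return obj.get(valueKey);
    }

    public static void main(String[] args) {
        // Input that uses other key names, e.g. {"foo": {"type": 1, "data": "testing"}}
        UnionKeys custom = new UnionKeys("type", "data");
        Map<String, Object> foo = Map.of("type", 1, "data", "testing");
        System.out.println(custom.tagFrom(foo) + " -> " + custom.valueFrom(foo)); // 1 -> testing
    }
}
```

Defaulting to `tag` and `value` keeps the input format from the PR description working, while data with other key names only needs different configuration.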

@guiyanakuang (Member)

@rizaon, I think it's good that this handles simple cases. If more complex JSON data comes up, we can keep improving it.

@dongjoon-hyun dongjoon-hyun modified the milestones: 1.7.3, 1.8.0 Jan 31, 2022
@dongjoon-hyun (Member)

I switched the milestone to 1.8.0 because 1.7.3 will be released in two weeks, and this is a new improvement rather than a blocker for https://github.com/apache/orc/milestone/4?closed=1 .

@guiyanakuang (Member)

+1 LGTM

@dongjoon-hyun dongjoon-hyun modified the milestones: 1.8.0, 1.7.4 Feb 16, 2022
@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @rizaon and @guiyanakuang.
Also, cc @williamhyun because he is the release manager of Apache ORC 1.7.4.

Merged to main/1.7.

@dongjoon-hyun dongjoon-hyun merged commit 6f44815 into apache:main Feb 16, 2022
dongjoon-hyun pushed a commit that referenced this pull request Feb 16, 2022
(cherry picked from commit 6f44815)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun (Member)

@rizaon, I added you to the Apache ORC contributor group and assigned ORC-236 to you.
Welcome to the Apache ORC community!

@rizaon (Contributor, Author) commented Feb 16, 2022

Thank you, @dongjoon-hyun and @guiyanakuang!

cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024