[HUDI-3236] use fields'comments persisted in catalog to fill in schema #4587

YannByron · 2022-01-13T11:08:32Z

Tips

Thank you very much for contributing to Apache Hudi.
Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

Added integration tests for end-to-end.
Added HoodieClientWriteTest to verify the change.
Manually verified the change by running a job locally.

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

xiarixiaoyao · 2022-01-13T12:51:29Z

...k/src/main/scala/org/apache/spark/sql/hudi/command/AlterHoodieTableChangeColumnCommand.scala

+      throw new AnalysisException(s"Can't find column `$columnName` given table data columns " +
+        s"${hoodieCatalogTable.dataSchema.fieldNames.mkString("[`", "`, `", "`]")}")
+    )
+


I don't think we support the rename operation now. Why remove the relevant checks?

This change does not support the rename operation either. See the implementation of findColumnByName.

Thanks. I mean, maybe we can throw a corresponding exception to tell the user that we don't support rename at present.

I think it is better to have the same behavior as Spark.
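
To make the two positions in this thread concrete, here is a minimal sketch of a lookup that fails when the column is missing, followed by an explicit rename check. It is not the code from this PR or from Spark: `Column`, `UnsupportedAlterException`, `ChangeColumnCheck`, and `validateChange` are hypothetical stand-ins so the snippet runs without Spark on the classpath, and only the name `findColumnByName` and the "Can't find column" message come from the diff above.

```scala
// Hypothetical stand-ins for Spark's StructField and AnalysisException,
// so that this sketch compiles and runs without any Spark dependency.
final case class Column(name: String, dataType: String, comment: Option[String] = None)
final class UnsupportedAlterException(msg: String) extends RuntimeException(msg)

object ChangeColumnCheck {
  // A resolver decides whether two column names refer to the same column.
  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Look up the existing column; fail with the list of known columns otherwise.
  def findColumnByName(schema: Seq[Column], name: String, resolver: Resolver): Column =
    schema.find(c => resolver(c.name, name)).getOrElse(
      throw new UnsupportedAlterException(
        s"Can't find column `$name` given table data columns " +
          schema.map(_.name).mkString("[`", "`, `", "`]")))

  // Reject renames explicitly so the user sees why the statement failed,
  // then carry the new comment over to the existing column.
  def validateChange(schema: Seq[Column], columnName: String, newColumn: Column,
                     resolver: Resolver = caseInsensitive): Column = {
    val origin = findColumnByName(schema, columnName, resolver)
    if (!resolver(origin.name, newColumn.name)) {
      throw new UnsupportedAlterException(
        s"Renaming column `${origin.name}` to `${newColumn.name}` is not supported")
    }
    origin.copy(comment = newColumn.comment.orElse(origin.comment))
  }
}
```

Whether the rename check should live in Hudi's command or be left to the lookup alone is exactly what the thread debates; the sketch only shows what an explicit error could look like.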

xiarixiaoyao · 2022-01-13T13:00:13Z

hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestAlterTable.scala

        }
        checkAnswer(s"select id, name, price, ts, ext0 from $newTableName")(
          Seq(1, "a1", 10.0, 1000, null)
        )
-        // Alter table column type
+
+        // change column's data type
        spark.sql(s"alter table $newTableName change column id id bigint")


Currently, Hudi on Spark cannot support data type changes. Hudi uses Spark's ParquetFileFormat to read Parquet files, but that reader barely supports type changes. See the original code in the Spark project: ParquetVectorUpdaterFactory.getUpdater.
This test is actually wrong: if you add spark.sql(s"select id from $newTableName").show(false) at line 95, this test will fail.

I know that. See the details in https://issues.apache.org/jira/browse/HUDI-3237.

xushiyan

LGTM

hudi-bot · 2022-01-19T05:24:40Z

CI report:

4ca6772 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

xiarixiaoyao reviewed Jan 13, 2022

YannByron force-pushed the master_3236 branch 2 times, most recently from 790f50e to dec3b88 Compare January 14, 2022 07:50

xushiyan approved these changes Jan 16, 2022

nsivabalan assigned xiarixiaoyao Jan 17, 2022

[HUDI-3236] use fields'comments persisted in catalog to fill in schema

4ca6772

YannByron force-pushed the master_3236 branch from dec3b88 to 4ca6772 Compare January 19, 2022 04:06

xiarixiaoyao approved these changes Jan 20, 2022

xushiyan merged commit 31b57a2 into apache:master Jan 20, 2022

vinishjail97 mentioned this pull request Jan 24, 2022

FixIgnoreKey nsivabalan/hudi#11

Closed

5 tasks

alexeykudinkin pushed a commit to onehouseinc/hudi that referenced this pull request Jan 25, 2022

[HUDI-3236] use fields'comments persisted in catalog to fill in schema (apache#4587)

429c76e

vingov pushed a commit to vingov/hudi that referenced this pull request Jan 26, 2022

[HUDI-3236] use fields'comments persisted in catalog to fill in schema (apache#4587)

ea5522b

liusenhua pushed a commit to liusenhua/hudi that referenced this pull request Mar 1, 2022

[HUDI-3236] use fields'comments persisted in catalog to fill in schema (apache#4587)

393317f

vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022

[HUDI-3236] use fields'comments persisted in catalog to fill in schema (apache#4587)

fc11a7f
