HIVE-28575: Reduce hdfs filesystem rpc call #5504

Open
wants to merge 1 commit into
base: master

Contributor

@zhangbutao zhangbutao commented Oct 12, 2024

What changes were proposed in this pull request?

Why are the changes needed?

HIVE-24838 (#2085) added an optimization that reduces file system calls for blob storage. I think it can also be used to reduce HDFS file system calls.

This can also help with issues like HIVE-28523 (where a partition location is outside the table location).

Does this PR introduce any user-facing change?

Is the change a dependency upgrade?

No

How was this patch tested?

The existing tests

* Hadoop File System reverse lookups paths with raw ip addresses The File
* System URI always contains the canonical DNS name of the Namenode.
* Subsequently, operations on paths with raw ip addresses cause an exception
* since they don't match the file system URI.
Contributor Author

@zhangbutao zhangbutao Oct 12, 2024

I really don't understand what this comment means.
https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestWarehouseDnsPath.java The tests show that getDnsPath has nothing to do with Hadoop File System reverse lookups.

IMO, the HDFS file system should be treated the same as object stores (s3a/s3n), so we can also optimize HDFS file system calls as in HIVE-24838. #2085 (comment) also said of getDnsPath: "The only useful thing this function does is transforming a relative path to an absolute path."

input = /warehouse/tablespace/managed/hive
output = hdfs://namenode-address.site:8020/warehouse/tablespace/managed/hive

I hope someone can help me understand this comment about Hadoop File System reverse lookups of paths.
@ayushtkn Do you have any thoughts?
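
To make the input/output pair in the comment above concrete, here is a minimal sketch of the "relative path to absolute path" transformation using only `java.net.URI`. It is not Hive's `Warehouse.getDnsPath`; the class name `DnsPathSketch` and the method `qualify` are hypothetical, and the sketch assumes the default file system URI is already known, so no file system object is created and no RPC is made.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not Hive's Warehouse.getDnsPath.
final class DnsPathSketch {

    /**
     * Returns the path unchanged when it already carries both a scheme and an
     * authority; otherwise fills the missing parts in from the default file
     * system URI. Assumes the path component is absolute (starts with '/').
     */
    static URI qualify(URI defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null && p.getAuthority() != null) {
            return p; // already fully qualified, nothing to fill in
        }
        // Keep whatever the path already specifies; borrow the rest.
        String scheme = p.getScheme() != null ? p.getScheme() : defaultFs.getScheme();
        String authority = p.getAuthority() != null ? p.getAuthority() : defaultFs.getAuthority();
        try {
            return new URI(scheme, authority, p.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify path: " + path, e);
        }
    }
}
```

Under these assumptions, `DnsPathSketch.qualify(URI.create("hdfs://namenode-address.site:8020"), "/warehouse/tablespace/managed/hive")` yields the output shown in the comment, and a path such as `s3a://bucket/warehouse` comes back unchanged. Paths containing characters that are illegal in a URI (for example spaces) would need escaping first, which this sketch does not attempt.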

@@ -1796,8 +1796,6 @@ public enum ConfVars {
"hive.metastore.custom.database.product.classname", "none",
"Hook for external RDBMS. This class will be instantiated only when " +
"metastore.use.custom.database.product is set to true."),
HIVE_BLOBSTORE_SUPPORTED_SCHEMES("hive.blobstore.supported.schemes", "hive.blobstore.supported.schemes", "s3,s3a,s3n",
Contributor Author

I think we can delete this property; we can optimize the file system calls for all schemes, including hdfs.

@zhangbutao zhangbutao marked this pull request as ready for review October 12, 2024 09:32
@zhangbutao zhangbutao changed the title [Test]: Reduce hdfs filesystem rpc call Reduce hdfs filesystem rpc call Oct 12, 2024
@zhangbutao zhangbutao changed the title Reduce hdfs filesystem rpc call HIVE-28575: Reduce hdfs filesystem rpc call Oct 12, 2024

sonarcloud bot commented Oct 13, 2024
