HIVE-24838. Reduce FS creation in Warehouse::getDnsPath for object stores #2085

Merged (1 commit) on Mar 31, 2021

@zeroflag commented Mar 17, 2021

According to its Javadoc comment, getDnsPath is supposed to replace the authority part of a path (an IP address) with a domain name. To do so, it creates a new FileSystem object, gets the default path from it, and uses the authority of that default path to replace the authority of the input.

There are multiple problems with this. Instantiating a new FileSystem is expensive. Replacing the authority of a blobstore path (s3a) is unnecessary, since S3 uses a bucket name instead of a hostname in the path.

Even in the HDFS case, the original function doesn't do what was presumably intended. The new file system is initialized with the input path, so the default FS path is always the same as the input path (unless a relative path is used as the input). The only useful thing this function does is transform a relative path into an absolute path.

For example:

input = /warehouse/tablespace/managed/hive
output = hdfs://namenode-address.site:8020/warehouse/tablespace/managed/hive

But this can be achieved with a simple config lookup (FileSystem.getDefaultUri(conf)); there is no need to create a new FS object every time.
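A minimal sketch of the idea, not the code from this patch: the class name DnsPathSketch and the method qualify are hypothetical, and the defaultUri parameter stands in for the value that FileSystem.getDefaultUri(conf) would return. It uses only java.net.URI so it runs without Hadoop on the classpath. It also assumes that an unqualified input starts with "/", as in the example above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
final class DnsPathSketch {

    // defaultUri is assumed to be what FileSystem.getDefaultUri(conf) returns,
    // e.g. hdfs://namenode-address.site:8020
    static URI qualify(URI defaultUri, String path) {
        URI input = URI.create(path);

        // Already qualified (scheme and authority present), e.g. an s3a bucket path:
        // return it unchanged, no FileSystem object is needed.
        if (input.getScheme() != null && input.getAuthority() != null) {
            return input;
        }

        // Fill in only the missing parts from the default URI.
        String scheme = input.getScheme() != null ? input.getScheme() : defaultUri.getScheme();
        String authority = input.getAuthority() != null ? input.getAuthority() : defaultUri.getAuthority();
        try {
            return new URI(scheme, authority, input.getPath(), null, null);
        } catch (URISyntaxException e) {
            // Raised, for example, when the path does not start with "/".
            throw new IllegalArgumentException("Cannot qualify path: " + path, e);
        }
    }

    public static void main(String[] args) {
        URI defaultUri = URI.create("hdfs://namenode-address.site:8020");
        System.out.println(qualify(defaultUri, "/warehouse/tablespace/managed/hive"));
        System.out.println(qualify(defaultUri, "s3a://bucket/warehouse"));
    }
}
```

The bucket name in the second call is made up for the demonstration. Qualified paths pass through untouched, which is the behaviour wanted for object stores, and the unqualified path picks up the scheme and authority of the default file system.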

@rbalamohan left a comment

Thanks for the patch, @zeroflag.

LGTM. +1.

@zhangbutao

Hi @zeroflag, thanks for your fix. A late comment: do you think this optimization applies to HDFS as well?
I think getDnsPath is only used to convert a relative path to an absolute path. FileSystem.getDefaultUri(conf) is enough to get the authority and scheme if the path does not contain them.
