Support getFileBlockLocation in LocalCacheFileSystem #17672

Merged: 3 commits, Jul 24, 2023

Conversation

maobaolong
Contributor

@maobaolong maobaolong commented Jun 24, 2023

What changes are proposed in this pull request?

Delegate getFileBlockLocations to the external file system in LocalCacheFileSystem.

Why are the changes needed?

Otherwise, LocalCacheFileSystem inherits the default implementation from org.apache.hadoop.fs.FileSystem, which always reports localhost as the only block location.
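
To illustrate the difference, here is a minimal, self-contained sketch. It does not use the Hadoop API: `BaseFs`, `RemoteFs`, `CachingFs`, and `Location` are hypothetical stand-ins for `FileSystem`, the external file system, `LocalCacheFileSystem`, and `BlockLocation`, and the host names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BlockLocationDelegation {
  /** Hypothetical stand-in for a block location: hosts that store a byte range. */
  static final class Location {
    final List<String> hosts;
    final long offset;
    final long length;

    Location(List<String> hosts, long offset, long length) {
      this.hosts = hosts;
      this.offset = offset;
      this.length = length;
    }
  }

  /** Stand-in for a base file system whose default answer is always localhost. */
  static class BaseFs {
    List<Location> getLocations(long fileLength, long start, long len) {
      if (fileLength <= start) {
        return Collections.emptyList();
      }
      return Collections.singletonList(
          new Location(Collections.singletonList("localhost"), 0, fileLength));
    }
  }

  /** Stand-in for the external file system, which knows the real hosts (made up here). */
  static class RemoteFs extends BaseFs {
    @Override
    List<Location> getLocations(long fileLength, long start, long len) {
      return Collections.singletonList(
          new Location(Arrays.asList("datanode-1", "datanode-2"), 0, fileLength));
    }
  }

  /** Stand-in for a caching wrapper; the flag switches delegation on or off. */
  static class CachingFs extends BaseFs {
    final BaseFs external;
    final boolean delegate;

    CachingFs(BaseFs external, boolean delegate) {
      this.external = external;
      this.delegate = delegate;
    }

    @Override
    List<Location> getLocations(long fileLength, long start, long len) {
      // Without delegation the wrapper falls back to the inherited localhost answer.
      return delegate
          ? external.getLocations(fileLength, start, len)
          : super.getLocations(fileLength, start, len);
    }
  }

  public static void main(String[] args) {
    BaseFs remote = new RemoteFs();
    // prints [localhost]
    System.out.println(new CachingFs(remote, false).getLocations(128, 0, 128).get(0).hosts);
    // prints [datanode-1, datanode-2]
    System.out.println(new CachingFs(remote, true).getLocations(128, 0, 128).get(0).hosts);
  }
}
```

The `delegate` flag exists only to show both behaviors side by side; it is not part of this change.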

Does this PR introduce any user facing changes?

No.

@maobaolong maobaolong changed the title Support LocalCacheFileSystem turn getFileBlockLocation request to ext… Support LocalCacheFileSystem turn getFileBlockLocation request to external file system Jun 24, 2023
@alluxio-bot
Contributor

Automated checks report:

  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • Title is too long (86 characters). Must be at most 72 characters.
  • Commits associated with Github account: PASS

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

@maobaolong maobaolong changed the title Support LocalCacheFileSystem turn getFileBlockLocation request to external file system Support getFileBlockLocation in LocalCacheFileSystem Jun 24, 2023
@alluxio-bot
Contributor

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: PASS

All checks passed!

@maobaolong maobaolong requested a review from dbw9580 June 25, 2023 23:48
@dbw9580
Contributor

dbw9580 commented Jun 26, 2023

Can you please describe the benefits of giving the client access to the block location info? Would that cause the client to talk directly to the datanode and bypass the local cache?

@maobaolong
Contributor Author

@dbw9580 The compute framework can use it to schedule the task for each split on a node that holds the corresponding block.
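
As a rough sketch of that idea (not the scheduler of any particular compute framework), the following assigns each split to a worker that also hosts its block and falls back to the first worker otherwise. `LocalityScheduler`, `assign`, and all split and node names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalityScheduler {
  /**
   * Picks a worker for each split: the first worker that also hosts the
   * split's block, or the first worker in the list as a non-local fallback.
   */
  static Map<String, String> assign(Map<String, List<String>> splitHosts, List<String> workers) {
    Map<String, String> plan = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> entry : splitHosts.entrySet()) {
      String chosen = workers.get(0); // fallback when no worker holds the block
      for (String worker : workers) {
        if (entry.getValue().contains(worker)) {
          chosen = worker;
          break;
        }
      }
      plan.put(entry.getKey(), chosen);
    }
    return plan;
  }

  public static void main(String[] args) {
    List<String> workers = Arrays.asList("node-a", "node-b", "node-c");

    // Block hosts as the external file system might report them.
    Map<String, List<String>> realHosts = new LinkedHashMap<>();
    realHosts.put("split-0", Arrays.asList("node-b", "node-c"));
    realHosts.put("split-1", Arrays.asList("node-c"));
    // prints {split-0=node-b, split-1=node-c}
    System.out.println(assign(realHosts, workers));

    // Block hosts when every location is reported as localhost.
    Map<String, List<String>> localhostOnly = new LinkedHashMap<>();
    localhostOnly.put("split-0", Arrays.asList("localhost"));
    localhostOnly.put("split-1", Arrays.asList("localhost"));
    // prints {split-0=node-a, split-1=node-a}
    System.out.println(assign(localhostOnly, workers));
  }
}
```

With localhost-only locations, no split matches a worker, so this sketch has no locality information to act on and every task lands on the fallback node.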

@Override
public BlockLocation[] getFileBlockLocations(FileStatus file, long start,
    long len) throws IOException {
  return mExternalFileSystem.getFileBlockLocations(file, start, len);
}
Contributor

@jiacheliu3 jiacheliu3 Jul 7, 2023

// Applications use the block information here to schedule/distribute the tasks.
// Return the UFS locations directly instead of the local cache location,
// so the application can schedule the tasks accordingly

Contributor

@dbw9580 @maobaolong what do you think about this?

Contributor Author

Added it

@maobaolong
Contributor Author

@jiacheliu3 Thanks for your suggested comments, PTAL.

@jiacheliu3 jiacheliu3 added the type-bug This issue is about a bug label Jul 24, 2023
@jiacheliu3
Contributor

alluxio-bot, merge this please

@alluxio-bot alluxio-bot merged commit f3d1af8 into Alluxio:master-2.x Jul 24, 2023
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request Nov 8, 2023
### What changes are proposed in this pull request?

Delegate `getFileBlockLocation` to external file system in `LocalCacheFileSystem`.

### Why are the changes needed?

Otherwise, `LocalCacheFileSystem` inherits the default behavior of `org.apache.hadoop.fs.FileSystem` which returns `localhost` only. 

### Does this PR introduce any user facing changes?

No.

			pr-link: Alluxio#17672
			change-id: cid-eb545dbd8ed42001d074fecfb9c8d6b118a559c1
ssz1997 pushed a commit to ssz1997/alluxio that referenced this pull request Dec 15, 2023
maobaolong added a commit to maobaolong/alluxio that referenced this pull request Jan 3, 2024
Labels
type-bug This issue is about a bug