Extended to make the saveAs(New)HadoopFile an async operation #43

Merged Mar 13, 2015 (1 commit)

Conversation


@wli600 commented Mar 9, 2015

@markhamstra This replaces the old PR.

This still uses SimpleFutureAction, in a manner similar to submitJob; the difference is that the partition function takes an additional TaskContext argument.
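The shape of the change could be sketched roughly as follows. This is an illustrative sketch only, not the actual patch: `saveAsHadoopFileAsync` and `runJobAsync` are hypothetical names, assuming an internal helper analogous to SparkContext.submitJob whose per-partition function also receives the TaskContext (which the Hadoop writer needs for attempt IDs and commit bookkeeping):

```scala
// Sketch only: an async save built on SimpleFutureAction.
// `runJobAsync` is a hypothetical variant of SparkContext.submitJob whose
// per-partition function also takes the TaskContext.
def saveAsHadoopFileAsync(conf: JobConf): SimpleFutureAction[Unit] = {
  val writer = new SparkHadoopWriter(conf)
  writer.preSetup()

  // Unlike submitJob's partition function, this one receives the TaskContext.
  def writePartition(context: TaskContext, iter: Iterator[(K, V)]): Unit = {
    writer.setup(context.stageId, context.partitionId, context.attemptId)
    writer.open()
    try {
      iter.foreach { case (k, v) =>
        writer.write(k.asInstanceOf[AnyRef], v.asInstanceOf[AnyRef])
      }
    } finally {
      writer.close()
    }
    writer.commit()
  }

  self.context.runJobAsync(self, writePartition _,
    0 until self.partitions.size,
    (_: Int, _: Unit) => (),        // per-partition result handler (no-op)
    writer.commitJob())             // resultFunc: commit once all tasks finish
}
```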

@wli600 (Author) commented Mar 10, 2015

@markhamstra, recall why I did it this way in the first version.

Because we use SimpleFutureAction to wrap the commit inside the passed-in handler, we cannot guarantee that the handler is also invoked when the write action fails. So I made the extension to ensure we always have a chance to call the passed-in functions, regardless of whether the write action failed or succeeded.
However, even in the current synchronous Spark version, the commit does not appear to be invoked in the failure case; could this lead to resource "leaking"?
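The concern here can be illustrated with plain scala.concurrent.Future, independent of Spark internals (a generic sketch; `writeAsync`, `commitJob`, and `abortJob` are stand-ins, not Spark APIs): a callback registered via andThen runs on both success and failure, which is the behavior you would want for commit/abort bookkeeping, whereas a map/foreach chain is silently skipped when the future fails.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

// Stand-in for the async write action.
def writeAsync(fail: Boolean): Future[Unit] = Future {
  if (fail) throw new RuntimeException("write failed")
}

// andThen fires on both Success and Failure, so cleanup always gets a
// chance to run -- unlike f.map(_ => commitJob()), which is skipped
// entirely when f fails, leaving resources unreleased.
def withCleanup(f: Future[Unit]): Future[Unit] =
  f.andThen {
    case Success(_) => println("commitJob()") // normal commit path
    case Failure(_) => println("abortJob()")  // cleanup on failure
  }
```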

@wli600 (Author) commented Mar 10, 2015

FWIW, looking more closely at Spark's Hadoop writer, abortJob is not exposed. So the current form of the async operation is on par with what Spark supports now.

markhamstra added a commit that referenced this pull request Mar 13, 2015
Extended to make the saveAs(New)HadoopFile an async operation
@markhamstra merged commit 9852b8a into csd-1.1-cdh-5.3.2 Mar 13, 2015
@wli600 deleted the async branch Mar 14, 2015