-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6560][CORE] Do not suppress exceptions from writer.write. #5223
Conversation
Test build #29280 has started for PR 5223 at commit
|
Test build #29280 has finished for PR 5223 at commit
|
Test PASSed. |
I like the idea. What I slightly don't like is making a utility method but then using it to solve just one instance of the issue. I think that this doesn't need to apply to all |
@srowen I waffled on the utility method, but went with it only because putting the logic inline with the writer.write/writer.close logic made it pretty horribly to read. That finally/try/finally is not exactly pretty. Sure, I can scan for other places in the codebase that are obvious for this; although I'm tempted to think the opposite, in that I think I'd almost always want this "don't suppress the real exception" behavior, vs. hiding what really happened. |
@stephenh how about other exactly similar cases in the code base, at least -- a writer which may fail with an exception, which often causes its closing to fail too, since it often triggers a flush? I don't think there are a load of them so it wouldn't be too disruptive. Use your judgment. |
f42e92d
to
d7949a3
Compare
Test build #29354 has started for PR 5223 at commit
|
Okay, just grepping for "writer.write" in core/src/main, I found another place in PairRDDFunctions, and then an existing "safe finally" usage in ShuffleMapTask. I've glanced around at other finally blocks, and, yeah, I guess most of them see fine/more innocent, so this is probably fine for now. I'll keep poking around, but this is what I have for now. Let me know if there are more suspecting places to look than others. |
Test build #29354 has finished for PR 5223 at commit
|
Test PASSed. |
log.debug("Could not stop writer", e) | ||
} | ||
throw e | ||
} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code was not closing the writer before during normal execution, so this is good. However, this change also means that it will stop with success = false whether an error occurred or not, and will do so even after stop was called with success = true. This one may not fit the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, sorry, good catch.
Grepping for something coarse like In many of the cases (tests, utility helper code) it probably doesn't matter but some of them look worth fixing with this nice utility method, that gets both the |
Good idea about out.close; I'll look in that, but it might take a day or two to find some time. Thanks for the suggestion. |
7e9cd5d
to
9ce88d4
Compare
Test build #29487 has started for PR 5223 at commit |
9ce88d4
to
3e72362
Compare
Test build #29489 has started for PR 5223 at commit |
Okay, I put back the ShuffleMapTask change, and found a few other "out.close" places. I was only looking in core/src/main. Were there some specifically you'd seen that I've missed? |
Test build #29489 has finished for PR 5223 at commit
|
Test FAILed. |
Test build #29487 has finished for PR 5223 at commit
|
Test PASSed. |
val out = new DataOutputStream(conn.getOutputStream) | ||
out.write(json.getBytes(Charsets.UTF_8)) | ||
out.close() | ||
var out: DataOutputStream = null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see if there are other changes that are needed, but, I think this pattern is slightly inconsistent with the others. out
can be a val
that is declared before the try block. I suppose if the object to be closed never instantiates, there's nothing to call close()
on anyway. If we were really picky we'd have to handle the case of conn.getOutputStream
succeeding while the constructor fails but I think it is not needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er, right...I had looked at this and not sure why I thought it needed to be a var...will change.
3e72362
to
15c4a6f
Compare
Test build #29500 has started for PR 5223 at commit |
Hm, I think there are maybe 10 more examples of this pattern that could be touched up in the code base. Maybe it's a good time to pause to ask if anyone else has thoughts either way? I favor cleaning this up, at least the non-test, non-example, core instances. |
Ah, yeah...just grepping for ".close()" shows a lot more hits. I've poked a few that you mentioned. I'm all for getting other feedback...will someone notice our chatter on the PR, or do you want to specifically loop someone in? |
Test build #29500 has finished for PR 5223 at commit
|
Test PASSed. |
Yeah this is looking good. I identified some more instances that might use the same treatment; have a look and see what you think:
CC @pwendell @rxin for thoughts on this which touches bits of code all over the place, but in core in particular. I think it improves error reporting and cleanup and is a clean, targeted change everywhere. |
// We could do originalThrowable.addSuppressed(t), but it's | ||
// not available in JDK 1.6. | ||
logWarning(s"Skipping exception that happening in finally", t) | ||
throw originalThrowable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do some of our own munging here to append to the original message in order to indicate the swallowed exception as well? Or maybe it's too verbose and not worth it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will add.
This LGTM with one minor comment - I looked closely at the external sorter code, which seemed like the trickiest bit, and from what I can tell the existing behavior is preserved. Thanks, this is a nice construct we can use throughout the codebase. |
15c4a6f
to
6092217
Compare
If there is a failure in the Hadoop backend while calling writer.write, we should remember this original exception, and try to call writer.close(), but if that fails as well, still report the original exception. Note that, if writer.write fails, it is likely that writer was left in an invalid state, and so actually makes it more likely that writer.close will also fail. Which just increases the chances for writer.write's exception to be suppressed. This patch introduces an admittedly potentially too cute Utils.tryWithSafeFinally method to handle the try/finally gyrations.
6092217
to
c7ad53f
Compare
Test build #29614 has started for PR 5223 at commit |
Okay, @srowen I believe I've addressed each of those additional locations. |
Test build #29615 has started for PR 5223 at commit |
Test build #29614 has finished for PR 5223 at commit
|
Test PASSed. |
Test build #29615 has finished for PR 5223 at commit
|
Test PASSed. |
If there is a failure in the Hadoop backend while calling
writer.write, we should remember this original exception,
and try to call writer.close(), but if that fails as well,
still report the original exception.
Note that, if writer.write fails, it is likely that writer
was left in an invalid state, and so actually makes it more
likely that writer.close will also fail. Which just increases
the chances for writer.write's exception to be suppressed.
This patch introduces an admittedly potentially too cute
Utils.tryWithSafeFinally method to handle the try/finally
gyrations.