[SPARK-1397] Notify SparkListeners when stages fail or are cancelled. #309

Closed
wants to merge 2 commits into from

Commits on Apr 8, 2014

  1. Notify SparkListeners when stages fail or are cancelled.

    Previously, when a stage failed or was cancelled, the SparkListener was only notified
    indirectly through SparkListenerJobEnd, where we sometimes pass in a single
    stage that failed.  This worked before job cancellation existed, because a job would only
    fail due to a single stage failure.  With job cancellation, however, multiple running
    stages can fail when a job is cancelled.  This case is currently not handled correctly,
    so stages get stuck in the “Running Stages” window in the UI even though they’re dead.
    
    This PR changes the SparkListenerStageCompleted event to a SparkListenerStageEnded
    event, and uses this event to tell SparkListeners when stages fail as well as when
    they complete successfully.  This change is NOT backward compatible with the public
    API, for two reasons.  First, it changes the SparkListener interface.  We could
    alternatively add a new event, SparkListenerStageFailed, and keep the existing
    SparkListenerStageCompleted.  However, that would be less consistent with the listener
    events for tasks and jobs ending, and would cause some code duplication in listeners
    (because failed and completed stages are handled in similar ways).  Note that I haven’t
    finished updating the JSON code to handle the new event correctly, because I’m waiting
    for feedback on whether this is a good or bad idea (hence the “WIP”).
    
    Second, it changes the publicly visible JobWaiter.jobFailed() method so that it no longer
    includes the stage that caused the failure.  I think this change should definitely
    stay, because with cancellation (as described above), a failure isn’t necessarily caused
    by a single stage.
    kayousterhout committed Apr 8, 2014
    320c7c7
  2. 5533ecd
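
To illustrate the design argued for in the first commit message, here is a minimal sketch of a single "stage ended" event that carries either outcome. It is a toy model and not Spark's actual code: `StageOutcome`, `StageEnded`, `StageTracker`, and `ToyScheduler` are all hypothetical names chosen for this example, and the real `SparkListenerStageEnded` event may carry different fields. The point it tries to show is that a listener can share its bookkeeping between failed and completed stages, and that cancelling a job can end several running stages at once without leaving any of them marked as running.

```scala
// Simplified stand-in model, NOT Spark's real classes. Every name here is
// hypothetical and exists only to illustrate the idea in the commit message.

// Outcome carried by the single "stage ended" event.
sealed trait StageOutcome
case object StageSucceeded extends StageOutcome
final case class StageFailed(reason: String) extends StageOutcome

// One event type for stages that complete successfully, fail, or are cancelled.
final case class StageEnded(stageId: Int, outcome: StageOutcome)

// Toy listener that tracks roughly what a UI listener might track.
class StageTracker {
  private var running = Set.empty[Int]
  private var completed = Vector.empty[Int]
  private var failed = Map.empty[Int, String]

  def onStageSubmitted(stageId: Int): Unit = running += stageId

  def onStageEnded(event: StageEnded): Unit = {
    // Shared bookkeeping: the stage stops running whatever the outcome was.
    running -= event.stageId
    event.outcome match {
      case StageSucceeded      => completed :+= event.stageId
      case StageFailed(reason) => failed += (event.stageId -> reason)
    }
  }

  def runningStages: Set[Int] = running
  def completedStages: Vector[Int] = completed
  def failedStages: Map[Int, String] = failed
}

object ToyScheduler {
  // Cancelling a job ends every stage that is still running, so the listener
  // hears about each one instead of about at most one failed stage.
  def cancelJob(tracker: StageTracker, reason: String): Unit =
    tracker.runningStages.foreach { id =>
      tracker.onStageEnded(StageEnded(id, StageFailed(reason)))
    }
}
```

In this sketch, submitting stages 1, 2, and 3, ending stage 1 successfully, and then calling `ToyScheduler.cancelJob` leaves `runningStages` empty, with stage 1 recorded as completed and stages 2 and 3 recorded as failed. With a separate failed-stage event, the "remove from running" step would have to be repeated in two handlers, which is the duplication the commit message mentions.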