[SNAP-1190] Reduce partition message overhead from driver to executor #31

Merged 5 commits on Dec 3, 2016

Commits on Nov 30, 2016

  1. [SNAP-1190] Reduce partition message overhead from driver to executors and back
    
    - DAGScheduler:
      - For small enough common task data (RDD + closure), send it inline with the Task instead of via a broadcast
      - Transiently store task binary data in Stage to re-use if possible
      - Compress the common task bytes to save on network cost
    - Task: New TaskData class to encapsulate the compressed task bytes from above, the uncompressed
      length, and a reference index if TaskData is being read from a separate list (see next comments)
    - CoarseGrainedClusterMessage: Added new LaunchTasks message to encapsulate multiple
      Task messages to the same executor
    - CoarseGrainedSchedulerBackend:
      - Create LaunchTasks by grouping messages in ExecutorTaskGroup per executor
      - Actual TaskData is sent as part of TaskDescription and not the Task to easily
        separate out the common portions in a separate list
      - Send the common TaskData as a separate ArrayBuffer of data with the index into this
        list set in the original task's TaskData
    - CoarseGrainedExecutorBackend: Handle LaunchTasks by splitting into individual jobs
    - CompressionCodec: added bytes compress/decompress methods for more efficient byte array compression
    - Executor:
      - Set the common decompressed task data back into the Task object.
      - Avoid an additional serialization of TaskResult just to determine the serialization time.
        Instead, calculate the time inline in the write/writeExternal serialization methods
    - TaskMetrics: more generic handling for DoubleAccumulator case
    - Task: Handling of TaskData during serialization to send a flag to indicate whether
      data is inlined or will be received via broadcast
    - ResultTask, ShuffleMapTask: delegate handling of TaskData to parent Task class
    - SparkEnv: encapsulate codec creation as a zero-arg function to avoid repeated conf lookups
    - SparkContext.clean: avoid checking serializability when a non-default closure serializer is in use
    - Test updates for above
    Sumedh Wale committed Nov 30, 2016
    781e74d
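
The DAGScheduler items in the commit message above (compress the common task bytes, then send them inline when small enough, otherwise fall back to a broadcast) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `CompressedTaskData`, `TaskPayload`, `TaskBinary`, and the 64 KiB cutoff are hypothetical names and values, and `java.util.zip` stands in for Spark's `CompressionCodec`. The sketch also assumes the size check is made on the compressed bytes, which the commit message does not specify.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.{Deflater, Inflater}

// Hypothetical stand-in for the PR's TaskData: compressed bytes plus the
// uncompressed length, which lets the receiver size its output buffer exactly.
final case class CompressedTaskData(compressed: Array[Byte], uncompressedLen: Int)

// Hypothetical payload type: either the data travels with the task, or the
// task only carries an id of a broadcast that holds the data.
sealed trait TaskPayload
final case class Inline(data: CompressedTaskData) extends TaskPayload
final case class ViaBroadcast(broadcastId: Long) extends TaskPayload

object TaskBinary {
  // Assumed cutoff for illustration only; the PR's actual threshold is not stated here.
  val InlineThresholdBytes: Int = 64 * 1024

  def compress(bytes: Array[Byte]): CompressedTaskData = {
    val deflater = new Deflater(Deflater.BEST_SPEED)
    deflater.setInput(bytes)
    deflater.finish()
    val out = new ByteArrayOutputStream(math.max(32, bytes.length / 2))
    val buf = new Array[Byte](4096)
    while (!deflater.finished()) {
      val n = deflater.deflate(buf)
      out.write(buf, 0, n)
    }
    deflater.end()
    CompressedTaskData(out.toByteArray, bytes.length)
  }

  def decompress(data: CompressedTaskData): Array[Byte] = {
    val inflater = new Inflater()
    inflater.setInput(data.compressed)
    val result = new Array[Byte](data.uncompressedLen)
    var offset = 0
    while (offset < result.length && !inflater.finished()) {
      val n = inflater.inflate(result, offset, result.length - offset)
      // Guard against truncated input so the loop cannot spin forever.
      if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
        inflater.end()
        throw new IllegalStateException("compressed task data is truncated or invalid")
      }
      offset += n
    }
    inflater.end()
    result
  }

  // Compress first, then decide how to ship the bytes based on the compressed size.
  def choose(taskBytes: Array[Byte], nextBroadcastId: () => Long): TaskPayload = {
    val data = compress(taskBytes)
    if (data.compressed.length <= InlineThresholdBytes) Inline(data)
    else ViaBroadcast(nextBroadcastId())
  }
}
```

In this sketch a caller would invoke `TaskBinary.choose(serializedRddAndClosure, allocateId)` once per stage and reuse the result for every task of that stage, which mirrors the "transiently store task binary data in Stage" item only loosely.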

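The CoarseGrainedSchedulerBackend items in the first commit describe grouping task messages per executor and sending the common task data once, in a separate list, with each task holding only an index into that list. A rough sketch of that idea is given here; `PendingTask`, `TaskRef`, `LaunchTasksMsg`, and `TaskGrouping` are hypothetical, heavily simplified shapes and not the classes added by the PR. The sketch assumes that tasks of the same stage share one byte array instance, so identity comparison is enough to detect common data.

```scala
import scala.collection.mutable

// Hypothetical, simplified shapes; the real Spark classes carry many more fields.
final case class PendingTask(taskId: Long, executorId: String, commonData: Array[Byte])
final case class TaskRef(taskId: Long, commonDataIndex: Int)
final case class LaunchTasksMsg(tasks: Vector[TaskRef], commonData: Vector[Array[Byte]])

object TaskGrouping {
  // Build one message per executor. Each distinct common-data array is stored
  // once per message, and every task refers to it by position.
  def groupByExecutor(pending: Seq[PendingTask]): Map[String, LaunchTasksMsg] = {
    pending.groupBy(_.executorId).map { case (executorId, tasks) =>
      val shared = mutable.ArrayBuffer.empty[Array[Byte]]
      val refs = tasks.map { t =>
        // Identity comparison (eq): assumes tasks of one stage share the same array instance.
        var idx = shared.indexWhere(_ eq t.commonData)
        if (idx < 0) {
          shared += t.commonData
          idx = shared.length - 1
        }
        TaskRef(t.taskId, idx)
      }
      executorId -> LaunchTasksMsg(refs.toVector, shared.toVector)
    }
  }
}
```

With this layout, an executor receiving three tasks of the same stage gets the common bytes once instead of three times, which is the overhead reduction the PR title refers to.
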
Commits on Dec 1, 2016

  1. fixing couple of scalaStyle errors

    Sumedh Wale committed Dec 1, 2016
    90ac0e9
  2. Explicit addLong/longValue methods in SQLMetrics

    This avoids going through the type-erased generic add/value methods, which would otherwise incur unnecessary boxing/unboxing overhead.
    Sumedh Wale committed Dec 1, 2016
    030bfa3
  3. Merge branch 'SNAP-1194' into SNAP-1190

    Sumedh Wale committed Dec 1, 2016
    06612be
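
The "Explicit addLong/longValue methods in SQLMetrics" commit listed above concerns boxing on generic accumulator methods. The following is a minimal sketch of the general pattern, assuming a hypothetical `GenericAccumulator` base class and `LongMetric` subclass; it is not the actual SQLMetrics code. When a caller goes through the generic type, the JVM sees `add(Object)` and `value(): Object` after erasure, so each `Long` is boxed; dedicated methods with primitive signatures sidestep that path.

```scala
// Hypothetical base class standing in for a generic accumulator API.
abstract class GenericAccumulator[IN, OUT] {
  def add(v: IN): Unit
  def value: OUT
}

final class LongMetric extends GenericAccumulator[Long, Long] {
  private var sum = 0L

  // Callers holding a GenericAccumulator reference reach these through erased
  // bridge methods, which box the argument and the result.
  override def add(v: Long): Unit = sum += v
  override def value: Long = sum

  // Primitive-only entry points: no type parameter is involved, so callers
  // always get a long-typed signature with no boxing.
  def addLong(v: Long): Unit = sum += v
  def longValue: Long = sum
}
```

Hot paths such as generated code can then call `metric.addLong(n)` directly, while code written against the generic interface keeps working unchanged.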

Commits on Dec 2, 2016

  1. d139e6b