[SPARK-1103] [WIP] Automatic garbage collection of RDD, shuffle and broadcast data #126

Closed
wants to merge 51 commits into from

Commits on Feb 5, 2014

  1. Added unpersist method to Broadcast.

    Roman Pastukhov committed Feb 5, 2014
    1e752f1

Commits on Feb 6, 2014

  1. Fix for Broadcast unpersist patch.

    Updated comment in MemoryStore.dropFromMemory
    Keep TorrentBroadcast piece blocks until unpersist is called
    Roman Pastukhov committed Feb 6, 2014
    80dd977

Commits on Feb 14, 2014

  1. Added ContextCleaner to automatically clean RDDs and shuffles when they fall out of scope.

    Also replaced TimeStampedHashMap with BoundedHashMap and TimeStampedWeakValueHashMap for the necessary hashmap behavior.
    tdas committed Feb 14, 2014
    e427a9e
  2. 8512612
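
Commit e427a9e describes cleaning up RDD and shuffle state once the owning objects fall out of scope. One common way to detect that on the JVM is a weak reference registered with a reference queue. The sketch below illustrates that general technique only; it is not taken from the commit, and a later commit in this list mentions finalize() methods, so the PR may have worked differently at this point. CleanupTask, CleanupReference, and SketchCleaner are hypothetical names.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

// Hypothetical description of the state to clean; not Spark's own types.
sealed trait CleanupTask
case class CleanRDD(rddId: Int) extends CleanupTask
case class CleanShuffle(shuffleId: Int) extends CleanupTask

// A weak reference that remembers which cleanup to run for its referent.
class CleanupReference(val task: CleanupTask, referent: AnyRef, queue: ReferenceQueue[AnyRef])
  extends WeakReference[AnyRef](referent, queue)

class SketchCleaner(doCleanup: CleanupTask => Unit) {
  private val queue = new ReferenceQueue[AnyRef]
  // The reference objects must stay strongly reachable until they are processed.
  private val pending = mutable.Set.empty[CleanupReference]

  // Returns the reference so callers (and tests) can inspect or enqueue it.
  def register(obj: AnyRef, task: CleanupTask): CleanupReference = synchronized {
    val ref = new CleanupReference(task, obj, queue)
    pending += ref
    ref
  }

  // Processes every reference the JVM has enqueued; returns how many were cleaned.
  def drain(): Int = synchronized {
    var cleaned = 0
    var ref = queue.poll()
    while (ref != null) {
      ref match {
        case c: CleanupReference =>
          pending -= c
          // A failing cleanup must not stop the remaining ones.
          try doCleanup(c.task) catch { case _: Exception => () }
          cleaned += 1
        case _ => ()
      }
      ref = queue.poll()
    }
    cleaned
  }
}
```

In a long-running application, drain() would typically run on a dedicated daemon thread; when a referent is actually collected depends on the JVM's garbage collector, so cleanup is eventual rather than immediate.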

Commits on Mar 11, 2014

  1. Merge remote-tracking branch 'apache/master' into state-cleanup

    Conflicts:
    	core/src/main/scala/org/apache/spark/MapOutputTracker.scala
    	core/src/main/scala/org/apache/spark/SparkContext.scala
    	core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
    	core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
    	core/src/main/scala/org/apache/spark/storage/BlockManager.scala
    	core/src/main/scala/org/apache/spark/util/TimeStampedHashMap.scala
    	core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
    tdas committed Mar 11, 2014
    a24fefc
  2. Fixed docs and styles.

    tdas committed Mar 11, 2014
    cb0a5a6

Commits on Mar 12, 2014

  1. Removed unnecessary TimeStampedHashMap from DAGScheduler, added try-catches in finalize() methods, and replaced ArrayBlockingQueue with LinkedBlockingQueue to avoid blocking in Java's finalizer thread.

    tdas committed Mar 12, 2014
    ae9da88
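
The queue swap in commit ae9da88 matters because a bounded queue can stall its producer. The following sketch, which uses only java.util.concurrent and a hypothetical QueueHandoff object, contrasts a full ArrayBlockingQueue with a LinkedBlockingQueue created without a capacity. It says nothing about how the PR itself wires the queue into its cleaner.

```scala
import java.util.concurrent.{ArrayBlockingQueue, LinkedBlockingQueue}

object QueueHandoff {
  // Returns (accepted by the full bounded queue, accepted by the unbounded queue).
  def demo(): (Boolean, Boolean) = {
    // Capacity 2: once full, put() would block the calling thread.
    val bounded = new ArrayBlockingQueue[Int](2)
    bounded.put(1)
    bounded.put(2)
    // offer() reports the rejection instead of blocking; put(3) here would hang.
    val boundedAccepted = bounded.offer(3)

    // No capacity given: the limit is Int.MaxValue, so producers do not block in practice.
    val unbounded = new LinkedBlockingQueue[Int]()
    (1 to 3).foreach(i => unbounded.put(i))
    val unboundedAccepted = unbounded.offer(4)

    (boundedAccepted, unboundedAccepted)
  }
}
```

The trade-off is memory: an unbounded queue never blocks the producer, but it can grow without limit if the consumer falls behind.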

Commits on Mar 13, 2014

  1. e61daa0

Commits on Mar 17, 2014

  1. Added try-catch in context cleaner and null value cleaning in TimeStampedWeakValueHashMap.

    tdas committed Mar 17, 2014
    a7260d3

Commits on Mar 18, 2014

  1. Removed use of BoundedHashMap, and made BlockManagerSlaveActor clean up shuffle metadata in MapOutputTrackerWorker.

    tdas committed Mar 18, 2014
    892b952

Commits on Mar 19, 2014

  1. Style fix

    tdas committed Mar 19, 2014
    e1fba5f

Commits on Mar 25, 2014

  1. f2881fd
  2. Changes based on PR comments.

    tdas committed Mar 25, 2014
    620eca3
  3. Merge remote-tracking branch 'apache/master' into state-cleanup

    Conflicts:
    	core/src/main/scala/org/apache/spark/Dependency.scala
    	core/src/main/scala/org/apache/spark/MapOutputTracker.scala
    	core/src/main/scala/org/apache/spark/SparkContext.scala
    	core/src/main/scala/org/apache/spark/SparkEnv.scala
    	core/src/main/scala/org/apache/spark/rdd/RDD.scala
    	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
    	core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
    	core/src/main/scala/org/apache/spark/storage/BlockManager.scala
    	core/src/main/scala/org/apache/spark/storage/ThreadingTest.scala
    	core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
    tdas committed Mar 25, 2014
    a007307
  4. Removed duplicate unpersistRDD.

    tdas committed Mar 25, 2014
    d2f8b97
  5. Added missing Apache license

    tdas committed Mar 25, 2014
    6c9dcf6

Commits on Mar 26, 2014

  1. Merge branch 'bc-unpersist-merge' of github.com:ignatich/incubator-spark into cleanup

    Conflicts:
    	core/src/main/scala/org/apache/spark/broadcast/BroadcastFactory.scala
    	core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala
    	core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
    andrewor14 committed Mar 26, 2014
    c7ccef1
  2. Refactor broadcast classes

    andrewor14 committed Mar 26, 2014
    ba52e00
  3. Add framework for broadcast cleanup

    As of this commit, Spark does not clean up broadcast blocks.
    This will be done in the next commit.
    andrewor14 committed Mar 26, 2014
    d0edef3
  4. 544ac86

Commits on Mar 27, 2014

  1. Add tests for unpersisting broadcast

    There is not currently a way to query the blocks on the executors,
    an operation that is deceptively simple to accomplish. This commit
    adds this mechanism in order to verify that blocks are in fact
    persisted/unpersisted on the executors in the tests.
    andrewor14 committed Mar 27, 2014
    e95479c
  2. f201a8d
  3. Merge github.com:apache/spark into cleanup

    Conflicts:
    	core/src/main/scala/org/apache/spark/SparkContext.scala
    andrewor14 committed Mar 27, 2014
    c92e4d9

Commits on Mar 28, 2014

  1. 0d17060
  2. Generalize BroadcastBlockId to remove BroadcastHelperBlockId

    Rather than having a special purpose BroadcastHelperBlockId just for TorrentBroadcast,
    we now have a single BroadcastBlockId that has a possibly empty field. This simplifies
    broadcast clean-up because now we only have to look for one type of block.
    
    This commit also simplifies BlockId JSON de/serialization in general by parsing the
    name through regex with apply.
    andrewor14 committed Mar 28, 2014
    34f436f
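
The idea in commit 34f436f, a single broadcast block ID with a possibly empty field and name parsing through regex in apply, can be sketched as follows. This is a minimal illustration rather than the code in the commit: the name formats, the field parameter, and the RDDBlockId shape are assumptions made for the example.

```scala
sealed trait BlockId { def name: String }

case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
  def name: String = s"rdd_${rddId}_$splitIndex"
}

// An empty field denotes the main broadcast block; a non-empty one an auxiliary block.
case class BroadcastBlockId(broadcastId: Long, field: String = "") extends BlockId {
  def name: String = s"broadcast_$broadcastId" + (if (field.isEmpty) "" else "_" + field)
}

object BlockId {
  private val RDD = "rdd_([0-9]+)_([0-9]+)".r
  private val BROADCAST = "broadcast_([0-9]+)([_A-Za-z0-9]*)".r

  // Rebuilds a typed ID from its string name, e.g. when reading it back from JSON.
  def apply(name: String): BlockId = name match {
    case RDD(rddId, splitIndex) => RDDBlockId(rddId.toInt, splitIndex.toInt)
    case BROADCAST(id, field) => BroadcastBlockId(id.toLong, field.stripPrefix("_"))
    case _ => throw new IllegalArgumentException(s"Unrecognized block id: $name")
  }
}
```

With one case class covering both kinds of broadcast block, cleanup code only has to match on BroadcastBlockId, and serialization reduces to writing name and calling BlockId(name) on the way back.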

Commits on Mar 29, 2014

  1. Add functionality to query executors for their local BlockStatuses

    Not all blocks are reported to the master. In HttpBroadcast and
    TorrentBroadcast, for instance, most blocks are not reported to master.
    The lack of a mechanism to get local block statuses on each executor
    makes it difficult to test the correctness of un/persisting a broadcast.
    
    This new functionality, though only used for testing at the moment, is
    general enough to be used for other things in the future.
    andrewor14 committed Mar 29, 2014
    fbfeec8
  2. Make TimeStampedWeakValueHashMap a wrapper of TimeStampedHashMap

    This allows us to get rid of WrappedJavaHashMap without much duplicate code.
    andrewor14 committed Mar 29, 2014
    88904a3
  3. e442246

Commits on Mar 30, 2014

  1. 8557c12

Commits on Mar 31, 2014

  1. 7edbc98
  2. 634a097
  3. 7ed72fb

Commits on Apr 1, 2014

  1. Address TD's comments

    andrewor14 committed Apr 1, 2014
    5016375

Commits on Apr 2, 2014

  1. Correct semantics for TimeStampedWeakValueHashMap + add tests

    This largely accounts for the cases when the referent of a WeakReference is no longer
    strongly reachable, in which case the map should return None for all get() operations and
    should skip the entry for all listing operations.
    andrewor14 committed Apr 2, 2014
    f0aabb1
  2. Merge pull request #1 from andrewor14/cleanup

    I am merging this. I will take one more detailed look in the context of my original changes in the main PR.
    tdas committed Apr 2, 2014
    762a4d8
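
The semantics described in commit f0aabb1 (get() returns None once a value is no longer reachable, and listing operations skip such entries) can be illustrated with a small weak-value map. This is a simplified sketch, not the TimeStampedWeakValueHashMap from the PR: it has no timestamps, WeakValueMap is a hypothetical name, and put returns the reference only so the behavior can be demonstrated without waiting for the garbage collector.

```scala
import java.lang.ref.WeakReference
import scala.collection.mutable

class WeakValueMap[K, V <: AnyRef] {
  private val underlying = mutable.HashMap.empty[K, WeakReference[V]]

  // Stores the value weakly; the map alone does not keep it alive.
  def put(key: K, value: V): WeakReference[V] = {
    val ref = new WeakReference[V](value)
    underlying.update(key, ref)
    ref
  }

  // None both when the key is absent and when the referent has been cleared.
  def get(key: K): Option[V] = underlying.get(key).flatMap(ref => Option(ref.get))

  // Listing skips entries whose referent is gone.
  def toSeq: Seq[(K, V)] =
    underlying.toSeq.flatMap { case (k, ref) => Option(ref.get).map(v => (k, v)) }

  // Removes entries with cleared references; returns how many were dropped.
  def clearNullValues(): Int = {
    val dead = underlying.collect { case (k, ref) if ref.get == null => k }.toList
    dead.foreach(k => underlying.remove(k))
    dead.size
  }
}
```

Calling clear() on the returned reference simulates the garbage collector reclaiming the value, after which get and toSeq behave as if the entry were absent.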

Commits on Apr 4, 2014

  1. Merge github.com:apache/spark into cleanup

    Conflicts:
    	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
    	core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
    andrewor14 committed Apr 4, 2014
    a6460d4
  2. c5b1d98
  3. Merge remote-tracking branch 'apache/master' into state-cleanup

    Conflicts:
    	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
    	core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
    tdas committed Apr 4, 2014
    a2cc8bc
  4. ada45f0
  5. cd72d19
  6. Merge pull request #3 from andrewor14/cleanup

    Patrick's comments
    tdas committed Apr 4, 2014
    b27f8e8
  7. Fixed compilation errors.

    tdas committed Apr 4, 2014
    a430f06
  8. Fixed failing BroadcastSuite unit tests by introducing blocking for removeShuffle and removeBroadcast in BlockManager*

    tdas committed Apr 4, 2014
    104a89a
  9. 6222697
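
Commit 104a89a fixes flaky tests by letting removal calls block until they complete. A rough sketch of that pattern with Scala futures is shown below. RemovalSketch, its in-memory blocks map, the blocking flag, and the 10-second timeout are all assumptions for illustration and do not reflect the actual BlockManager signatures.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object RemovalSketch {
  // Stand-in for block storage that lives on a remote executor.
  val blocks = TrieMap[Long, String]()

  // Starts the removal asynchronously; waits for it only when blocking is requested.
  def removeBroadcast(id: Long, blocking: Boolean): Future[Boolean] = {
    val done = Future { blocks.remove(id).isDefined }
    if (blocking) Await.ready(done, 10.seconds)
    done
  }
}
```

A test that checks storage right after the call needs blocking = true; otherwise the assertion can race with the removal.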

Commits on Apr 7, 2014

  1. 41c9ece
  2. Added more documentation on Broadcast implementations, especially which blocks are reported to the driver. Also, fixed Broadcast API to hide destroy functionality.

    tdas committed Apr 7, 2014
    2b95b5e
  3. Scala style fix.

    tdas committed Apr 7, 2014
    4d05314
  4. cff023c
  5. Fixed stupid typo.

    tdas committed Apr 7, 2014
    d25a86e

Commits on Apr 8, 2014

  1. Merge remote-tracking branch 'apache/master' into state-cleanup

    Conflicts:
    	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
    	core/src/main/scala/org/apache/spark/storage/BlockManager.scala
    	core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala
    	core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala
    tdas committed Apr 8, 2014
    f489fdc
  2. 61b8d6e