[SPARK-12678][CORE] MapPartitionsRDD clearDependencies
MapPartitionsRDD was keeping a reference to `prev` after a call to
`clearDependencies`, which could lead to a memory leak.

Author: Guillaume Poulin <[email protected]>

Closes #10623 from gpoulin/map_partition_deps.

(cherry picked from commit b673852)
Signed-off-by: Reynold Xin <[email protected]>
gpoulin authored and rxin committed Jan 7, 2016
1 parent 94af69c commit d061b85
Showing 1 changed file with 6 additions and 1 deletion.
@@ -25,7 +25,7 @@ import org.apache.spark.{Partition, TaskContext}
  * An RDD that applies the provided function to every partition of the parent RDD.
  */
 private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](
-    prev: RDD[T],
+    var prev: RDD[T],
     f: (TaskContext, Int, Iterator[T]) => Iterator[U],  // (TaskContext, partition index, iterator)
     preservesPartitioning: Boolean = false)
   extends RDD[U](prev) {
@@ -36,4 +36,9 @@ private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](

   override def compute(split: Partition, context: TaskContext): Iterator[U] =
     f(context, split.index, firstParent[T].iterator(split, context))
+
+  override def clearDependencies() {
+    super.clearDependencies()
+    prev = null
+  }
 }
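
The pattern in this change can be illustrated outside Spark. The following is a minimal sketch, not Spark code: `Node`, `Base`, `Leaky`, and `Fixed` are hypothetical stand-ins, and `Leaky` retains its parent through an explicit `val` purely for illustration (the commit message does not say how the real class ended up holding `prev`). It shows that clearing the base class's dependency list is not enough if the subclass holds its own reference to the parent, and that making the field a `var` and setting it to null in `clearDependencies` releases it.

```scala
// Toy stand-ins for an RDD and its parent; every name here is hypothetical.
class Node(val name: String)

abstract class Base(parent: Node) {
  // The base class's own link to the parent, loosely analogous to an RDD's dependency list.
  private var deps: List[Node] = List(parent)
  def dependencies: List[Node] = deps
  def clearDependencies(): Unit = { deps = Nil }
}

// Keeps its own reference to the parent, so clearing `deps` alone does not release it.
class Leaky(val prev: Node) extends Base(prev)

// Same shape as the fix in this commit: a mutable field that is nulled out on clear.
class Fixed(var prev: Node) extends Base(prev) {
  override def clearDependencies(): Unit = {
    super.clearDependencies()
    prev = null
  }
}

object ClearDependenciesSketch {
  def main(args: Array[String]): Unit = {
    val leaky = new Leaky(new Node("parent"))
    leaky.clearDependencies()
    println(s"leaky still references parent: ${leaky.prev != null}")

    val fixed = new Fixed(new Node("parent"))
    fixed.clearDependencies()
    println(s"fixed released parent: ${fixed.prev == null}")
  }
}
```

In the sketch, `Leaky` has an empty dependency list after `clearDependencies()` yet can still reach the parent through `prev`, so the parent could not be garbage collected while the child is alive. `Fixed` drops both links.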
