Skip to content

Commit

Permalink
[SPARK-12477][SQL] - Tungsten projection fails for null values in arr…
Browse files Browse the repository at this point in the history
…ay fields

Accessing null elements in an array field fails when tungsten is enabled.
It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.

This PR solves this by checking if the accessed element in the array field is null, in the generated code.

Example:
```
// Array of String
case class AS( as: Seq[String] )
val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
dfAS.registerTempTable("T_AS")
for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}
```

With Tungsten disabled:
```
0 = [a]
1 = [null]
2 = [b]
```

With Tungsten enabled:
```
0 = [a]
15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
```

Author: pierre-borckmans <[email protected]>

Closes apache#10429 from pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.
  • Loading branch information
pierre-borckmans authored and rxin committed Dec 23, 2015
1 parent 50301c0 commit 43b2a63
Show file tree
Hide file tree
Showing 2 changed files with 10 additions and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -227,7 +227,7 @@ case class GetArrayItem(child: Expression, ordinal: Expression)
nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
s"""
final int index = (int) $eval2;
if (index >= $eval1.numElements() || index < 0) {
if (index >= $eval1.numElements() || index < 0 || $eval1.isNullAt(index)) {
${ev.isNull} = true;
} else {
${ev.value} = ${ctx.getValue(eval1, dataType, "index")};
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -43,4 +43,13 @@ class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
val df = sparkContext.parallelize(Seq((1, 1))).toDF("a", "b")
df.select(array($"a").as("s")).select(f(expr("s[0]"))).collect()
}

test("SPARK-12477 accessing null element in array field") {
val df = sparkContext.parallelize(Seq((Seq("val1", null, "val2"),
Seq(Some(1), None, Some(2))))).toDF("s", "i")
val nullStringRow = df.selectExpr("s[1]").collect()(0)
assert(nullStringRow == org.apache.spark.sql.Row(null))
val nullIntRow = df.selectExpr("i[1]").collect()(0)
assert(nullIntRow == org.apache.spark.sql.Row(null))
}
}

0 comments on commit 43b2a63

Please sign in to comment.