Fix performance of building row-level results (#577)
* Generate row-level results with withColumns

Calling withColumn (singular) once per column causes performance
issues when adding a large sequence of columns, because each call
adds another projection to the query plan.

* Add back UNIQUENESS_ID
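The performance note in the commit message matches Spark's own Scaladoc for `withColumn`, which says the method introduces a projection internally, so calling it in a loop to add many columns can build a large plan. The toy model below is a hypothetical illustration of that point, not Spark code: `Plan`, `Relation`, `Project` and the helper names are invented for this sketch, and it ignores any later optimizer passes. Folding a `withColumn`-like step over N names stacks N projections, while one `withColumns`-like step adds a single projection with the same output columns.

```scala
// Hypothetical, highly simplified model of a query plan; these are NOT Spark's classes.
object PlanModel {
  sealed trait Plan
  // The input data, e.g. the DataFrame being verified.
  final case class Relation(columns: Vector[String]) extends Plan
  // A projection that keeps its child's columns and appends `added`.
  final case class Project(child: Plan, added: Vector[String]) extends Plan

  // Models one withColumn call: the whole current plan is wrapped in a new Project.
  def withColumn(plan: Plan, name: String): Plan = Project(plan, Vector(name))

  // Models one withColumns call: all new columns share a single Project.
  def withColumns(plan: Plan, names: Seq[String]): Plan = Project(plan, names.toVector)

  // Number of Project nodes stacked on top of the base relation.
  def depth(plan: Plan): Int = plan match {
    case Relation(_)       => 0
    case Project(child, _) => 1 + depth(child)
  }

  // Columns visible at the top of the plan, in order.
  def output(plan: Plan): Vector[String] = plan match {
    case Relation(cols)        => cols
    case Project(child, added) => output(child) ++ added
  }

  def main(args: Array[String]): Unit = {
    val base: Plan = Relation(Vector("id"))
    val names = (1 to 500).map(i => s"check_$i")
    // Old pattern: one projection per column.
    val folded = names.foldLeft(base)((plan, name) => withColumn(plan, name))
    // New pattern: one projection for all columns.
    val batched = withColumns(base, names)
    println(s"foldLeft + withColumn: depth ${depth(folded)}") // depth 500
    println(s"withColumns:           depth ${depth(batched)}") // depth 1
    println(s"same output columns:   ${output(folded) == output(batched)}") // true
  }
}
```

Both plans expose the same columns; only the shape of the plan differs, which is why the change in this commit is a pure performance fix.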
marcantony authored Aug 31, 2024
1 parent d495234 commit 3b1a3ec
Showing 1 changed file with 1 addition and 3 deletions.
src/main/scala/com/amazon/deequ/VerificationResult.scala (1 addition, 3 deletions)
@@ -98,9 +98,7 @@ object VerificationResult {
     val columnNamesToMetrics: Map[String, Column] = verificationResultToColumn(verificationResult)
 
     val dataWithID = data.withColumn(UNIQUENESS_ID, monotonically_increasing_id())
-    columnNamesToMetrics.foldLeft(dataWithID)(
-      (dataWithID, newColumn: (String, Column)) =>
-        dataWithID.withColumn(newColumn._1, newColumn._2)).drop(UNIQUENESS_ID)
+    dataWithID.withColumns(columnNamesToMetrics).drop(UNIQUENESS_ID)
   }
 
   def checkResultsAsJson(verificationResult: VerificationResult,
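The change in this diff can be reproduced outside deequ as a small standalone program. This is a sketch assuming Spark 3.3 or later (the release that added the Scala `Dataset.withColumns(Map[String, Column])` overload) with `spark-sql` on the classpath; the object name, the `UniquenessId` constant and the example metric columns are hypothetical stand-ins, not deequ's code. It checks that the fold-based and `withColumns`-based versions return the same columns and rows.

```scala
import scala.collection.immutable.ListMap

import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, monotonically_increasing_id}

object RowLevelColumnsSketch {
  // Hypothetical stand-in for deequ's UNIQUENESS_ID constant.
  val UniquenessId = "__uniqueness_id"

  // Old pattern: one withColumn call (and one projection) per metric column.
  def addOneByOne(data: DataFrame, metrics: Map[String, Column]): DataFrame = {
    val dataWithID = data.withColumn(UniquenessId, monotonically_increasing_id())
    metrics.foldLeft(dataWithID) { case (df, (name, column)) =>
      df.withColumn(name, column)
    }.drop(UniquenessId)
  }

  // New pattern: a single withColumns call (Spark 3.3+) adds every metric column.
  def addAllAtOnce(data: DataFrame, metrics: Map[String, Column]): DataFrame = {
    val dataWithID = data.withColumn(UniquenessId, monotonically_increasing_id())
    dataWithID.withColumns(metrics).drop(UniquenessId)
  }

  // Runs both versions on a tiny local DataFrame and compares schema and rows.
  def resultsMatch(): Boolean = {
    val spark = SparkSession.builder().master("local[1]").appName("withColumns-sketch").getOrCreate()
    try {
      val data = spark.range(5).toDF("value")
      // Hypothetical boolean "row-level results", one column per check.
      val metrics: Map[String, Column] = ListMap(
        "isPositive" -> (col("value") > 0),
        "isEven" -> (col("value") % 2 === 0),
        "alwaysTrue" -> lit(true))
      val viaFold = addOneByOne(data, metrics)
      val viaBatch = addAllAtOnce(data, metrics)
      viaFold.columns.sameElements(viaBatch.columns) &&
        viaFold.collect().toSeq == viaBatch.collect().toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"Same columns and rows: ${resultsMatch()}")
}
```

A `ListMap` is used only so the example metric columns keep their declaration order, which makes the printed schema easier to read.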