[SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics nd…

…cgAk ### What changes were proposed in this pull request? This PR fixes the condition to raise the following warning in MLLib's RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value set should be equal, check input data" The logic for raising warnings is faulty at the moment: it raises a warning if the `rel` input is empty and `lab.size` and `rel.size` are not equal. The logic should be to raise a warning if `rel` input is **not empty** and `lab.size` and `rel.size` are not equal. This warning was added in the following PR: apache#36843 ### Why are the changes needed? With the current logic, RankingMetrics will: - raise incorrect warning when a user is using it in the "binary" mode (i.e. no relevance values in the input) - not raise warning (that could be necessary) when the user is using it in the "non-binary" model (i.e. with relevance values in the input) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No change made to the test suite for RankingMetrics: https://github.com/uchiiii/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala Closes apache#42207 from guilhem-depop/patch-1. Authored-by: Guilhem Vuillier <[email protected]> Signed-off-by: Sean Owen <[email protected]>
zhengruifeng · Jul 28, 2023 · 72af2c0 · 72af2c0
1 parent 20bb6c0
commit 72af2c0
Showing 1 changed file with 4 additions and 1 deletion.
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
@@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <:
    * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current
    * implementation, the relevance value is binary if the relevance value is empty.
 
+   * If the relevance value is not empty but its size doesn't match the ground truth set size,
+   * a log warning is generated.
+   *
    * If a query has an empty ground truth set, zero will be used as ndcg together with
    * a log warning.
    *
@@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <:
       val useBinary = rel.isEmpty
       val labSet = lab.toSet
       val relMap = Utils.toMap(lab, rel)
-      if (useBinary && lab.size != rel.size) {
+      if (!useBinary && lab.size != rel.size) {
         logWarning(
           "# of ground truth set and # of relevance value set should be equal, " +
             "check input data")