[SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk

### What changes were proposed in this pull request?

This PR fixes the condition to raise the following warning in MLLib's RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value set should be equal, check input data"

The condition for raising this warning is currently faulty: the warning is raised if the `rel` input is empty and `lab.size` and `rel.size` are not equal.

The logic should be to raise a warning if `rel` input is **not empty** and `lab.size` and `rel.size` are not equal.
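The difference between the two conditions can be sketched in isolation. This is a minimal illustration, not the Spark source: the object and method names (`NdcgWarningCondition`, `warnsBefore`, `warnsAfter`) are hypothetical, and `Seq` stands in for the collections used in `RankingMetrics`. Only `lab`, `rel`, and `useBinary` are taken from the diff.

```scala
// Hypothetical standalone sketch of the warning condition before and after the fix.
// Only the names `lab`, `rel`, and `useBinary` come from the diff; the rest is illustrative.
object NdcgWarningCondition {

  // Condition before the fix: can only be true when `rel` is empty,
  // i.e. in binary mode, where a size mismatch is expected and harmless.
  def warnsBefore[T](lab: Seq[T], rel: Seq[Double]): Boolean = {
    val useBinary = rel.isEmpty
    useBinary && lab.size != rel.size
  }

  // Condition after the fix: true only when relevance values are supplied
  // and their count differs from the size of the ground truth set.
  def warnsAfter[T](lab: Seq[T], rel: Seq[Double]): Boolean = {
    val useBinary = rel.isEmpty
    !useBinary && lab.size != rel.size
  }
}
```

With three labels and no relevance values, `warnsBefore` is true and `warnsAfter` is false. With three labels and two relevance values, the results are reversed.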

This warning was added in the following PR: apache#36843

### Why are the changes needed?

With the current logic, RankingMetrics will:
- raise an incorrect warning when a user is using it in the "binary" mode (i.e. no relevance values in the input)
- not raise a warning (that could be necessary) when the user is using it in the "non-binary" mode (i.e. with relevance values in the input)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No changes were made to the test suite for RankingMetrics: https://github.com/uchiiii/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala

Closes apache#42207 from guilhem-depop/patch-1.

Authored-by: Guilhem Vuillier
Signed-off-by: Sean Owen
guilhem-depop authored and srowen committed Jul 28, 2023
1 parent 20bb6c0 commit 72af2c0
Showing 1 changed file with 4 additions and 1 deletion.
@@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <:
* and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current
* implementation, the relevance value is binary if the relevance value is empty.
* If the relevance value is not empty but its size doesn't match the ground truth set size,
* a log warning is generated.
*
* If a query has an empty ground truth set, zero will be used as ndcg together with
* a log warning.
*
@@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <:
val useBinary = rel.isEmpty
val labSet = lab.toSet
val relMap = Utils.toMap(lab, rel)
- if (useBinary && lab.size != rel.size) {
+ if (!useBinary && lab.size != rel.size) {
logWarning(
"# of ground truth set and # of relevance value set should be equal, " +
"check input data")