[branch-1.1][SPARK-4355] OnlineSummarizer doesn't merge mean correctly
andrewor14 This backports the bug fix in #3220. It would be good to get it into 1.1.1, but this is minor.

Author: Xiangrui Meng <[email protected]>

Closes #3251 from mengxr/SPARK-4355-1.1 and squashes the following commits:

33886b6 [Xiangrui Meng] Merge remote-tracking branch 'apache/branch-1.1' into SPARK-4355-1.1
91fe1a3 [Xiangrui Meng] fix OnlineSummarizer.merge when other.mean is zero
mengxr committed Nov 13, 2014
1 parent 685bdd2 commit 4b1c77c
Showing 1 changed file with 9 additions and 11 deletions.
@@ -104,21 +104,19 @@ class MultivariateOnlineSummarizer extends MultivariateStatisticalSummary with S
       val deltaMean: BDV[Double] = currMean - other.currMean
       var i = 0
       while (i < n) {
-        // merge mean together
-        if (other.currMean(i) != 0.0) {
+        if (nnz(i) + other.nnz(i) != 0.0) {
+          // merge mean together
           currMean(i) = (currMean(i) * nnz(i) + other.currMean(i) * other.nnz(i)) /
             (nnz(i) + other.nnz(i))
-        }
-        // merge m2n together
-        if (nnz(i) + other.nnz(i) != 0.0) {
+          // merge m2n together
           currM2n(i) += other.currM2n(i) + deltaMean(i) * deltaMean(i) * nnz(i) * other.nnz(i) /
             (nnz(i) + other.nnz(i))
-        }
-        if (currMax(i) < other.currMax(i)) {
-          currMax(i) = other.currMax(i)
-        }
-        if (currMin(i) > other.currMin(i)) {
-          currMin(i) = other.currMin(i)
+          if (currMax(i) < other.currMax(i)) {
+            currMax(i) = other.currMax(i)
+          }
+          if (currMin(i) > other.currMin(i)) {
+            currMin(i) = other.currMin(i)
+          }
         }
         i += 1
       }
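
To illustrate why the guard matters, here is a minimal single-column sketch of the mean merge. It is not the Spark implementation: ColStats, mergeOld, and mergeNew are hypothetical names, Breeze vectors are replaced by plain Doubles, and it assumes (as the merge expression in the diff suggests) that the per-column mean is weighted by nnz and that nnz is summed after the merge.

```scala
// Hypothetical single-column stand-in for the summarizer's per-column state.
final case class ColStats(mean: Double, nnz: Double)

object MergeMeanSketch {
  // Guard as it was before this commit: the update is skipped whenever
  // other.mean is exactly zero, even if other holds non-zero samples.
  def mergeOld(a: ColStats, b: ColStats): ColStats = {
    val mean =
      if (b.mean != 0.0) (a.mean * a.nnz + b.mean * b.nnz) / (a.nnz + b.nnz)
      else a.mean
    ColStats(mean, a.nnz + b.nnz)
  }

  // Guard as it is after this commit: the update is skipped only when the
  // combined weight is zero, which also avoids dividing by zero.
  def mergeNew(a: ColStats, b: ColStats): ColStats = {
    val total = a.nnz + b.nnz
    val mean =
      if (total != 0.0) (a.mean * a.nnz + b.mean * b.nnz) / total
      else a.mean
    ColStats(mean, total)
  }

  def main(args: Array[String]): Unit = {
    val left = ColStats(mean = 2.0, nnz = 2.0)  // e.g. samples 1.0 and 3.0
    val right = ColStats(mean = 0.0, nnz = 2.0) // e.g. samples -1.0 and 1.0
    println(mergeOld(left, right)) // ColStats(2.0,4.0): mean is stale
    println(mergeNew(left, right)) // ColStats(1.0,4.0): matches (1 + 3 - 1 + 1) / 4
  }
}
```

In this sketch the old guard keeps the left-hand mean of 2.0 while the weight still grows to 4.0, so the merged state overstates the mean; the new guard yields 1.0, the mean of all four samples.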