Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MINOR] Use <= for clarity in Pi examples' Monte Carlo process #15687

Closed
wants to merge 1 commit into from

Conversation

mrydzy
Copy link
Contributor

@mrydzy mrydzy commented Oct 30, 2016

What changes were proposed in this pull request?

If my understanding is correct we should be rather looking at closed disk than the opened one.

How was this patch tested?

Run simple comparison, of the mean squared error of approaches with closed and opened disk.
https://gist.github.com/mrydzy/1cf0e5c316ef9d6fbd91426b91f1969f
The closed one performed slightly better, but the tested sample wasn't too big, so I rely mostly on the algorithm understanding.

@srowen
Copy link
Member

srowen commented Oct 30, 2016

In theory, it doesn't matter because the circumference has 0 width, but in practice it could matter a very very small bit because of floating-point round-off. It doesn't really matter for purposes of the example but I agree it'd be nice to do the slightly more accurate thing, on principle. I made an even simpler test to isolate away other effects:

var lt = 0
var lte = 0
val n = 1000000000
val random = new scala.util.Random()
for (i <- 0 until n) {
  val x = 2.0 * random.nextDouble() - 1.0
  val y = 2.0 * random.nextDouble() - 1.0
  val dist = x * x + y * y
  if (dist < 1.0) { lt += 1 }
  if (dist <= 1.0) { lte += 1 }
}

val ltEstimate = 4.0 * lt / n
val lteEstimate = 4.0 * lte / n
println(ltEstimate - scala.math.Pi)
println(lteEstimate - scala.math.Pi)

I get exactly the same error of 2.910410207057623E-6 in both cases after 1 billion iterations. Therefore I'd suspect any difference is due to chance, but I'm not sure what kind of results you got or whether they're highly unlikely to be due to chance.

@mrydzy
Copy link
Contributor Author

mrydzy commented Oct 30, 2016

Hey,
Thanks for reply!
The results I got are really small and could be due to chance. The reason why I brought it up, is because I found that confusing while following the example, but you're right the circumference has 0 width.

@srowen
Copy link
Member

srowen commented Nov 1, 2016

Out of curiosity do you remember how many times one was bigger than the other? if it was 10/10 then you probably have a point, and that there's something about the way the example actually executes that makes this an issue. If it was like 7/10 that is within the realm of chance, says the binomial distribution.

It is a good point to be sure and I wouldn't even mind changing it; I guess I'd just leave it unless we think it's actually inaccurate, esp. considering it's just a 'load test' example.

@mrydzy
Copy link
Contributor Author

mrydzy commented Nov 1, 2016

It was exactly 7/10. Realising '<=' made the concept clearer to me (as I'm
a beginner following examples), but it might be it's just me, so I'm not
insisting.

2016-11-01 10:40 GMT+01:00 Sean Owen [email protected]:

Out of curiosity do you remember how many times one was bigger than the
other? if it was 10/10 then you probably have a point, and that there's
something about the way the example actually executes that makes this an
issue. If it was like 7/10 that is within the realm of chance, says the
binomial distribution.

It is a good point to be sure and I wouldn't even mind changing it; I
guess I'd just leave it unless we think it's actually inaccurate, esp.
considering it's just a 'load test' example.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#15687 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AKhQTcfdwi8QGeIlI0NVFi8f1oi_wwjwks5q5wkWgaJpZM4KkYXx
.

Maria Rydzy

@srowen
Copy link
Member

srowen commented Nov 1, 2016

There are actually 4 instances of the Pi example. I'll merge it if you update them all, just for reasons of clarity, sure. If you would update the title to something like [MINOR] Use <= for clarity in Pi examples' Monte Carlo process

@mrydzy
Copy link
Contributor Author

mrydzy commented Nov 1, 2016

Thank you Sean! I've updated Scala, Java and Python examples and changed the commit name as you've requested. I can't seem to find the fourth example though.

@srowen
Copy link
Member

srowen commented Nov 1, 2016

There's one more in LocalPi.scala

@mrydzy mrydzy changed the title Update SparkPi.scala [MINOR] Use <= for clarity in Pi examples' Monte Carlo process Nov 1, 2016
@mrydzy
Copy link
Contributor Author

mrydzy commented Nov 1, 2016

Ok, LocalPi updated and commits squashed. Hope it's fine now.

@srowen
Copy link
Member

srowen commented Nov 1, 2016

Jenkins test this please

@SparkQA
Copy link

SparkQA commented Nov 1, 2016

Test build #67904 has finished for PR 15687 at commit c710f61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Nov 2, 2016

Merged to master

@asfgit asfgit closed this in bcbe444 Nov 2, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

If my understanding is correct we should be rather looking at closed disk than the opened one.
## How was this patch tested?

Run simple comparison, of the mean squared error of approaches with closed and opened disk.
https://gist.github.com/mrydzy/1cf0e5c316ef9d6fbd91426b91f1969f
The closed one performed slightly better, but the tested sample wasn't too big, so I rely mostly on the algorithm  understanding.

Author: Maria Rydzy <[email protected]>

Closes apache#15687 from mrydzy/master.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants