Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uniform_mersenne is using memory proportional to the range of numbers it generates #2

Open
JohanMollevik opened this issue Feb 20, 2019 · 9 comments

Comments

@JohanMollevik
Copy link

uniform_mersenne is using memory proportional to the range of numbers it generates

If asked to generate 3 numbers between 0 and 1000000000 the memmory allocated will be for an array 1000000000 big.

the culprint is the line

out = list(range(self.len))

which happily allocates gigabytes of memmory even when only a few numbers are requested

@mikkokotila
Copy link
Contributor

Good catch. That's a very poor way to do this. I'm replacing it with:

return np.random.randint(0, self.len, self.n).tolist()

This supports cases until 10^20 at which point int64 overflows. So I guess it's nice if we can say we support permutation spaces until 10^20.

@JohanMollevik
Copy link
Author

It seems the implementation could be replaced by using pythons built in random generation rather than numpys as python has optimized its implementation for ranges

import random
...
return random.sample(range(self.len), k=self.n)

Are there ay problems with making a change like that?

@JohanMollevik
Copy link
Author

Does the randint allows duplicates or not?

@mikkokotila
Copy link
Contributor

mikkokotila commented Feb 20, 2019

It indeed does. So let's use what you have proposed.

EDIT: correction to before >> here the constrain with overflow is at 10^18 so that's the supported limit.

@JohanMollevik
Copy link
Author

That will have to do

@JohanMollevik
Copy link
Author

If 10^18 proves limiting it would probably work well to add a fallback where if the requested number is bigger than that it selects random numbers in a loop until it has enough without duplicates.

The neat thing is that in this case with huge numbers the risk of selecting the same number several times in a row is vanishingly small as the range of the numbers are huge compared to the number of numbers requested.

This will make the fallback take linear time in the common case instead of running the risk of not terminating which that approach can do with large fill factors.

@JohanMollevik
Copy link
Author

This limit ended up being a problem for me, pull request #3

@JohanMollevik
Copy link
Author

The CI failed, it seems to be complaining about unit test coverage, I will try to resolve that.

@JohanMollevik
Copy link
Author

CI still fails but the error seems unrelated to this and looks more like a dependency problem, can you look and see if you see a cause?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants