Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] sprand sanity with rfn argument #30637

Closed
wants to merge 1 commit into from

Conversation

abraunst
Copy link
Contributor

@abraunst abraunst commented Jan 7, 2019

This PR fixes inconsistencies in sprand when the rfn argument is specified (issue #30627). Breaks the current interface, so I think not to be merged during 1.x. I'm copying here from #30627 what this PR hopefully achieves:

Whenever `rfn` is passed, this function accepts always only one argument (the number of 
values to generate) and no type can be specified. In this way, the caller can supply the 
function (using a random generator or not) he wants and generating the type of values 
he want (that will become the Tv type of the matrix).

@ararslan ararslan added sparse Sparse arrays randomness Random number generation and the Random stdlib labels Jan 8, 2019
@martinholters
Copy link
Member

Should this also follow the convention of taking the function as first argument? (I don't think do syntax would often be used here, but for consistency...)

@abraunst
Copy link
Contributor Author

abraunst commented Jan 8, 2019

Should this also follow the convention of taking the function as first argument? (I don't think do syntax would often be used here, but for consistency...)

I like it, also because rfn and T are mutually exclusive in this PR, and T comes already as first argument...

@abraunst
Copy link
Contributor Author

Note that instead of the current

sprand([rng],m,[n],p::AbstractFloat,[rfn,[type]]) or sprand([rng],[type], m,[n],p::AbstractFloat)

with rfn accepting one or two arguments depending on the presence of [rng], the signature would be:

sprand([rfn|type],[rng],m,[n],p::AbstractFloat) which is almost self-explanatory.

@ViralBShah
Copy link
Member

Don't we have a way to make breaking changes in stdlib without having to wait for Julia 2.0?

@ViralBShah
Copy link
Member

Yes, taking rfn as the first argument for consistency would be nice too.

@ViralBShah
Copy link
Member

@StefanKarpinski When can we merge this kind of thing?

@StefanKarpinski
Copy link
Sponsor Member

I don't know. I haven't really been following. Is it breaking?

@abraunst
Copy link
Contributor Author

abraunst commented Feb 7, 2019

I don't know. I haven't really been following. Is it breaking?

Yes, it is. Although I believe it would improve usability greatly (in the cases it addresses), I'm not sure if it's worth if to make a breaking change like this without going all the way to something in the spirit of #24912.

@StefanKarpinski StefanKarpinski added the breaking This change will break code label Feb 8, 2019
@StefanKarpinski
Copy link
Sponsor Member

Our general policy is not to make breaking changes unless they are so minor that no one is likely to be relying on them (and then only in minor releases, not point releases), so if that's possibly the case then we could run PkgEval on this to try to figure out if no package anywhere is using this signature. However, I suspect that not to be the case, in which case this would have to wait until we can release SparseArrays 2.0, which would require having the infrastructure to decouple stdlib versions from the Julia version, which we don't have yet. So my guess is that this can't happen yet.

@abraunst
Copy link
Contributor Author

abraunst commented Feb 8, 2019

However, I suspect that not to be the case, in which case this would have to wait until we can release SparseArrays 2.0, which would require having the infrastructure to decouple stdlib versions from the Julia version, which we don't have yet. So my guess is that this can't happen yet.

Understood. One possibility would be to add the new methods but keep the old ones (with or without deprecation warnings), that would not be breaking. If the horizon for a change in this area is so far away, then maybe it is worth it.

@ViralBShah
Copy link
Member

I like the idea of adding the new signatures now, and deprecating the old ones whenever we can.

@rfourquet
Copy link
Member

I like the idea of adding the new signatures now

And what about just removing this rfn argument in the future? Currently it's totally underdocumented, i.e. one has to look at the sources to understand how rfn is used internally in order to be able to pass a correct function... so I suspect almost nobody uses it. If I'm not mistaken, rfn is needed internally to implement sprandn easily in terms of sprand, but this doesn't have to be a user facing API.

Also, it's not clear why sprand would have this level of flexibility, while rand doesn't. As @abraunst stated above, there are more general alternatives, like offered by "Distributions.jl" or #24912 (and "RandomExtensions.jl"). E.g. it seems cleaner to call sprand(Uniform(1, 9), n, m) rather than sprand(k -> rand(1:9, k), n, m). Or even better, sprand(1:9, n, m).

@abraunst
Copy link
Contributor Author

abraunst commented Apr 3, 2019

I like the idea of adding the new signatures now

And what about just removing this rfn argument in the future? Currently it's totally underdocumented, i.e. one has to look at the sources to understand how rfn is used internally in order to be able to pass a correct function... so I suspect almost nobody uses it. If I'm not mistaken, rfn is needed internally to implement sprandn easily in terms of sprand, but this doesn't have to be a user facing API.

Also, it's not clear why sprand would have this level of flexibility, while rand doesn't. As @abraunst stated above, there are more general alternatives, like offered by "Distributions.jl" or #24912 (and "RandomExtensions.jl"). E.g. it seems cleaner to call sprand(Uniform(1, 9), n, m) rather than sprand(k -> rand(1:9, k), n, m). Or even better, sprand(1:9, n, m).

(did you forget the density parameter here?)

I think rfn wouldn't make sense for rand, as there is no "randomness" left. E.g. an hypothetical A = rand(rfn, m, n) should be equivalent to A = reshape(rfn(m*n),m,n)

In my opinion, the question is what minimal interface should remain in Base; because both RandomExtensions and Distributions rightly belong to packages, I think.

@rfourquet
Copy link
Member

I think rfn wouldn't make sense for rand

So I'm wondering what is the real purpose of rfn. I see only two things: 1) make it easy to implement sprandn in terms of sprand, or 2) specify arbitrary distributions not available in Base. For 1) I believe this is an internal implementation detail which should be hidden from the API, and for 2), rand and sprand should be treated equally, whether packages of Base provide it. Do I miss another purpose of rfn ?

@abraunst
Copy link
Contributor Author

abraunst commented Apr 4, 2019

I think rfn wouldn't make sense for rand
So I'm wondering what is the real purpose of rfn. I see only two things: 1) make it easy to implement sprandn in terms of sprand, or 2) specify arbitrary distributions not available in Base. For 1) I believe this is an internal implementation detail which should be hidden from the API, and for 2), rand and sprand should be treated equally, whether packages of Base provide it. Do I miss another purpose of rfn ?

I agree about 1). About 2) I don't follow you fully, in the sense that sprand has two sources of randomness: which are the non-zero elements and what they are; whereas rand has only one (what they are), so I don't see how they could be treated equally [*]

[*] Except maybe for something like JuliaRandom/RandomExtensions.jl#3 (comment), what do you think about that?

@rfourquet
Copy link
Member

rfourquet commented Apr 7, 2019

sprand has two sources of randomness

Two parameters can be passed to sprand: the probability p, and optionally the "type" T of strucurally non-zero elements (where "type" could be extended to accept e.g. objects, like 1:3). So I don't see why it would be necessary to pass a rfn function to sprand, when it seems to be enough to pass those two parameters p and T, which are more orthogonal (and internaly the same rand function is used twice, for p and for T). Do you have an example where it's necessary to be able to pass rfn ?

@abraunst
Copy link
Contributor Author

abraunst commented Apr 7, 2019

sprand has two sources of randomness

Two parameters can be passed to sprand: the probability p, and optionally the "type" T of strucurally non-zero elements (where "type" could be extended to accept e.g. objects, like 1:3). So I don't see why it would be necessary to pass a rfn function to sprand, when it seems to be enough to pass those two parameters p and T, which are more orthogonal (and internaly the same rand function is used twice, for p and for T). Do you have an example where it's necessary to be able to pass rfn ?

Wait, so the "type" T would be extended to be a fully-fledged sampler [1]? In that case, obviously, I have no objection 😄 But you are a bit cheating because in the current situation rfn is the sampler. Coincidentally, this does not feel so far from this current PR. Don't get me wrong, I love the idea behind RandomExtensions (even if I don't fully understand its internals -- any hint to get started?); however I don't think it would be wise to remove the rfn function without having its replacement ready.

[1] this is assuming that 1:3 will be then interpreted as a sampler of uniform values in 1:3 and that this can be extended to arbitrary (non-uniform) samplers. Otherwise, for instance, how do you generate a sparse matrix with exponentially distributed non-zero values?

@rfourquet
Copy link
Member

Wait, so the "type" T would be extended to be a fully-fledged sampler [1]?

If needed, yes, it would align with rand([rng], [S], [dims]): whatever rand accepts as S, sprand could be made to accept it, as rand is used within sprand. Specifying the distribution with rfn for sprand and S for rand is quite inconsistent. Passing a function feels overkill when passing a distribution (whatever this means, e.g. S above) is enough.

Coincidentally, this does not feel so far from this current PR

Indeed! So while you are at it, I would love that the opportunity is taken to increase sanity even more ;-)

how do you generate a sparse matrix with exponentially distributed non-zero values?

Either via a package, or something is added to Base to speak about distributions, of sprandexp is added.
That illustrates exactly my point! Two very different solutions exist in base/stdlib to generate exponential distribution (rfn vs randexp), which should be unified IMHO.

@abraunst
Copy link
Contributor Author

abraunst commented Apr 7, 2019

Wait, so the "type" T would be extended to be a fully-fledged sampler [1]?

If needed, yes, it would align with rand([rng], [S], [dims]): whatever rand accepts as S, sprand could be made to accept it, as rand is used within sprand. Specifying the distribution with rfn for sprand and S for rand is quite inconsistent. Passing a function feels overkill when passing a distribution (whatever this means, e.g. S above) is enough.

Coincidentally, this does not feel so far from this current PR

Indeed! So while you are at it, I would love that the opportunity is taken to increase sanity even more ;-)

While I generally agree, I'm a bit lost about what do you suggest. Just removing the rfn parameter now would remove functionality (since rand does not currently accept currently a distribution parameter). Do you suggest to just remove the functionality and make the user rely on external packages (e.g. RandomExtensions)?

how do you generate a sparse matrix with exponentially distributed non-zero values?

Either via a package, or something is added to Base to speak about distributions, of sprandexp is added.
That illustrates exactly my point! Two very different solutions exist in base/stdlib to generate exponential distribution (rfn vs randexp), which should be unified IMHO.

Err... what do you mean by two different solutions? Setting rfn to be randexp is the only current way I see of generating a sparse matrix with exponentially distributed non-zero values (short of modifying the sparse matrix after generation). If we just remove the rfn parameter, we would remove this functionality.

@rfourquet
Copy link
Member

Do you suggest to just remove the functionality and make the user rely on external packages (e.g. RandomExtensions

No, I suggest to unify the APIs between the rand family and the sprand family.

what do you mean by two different solutions?

For normal distribution, for both families, we added a new function: randn and sprandn. But for exponential distribution, we added a new function to the rand family (randexp), while for the sprand family, we rely on an exotic rfn parameter.

So, unless we add to base a means of passing normal/exponential distributions directly to rand / sprand (like in the above-cited packages), I would prefer that we unify the approaches, meaning getting rid of rfn, adding sprandexp, and allow to pass to sprand an implicit distribution, like is the case with rand, e.g. 1:3.

@abraunst
Copy link
Contributor Author

abraunst commented Apr 7, 2019

No, I suggest to unify the APIs between the rand family and the sprand family.

I'm all for it.

what do you mean by two different solutions?

For normal distribution, for both families, we added a new function: randn and sprandn. But for exponential distribution, we added a new function to the rand family (randexp), while for the sprand family, we rely on an exotic rfn parameter.

Right, but the exponential distribution was just an example.

So, unless we add to base a means of passing normal/exponential distributions directly to rand / sprand (like in the above-cited packages), I would prefer that we unify the approaches, meaning getting rid of rfn, adding sprandexp, and allow to pass to sprand an implicit distribution, like is the case with rand, e.g. 1:3.

But 1:3 doesn't cover all useful cases, and if we don't add means of passing a distribution to Base we would lose functionality I think.

In a sense, the difference is mostly semantic IMO. In your proposal, you'd pass a distribution to sprand, but in master you pass rfn, which is essentially a sampler for some distribution. The good thing about the current approach is that we only need the sampler, and we avoid to rely on Distributions.

I keep pondering, but maybe is not so absurd that rand and sprand are different. In base, rand is essentially a sampler for uniform distributions, and sprand is a sampler for a composite distribution (which is a mixture with paremeter p of the delta distribution with sampler n->0.0 and some other distribution, with sampler n->rfn(n) ). So it makes sense to receive an extra "sampler" argument, no?

@ViralBShah ViralBShah added the DO NOT MERGE Do not merge this PR! label Jun 17, 2019
@ViralBShah ViralBShah changed the title sprand sanity with rfn argument [DO NOT MERGE] sprand sanity with rfn argument Jan 10, 2020
@DilumAluthge DilumAluthge marked this pull request as draft December 9, 2021 01:17
@DilumAluthge
Copy link
Member

We have moved the SparseArrays stdlib to an external repository.

Please open this PR against that repository: https://github.com/JuliaLang/SparseArrays.jl

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code randomness Random number generation and the Random stdlib sparse Sparse arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants