[DO NOT MERGE] sprand sanity with rfn argument #30637

abraunst · 2019-01-07T22:59:14Z

This PR fixes inconsistencies in `sprand` when the `rfn` argument is specified (issue #30627). It breaks the current interface, so I think it should not be merged during 1.x. I'm copying here from #30627 what this PR hopefully achieves:

Whenever `rfn` is passed, it always accepts exactly one argument (the number of values to generate), and no type can be specified. This way, the caller can supply any function they want (using a random generator or not), generating whatever type of values they want (which becomes the `Tv` type of the matrix).

martinholters · 2019-01-08T17:53:27Z

Should this also follow the convention of taking the function as first argument? (I don't think do syntax would often be used here, but for consistency...)

abraunst · 2019-01-08T18:31:09Z

> Should this also follow the convention of taking the function as first argument? (I don't think do syntax would often be used here, but for consistency...)

I like it, also because rfn and T are mutually exclusive in this PR, and T comes already as first argument...

abraunst · 2019-01-11T13:08:24Z

Note that instead of the current

`sprand([rng],m,[n],p::AbstractFloat,[rfn,[type]])` or `sprand([rng],[type],m,[n],p::AbstractFloat)`

with `rfn` accepting one or two arguments depending on the presence of `[rng]`, the signature would be:

`sprand([rfn|type],[rng],m,[n],p::AbstractFloat)`, which is almost self-explanatory.
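For illustration, a minimal sketch of the arity difference in the current interface, assuming the Julia 1.x behavior described in this thread (the matrix sizes and the constant-valued `rfn` are arbitrary choices for the example, not taken from the PR):

```julia
using SparseArrays, Random

# Without an explicit RNG, `rfn` is called with one argument:
# the number of stored values to generate.
A = sprand(5, 5, 0.5, k -> fill(2.0, k))

# With an explicit RNG, `rfn` is called with two arguments:
# the RNG and the number of stored values to generate.
rng = MersenneTwister(0)
B = sprand(rng, 5, 5, 0.5, (r, k) -> randn(r, k))
```

Under the signature proposed in this PR, `rfn` would instead always take a single argument and come first in the argument list.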

ViralBShah · 2019-01-28T05:57:31Z

Don't we have a way to make breaking changes in stdlib without having to wait for Julia 2.0?

ViralBShah · 2019-01-28T05:58:55Z

Yes, taking rfn as the first argument for consistency would be nice too.

ViralBShah · 2019-02-05T01:15:29Z

@StefanKarpinski When can we merge this kind of thing?

StefanKarpinski · 2019-02-06T23:57:27Z

I don't know. I haven't really been following. Is it breaking?

abraunst · 2019-02-07T15:15:58Z

> I don't know. I haven't really been following. Is it breaking?

Yes, it is. Although I believe it would improve usability greatly (in the cases it addresses), I'm not sure if it's worth it to make a breaking change like this without going all the way to something in the spirit of #24912.

StefanKarpinski · 2019-02-08T12:03:14Z

Our general policy is not to make breaking changes unless they are so minor that no one is likely to be relying on them (and then only in minor releases, not point releases), so if that's possibly the case then we could run PkgEval on this to try to figure out if no package anywhere is using this signature. However, I suspect that not to be the case, in which case this would have to wait until we can release SparseArrays 2.0, which would require having the infrastructure to decouple stdlib versions from the Julia version, which we don't have yet. So my guess is that this can't happen yet.

abraunst · 2019-02-08T15:55:45Z

> However, I suspect that not to be the case, in which case this would have to wait until we can release SparseArrays 2.0, which would require having the infrastructure to decouple stdlib versions from the Julia version, which we don't have yet. So my guess is that this can't happen yet.

Understood. One possibility would be to add the new methods but keep the old ones (with or without deprecation warnings); that would not be breaking. If the horizon for a change in this area is so far away, then maybe it is worth it.

ViralBShah · 2019-04-03T13:38:53Z

I like the idea of adding the new signatures now, and deprecating the old ones whenever we can.

rfourquet · 2019-04-03T14:20:38Z

> I like the idea of adding the new signatures now

And what about just removing this rfn argument in the future? Currently it's totally underdocumented, i.e. one has to look at the sources to understand how rfn is used internally in order to be able to pass a correct function... so I suspect almost nobody uses it. If I'm not mistaken, rfn is needed internally to implement sprandn easily in terms of sprand, but this doesn't have to be a user facing API.

Also, it's not clear why sprand would have this level of flexibility, while rand doesn't. As @abraunst stated above, there are more general alternatives, like offered by "Distributions.jl" or #24912 (and "RandomExtensions.jl"). E.g. it seems cleaner to call sprand(Uniform(1, 9), n, m) rather than sprand(k -> rand(1:9, k), n, m). Or even better, sprand(1:9, n, m).
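As a sketch of what the `rfn`-based call looks like with the current positional signature (density included; the sizes `4`, `6` and density `0.5` are arbitrary values chosen for this example):

```julia
using SparseArrays

# Stored entries are drawn uniformly from 1:9. The element type of the
# result is inferred from what `rfn` returns, here Int.
A = sprand(4, 6, 0.5, k -> rand(1:9, k))
```

The `sprand(Uniform(1, 9), ...)` and `sprand(1:9, ...)` forms mentioned above are proposals and are not assumed to exist in the stdlib.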

abraunst · 2019-04-03T20:58:07Z

> > I like the idea of adding the new signatures now
>
> And what about just removing this rfn argument in the future? Currently it's totally underdocumented, i.e. one has to look at the sources to understand how rfn is used internally in order to be able to pass a correct function... so I suspect almost nobody uses it. If I'm not mistaken, rfn is needed internally to implement sprandn easily in terms of sprand, but this doesn't have to be a user facing API.
>
> Also, it's not clear why sprand would have this level of flexibility, while rand doesn't. As @abraunst stated above, there are more general alternatives, like offered by "Distributions.jl" or #24912 (and "RandomExtensions.jl"). E.g. it seems cleaner to call sprand(Uniform(1, 9), n, m) rather than sprand(k -> rand(1:9, k), n, m). Or even better, sprand(1:9, n, m).

(did you forget the density parameter here?)

I think `rfn` wouldn't make sense for `rand`, as there is no "randomness" left. E.g. a hypothetical `A = rand(rfn, m, n)` should be equivalent to `A = reshape(rfn(m*n), m, n)`.

In my opinion, the question is what minimal interface should remain in Base; because both RandomExtensions and Distributions rightly belong to packages, I think.

rfourquet · 2019-04-04T12:27:48Z

> I think rfn wouldn't make sense for rand

So I'm wondering what is the real purpose of rfn. I see only two things: 1) make it easy to implement sprandn in terms of sprand, or 2) specify arbitrary distributions not available in Base. For 1) I believe this is an internal implementation detail which should be hidden from the API, and for 2), rand and sprand should be treated equally, whether packages or Base provide it. Am I missing another purpose of rfn?

abraunst · 2019-04-04T16:53:10Z

> > I think rfn wouldn't make sense for rand
>
> So I'm wondering what is the real purpose of rfn. I see only two things: 1) make it easy to implement sprandn in terms of sprand, or 2) specify arbitrary distributions not available in Base. For 1) I believe this is an internal implementation detail which should be hidden from the API, and for 2), rand and sprand should be treated equally, whether packages or Base provide it. Am I missing another purpose of rfn?

I agree about 1). About 2) I don't follow you fully, in the sense that sprand has two sources of randomness: which are the non-zero elements and what they are; whereas rand has only one (what they are), so I don't see how they could be treated equally [*]

[*] Except maybe for something like JuliaRandom/RandomExtensions.jl#3 (comment), what do you think about that?

rfourquet · 2019-04-07T09:30:57Z

> sprand has two sources of randomness

Two parameters can be passed to sprand: the probability p, and optionally the "type" T of structurally non-zero elements (where "type" could be extended to accept e.g. objects, like 1:3). So I don't see why it would be necessary to pass a rfn function to sprand, when it seems to be enough to pass those two parameters p and T, which are more orthogonal (and internally the same rand function is used twice, for p and for T). Do you have an example where it's necessary to be able to pass rfn?

abraunst · 2019-04-07T10:13:56Z

> > sprand has two sources of randomness
>
> Two parameters can be passed to sprand: the probability p, and optionally the "type" T of structurally non-zero elements (where "type" could be extended to accept e.g. objects, like 1:3). So I don't see why it would be necessary to pass a rfn function to sprand, when it seems to be enough to pass those two parameters p and T, which are more orthogonal (and internally the same rand function is used twice, for p and for T). Do you have an example where it's necessary to be able to pass rfn?

Wait, so the "type" T would be extended to be a fully-fledged sampler [1]? In that case, obviously, I have no objection 😄 But you are a bit cheating because in the current situation rfn is the sampler. Coincidentally, this does not feel so far from this current PR. Don't get me wrong, I love the idea behind RandomExtensions (even if I don't fully understand its internals -- any hint to get started?); however I don't think it would be wise to remove the rfn function without having its replacement ready.

[1] this is assuming that 1:3 will be then interpreted as a sampler of uniform values in 1:3 and that this can be extended to arbitrary (non-uniform) samplers. Otherwise, for instance, how do you generate a sparse matrix with exponentially distributed non-zero values?

rfourquet · 2019-04-07T11:17:29Z

> Wait, so the "type" T would be extended to be a fully-fledged sampler [1]?

If needed, yes, it would align with rand([rng], [S], [dims]): whatever rand accepts as S, sprand could be made to accept it, as rand is used within sprand. Specifying the distribution with rfn for sprand and S for rand is quite inconsistent. Passing a function feels overkill when passing a distribution (whatever this means, e.g. S above) is enough.

> Coincidentally, this does not feel so far from this current PR

Indeed! So while you are at it, I would love that the opportunity is taken to increase sanity even more ;-)

> how do you generate a sparse matrix with exponentially distributed non-zero values?

Either via a package, or something is added to Base to speak about distributions, or sprandexp is added.
That illustrates exactly my point! Two very different solutions exist in base/stdlib to generate exponential distribution (rfn vs randexp), which should be unified IMHO.

abraunst · 2019-04-07T11:44:34Z

> > Wait, so the "type" T would be extended to be a fully-fledged sampler [1]?
>
> If needed, yes, it would align with rand([rng], [S], [dims]): whatever rand accepts as S, sprand could be made to accept it, as rand is used within sprand. Specifying the distribution with rfn for sprand and S for rand is quite inconsistent. Passing a function feels overkill when passing a distribution (whatever this means, e.g. S above) is enough.
>
> > Coincidentally, this does not feel so far from this current PR
>
> Indeed! So while you are at it, I would love that the opportunity is taken to increase sanity even more ;-)

While I generally agree, I'm a bit lost about what you suggest. Just removing the rfn parameter now would remove functionality (since rand does not currently accept a distribution parameter). Do you suggest to just remove the functionality and make the user rely on external packages (e.g. RandomExtensions)?

> > how do you generate a sparse matrix with exponentially distributed non-zero values?
>
> Either via a package, or something is added to Base to speak about distributions, or sprandexp is added.
> That illustrates exactly my point! Two very different solutions exist in base/stdlib to generate exponential distribution (rfn vs randexp), which should be unified IMHO.

Err... what do you mean by two different solutions? Setting rfn to be randexp is the only current way I see of generating a sparse matrix with exponentially distributed non-zero values (short of modifying the sparse matrix after generation). If we just remove the rfn parameter, we would remove this functionality.
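A minimal sketch of that usage, assuming the current positional `rfn` argument (the dimensions and density are arbitrary example values):

```julia
using SparseArrays, Random

# `randexp(k)` returns a Vector{Float64} of k exponentially distributed
# samples, so it can be passed directly as `rfn`.
E = sprand(100, 100, 0.05, randexp)
```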

rfourquet · 2019-04-07T12:08:05Z

> Do you suggest to just remove the functionality and make the user rely on external packages (e.g. RandomExtensions)?

No, I suggest to unify the APIs between the rand family and the sprand family.

> what do you mean by two different solutions?

For normal distribution, for both families, we added a new function: randn and sprandn. But for exponential distribution, we added a new function to the rand family (randexp), while for the sprand family, we rely on an exotic rfn parameter.

So, unless we add to base a means of passing normal/exponential distributions directly to rand / sprand (like in the above-cited packages), I would prefer that we unify the approaches, meaning getting rid of rfn, adding sprandexp, and allowing an implicit distribution (e.g. 1:3) to be passed to sprand, as is the case with rand.

abraunst · 2019-04-07T15:29:55Z

> No, I suggest to unify the APIs between the rand family and the sprand family.

I'm all for it.

> > what do you mean by two different solutions?
>
> For normal distribution, for both families, we added a new function: randn and sprandn. But for exponential distribution, we added a new function to the rand family (randexp), while for the sprand family, we rely on an exotic rfn parameter.

Right, but the exponential distribution was just an example.

> So, unless we add to base a means of passing normal/exponential distributions directly to rand / sprand (like in the above-cited packages), I would prefer that we unify the approaches, meaning getting rid of rfn, adding sprandexp, and allowing an implicit distribution (e.g. 1:3) to be passed to sprand, as is the case with rand.

But 1:3 doesn't cover all useful cases, and if we don't add means of passing a distribution to Base we would lose functionality I think.

In a sense, the difference is mostly semantic IMO. In your proposal, you'd pass a distribution to sprand, but in master you pass rfn, which is essentially a sampler for some distribution. The good thing about the current approach is that we only need the sampler, and we avoid relying on Distributions.

I keep pondering, but maybe it is not so absurd that rand and sprand are different. In Base, rand is essentially a sampler for uniform distributions, and sprand is a sampler for a composite distribution (a mixture with parameter p of the delta distribution with sampler n->0.0 and some other distribution with sampler n->rfn(n)). So it makes sense to receive an extra "sampler" argument, no?
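The mixture view can be sketched with a small stand-alone helper. This is an illustration only: `mixture_sprand` is a hypothetical name, it builds a dense mask for clarity, and it is not how the stdlib implements `sprand`.

```julia
using SparseArrays, Random

# Hypothetical helper illustrating the two sources of randomness.
# `sampler(rng, k)` plays the role of `rfn` and must return k values.
function mixture_sprand(rng::AbstractRNG, m::Integer, n::Integer, p::Real, sampler)
    # First source of randomness: which entries are stored
    # (each independently with probability p).
    idx = findall(rand(rng, m, n) .< p)
    rows = Int[ci[1] for ci in idx]
    cols = Int[ci[2] for ci in idx]
    # Second source of randomness: the values of the stored entries.
    vals = sampler(rng, length(idx))
    return sparse(rows, cols, vals, m, n)
end

M = mixture_sprand(MersenneTwister(1), 50, 40, 0.2, randexp)
```

Entries outside the mask are the "delta at zero" component of the mixture; they are simply not stored.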

DilumAluthge · 2022-01-14T22:39:32Z

We have moved the SparseArrays stdlib to an external repository.

Please open this PR against that repository: https://github.com/JuliaLang/SparseArrays.jl

Thank you!

sprand sanity with rfn argument

f0d5dc2

ararslan added the sparse and randomness labels Jan 8, 2019

ararslan requested a review from andreasnoack January 8, 2019 00:38

StefanKarpinski added the breaking label Feb 8, 2019

ViralBShah added the DO NOT MERGE label Jun 17, 2019

ViralBShah changed the title ~~sprand sanity with rfn argument~~ [DO NOT MERGE] sprand sanity with rfn argument Jan 10, 2020

DilumAluthge marked this pull request as draft December 9, 2021 01:17

DilumAluthge closed this Jan 14, 2022

DilumAluthge removed the DO NOT MERGE label Feb 11, 2022

stevengj mentioned this pull request Dec 2, 2022

document rfn argument of sprand JuliaSparse/SparseArrays.jl#309

Merged
