FS-1135 - Random functions for collections #732
Looks good overall, some changes needed though.
Co-authored-by: Phillip Carter <[email protected]>
… as not used in Array module
@T-Gro Hi, thank you for the review
p.s. great to see progress on this
@dsyme thank you for the comments
[1; 2; 3] |> Random.shuffle
[|1; 2; 3|] |> Random.shuffle
This would allow us to use overloads, but it would prevent partial application. On the one hand that is OK; on the other, it is not what the other collection functions look like, so it is not that consistent.
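The trade-off described above can be sketched as follows. This is only an illustration: `RandomOps`, `ArrayExample`, `shuffleArray` and `shuffleInts` are hypothetical names, not the API proposed in the RFC, and the explicit `Random` argument is an assumption made so that there is something to partially apply.

```fsharp
open System

// Hypothetical helper shared by both shapes below (illustration only).
let shuffleArray (rng: Random) (xs: 'T[]) : 'T[] =
    let result = Array.copy xs
    // Fisher-Yates pass: swap each slot with a random slot at or before it.
    for i = result.Length - 1 downto 1 do
        let j = rng.Next(i + 1)
        let tmp = result.[i]
        result.[i] <- result.[j]
        result.[j] <- tmp
    result

// Shape 1: overloaded static members. One name serves lists and arrays,
// but the tupled method arguments cannot be partially applied.
type RandomOps =
    static member Shuffle(rng: Random, xs: 'T[]) : 'T[] =
        shuffleArray rng xs
    static member Shuffle(rng: Random, xs: 'T list) : 'T list =
        xs |> List.toArray |> shuffleArray rng |> List.ofArray

// Shape 2: a curried module function, like the existing collection functions.
module ArrayExample =
    let shuffleWith (rng: Random) (xs: 'T[]) : 'T[] = shuffleArray rng xs

let rng = Random(42)

// Overload resolution picks the list version from the argument type.
let viaOverload = RandomOps.Shuffle(rng, [ 1; 2; 3 ])

// Partial application only works with the curried shape.
let shuffleInts : int[] -> int[] = ArrayExample.shuffleWith rng
let viaCurried = [| 1; 2; 3 |] |> shuffleInts
```

The curried shape composes with `|>` and partial application in the same way as `Array.map` or `List.sortBy`, at the cost of needing one function per collection module.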
This looks fine to me overall. I don't think we need a separate
As for naming - I don't have a strong opinion on that, since I don't have much experience in using those functions, but I share @dsyme's concern about naming: once they're in, there's no way for us to change or remove them if they turn out to be confusing or to lack some functionality. I personally think that the safest way is to align naming and signatures with what existing functions in .NET and Python have.
@dsyme I created a poll in the discussion, and it seems that people like the idea of a prefix, so I changed all the names in the PR.
This is great, thanks! A few minor nits, mainly about being precise (and following the language of NextDouble).
I think this is essentially ready. While I did mention a few things in the discussion thread, up to you to mention them here as well (i.e. on infinite sequences (note that .NET only supports arrays and spans, in contrast), and on the necessary caching).
Approved, but see these two suggestions of text remnants from the RFC template (not a blocker):
## Diagnostics

Please list the reasonable expectations for diagnostics for misuse of this feature. **I don't see a way to misuse it**
Hey @Lanayx, it appears we both forgot about this line. We shouldn't leave the remnants of the template hanging around (I know, it happens, but we shouldn't ;) ).
This is my suggestion:
Please list the reasonable expectations for diagnostics for misuse of this feature. **I don't see a way to misuse it**
There are no known diagnostics on any abuse or misuse of this feature.
EDIT: this is resolved (but GH doesn't let me)
@dsyme and @vzarytovskii, this proposal is ready. If you can give it your thumbs up, I can merge it (or you can ;) ).
/cc reminder for tomorrow: @vzarytovskii
Merged! Thank you everyone for your hard work on this
@dsyme -- please consider revoking the approval for this suggestion. FSharp.Core is already too big, and so far all attempts to refactor it to make it smaller have failed. Useful helper functions such as these belong in an external library, where they can be selected by a developer who is working on scenarios where this type of functionality is useful. We currently ship FSharp.Core as a NuGet package embedded within the dotnet SDK:
In general we have only added APIs to FSharp.Core when they were widely applicable and supported an idiomatic programming style, or interop with C# assemblies; of course, some that don't match these criteria have slipped in. Sure, these APIs are small and compact; however, so are the many other APIs we could add. These suggested APIs deserve to live in a support library rather than FSharp.Core, and someone should work on it. We have many gaps with the Numpy data library, and FSharp.Core is not the place to address them.
FSharp.Stats?
A "small user base" is not a reason for F# not to be part of the SDK (that line of thinking can lead to its eradication), unless we want to make the CLR a C#-only thing, or have the SDK ship with no compilers at all, just the assemblies, the MSBuild pieces and the dotnet tool. That would also save space in the SDK... In any case, I concur with your analysis, including the point about increasing the API surface for the compiler team. However useful these functions are, there are several packages that already extend FSharp.Core (https://fsprojects.github.io/FSharp.Collections.ParallelSeq/). We should keep the fslang-design process for those extensions, though.
Random functions are not part of Numpy; they are part of the Python standard library, since they are very basic and widely applicable, and they have really been "missing" in F# since its inception.
I too think that bloated libraries are hard to maintain and refactor, since I sadly have experience with that. Maybe this would be a good starting point for a second NuGet package?
Fully agree. However we should differentiate that from absence of the very basic and generally applicable functionality.
I don't think so
The presence of random functions in the Python standard library is one of the many things that lowered the barrier of entry into ML for newcomers, and it very positively influenced the language's popularity. We should follow that path to success here rather than avoid it.
Ultimately the team looking after the
@KevinRansom I'd request that we get those concerns raised much, much earlier in the design process, e.g. at the suggestion stage, or minimally at the RFC-discussion stage. This RFC PR was open 12 months.
@Lanayx Given @KevinRansom's concerns could we get a measure of the size delta?
@Lanayx I would favour dropping the
@dsyme As for the size question, here are the measurements

open System
open BenchmarkDotNet.Attributes

module TestData =
    let arr = Array.init 1000 id

[<MemoryDiagnoser>]
type Rand() =
    // Baseline: copy the array, then shuffle the copy in place with the BCL method.
    [<Benchmark(Baseline = true)>]
    member _.Bcl () =
        let newArr = Array.copy TestData.arr
        Random.Shared.Shuffle newArr
        newArr

    // Variant that takes a Random instance.
    [<Benchmark>]
    member _.RandomWith () =
        TestData.arr |> Array.randomShuffleWith Random.Shared

    // Variant that takes a (unit -> float) randomizer function.
    [<Benchmark>]
    member _.RandomBy () =
        TestData.arr |> Array.randomShuffleBy Random.Shared.NextDouble

If a 1% increase is indeed too much and I had to drop anything from the API, I'd drop
@Lanayx If I had to guess, the HOF version being slower might be due to the
@brianrourkeboll It will be slower anyway, because it has to check the value range and do the extra calculation that converts a float in 0..1 to an int in min..max.
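The extra work mentioned above can be illustrated with a minimal sketch. `nextIndexWith` and `nextIndexBy` are hypothetical helpers written for this illustration, not the FSharp.Core source; the exact validation and exception used there are not shown in this thread.

```fsharp
open System

// Hypothetical helpers for illustration only.

/// Int-based path: the generator already returns an index in 0 .. maxExclusive - 1.
let nextIndexWith (rng: Random) (maxExclusive: int) : int =
    rng.Next(maxExclusive)

/// Float-based path: validate the value first, then scale it to an index.
let nextIndexBy (randomizer: unit -> float) (maxExclusive: int) : int =
    let value = randomizer ()
    // NaN fails both comparisons, so it is rejected together with out-of-range values.
    if not (value >= 0.0 && value < 1.0) then
        raise (ArgumentOutOfRangeException("randomizer", box value, "Expected a value in [0.0, 1.0)."))
    // Scale [0.0, 1.0) to 0 .. maxExclusive - 1 by truncation.
    int (value * float maxExclusive)
```

For example, `nextIndexBy (fun () -> 0.5) 10` evaluates to `5`, while a randomizer that returns `3.0` or `nan` raises `ArgumentOutOfRangeException` in this sketch.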
@Lanayx Hmm, it seems to be possible to get

| Method | Mean     | Error     | StdDev    | Ratio | RatioSD |
|------- |---------:|----------:|----------:|------:|--------:|
| Bcl    | 3.386 us | 0.0312 us | 0.0276 us |  1.00 |    0.00 |
| Pr     | 6.721 us | 0.0795 us | 0.0705 us |  1.99 |    0.03 |
| Faster | 3.881 us | 0.0290 us | 0.0271 us |  1.15 |    0.01 |

The lambda does seem to be required if you want full devirtualization, though. That is,

Faster.randomShuffleBy Random.Shared.NextDouble TestData.arr

is slightly slower (~2 μs on my machine) than

Faster.randomShuffleBy (fun () -> Random.Shared.NextDouble ()) TestData.arr

That's probably because the JIT can devirtualize the call to
So if a user really cares about maximum performance, they can:
Interesting, thanks! BTW what's the cost of the locking involved in Random.Shared? Just out of curiosity really.

I think from this I still approve the RFC, and my preference would still be to remove the With variants (@Lanayx is there any reason besides perf to include them?)

For me the deciding factor from a design perspective is that a large number of F# teaching scenarios become much simpler to teach if you just have

That's really one of the main things FSharp.Core is for: to present a coherent, teachable, usable, portable programming model that captures most common in-memory programming scenarios before moving on to advanced data structures or interop with system features or UX or tensors and so on. Doing basic random permutations has been a part of programming models like this ever since the days of Python and before. It is in Python for a reason, and should be in F# for the same reason.

The addition of System.Random.Shared is a factor in this - it indicated that .NET has embraced the same principle. Therefore F# should embrace the principle too from its own perspective.

The owners of dotnet/fsharp still have a veto right on this one. Size is a factor affecting many scenarios that might not be obvious.
Do you mean
Compare: dotnet/fsharp#17277 (comment) and dotnet/fsharp#17277 (comment).
Got it, thanks!
But
From my hands-on perspective, an ideal F# API that is based on C# experience should have the following properties:
So I look at different options with regard to those properties:
Based on the above, I would use the .With option in my work project (with Random.Shared) if it were available. If it's not available, I'll have to add one more method to my own collection of missing language functions (near
Array.randomShuffleBy (fun () -> Random.Shared.NextDouble ()) TestData.arr
This would raise a thought in my mind: "They had a chance to make it nice after 20 years, but failed." Ultimately the .With option would not be needed if FSharp.Core targeted .NET 8, because the implementation could just be based on
Just a thought: how exactly is randomShuffleBy any different from sortBy? Signature-wise it seems to be identical, and the semantics also seem to be the same.
Shuffle algorithmic complexity is O(N)
So the generated double is actually interpreted as an index for classic swap operations like this? If so, I think it's very odd (what happens when I return NaN or simply 3.0, etc.)
Right
It's covered in the RFC, the
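To make the exchange above concrete, here is a minimal sketch of a Fisher-Yates shuffle driven by a `unit -> float` randomizer. It is a hypothetical `shuffleBy`, not the FSharp.Core implementation; in particular, raising `ArgumentOutOfRangeException` for NaN or out-of-range values is an assumption of this sketch. Unlike the projection passed to `sortBy`, which maps each element to a sort key, the randomizer here takes no element at all and is called once per swap, so the whole pass is O(N).

```fsharp
open System

/// Hypothetical sketch: returns a shuffled copy of `source`,
/// drawing each swap index from `randomizer`.
let shuffleBy (randomizer: unit -> float) (source: 'T[]) : 'T[] =
    let result = Array.copy source
    // One pass from the last index down to 1: N - 1 randomizer calls in total.
    for i = result.Length - 1 downto 1 do
        let value = randomizer ()
        // Assumed behaviour: NaN and values outside [0.0, 1.0) are rejected.
        if not (value >= 0.0 && value < 1.0) then
            raise (ArgumentOutOfRangeException("randomizer", box value, "Expected a value in [0.0, 1.0)."))
        // Interpret the float as an index in 0 .. i and swap.
        let j = int (value * float (i + 1))
        let tmp = result.[i]
        result.[i] <- result.[j]
        result.[j] <- tmp
    result

// Usage: any source of floats in [0.0, 1.0) works, e.g. a seeded Random.
let rng = Random(42)
let shuffled = [| 1; 2; 3; 4; 5 |] |> shuffleBy rng.NextDouble
```

In this sketch a randomizer that returns `3.0` or `nan` fails on the first swap rather than silently producing a biased or out-of-bounds index.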
Click “Files changed” → “⋯” → “View file” for the rendered RFC (i.e.: here).
Discussion: in this thread