-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UnfoldResource ValueTask Stage and DSL methods #7028
base: dev
Are you sure you want to change the base?
Conversation
private readonly PooledValueTaskContinuationHelper<Option<TOut>> | ||
_pooledContinuation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's entirely possible that this entire pattern is overkill and we should consider using Result
or Result
+Holder
patterns consistently throughout these implementations for consistency's sake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or, maybe, the 'hold local' pattern we use for the UnfoldValueTaskAsync
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd stick with one implementation pattern for this - I think the Holder pattern is easier to reason about so far but I haven't had to debug any of this code yet :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holder pattern is easier to reason about, Especially from a 'safety' standpoint...
But since these are internals, I would expect that unless the ActorGraphInterpreter
is broken, that the debugging experience would not be overly diminished for maintainers; instead of peeking at Result
property state they just need to instead look at the ValueTask
state. Also it's a -little- less code to maintain.
The win, is that directly holding the VT removes indirection Result
introduces
Note: We do need to keep Holder
however for SelectAsync
due to it's internal parallelism and logic rules.
Really though, I want to have benchmarks of Task
vs this state first before I try to standardize things, just in case the difference of keeping a specific impl is worth a discussion, and/or if whichever we 'pick' shows notable regression (which means, maybe try the other one.) Kinda the other part of why I tried a few different strategies here, This may be important for Artery.
{ | ||
public Result<T> Element { get; private set; } | ||
private readonly Action<Holder<T>> _callback; | ||
private ValueTask<T> _pending; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed to me, that the best thing to do here was have Holder<T>
itself keep the continuation and just re-use them appropriately.
Yes, it costs the size of ValueTask<T>
but we'd take at least that .AsTask()
ing and in this case we only pay it 'once'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems pretty clever to me - I'd love to see a benchmark comparison between this and a normal SelectAsync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest benchmarks are in comments, It's 5-10% faster and improved alloc!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional note, There probably are additional opts to be made here but I tried to keep it, for lack of a better term, 'debugging friendly' as much as I could. :)
private readonly | ||
ConcurrentQueue< | ||
Holder<TOut>> _queue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here, is that we treat used Holders as 'Lazy' instances, so this will only grow to the size of parallelism
at the most, and even then only if we get to that level of demand.
private void CompletionAction() | ||
{ | ||
if (_currentTask.IsCompletedSuccessfully) | ||
{ | ||
_asyncHandler.Invoke(Result.Success(_currentTask.Result)); | ||
} | ||
else | ||
{ | ||
_asyncHandler.Invoke( | ||
Result.FromTask(_currentTask.AsTask())); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be capturing, then clearing clearing the _currentTask
var peeker = | ||
Unsafe.As<ValueTask<T>, ValueTaskCheatingPeeker<T>>( | ||
ref valueTask); | ||
if (peeker._obj == null) | ||
{ | ||
Invoke(Result.Success(peeker._result)); | ||
} | ||
else if (peeker._obj is Task<T> asTask) | ||
{ | ||
asTask.ContinueWith(TaskCompletedAction, this, | ||
TaskContinuationOptions.NotOnCanceled); | ||
} | ||
else | ||
{ | ||
_pending = vt; | ||
var source = | ||
Unsafe.As<IValueTaskSource<T>>(peeker._obj); | ||
source.OnCompleted(OnCompletedAction, this, | ||
peeker._token, | ||
ValueTaskSourceOnCompletedFlags.None); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Magic"
- We get back a
ValueTask<T>
- Unfortunately, Polite callbacks are not exposed
- In other words, we can't pass a state in.
- So we just abuse
Unsafe.As
and roll with that.
If this pattern (b/c we do it elsewhere) is overly offensive (I get it) there are some options that come to mind:
- Always pull the
ValueTask
into a given holder/scope and use a static continuation - do a 'pre-check' and branch logic accordingly
- I've seen this done in other libs, it's painful for whoever does it the first time but way cleaner than you would think.
- Fallback would be some form of compiled expression/etc.
- Hide all of this behind the newest .NET and rely on
UnsafeAccessor
to get the fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use Unsafe.As
, do we have to set an attribute on the assembly? I think we do this for some unsafe
stuff inside Akka.Remote already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Unsafe.As
(and many/most other parts of that API) does not have unsafe
on the public API method signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some feedback; would love some benchmarks showcasing the perf gains possible using these stages vs. their Task
-y equivalents.
/// <typeparam name="T"></typeparam> | ||
/// <typeparam name="TSource"></typeparam> | ||
/// <returns>A source of T that is created on stream start and read on backpressure</returns> | ||
public static Source<T, NotUsed> UnfoldResourceValueTaskAsync<T, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick / question: can we call this UnfoldResourceAsync
and overload the other call or is there going to be some conflict with Task
/ ValueTask
conversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mayyyyybe?
Biggest concern off the top of my head, is that it may be easy to accidentally wind up with the Task
variant, i.e.
Source.UnfoldResourceAsync(
() => inThing.OpenAsync(), //Is Task
(r) => r.ReadAsync(), //Is ValueTask
r => r.DisposeAsync() // is ValueTask
Above should(?) give a compiler error.
But below, need to verify it would not wind up using Task
variant:
Source.UnfoldResourceAsync(
async () => return await inThing.OpenAsync(), //Is Task
async (r) => return await r.ReadAsync(), //Is ValueTask
async r => return await r.DisposeAsync() // is ValueTask
@@ -2551,7 +2907,7 @@ public void Invoke(Result<T> result) | |||
} | |||
} | |||
|
|||
private static readonly Result<TOut> NotYetThere = Result.Failure<TOut>(new Exception()); | |||
private static readonly Result<TOut> NotYetThere = Result.Failure<TOut>(NotYetThereSentinel.Instance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
{ | ||
public Result<T> Element { get; private set; } | ||
private readonly Action<Holder<T>> _callback; | ||
private ValueTask<T> _pending; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems pretty clever to me - I'd love to see a benchmark comparison between this and a normal SelectAsync
var peeker = | ||
Unsafe.As<ValueTask<T>, ValueTaskCheatingPeeker<T>>( | ||
ref valueTask); | ||
if (peeker._obj == null) | ||
{ | ||
Invoke(Result.Success(peeker._result)); | ||
} | ||
else if (peeker._obj is Task<T> asTask) | ||
{ | ||
asTask.ContinueWith(TaskCompletedAction, this, | ||
TaskContinuationOptions.NotOnCanceled); | ||
} | ||
else | ||
{ | ||
_pending = vt; | ||
var source = | ||
Unsafe.As<IValueTaskSource<T>>(peeker._obj); | ||
source.OnCompleted(OnCompletedAction, this, | ||
peeker._token, | ||
ValueTaskSourceOnCompletedFlags.None); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use Unsafe.As
, do we have to set an attribute on the assembly? I think we do this for some unsafe
stuff inside Akka.Remote already.
/// <typeparam name="TOut">TBD</typeparam> | ||
[InternalApi] | ||
public sealed class | ||
SelectValueTaskAsync<TIn, TOut> : GraphStage<FlowShape<TIn, TOut>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same nitpicks about naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.5 points here:
- The class itself, will almost certainly require a different name from the original
1.5. As for the DSL, Will see whether having the same exposed name causes issues; in this (i.e. as opposed toUnfoldResource
) case I'm more worried about compiler being confused as to which overload to use and diminish dev experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per discord comments giving the DSL methods the same name results in compiler confusion due to inability to auto-resolve inlined async Func, Happy to give DSL a better name though and open to options. :)
private readonly PooledValueTaskContinuationHelper<Option<TOut>> | ||
_pooledContinuation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd stick with one implementation pattern for this - I think the Holder pattern is easier to reason about so far but I haven't had to debug any of this code yet :p
ValueTask versions of
The key to getting a clear perf benefit was having a |
…rtunities still...
26a5854
to
0471a58
Compare
New benchmarks after rebase and clearing Rider caches. Also IDK why I couldn't reply directly, but Note the benchmarks are... probably imperfect, (and need far better names,) but are fair in that they compare between each in the same way:
|
Method | Mean | Error | StdDev | Gen0 | Allocated |
---|---|---|---|---|---|
RunSelectAsync | 33.21 us | 0.661 us | 0.989 us | 3.9063 | 17.94 KB |
RunSelectValueTaskAsync | 29.68 us | 0.341 us | 0.319 us | 0.7935 | 3.74 KB |
RunSelectAsyncSync | 32.13 us | 0.637 us | 1.064 us | 3.9063 | 17.94 KB |
RunSelectValueTaskAsyncSync | 30.27 us | 0.553 us | 0.490 us | 0.7935 | 3.74 KB |
Unfold
Very happy here; we are clearly faster and with somewhat/greatly lowered allocation:
Method | Mean | Error | StdDev | Gen0 | Allocated |
---|---|---|---|---|---|
UnfoldAsyncNoYield | 176.07 us | 2.020 us | 1.791 us | 9.7656 | 45.23 KB |
UnfoldValueTaskAsyncNoYield | 136.77 us | 2.715 us | 2.788 us | 7.3242 | 33.38 KB |
UnfoldAsyncWithYield | 161.83 us | 1.198 us | 1.062 us | 8.5449 | 38.99 KB |
UnfoldValueTaskAsyncWithYield | 69.99 us | 0.479 us | 0.448 us | 1.5869 | 7.37 KB |
/// Inheritors are expected to utilize the <see cref="SetPooledCompletionCallback"/> | ||
/// and call `base.PreStart()` in their `PreStart` conditions. | ||
/// <para/> | ||
/// Additionally, if inheritors have their own 'restart' logic, | ||
/// They should utilize the `ResetHolder()` method, | ||
/// to avoid callback clashes. | ||
/// | ||
/// </summary> | ||
/// <typeparam name="T"></typeparam> | ||
internal abstract class PooledAwaitOutGraphStageLogic<T> : OutGraphStageLogic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was created based on desire to consolidate logic.
It works, but it feels clunky. Not sure what we can do to make the API cleaner?
{ | ||
public Result<T> Element { get; private set; } | ||
private readonly Action<Holder<T>> _callback; | ||
private ValueTask<T> _pending; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest benchmarks are in comments, It's 5-10% faster and improved alloc!
@@ -3805,7 +4124,7 @@ public sealed class AsyncEnumerable<T> : GraphStage<SourceShape<T>> | |||
{ | |||
#region internal classes | |||
|
|||
private sealed class Logic : OutGraphStageLogic | |||
private sealed class Logic : PooledAwaitOutGraphStageLogic<bool> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the words of GL, "I May have gone too far".
I don't have benchmarks, Can try to make something and diff against dev
(although if someone else wants to 😇) but:
- 90% sure this is better than what we had
- 100% sure it's less less LOC and vaguely cleaner.
|
||
namespace Akka.Streams.Implementation.Fusing; | ||
|
||
public readonly struct SlimResult<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing Akka Result<T>
type has some pain points AFA internals
Primarily, The structure results in poor alignment, due to the bool
/<T>
/Exception
layout.
What this does instead, is we use Sentinel Exceptions (similar to Channel<T>
internals) to track other states as required. This helps on alignment and overall buffer allocation.
var whyMSFTwhy = _vt.ConfigureAwait(false); | ||
_vt = Unsafe.As<ConfiguredValueTaskAwaitable<T>, ValueTask<T>>( | ||
ref whyMSFTwhy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someone please suggest a better name, I could not think of one after comprehending that this is all more or less legal but normally only internals are allowed to do directly =/.
Add UnfoldResourceAsync benchies
I'm very happy.
|
Fix SlimResult badness ValueType Error pass in. Add Convenience methods for SlimResult, avoid branching. Add guarding around SelectValueTaskAsync Holder pool usage with apology
'Ported' the existing There's still an improvement in numbers, but we lost a bit with guard code and these allocation savings are a 'best case' as they hit the happy path, we can almost certainly improve in future however. :) Alas it's a bit of weaving and don't want the API goodness to get lost in the weeds of optimizations. SelectAsync:
|
Add UnfoldValueTaskAsync test based on UnfoldAsync test.
Tests are passing locally but a couple may be flaky on CI. Hoping to see what Linux tests will do once we solve the build errors tripping up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Edited due to Weird draft I couldn't get rid of, apologies!)
I think this is good short of a review and approval from others.
Fairly convinced the new failing test(s) are racy, but left it in for discussion/pushback.
Changes
Adds Akka.Streams Source stages:
UnfoldValueTaskAsync
<-- VT Optimized version ofUnfoldAsync
UnfoldResourceValueTaskAsync
<-- VT Optimized version ofUnfoldResourceAsync
SelectValueTaskAsync
<-- VT Optimized version ofSelectAsync
This allows all sorts of potential benefits to end users, including:
Checklist
For significant changes, please ensure that the following have been completed (delete if not relevant):
Latest
dev
BenchmarksN/A since these are 'new' APIs.
This PR's Benchmarks
See other comments, overall the new methods are 'at least' as fast with lower allocation, at worst we are 0-10% faster but in some cases we are twice as fast :D