Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When a task scope produces <= 1 task to run, run it on the calling thread immediately #932

Merged
merged 1 commit into from
Nov 27, 2020

Conversation

aclysma
Copy link
Contributor

@aclysma aclysma commented Nov 27, 2020

This is an additional change continuing on from #892

The rationale for this change is that if we have just a single task to perform, it's almost always better to perform it immediately rather than pass the work to another thread and blocking-wait on it (thereby tying up two threads.)

I don't see significant gains (or losses) from this in practice when profiling. To me the data indicates that the main thread is already consistently picking up the task before a worker thread can (and doesn't appear that we are even waking another thread?) While the benefits from thread behavior are inconclusive, this does remove some allocations when hitting the fast path of <= 1 task to perform in a scope.

Original

This is a baseline from before we fixed the deadlock. It shows what it looks like when the main thread is constantly blocked waiting on other threads. Frames averaged ~6.98ms

image

image

Recent Commit

The main thread has very high occupancy and there are clear runs where the main thread executed a bunch of systems without going out to other threads. (The top bar labeled "main" is almost fully green.) Frames averaged ~5.96ms

image

image

This PR

I don't see a significant difference. Frames averaged ~5.72ms. ~4% difference is too narrow for me to draw any conclusion at this point.

image

image

Methodology

Ran a single time in release, profiled with superluminal, code was instrumented as shown here: aclysma@0782fe4

…read immediately.

While generally speaking the calling thread would have picked up the task first anyways, I don't think it makes much sense usually to block the calling thread until another thread wakes and does the work.
@aclysma
Copy link
Contributor Author

aclysma commented Nov 27, 2020

I just found some more conclusive evidence. I don't think it makes a performance impact right now because the threads are mostly idle.

Recent Commit

Most of the time the "Run Systems" scope is executed on the main thread, but sometimes (napkin math: 100s of times in a 7 second sample) it can run on a compute thread.

image

This PR

The "Run Systems" scope runs exclusively on the main thread, which is what I would expect to happen based on how the code is written today. I think this is the better behavior, so I recommend taking this PR. While I don't measure a significant improvement in throughput, I think this behavior is more desirable and in some cases it avoids allocations.

@ambeeeeee
Copy link
Contributor

Off topic: What profiler is that?

@cart
Copy link
Member

cart commented Nov 27, 2020

I'm convinced. Thanks for being thorough. I'm now lamenting that superluminal doesn't support linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants