Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set memory order on slow atomics #6920

Merged
3 commits merged into from
Jul 17, 2020
Merged

Set memory order on slow atomics #6920

3 commits merged into from
Jul 17, 2020

Conversation

miniksa
Copy link
Member

@miniksa miniksa commented Jul 14, 2020

By default, the memory order on atomics is seq_cst. This is a relatively expensive ordering and it shows in situations where we're rapidly signaling a consumer to pick up something from a producer. I've instead attempted to switch these to release (producer) and acquire (consumer) to improve the performance of these signals.

Validation Steps Performed

  • Run time cat big.txt and time cat ls.txt under VS Performance Profiler.

PR Checklist

  • Closes perf itch
  • I work here
  • Manual test
  • Documentation irrelevant.
  • Schema irrelevant.
  • Am core contributor.

@miniksa miniksa added Product-Conhost For issues in the Console codebase Area-Performance Performance-related issue Product-Conpty For console issues specifically related to conpty Product-Terminal The new Windows Terminal. Issue-Task It's a feature request, but it doesn't really need a major design. labels Jul 14, 2020
@miniksa miniksa self-assigned this Jul 14, 2020
@miniksa
Copy link
Member Author

miniksa commented Jul 14, 2020

@lhecker and @greg904.... you two seemed to fully understand memory ordering in the SPSC PR. Can you look at my changes here and confirm that my ordering is correct?

I sat down this morning to try to understand the ordering and this was what I gathered was the correct behavior.

I've tested it out locally and I don't see any ill effects from it. And I've run performance analysis and it's definitely faster this way.

@miniksa
Copy link
Member Author

miniksa commented Jul 14, 2020

Before:
image

After:
image

It's more pronounced in the NotifyPaint case than in the ThrottledFunc::Run<> case.

@miniksa miniksa marked this pull request as ready for review July 15, 2020 16:34
{
// <--
// If `NotifyPaint` is called at this point, then it will not
// set the event because `_fWaiting` is not `true` yet so we have
// to check again below.

_fWaiting.store(true);
_fWaiting.store(true, std::memory_order_release);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand it properly, this one probably shouldn't be relaxed because you're checking two different atomics instead of just one (and hoping for consistency across both), right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be "relaxed" because the "release" would make prior writes from this thread visible to other threads who "acquire" _fWaiting's value. But in this case the only one "acquiring" its value is NotifyPaint(), which doesn't need any prior writes done by _ThreadProc, since the only thing NotifyPaint() does is write to _fNextFrameRequested (but it doesn't rely upon its value, nor does it rely on any other value).
(This information is supplied without liability. ⚖😄)

Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking b/c discussion of making it relaxed so nobody merges it in the meantime

@ghost ghost added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Jul 17, 2020
@miniksa
Copy link
Member Author

miniksa commented Jul 17, 2020

blocking b/c discussion of making it relaxed so nobody merges it in the meantime

If we're concerned about this, then let's just leave it as it stands right now with the release/acquire as I wrote it here. It's better than seq_cst already. Also @lhecker said in a chat that "relaxed" isn't really a thing on x86 or amd64 anyway. It only works on ARM. And we build ARM64 but it's like 0.00001% of our usage. So I'd rather just leave it for now.

@ghost ghost removed the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Jul 17, 2020
@@ -218,13 +218,13 @@ DWORD WINAPI RenderThread::_ThreadProc()

void RenderThread::NotifyPaint()
{
if (_fWaiting.load())
if (_fWaiting.load(std::memory_order_acquire))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean... Technically you can shave off a bit time here on x86 as well. While x86 doesn't make a distinction between relaxed and release, it does invalidate caches during an acquire (and that way previously "released" data is being sync'd into your "acquiring" thread).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm just going to leave it for now. I'm still not 100% comfortable with all this "manual memory order" business. And you did say above that your relaxed recommendation was provided without liability. :P So if the liability is mine, I'd like to stay in my comfy place for right now and consider relaxed in the future.

@miniksa miniksa added the AutoMerge Marked for automatic merge by the bot when requirements are met label Jul 17, 2020
@ghost
Copy link

ghost commented Jul 17, 2020

Hello @miniksa!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit ea2bd42 into master Jul 17, 2020
@ghost ghost deleted the dev/miniksa/perf_ordering branch July 17, 2020 17:11
DHowett added a commit that referenced this pull request Aug 5, 2020
Carlos Zamora (1)
* UIA: use full buffer comparison in rects and endpoint setter (GH-6447)

Dan Thompson (2)
* Tweaks: normalize TextAttribute method names (adjective form) (GH-6951)
* Fix 'bcz exclusive' typo (GH-6938)

Dustin L. Howett (4)
* Fix VT mouse capture issues in Terminal and conhost (GH-7166)
* version: bump to 1.3 on master
* Update Cascadia Code to 2007.15 (GH-6958)
* Move to the TerminalDependencies NuGet feed (GH-6954)

James Holderness (3)
* Render the SGR "underlined" attribute in the style of the font (CC-7148)
* Add support for the "crossed-out" graphic rendition attribute (CC-7143)
* Refactor grid line renderers with support for more line types (CC-7107)

Leonard Hecker (1)
* Added til::spsc, a lock-free, single-producer/-consumer FIFO queue (CC-6751)

Michael Niksa (6)
* Update TAEF to 10.57.200731005-develop (GH-7164)
* Skip DX invalidation if we've already scrolled an entire screen worth of height (GH-6922)
* Commit attr runs less frequently by accumulating length of color run (GH-6919)
* Set memory order on slow atomics (GH-6920)
* Cache the viewport to make invalidation faster (GH-6918)
* Correct comment in this SPSC test as a quick follow up to merge.

Related work items: MSFT-28208358
@ghost
Copy link

ghost commented Aug 26, 2020

🎉Windows Terminal Preview v1.3.2382.0 has been released which incorporates this pull request.:tada:

Handy links:

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Performance Performance-related issue AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Task It's a feature request, but it doesn't really need a major design. Product-Conhost For issues in the Console codebase Product-Conpty For console issues specifically related to conpty Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants