<chrono>: Consider caching QueryPerformanceFrequency() #448
Comments
Hi! Agree, with the use of a global So, it should be sufficient to use the equivalent of only the Edit: On 32-bit CPU architectures, the above may end up being implemented in terms of a CAS or LL/SC loop, but on 64-bit architectures, should expect it to be optimized to plain old load/store instructions so long as the architecture guarantees atomicity of such operations. I believe .NET Core uses this same technique deep in the internals of System.Threading.SpinWait to record a measurement of the time that the x86 PAUSE instruction takes, since this can vary depending on the architecture (for example, Skylake's PAUSE takes much longer than previous Intel microarchitectures). It doesn't use interlocked operations, atomics, or locking since I believe the C# memory model for loads/stores on non-volatile variables of types |
Other thing to consider for this area: Lines 616 to 618 in a83d8c0
There's By replacing highlighted code with the following:
Can have one division instead of two. But on my system QPC takes most of the time. It also may not be worth the trouble of bringing intrinsic functions into that header. |
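For context, the two-division conversion under discussion can be sketched roughly as follows. This is an illustration under assumptions, not a copy of the referenced lines: `ticks_to_ns` and `ns_per_sec` are hypothetical names, and a nanosecond period is assumed. Splitting the count into whole seconds and leftover ticks keeps the intermediate product from overflowing, at the cost of a second division.

```cpp
// Hypothetical helper: converts a raw performance-counter value to nanoseconds.
// ns_per_sec stands in for the denominator of a nanosecond period (assumption).
static long long ticks_to_ns(long long ctr, long long freq) {
    const long long ns_per_sec = 1000000000LL;
    // First division: whole seconds, scaled to nanoseconds.
    const long long whole = (ctr / freq) * ns_per_sec;
    // Second division: leftover ticks (less than freq), scaled to nanoseconds.
    const long long part = (ctr % freq) * ns_per_sec / freq;
    return whole + part;
}
```

A single-division variant would need a 128-bit intermediate product, which is where the intrinsic functions mentioned above would come in; that variant is not shown here.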
Maybe a bit off topic, but why do magic statics need thread local storage? |
@MikeGitb The 'Magic Statics' algorithm uses a thread-local read to see 'did this thread already see that this value' to avoid synchronization if the current thread has already seen that the value is initialized. |
(Since we started this off-topic) what are the reasons to avoid TLS? I recall there are issues with .NET, but those can be ruled out by the preprocessor. There was the DLL problem on XP, but XP is no longer around. Something more exotic, like kernel-mode drivers, executable packers / protectors, malware? This STL is not suited for such scenarios anyway. |
Adding a .TLS section to a DLL that previously didn't have one can push a program over the TLS slot limit, so it's a breaking change. |
TLS slot limit? I thought the algorithm that has existed since Vista has no limits and would reallocate TLS as many times as needed. |
(Sure such TLS reallocation may be heap expensive for a program with many threads, and many DLLs already loaded, and may cause reaching heap limit on x86, but then any adding change can cause that) |
@AlexGuteniev My understanding is the limit in Vista was changed from ~58 to ~1000ish but that there is still a limit. |
steady_clock::now() says:
STL/stl/inc/chrono
Lines 612 to 614 in a83d8c0
STL/stl/src/xtime.cpp
Lines 92 to 96 in a83d8c0
As the comment indicates, QueryPerformanceFrequency documentation says: "The frequency of the performance counter is fixed at system boot and is consistent across all processors. Therefore, the frequency need only be queried upon application initialization, and the result can be cached."
This code originally used magic statics, but TFS checkin 1586419 on March 16, 2016 removed that. My checkin notes claimed (emphasis added):
I had no evidence for this performance assumption and it was incorrect. In DevCom-505019, where this issue was originally reported, Damian Zwoliński noted that while QPC is indeed efficient on most platforms (aside: IIRC, on certain VMs it is expensive), QPF is not efficient.
We're avoiding magic statics for a reason (their use of Thread Local Storage is problematic for some users), but we should investigate whether it's possible to restore caching without TLS and without breaking ABI. For example, we could have a static long long initialized to 0 (no magic) and use interlocked operations to cache QPF, since 0 is never a valid value for it.
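A minimal sketch of that idea, under stated assumptions: `std::atomic` with relaxed ordering is used here in place of the Win32 interlocked functions, `query_frequency_uncached` is a hypothetical stand-in for QueryPerformanceFrequency() that returns a made-up value, and all names are illustrative. It does not address the ABI question.

```cpp
#include <atomic>

// Hypothetical stand-in for QueryPerformanceFrequency(). It counts its calls
// so the effect of caching is observable; the returned value is made up.
static std::atomic<int> g_query_calls{0};

static long long query_frequency_uncached() {
    g_query_calls.fetch_add(1, std::memory_order_relaxed);
    return 10000000LL; // assumed 10 MHz counter
}

// Constant-initialized to 0, so no magic static (and no TLS) is involved.
// 0 doubles as the "not cached yet" sentinel, since it is never a valid frequency.
static std::atomic<long long> g_cached_frequency{0};

static long long cached_frequency() {
    long long freq = g_cached_frequency.load(std::memory_order_relaxed);
    if (freq == 0) {
        // Several threads may reach this point at once. Each one stores the
        // same value, so the race only costs a few redundant queries.
        freq = query_frequency_uncached();
        g_cached_frequency.store(freq, std::memory_order_relaxed);
    }
    return freq;
}
```

Relaxed ordering should be enough in this sketch because the cached value is self-contained: a reader either sees 0 and queries again, or sees the final frequency.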