Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<complex>: Improve numerical accuracy of sqrt and log #935

Merged

Conversation

statementreply
Copy link
Contributor

@statementreply statementreply commented Jun 29, 2020

Goals:

  • Fix sqrt overflow when input is huge.
  • Fix sqrt inaccuracy when input is tiny, due to internal underflow.
  • Fix log inaccuracy when input is tiny, due to internal underflow.
  • Improve accuracy of log when |z| ≈ 1, and fix log(complex<float>{1, 0}) != 0 under certain combinations of compile time and runtime settings. (DevCom-1093507 is a symptom of the inaccuracy. It is itself non-bug.)
    • Use hardware FMA on x86/x64 when available.
    • Use hardware FMA on arm/arm64 when available.
  • Add test coverage.

The changes contain algorithm described in W. Kahan (1987), Branch Cuts for Complex Elementary Functions, or Much Ado About Nothing's Sign Bit.

Modifies the scale factors in `_Fabs` (used by `sqrt`) such that:

- `_Fabs` doesn't underflow when the input is tiny.

- `sqrt` doesn't overflow when the input is huge.
When |z| is close to 1, compute log(|z|) as log1p(norm_minus_1(z)) / 2,
where norm_minus_1(z) = real(z) ^ 2 + imag(z) ^ 2 - 1 computed with
double width arithmetic to avoid catastrophic cancellation.
@statementreply statementreply marked this pull request as ready for review August 8, 2020 12:29
@statementreply statementreply requested a review from a team as a code owner August 8, 2020 12:29
@StephanTLavavej

This comment has been minimized.

@azure-pipelines

This comment has been minimized.

stl/inc/complex Outdated Show resolved Hide resolved
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! 😺 I'll validate and push changes for a number of superficial issues that I noticed, nothing affecting the core logic.

stl/src/math_algorithms.cpp Outdated Show resolved Hide resolved
stl/src/math_algorithms.cpp Outdated Show resolved Hide resolved
tests/std/tests/floating_point_model_matrix.lst Outdated Show resolved Hide resolved
tests/std/include/fenv_prefix.hpp Outdated Show resolved Hide resolved
stl/inc/complex Outdated Show resolved Hide resolved
stl/src/float_multi_prec.hpp Outdated Show resolved Hide resolved
@StephanTLavavej StephanTLavavej removed their assignment Nov 4, 2020
@StephanTLavavej

This comment has been minimized.

StephanTLavavej added a commit to StephanTLavavej/STL that referenced this pull request Nov 5, 2020
@StephanTLavavej StephanTLavavej self-assigned this Nov 5, 2020
@StephanTLavavej
Copy link
Member

I've pushed a change for compatibility with our internal test harness - it doesn't contain the same option-parsing machinery that the Python/lit test harness does, so it attempts to use ARM64-only options on x86 and x64 (and vice versa). clang-cl emits an "unknown command line argument" warning that upgrades to an error, so we need to disable that. (MSVC will emit a warning from the compiler driver, but that doesn't get upgraded to an error by /WX, so we don't need to make a change for it.)

Unfortunately, significant changes are needed to be compatible with /clr:pure (yet again we pay a price for not having ported its builds and tests to GitHub yet). I think I know how to fix it, but not in time for merging tomorrow, so I'm moving this back to WIP.

@StephanTLavavej
Copy link
Member

I've pushed a merge, resolving conflicts with the test harness rewrite. The tests.py changes now have comments referring to features.py. The previous changes to config.py have been replaced by different changes to features.py (with the same effect).

This doesn't do anything about /clr:pure yet, but should get the tests here passing again.

@StephanTLavavej
Copy link
Member

StephanTLavavej commented Nov 9, 2020

Changelog

  • Fused src/float_multi_prec.hpp and src/math_algorithms.cpp into <complex>, because
    /clr:pure refused to work with the separately compiled code.
  • Introduced macros _FMP_USING_X86_X64_INTRINSICS and _FMP_USING_ARM64_INTRINSICS,
    defined within <complex> only.
  • /clr:pure doesn't have access to any intrinsics (beyond a few special ones like _InterlockedIncrement).
  • Clang makes it difficult to use the FMA intrinsics needed here. This may be solvable in the future, but for now, I'm activating the fallback for Clang. @statementreply's profiling indicates that the intrinsics are worth ~5% performance (~3 ns out of ~60 ns for complex<double>::log) so they're "nice to have" but not critical.
  • To reduce the throughput impact, I'm including <emmintrin.h> (which is relatively small)
    and then manually declaring _mm_fmsub_sd (which is declared by <immintrin.h>, which is large). We usually like to centralize intrinsic declarations for the STL in <intrin0.h>, but the definition of __m128d can't simply be repeated - we may be able to solve this in the future (with more significant changes to vcruntime).
  • For ARM64, I'm including <arm64_neon.h>, again because of type definitions.
  • Instead of blocking /fp:fast, I'm using #pragma float_control(precise, on, push), #pragma float_control(pop) around the entire namespace _Float_multi_prec. As @statementreply explained to me, the correctness of this logic is massively damaged by /fp:fast, which is why this is necessary.
  • To support C++14 mode, we can't use terse static_assert anymore.
  • Because _High_half (and _Sqr_error_fallback, indirectly) use _Bit_cast, they need to be marked as _CONSTEXPR_BIT_CAST for CUDA.
  • I'm changing _Sqr_x2 from constexpr to inline and removing its usage of is_constant_evaluated - nothing requires this to be constexpr (yet).
  • I've reworked how _Sqr_x2 chooses the intrinsic versus fallback implementations (for clarity and to deal with the new system that detects /clr:pure and Clang).
  • I've additionally added code, like we used in <bit>, to detect __AVX2__ and avoid the runtime ISA test.
  • The contents of src/math_algorithms.cpp are now within std::_Math_algorithms (we can't use unnamed namespaces in headers). This makes some _STD qualification unnecessary (on non-functions, and _Ugly functions).
  • To support C++14 mode, we can't have explicit specializations of non-inline variable templates. I've converted _Hypot_leg_huge and _Hypot_leg_tiny to use helper structs.
  • This header-only code no longer needs __std_math_log_hypot and __std_math_log_hypotf; we can call _Math_algorithms::_Log_hypot directly.
  • <xutility> needs to leave _CONSTEXPR_BIT_CAST defined for <complex> to use.

@StephanTLavavej StephanTLavavej removed their assignment Nov 9, 2020
@StephanTLavavej StephanTLavavej merged commit 9959929 into microsoft:master Nov 9, 2020
@StephanTLavavej
Copy link
Member

Thanks again for these accuracy improvements and bug fixes! 🎯 😸 This will ship in VS 2019 16.9 Preview 3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants