<complex>: Improve numerical accuracy of sqrt and log #935

statementreply · 2020-06-29T14:44:10Z

Goals:

- Fix sqrt overflow when input is huge.
- Fix sqrt inaccuracy when input is tiny, due to internal underflow.
- Fix log inaccuracy when input is tiny, due to internal underflow.
- Improve accuracy of log when |z| ≈ 1, and fix log(complex<float>{1, 0}) != 0 under certain combinations of compile-time and runtime settings. (DevCom-1093507 is a symptom of the inaccuracy, but is not itself a bug.)
  - Use hardware FMA on x86/x64 when available.
  - Use hardware FMA on ~~arm~~/arm64 when available.
- Add test coverage.

The changes include an algorithm described in W. Kahan (1987), "Branch Cuts for Complex Elementary Functions, or Much Ado About Nothing's Sign Bit".

Modifies the scale factors in `_Fabs` (used by `sqrt`) such that:

- `_Fabs` doesn't underflow when the input is tiny.
- `sqrt` doesn't overflow when the input is huge.
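To illustrate why rescaling helps, here is a minimal sketch of a square root that scales by a power of two before squaring. It is not the STL's `_Fabs`; the names `sqrt_scaled` and `sqrt_scaled_demo` are hypothetical, and the formula is the usual `t = sqrt((|x| + |z|) / 2)` form, assuming IEEE 754 doubles and default (non-fast) floating-point semantics.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical sketch (not the STL's _Fabs): principal square root of z, with
// power-of-two rescaling so that x*x + y*y neither overflows nor underflows.
inline std::complex<double> sqrt_scaled(std::complex<double> z) {
    const double x = z.real();
    const double y = z.imag();
    const double big = std::fmax(std::fabs(x), std::fabs(y));
    if (!std::isfinite(x) || !std::isfinite(y) || big == 0.0) {
        return std::sqrt(z); // zeros, infinities, NaNs: defer to the library
    }
    int e = std::ilogb(big);                         // big is in [2^e, 2^(e+1))
    const double xs = std::scalbn(std::fabs(x), -e); // scaled legs, both in [0, 2)
    const double ys = std::scalbn(std::fabs(y), -e);
    const double rho = std::sqrt(xs * xs + ys * ys); // |z| / 2^e, in [1, 2*sqrt(2))
    double s = (xs + rho) / 2;                       // (|x| + |z|) / 2, divided by 2^e
    if (e % 2 != 0) {                                // make e even so that e / 2 is exact
        s *= 2;
        --e;
    }
    const double t = std::scalbn(std::sqrt(s), e / 2); // sqrt((|x| + |z|) / 2)
    if (x >= 0) {
        return {t, y / (2 * t)};
    }
    return {std::fabs(y) / (2 * t), std::copysign(t, y)};
}

// Usage check: a huge input stays finite, a subnormal input keeps full accuracy.
inline void sqrt_scaled_demo() {
    const std::complex<double> r = sqrt_scaled({1e308, 1e308});
    assert(std::isfinite(r.real()) && std::isfinite(r.imag()));
    // 2^-1072 is subnormal; its exact square root is 2^-536.
    const std::complex<double> s = sqrt_scaled({std::ldexp(1.0, -1072), 0.0});
    assert(s.real() == std::ldexp(1.0, -536) && s.imag() == 0.0);
}
```

Without the scaling step, `x * x` for `x = 1e308` would already be infinite, and squaring a subnormal leg would flush to zero.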

When |z| is close to 1, compute log(|z|) as log1p(norm_minus_1(z)) / 2, where norm_minus_1(z) = real(z)^2 + imag(z)^2 - 1 is computed with double-width arithmetic to avoid catastrophic cancellation.
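The idea can be sketched as follows, assuming IEEE 754 doubles, a correctly rounded `std::fma`, and no fast-math reassociation. All names here (`two_part`, `sqr_exact`, `two_sum`, `log_abs_near_1`) are hypothetical and this is not the code from the PR; it only shows how an FMA-based exact square plus error-free addition keeps the low-order bits that a plain `x*x + y*y - 1` would lose.

```cpp
#include <cassert>
#include <cmath>

// The represented value is hi + lo (an unevaluated sum of two doubles).
struct two_part {
    double hi;
    double lo;
};

// x*x as hi + lo: the FMA recovers the rounding error of the product.
inline two_part sqr_exact(double x) {
    const double hi = x * x;
    const double lo = std::fma(x, x, -hi);
    return {hi, lo};
}

// Error-free addition (Knuth's TwoSum): a + b == hi + lo exactly.
inline two_part two_sum(double a, double b) {
    const double s = a + b;
    const double bb = s - a;
    const double err = (a - (s - bb)) + (b - bb);
    return {s, err};
}

// x^2 + y^2 - 1, accumulated so that cancellation does not destroy accuracy.
inline double norm_minus_1(double x, double y) {
    const two_part xx = sqr_exact(x);
    const two_part yy = sqr_exact(y);
    const two_part s1 = two_sum(-1.0, xx.hi);
    const two_part s2 = two_sum(s1.hi, yy.hi);
    return s2.hi + (s1.lo + s2.lo + xx.lo + yy.lo);
}

// log(|x + iy|) for |x + iy| close to 1.
inline double log_abs_near_1(double x, double y) {
    return std::log1p(norm_minus_1(x, y)) / 2;
}

// Usage check: the naive formula loses the tiny imaginary part entirely.
inline void log_abs_demo() {
    assert(log_abs_near_1(1.0, 0.0) == 0.0);
    const double naive = std::log(std::sqrt(1.0 * 1.0 + 1e-10 * 1e-10));
    const double better = log_abs_near_1(1.0, 1e-10);
    assert(naive == 0.0 && better > 0.0);
}
```

Because every step depends on exact rounding behavior, a sketch like this would be defeated by value-changing optimizations such as /fp:fast.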

StephanTLavavej

This looks good to me! 😺 I'll validate and push changes for a number of superficial issues that I noticed, nothing affecting the core logic.

StephanTLavavej · 2020-11-06T07:06:00Z

I've pushed a change for compatibility with our internal test harness - it doesn't contain the same option-parsing machinery that the Python/lit test harness does, so it attempts to use ARM64-only options on x86 and x64 (and vice versa). clang-cl emits an "unknown command line argument" warning that upgrades to an error, so we need to disable that. (MSVC will emit a warning from the compiler driver, but that doesn't get upgraded to an error by /WX, so we don't need to make a change for it.)

Unfortunately, significant changes are needed to be compatible with /clr:pure (yet again we pay a price for not having ported its builds and tests to GitHub yet). I think I know how to fix it, but not in time for merging tomorrow, so I'm moving this back to WIP.

StephanTLavavej · 2020-11-07T02:03:00Z

I've pushed a merge, resolving conflicts with the test harness rewrite. The tests.py changes now have comments referring to features.py. The previous changes to config.py have been replaced by different changes to features.py (with the same effect).

This doesn't do anything about /clr:pure yet, but should get the tests here passing again.

StephanTLavavej · 2020-11-09T04:09:00Z

Changelog

- Fused src/float_multi_prec.hpp and src/math_algorithms.cpp into <complex>, because /clr:pure refused to work with the separately compiled code.
- Introduced macros _FMP_USING_X86_X64_INTRINSICS and _FMP_USING_ARM64_INTRINSICS, defined within <complex> only.
- /clr:pure doesn't have access to any intrinsics (beyond a few special ones like _InterlockedIncrement).
- Clang makes it difficult to use the FMA intrinsics needed here. This may be solvable in the future, but for now, I'm activating the fallback for Clang. @statementreply's profiling indicates that the intrinsics are worth ~5% performance (~3 ns out of ~60 ns for complex<double>::log) so they're "nice to have" but not critical.
- To reduce the throughput impact, I'm including <emmintrin.h> (which is relatively small) and then manually declaring _mm_fmsub_sd (which is declared by <immintrin.h>, which is large). We usually like to centralize intrinsic declarations for the STL in <intrin0.h>, but the definition of __m128d can't simply be repeated - we may be able to solve this in the future (with more significant changes to vcruntime).
- For ARM64, I'm including <arm64_neon.h>, again because of type definitions.
- Instead of blocking /fp:fast, I'm using #pragma float_control(precise, on, push), #pragma float_control(pop) around the entire namespace _Float_multi_prec. As @statementreply explained to me, the correctness of this logic is massively damaged by /fp:fast, which is why this is necessary.
- To support C++14 mode, we can't use terse static_assert anymore.
- Because _High_half (and _Sqr_error_fallback, indirectly) use _Bit_cast, they need to be marked as _CONSTEXPR_BIT_CAST for CUDA.
- I'm changing _Sqr_x2 from constexpr to inline and removing its usage of is_constant_evaluated - nothing requires this to be constexpr (yet).
- I've reworked how _Sqr_x2 chooses the intrinsic versus fallback implementations (for clarity and to deal with the new system that detects /clr:pure and Clang).
- I've additionally added code, like we used in <bit>, to detect __AVX2__ and avoid the runtime ISA test.
- The contents of src/math_algorithms.cpp are now within std::_Math_algorithms (we can't use unnamed namespaces in headers). This makes some _STD qualification unnecessary (on non-functions, and _Ugly functions).
- To support C++14 mode, we can't have explicit specializations of non-inline variable templates. I've converted _Hypot_leg_huge and _Hypot_leg_tiny to use helper structs.
- This header-only code no longer needs __std_math_log_hypot and __std_math_log_hypotf; we can call _Math_algorithms::_Log_hypot directly.
- <xutility> needs to leave _CONSTEXPR_BIT_CAST defined for <complex> to use.
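Two of the C++14 accommodations above (static_assert with an explicit message, and helper structs in place of explicitly specialized variable templates) can be sketched as below. This is only an illustration of the pattern: the names `leg_huge_helper` and `leg_huge` are hypothetical, and the threshold values are placeholders rather than the constants used in `<complex>`.

```cpp
#include <cassert>
#include <limits>

// C++14 has no message-less ("terse") static_assert, so a message is required.
static_assert(std::numeric_limits<double>::is_iec559, "sketch assumes IEEE 754 double");

// Instead of explicitly specializing a variable template per type, put the
// per-type constant in a helper struct and specialize the struct.
template <class T>
struct leg_huge_helper {
    static constexpr T value = 6.703903964971298e+153; // placeholder, roughly 2^511
};

template <>
struct leg_huge_helper<float> {
    static constexpr float value = 9.2233720368547758e+18f; // placeholder, 2^63
};

// The variable template itself is never specialized; it only forwards.
template <class T>
constexpr T leg_huge = leg_huge_helper<T>::value;

// Usage check: each type picks up its own constant.
inline void leg_huge_demo() {
    static_assert(leg_huge<float> < leg_huge<double>, "float threshold must be smaller");
    assert(leg_huge<double> > leg_huge<float>);
}
```

Since a `constexpr` namespace-scope variable template has internal linkage and the struct members are only read by value, this form can live in a header without relying on C++17 inline variables.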

StephanTLavavej · 2020-11-09T23:27:28Z

Thanks again for these accuracy improvements and bug fixes! 🎯 😸 This will ship in VS 2019 16.9 Preview 3.

CaseyCarter added the bug label Jun 29, 2020

statementreply mentioned this pull request Jul 1, 2020

How do I provide different configuration sets for different CPU architectures in tests? #954

Closed

statementreply force-pushed the improve_complex_sqrt_log branch from 3530c76 to 962917c Compare July 14, 2020 16:54

statementreply force-pushed the improve_complex_sqrt_log branch from 962917c to 5d43a74 Compare July 27, 2020 14:03

statementreply added 2 commits August 3, 2020 22:06

Fix undue overflow and underflow in complex sqrt

903c59f

Modifies the scale factors in `_Fabs` (used by `sqrt`) such that:

- `_Fabs` doesn't underflow when the input is tiny.
- `sqrt` doesn't overflow when the input is huge.

Improve accuracy of log when |z| is close to 1

a6a5934

When |z| is close to 1, compute log(|z|) as log1p(norm_minus_1(z)) / 2, where norm_minus_1(z) = real(z)^2 + imag(z)^2 - 1 is computed with double-width arithmetic to avoid catastrophic cancellation.

statementreply force-pushed the improve_complex_sqrt_log branch from 5d43a74 to a6a5934 Compare August 3, 2020 14:16

statementreply added 7 commits August 6, 2020 21:58

Minor fixes, clarify comments

7843118

Add complex sqrt test

1d34722

Add complex log test

d79c9f3

Remove internal header file from CMake source file list

b33c29e

Add new file to MSBuild project

e259009

Fix log(complex{1, tiny}) incorrectly returning -0 under FE_DOWNWARD

ccf5cdc

Use hardware FMA on arm64

5bcece4

statementreply marked this pull request as ready for review August 8, 2020 12:29

statementreply requested a review from a team as a code owner August 8, 2020 12:29

Merge branch 'master' into improve_complex_sqrt_log

dd608bf

StephanTLavavej mentioned this pull request Aug 10, 2020

Sporadic test failures after VS 2019 16.8 Preview 1 toolset update #1181

Closed

statementreply added 2 commits August 14, 2020 21:05

Fix shadowing

09e805a

Add comments

d1e2ab8

mnatsuhara assigned cbezault Aug 19, 2020

statementreply commented Aug 27, 2020

Fix calling nonexistent ::signbit

0b281c7

statementreply commented Aug 27, 2020

StephanTLavavej mentioned this pull request Nov 4, 2020

<intrin0.h>: Copy/move intrinsics needed by <complex> #1424

Open

StephanTLavavej reviewed Nov 4, 2020

StephanTLavavej added 2 commits November 4, 2020 02:16

Merge branch 'master' into gh935_complex

792d0bb

Code review feedback.

2b39e7b

StephanTLavavej approved these changes Nov 4, 2020

StephanTLavavej removed their assignment Nov 4, 2020

StephanTLavavej added a commit to StephanTLavavej/STL that referenced this pull request Nov 5, 2020

microsoftGH-935 <complex>: Improve numerical accuracy of sqrt and log

d9170d1

StephanTLavavej self-assigned this Nov 5, 2020

Add -Wno-unused-command-line-argument for internal tests

984c8a4

Merge branch 'master' into gh935

54801c9

StephanTLavavej added 2 commits November 8, 2020 18:16

Merge branch 'master' into gh935

6c37763

Use header-only code to fix /clr:pure.

5a089e9

Include arm64_neon.h, no arm64 subdirectory.

ba88342

StephanTLavavej approved these changes Nov 9, 2020

StephanTLavavej removed their assignment Nov 9, 2020

cbezault approved these changes Nov 9, 2020

StephanTLavavej merged commit 9959929 into microsoft:master Nov 9, 2020

StephanTLavavej mentioned this pull request Nov 11, 2020

Code cleanups: Unify _Float_traits and _Floating_type_traits #1442

Merged

StephanTLavavej mentioned this pull request Nov 20, 2020

<complex>: arm64_neon.h declares many non-_Ugly identifiers/macros #1489

Closed

statementreply deleted the improve_complex_sqrt_log branch April 17, 2021 10:55

AlexGuteniev mentioned this pull request Aug 22, 2021

<bit>: Expand test coverage to cl /arch:AVX2 and clang-cl /arch:AVX2 #2149

Closed
