cuda::std::complex specializations for half and bfloat #1140
Conversation
That is a great job working around the quirks of those types 👏

I would love to move some of the traits around (e.g. into is_floating_point.h) and, importantly, add a proper named define that one can grep for.
LGTM in general, thanks for working on this @griwes!

I think I may have missed the static_asserts for the size and alignment of complex half and bfloat; do we have these somewhere? Thanks!
I am wondering whether we should just keep all the _LIBCUDACXX_HAS_NO_NVFP16 checks in place and define it conditionally for host.
Specifically:
* disable BF16 when FP16 is disabled, since the former includes the latter;
* disable both when the toolkit version is lower than 12.2, since 12.2 is when both types got the host versions of a lot of the functions we need to make useful heterogeneous things with them;
* disable both in host-only TUs, as there's no easy way I could find to detect the condition above.

I've included an opt-in macro for asserting that the headers (if available) are from a sufficiently new CTK; I will add that to the docs in a later commit.
NVCC is emitting code that makes various versions of Clang unhappy about a deprecated implicit copy constructor of a lambda wrapper, so just work around that by not using one.
Co-authored-by: Wesley Maxey <[email protected]>
Note: As discussed offline, local tests show that at least on sm86/89 we need this patch for performance reasons. I haven't had a chance to test on sm70/80/90, though.

```diff
diff --git a/libcudacxx/include/cuda/std/detail/libcxx/include/complex b/libcudacxx/include/cuda/std/detail/libcxx/include/complex
index 3ba249779..416c0e71d 100644
--- a/libcudacxx/include/cuda/std/detail/libcxx/include/complex
+++ b/libcudacxx/include/cuda/std/detail/libcxx/include/complex
@@ -1702,6 +1702,16 @@ atanh(const complex<_Tp>& __x)
     return complex<_Tp>(__constexpr_copysign(__z.real(), __x.real()), __constexpr_copysign(__z.imag(), __x.imag()));
 }
 
+// we add a specialization for fp16 atanh because of performance issues
+template<>
+_LIBCUDACXX_INLINE_VISIBILITY complex<__half>
+atanh(const complex<__half>& __x)
+{
+    complex<float> __temp(__x);
+    __temp = _CUDA_VSTD::atanh(__temp);
+    return complex<__half>(__temp.real(), __temp.imag());
+}
+
 // sinh
 
 template<class _Tp>
@@ -1815,6 +1825,16 @@ atan(const complex<_Tp>& __x)
     return complex<_Tp>(__z.imag(), -__z.real());
 }
 
+// we add a specialization for fp16 atan because of performance issues
+template<>
+_LIBCUDACXX_INLINE_VISIBILITY complex<__half>
+atan(const complex<__half>& __x)
+{
+    complex<float> __temp(__x);
+    __temp = _CUDA_VSTD::atan(__temp);
+    return complex<__half>(__temp.real(), __temp.imag());
+}
+
 // sin
 
 template<class _Tp>
```
@leofang I added some workarounds for
Description

Resolves #1139

Introduce specializations of complex<T> for half and bfloat.