Utilities for decimal <--> floating conversion #15359
Conversation
Co-authored-by: Mark Harris <[email protected]>
Thanks for cooperating on reviews and revisions!
All of these changes have been applied and committed.
IntegralType integer_rep = bit_cast_to_integer(floating);

// Set the sign bit
integer_rep |= (IntegralType(is_negative) << sign_bit_index);
Is this right? If the sign bit is already 1, it's not possible to change it with an OR operator, regardless of `is_negative`. Are we assuming the sign bit is always 0 when entering this function? If so, that is not documented.

Are there test cases for this code?
The sign is always zero on input; I'll add that comment. It's not part of this PR, but when I switch from the old decimal/floating conversion code to this, the fixed_point tests cover all of this functionality, and all of those tests pass.
Comment added
Might I suggest adding a reminder to that comment that a sign bit of 0 means a positive value, so the OR will do what we want? Simply saying the input must be positive assumes the reader already knows this.
Done
I don't like the design of this function.

- If `is_negative` is false, this should be a no-op -- we shouldn't even cast back and forth.
- Where is this being called? It doesn't appear to be used in this PR, and it doesn't seem like it's designed for public consumption.
- Can we avoid bitwise manipulation entirely and just negate the value? i.e. `return is_negative ? -floating : floating;`
This PR contains utilities for the upcoming primary PR for the decimal <--> floating conversion. I broke the code into several PRs as it is quite large. So nothing is calling this yet.
This specific function is used at the end of decimal --> floating. In the main algorithm we do a lot of bit-shifting, so the sign is initially cleared, and here we set it at the end. Doing it this way (cast + OR) requires no branching, so you don't pay a performance penalty for a branch. A couple of months ago I benchmarked several different methods, and this one was the fastest.
 * @param value The integer whose bits are being counted
 * @return The number of significant bits: the # of bits - # of leading zeroes
 */
template <typename T, typename cuda::std::enable_if_t<(cuda::std::is_unsigned_v<T>)>* = nullptr>
I believe the trait `is_unsigned_v` will return true for types like `uint16_t`. However, only 32/64/128-bit values appear to be handled below. Should we be more specific in this trait, and only accept 32/64/128-bit unsigned integers? (Will we get a compile error currently for unsupported types?)
There's a static_assert at the start of the function, so unsupported types won't compile. I'll add a note to the doxygen template parameter that only 32/64/128-bit values are allowed.
Done.
Using both SFINAE and a static_assert seems unnecessary. Can you just make this plain SFINAE?
Done
Co-authored-by: Bradley Dice <[email protected]>
I can see how this is forming into a solution and I like it. I also appreciate you breaking this up into multiple parts.
 * @param floating The floating value to extract the significand from
 * @return The integral significand, bit-shifted to a (large) whole number
 */
CUDF_HOST_DEVICE inline static IntegralType get_base2_value(FloatingType floating)
It would be undesirable to call these in succession. I know that this isn't the plan, but in a year or two it may not be all that obvious that

```cpp
auto b = get_base2_value(f);
auto s = get_is_negative(f);
auto e = get_exp2(f);
```

isn't ideal. Would it make sense to change these to take the integer representation, so that later implementers default to good behavior when they see the need for something they already have?

```cpp
auto const bits = bit_cast_to_integer(f);
auto b = get_base2_value(bits);
auto s = get_is_negative(bits); // using bits because they already have that! yay!
...
```
Well, the compiler is able to optimize it so that there's no difference at runtime. However, it's probably a good idea to factor this out anyway; will do.
Done
I left another round of comments, mostly in reply to existing threads.
I have no further comments aside from the discussion on overflow: https://github.com/rapidsai/cudf/pull/15359/files#r1617755579
I will approve this to unblock further work, but I feel uncertain about the correctness and performance of this PR (and the decimal/float work in general). I hope an extensive set of tests and benchmarks can be added to support this functionality.
/merge
This PR contains the main algorithm for the new decimal <--> floating conversion code. This algorithm was written to address the precision issues described [here](#14169).

### Summary

* The new algorithm is more accurate than the previous code, but it is also far more complex.
* It can perform conversions that were not even possible in the old code due to overflow (decimal32/64/128 conversions only worked for scale factors up to 10^9/18/38, respectively). Now the entire floating-point range is convertible, including denormals.
* This new algorithm is significantly faster in some parts of the conversion phase-space, and in some parts slightly slower.

### Previous PRs

These contain the supporting parts of this work:

* [Explicit conversion PR](#15438)
* [Benchmarking PR](#15334)
* [Powers-of-10 PR](#15353)
* [Utilities PR](#15359). These utilities are updated here to support denormals.

### Algorithm Outline

We convert floating -> (integer) decimal by:

* Extracting the floating-point mantissa (converted to integer) and power-of-2.
* Using a uint64_t (for float) or a __uint128_t (for double) to contain our data during the shifting/scaling below.
* In this shifting integer, alternately applying the extracted powers-of-2 (bit-shifts, until they're all used) and scale-factor powers-of-10 (multiply/divide) as needed to reach the desired scale factor.

Decimal -> floating is just the reverse operation.

### Supplemental Changes

* Testing: Add decimal128, add precise-conversion tests. Remove kludges due to inaccurate conversions. Add test for zeroes.
* Benchmarking: Enable regions of conversion phase-space for benchmarking that were not possible in the old algorithm.
* Unary: Cleanup by using CUDF_ENABLE_IF. Call new conversion code for base-10 fixed-point.
### Performance for various conversions/input-ranges

Note: F32/F64 is float/double.

New algorithm is **FASTER** by:

* F64 --> decimal64: 60% for E8 --> E15
* F64 --> decimal128: 13% for E-8 --> E-15
* F64 --> decimal128: 22% for E8 --> E15
* F64 --> decimal128: 27% for E31 --> E38
* decimal32 --> F64: 18% for E-3 --> E4
* decimal64 --> F64: 27% for E-14 --> E-7
* decimal64 --> F64: 17% for E-3 --> E4
* decimal128 --> F64: 21% for E-14 --> E-7
* decimal128 --> F64: 11% for E-3 --> E4
* decimal128 --> F64: 13% for E31 --> E38

New algorithm is **SLOWER** by:

* F32 --> decimal32: 3% for E-3 --> E4
* F32 --> decimal64: 2% for E-14 --> E14
* F64 --> decimal32: 3% for E-3 --> E4
* decimal32 --> F32: 5% for E-3 --> E4
* decimal128 --> F64: 36% for E-37 --> E-30

Other kernels:

* The PYMOD binary-op benchmark is 7% slower.

### Performance discussion

* Many conversions have identical speed, indicating these algorithms are often fast and we are instead bottlenecked on overheads such as getting the input to the GPU in the first place.
* F64 conversions are often much faster than the old algorithm, as the new algorithm completely avoids the FP64 pipeline. Other than the cast to double itself, all of the operations are on integers, so threads are not competing with each other and taking turns for access to the floating-point cores.
* The conversions are slightly slower for floats with powers-of-10 near zero. Presumably this is due to code overhead for, e.g., handling a large range of inputs, UB checks for bit shifts, branches for denormals, etc.
* The conversion is slower for decimal128 conversions with very small exponents, which require several large divisions (128-bit divided by 64-bit).
* The PYMOD kernel is slower due to register pressure from the introduction of the new division routines in the earlier PR.
Even though this benchmark does not perform decimal <--> floating conversions, it gets hit because inlined template code in the kernel increases the code/register pressure.

Authors:
  - Paul Mattione (https://github.com/pmattione-nvidia)

Approvers:
  - Jason Lowe (https://github.com/jlowe)
  - Bradley Dice (https://github.com/bdice)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #15905
These are some utilities used by the upcoming decimal <--> floating conversion PR. This has been submitted separately from that PR in order to spread out the complexity for review. These functions are not called by any code in this PR.
One function is used to extract the components of the floating-point number. Another function is used to set a floating-point value's sign bit and add some additional powers of two. These are done using integer and bit operations, which is much faster than using the built-in functions and bottlenecking on the FP64 pipeline. The final function is used to count the number of significant bits in a value.