apacheGH-20213: [C++] Implement cast to/from halffloat (apache#40067)
### Rationale for this change

### What changes are included in this PR?

This PR implements casting to and from float16 types using the vendored float16 library included in Arrow at `cpp/src/arrow/util/float16.*`.
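
For readers unfamiliar with the half-float layout (1 sign bit, 5 exponent bits, 10 mantissa bits), the conversion that such casts depend on can be sketched as follows. This is a minimal standalone illustration, not Arrow's `Float16` implementation; the names `FloatToHalfBits`, `HalfBitsToFloat`, and `CheckKnownValues` are hypothetical and exist only for this example. It assumes the platform `float` is IEEE 754 binary32.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Illustrative only: encode a binary32 value as binary16 bits,
// rounding to nearest with ties to even.
inline uint16_t FloatToHalfBits(float value) {
  uint32_t f;
  std::memcpy(&f, &value, sizeof(f));
  const uint32_t sign = (f >> 16) & 0x8000u;
  const uint32_t exp = (f >> 23) & 0xFFu;
  uint32_t mant = f & 0x7FFFFFu;

  if (exp == 0xFFu) {  // infinity or NaN (any NaN becomes one quiet NaN)
    return static_cast<uint16_t>(sign | 0x7C00u | (mant != 0 ? 0x0200u : 0u));
  }
  const int32_t e = static_cast<int32_t>(exp) - 127 + 15;     // re-bias exponent
  if (e >= 31) return static_cast<uint16_t>(sign | 0x7C00u);  // too large: infinity
  if (e <= 0) {
    if (e < -10) return static_cast<uint16_t>(sign);  // too small: signed zero
    mant |= 0x800000u;                                 // make the leading 1 explicit
    const uint32_t shift = static_cast<uint32_t>(14 - e);
    uint32_t half_mant = mant >> shift;
    const uint32_t rem = mant & ((1u << shift) - 1u);
    const uint32_t halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (half_mant & 1u))) ++half_mant;
    return static_cast<uint16_t>(sign | half_mant);  // subnormal half
  }
  uint32_t half = (static_cast<uint32_t>(e) << 10) | (mant >> 13);
  const uint32_t rem = mant & 0x1FFFu;
  // A rounding carry may spill into the exponent, up to infinity.
  if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) ++half;
  return static_cast<uint16_t>(sign | half);
}

// Illustrative only: decode binary16 bits into a binary32 value (always exact).
inline float HalfBitsToFloat(uint16_t h) {
  const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  const uint32_t exp = (h >> 10) & 0x1Fu;
  const uint32_t mant = h & 0x3FFu;
  if (exp == 0) {  // zero or subnormal: magnitude is mant * 2^-24
    const float magnitude = std::ldexp(static_cast<float>(mant), -24);
    return sign != 0 ? -magnitude : magnitude;
  }
  const uint32_t f = (exp == 0x1Fu)
                         ? (sign | 0x7F800000u | (mant << 13))      // inf or NaN
                         : (sign | ((exp + 112u) << 23) | (mant << 13));
  float out;
  std::memcpy(&out, &f, sizeof(out));
  return out;
}

// Small self-check using values that binary16 represents exactly.
inline void CheckKnownValues() {
  assert(FloatToHalfBits(1.0f) == 0x3C00);
  assert(HalfBitsToFloat(0xC000) == -2.0f);
}
```

Values above the largest finite half (65504) round to infinity, which is one reason casts from wider types to float16 can lose information.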

### Are these changes tested?

Unit tests are included in this PR.

### Are there any user-facing changes?

Yes: casts to and from float16 now work.

* Closes: apache#20213

### TODO

- [x] Add casts to/from float64.
- [x] String <-> float16 casts.
- [x] Integer <-> float16 casts.
- [x] Tests.
- [x] Update https://github.com/apache/arrow/blob/main/docs/source/status.rst about half float.
- [x] Rebase.
- [x] Run clang format over this PR.
* GitHub Issue: apache#20213

Authored-by: Clif Houck <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
ClifHouck authored and rok committed May 8, 2024
1 parent cd1cda2 commit dcbc2d1
Showing 3 changed files with 86 additions and 76 deletions.
30 changes: 26 additions & 4 deletions cpp/src/arrow/ipc/json_simple_test.cc
@@ -189,6 +189,21 @@ class TestIntegers : public ::testing::Test {

 TYPED_TEST_SUITE_P(TestIntegers);

+template <typename DataType>
+std::vector<typename DataType::c_type> TestIntegersMutateIfNeeded(
+    std::vector<typename DataType::c_type> data) {
+  return data;
+}
+
+// TODO: This works, but is it the right way to do this?
+template <>
+std::vector<HalfFloatType::c_type> TestIntegersMutateIfNeeded<HalfFloatType>(
+    std::vector<HalfFloatType::c_type> data) {
+  std::for_each(data.begin(), data.end(),
+                [](HalfFloatType::c_type& value) { value = Float16(value).bits(); });
+  return data;
+}
+
 TYPED_TEST_P(TestIntegers, Basics) {
   using T = TypeParam;
   using c_type = typename T::c_type;
@@ -197,16 +212,17 @@ TYPED_TEST_P(TestIntegers, Basics) {
   auto type = this->type();

   AssertJSONArray<T>(type, "[]", {});
-  AssertJSONArray<T>(type, "[4, 0, 5]", {4, 0, 5});
-  AssertJSONArray<T>(type, "[4, null, 5]", {true, false, true}, {4, 0, 5});
+  AssertJSONArray<T>(type, "[4, 0, 5]", TestIntegersMutateIfNeeded<T>({4, 0, 5}));
+  AssertJSONArray<T>(type, "[4, null, 5]", {true, false, true},
+                     TestIntegersMutateIfNeeded<T>({4, 0, 5}));

   // Test limits
   const auto min_val = std::numeric_limits<c_type>::min();
   const auto max_val = std::numeric_limits<c_type>::max();
   std::string json_string = JSONArray(0, 1, min_val);
-  AssertJSONArray<T>(type, json_string, {0, 1, min_val});
+  AssertJSONArray<T>(type, json_string, TestIntegersMutateIfNeeded<T>({0, 1, min_val}));
   json_string = JSONArray(0, 1, max_val);
-  AssertJSONArray<T>(type, json_string, {0, 1, max_val});
+  AssertJSONArray<T>(type, json_string, TestIntegersMutateIfNeeded<T>({0, 1, max_val}));
 }

 TYPED_TEST_P(TestIntegers, Errors) {
@@ -273,6 +289,12 @@ INSTANTIATE_TYPED_TEST_SUITE_P(TestUInt8, TestIntegers, UInt8Type);
 INSTANTIATE_TYPED_TEST_SUITE_P(TestUInt16, TestIntegers, UInt16Type);
 INSTANTIATE_TYPED_TEST_SUITE_P(TestUInt32, TestIntegers, UInt32Type);
 INSTANTIATE_TYPED_TEST_SUITE_P(TestUInt64, TestIntegers, UInt64Type);
+// FIXME: I understand that HalfFloatType is backed by a uint16_t, but does it
+// make sense to run this test over it?
+// The way ConvertNumber for HalfFloatType is currently written, it allows the
+// conversion of floating point notation to a half float, which causes failures
+// in this test, one example is asserting 0.0 cannot be parsed as a half float.
+// INSTANTIATE_TYPED_TEST_SUITE_P(TestHalfFloat, TestIntegers, HalfFloatType);

 template <typename T>
 class TestStrings : public ::testing::Test {
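
The `TestIntegersMutateIfNeeded` helper added in this file uses explicit specialization of a function template: the generic version passes expected values through unchanged, while the half-float version rewrites each value into its 16-bit encoding. A standalone sketch of that pattern follows. The tag types `Int32Tag` and `HalfTag`, the function `MutateIfNeeded`, and the encoder `SmallIntToHalfBits` are hypothetical stand-ins for Arrow's `DataType` classes and `Float16`; the encoder is only valid for integers up to 2047, all of which a half float represents exactly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for Arrow type classes that expose a c_type.
struct Int32Tag { using c_type = int32_t; };
struct HalfTag { using c_type = uint16_t; };  // half floats stored as raw bits

// Hypothetical encoder: exact binary16 bits for an integer in [0, 2047].
inline uint16_t SmallIntToHalfBits(uint16_t v) {
  if (v == 0) return 0;
  int msb = 0;  // index of the highest set bit, which is the unbiased exponent
  while ((v >> (msb + 1)) != 0) ++msb;
  // Drop the leading 1 and left-align the remaining bits in the 10-bit mantissa.
  const uint16_t mant = static_cast<uint16_t>((v << (10 - msb)) & 0x3FF);
  return static_cast<uint16_t>(((msb + 15) << 10) | mant);
}

// Generic case: expected values are used as written.
template <typename Tag>
std::vector<typename Tag::c_type> MutateIfNeeded(
    std::vector<typename Tag::c_type> data) {
  return data;
}

// Half-float case: expected values become their bit patterns.
template <>
inline std::vector<HalfTag::c_type> MutateIfNeeded<HalfTag>(
    std::vector<HalfTag::c_type> data) {
  std::transform(data.begin(), data.end(), data.begin(), SmallIntToHalfBits);
  return data;
}
```

With this shape, a typed test body can call `MutateIfNeeded<T>({4, 0, 5})` for every type parameter and only the half-float instantiation changes the expected data.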
7 changes: 2 additions & 5 deletions cpp/src/arrow/util/value_parsing.cc
@@ -53,11 +53,8 @@ bool StringToFloat(const char* s, size_t length, char decimal_point, uint16_t* o
   float temp_out;
   const auto res =
       ::arrow_vendored::fast_float::from_chars_advanced(s, s + length, temp_out, options);
-  const bool ok = res.ec == std::errc() && res.ptr == s + length;
-  if (ok) {
-    *out = Float16::FromFloat(temp_out).bits();
-  }
-  return ok;
+  *out = Float16::FromFloat(temp_out).bits();
+  return res.ec == std::errc() && res.ptr == s + length;
 }

 // ----------------------------------------------------------------------
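
The `StringToFloat` hunk above (2 additions, 5 deletions) trades a guarded write to `*out` for an unconditional one, so with the new form callers have to rely on the return value rather than on `*out` being left untouched after a failed parse. The guarded variant can be sketched in isolation as below. This is an illustration under stated assumptions: `ParseWholeFloat` is a hypothetical name, and `std::strtof` stands in for the vendored `fast_float` parser, so the accepted syntax differs (for example, `strtof` tolerates leading whitespace and depends on the C locale).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: succeeds only if the entire input is one float literal.
// On failure, *out is left unchanged (the "guarded write" variant).
inline bool ParseWholeFloat(const char* s, std::size_t length, float* out) {
  if (length == 0) return false;
  const std::string buf(s, length);  // strtof needs a NUL-terminated buffer
  char* end = nullptr;
  errno = 0;
  const float parsed = std::strtof(buf.c_str(), &end);
  // Require no range error and no trailing characters.
  const bool ok = errno == 0 && end == buf.c_str() + buf.size();
  if (ok) {
    *out = parsed;
  }
  return ok;
}
```

Either variant reports success the same way; they differ only in what `*out` holds after a failure, which matters to callers that reuse the output buffer.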
125 changes: 58 additions & 67 deletions docs/source/status.rst
@@ -28,54 +28,54 @@ stated, the Python, R, Ruby and C/GLib libraries follow the C++ Arrow library.
 Data Types
 ==========

-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Data type | C++ | Java | Go | JS | C# | Rust | Julia | Swift | nanoarrow |
-| (primitive) | | | | | | | | | |
-+===================+=======+=======+=======+====+=======+=======+=======+=======+===========+
-| Null ||||| || | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Boolean ||||| || || |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Int8/16/32/64 ||||| || || |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| UInt8/16/32/64 ||||| || || |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Float16 || ✓ (1) ||| ✓ (2)| ✓ || | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Float32/64 ||||| || || |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Decimal128 ||||| || | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Decimal256 ||||| || | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Date32/64 ||||| || || |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Time32/64 ||||| || || |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Timestamp ||||| || | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Duration ||||| || | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Interval |||| | || | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Fixed Size Binary ||||| || | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Binary ||||| || || |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Large Binary ||||| | | | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Utf8 ||||| || || |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Large Utf8 ||||| | | | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Binary View || || | | | | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Large Binary View || || | | | | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Utf8 View || || | | | | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Large Utf8 View || || | | | | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Data type | C++ | Java | Go | JavaScript | C# | Rust | Julia | Swift |
+| (primitive) | | | | | | | | |
++===================+=======+=======+=======+============+=======+=======+=======+=======+
+| Null |||| || | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Boolean |||| || |||
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Int8/16/32/64 |||| || |||
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| UInt8/16/32/64 |||| || |||
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Float16 || ✓ (1) || | ✓ (2)| ✓ || |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Float32/64 |||| || |||
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Decimal128 |||| || | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Decimal256 |||| || | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Date32/64 |||| || |||
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Time32/64 |||| || |||
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Timestamp |||| || | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Duration |||| || | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Interval |||| || | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Fixed Size Binary |||| || | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Binary |||| || |||
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Large Binary |||| | | | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Utf8 |||| || |||
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Large Utf8 |||| | | | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Binary View || || | | | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Large Binary View || || | | | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Utf8 View || || | | | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Large Utf8 View || || | | | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+

 +-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
 | Data type | C++ | Java | Go | JS | C# | Rust | Julia | Swift | nanoarrow |
@@ -100,25 +100,16 @@ Data Types
 | Sparse Union |||||||| ||
 +-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+

-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Data type | C++ | Java | Go | JS | C# | Rust | Julia | Swift | nanoarrow |
-| (special) | | | | | | | | | |
-+===================+=======+=======+=======+====+=======+=======+=======+=======+===========+
-| Dictionary || ✓ (3) |||| ✓ (3) || ||
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Extension |||| | ||| ||
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Run-End Encoded || || | | | | | |
-+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-
-+-----------------------+-------+-------+-------+------------+-------+-------+-------+-------+
-| Canonical | C++ | Java | Go | JavaScript | C# | Rust | Julia | Swift |
-| Extension types | | | | | | | | |
-+=======================+=======+=======+=======+============+=======+=======+=======+=======+
-| Fixed shape tensor || | | | | | | |
-+-----------------------+-------+-------+-------+------------+-------+-------+-------+-------+
-| Variable shape tensor | | | | | | | | |
-+-----------------------+-------+-------+-------+------------+-------+-------+-------+-------+
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Data type | C++ | Java | Go | JavaScript | C# | Rust | Julia | Swift |
+| (special) | | | | | | | | |
++===================+=======+=======+=======+============+=======+=======+=======+=======+
+| Dictionary || ✓ (3) |||| ✓ (3) || |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Extension |||| | ||| |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+
+| Run-End Encoded || || | | | | |
++-------------------+-------+-------+-------+------------+-------+-------+-------+-------+

 Notes:
