Adding format_float kernel #1572

Merged: 73 commits, merged on Dec 18, 2023
Changes from 64 commits
Commits
0e7485c
wip
thirtiseven Oct 13, 2023
2c04fff
wip
thirtiseven Oct 18, 2023
6883988
Merge branch 'NVIDIA:branch-23.12' into float_to_string
thirtiseven Oct 18, 2023
cbce724
Add float to string kernel
thirtiseven Oct 18, 2023
8d7ead2
Update src/main/cpp/src/cast_float_to_string.cu
thirtiseven Oct 19, 2023
9ab2089
Update src/main/cpp/src/cast_float_to_string.cu
thirtiseven Oct 19, 2023
c3b3d64
address comments and use different precision for float
thirtiseven Oct 19, 2023
ebb1238
a runnable format_number demo
thirtiseven Oct 30, 2023
007cf5e
rewrite the solution with ryu
thirtiseven Nov 6, 2023
1264317
update license
thirtiseven Nov 6, 2023
a87a403
clean up
thirtiseven Nov 7, 2023
267a421
wip
thirtiseven Nov 13, 2023
979dc39
Split ftos_converter out
thirtiseven Nov 13, 2023
4c75bc7
clean up
thirtiseven Nov 13, 2023
744d0df
Merge branch 'float_to_string' of https://github.com/thirtiseven/spar…
thirtiseven Nov 14, 2023
f1c11e6
resolve cudf conflicts
thirtiseven Nov 14, 2023
760799b
resolve cudf conflicts
thirtiseven Nov 14, 2023
bfba655
resolve cudf conflicts
thirtiseven Nov 14, 2023
ad27fee
resolve cudf conflicts
thirtiseven Nov 14, 2023
77841d9
Merge branch 'float_to_string' of https://github.com/thirtiseven/spar…
thirtiseven Nov 14, 2023
6728170
Merge branch 'thirtiseven-float_to_string' into float_to_string
thirtiseven Nov 14, 2023
40a4cb8
remove cudf changes
thirtiseven Nov 14, 2023
05f5517
remove cudf changes
thirtiseven Nov 14, 2023
07c961e
Merge branch 'NVIDIA:branch-23.12' into float_to_string
thirtiseven Nov 14, 2023
8ed59bd
add ryu
thirtiseven Nov 16, 2023
e3a983d
Merge branch 'float_to_string' into format_float
thirtiseven Nov 16, 2023
da2197b
Add copyright and notice
thirtiseven Nov 16, 2023
48a5d7a
Merge branch 'float_to_string' of https://github.com/thirtiseven/spar…
thirtiseven Nov 16, 2023
2c6cdcb
Fix copyrights and license
thirtiseven Nov 17, 2023
3228755
cudf conflict resolve
thirtiseven Nov 17, 2023
d79dd44
Merge branch 'float_to_string' into format_float
thirtiseven Nov 17, 2023
d7be0d7
Add format_float kernel
thirtiseven Nov 17, 2023
5397f12
clean up
thirtiseven Nov 17, 2023
8aeeb6b
Fixed two bugs
thirtiseven Nov 20, 2023
a6578c7
Added a failed case back
thirtiseven Nov 20, 2023
9b7fb4a
Refactor
thirtiseven Nov 20, 2023
41967d9
Handle d=0 case
thirtiseven Nov 20, 2023
dc570cb
Add nv apache license to ftos_converter
thirtiseven Nov 21, 2023
96333ca
Add nv apache license to ftos_converter
thirtiseven Nov 21, 2023
c36ce94
Fix an rounding bug
thirtiseven Nov 21, 2023
360a77b
Update src/main/cpp/src/ftos_converter.cu
thirtiseven Nov 21, 2023
ced33b6
address some comments
thirtiseven Nov 22, 2023
0b0a473
Merge remote-tracking branch 'upstream/branch-24.02' into format_float
thirtiseven Nov 23, 2023
08f73ac
Merge branch 'float_to_string' into format_float
thirtiseven Nov 23, 2023
199e1db
Merge remote-tracking branch 'upstream/branch-24.02' into float_to_st…
thirtiseven Nov 23, 2023
131e48c
cudf conflict
thirtiseven Nov 23, 2023
3c09c49
Update src/main/cpp/src/cast_float_to_string.cu
thirtiseven Nov 23, 2023
346c1f7
Make it runable again
thirtiseven Nov 23, 2023
98918ce
address some comments
thirtiseven Nov 23, 2023
b78e3b3
addressed comments
thirtiseven Nov 27, 2023
d2cba4f
Address comments
thirtiseven Nov 27, 2023
62aa3ba
Merge branch 'NVIDIA:branch-24.02' into float_to_string
thirtiseven Dec 4, 2023
04d1c4f
clang format
thirtiseven Dec 4, 2023
388cb50
Address comments
thirtiseven Dec 4, 2023
54fa73c
Address comments
thirtiseven Dec 4, 2023
1f49d5a
Merge branch 'NVIDIA:branch-24.02' into format_float
thirtiseven Dec 4, 2023
6f4dae8
Merge branch 'float_to_string' into format_float
thirtiseven Dec 4, 2023
c71cf9d
Merge branch 'format_float' of https://github.com/thirtiseven/spark-r…
thirtiseven Dec 4, 2023
a40c388
Merge branch 'format_float' of https://github.com/thirtiseven/spark-r…
thirtiseven Dec 4, 2023
ea2325b
Merge branch 'format_float' of https://github.com/thirtiseven/spark-r…
thirtiseven Dec 4, 2023
3d19638
address comments
thirtiseven Dec 7, 2023
b9bccee
Merge branch 'branch-24.02' into format_float
thirtiseven Dec 8, 2023
8d02a3f
fix build after upmerge
thirtiseven Dec 8, 2023
62ff4f7
move inf/nan replacement to kernel
thirtiseven Dec 12, 2023
10bfe09
Apply suggestions from code review
thirtiseven Dec 13, 2023
e264ba9
address comments
thirtiseven Dec 13, 2023
eab61eb
Apply suggestions from code review
thirtiseven Dec 15, 2023
9892cae
address comments
thirtiseven Dec 15, 2023
8bf5b1c
cudf
thirtiseven Dec 15, 2023
81ba4a0
cudf
thirtiseven Dec 15, 2023
efb2736
format
thirtiseven Dec 15, 2023
0505d71
cudf reset
thirtiseven Dec 15, 2023
20415e7
Apply suggestions from code review
thirtiseven Dec 15, 2023
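Two commits in this list ("address comments and use different precision for float" and "rewrite the solution with ryu") concern how `float` inputs are printed. In the final kernel this shows up as the `is_float` flag that `format_float_fn` passes to the `ftos_converter` helpers after widening the value to `double`. Widening alone is not enough, because the widened value carries binary noise that a double-precision shortest representation would print. The sketch below illustrates that difference with a naive brute-force search for the smallest `%.*g` precision that round-trips at a given width. It is not Ryu, and `shortest_repr` is a hypothetical name; it assumes finite inputs.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>
#include <type_traits>

// Hypothetical helper: smallest precision p such that printing `value` with
// "%.*g" and parsing it back at the same width yields the same value.
// Brute force, finite values only; not the Ryu algorithm used by the PR.
template <typename T>
std::string shortest_repr(T value)
{
  static_assert(std::is_same_v<T, float> || std::is_same_v<T, double>, "float or double only");
  char buf[64];
  for (int p = 1; p <= std::numeric_limits<T>::max_digits10; ++p) {
    // Correctly rounded p-significant-digit decimal of the exact binary value.
    std::snprintf(buf, sizeof(buf), "%.*g", p, static_cast<double>(value));
    T parsed;
    if constexpr (std::is_same_v<T, float>) {
      parsed = std::strtof(buf, nullptr);  // parse back at float width
    } else {
      parsed = std::strtod(buf, nullptr);  // parse back at double width
    }
    if (parsed == value) { return std::string(buf); }
  }
  return std::string(buf);  // max_digits10 digits always round-trip
}
```

With this helper, `shortest_repr(0.1f)` gives `"0.1"`, while `shortest_repr(static_cast<double>(0.1f))` needs 17 digits (`"0.10000000149011612"`), which is presumably why the converter needs to know the original width.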
1 change: 1 addition & 0 deletions src/main/cpp/CMakeLists.txt
@@ -164,6 +164,7 @@ add_library(
src/ZOrderJni.cpp
src/bloom_filter.cu
src/cast_decimal_to_string.cu
src/format_float.cu
src/cast_float_to_string.cu
src/cast_string.cu
src/cast_string_to_float.cu
16 changes: 16 additions & 0 deletions src/main/cpp/src/CastStringJni.cpp
@@ -1,4 +1,5 @@
/*
* Copyright (c) 2022-2023, NVIDIA CORPORATION.

Collaborator: Remove duplication.

Collaborator Author: Thanks, done. I didn't find those copyright issues when upmerging.

* Copyright (c) 2022-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
@@ -125,6 +126,21 @@ JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_jni_CastStrings_fromFloat(J
CATCH_CAST_EXCEPTION(env, 0);
}

JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_jni_CastStrings_fromFloatWithFormat(
JNIEnv* env, jclass, jlong input_column, jint digits)
{
JNI_NULL_CHECK(env, input_column, "input column is null", 0);

try {
cudf::jni::auto_set_device(env);

auto const& cv = *reinterpret_cast<cudf::column_view const*>(input_column);
return cudf::jni::release_as_jlong(
spark_rapids_jni::format_float(cv, digits, cudf::get_default_stream()));
}
CATCH_CAST_EXCEPTION(env, 0);
}

JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_jni_CastStrings_fromDecimal(JNIEnv* env,
jclass,
jlong input_column)
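The new `fromFloatWithFormat` entry point receives its input as a `jlong`, reinterprets it as a `cudf::column_view const*`, and hands the result back to Java through `cudf::jni::release_as_jlong`. The ownership transfer behind that pattern can be sketched in plain C++ without JNI or cudf. In this sketch `Column`, `release_as_handle`, `view_from_handle`, and `reclaim_from_handle` are hypothetical stand-ins, `std::int64_t` plays the role of `jlong`, and pointers are assumed to fit in 64 bits; it approximates what such a helper presumably does rather than reproducing the library code.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a result column (not a cudf type).
struct Column {
  std::vector<std::string> rows;
};

// Give up unique ownership and encode the raw pointer as a 64-bit handle,
// the way a JNI function returns a native object to Java as a jlong.
std::int64_t release_as_handle(std::unique_ptr<Column> col)
{
  return static_cast<std::int64_t>(reinterpret_cast<std::intptr_t>(col.release()));
}

// Borrow the object behind a handle without taking ownership
// (compare reinterpreting `input_column` as a column_view pointer).
Column const& view_from_handle(std::int64_t handle)
{
  return *reinterpret_cast<Column const*>(static_cast<std::intptr_t>(handle));
}

// Take ownership back, e.g. when the Java side closes the object.
std::unique_ptr<Column> reclaim_from_handle(std::int64_t handle)
{
  return std::unique_ptr<Column>(reinterpret_cast<Column*>(static_cast<std::intptr_t>(handle)));
}
```

Until something reclaims the handle, the native object is owned by whoever holds the integer, which is why the Java wrapper must eventually close it.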
8 changes: 7 additions & 1 deletion src/main/cpp/src/cast_string.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2022-2023, NVIDIA CORPORATION.
* Copyright (c) 2022-2023-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -115,6 +115,12 @@ std::unique_ptr<cudf::column> string_to_float(
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

std::unique_ptr<cudf::column> format_float(
cudf::column_view const& input,
int digits,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

std::unique_ptr<cudf::column> float_to_string(
cudf::column_view const& input,
rmm::cuda_stream_view stream,
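The new `format_float` declaration takes the number of fractional `digits` to keep. Spark documents `format_number(expr1, expr2)` as formatting a number like '#,###,###.##', rounded to expr2 decimal places, and the commit history mentions a format_number demo; assuming that is the target output, `digits = 2` turns 1234567.891 into `1,234,567.89`. The host-only sketch below shows that output shape and why a sizing pass has to count the sign, grouping commas, and decimal point as well as the digits. `format_with_grouping` is hypothetical: it rounds with `std::snprintf` rather than the Ryu-based `ftos_converter` path, and it ignores NaN and infinity.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical host-side helper: format a finite double with `digits`
// fractional digits and comma grouping in the integer part.
std::string format_with_grouping(double value, int digits)
{
  // Round to `digits` places; snprintf rounds the exact binary value.
  int const n = std::snprintf(nullptr, 0, "%.*f", digits, value);
  std::string plain(static_cast<std::size_t>(n), '\0');
  std::snprintf(plain.data(), plain.size() + 1, "%.*f", digits, value);

  // Peel off the sign so it is not counted as a digit when grouping.
  std::string sign;
  if (!plain.empty() && plain[0] == '-') {
    sign = "-";
    plain.erase(0, 1);
  }

  // Split integer and fractional parts (there is no '.' when digits == 0).
  auto const dot = plain.find('.');
  std::string const int_part = plain.substr(0, dot);
  std::string const frac_part = dot == std::string::npos ? "" : plain.substr(dot);

  // Insert a comma before every group of three digits, counting from the right.
  std::string grouped;
  int count = 0;
  for (auto it = int_part.rbegin(); it != int_part.rend(); ++it) {
    if (count > 0 && count % 3 == 0) { grouped.insert(grouped.begin(), ','); }
    grouped.insert(grouped.begin(), *it);
    ++count;
  }
  return sign + grouped + frac_part;
}
```

The length of the returned string is exactly what a per-row size computation must predict before any characters are written.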
131 changes: 131 additions & 0 deletions src/main/cpp/src/format_float.cu
@@ -0,0 +1,131 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include "cast_string.hpp"
#include "ftos_converter.cuh"

#include <cudf/column/column_device_view.cuh>
#include <cudf/detail/null_mask.hpp>
#include <cudf/detail/nvtx/ranges.hpp>
#include <cudf/strings/detail/strings_children.cuh>
#include <cudf/utilities/type_dispatcher.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/exec_policy.hpp>

namespace spark_rapids_jni {

namespace detail {
namespace {

template <typename FloatType>
struct format_float_fn {
cudf::column_device_view d_floats;
int const digits;
cudf::size_type* d_offsets;
char* d_chars;

__device__ cudf::size_type compute_output_size(FloatType value, int digits) const
{
bool constexpr is_float = std::is_same_v<FloatType, float>;
return static_cast<cudf::size_type>(
ftos_converter::compute_format_float_size(static_cast<double>(value), digits, is_float));
}

__device__ void format_float(cudf::size_type idx, int digits) const
{
auto const value = d_floats.element<FloatType>(idx);
bool constexpr is_float = std::is_same_v<FloatType, float>;
auto const output = d_chars + d_offsets[idx];
ftos_converter::format_float(static_cast<double>(value), digits, is_float, output);
}

__device__ void operator()(cudf::size_type idx) const
{
if (d_floats.is_null(idx)) {
if (d_chars == nullptr) { d_offsets[idx] = 0; }
return;
}
if (d_chars != nullptr) {
format_float(idx, digits);
} else {
d_offsets[idx] = compute_output_size(d_floats.element<FloatType>(idx), digits);
}
}
};

/**
* @brief This dispatch method is for converting floats into strings.
*
* The template function declaration ensures only float types are allowed.
*/
struct dispatch_format_float_fn {
template <typename FloatType, CUDF_ENABLE_IF(std::is_floating_point_v<FloatType>)>
std::unique_ptr<cudf::column> operator()(cudf::column_view const& floats,
int digits,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr) const
{
auto const strings_count = floats.size();
if (strings_count == 0) { return cudf::make_empty_column(cudf::type_id::STRING); }

auto const input_ptr = cudf::column_device_view::create(floats, stream);

auto [offsets, chars] = cudf::strings::detail::make_strings_children(
format_float_fn<FloatType>{*input_ptr, digits}, strings_count, stream, mr);

return cudf::make_strings_column(strings_count,
std::move(offsets),
std::move(chars),
floats.null_count(),
cudf::detail::copy_bitmask(floats, stream, mr));
}

// non-float types throw an exception
template <typename T, CUDF_ENABLE_IF(not std::is_floating_point_v<T>)>
std::unique_ptr<cudf::column> operator()(cudf::column_view const&,
int,
rmm::cuda_stream_view,
rmm::mr::device_memory_resource*) const
{
CUDF_FAIL("Values for format_float function must be a float type.");
}
};

} // namespace

// This will convert all float column types into a strings column.
std::unique_ptr<cudf::column> format_float(cudf::column_view const& floats,
int digits,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr)
{
return type_dispatcher(floats.type(), dispatch_format_float_fn{}, floats, digits, stream, mr);
}

} // namespace detail

// external API
std::unique_ptr<cudf::column> format_float(cudf::column_view const& floats,
int digits,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr)
{
CUDF_FUNC_RANGE();
return detail::format_float(floats, digits, stream, mr);
}

} // namespace spark_rapids_jni
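`format_float_fn` is shaped for the two-pass scheme of `cudf::strings::detail::make_strings_children`: the functor first runs with `d_chars == nullptr`, so each row stores its byte count in `d_offsets` (0 for nulls); the counts are scanned into offsets; then the functor runs again and writes each row's characters at its offset. The sequential host analogue below mirrors that data flow. It is a sketch under assumptions: `std::optional<double>` stands in for a nullable column, and `format_row` is a hypothetical stand-in for `ftos_converter::compute_format_float_size` / `ftos_converter::format_float` that only applies `digits` fractional digits (no grouping).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-row formatter: `value` with `digits` fractional digits.
std::string format_row(double value, int digits)
{
  int const n = std::snprintf(nullptr, 0, "%.*f", digits, value);
  std::string s(static_cast<std::size_t>(n), '\0');
  std::snprintf(s.data(), s.size() + 1, "%.*f", digits, value);
  return s;
}

// Host-side model of a strings column: offsets (rows + 1), chars, validity.
struct StringsColumn {
  std::vector<int> offsets;
  std::string chars;
  std::vector<bool> valid;
};

StringsColumn format_column(std::vector<std::optional<double>> const& input, int digits)
{
  std::size_t const rows = input.size();
  StringsColumn out;

  // Pass 1 (sizing): each valid row reports its byte count; nulls count 0.
  std::vector<int> sizes(rows, 0);
  for (std::size_t i = 0; i < rows; ++i) {
    if (input[i]) { sizes[i] = static_cast<int>(format_row(*input[i], digits).size()); }
  }

  // Scan: offsets[i] becomes row i's start, offsets[rows] the total byte count.
  out.offsets.assign(rows + 1, 0);
  std::partial_sum(sizes.begin(), sizes.end(), out.offsets.begin() + 1);

  // Pass 2 (writing): allocate once, then each valid row fills its own slice.
  out.chars.assign(static_cast<std::size_t>(out.offsets[rows]), '\0');
  out.valid.assign(rows, false);
  for (std::size_t i = 0; i < rows; ++i) {
    if (!input[i]) { continue; }  // nulls: no bytes written, validity stays false
    out.valid[i] = true;
    std::string const s = format_row(*input[i], digits);
    std::copy(s.begin(), s.end(), out.chars.begin() + out.offsets[i]);
  }
  return out;
}
```

Formatting each row twice looks wasteful, but it is what lets the GPU version allocate the character buffer exactly once and write rows in parallel without coordination. One difference from the kernel: the PR copies the input's null mask with `cudf::detail::copy_bitmask` instead of recomputing validity per row.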
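`dispatch_format_float_fn` relies on `cudf::type_dispatcher` to turn the column's runtime type id into a template argument, and on two `CUDF_ENABLE_IF`-constrained `operator()` overloads so that only floating-point instantiations do real work while every other type fails. The same pattern in standard C++ uses `std::enable_if_t`. In this sketch `TypeId`, `dispatch`, and `describe_float_fn` are hypothetical and cover just three types.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical runtime type tag (a tiny subset of what cudf::type_id covers).
enum class TypeId { INT32, FLOAT32, FLOAT64 };

// Map the runtime tag to a compile-time type and call the functor's
// templated operator() with it.
template <typename Functor, typename... Args>
auto dispatch(TypeId id, Functor f, Args&&... args)
{
  switch (id) {
    case TypeId::INT32: return f.template operator()<int>(std::forward<Args>(args)...);
    case TypeId::FLOAT32: return f.template operator()<float>(std::forward<Args>(args)...);
    case TypeId::FLOAT64: return f.template operator()<double>(std::forward<Args>(args)...);
  }
  throw std::invalid_argument("unknown type id");
}

struct describe_float_fn {
  // Enabled only for floating-point T: report the width and requested digits.
  template <typename T, std::enable_if_t<std::is_floating_point_v<T>, int> = 0>
  std::string operator()(int digits) const
  {
    return std::to_string(sizeof(T) * 8) + "-bit float, digits=" + std::to_string(digits);
  }

  // Every other T: reject, like the CUDF_FAIL overload in the PR.
  template <typename T, std::enable_if_t<!std::is_floating_point_v<T>, int> = 0>
  std::string operator()(int) const
  {
    throw std::invalid_argument("Values for format_float function must be a float type.");
  }
};
```

Because the non-float overload still exists for every type, each `switch` branch compiles, and the error surfaces at run time only for an unsupported column type.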