Recognize NaN operands in Min and Max ops #19984
Conversation
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline
/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline
/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
Azure Pipelines successfully started running 10 pipeline(s).
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline
/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline
/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
Azure Pipelines successfully started running 10 pipeline(s).
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline
Azure Pipelines successfully started running 10 pipeline(s).
/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline
/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
Azure Pipelines successfully started running 10 pipeline(s).
Any idea about the buffer overflow in this test? 1: [ RUN ] MathOpTest.Min_12_MLFloat16_Nan
I have some leads, but it needs further research. Maybe I should reduce this PR to correcting Min/Max for only the float and double operand types -- that would cover many cases, including the original bug report -- and open a follow-up PR with fixes for the remaining 16-bit types after sorting out this issue?
That's fine.
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline
/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline
/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
Azure Pipelines successfully started running 10 pipeline(s).
@tianleiwu -- thanks for the help and for reviewing so quickly!
This makes Min and Max with NaN for either operand always return NaN for float16 data, matching the behaviour of float and double. The behaviour for float and double was previously fixed for the CPU provider in #21492 and the CUDA provider in #19984, but those PRs did not fix the behaviour for float16 because the tests caused ASan errors. The memory access violations with float16 data have now been fixed in #22135, so this PR is a follow-up that makes float16 Min and Max behave the same as float and double for both the CPU and CUDA providers, now that tests can be added.
Motivation and Context
Relevant previous issues (not float16 specific):
* #21455
* onnx/onnx#6003
Description
Update the Min and Max CUDA math operations on float/double types to propagate NaNs: if either operand is NaN, the result should be NaN.
TODO: float16/bfloat16 need a similar change.
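For illustration, here is a minimal sketch of min/max with the NaN-propagating semantics described above; this is only an illustration of the intended behavior, not the actual onnxruntime CUDA kernel code:
```
// Minimal sketch of NaN-propagating min/max for float/double.
// Not the actual onnxruntime CUDA kernels -- illustration only.
#include <cmath>
#include <limits>

template <typename T>
T NanPropagatingMin(T a, T b) {
  // Unlike std::fmin, return NaN if either operand is NaN.
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<T>::quiet_NaN();
  }
  return a < b ? a : b;
}

template <typename T>
T NanPropagatingMax(T a, T b) {
  // Unlike std::fmax, return NaN if either operand is NaN.
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<T>::quiet_NaN();
  }
  return a > b ? a : b;
}
```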
Motivation
Currently, results differ between the CPU and CUDA implementations of the floating point Min and Max operators: the CPU operators correctly return NaN results if either operand is NaN. This PR updates the CUDA implementations to conform with this correct behavior.
See the issue and comments raised in onnx/onnx#6003.
Context
Same behavior in numpy, torch and Java:
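```
>>> numpy.min([numpy.NAN, 1])
nan
>>> numpy.max([numpy.NAN, 1])
nan
>>> torch.min(torch.tensor([1, float('nan')]))
tensor(nan)
>>> torch.max(torch.tensor([1, float('nan')]))
tensor(nan)
```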
The C language fmin and fmax have different behavior:
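```
fmax(NaN,1) = 1
fmin(NaN,1) = 1
```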
Background on the removal and demotion of minNum/maxNum in IEEE 754-2019:
https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2273.pdf