Propagate NaNs in the CPU min and max operators #21492

adamreeve · 2024-07-25T02:29:58Z

Description

Propagates NaN values in the min and max operators so that min or max with a NaN in either input always produces NaN.

This only fixes NaN propagation for the float and double data types. float16 is left out because testing NaNs with float16 data causes invalid read errors: #21492 (comment).
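To illustrate the intended semantics, here is a minimal standalone sketch in plain C++. It is not the onnxruntime or Eigen code touched by this PR; the helper names `naive_min`, `propagating_min`, `propagating_max` and `check_nan_propagation` are hypothetical. It shows why a comparison-based minimum can drop a NaN depending on operand order, and what "NaN in either input always produces NaN" looks like.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: a comparison-only minimum. Every comparison with NaN
// is false, so the result depends on which operand holds the NaN.
template <typename T>
T naive_min(T a, T b) {
  return (b < a) ? b : a;
}

// Hypothetical helper: minimum that returns NaN if either operand is NaN.
template <typename T>
T propagating_min(T a, T b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  return (b < a) ? b : a;
}

// Hypothetical helper: maximum that returns NaN if either operand is NaN.
template <typename T>
T propagating_max(T a, T b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  return (a < b) ? b : a;
}

// Small self-check contrasting the two behaviours.
inline void check_nan_propagation() {
  const float nan = std::numeric_limits<float>::quiet_NaN();

  // The naive version keeps the NaN only when it is the first operand.
  assert(std::isnan(naive_min(nan, 1.0f)));
  assert(naive_min(1.0f, nan) == 1.0f);

  // The propagating versions return NaN regardless of operand order.
  assert(std::isnan(propagating_min(nan, 1.0f)));
  assert(std::isnan(propagating_min(1.0f, nan)));
  assert(std::isnan(propagating_max(nan, 1.0f)));
  assert(std::isnan(propagating_max(1.0f, nan)));
}
```

Calling `check_nan_propagation()` from a test or `main` exercises both cases. The order dependence of the naive version is the kind of inconsistency this change is meant to remove.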

Motivation and Context

Fixes #21455

skottmckay · 2024-07-26T07:45:39Z

Should there be a test where the NaN is the scalar to check it is propagated throughout the broadcast?

e.g. input {2,2} with no NaN and {1} with a NaN should result in all NaN in the output IIUC
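The case described above can be sketched in standalone C++ as follows. This is only an illustration under stated assumptions, not the test that was added to onnxruntime: the {2,2} input is represented as a flattened four-element vector, and the helpers `min_with_broadcast_scalar`, `all_nan` and `check_scalar_nan_broadcast` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: NaN-propagating min of a flattened tensor against a
// single-element input that is broadcast to every position.
inline std::vector<float> min_with_broadcast_scalar(
    const std::vector<float>& values, float scalar) {
  std::vector<float> out;
  out.reserve(values.size());
  for (float v : values) {
    if (std::isnan(v) || std::isnan(scalar)) {
      // A NaN on either side wins, so a NaN scalar poisons every element.
      out.push_back(std::numeric_limits<float>::quiet_NaN());
    } else {
      out.push_back(scalar < v ? scalar : v);
    }
  }
  return out;
}

// Hypothetical helper: true if every element is NaN.
inline bool all_nan(const std::vector<float>& values) {
  for (float v : values) {
    if (!std::isnan(v)) return false;
  }
  return true;
}

// Self-check for the scenario in the comment: {2,2} data without NaN,
// {1} input holding NaN, expected output all NaN.
inline void check_scalar_nan_broadcast() {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  const std::vector<float> data = {1.0f, 2.0f, 3.0f, 4.0f};  // flattened {2,2}
  assert(all_nan(min_with_broadcast_scalar(data, nan)));
}
```

With a non-NaN scalar the same helper behaves like an ordinary element-wise minimum, so only the NaN case changes the result.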

adamreeve · 2024-07-28T20:58:28Z

@microsoft-github-policy-service agree company="G-Research"

adamreeve · 2024-07-28T21:38:31Z

> Should there be a test where the NaN is the scalar to check it is propagated throughout the broadcast?

Good idea, thanks. I've added those tests now and fixed the formatting errors.

skottmckay · 2024-07-28T23:42:32Z

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

skottmckay · 2024-07-28T23:42:34Z

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

skottmckay · 2024-07-28T23:42:36Z

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

azure-pipelines · 2024-07-28T23:42:52Z

Azure Pipelines successfully started running 3 pipeline(s).

azure-pipelines · 2024-07-28T23:43:05Z

Azure Pipelines successfully started running 9 pipeline(s).

azure-pipelines · 2024-07-28T23:43:10Z

Azure Pipelines successfully started running 10 pipeline(s).

adamreeve · 2024-07-29T01:25:49Z

It looks like there are some issues with the asan runs and the MLFloat16 tests:

And the web builds also seem to have a problem with these new tests: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1447211&view=logs&j=990616d7-1f75-5fe5-5c67-c84a39482fba&t=67a495a8-8444-507d-6780-4eb1ddef3707&l=14255

The same problem came up in #19984 (comment), where there weren't any changes to element_wise_ops.cc but tests were added for handling NaNs with MLFloat16.

I'm not sure what could be causing this. At first glance it seems like a bug in Eigen rather than onnxruntime. The best way forward is probably to do the same as before and revert the MLFloat16-related changes for now, and I can open a new issue to follow up on this.

skottmckay · 2024-07-29T03:46:09Z

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

skottmckay · 2024-07-29T03:46:11Z

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

skottmckay · 2024-07-29T03:46:13Z

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

azure-pipelines · 2024-07-29T03:46:29Z

Azure Pipelines successfully started running 3 pipeline(s).

azure-pipelines · 2024-07-29T03:46:44Z

Azure Pipelines successfully started running 9 pipeline(s).

azure-pipelines · 2024-07-29T03:46:47Z

Azure Pipelines successfully started running 10 pipeline(s).

From the description of the follow-up PR #22161:

This makes min and max with NaN for either operand always return NaN for float16 data, matching the behaviour of float and double.

The behaviour for floats and doubles was previously fixed for the CPU provider in #21492 and the CUDA provider in #19984, but these PRs didn't fix the behaviour for float16 because the tests caused asan errors. The memory access violations with float16 data have now been fixed in #22135, so this PR is a follow-up to make float16 min and max behave the same as float and double for both the CPU and CUDA providers, now that we can add tests for this.

Motivation and Context

Relevant previous issues (not float16 specific):

* #21455
* onnx/onnx#6003

adamreeve added 3 commits July 25, 2024 14:18

Add tests to reproduce NaN propagation failure for min and max

7b89d01

Use PropagateNaN behaviour for Eigen min and max methods

359b3cc

Also fix min and max with float16

77d403a

adamreeve added 2 commits July 29, 2024 09:36

Fix formatting

f4a6d48

Test broadcasting scalar NaN

51ed042

microsoft deleted a comment from azure-pipelines bot Jul 28, 2024

skottmckay previously approved these changes Jul 28, 2024

Revert MLFloat16 changes (except checker fix and test name typos)

4abbd41

adamreeve dismissed skottmckay’s stale review via 4abbd41 July 29, 2024 01:41

skottmckay approved these changes Jul 29, 2024

skottmckay merged commit 7543dd0 into microsoft:main Jul 29, 2024
82 checks passed

adamreeve deleted the min_max_nan_fix branch July 29, 2024 23:25

adamreeve mentioned this pull request Jul 30, 2024

heap-buffer-overflow running address sanitizer on Min with float16 #21558

Closed

adamreeve mentioned this pull request Sep 20, 2024

Fix NaN propagation for float16 min and max operators #22161

Merged

luncliff mentioned this pull request Oct 2, 2024

[onnx,onnxruntime] new port for v1.19.2 with onnx 1.16.0 microsoft/vcpkg#36850

Open

13 tasks
