
Add compatibility for NumPy 2.0 #21085

Merged
merged 13 commits into main from snnn/update_numpy on Jun 27, 2024

Conversation

snnn
Member

@snnn snnn commented Jun 18, 2024

Description

As suggested by SciPy's documentation, we will build against NumPy 2.0.0; the result will then work for all NumPy versions with the same major version number (NumPy maintains backwards ABI compatibility), and as far back as the NumPy 1.19 series at the time of writing.

I think this works because numpyconfig.h#L64 defines the macro NPY_FEATURE_VERSION, which defaults to NPY_1_19_API_VERSION, and that macro controls which ABI level the extension targets.

This PR only upgrades the build-time dependency; when a user installs ONNX Runtime, they can still use NumPy 1.x.
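To make the compatibility claim concrete, here is a minimal sketch (hypothetical, not code from this PR) of the guarantee that building against NumPy 2.0.0 with the default 1.19 feature level provides at runtime:

```python
# Minimal sketch (hypothetical, not from this PR). An extension built against
# NumPy 2.0.0 with the default NPY_FEATURE_VERSION (NPY_1_19_API_VERSION)
# works with any installed NumPy >= 1.19, covering both the 1.x and 2.x series.
import numpy as np

MIN_FEATURE_LEVEL = (1, 19)  # mirrors the default NPY_1_19_API_VERSION

major, minor = (int(part) for part in np.__version__.split(".")[:2])
if (major, minor) < MIN_FEATURE_LEVEL:
    raise ImportError(
        f"installed NumPy {np.__version__} is older than the 1.19 feature "
        "level this extension was built against"
    )
print(f"NumPy {np.__version__} satisfies the 1.19 feature level")
```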

Motivation and Context

@snnn snnn added the python (Pull requests that update Python code) label Jun 18, 2024
@snnn snnn linked an issue Jun 18, 2024 that may be closed by this pull request
@snnn snnn marked this pull request as ready for review June 19, 2024 00:39
@snnn snnn requested review from a team as code owners June 19, 2024 00:39
@snnn
Member Author

snnn commented Jun 19, 2024

@yufenglee, there is a test error in test_quant_util.py. I think it is caused by precision. Maybe we should just relax the tolerance. Can you help take a look?

@jywu-msft
Member

ONNX Runtime Python Test Pipeline (Packages_Somking_Test Test_MAC_Wheels Python310) seems to test nightly packages, so the failure would be expected then?

@jywu-msft
Member

@yufenglee, there is a test error in test_quant_util.py. I think it is caused by precision. Maybe we should just relax the tolerance. Can you help take a look?

2 errors on ARM build. x64 is ok.

=====================================================================
FAIL: test_compute_scale_zp (test_quant_util.TestQuantUtil)

Traceback (most recent call last):
File "/build/Release/quantization/test_quant_util.py", line 64, in test_compute_scale_zp
self.assertEqual(
AssertionError: Lists differ: [0.0, 3.921568634268624e-07] != [0, 3.921568627450981e-07]

First differing element 1:
3.921568634268624e-07
3.921568627450981e-07

- [0.0, 3.921568634268624e-07]
+ [0, 3.921568627450981e-07]

======================================================================
FAIL: test_quantize_blockwise_bnb4 (test_quantizeblockwise_bnb4.TestQuantizeBlockwiseBnb4)

Traceback (most recent call last):
File "/build/Release/quantization/test_quantizeblockwise_bnb4.py", line 134, in test_quantize_blockwise_bnb4
assert np.allclose(quant_value_ref, quant_value)
AssertionError

@jywu-msft
Member

+@jambayk, who added test_quantizeblockwise_bnb4.py
+@adrianlizarraga, who added the failing test_compute_scale_zp test

@snnn snnn marked this pull request as draft June 19, 2024 07:19
@xadupre
Member

xadupre commented Jun 19, 2024

Among the changes that could introduce numerical differences (see [Changes to NumPy data type promotion](https://numpy.org/devdocs/numpy_2_0_migration_guide.html#changes-to-numpy-data-type-promotion)):

- The default integer type on Windows is now int64 rather than int32, matching the behavior on other platforms.
- np.float32(3) + 3. now returns a float32 when it previously returned a float64.
- np.array([3], dtype=np.float32) + np.float64(3) will now return a float64 array. (The higher precision of the scalar is not ignored.)
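A quick check (a sketch reproducing the migration-guide examples above) makes the change concrete:

```python
# Demonstrates the promotion changes listed above. Under NumPy 2.0 the value
# of a scalar no longer influences the result dtype; only its dtype does.
import numpy as np

a = np.float32(3) + 3.0
print(a.dtype)  # NumPy 2.0: float32 (NumPy 1.x returned float64)

b = np.array([3], dtype=np.float32) + np.float64(3)
print(b.dtype)  # NumPy 2.0: float64 (NumPy 1.x returned float32)
```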

About the failing test, the tested function compute_scale_zp is impacted by those changes. The numerical discrepancies look ok to me.

I think we should also add a warning in build.py to let users know when NumPy 1.x is installed while building the Python binding, and tell them that onnxruntime built that way is not compatible with NumPy 2.0. This warning could become an error if the condition is met during one of the pipelines building the release.
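A minimal sketch of what such a guard could look like (hypothetical; the function name and the is_release_pipeline flag are placeholders, not actual build.py code):

```python
# Hypothetical sketch of the proposed build.py check, not the actual code.
import warnings

import numpy as np

def check_numpy_build_version(is_release_pipeline: bool) -> None:
    """Warn (or fail on release pipelines) when the Python binding is being
    built against NumPy 1.x, since the result will not support NumPy 2.0."""
    major = int(np.__version__.split(".")[0])
    if major < 2:
        message = (
            f"Building against NumPy {np.__version__}: the resulting binding "
            "will not be compatible with NumPy 2.0."
        )
        if is_release_pipeline:
            raise RuntimeError(message)
        warnings.warn(message)
```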

@adrianlizarraga
Contributor

FAIL: test_compute_scale_zp (test_quant_util.TestQuantUtil)
AssertionError: Lists differ: [0.0, 3.921568634268624e-07] != [0, 3.921568627450981e-07]

+@adrianlizarraga, who added the failing test_compute_scale_zp test

That discrepancy looks ok. Wrapping the expected value in np.float32() may fix the test, or we can increase the tolerance since the difference is very small.
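Both options might look roughly like this (an illustrative sketch using the values from the failure above, not the real test code):

```python
# Sketch of the two proposed fixes (illustrative values from the failing test).
import numpy as np

computed = [0.0, 3.921568634268624e-07]
expected = [0, 3.921568627450981e-07]

# Option 1: compare at float32 precision; both values round to the same float32.
assert [np.float32(v) for v in computed] == [np.float32(v) for v in expected]

# Option 2: compare with a small tolerance instead of exact equality.
np.testing.assert_allclose(computed, expected, rtol=1e-6)
```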

@yufenglee
Copy link
Member

@adrianlizarraga, could you please add a commit to fix both numeric discrepancy issues?

@snnn
Member Author

snnn commented Jun 19, 2024

Talked to @yufenglee offline. We need to change the assertEqual function calls to numpy.testing.assert_allclose in test_quant_util.py
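The change is mechanical; a sketch of the before/after (illustrative, not the verbatim test code):

```python
# Illustrative sketch of the agreed fix in test_quant_util.py (not verbatim).
import unittest

import numpy as np

class TestQuantUtil(unittest.TestCase):
    def test_compute_scale_zp(self):
        zero_point, scale = 0.0, np.float32(3.921568634268624e-07)
        # Before: exact comparison, brittle under NumPy 2.0's promotion rules.
        # self.assertEqual([zero_point, scale], [0, 3.921568627450981e-07])
        # After: tolerance-based comparison.
        np.testing.assert_allclose(
            [zero_point, scale], [0, 3.921568627450981e-07], rtol=1e-6
        )

if __name__ == "__main__":
    unittest.main()
```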

@snnn snnn mentioned this pull request Jun 20, 2024
@adrianlizarraga
Contributor

@snnn @yufenglee I pushed an update that fixes both numerical accuracy issues.

@adrianlizarraga
Contributor

Now there's a different error with test/onnx/gen_test_models.py:

2024-06-20 20:40:38,658 build [INFO] - /Library/Frameworks/Python.framework/Versions/3.11/bin/python3 /Users/runner/work/1/s/onnxruntime/test/onnx/gen_test_models.py --output_dir test_models
Traceback (most recent call last):
File "/Users/runner/work/1/s/onnxruntime/test/onnx/gen_test_models.py", line 199, in
test_abs(args.output_dir)
File "/Users/runner/work/1/s/onnxruntime/test/onnx/gen_test_models.py", line 147, in test_abs
np.uint16([-32767, -4, 0, 3, 32767]),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python integer -32767 out of bounds for uint16
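For context: NumPy 2.0 raises OverflowError where NumPy 1.x silently wrapped out-of-range Python integers. A sketch of the behavior change and one possible way to keep wrap-around test data (illustrative only, not the actual fix):

```python
# NumPy 2.0 rejects out-of-range Python ints instead of wrapping them.
import numpy as np

# np.uint16([-32767, -4, 0, 3, 32767])  # NumPy 1.x: wraps silently
#                                       # NumPy 2.0: OverflowError

# One way to express the old wrap-around behavior explicitly:
values = np.array([-32767, -4, 0, 3, 32767], dtype=np.int32).astype(np.uint16)
print(values)  # [32769 65532     0     3 32767]
```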

@snnn
Member Author

snnn commented Jun 20, 2024

It was my code. I will fix it.

@snnn snnn marked this pull request as ready for review June 21, 2024 05:25
@snnn
Member Author

snnn commented Jun 21, 2024

/azp run Windows CPU CI Pipeline


Azure Pipelines successfully started running 1 pipeline(s).

@snnn
Member Author

snnn commented Jun 21, 2024

/azp run orttraining-amd-gpu-ci-pipeline, Linux MIGraphX CI Pipeline


Azure Pipelines successfully started running 2 pipeline(s).

@snnn snnn changed the title from "Update numpy to 2.0" to "Add compatibility for NumPy 2.0" Jun 21, 2024
@snnn snnn requested a review from jywu-msft June 21, 2024 16:18
@jywu-msft
Member

do we need some kind of test for testing 1.x and 2.x compatibility with these numpy 2.0 builds?

@snnn
Member Author

snnn commented Jun 26, 2024

do we need some kind of test for testing 1.x and 2.x compatibility with these numpy 2.0 builds?

Obviously, the answer is yes. I manually tested it, but there is no coverage in our pipelines. In theory we have a lot of things to test. For example, we build onnxruntime on Windows Server 2022 and we claim the binaries are compatible with Windows 10, but we do not have automated tests to cover that. Similarly, we only test our Python package with the latest ONNX version, while we should test it with every historical ONNX version back to 1.2. Given our limited resources, we cannot do most of these things.

@jywu-msft
Member

do we need some kind of test for testing 1.x and 2.x compatibility with these numpy 2.0 builds?

Obviously, the answer is yes. I manually tested it, but there is no coverage in our pipelines. In theory we have a lot of things to test. For example, we build onnxruntime on Windows Server 2022 and we claim the binaries are compatible with Windows 10, but we do not have automated tests to cover that. Similarly, we only test our Python package with the latest ONNX version, while we should test it with every historical ONNX version back to 1.2. Given our limited resources, we cannot do most of these things.

ok yeah, understood.
change looks mostly good otherwise. the only other thing I had was that previously the pipelines built against specific numpy versions. now we just have numpy in the various requirements.txt files under the docker directories. does this mean we will just build with whatever the latest is? should we pin it to 2.0.0?

@snnn
Member Author

snnn commented Jun 26, 2024

this means we will just build with whatever the latest is? should we pin it to 2.0.0 ?

I think either is fine. The difference is: if we pin it, we won't get a surprise when NumPy publishes a new version that is incompatible with 1.21.6 (right now it supports versions as low as 1.19). The downside is that we need to manually update the pinned version number whenever NumPy updates, which means more work. Right now in the requirements.txt files, some packages are pinned and some are not. What do you think is better?
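For illustration, the two styles under discussion would look like this in a requirements.txt (hypothetical lines, not the exact files in this PR):

```
# Pinned (what this PR moves to): reproducible, but needs a manual bump
# for each NumPy release.
numpy==2.0.0

# Unpinned alternative: always builds against the latest release,
# at the risk of surprises.
# numpy
```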

@snnn
Member Author

snnn commented Jun 26, 2024

Let me add a commit to pin them.

@jywu-msft
Member

this means we will just build with whatever the latest is? should we pin it to 2.0.0 ?

I think either is fine. The difference is: if we pin it, we won't get a surprise when NumPy publishes a new version that is incompatible with 1.21.6 (right now it supports versions as low as 1.19). The downside is that we need to manually update the pinned version number whenever NumPy updates, which means more work. Right now in the requirements.txt files, some packages are pinned and some are not. What do you think is better?

i don't think we will need to update numpy that soon after pinning to 2.0.0.
agree that having some dependencies pinned and some not isn't a good state, but we can fix it one at a time.

@snnn snnn force-pushed the snnn/update_numpy branch 2 times, most recently from 2278aa4 to 35d81af Compare June 26, 2024 18:32
@snnn snnn requested a review from jywu-msft June 26, 2024 23:16
@snnn
Member Author

snnn commented Jun 27, 2024

@jywu-msft, I updated the requirements.txt files. Would you mind reviewing them again?

@snnn snnn merged commit d1ab94c into main Jun 27, 2024
99 checks passed
@snnn snnn deleted the snnn/update_numpy branch June 27, 2024 20:50
jchen351 added a commit that referenced this pull request Jul 22, 2024
…21106)

### Description
Replace inline pip install with pip install from requirements*.txt



### Motivation and Context
so that CG can recognize the dependencies

### Dependency

- [x] #21085
Labels
python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Numpy v2.0
5 participants