Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable LTO optimization by default for runtime releases. #16811

Merged
merged 2 commits into from
Mar 18, 2024
Merged

Conversation

stellaraccident
Copy link
Collaborator

@stellaraccident stellaraccident commented Mar 17, 2024

This is done by generalizing the primordial IREE_SIZE_OPTIMIZED flag into a IREE_RUNTIME_OPTIMIZATION_PROFILE that:

  • Can enable 'lto' or 'size'.
  • Is scoped to just the runtime targets.
  • Minimally does the right thing for 'size' on Linux vs just on Windows (not the goal of this patch but drops ~300KB from binary sizes when enabled).

The compile time delta for a clean build of the runtime in full LTO vs regular mode was not measured precisely but is in the noise (i.e. <1m). As such, just enabling by default for Python release binaries.

Others can be enabled via: -DIREE_RUNTIME_OPTIMIZATION_PROFILE=lto, which is recommended for benchmarking, etc.

Note that this removes the use of the CMake option IREE_SIZE_OPTIMIZED. It was never even declared properly as an option and didn't do the same class of thing across Windows/Linux. This has been fixed and it can be enabled via -DIREE_RUNTIME_OPTIMIZATION_PROFILE=size. Note that as on Windows, this implies LTO. If old behavior without LTO is desired, we can add a profile for that.

Progress on #898.

@stellaraccident
Copy link
Collaborator Author

@ScottTodd I expect that the change of the linking mode has effectively just fuzzed execution a bit and caused a latent bug to surface in the tooling. I'm going to squash this branch for easy access but could use your help to resolve/land.

This is done by generalizing the primordial `IREE_SIZE_OPTIMIZED` flag into a `IREE_RUNTIME_OPTIMIZATION_PROFILE` that:

* Can enable 'lto' or 'size'.
* Is scoped to just the runtime targets.
* Minimally does the right thing for 'size' on Linux vs just on Windows (not the goal of this patch but drops ~300KB from binary sizes when enabled).

The compile time delta for a clean build of the runtime in full LTO vs regular mode was not measured precisely but is in the noise (i.e. <1m). As such, just enabling by default for Python release binaries.

Others can be enabled via: `-DIREE_RUNTIME_OPTIMIZATION_PROFILE=lto`, which is recommended for benchmarking, etc.

Progress on #898.
@ScottTodd
Copy link
Member

@ScottTodd I expect that the change of the linking mode has effectively just fuzzed execution a bit and caused a latent bug to surface in the tooling. I'm going to squash this branch for easy access but could use your help to resolve/land.

Interesting. I can help tomorrow. A few thoughts from a quick look:

  • Are the failure consistent, or flaky?

  • If the failures are in the same tests each run, could exclude or mark XFAIL

  • Could run a few times at HEAD to see if the PR itself is changing behavior

  • I've seen what look like system (driver/hardware/OS/etc.) flakes on the AMDGPU runners (mostly for ROCm, but also for Vulkan). Not sure if I've seen CPU flakes. Reducing parallelism helped with some classes of flakes (resource exhaustion or multithreaded compilation across dozens of runs at once hitting race conditions... idk)

  • Diagnostics are lacking... wonder if that's from not having debug symbols or what. Would really like more info from the tests...

    https://github.com/openxla/iree/actions/runs/8319107825/job/22761993141?pr=16811#step:10:45

    =================================== FAILURES ===================================
    _____ IREE compile and run: test_constantofshape_float_ones::cpu_llvm_sync _____
    [gw88] linux -- Python 3.11.8 /runner-root/actions-runner/_work/iree/iree/venv/bin/python
    Error invoking iree-run-module
    Error code: 245
    Stderr diagnostics:
    
    Stdout diagnostics:
    EXEC @test_constantofshape_float_ones
    
    Compiled with:
      cd /runner-root/actions-runner/_work/iree/iree/SHARK-TestSuite/iree_tests/onnx/node/generated/test_constantofshape_float_ones && iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model_cpu_llvm_sync.vmfb
    
    Run with:
      cd /runner-root/actions-runner/_work/iree/iree/SHARK-TestSuite/iree_tests/onnx/node/generated/test_constantofshape_float_ones && iree-run-module --module=model_cpu_llvm_sync.vmfb --device=local-sync --flagfile=test_data_flags.txt
    

@ScottTodd
Copy link
Member

Retried the failing jobs... same test failures on both CPU and Vulkan. So could mark as XFAIL to unblock this. Should still investigate though.

@ScottTodd
Copy link
Member

On my Windows machine I see the compiler crashing on these test cases at tip of tree and with this PR:

FAILED onnx/node/generated/test_shape_end_1/model.mlir::cpu_llvm_sync
FAILED onnx/node/generated/test_shape_start_1_end_negative_1/model.mlir::cpu_llvm_sync
FAILED onnx/node/generated/test_shape_start_1/model.mlir::cpu_llvm_sync
FAILED onnx/node/generated/test_shape_start_negative_1/model.mlir::cpu_llvm_sync
FAILED onnx/node/generated/test_shape_start_1_end_2/model.mlir::cpu_llvm_sync
FAILED onnx/node/generated/test_shape_end_negative_1/model.mlir::cpu_llvm_sync

So the CPU failures on CI here are strange... Could skip them instead of XFAIL to keep the same set of tests running across platforms.

@ScottTodd
Copy link
Member

Would really help to have asserts enabled in the pkgci build that these tests use. That is disabled right now: https://github.com/openxla/iree/blob/ee32fc7b74b4aad49fc46d67ea6c1f617416c4d4/.github/workflows/pkgci_build_packages.yml#L81-L83

@stellaraccident
Copy link
Collaborator Author

Would really help to have asserts enabled in the pkgci build that these tests use. That is disabled right now:

https://github.com/openxla/iree/blob/ee32fc7b74b4aad49fc46d67ea6c1f617416c4d4/.github/workflows/pkgci_build_packages.yml#L81-L83

Fwiw, I think I fixed the issue behind that todo un another setup. But yeah.

Ok, I'm taking all of this as orthogonal to this patch (but still needs addressing). How do I mark those tests? (Sorry, running a few things here...)

@ScottTodd
Copy link
Member

How do I mark those tests?

https://github.com/openxla/iree/tree/main/experimental/regression_suite/external_test_suite the CPU and Vulkan JSON files. Can add to either skip_compile_tests to exclude entirely or expected_run_failures.

CPU should add these:

test_constantofshape_float_ones
test_dropout_default_mask_ratio
test_reduce_min_empty_set

Vulkan should add these:

test_constantofshape_float_ones
test_dropout_default_mask_ratio

@ScottTodd ScottTodd added infrastructure Relating to build systems, CI, or testing performance ⚡ Performance/optimization related work across the compiler and runtime labels Mar 18, 2024
@ScottTodd
Copy link
Member

Okay, CI is happy now.

@ScottTodd ScottTodd merged commit b61a918 into main Mar 18, 2024
58 checks passed
@ScottTodd ScottTodd deleted the runtime_lto branch March 18, 2024 19:52
stellaraccident added a commit that referenced this pull request Mar 19, 2024
This is done by generalizing the primordial `IREE_SIZE_OPTIMIZED` flag
into a `IREE_RUNTIME_OPTIMIZATION_PROFILE` that:

* Can enable 'lto' or 'size'.
* Is scoped to just the runtime targets.
* Minimally does the right thing for 'size' on Linux vs just on Windows
(not the goal of this patch but drops ~300KB from binary sizes when
enabled).

The compile time delta for a clean build of the runtime in full LTO vs
regular mode was not measured precisely but is in the noise (i.e. <1m).
As such, just enabling by default for Python release binaries.

Others can be enabled via: `-DIREE_RUNTIME_OPTIMIZATION_PROFILE=lto`,
which is recommended for benchmarking, etc.

Note that this removes the use of the CMake option
`IREE_SIZE_OPTIMIZED`. It was never even declared properly as an option
and didn't do the same class of thing across Windows/Linux. This has
been fixed and it can be enabled via
`-DIREE_RUNTIME_OPTIMIZATION_PROFILE=size`. Note that as on Windows,
this implies LTO. If old behavior without LTO is desired, we can add a
profile for that.

Progress on #898.

---------

Co-authored-by: Scott Todd <[email protected]>
Comment on lines 24 to +26
-DIREE_SIZE_OPTIMIZED=ON \
-DIREE_FORCE_LTO_COMPAT_BINUTILS_ON_LINUX=size \
-DIREE_FORCE_GCC_BINUTILS_ON_LINUX=ON \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been

-  -DIREE_SIZE_OPTIMIZED=ON \
-  -DIREE_FORCE_LTO_COMPAT_BINUTILS_ON_LINUX=size \
-  -DIREE_FORCE_GCC_BINUTILS_ON_LINUX=ON \
+  -DIREE_RUNTIME_OPTIMIZATION_PROFILE=lto" \
+  -DIREE_FORCE_LTO_COMPAT_BINUTILS_ON_LINUX=ON" \

Fixing in #18164

ScottTodd added a commit that referenced this pull request Aug 8, 2024
Similar to #18162, this reduces the
reliance on scripts in
[`build_tools/cmake/`](https://github.com/iree-org/iree/tree/main/build_tools/cmake)
by inlining the CMake commands used for these build configurations.

Summary of changes:

* Deleted `build_tools/cmake/build_runtime_small.sh` and
`build_tools/cmake/build_runtime_tracing.sh`
* Fixed size-optimized ("small") options used in the "small runtime" job
and resolved an unused variable warning/error in
`runtime/src/iree/hal/drivers/hip/dynamic_symbols.c` that slipped
through. When we landed #16811, it
used old CMake variables in the
`build_tools/cmake/build_runtime_small.sh` script. Oops!
* Renamed jobs so they all begin with `runtime`

    | name before | name after |
    -- | --
    `build_test_runtime` | `runtime`
    `small_runtime` | `runtime_small`
     `tracing` | `runtime_tracing`

Seeing these runtime jobs next to each other, we could expand the build
matrix even further ("all platforms" x "all configurations"). Leaving
that for the future. Now it's easier to see what the jobs have in common
and how they differ.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infrastructure Relating to build systems, CI, or testing performance ⚡ Performance/optimization related work across the compiler and runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants