Halide v15.0.1
steven-johnson released this 07 Apr 23:21
What's Changed
- The Python binding of compile_to_callable() was not properly copying from device to host for output buffers, so output was typically black (or garbage) when used with a GPU target. (#7213)
- The bin directory was missing from the installs.
- Upgraded LLVM to 15.0.7
- New in 15.0.0, but restated here for visibility: the target flag disable_llvm_loop_opt is deprecated, because disabling LLVM's loop optimizations is now the default behavior. This means that LLVM's autovectorization and loop unrolling are turned off. This should not affect schedules with manually specified vectorization and unrolling, other than trimming code size a little. However, schedules that do not vectorize or unroll may slow down because they were (intentionally or not) relying on LLVM to do it automatically. If you see a performance regression with Halide 15, try turning on the enable_llvm_loop_opt target flag.
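
As a rough illustration of the last point, the sketch below rewrites a Halide target string so that it requests enable_llvm_loop_opt and drops the deprecated disable_llvm_loop_opt. It only manipulates strings and does not import Halide. The helper name with_llvm_loop_opt is hypothetical (not part of Halide), and the sketch assumes the usual hyphen-separated target string form such as host-cuda.

```python
# Hypothetical helper, not part of Halide: it edits a target string only.
DEPRECATED_FLAG = "disable_llvm_loop_opt"  # deprecated as of 15.0.0
ENABLE_FLAG = "enable_llvm_loop_opt"       # opts back in to LLVM loop opts


def with_llvm_loop_opt(target: str = "host") -> str:
    """Return `target` with enable_llvm_loop_opt requested exactly once.

    Assumes `target` is a hyphen-separated Halide target string,
    for example "host" or "host-cuda".
    """
    # Split into components and drop the deprecated flag if present.
    parts = [p for p in target.split("-") if p and p != DEPRECATED_FLAG]
    # Append the enabling flag only if it is not already there.
    if ENABLE_FLAG not in parts:
        parts.append(ENABLE_FLAG)
    return "-".join(parts)


if __name__ == "__main__":
    print(with_llvm_loop_opt("host"))
    print(with_llvm_loop_opt("host-cuda-disable_llvm_loop_opt"))
```

The resulting string should be usable wherever a target string is normally supplied, for example a generator's target= argument or the HL_TARGET / HL_JIT_TARGET environment variables. Check that against your own build setup before relying on it.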