ASAN reports out-of-bounds read error in anderson2021_test_apps_autoscheduler #7606

Closed
steven-johnson opened this issue Jun 1, 2023 · 23 comments · Fixed by #7771

@steven-johnson
Contributor

ASAN is reporting an out-of-bounds read error in SearchSpace.cpp:125; examination reveals that we are reading past the end of a std::vector (in this case, c->size.size() is 1, but l.pure_dim is also 1, so the index is one past the last valid element)... so this is a real bug.

Unfortunately I have no idea where to even begin debugging this... assigning to @abadams for now to see if he can route it to the right person.

(ASAN isn't necessary to repro this bug, just add something like internal_assert(l.pure_dim < c->size.size()); in the code)

(But if you want to run with asan, just add --preset linux-x64-asan to the cmake and ctest commands)
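
The check suggested above can be illustrated outside Halide. What follows is a minimal standalone sketch, not the autoscheduler's code: `LoopInfo`, `Bounds`, and `checked_size` are hypothetical stand-ins for the objects named in the report, and a plain exception substitutes for `internal_assert`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the objects named in the report.
struct LoopInfo {
    int pure_dim;  // index used to look up an extent
};
struct Bounds {
    std::vector<int64_t> size;  // extents that pure_dim is used to index
};

// Return c.size[l.pure_dim], throwing instead of reading out of bounds.
// The exception stands in for Halide's internal_assert (an assumption
// made so that this sketch compiles without Halide).
inline int64_t checked_size(const Bounds &c, const LoopInfo &l) {
    if (l.pure_dim < 0 || static_cast<size_t>(l.pure_dim) >= c.size.size()) {
        throw std::out_of_range("pure_dim is not a valid index into size");
    }
    return c.size[static_cast<size_t>(l.pure_dim)];
}
```

With a one-element `size` vector, index 0 is returned normally, while index 1 (the situation described in this issue) is rejected before any memory is read.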

@abadams
Member

abadams commented Jun 2, 2023

Ideally @aekul could take a look.

@Yongqi-Zhuo
Contributor

Any updates on this? :)
Looking forward to using this in production

@aekul
Contributor

aekul commented Jul 7, 2023

Will take a look at this in the next few days.

@aekul
Contributor

aekul commented Jul 11, 2023

I'm not able to reproduce this. I added an assert and built Halide with CMake, but things work fine. Is that all I should have to do to reproduce it? @steven-johnson or @Yongqi-Zhuo, are you using the default settings for the autoscheduler?

@steven-johnson
Contributor Author

You need to run with ASAN (Address Sanitizer) to repro. Specify --preset linux-x64-asan as mentioned above.

@aekul
Contributor

aekul commented Jul 18, 2023

I'm having trouble getting ASAN to build. I've tried LLVM 15 and 16, but can't build either of them. Error is:

CMake Error at cmake/Modules/CompilerRTUtils.cmake:434 (string):
  string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
  CMakeLists.txt:118 (construct_compiler_rt_default_triple)

Tried Googling but can't find any way to resolve it. "compiler-rt" seems to be the issue. Is -DLLVM_ENABLE_RUNTIMES="compiler-rt" required for ASAN?

@steven-johnson
Contributor Author

Yikes, yes, sorry (I thought this was in the readme)... for ASAN you must build LLVM with

-D LLVM_ENABLE_PROJECTS="clang;lld;clang-tools-extra" 
-D LLVM_ENABLE_RUNTIMES="compiler-rt;libcxx;libcxxabi;libunwind" 

(I have only tested recently with top-of-tree LLVM but 16 should work)

@aekul
Contributor

aekul commented Jul 24, 2023

Building LLVM works with those changes, thanks. But now when I compile Halide, I get Float16.h:43:24: error: _Float16 is not supported on this target when using --preset linux-x64-asan or --preset release. Any ideas? I'm using a PowerPC machine in case that's relevant.

@steven-johnson
Contributor Author

As you may surmise from the name, the linux-x64-asan preset has only been designed/tested for use on x86-64 Linux machines, so this failure isn't surprising.

(Frankly, though, I'm surprised that the Float16 stuff isn't failing for you on PowerPC in general, since that support is fairly specific to x86 / ARM IIUC, and we do basically zero testing of our PowerPC backend at this point.)

@antonysigma
Contributor

antonysigma commented Jul 24, 2023

@steven-johnson @aekul Here is a minimal repro that triggers the ASAN error with the Anderson2021 autoscheduler: #7699

The same code did not trigger ASAN error with the Mullapudi2016 autoscheduler.

I am also open to a code rewrite/refactoring to bypass the ASAN error.

@steven-johnson
Contributor Author

See also #7699, which may or may not be the same bug. (The case for 7699 is that we have site(s) that incorrectly assume gpu_loop_info.thread_info can never be null.)

@steven-johnson
Contributor Author

See #7703

@steven-johnson
Contributor Author

And also #7706

@aekul
Contributor

aekul commented Aug 6, 2023

I'm still having issues getting ASAN to run. Is this error still occurring with the recent changes?

@steven-johnson
Contributor Author

> I'm still having issues getting ASAN to run. Is this error still occurring with the recent changes?

I have not re-tested with current top-of-tree; I can do so today.

Can you be more specific about ASAN problems? (It is rather painful to make work unfortunately, but with the right recipe it actually is reliable on Linux x64 machines... and now that you mention it, the right recipe should be documented. I'll do that too.)

@steven-johnson
Contributor Author

Unfortunately, at today's top-of-tree, we still fail; building and testing with --preset linux-x64-asan yields this failure:

==4092472==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x502000efe578 at pc 0x7fb275b816ef bp 0x7ffe9ec4d330 sp 0x7ffe9ec4d328
READ of size 8 at 0x502000efe578 thread T0
    #0 0x7fb275b816ee in Halide::Internal::Autoscheduler::SearchSpace::filter_parallel_tile_options(Halide::Internal::IntrusivePtr<Halide::Internal::Autoscheduler::State> const&, Halide::Internal::Autoscheduler::FunctionDAG::Node const*, std::vector<std::vector<long, std::allocator<long>>, std::allocator<std::vector<long, std::allocator<long>>>>&, std::vector<long, std::allocator<long>> const&) const /usr/local/google/home/srj/GitHub/Halide/src/autoschedulers/anderson2021/SearchSpace.cpp:133:42
    #1 0x7fb275b85772 in Halide::Internal::Autoscheduler::SearchSpace::generate_children(Halide::Internal::IntrusivePtr<Halide::Internal::Autoscheduler::State> const&, std::function<void (Halide::Internal::IntrusivePtr<Halide::Internal::Autoscheduler::State>&&)>&, int, bool) /usr/local/google/home/srj/GitHub/Halide/src/autoschedulers/anderson2021/SearchSpace.cpp:536:28
    #2 0x7fb275a62972 in Halide::Internal::Autoscheduler::AutoSchedule::optimal_schedule_pass(int, int, int, Halide::Internal::Autoscheduler::ProgressBar&, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long>>&) /usr/local/google/home/srj/GitHub/Halide/src/autoschedulers/anderson2021/AutoSchedule.cpp:372:26
    #3 0x7fb275a64dd9 in Halide::Internal::Autoscheduler::AutoSchedule::optimal_schedule(int) /usr/local/google/home/srj/GitHub/Halide/src/autoschedulers/anderson2021/AutoSchedule.cpp:522:21
    #4 0x7fb275a66c44 in Halide::Internal::Autoscheduler::generate_schedule(std::vector<Halide::Internal::Function, std::allocator<Halide::Internal::Function>> const&, Halide::Target const&, Halide::Internal::Autoscheduler::Anderson2021Params const&, Halide::AutoSchedulerResults*) /usr/local/google/home/srj/GitHub/Halide/src/autoschedulers/anderson2021/AutoSchedule.cpp:623:28
    #5 0x7fb275a82397 in Halide::Internal::Autoscheduler::Anderson2021::operator()(Halide::Pipeline const&, Halide::Target const&, Halide::AutoschedulerParams const&, Halide::AutoSchedulerResults*) /usr/local/google/home/srj/GitHub/Halide/src/autoschedulers/anderson2021/AutoSchedule.cpp:709:9
    #6 0x7fb27d2c0ff3 in std::function<void (Halide::Pipeline const&, Halide::Target const&, Halide::AutoschedulerParams const&, Halide::AutoSchedulerResults*)>::operator()(Halide::Pipeline const&, Halide::Target const&, Halide::AutoschedulerParams const&, Halide::AutoSchedulerResults*) const /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:591:9
    #7 0x7fb27d2c0ff3 in Halide::Pipeline::apply_autoscheduler(Halide::Target const&, Halide::AutoschedulerParams const&) const /usr/local/google/home/srj/GitHub/Halide/src/Pipeline.cpp:249:5
    #8 0x564f8107e042 in main /usr/local/google/home/srj/GitHub/Halide/test/autoschedulers/anderson2021/test.cpp:144:21
    #9 0x7fb278846189 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #10 0x7fb278846244 in __libc_start_main csu/../csu/libc-start.c:381:3
    #11 0x564f80f8ea50 in _start (/usr/local/google/home/srj/GitHub/Halide/build/linux-x64-asan/test/autoschedulers/anderson2021/anderson2021_test_apps_autoscheduler+0x81a50)

0x502000efe578 is located 0 bytes after 8-byte region [0x502000efe570,0x502000efe578)
allocated by thread T0 here:
    #0 0x564f81066ebd in operator new(unsigned long) /usr/local/google/home/srj/GitHub/llvm-project/18/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
    #1 0x7fb275b73573 in std::__new_allocator<long>::allocate(unsigned long, void const*) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/new_allocator.h:137:27
    #2 0x7fb275b73573 in std::allocator_traits<std::allocator<long>>::allocate(std::allocator<long>&, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/alloc_traits.h:464:20
    #3 0x7fb275b73573 in std::_Vector_base<long, std::allocator<long>>::_M_allocate(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_vector.h:378:20
    #4 0x7fb275b73573 in std::vector<long, std::allocator<long>>::_M_default_append(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/vector.tcc:650:34

SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/local/google/home/srj/GitHub/Halide/src/autoschedulers/anderson2021/SearchSpace.cpp:133:42 in Halide::Internal::Autoscheduler::SearchSpace::filter_parallel_tile_options(Halide::Internal::IntrusivePtr<Halide::Internal::Autoscheduler::State> const&, Halide::Internal::Autoscheduler::FunctionDAG::Node const*, std::vector<std::vector<long, std::allocator<long>>, std::allocator<std::vector<long, std::allocator<long>>>>&, std::vector<long, std::allocator<long>> const&) const
Shadow bytes around the buggy address:
  0x502000efe280: fa fa fd fd fa fa fa fa fa fa fa fa fa fa fd fd
  0x502000efe300: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fa fa
  0x502000efe380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000efe400: fa fa fd fd fa fa fd fd fa fa fd fa fa fa fd fd
  0x502000efe480: fa fa fd fd fa fa fa fa fa fa fd fd fa fa fd fa
=>0x502000efe500: fa fa fd fd fa fa fa fa fa fa fd fd fa fa 00[fa]
  0x502000efe580: fa fa fd fd fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000efe600: fa fa fd fa fa fa fd fa fa fa fa fa fa fa fd fa
  0x502000efe680: fa fa fa fa fa fa fd fd fa fa fa fa fa fa fd fd
  0x502000efe700: fa fa fa fa fa fa fd fd fa fa fd fd fa fa fd fd
  0x502000efe780: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==4092472==ABORTING
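
The shadow dump above can be read with a little arithmetic. The sketch below is only an aid for interpreting this particular report: `shadow_column` is a hypothetical helper name, and it assumes that the address printed at the start of each row is the application address covered by the row's first shadow byte, which matches the 0x80 step between rows of 16 shadow bytes.

```cpp
#include <cassert>
#include <cstdint>

// The legend in the report says one shadow byte covers 8 application bytes.
constexpr uint64_t kBytesPerShadowByte = 8;

// Given the address printed at the start of a shadow row and an application
// address inside that row, return which of the row's shadow bytes covers it
// (0-based column).
constexpr uint64_t shadow_column(uint64_t row_base, uint64_t addr) {
    return (addr - row_base) / kBytesPerShadowByte;
}

// The row marked "=>" starts at 0x502000efe500 and shows 16 shadow bytes.
// The 8-byte allocation begins at 0x502000efe570: column 14, printed as 00.
static_assert(shadow_column(0x502000efe500, 0x502000efe570) == 14, "");
// The faulting read is at 0x502000efe578: column 15, printed as [fa].
static_assert(shadow_column(0x502000efe500, 0x502000efe578) == 15, "");
```

So the `00` in the marked row is the vector's single addressable 8-byte element, and the bracketed `fa` right after it is the redzone that the 8-byte read landed in, consistent with "0 bytes after 8-byte region".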

@steven-johnson
Contributor Author

#7748 should make this easier to track down

@steven-johnson
Contributor Author

The new failure is occurring in this sub-test:

        // A Func with multiple stages, some of which include additional loops
        if (true) {
            Buffer<float> a(1024, 1024);
            Func f("multiple_stages"), g("g"), h("h");
            Var x, y;
            h(x, y) = pow(x, y);
            f(x, y) = a(x, y) * 2;
            f(x, y) += 17;
            RDom r(0, 10);
            f(x, y) += r * h(x, y);
            f(x, y) *= 2;
            f(0, y) = 23.0f;
            g(x, y) = f(x - 1, y - 1) + f(x + 1, y + 1);

            g.set_estimate(x, 1, 1022).set_estimate(y, 1, 1022);

            Pipeline(g).apply_autoscheduler(target, params);
        }

...with a similar issue as the previous one, IIRC: in SearchSpace::filter_parallel_tile_options(), we end up with l.pure_dim == 1 and c->size.size() == 1, so we read past the end.

@steven-johnson
Contributor Author

Hi @aekul -- I took a look at this one remaining (known) failure but (again) the nature of what's going on isn't entirely clear to me. If you have any bandwidth to at least point me in a useful direction for debugging, I will try to fix it. (I really want to get this autoscheduler properly enabled for automated testing and use, but I am understandably reluctant to do so while we still have known crashy/asan bugs in it.) Thanks for any advice you can give on this.

@aekul
Contributor

aekul commented Aug 15, 2023

I was getting error: _Float16 is not supported on this target when using --preset linux-x64-asan on PowerPC. Does ASAN work on macOS? Or is it expected to fail with an assertion without ASAN? If there's a way for me to repro it, I can look into it.

If there isn't, then maybe you could provide me with some more debugging output e.g. which Func + stage is it failing on? I'm guessing it's something to do with the RDom or maybe the f(0, y) update stage (y's pure dimension is likely 1 but there's no loop over x so c->size.size() could be 1 too) so a more minimal failure case might help too.
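
The guess about the f(0, y) update stage can be made concrete with a toy model. This is purely illustrative and rests on assumptions: `Loop`, `loops_for_stage`, and `pure_dims_fit_per_loop_vector` are hypothetical names, not Halide's data structures, and the model simply assumes that a loop remembers its variable's position in the Func's argument list while the indexed vector has one entry per loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model only; these are not Halide's data structures.
struct Loop {
    std::string var;
    int pure_dim;  // position of var in the Func's pure argument list
};

// Build the loop list for a stage, given the Func's pure arguments and,
// for each argument, whether the stage actually iterates over it.
inline std::vector<Loop> loops_for_stage(const std::vector<std::string> &pure_args,
                                         const std::vector<bool> &iterated) {
    std::vector<Loop> loops;
    for (size_t i = 0; i < pure_args.size(); i++) {
        if (iterated[i]) {
            loops.push_back({pure_args[i], static_cast<int>(i)});
        }
    }
    return loops;
}

// True if every loop's pure_dim is a valid index into a vector that holds
// one entry per loop.
inline bool pure_dims_fit_per_loop_vector(const std::vector<Loop> &loops) {
    for (const Loop &l : loops) {
        if (static_cast<size_t>(l.pure_dim) >= loops.size()) {
            return false;
        }
    }
    return true;
}
```

Under this model, a stage that iterates over both x and y is fine, but a stage like f(0, y) has a single loop over y whose pure_dim is 1, so indexing a one-element per-loop vector with it would read past the end.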

@steven-johnson
Contributor Author

Per previous discussions, our ASAN support is currently limited to x86-64-linux. It could probably be made to work on x86-64-osx, but you are literally the only person I know who is building/testing on PowerPC, so I can't help much there.

It sounds like you don't have an x86-64-linux machine available? I'll try to get a repro case to you soon.

@steven-johnson
Contributor Author

FYI: At current top-of-tree, we now fail via internal_assert so you don't need ASAN to repro it. (I added it in #7748 specifically for this purpose).

To avoid noise, I commented out all the other subtests (aside from the one I quoted in the earlier comment) and ran with HL_DEBUG_CODEGEN=4 and captured the output here: https://gist.github.com/steven-johnson/ff76ad4e986f48b5e4570743760651e0 (a bit too long to paste into a comment here)

@aekul
Contributor

aekul commented Aug 17, 2023

Ah, I can repro with the internal_assert. Will look into it.
