Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot clone_in() two different functions. #6476

Open
mcourteaux opened this issue Dec 7, 2021 · 4 comments
Open

Cannot clone_in() two different functions. #6476

mcourteaux opened this issue Dec 7, 2021 · 4 comments

Comments

@mcourteaux
Copy link
Contributor

class GeneratorBug : public Generator<GeneratorBug> {
   public:
    Input<Buffer<float>> input1{"input1", 2};
    Output<Buffer<float>> output1{"output1", 1};
    Output<Buffer<float>> output2{"output2", 1};

    Var k{"kernel"};

    Func intermediate{"intermediate"};

    void generate() {
        intermediate(k) = input1(k, 0) * input1(k, 1);
        output1(k) = intermediate(k) * intermediate(k);
        output2(k) = intermediate(k) + intermediate(k);
    }

    void schedule() {
        // clang-format off
        Func intm_clone_1 = intermediate.clone_in(output1);
        Func intm_clone_2 = intermediate.clone_in(output2); // Error (see below for stack trace)

        intm_clone_1.compute_root();
        intm_clone_2.compute_root();
        // clang-format on
    }
};

HALIDE_REGISTER_GENERATOR(GeneratorBug, generator_bug)

Gives:

Unhandled exception: Internal Error at /home/martijn/w/3rd/Halide/src/Schedule.cpp:348 triggered by user code at : Condition failed: copied_func.defined(): intermediate_clone_in_output1$0

Stack trace, right before assertion:

Thread 1 "generator_em" hit Breakpoint 1, Halide::Internal::FuncSchedule::deep_copy (this=0x5555562f4a50, copied_map=std::map with 2 elements = {...}) at /home/martijn/w/3rd/Halide/src/Schedule.cpp:348
348	       internal_assert(copied_func.defined()) << Function(iter.second).name() << "\n";
(gdb) bt
#0  Halide::Internal::FuncSchedule::deep_copy (this=0x5555562f4a50, copied_map=std::map with 2 elements = {...}) at /home/martijn/w/3rd/Halide/src/Schedule.cpp:348
#1  0x00007ffff23bbd34 in Halide::Internal::Function::deep_copy (this=0x7fffffffc0c0, copy=..., copied_map=std::map with 2 elements = {...}) at /home/martijn/w/3rd/Halide/src/Function.cpp:358
#2  0x00007ffff23bc324 in Halide::Internal::Function::deep_copy (this=0x7fffffffc0c0, name="intermediate_clone_in_output2$1", copy=..., copied_map=std::map with 2 elements = {...}) at /home/martijn/w/3rd/Halide/src/Function.cpp:382
#3  0x00007ffff2389451 in Halide::(anonymous namespace)::create_clone_wrapper (wrapped_fn=..., wrapper_name="intermediate_clone_in_output2$1") at /home/martijn/w/3rd/Halide/src/Func.cpp:1954
#4  0x00007ffff23899eb in Halide::(anonymous namespace)::get_wrapper (wrapped_fn=..., wrapper_name="intermediate_clone_in_output2$1", fs=std::vector of length 1, capacity 1 = {...}, clone=true) at /home/martijn/w/3rd/Halide/src/Func.cpp:1977
#5  0x00007ffff238a6e0 in Halide::Func::clone_in (this=0x555555974780, f=...) at /home/martijn/w/3rd/Halide/src/Func.cpp:2023
#6  0x00005555555718f8 in GeneratorBug::schedule (this=0x555555974440) at /usr/include/c++/9/bits/atomic_base.h:318

Analyzing which one of the two clone_in()s it is:

(gdb) f 5
#5  0x00007ffff238a6e0 in Halide::Func::clone_in (this=0x555555974780, f=...) at /home/martijn/w/3rd/Halide/src/Func.cpp:2023
2023	   return get_wrapper(func, name() + "_clone_in_" + f.name(), fs, true);
(gdb) p *this
$1 = {func = {contents = {strong = {ptr = 0x555555e38740}, weak = 0x0, idx = 0}}, pipeline_ = {contents = {ptr = 0x0}}}
(gdb) p this->name()
$2 = "intermediate"
(gdb) p f
$3 = (const Halide::Func &) @0x7fffffffc420: {func = {contents = {strong = {ptr = 0x5555560d7600}, weak = 0x0, idx = 0}}, pipeline_ = {contents = {ptr = 0x0}}}
(gdb) p f.name()
$4 = "output2"
(gdb) 

Reveals that it's the second clone_in() call, i.e.: the one on output2, yet the error message is talking about intermediate_clone_in_output1 (notice output1!). So it seems that the first call to clone_in(output1) seems to change intermediate globally, instead of only in the context of output1.

@mcourteaux
Copy link
Contributor Author

I kinda think that the second clone_in() deep copies the schedule associated with the existing first wrapper along, which is unintended here I believe.

wrapped_fn.deep_copy(wrapper.name(), wrapper.function().get_contents(), remapping);

which goes to this line in Function::deep_copy():

copy->func_schedule = contents->func_schedule.deep_copy(copied_map);

Copying over the wrapper schedule inserted by:

Halide/src/Function.cpp

Lines 980 to 991 in fb305fd

void Function::add_wrapper(const std::string &f, Function &wrapper) {
wrapper.freeze();
FunctionPtr ptr = wrapper.contents;
// Weaken the pointer from the function to its wrapper
ptr.weaken();
contents->func_schedule.add_wrapper(f, ptr);
// Weaken the pointer from the wrapper back to the function.
WeakenFunctionPtrs weakener(contents.get());
wrapper.mutate(&weakener);
}

Called from:

Halide/src/Func.cpp

Lines 1977 to 1988 in fb305fd

Func wrapper = clone ? create_clone_wrapper(wrapped_fn, wrapper_name) : create_in_wrapper(wrapped_fn, wrapper_name);
Function wrapper_fn = wrapper.function();
if (fs.empty()) {
// Add global wrapper
wrapped_fn.add_wrapper("", wrapper_fn);
} else {
for (const Func &f : fs) {
user_assert(wrapped_fn.name() != f.name())
<< "Cannot create wrapper of itself (\"" << wrapped_fn.name() << "\")\n";
wrapped_fn.add_wrapper(f.name(), wrapper_fn);
}
}

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Dec 7, 2021

I don't understand why a clone_in() should register a wrapper with the original function. I feel like clone_in() should be equivalent to copy-pasting the code for that Func in the generator, and have it be a completely separate Func. No link to the original one. Am I missing something, or is this the actual bug: Func get_wrapper(Function wrapped_fn, string wrapper_name, const vector<Func> &fs, bool clone) always registering a wrapper (both for .clone_in(Func) and for .in(Func))?

(Although, me not understand why, might be due to the fact that I don't know what the purpose of the registering is whatsoever. It just seems odd that a deep copy of a function still has a reference/connection/link to the original function.)

@abadams
Copy link
Member

abadams commented Dec 7, 2021

I think it's more that the list of wrappers is also used to store the list of clones. When lowering, consumers should call the appropriate clone instead of the original Func.

If your diagnosis is correct, then using the schedule of the first wrapper definitely seems like the wrong thing.

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Dec 7, 2021

As extra info: the use case is to preload two buffers (length n and k) into shared memory of GPU that will do computations on the outer product of those two buffers (yielding a function with n*k elements). Computing these n*k elements everywhere they are used in the pipeline is much faster than compute_root() them and have n*k memory accesses for loading them. Using the preload I have n+k global memory reads, and just redo the computation. This outer product is used multiple times, so I want to schedule the preload appropriately for each call site. My use case here is to do the outer_product.clone_in(call_site_1); and the same for call_site_2.


Currently I just duplicated the "outer product" function just 3 times. One instance per call site.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants