llama : refactor llama_build_graph to reduce code duplication #3382
Comments
Something I am thinking we should consider in the scope of this issue is decoupling the creation of the input tensors in the build functions from their allocation and initialization. For example:

```cpp
// current llm_build_llama()

struct ggml_tensor * inp_tokens = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, n_tokens);
ggml_allocr_alloc(lctx.alloc, inp_tokens);
if (!ggml_allocr_is_measure(lctx.alloc)) {
    memcpy(inp_tokens->data, batch.token, n_tokens*ggml_element_size(inp_tokens));
}
ggml_set_name(inp_tokens, "inp_tokens");

// ------

// new llm_build_llama()

struct ggml_tensor * inp_tokens = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, n_tokens);
ggml_set_name(inp_tokens, "inp_tokens");

// new llm_setup_llama()

ggml_tensor * inp_tokens = ggml_get_tensor(ctx, "inp_tokens");
ggml_allocr_alloc(lctx.alloc, inp_tokens);
if (!ggml_allocr_is_measure(lctx.alloc)) {
    memcpy(inp_tokens->data, batch.token, n_tokens*ggml_element_size(inp_tokens));
}
```

Having build functions that do not rely on the state of the allocator would facilitate some things around estimating the required memory.

(cc @slaren for thoughts)
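As a rough sketch of how the two pieces could be driven from the decode path (the wrapper name and the exact signatures below are hypothetical; only llm_build_llama() and llm_setup_llama() come from the snippet above):

```cpp
// Hypothetical glue code, assuming the split described above: the build step
// only creates and names tensors, the setup step allocates the inputs and
// copies the batch data, and the remaining tensors are allocated afterwards.
static struct ggml_cgraph * llama_build_and_setup(llama_context & lctx, const llama_batch & batch) {
    struct ggml_cgraph * gf = llm_build_llama(lctx, batch); // no allocator calls inside
    llm_setup_llama(lctx, batch);                           // ggml_get_tensor + ggml_allocr_alloc + memcpy
    ggml_allocr_alloc_graph(lctx.alloc, gf);                // allocate everything else (or measure)
    return gf;
}
```

In measure mode the setup step would still skip the memcpy, but the graph construction itself no longer depends on the allocator state, which is what makes memory estimation simpler.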
I think it would be good to pre-allocate all the input and output tensors in a different buffer. In this way, these tensors would always be allocated, and the ggml_allocr_is_measure checks shown above would not be needed. This was already done in the first version (llama.cpp, Lines 2631 to 2651 in d273bfd).
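One way this could look, as a minimal sketch with made-up names (assuming the inputs live in a small dedicated ggml context whose buffer is allocated up front):

```cpp
// Hypothetical: hold the input tensors in their own ggml context/buffer so that
// they are always allocated, independent of ggml-alloc and of measure mode.
struct llama_inputs {
    struct ggml_context * ctx        = nullptr;
    struct ggml_tensor  * inp_tokens = nullptr;
};

static llama_inputs llama_init_inputs(int n_tokens_max) {
    struct ggml_init_params params = {
        // tensor metadata + data for one I32 tensor, plus some headroom
        /*.mem_size   =*/ 1024 + n_tokens_max*sizeof(int32_t) + ggml_tensor_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false, // tensor data is placed in this buffer and is always valid
    };

    llama_inputs inp;
    inp.ctx        = ggml_init(params);
    inp.inp_tokens = ggml_new_tensor_1d(inp.ctx, GGML_TYPE_I32, n_tokens_max);
    ggml_set_name(inp.inp_tokens, "inp_tokens");
    return inp;
}
```

The build functions would then reference these pre-existing tensors instead of creating new ones, and copying the batch data into them needs no allocator guard.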
I don't write much C++, but I'm happy to take a stab at this.
With the support of new model architectures, we are starting to observe a lot of repeating patterns in the code for building their compute graphs. We should find a way to refactor and reuse the repetitive code, and we should also consider splitting the implementation into separate source files if necessary.
llama.cpp/llama.cpp, Lines 3997 to 4026 in 0e76a89
Open to ideas and suggestions.
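As one illustration of the kind of reuse that could fall out of this (the names and signatures here are made up, not a concrete proposal): the per-layer normalization and feed-forward blocks that repeat across the llm_build_* functions could be pulled into shared helpers.

```cpp
// Illustrative helpers for patterns repeated in the per-architecture builders.

// RMS-norm followed by multiplication with the per-layer norm weight:
static struct ggml_tensor * llm_build_norm(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,
        struct ggml_tensor  * norm_w,
        float                 eps) {
    cur = ggml_rms_norm(ctx, cur, eps);
    return ggml_mul(ctx, cur, norm_w);
}

// SwiGLU feed-forward block used by several of the supported architectures:
static struct ggml_tensor * llm_build_ffn(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,
        struct ggml_tensor  * w_gate,
        struct ggml_tensor  * w_up,
        struct ggml_tensor  * w_down) {
    struct ggml_tensor * gate = ggml_silu(ctx, ggml_mul_mat(ctx, w_gate, cur));
    struct ggml_tensor * up   = ggml_mul_mat(ctx, w_up, cur);
    cur = ggml_mul(ctx, gate, up);
    return ggml_mul_mat(ctx, w_down, cur);
}
```

Each architecture's builder would then compose such blocks instead of open-coding them, which would also keep the graphs easier to compare across models.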