Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

struct ggml_tensor -> add a struct meta pointer into it (can replace padding) #1093

Closed
cmp-nct opened this issue Apr 20, 2023 · 5 comments
Closed
Labels
enhancement New feature or request high priority Very important issue

Comments

@cmp-nct
Copy link
Contributor

cmp-nct commented Apr 20, 2023

Given that the tensor struct uses padding it's not nice to add any more information into it.
It currently has a static 8 byte padding at the end, that's perfect to be replaced by a pointer to a struct to store additional information.
For example a optional human readable tensor name (to be used in graph print) or a couple u_int8 to switch on or off features by tensor.
For example: use_cublas=0
This would allow to fine-control the usage of such a library instead of hard-coding it on through a define flag.
For example: performance_type=HIGH/LOW/MID
For example: threads_override=2

use_cublas could be initialized depending on the define as 1/0.

I'd also move the task scheduling n_task out and the performance stuff
The overhead to access a compact external struct should be zero.

I suppose all that stuff could also be added directly into the tensor struct but if we have to keep it aligned that's not nice.
Especially given 64/32 bit environments with different pointer sizes etc.

Just as an example, giving tensors a name can look like that in the debug output:

 - 842: [    4096,       1,       1]          MUL_MAT   (  1) cpu =   0.000 /   0.000 ms, wall =   0.595 /   0.595 ms [wo x cur 22]
 - 843: [    4096,       1,       1]              ADD   (  1) cpu =   0.000 /   0.000 ms, wall =   0.008 /   0.008 ms [noname]
 - 844: [    4096,       1,       1]         RMS_NORM   (  1) cpu =   0.000 /   0.000 ms, wall =   0.015 /   0.015 ms [noname]
 - 845: [    4096,       1,       1]              MUL   (  1) cpu =   0.000 /   0.000 ms, wall =   0.004 /   0.004 ms [noname]
 - 846: [   11008,       1,       1]          MUL_MAT   (  1) cpu =   2.000 /   2.000 ms, wall =   1.584 /   1.584 ms [w1 x cur 22] (Hot)
 - 847: [   11008,       1,       1]             SILU   (  1) cpu =   0.000 /   0.000 ms, wall =   0.219 /   0.219 ms [noname]
 - 848: [   11008,       1,       1]          MUL_MAT   (  1) cpu =   2.000 /   2.000 ms, wall =   1.643 /   1.643 ms [w3 x cur 22] (Hot)
 - 849: [   11008,       1,       1]              MUL   (  1) cpu =   0.000 /   0.000 ms, wall =   0.009 /   0.009 ms [noname]
 - 850: [    4096,       1,       1]          MUL_MAT   (  1) cpu =   3.000 /   3.000 ms, wall =   3.137 /   3.137 ms [w2 x cur 22] (Hot)

Without names it's unreadable, the image version looks ok but in text form it's just too many operations to keep track.

@ggerganov
Copy link
Owner

Yes, good idea. I think it's better to just extend ggml_tensor instead of another level of indirection.
But anyway, this extension will happen soon as I also see various use cases

@ggerganov ggerganov added enhancement New feature or request high priority Very important issue labels Apr 21, 2023
@cmp-nct
Copy link
Contributor Author

cmp-nct commented Apr 21, 2023

Is the padding still relevant ? Can I just extend it without causing problems ?

Maybe a couple structs (non pointers) in that case, separating the tensorstruct into a few logical groups like performance, debug, scheduler would make it a bit more accessible while still keeping all together in one parent.

I am a bit confused on padding in general because in 32 bit pointer size is 4 bytes, so I am not sure if the static 8 byte padding is right for all hardware.

Currently I got that one in use. The name is set by an optional function call:

typedef struct
{
    struct ggml_tensor *tensor; // points back to it's tensor
    char *name;                 // a optional name

    uint8_t flag_high_performance; // if this tensor is high performance (relevant for hybrid core systems like Intel 12/13 gen)
    uint8_t flag_use_blas_cuda;    // defines if CUDA is enabled for this tensor (destination tensor)
} tensor_meta;

@ggerganov
Copy link
Owner

Is the padding still relevant ? Can I just extend it without causing problems ?

Padding is just to make sure these asserts hold:

llama.cpp/ggml.c

Lines 3403 to 3404 in 5f93949

static_assert(sizeof(struct ggml_object)%GGML_MEM_ALIGN == 0, "ggml_object size must be a multiple of GGML_MEM_ALIGN");
static_assert(sizeof(struct ggml_tensor)%GGML_MEM_ALIGN == 0, "ggml_tensor size must be a multiple of GGML_MEM_ALIGN");

Maybe a couple structs (non pointers) in that case

I think we want extra members without wrapping them in extra structures. Just streamline it as much as possible.
Also, no char * pointers. Names should have a predetermined max length (e.g. 16 or 32) so that memory size per tensor is fixed

@sw
Copy link
Collaborator

sw commented Apr 28, 2023

The static_assert(sizeof(... seems a bit dubious, it should probably be static_assert(_Alignof(... (which currently fails). If there was a clean and portable way to specify struct alignment, we could apply that to these two structs and remove the explicit padding bytes.

@ggerganov
Copy link
Owner

Decided to add void * extra; after all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request high priority Very important issue
Projects
None yet
Development

No branches or pull requests

3 participants