Implement spilling in BlockCombineHashed #10727

Open
lll-phill-lll opened this issue Oct 22, 2024 · 0 comments
Labels
area/runtime YDB runtime issues

Comments

@lll-phill-lll
Member

lll-phill-lll commented Oct 22, 2024

TBlockCombineHashedWrapper(TComputationMutables& mutables,

Spilling is not needed for BlockCombineAll, since it does not accumulate a large state.

After a single Fetch call, an array of UnboxedValue is returned; each element is a wrapper around an Arrow datum (one column).

The state is stored either directly in the RH hash table or in an arena, with the RH table holding pointers to it. The payload comes immediately after the keys.
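To make the layout concrete, here is a minimal sketch of an entry whose payload follows the key, allocated from an arena so that a hash table would only need to hold the pointer. TEntryLayout, TToyArena and MakeEntry are hypothetical names for illustration, and the fixed 8-byte key and state are an assumption; the real RH table and arena in YDB differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-size layout: [ key bytes | aggregation state bytes ].
struct TEntryLayout {
    size_t KeySize;
    size_t StateSize;

    size_t EntrySize() const { return KeySize + StateSize; }

    // The payload (aggregation state) starts right after the key.
    char* Payload(char* entry) const { return entry + KeySize; }
};

// Stand-in for an arena: entries live here, a hash table would keep only pointers.
// The buffer is never resized, so returned pointers stay valid.
class TToyArena {
public:
    explicit TToyArena(size_t capacity) : Buf_(capacity), Used_(0) {}

    // Returns nullptr when the arena is full.
    char* Alloc(size_t n) {
        if (Used_ + n > Buf_.size()) {
            return nullptr;
        }
        char* p = Buf_.data() + Used_;
        Used_ += n;
        return p;
    }

private:
    std::vector<char> Buf_;
    size_t Used_;
};

// Writes the key and the initial state into a freshly allocated entry.
inline char* MakeEntry(TToyArena& arena, const TEntryLayout& layout,
                       uint64_t key, uint64_t initialState) {
    char* entry = arena.Alloc(layout.EntrySize());
    if (!entry) {
        return nullptr;
    }
    std::memcpy(entry, &key, sizeof(key));
    std::memcpy(layout.Payload(entry), &initialState, sizeof(initialState));
    return entry;
}
```

With this layout, spilling an entry only needs the entry pointer and the two sizes: the key and its state can be written out as one contiguous range.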

When spilling, we will need to go through the datum and check each key separately to decide whether to spill it. In general, the algorithm can be the same as in WideLastCombine.
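One way to picture the per-key decision is a sketch that assumes keys are split into buckets by hash and that a bucket is either fully in memory or fully spilled. This bucketing scheme is an assumption made for illustration, and TToyCombiner, TBucket and NumBuckets are hypothetical names, not YDB code; the spill file is replaced by a plain vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// All names here are hypothetical; this is not the YDB implementation.
constexpr size_t NumBuckets = 4;

struct TBucket {
    bool Spilled = false;                                    // true once the bucket was flushed
    std::unordered_map<uint64_t, uint64_t> InMemory;         // key -> running sum
    std::vector<std::pair<uint64_t, uint64_t>> SpilledRows;  // stand-in for a spill file
};

class TToyCombiner {
public:
    TToyCombiner() : Buckets_(NumBuckets) {}

    // Walks one block of rows and decides for every key separately
    // whether the row is aggregated in memory or goes to the spill area.
    void AddBlock(const std::vector<uint64_t>& keys, const std::vector<uint64_t>& values) {
        for (size_t row = 0; row < keys.size(); ++row) {
            TBucket& bucket = Buckets_[BucketOf(keys[row])];
            if (bucket.Spilled) {
                bucket.SpilledRows.emplace_back(keys[row], values[row]);
            } else {
                bucket.InMemory[keys[row]] += values[row];
            }
        }
    }

    // Moves the accumulated state of one bucket to the spill area.
    void SpillBucket(size_t index) {
        TBucket& bucket = Buckets_[index];
        for (const auto& [key, sum] : bucket.InMemory) {
            bucket.SpilledRows.emplace_back(key, sum);
        }
        bucket.InMemory.clear();
        bucket.Spilled = true;
    }

    static size_t BucketOf(uint64_t key) {
        return std::hash<uint64_t>{}(key) % NumBuckets;
    }

    const TBucket& Bucket(size_t index) const { return Buckets_[index]; }

private:
    std::vector<TBucket> Buckets_;
};
```

Spilled rows would later be read back bucket by bucket and aggregated again, so that each bucket fits in memory on its own.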

mkql_block_add.cpp will use the methods added here: #10726
During spilling, the Serialize/Deserialize methods should be used in the same way as here:

Aggs_[i]->LoadState(ptr + AggStateOffsets_[i], BatchNum_, UnwrappedValues_.data(), row);

Input data is always serializable. The output is not always serializable, so a separate serialization/deserialization function will have to be added, similar to #7300.
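As an illustration of what such a function pair could look like for an intermediate aggregation state, here is a minimal round-trip sketch. TAvgState, SerializeState and DeserializeState are hypothetical; they do not reproduce the methods from #10726 or #7300, and the raw byte format assumes the data is written and read on the same machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical intermediate state of an AVG aggregator (not a YDB type).
struct TAvgState {
    double Sum = 0;
    uint64_t Count = 0;
};

// Appends the state to a byte buffer in a fixed field order.
inline void SerializeState(const TAvgState& state, std::string& out) {
    out.append(reinterpret_cast<const char*>(&state.Sum), sizeof(double));
    out.append(reinterpret_cast<const char*>(&state.Count), sizeof(uint64_t));
}

// Reads one state back starting at `pos` and advances `pos`.
// Returns false and leaves `pos` untouched if the buffer is too short.
inline bool DeserializeState(const std::string& in, size_t& pos, TAvgState& state) {
    constexpr size_t need = sizeof(double) + sizeof(uint64_t);
    if (in.size() < pos || in.size() - pos < need) {
        return false;
    }
    std::memcpy(&state.Sum, in.data() + pos, sizeof(double));
    pos += sizeof(double);
    std::memcpy(&state.Count, in.data() + pos, sizeof(uint64_t));
    pos += sizeof(uint64_t);
    return true;
}
```

The fields are copied one by one rather than as a whole struct so that padding bytes never end up in the spilled data.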

Within the scope of this task, spilling is turned on and off by changing the code.
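A code-level switch of this kind could be as small as a compile-time constant. The sketch below is only an assumption about how that might look: EnableBlockCombineSpilling, EMemoryAction and ChooseAction are hypothetical names, and the memory-limit check is a placeholder.

```cpp
#include <cstddef>

// Hypothetical switch: change the value and rebuild to enable spilling.
constexpr bool EnableBlockCombineSpilling = false;

enum class EMemoryAction { KeepInMemory, Spill };

// Decides what to do when the state grows. With spilling compiled out,
// the state always stays in memory regardless of the limit.
template <bool SpillingEnabled = EnableBlockCombineSpilling>
EMemoryAction ChooseAction(size_t usedBytes, size_t limitBytes) {
    if constexpr (SpillingEnabled) {
        return usedBytes > limitBytes ? EMemoryAction::Spill
                                      : EMemoryAction::KeepInMemory;
    } else {
        (void)usedBytes;
        (void)limitBytes;
        return EMemoryAction::KeepInMemory;
    }
}
```

Making the flag a template parameter keeps both code paths compilable and testable even while the default is off.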

@lll-phill-lll lll-phill-lll self-assigned this Oct 22, 2024
@lll-phill-lll lll-phill-lll added the area/runtime YDB runtime issues label Oct 22, 2024