Proposal: a better way to track memory for chunks. #14358

Open · SunRunAway opened this issue Jan 6, 2020 · 1 comment

Labels: epic/memory-management, help wanted, proposal, sig/execution, type/enhancement

Comments

@SunRunAway (Contributor) commented Jan 6, 2020

Feature Request

Is your feature request related to a problem? Please describe:

It's hard to estimate the memory usage of a chunk.Chunk during SQL execution.

We often write code like the following, and it forces us to think about memory accounting every time we call a method of Chunk:

	mSize := output.chk.MemoryUsage()
	chk.SwapColumns(output.chk)
	e.memTracker.Consume(output.chk.MemoryUsage() - mSize)

Describe the feature you'd like:

I propose a friendlier, more object-oriented way to handle the memory tracker for chunk.Chunk. When a chunk is created, we should pass in the executor's tracker:

chk := chunk.NewChunk(memTrackerFromExecutor)

Then, when a method of Chunk is called, it is the Chunk's responsibility to track the memory difference. For example:

// AppendRow appends a row to the chunk.
func (c *Chunk) AppendRow(row Row) {
	c.AppendPartialRow(0, row)
	c.numVirtualRows++
	c.memTracker.Consume(row.Size()) // we adjust the memory usage inside this method.
}

But how do we release the tracked memory when the Chunk is not used anymore? There are two options (a sketch of both follows this list):

  1. We can add a Close method and call it explicitly at every call site.
  2. We can consider using runtime.SetFinalizer so the memory is released when the chunk is actually garbage collected (like destructors in C++). But we need to ensure that it does not regress performance.
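
A minimal sketch of both options, assuming the chunk keeps the executor's memory.Tracker plus an illustrative trackedBytes field recording what it has reported so far (these field and function bodies are hypothetical, not existing TiDB API):

package chunk

import (
	"runtime"

	"github.com/pingcap/tidb/util/memory"
)

// Chunk is a stub for illustration; trackedBytes records how many bytes this
// chunk has reported to its tracker so far.
type Chunk struct {
	memTracker   *memory.Tracker
	trackedBytes int64
}

// Option 1: an explicit Close that every owner must remember to call.
func (c *Chunk) Close() {
	if c.memTracker != nil {
		c.memTracker.Consume(-c.trackedBytes) // give the tracked bytes back
		c.trackedBytes = 0
	}
}

// Option 2: register a finalizer at construction time, so the tracked bytes
// are released when the chunk itself is garbage collected.
func NewChunk(memTracker *memory.Tracker) *Chunk {
	chk := &Chunk{memTracker: memTracker}
	runtime.SetFinalizer(chk, func(c *Chunk) {
		if c.memTracker != nil {
			c.memTracker.Consume(-c.trackedBytes)
		}
	})
	return chk
}

The finalizer variant removes the burden from call sites, at the cost of registering a finalizer per chunk and delaying the release until the GC actually collects the chunk.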

Describe alternatives you've considered:

We have many components that need to track memory during SQL execution, and this approach can be generalized to them (a sketch of a shared shape follows the table):

Component               For
chunk.Chunk             Storing a chunk in memory, a basic unit in the executor framework.
chunk.Column            Storing a column in memory, a basic unit in chunk.Chunk.
chunk.List              Storing multiple chunks in memory.
chunk.ListInDisk        Storing multiple chunks in a temporary directory.
chunk.RowContainer      Storing multiple chunks and handling their spilling.
chunk.hashRowContainer  Storing a chunk.RowContainer and a hash map.
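
For instance, one rough sketch of the common shape (the interface name is hypothetical, not an existing TiDB type, and it reuses the memory import from the sketch above): each component accepts the executor's tracker at construction and can release what it has reported on its own.

// memoryTracked is a hypothetical common shape for the components above.
type memoryTracked interface {
	// MemTracker returns the tracker this component reports its usage to.
	MemTracker() *memory.Tracker
	// Close gives back every byte the component has reported to the tracker.
	Close() error
}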

Teachability, Documentation, Adoption, Migration Strategy:

@SunRunAway added the type/enhancement, proposal, help wanted, sig/execution, and difficulty/medium labels on Jan 6, 2020
@zz-jason (Member) commented, quoting the second option above:

> We can consider using runtime.SetFinalizer so the memory is released when the chunk is actually garbage collected (like destructors in C++). But we need to ensure that it does not regress performance.

This method seems much friendlier to engineers. Could you design a benchmark to test the impact on performance?
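
A minimal micro-benchmark sketch for that (the types and sizes are illustrative, not TiDB code): it compares a plain allocation against an allocation that also registers a finalizer, which is the main extra per-chunk cost runtime.SetFinalizer would add.

package chunk

import (
	"runtime"
	"testing"
)

// trackedObj stands in for a Chunk that would report its size to a tracker.
type trackedObj struct{ buf []byte }

// sink prevents the compiler from optimizing the allocations away.
var sink *trackedObj

// BenchmarkExplicitClose is the baseline: allocate and rely on an explicit
// release, so no finalizer is ever registered.
func BenchmarkExplicitClose(b *testing.B) {
	for i := 0; i < b.N; i++ {
		o := &trackedObj{buf: make([]byte, 1024)}
		sink = o // an explicit Close/Consume(-n) call would go here
	}
}

// BenchmarkWithFinalizer additionally registers a finalizer per allocation,
// which is the overhead runtime.SetFinalizer would add to every chunk.
func BenchmarkWithFinalizer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		o := &trackedObj{buf: make([]byte, 1024)}
		runtime.SetFinalizer(o, func(*trackedObj) {})
		sink = o
	}
}

Running both with go test -bench=. -benchmem would show the per-allocation time and the extra work the finalizer path incurs.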
