Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BanyanDB-Server] Using "bufio" to Improve the Write #12447

Open
2 of 3 tasks
hanahmily opened this issue Jul 16, 2024 · 11 comments
Open
2 of 3 tasks

[BanyanDB-Server] Using "bufio" to Improve the Write #12447

hanahmily opened this issue Jul 16, 2024 · 11 comments
Assignees
Labels
database BanyanDB - SkyWalking native database feature New feature

Comments

@hanahmily
Copy link
Contributor

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

In the file system module, the write operation flushes the data directly to the OS. It is recommended to use "bufio" to reduce the write operation frequency and improve performance.

Use case

No response

Related issues

No response

Are you willing to submit a pull request to implement this on your own?

  • Yes I am willing to submit a pull request on my own!

Code of Conduct

@hanahmily hanahmily added feature New feature database BanyanDB - SkyWalking native database labels Jul 16, 2024
@hanahmily hanahmily added this to the BanyanDB - 0.8.0 milestone Jul 16, 2024
@sollhui
Copy link

sollhui commented Jul 16, 2024

Please assign to me.

@wu-sheng
Copy link
Member

@sollhui Have you talked with @hanahmily to make sure you are working on the correct road?

@sollhui
Copy link

sollhui commented Jul 16, 2024

@wu-sheng Not yet, but I think it is a easy work.

@wu-sheng
Copy link
Member

Talk with him. I don't think that easy. And if you don't know SkyWalking and BanyanDB that much, you are hard to test and verify.

@sollhui
Copy link

sollhui commented Jul 16, 2024

I had contribute this pr apache/skywalking-banyandb#341 to build local system, and I think the work of this issue is aim to replace the code

size, err := file.file.Write(buffer)

by

writer := bufio.NewWriter(file)
n, err := writer.Write(buffer)

So I think it is a easy work.

@hanahmily, @wu-sheng please check it.

@wu-sheng
Copy link
Member

OK, as 341 was on you, I think it is good.

@hanahmily
Copy link
Contributor Author

@sollhui The key point is determining the optimal buffer size. I want to avoid requiring users to set it manually. Can you come up with a way to automatically retrieve this value based on the system's available memory? Please share the specifics here once you have some ideas.

@sollhui
Copy link

sollhui commented Jul 17, 2024

@hanahmily Before discussing this issue, I have another point I would like to discuss. Why not batch in memory instead of using bufio? Some of the drawbacks I can think of are that there is still a memory copy from memory to bufio, which is unnecessary. Isn't it better to directly save to a certain byte in memory like 200MB and flush it down?

@hanahmily
Copy link
Contributor Author

@hanahmily Before discussing this issue, I have another point I would like to discuss. Why not batch in memory instead of using bufio? Some of the drawbacks I can think of are that there is still a memory copy from memory to bufio, which is unnecessary. Isn't it better to directly save to a certain byte in memory like 200MB and flush it down?

It's reasonable but too difficult to implement. The entire design needs to be re-evaluated, but that's not the goal of this issue. Let's solve this issue based on the current design.

After you finish this, if you are interested in this matter, please review the code and then propose a reasonable design for it.

@sollhui
Copy link

sollhui commented Jul 24, 2024

@sollhui The key point is determining the optimal buffer size. I want to avoid requiring users to set it manually. Can you come up with a way to automatically retrieve this value based on the system's available memory? Please share the specifics here once you have some ideas.

@hanahmily I have some ideas for this design:

  1. It is important to avoid requiring users to set it manually, So I think we can obtain system available memory information through a scheduled task (different systems may have different ways of obtaining it), and determine the size of bufio based on available memory, which also is the basis of [BanyanDB] Unified memory data structure and memory tracking #11338

  2. I think designing algorithms to adapt to available memory is a difficult part, and I have some simple ideas about it:

  • We can use benchmarks to determine the buffer size when there is sufficient available memory.
  • Adaptive memory is difficult, and it is difficult for us to estimate what buffer size is appropriate for different available memory. Therefore, I believe that when the available memory is insufficient, we can bypass the buffer and directly write it to the file system. This threshold can be set at 10% of the maximum available memory obtained during initialization.

@hanahmily
Copy link
Contributor Author

  1. It is important to avoid requiring users to set it manually, So I think we can obtain system available memory information through a scheduled task (different systems may have different ways of obtaining it), and determine the size of bufio based on available memory, which also is the basis of [BanyanDB] Unified memory data structure and memory tracking #11338

That's a good idea. We have a metric to track the available memory on the node. You can make use of it.

  1. I think designing algorithms to adapt to available memory is a difficult part, and I have some simple ideas about it:
  • We can use benchmarks to determine the buffer size when there is sufficient available memory.

Exactly, we should get the default value through such a benchmark.

  • Adaptive memory is difficult, and it is difficult for us to estimate what buffer size is appropriate for different available memory. Therefore, I believe that when the available memory is insufficient, we can bypass the buffer and directly write it to the file system. This threshold can be set at 10% of the maximum available memory obtained during initialization.

The maximum buffer size should be specified as a quantity rather than a ratio. Based on my experience, a buffer size of 4KB to 1MB is a reasonable range.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database BanyanDB - SkyWalking native database feature New feature
Projects
None yet
Development

No branches or pull requests

3 participants