[BanyanDB-Server] Using "bufio" to Improve the Write #12447

hanahmily · 2024-07-16T00:00:53Z

Search before asking

I had searched in the issues and found no similar feature requirement.

Description

In the file system module, the write operation flushes the data directly to the OS. It is recommended to use "bufio" to reduce the write operation frequency and improve performance.

Use case

No response

Related issues

No response

Are you willing to submit a pull request to implement this on your own?

Yes I am willing to submit a pull request on my own!

Code of Conduct

I agree to follow this project's Code of Conduct

sollhui · 2024-07-16T05:08:07Z

Please assign to me.

wu-sheng · 2024-07-16T05:09:02Z

@sollhui Have you talked with @hanahmily to make sure you are on the right track?

sollhui · 2024-07-16T05:12:18Z

@wu-sheng Not yet, but I think it is an easy task.

wu-sheng · 2024-07-16T05:13:23Z

Talk with him. I don't think it is that easy. And if you don't know SkyWalking and BanyanDB that well, it will be hard for you to test and verify.

sollhui · 2024-07-16T05:21:25Z

I contributed the PR apache/skywalking-banyandb#341 to build the local system, and I think the aim of this issue is to replace the code

size, err := file.file.Write(buffer)

by

writer := bufio.NewWriter(file)
n, err := writer.Write(buffer)

So I think it is an easy task.

@hanahmily, @wu-sheng please check it.
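
A minimal sketch of what the proposed replacement implies, not BanyanDB code. It assumes `file.file` in the snippet above is an `*os.File` (any `io.Writer` works with `bufio`), and it uses a hypothetical `countingWriter` in its place so the number of underlying write calls can be observed. Two details matter for the real change: the `bufio.Writer` has to live as long as the file (creating one per write call buffers nothing useful), and `Flush` must be called before the file is synced or closed, otherwise the tail of the data stays in memory.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter stands in for the OS file (hypothetical helper). It counts
// how many Write calls and bytes actually reach it.
type countingWriter struct {
	calls int
	bytes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.bytes += len(p)
	return len(p), nil
}

// writeChunks writes n chunks of chunkSize bytes to w.
func writeChunks(w io.Writer, n, chunkSize int) error {
	chunk := make([]byte, chunkSize)
	for i := 0; i < n; i++ {
		if _, err := w.Write(chunk); err != nil {
			return err
		}
	}
	return nil
}

// compare returns the number of underlying Write calls made when writing
// directly and when writing through a bufio.Writer of bufSize bytes.
func compare(n, chunkSize, bufSize int) (direct, buffered int, err error) {
	d := &countingWriter{}
	if err = writeChunks(d, n, chunkSize); err != nil {
		return 0, 0, err
	}

	b := &countingWriter{}
	bw := bufio.NewWriterSize(b, bufSize) // one long-lived writer per file
	if err = writeChunks(bw, n, chunkSize); err != nil {
		return 0, 0, err
	}
	// Without Flush, the bytes still sitting in the buffer never reach b.
	if err = bw.Flush(); err != nil {
		return 0, 0, err
	}
	if b.bytes != d.bytes {
		return 0, 0, fmt.Errorf("buffered path wrote %d bytes, want %d", b.bytes, d.bytes)
	}
	return d.calls, b.calls, nil
}

func main() {
	direct, buffered, err := compare(1000, 100, 4096)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("underlying writes: direct=%d buffered=%d\n", direct, buffered)
}
```

Note that `bufio.NewWriter` uses a 4096-byte buffer by default, and that `bufio.Writer` passes a write straight through to the underlying writer when its buffer is empty and the payload is larger than the buffer, so large writes gain nothing from buffering.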

wu-sheng · 2024-07-16T05:25:29Z

OK, since #341 was yours, I think it is fine.

hanahmily · 2024-07-16T07:42:56Z

@sollhui The key point is determining the optimal buffer size. I want to avoid requiring users to set it manually. Can you come up with a way to automatically retrieve this value based on the system's available memory? Please share the specifics here once you have some ideas.

sollhui · 2024-07-17T03:23:38Z

@hanahmily Before discussing this, I have another point to raise. Why not batch in memory instead of using bufio? One drawback I can think of is that bufio still requires an unnecessary copy from memory into its buffer. Wouldn't it be better to accumulate the data in memory up to a certain size, such as 200MB, and then flush it?

hanahmily · 2024-07-17T10:06:18Z

> @hanahmily Before discussing this, I have another point to raise. Why not batch in memory instead of using bufio? One drawback I can think of is that bufio still requires an unnecessary copy from memory into its buffer. Wouldn't it be better to accumulate the data in memory up to a certain size, such as 200MB, and then flush it?

It's reasonable but too difficult to implement. The entire design needs to be re-evaluated, but that's not the goal of this issue. Let's solve this issue based on the current design.

After you finish this, if you are interested in this matter, please review the code and then propose a reasonable design for it.

sollhui · 2024-07-24T06:47:33Z

> @sollhui The key point is determining the optimal buffer size. I want to avoid requiring users to set it manually. Can you come up with a way to automatically retrieve this value based on the system's available memory? Please share the specifics here once you have some ideas.

@hanahmily I have some ideas for this design:

1. It is important to avoid requiring users to set it manually, so I think we can obtain the system's available memory through a scheduled task (different systems may have different ways of obtaining it) and determine the size of the bufio buffer based on it. This is also the basis of [BanyanDB] Unified memory data structure and memory tracking #11338.
2. I think designing an algorithm that adapts to available memory is the difficult part, and I have some simple ideas about it:

   - We can use benchmarks to determine the buffer size when there is sufficient available memory.
   - Adapting to memory is difficult, and it is hard for us to estimate what buffer size is appropriate for a given amount of available memory. Therefore, I believe that when available memory is insufficient, we can bypass the buffer and write directly to the file system. The threshold can be set at 10% of the maximum available memory obtained during initialization.
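
The bypass idea above can be sketched as follows. This is an illustration under stated assumptions, not BanyanDB code: `adaptiveWriter`, `newAdaptiveWriter`, and the `available` hook are hypothetical names, and the hook is polled on every write for simplicity, whereas the proposal reads available memory from a scheduled task. `countingWriter` stands in for the file.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter stands in for the file and counts the Write calls that reach it.
type countingWriter struct{ calls int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// adaptiveWriter is a hypothetical wrapper. It buffers writes while memory
// is plentiful and bypasses the buffer when memory is scarce.
type adaptiveWriter struct {
	dst       io.Writer
	buf       *bufio.Writer
	available func() uint64 // hypothetical hook returning available bytes
	threshold uint64        // bypass the buffer below this many bytes
}

// newAdaptiveWriter fixes the threshold at 10% of the memory that is
// available at initialization, as suggested in the proposal.
func newAdaptiveWriter(dst io.Writer, bufSize int, available func() uint64) *adaptiveWriter {
	return &adaptiveWriter{
		dst:       dst,
		buf:       bufio.NewWriterSize(dst, bufSize),
		available: available,
		threshold: available() / 10,
	}
}

// Write buffers p, or writes it directly when available memory is low.
func (w *adaptiveWriter) Write(p []byte) (int, error) {
	if w.available() < w.threshold {
		// Drain buffered bytes first so the data stays in order.
		if err := w.buf.Flush(); err != nil {
			return 0, err
		}
		return w.dst.Write(p)
	}
	return w.buf.Write(p)
}

// Flush pushes any buffered bytes to the destination.
func (w *adaptiveWriter) Flush() error { return w.buf.Flush() }

func main() {
	mem := uint64(8 << 30) // simulated available memory: 8 GiB
	dst := &countingWriter{}
	w := newAdaptiveWriter(dst, 64<<10, func() uint64 { return mem })

	record := make([]byte, 128)
	for i := 0; i < 10; i++ {
		_, _ = w.Write(record) // small writes stay in the buffer
	}
	fmt.Println("underlying writes while memory is plentiful:", dst.calls)

	mem = 512 << 20        // simulate a drop below the 10% threshold
	_, _ = w.Write(record) // flushes the buffer, then writes directly
	fmt.Println("underlying writes after memory pressure:", dst.calls)
}
```

One consequence of this design is that the buffer itself is still allocated while bypassed; releasing it under pressure would need extra logic that the sketch leaves out.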

hanahmily · 2024-07-25T08:04:10Z

> It is important to avoid requiring users to set it manually, so I think we can obtain the system's available memory through a scheduled task (different systems may have different ways of obtaining it) and determine the size of the bufio buffer based on it. This is also the basis of [BanyanDB] Unified memory data structure and memory tracking #11338.

That's a good idea. We have a metric that tracks the available memory on the node. You can make use of it.

> I think designing an algorithm that adapts to available memory is the difficult part, and I have some simple ideas about it:
>
> We can use benchmarks to determine the buffer size when there is sufficient available memory.

Exactly, we should get the default value through such a benchmark.

> Adapting to memory is difficult, and it is hard for us to estimate what buffer size is appropriate for a given amount of available memory. Therefore, I believe that when available memory is insufficient, we can bypass the buffer and write directly to the file system. The threshold can be set at 10% of the maximum available memory obtained during initialization.

The maximum buffer size should be specified as a quantity rather than a ratio. Based on my experience, a buffer size of 4KB to 1MB is a reasonable range.
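
As a rough sketch of the two points above (a benchmark-derived default, bounded by absolute sizes), the code below times writes through `bufio.Writer` at several candidate sizes and clamps every candidate to the 4KB to 1MB range. The function names, candidate list, record size, and record count are all assumptions made for illustration. A single timed run per size is noisy, so a real measurement would more likely use Go's `testing` benchmarks against the actual BanyanDB write path.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"time"
)

const (
	minBufferSize = 4 << 10 // 4KB, lower end of the suggested range
	maxBufferSize = 1 << 20 // 1MB, upper end of the suggested range
)

// clampBufferSize keeps a candidate size inside the absolute range.
func clampBufferSize(size int) int {
	if size < minBufferSize {
		return minBufferSize
	}
	if size > maxBufferSize {
		return maxBufferSize
	}
	return size
}

// timeWrites writes count records of recordSize bytes to a temporary file
// through a bufio.Writer of bufSize bytes and reports the elapsed time.
func timeWrites(bufSize, count, recordSize int) (time.Duration, error) {
	f, err := os.CreateTemp("", "bufio-bench-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	w := bufio.NewWriterSize(f, bufSize)
	record := make([]byte, recordSize)
	start := time.Now()
	for i := 0; i < count; i++ {
		if _, err := w.Write(record); err != nil {
			return 0, err
		}
	}
	if err := w.Flush(); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

// pickBufferSize times each clamped candidate and returns the fastest one.
func pickBufferSize(candidates []int, count, recordSize int) (int, error) {
	best, bestTime := 0, time.Duration(0)
	for _, c := range candidates {
		size := clampBufferSize(c)
		d, err := timeWrites(size, count, recordSize)
		if err != nil {
			return 0, err
		}
		if best == 0 || d < bestTime {
			best, bestTime = size, d
		}
	}
	return best, nil
}

func main() {
	candidates := []int{1 << 10, 4 << 10, 64 << 10, 1 << 20, 16 << 20}
	size, err := pickBufferSize(candidates, 200000, 256)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("fastest buffer size in this run:", size)
}
```

The result varies by machine and by run, which is why the sketch prints the measured choice instead of hard-coding one.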

hanahmily added the feature (New feature) and database (BanyanDB - SkyWalking native database) labels Jul 16, 2024

hanahmily added this to the BanyanDB - 0.8.0 milestone Jul 16, 2024

wu-sheng assigned sollhui Jul 16, 2024

hanahmily modified the milestones: BanyanDB - 0.8.0, BanyanDB-0.9.0 Oct 8, 2024
