Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve file writer performance by fwrite with cache. v5.0.133 #3308

Merged
merged 5 commits into from
Jan 8, 2023

Conversation

bluestn
Copy link
Contributor

@bluestn bluestn commented Dec 14, 2022

I found srs dvr bottleneck while doing dvr stress test with many concurrent streams pushing to srs.
with perf flame graph's help, I found there are so many file write ops which consumed so many cpu time.
And I tracked the dvr module code, and found the SrsFileWriter use unix system file operation functions including open/write/lseek/close, it has no buffer mechanism in it, and every write op will lead to a new system call.
With this analysis, I used corresponding libc file operation functions to implement the SrsFileWriter, and use setvbuf with 65536 bytes buffer to enable the libc buffer mechanism, as i wished, the cpu usage rate drop down very obviously.

the following is the graph before and after the performance tuning.
before tuning:
before_tuning

after tuning:

after_tuning

trunk/src/kernel/srs_kernel_file.cpp Outdated Show resolved Hide resolved
trunk/src/kernel/srs_kernel_file.cpp Outdated Show resolved Hide resolved
trunk/src/kernel/srs_kernel_file.hpp Outdated Show resolved Hide resolved
trunk/src/kernel/srs_kernel_file.hpp Outdated Show resolved Hide resolved
trunk/src/utest/srs_utest_kernel.cpp Outdated Show resolved Hide resolved
trunk/src/kernel/srs_kernel_file.cpp Outdated Show resolved Hide resolved
trunk/src/kernel/srs_kernel_file.cpp Outdated Show resolved Hide resolved
@ossrs ossrs deleted a comment from xiaozhihong Dec 17, 2022
@ossrs ossrs deleted a comment from bluestn Dec 17, 2022
@winlinvip winlinvip force-pushed the develop branch 2 times, most recently from 55dce2c to bc381a0 Compare December 31, 2022 04:40
@winlinvip
Copy link
Member

winlinvip commented Jan 2, 2023

By Winlin

I remember why "write" is used instead of "fwrite". It's because "st_write" can also write files, and it is a non-blocking operation.

So actually, using "fwrite" from libc has better performance, but there are two issues that need to be discussed:

  1. For servers, non-blocking is more crucial, and "write" can be easily changed to "st_write".

  2. It is generally recommended to use memory disks, and whether "write" will have performance similar to "fwrite" for writing to disks.

By John

st_write cannot perform non-blocking file writing, right?

By Bluestn

epoll does not support files.

nginx does not use epoll to support file read/write, but instead uses aio.

By Chen Yu

It's not that epoll doesn't support it, it's that even ext4 hasn't implemented the poll callback, so it can't be achieved.

The implementation of poll is required for it to be used with select/poll/epoll.

Another issue with fwrite is that it cannot implement writev.

The advantages of aio don't seem very obvious.

By Winlin

The main purpose of writev is to reduce syscalls. In fact, if fwrite has a user space buffer, it can also work without supporting writev.

Implementing it yourself is also a solution.

Creating a thread is also considered as a final solution. However, you can also use fwrite within the thread.

The main issue with threads currently is that cygwin64 SRT crashes. This problem needs to be resolved before creating threads.

TRANS_BY_GPT3

@winlinvip
Copy link
Member

winlinvip commented Jan 8, 2023

Benchmark data:

  1. macOS: MacBook Pro, 16-inch, 2019, 12CPU(2.6 GHz 6-Core Intel Core i7), 16GB memory(16 GB 2667 MHz DDR4).
  2. v5.0.132 without this PR: RTMP to HLS, 200 streams, SRS CPU 87%, 740MB
  3. v5.0.133 with this PR: RTMP to HLS, 200 streams, SRS CPU 56%, 618MB
  4. v5.0.132 without this PR: DVR RTMP to FLV, 500 streams, SRS CPU 83%, 759MB
  5. v5.0.133 with this PR: DVR RTMP to FLV, 500 streams, SRS CPU 35%, 912MB
  6. v5.0.133 with this PR: DVR RTMP to FLV, 1200 streams, SRS CPU 79%, 1590MB

Run SRS for HLS:

env SRS_LISTEN=1935 SRS_DAEMON=off SRS_SRS_LOG_TANK=console SRS_HTTP_API_ENABLED=on \
  SRS_VHOST_HLS_ENABLED=on ./objs/srs -e

Run SRS for DVR:

env SRS_LISTEN=1935 SRS_DAEMON=off SRS_SRS_LOG_TANK=console SRS_HTTP_API_ENABLED=on \
  SRS_VHOST_DVR_ENABLED=on ./objs/srs -e

Publish 100 streams by srs-bench:

./objs/sb_rtmp_publish  -i doc/source.200kbps.768x320.flv -c 100 -r rtmp://127.0.0.1:1935/live/livestream_{i}

Publish another 200 streams:

./objs/sb_rtmp_publish -i doc/source.200kbps.768x320.flv -c 200 -r rtmp://127.0.0.1:1935/live/livestream2_{i}

@winlinvip winlinvip changed the title SrsFileWriter leverages libc buffer to boost dvr write speed. DVR: SrsFileWriter leverages libc buffer to boost dvr write speed. Jan 8, 2023
@winlinvip winlinvip changed the title DVR: SrsFileWriter leverages libc buffer to boost dvr write speed. DVR: Improve file write performance by fwrite with cache. Jan 8, 2023
@winlinvip winlinvip changed the title DVR: Improve file write performance by fwrite with cache. Improve file writer performance by fwrite with cache. Jan 8, 2023
@winlinvip winlinvip changed the base branch from develop to 5.0release January 8, 2023 03:37
@winlinvip winlinvip merged commit 25eb21e into ossrs:5.0release Jan 8, 2023
@winlinvip winlinvip changed the title Improve file writer performance by fwrite with cache. Improve file writer performance by fwrite with cache. v5.0.133 Jan 8, 2023
winlinvip pushed a commit that referenced this pull request Jan 8, 2023
…3308)

* SrsFileWriter leverages libc buffer to boost dvr write speed.

* Refactor SrsFileWriter to use libc file functions mockable

* Add utest and refine code.

Co-authored-by: winlin <[email protected]>

PICK 25eb21e
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TransByAI Translated by AI/GPT.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants