Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incidental: Retrieving client through the http interface/api/v1/clients will cause the SRS service to restart. #2311

Closed
yangxiangzhi opened this issue Apr 25, 2021 · 6 comments · Fixed by #2352
Assignees
Labels
Bug It might be a bug. TransByAI Translated by AI/GPT.
Milestone

Comments

@yangxiangzhi
Copy link

yangxiangzhi commented Apr 25, 2021

Accessing the client through the HTTP interface /api/v1/clients/?start=0&count=1000 may cause the SRS service to restart intermittently.

Please describe the issue you encountered.

1. SRS version: SRS/3.0.137(OuXuli)
1. SRS log:

[2021-04-25 06:17:00.391][Trace][1][19550] HTTP API GET http://docker-srs:1985/api/v1/clients?start=0&count=1000, content-length=-1, chunked=0/0
233621 [2021-04-25 06:17:00.391][Trace][1][19550] client finished.
233622 [2021-04-25 06:17:00.393][Trace][1][19551] API server client, ip=172.21.0.3
233623 [2021-04-25 06:17:00.393][Trace][1][19551] HTTP API GET http://docker-srs:1985/api/v1/clients/?start=0&count=1000, content-length=-1, chunked=0/0
233624 [2021-04-25 06:17:02.941][Trace][1][0] SRS/3.0.137(OuXuli), The MIT License (MIT)
233625 [2021-04-25 06:17:02.941][Trace][1][0] contributors: winlin<[email protected]> wenjie.zhao<[email protected]> xiangcheng.liu<[email protected]> naijia.liu<youngco       [email protected]> alcoholyi<[email protected]> byteman<[email protected]> chad.wang<[email protected]> suhetao<[email protected]> Johnny<[email protected]> kart       hikeyan<[email protected]> StevenLiu<[email protected]> zhengfl<[email protected]> tufang14<[email protected]> allspace<[email protected]> niesongsong<nie       [email protected]> rudeb0t<[email protected]> CallMeNP<[email protected]> synote<[email protected]> lovecat<[email protected]> panda1986<[email protected]> YueHonghui<h       [email protected]> ThomasDreibholz<[email protected]> JuntaoLiu<[email protected]> RocFang<[email protected]> MakarovYaroslav<[email protected]> Mirk       oVelic<[email protected]> HuiZhang(huzhang2)<[email protected]> OtterWa<[email protected]> walkermi<[email protected]> haofz<[email protected]> ME_Kun_Han<h       [email protected]> ljx0305<[email protected]> cenxinwei<[email protected]> StarBrilliant<[email protected]> xubin<[email protected]> intliang<[email protected]       > flowerwrong<[email protected]> YLX<[email protected]> J<[email protected]> Harlan<[email protected]> hankun<[email protected]> JonathanBarratt<jonatha       [email protected]> KeeganH<[email protected]> StevenLiu<[email protected]> liuxc0116<[email protected]> ChengdongZhang<[email protected]> lovacat<lovec       [email protected]> qiang.li<[email protected]> HungMingWu<[email protected]> Himer<[email protected]> xialixin<[email protected]> alphonsetai<[email protected]>        Michael.Ma<[email protected]> lam2003<[email protected]>
233626 [2021-04-25 06:17:02.941][Trace][1][0] cwd=/usr/local/srs, work_dir=./, build: 2020-04-07 02:35:32, configure: --x86-x64 , uname: Linux 3ebf9b107f9a 4.9.87-linuxkit-a       ufs #1 SMP Wed Mar 14 15:12:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
233627 [2021-04-25 06:17:02.941][Trace][1][0] configure detail: --prefix=/usr/local/srs --with-hls --with-hds --with-dvr --with-ssl --with-transcode --with-ingest --with-sta       t --with-http-callback --with-http-server --with-stream-caster --with-http-api --with-librtmp --without-research --without-utest --without-gperf --without-gmc --witho       ut-gmd --without-gmp --without-gcp --without-gprof --log-trace --cc=gcc --cxx=g++ --ar=ar --ld=ld --randlib=randlib
233628 [2021-04-25 06:17:02.941][Trace][1][0] srs checking config...

1. SRS configuration:

# main config for srs.
# @see full.conf for detail config.

listen 1935;
max_connections 1000;
ff_log_dir ./objs;
srs_log_tank file;
srs_log_level trace;
srs_log_file ./objs/srs.log
daemon off;
http_api {
    enabled on;
    listen 1985;
    crossdomain on;
}
http_server {
    enabled on;
    listen 8080;
    dir ./objs/nginx/html;
}
stats {
    network 0;
    disk sda sdb xvda xvdb;
}
vhost __defaultVhost__ {
    #tcp_nodelay on;
**# Minimum delay open, it is open by default. When this option is enabled, MR is disabled by default.**
    #min_latency on;
**# Merged-Write, SRS always uses Merged-Write, which means sending packets to the client once every N milliseconds. This algorithm can improve the efficiency of RTMP downstream by about 5 times, with a range of [350-1800].**
    #mw_latency on;

    #play{
**# Cache a group of video frames**
        #gop_cache off;
**# Configure the length of the live queue. The server will place data in the live queue, and if it exceeds this length, it will be cleared to the last I-frame.**
        #queue_length 10;
**# Merged-Write, SRS always uses Merged-Write, which means sending packets to the client once every N milliseconds. This algorithm can improve the efficiency of RTMP downstream by about 5 times, within the range of [350-1800].**
        #mw_latency 100;
    #}

    #publish {
        #mr off;
    #}

    http_hooks {
        enabled on;
        on_publish http://docker-nginx/check/publish;
        on_play http://docker-nginx/check/publish;
        on_stop http://docker-nginx/check/publish;
    }
    hls {
        enabled on;
        hls_path ./objs/nginx/html;
**# Clean up expired m3u8 and ts files**
        hls_dispose 1;
        hls_fragment 2;
        hls_window 5;
    }
    http_remux {
        enabled on;
        mount [vhost]/[app]/[stream].flv;
        hstrs on;
    }
}

Replay

How to replay bug?

Steps to reproduce the bug

Bug Reproduction Steps:

  1. After starting the SRS service, frequent requests to the /api/v1/clients?start=0&count=1000 interface will cause the service to restart.
    • However, the bug does not occur every time. I have tested it locally and it has not been reproduced consistently.
    • On the production server, the issue occurs approximately 15 times per day, causing the service to restart when retrieving client information.

Please note that the translation provided is a direct translation and may require further clarification or adjustment based on the context.

TRANS_BY_GPT3

@duiniuluantanqin
Copy link
Member

duiniuluantanqin commented May 8, 2021

Verified, this problem does exist.
Scenario: RTMP push streaming, FLV pull streaming.
Reason: If FLV pull streaming occurs before RTMP push streaming, it will trigger an update source id, causing the SrsRequest pointer to be reallocated without notifying the statistic. When calling the /api/v1/clients/ interface, it will result in an illegal pointer access.

Reproduction method: Modify the srs_app_http_stream.cpp file.

srs_error_t SrsLiveStream::update(SrsSource* s, SrsRequest* r)
{
    source = s;
    
    srs_freep(req);
    SrsRequest* test = new SrsRequest();  // Add this line of garbage code, it will trigger 100%
    req = r->copy()->as_http();
    
    return srs_success;
}

The problem is very subtle. Due to the optimization of the operating system, the address of 'req' does not change after each 'realloc', making it difficult to reproduce. Only when that block of memory is coincidentally occupied and modified by another program, will it cause a crash.

TRANS_BY_GPT3

duiniuluantanqin pushed a commit to duiniuluantanqin/srs that referenced this issue May 8, 2021
@duiniuluantanqin
Copy link
Member

duiniuluantanqin commented May 8, 2021

The bug has been fixed, please refer to 497c2d5

TRANS_BY_GPT3

@winlinvip
Copy link
Member

@duiniuluantanqin 👍

@duiniuluantanqin
Copy link
Member

duiniuluantanqin commented May 12, 2021

6689d88 Fix the bug from the previous commit

TRANS_BY_GPT3

@winlinvip
Copy link
Member

winlinvip commented May 12, 2021

Thank you @duiniuluantanqin for analyzing and identifying the root cause, and providing a PR. Let's summarize this issue.

The essence of this issue is that stat directly references req.

srs_error_t SrsStatistic::on_client(std::string id, SrsRequest* req, SrsConnection* conn, SrsRtmpConnType type)
{
    client->req = req;

void SrsStatistic::on_disconnect(std::string id)
{
    clients.erase(it);

// RTMP references Conn's req.
srs_error_t SrsRtmpConn::stream_service_cycle()
{
    SrsRequest* req = info->req;
    if ((err = stat->on_client(srs_int2str(_srs_context->get_id()), req, this, info->type)) != srs_success) {

// FLV references SrsLiveStream.
srs_error_t SrsLiveStream::do_serve_http(ISrsHttpResponseWriter* w, ISrsHttpMessage* r)
{
    if ((err = stat->on_client(srs_int2str(_srs_context->get_id()), req, hc, SrsRtmpConnPlay)) != srs_success) {

srs_error_t SrsLiveStream::update(SrsSource* s, SrsRequest* r)
{
    srs_freep(req);
    req = r->copy()->as_http();

This issue does not exist in RTMP because SrsStatistic references SrsRtmpConn's req, and the latter has a longer lifespan and is always available.

This issue exists in HTTP-FLV and causes a crash. The reproduction steps are provided in the previous comment. The problem arises because SrsStatistic references SrsLiveStream's req, and the lifespans of these two objects are uncertain, resulting in SrsStatistic referencing a dangling pointer.

Solution: Copy req in SrsStatistic. Since SrsStatistic and conn are corresponding, their req objects are also corresponding. Therefore, stat can be considered as one of conn's members and can be copied. Refer to 71dda68 for more details.

Once again, thank you @duiniuluantanqin for spending a lot of time analyzing and locating this issue. This kind of wild pointer problem is very subtle, and the most difficult part is how to locate it. Once located, it is easy to fix.

TRANS_BY_GPT3

@yangxiangzhi
Copy link
Author

yangxiangzhi commented May 13, 2021

Thank you two experts, the problem has been resolved.

TRANS_BY_GPT3

@winlinvip winlinvip linked a pull request May 13, 2021 that will close this issue
@winlinvip winlinvip changed the title 通过http接口 /api/v1/clients/?start=0&count=1000获取客户端会导致srs服务重启 偶发 偶发:通过http接口/api/v1/clients获取客户端会导致srs服务重启 May 13, 2021
duiniuluantanqin pushed a commit to duiniuluantanqin/srs that referenced this issue May 14, 2021
@winlinvip winlinvip added the Bug It might be a bug. label Aug 27, 2021
@winlinvip winlinvip added this to the 4.0 milestone Sep 4, 2021
@winlinvip winlinvip changed the title 偶发:通过http接口/api/v1/clients获取客户端会导致srs服务重启 Incidental: Retrieving client through the http interface/api/v1/clients will cause the SRS service to restart. Jul 28, 2023
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug It might be a bug. TransByAI Translated by AI/GPT.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants