HLS: Rewrite HLS (especially audio-only) for AAC. #547

Closed
winlinvip opened this issue Dec 23, 2015 · 4 comments
Assignees
Labels
Enhancement Improvement or enhancement. Feature It's a new feature. TransByAI Translated by AI/GPT.
Milestone

Comments

@winlinvip
Member

winlinvip commented Dec 23, 2015

The timestamps of the AAC audio stream in HLS need to be recalculated, so this part of HLS needs to be rewritten, especially the audio-only path. Currently there are no issues when HLS carries both audio and video, probably because the video serves as a timing reference. Audio-only streams, however, have a crackling noise caused by gaps between the aggregated audio packets. Apple replied that AAC should be used instead of TS, and that the timestamps need to be recalculated.

For a detailed understanding of AAC audio standards and MP3 standards, you can refer to SRS3.

TRANS_BY_GPT3

@winlinvip winlinvip added Enhancement Improvement or enhancement. Feature It's a new feature. labels Dec 23, 2015
@winlinvip winlinvip added this to the srs 3.0 release milestone Dec 23, 2015
@winlinvip winlinvip changed the title Rewrite HLS(espaciallly audio-only) for aac. HLS: Rewrite HLS(espaciallly audio-only) for aac. Apr 16, 2017
@winlinvip
Member Author

winlinvip commented Apr 16, 2017

The cause of the popping sound has been identified. It comes from the sampling rate: at some rates the frame duration is not a whole number of milliseconds, so the timestamps carry rounding errors. Safari is more accurate about timing, hence the popping sound.

The verification is as follows: first, consider a sampling rate of 8000Hz. An AAC frame consists of 1024 samples, so one AAC frame is:

1024/8000.0=0.128s=128ms
If the sampling rate is 16000Hz, then each AAC frame is: 1024/16000.0 = 0.064s = 64ms.
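
The arithmetic above generalizes to any sampling rate. A minimal Python sketch follows; the helper name `aac_frame_duration_ms` is made up for illustration, and the 1024 samples per frame figure is taken from the text above. 44100Hz is included as an example of a rate that does not yield a whole number of milliseconds.

```python
AAC_SAMPLES_PER_FRAME = 1024  # samples per AAC frame, as stated above


def aac_frame_duration_ms(sample_rate_hz):
    """Duration of one AAC frame in milliseconds (hypothetical helper)."""
    return AAC_SAMPLES_PER_FRAME * 1000.0 / sample_rate_hz


# Print the frame duration for a few common sampling rates.
for rate in (8000, 16000, 44100):
    print(rate, aac_frame_duration_ms(rate))
```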

The SRS configuration is as follows:

listen              1935;
max_connections     1000;
daemon              off;
srs_log_tank        console;
http_api {
    enabled         on;
    listen          1985;
}
http_server {
    enabled         on;
    listen          8080;
}
vhost __defaultVhost__ {
    hls {
        enabled         on;
        hls_vcodec vn;
        hls_dts_directly off; 
    }
}

Transcode with FFmpeg to output audio at 16000Hz:

ffmpeg -re -i doc/source.200kbps.768x320.flv \
-vn -acodec libfdk_aac -ar 16000 -ac 2 -b:a 48k \
-f flv -y rtmp://127.0.0.1/live/livestream

When accessing http://localhost:8080/live/livestream.html with Safari, it can be observed that there is no audio distortion.

Next, transcode to output audio at 44100Hz:

ffmpeg -re -i doc/source.200kbps.768x320.flv \
-vn -acodec libfdk_aac -ar 44100 -ac 2 -b:a 48k \
-f flv -y rtmp://127.0.0.1/live/livestream

You can hear a "popping" or "crackling" noise every 4 seconds or so. It happens with each segment, but you have to listen carefully to notice it.

What is the reason? At 44100Hz, each AAC frame is:

1024/44100.0=0.02321995s=23.21995ms

If this is rounded to whole milliseconds, each frame has an error of about 0.2ms. Safari is more sensitive to such errors, so problems are more likely to occur there.
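
To make the rounding effect concrete, here is a small Python sketch. It assumes each frame is stamped with its ideal start time rounded to whole milliseconds (FLV timestamps are in milliseconds); the function names are illustrative and not taken from SRS.

```python
SAMPLES_PER_FRAME = 1024


def ms_timestamps(sample_rate_hz, frame_count):
    """Ideal start time of each frame, rounded to whole milliseconds."""
    return [round(n * SAMPLES_PER_FRAME * 1000 / sample_rate_hz)
            for n in range(frame_count)]


def frame_gaps(timestamps):
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


# Collect the distinct gaps seen over the first 200 frames at each rate.
gaps_16k = set(frame_gaps(ms_timestamps(16000, 200)))
gaps_44k = set(frame_gaps(ms_timestamps(44100, 200)))
print(gaps_16k)
print(gaps_44k)
```

At 16000Hz every gap is the same 64ms, while at 44100Hz the gaps alternate between 23ms and 24ms, neither of which matches the true frame duration of 23.21995ms.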

How can this be solved? NGINX combines multiple AAC frames into one TS Packet and then calculates the accumulated time. If the time for each frame is instead calculated directly in the 90kHz TS timebase:

90000*1024/44100.0=2089.795918367347

This way, the error can be reduced to 1/90 ms, that is, one tick of the 90kHz clock.
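
A quick check of this bound in Python, assuming the DTS is derived from the running sample count and truncated to an integer number of 90kHz ticks. The names are illustrative, not SRS identifiers.

```python
from fractions import Fraction

TS_CLOCK_HZ = 90000
SAMPLES_PER_FRAME = 1024


def dts_ticks(frame_index, sample_rate_hz):
    """Integer 90kHz DTS computed from the accumulated sample count."""
    total_samples = frame_index * SAMPLES_PER_FRAME
    return TS_CLOCK_HZ * total_samples // sample_rate_hz


def max_error_ticks(sample_rate_hz, frame_count):
    """Largest gap between the exact time and the integer DTS, in ticks."""
    worst = Fraction(0)
    for n in range(frame_count):
        exact = Fraction(TS_CLOCK_HZ * n * SAMPLES_PER_FRAME, sample_rate_hz)
        worst = max(worst, exact - dts_ticks(n, sample_rate_hz))
    return worst


worst = max_error_ticks(44100, 10000)
print(float(worst))
```

Because the DTS is recomputed from the total sample count each time, the truncation error stays below one tick and does not accumulate from frame to frame.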

For example, the timestamp of one audio packet is:

(lldb) p audio->timestamp
(int64_t) $8 = 23
(lldb) p audio->timestamp*90
(long long) $9 = 2070

However, the result recalculated based on the number of samples is:

int64_t dts = 90000 * aac_samples / srs_flv_srates[format->acodec->sound_rate];
(lldb) p dts
(int64_t) $6 = 2089

After 200ms:

(lldb) p audio->timestamp
(int64_t) $14 = 209
(lldb) p audio->timestamp*90
(long long) $15 = 18810
(lldb) p dts
(int64_t) $16 = 18808

As a result, there was no more popping sound.
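
The debugger values above can be reproduced with a few lines of Python. This is a sketch, not SRS code: the function names are hypothetical, and it assumes the second reading corresponds to nine accumulated frames (9 x 1024 samples), which is inferred from the numbers rather than stated in the original.

```python
def dts_direct(timestamp_ms):
    """Convert a millisecond timestamp straight to 90kHz ticks."""
    return timestamp_ms * 90


def dts_from_samples(total_samples, sample_rate_hz):
    """Derive the 90kHz DTS from the accumulated sample count."""
    return 90000 * total_samples // sample_rate_hz


# (direct conversion, sample-based conversion) for the two readings.
first = (dts_direct(23), dts_from_samples(1 * 1024, 44100))
later = (dts_direct(209), dts_from_samples(9 * 1024, 44100))
print(first)
print(later)
```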

Note: Starting from version 3.0.71, the default value of hls_dts_directly is on, which is consistent with SRS2. However, this may occasionally cause a popping sound in HLS. If your stream is otherwise normal, it is recommended to set hls_dts_directly to off to avoid the popping sound.

TRANS_BY_GPT3

@winlinvip
Member Author

winlinvip commented Apr 16, 2017

It seems there is no need to encapsulate audio-only HLS in AAC format; TS is fine.

TRANS_BY_GPT3

@winlinvip
Member Author

winlinvip commented Dec 10, 2019

Note that although this improvement can prevent the popping sound in HLS, it may generate a large number of very small HLS segments because of timestamp issues. Refer to #1506.

The solution is a configuration item that disables this improvement and converts the original timestamps directly, which may bring back the popping sound in HLS:

    hls {
        enabled         on;
        hls_dts_directly on;
    }

Note: Starting from version 3.0.71, the default value of hls_dts_directly is on, which is consistent with SRS2. However, this may sometimes cause a popping sound in HLS. If your stream is otherwise normal, it is recommended to set hls_dts_directly to off to avoid the popping sound.

To derive timestamps from the AAC sample count instead of the original timestamps, use the following. It avoids the popping sound in HLS, but segmentation may be abnormal:

    hls {
        enabled         on;
        hls_dts_directly off;
    }

TRANS_BY_GPT3

@winlinvip
Member Author

winlinvip commented Dec 10, 2019
