High-resolution volume control and normalisation #660

Merged (24 commits) on Apr 10, 2021


@roderickvd (Member) commented Mar 1, 2021

Enhancements:

  • Store and handle samples as 32-bit floats instead of 16-bit integers. This provides 24-25 bits of transparency, allowing for 48-54 dB of headroom to do volume control and normalisation without throwing away bits or dropping dynamic range below 96 dB CD quality.

  • Perform volume control and normalisation in 64-bit arithmetic for minimum quantisation noise.

  • Output to 32-bit float, 32-bit integer, 24-bit integer (both in padded 32-bit words and as three-byte arrays) or 16-bit integer (default), as specified on the command line.

  • Add a dynamic limiter with configurable threshold, attack time, release or decay time, and steepness for the sigmoid transfer function. This mimics the native Spotify limiter, offering greater dynamic range than the old limiter, which just reduced overall gain to prevent clipping.

  • Make the configurable threshold also apply to the old limiter, which is still available.

  • DRY up a lot of audio backend code, which at the same time enables OggData passthrough on the subprocess backend.
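The 48-54 dB figure follows from roughly 6.02 dB of dynamic range per bit: an f32 sample carries about 24 to 25 bits of effective resolution, which is 8 to 9 bits more than 16-bit CD audio. A small sketch of that arithmetic (the function names are illustrative, not taken from librespot):

```rust
/// Dynamic range in dB represented by `bits` of linear PCM resolution:
/// 20 * log10(2^bits), i.e. roughly 6.02 dB per bit.
fn dynamic_range_db(bits: u32) -> f64 {
    f64::from(bits) * 20.0 * 2f64.log10()
}

/// Attenuation available before the remaining resolution drops below
/// `target_bits` (e.g. 16-bit CD quality).
fn headroom_db(effective_bits: u32, target_bits: u32) -> f64 {
    dynamic_range_db(effective_bits) - dynamic_range_db(target_bits)
}
```

So `headroom_db(24, 16)` is about 48.2 dB and `headroom_db(25, 16)` about 54.2 dB, against roughly 96.3 dB for 16 bits alone.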

Resolves: #608

Notes:

  • New command line options:
        --format FORMAT Output format (F32, S32, S24, S24_3 or S16). Defaults to S16
        --normalisation-method NORMALISATION_METHOD
                        Specify the normalisation method to use - [basic,
                        dynamic]. Default is dynamic.
        --normalisation-threshold THRESHOLD
                        Threshold (dBFS) to prevent clipping. Default is -1.0.
        --normalisation-attack ATTACK
                        Attack time (ms) in which the dynamic limiter is
                        reducing gain. Default is 5.
        --normalisation-release RELEASE
                        Release or decay time (ms) in which the dynamic
                        limiter is restoring gain. Default is 100.
        --normalisation-knee KNEE
                        Knee steepness of the dynamic limiter. Default is 1.0.
  • For the dynamic limiter, steepness values between 0.5 and 2.0 work well. The default of 1.0 yields a linear function; values > 1.0 roll off softly and are steeper midway, while values < 1.0 have a sharp initial response and are gentler midway. Feedback on optimising these parameters is welcome.

  • Compiling with the with-vorbis feature works, but the resulting binary panics. This was already the case before this PR. See: UB with std::mem::zeroed tomaka/vorbis-rs#19
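To make the threshold, attack and release parameters concrete, here is a minimal sketch of a feed-forward peak limiter working in 64-bit floats. It is not librespot's implementation: the `Limiter` type and its helpers are hypothetical, the knee (sigmoid) shaping is left out, and without look-ahead a fast transient can briefly exceed the threshold during the attack phase.

```rust
/// Convert a level in dBFS to a linear amplitude ratio (0 dBFS = 1.0).
fn db_to_ratio(db: f64) -> f64 {
    10f64.powf(db / 20.0)
}

/// One-pole smoothing coefficient for a time constant in milliseconds.
fn time_coefficient(time_ms: f64, sample_rate: f64) -> f64 {
    (-1.0 / (time_ms / 1000.0 * sample_rate)).exp()
}

/// Hypothetical peak limiter: no knee shaping, no look-ahead.
struct Limiter {
    threshold: f64, // linear ratio derived from the dBFS threshold
    attack: f64,    // smoothing coefficient used while reducing gain
    release: f64,   // smoothing coefficient used while restoring gain
    gain: f64,      // gain currently applied to the signal
}

impl Limiter {
    fn new(threshold_dbfs: f64, attack_ms: f64, release_ms: f64, sample_rate: f64) -> Self {
        Limiter {
            threshold: db_to_ratio(threshold_dbfs),
            attack: time_coefficient(attack_ms, sample_rate),
            release: time_coefficient(release_ms, sample_rate),
            gain: 1.0,
        }
    }

    fn process(&mut self, sample: f64) -> f64 {
        let level = sample.abs();
        // Gain that would bring this sample down to exactly the threshold.
        let target = if level > self.threshold { self.threshold / level } else { 1.0 };
        // Move towards lower gain with the attack time, back up with the release time.
        let coeff = if target < self.gain { self.attack } else { self.release };
        self.gain = coeff * self.gain + (1.0 - coeff) * target;
        sample * self.gain
    }
}
```

With a -1.0 dBFS threshold, 5 ms attack and a constant input of 2.0 at 44.1 kHz, the output settles at about 0.891 (-1.0 dBFS); once the input falls back below the threshold, the gain recovers towards 1.0 with the 100 ms release time constant.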

To do:

  • Add a command-line option to output either 16 or 32 bit depth
  • Add support for four-byte S24 output format
  • Add support for three-byte S24_3 output format
  • Rename normalisation-steepness to normalisation-knee
  • Revert default format to S16 for a seamless out-of-the-box experience
  • Test remaining backends (help needed!)
  • Optimise requantizer to work in f32 then round

Pending refactoring:

  • DRY up sample conversion in PortAudio and SDL backends
    Edit: Not much to be gained, dropped idea
  • Use TryFrom idiom instead of AudioPacket::f32_to_<type>() helper functions
    Edit: Moved sample conversion into separate struct
  • Use Self instead of full struct and enum names in config.rs

All done!

Test status:

All backends compile successfully, but require testing.

Backend | Status | First verified by | Remarks
Alsa | | @roderickvd | -
GStreamer | | @JasonLG1979 |
JACK audio | | @sashahilton00 |
pipe | | @roderickvd | -
PortAudio | ☑️ | @roderickvd | [2]
PulseAudio | | @JasonLG1979 | -
Rodio | ☑️ | @roderickvd | [1]
SDL | | | [3]
subprocess | | @roderickvd | -

  1. Rodio on Raspbian 10 (Raspberry Pi 3 Model B+) does not open the output correctly. See the issue at: Alsa output opened incorrectly RustAudio/cpal#564. This seems to affect Alsa only (there are no issues on macOS Big Sur) and is not strictly related to this PR.

  2. Panics on Alsa and macOS, but that's already the case in dev.

  3. Free pass granted by @sashahilton00.

@roderickvd (Member, Author)

Is Rust 1.41.1 still a target? If so, should I work around the fact that clamp was still an unstable API back then? That's easy enough to do at the expense of a little elegance.

@sashahilton00 (Member)

1.41.1 is still a target; as I understand it, it is currently the version available in Debian stable.
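Since `f32::clamp` and `f64::clamp` only became stable in Rust 1.50, a fallback built from `max` and `min` (both long stable) could stand in on 1.41.1. A sketch, with a hypothetical free-function name:

```rust
/// Clamp `x` to `[min, max]` without `f64::clamp` (stable only since Rust 1.50).
/// `f64::max` and `f64::min` are available on Rust 1.41.1.
fn clamp(x: f64, min: f64, max: f64) -> f64 {
    debug_assert!(min <= max);
    x.max(min).min(max)
}
```

One behavioural difference: `f64::max` and `f64::min` return the non-NaN operand, so this fallback maps NaN to `min`, whereas `f64::clamp` passes NaN through.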

@roderickvd (Member, Author)

From #652 I gather that the tokio_migration branch is the current development target? This PR tracks the current dev branch. I'm fine with rebasing on tokio_migration if that's the future.

Also, I'm well on track adding a --format {F32|S16} command-line option and, while at it, DRY-ing up a lot of the backend code by reusing common code.

 - Store and output samples as 32-bit floats instead of 16-bit integers.
   This provides 24-25 bits of transparency, allowing for 48-54 dB of
   headroom to do volume control and normalisation without throwing
   away bits or dropping dynamic range below 96 dB CD quality.

 - Perform volume control and normalisation in 64-bit arithmetic.

 - Add a dynamic limiter with configurable threshold, attack time,
   release or decay time, and steepness for the sigmoid transfer
   function. This mimics the native Spotify limiter, offering greater
   dynamic range than the old limiter, which just reduced overall gain
   to prevent clipping.

 - Make the configurable threshold also apply to the old limiter, which
   is still available.

Resolves: librespot-org#608
Usage: `--format {F32|S16}`. Default is F32.

 - Implemented for all backends, except for JACK audio, which itself
   only supports 32-bit output at this time. Setting JACK audio to S16
   will panic and instruct the user to set output to F32.

 - The F32 default works fine for Rodio on macOS, but not on Raspbian 10
   with Alsa as host. Therefore users on Linux systems are warned to set
   output to S16 in case of garbled sound with Rodio. This seems to be an
   issue with cpal incorrectly detecting the output stream format.

 - While at it, DRY up lots of code in the backends and, by that virtue,
   also enable OggData passthrough on the subprocess backend.

 - I tested Rodio, ALSA, pipe and subprocess quite a bit, and call on
   others to join in and test the other backends.
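A sketch of how the `--format` values could be represented and parsed. The `AudioFormat` enum and its methods are hypothetical, not librespot's actual types; the accepted names and the S16 default follow the final option list in the PR description (this commit still defaulted to F32), and `bytes_per_sample` highlights that S24 uses a padded 4-byte word while S24_3 uses 3 bytes.

```rust
use std::str::FromStr;

/// Hypothetical output-format enum mirroring the `--format` values in this PR.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum AudioFormat {
    F32,
    S32,
    S24,
    S24_3,
    S16,
}

impl AudioFormat {
    /// Bytes occupied by one sample in the output stream.
    fn bytes_per_sample(self) -> usize {
        match self {
            AudioFormat::F32 | AudioFormat::S32 | AudioFormat::S24 => 4,
            AudioFormat::S24_3 => 3,
            AudioFormat::S16 => 2,
        }
    }
}

impl Default for AudioFormat {
    // S16 is the most widely supported format, hence the default.
    fn default() -> Self {
        AudioFormat::S16
    }
}

impl FromStr for AudioFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "F32" => Ok(AudioFormat::F32),
            "S32" => Ok(AudioFormat::S32),
            "S24" => Ok(AudioFormat::S24),
            "S24_3" => Ok(AudioFormat::S24_3),
            "S16" => Ok(AudioFormat::S16),
            other => Err(format!("unsupported format: {}", other)),
        }
    }
}
```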
@roderickvd (Member, Author) commented Mar 13, 2021

@JasonLG1979 continuing from #608:

> @roderickvd you pushed a couple of commits since I cloned and started to compile. The version I cloned does not work. I get errors about setting the format. ALSA does not seem to support float formats, even running through dmix. I see in your commits since then that you mention ALSA only working in 16bit linear mode.

Alsa works great in 32-bit float, just not through the current cpal and Rodio. So be sure to launch with --backend alsa.

> aplay --dump-hw-params /usr/share/sounds/alsa/Front_Right.wav tells me that, on my system at least, dmix will accept S16_LE, S16_BE, S24_LE, S32_LE, S32_BE and S24_3LE, so basically 16, 24 and 32bit linear.

> If you're already doing 16bit, is there a reason you can't do 32bit?

Sure, should be easy enough to do now that the plumbing is there. In the meantime, does adding defaults.pcm.dmix.format S24_LE to /etc/asound.conf help?

> Really though, I'd like to see a 24bit option too. All but the cheapest DACs will do 24bit natively; a lot more of them do 24bit than 32bit float, that's for sure.

It's in the back of my mind. This looks like a promising route: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3d233fedc8ed595a1e88e815d23cd009

@Johannesd3 (Contributor)

What's the advantage of u24 compared to u32, or perhaps a newtype wrapper around u32 with a bounds check?

@roderickvd (Member, Author)

Only sound card compatibility. This is for final output to the driver only, internally all samples continue to be stored in f32.

I would use a helper function instead of implementing TryFrom, because for PCM audio it's not about putting the same value into another type, but about shifting bytes. Or, in the case of i32 to i24, dropping the least significant byte.
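A minimal sketch of such helpers, assuming samples in [-1.0, 1.0] and ALSA-style S24 (24 significant bits, right-aligned in a 32-bit word). The function names are hypothetical. The clamp before the `as` cast matters on older toolchains: float-to-integer `as` casts only became saturating in Rust 1.45, and out-of-range casts were undefined behaviour before that.

```rust
/// Scale a float sample in [-1.0, 1.0] to a full-scale i32, clamping
/// out-of-range input instead of relying on cast behaviour.
fn f32_to_s32(sample: f32) -> i32 {
    // Scale in f64: 2^31 and both i32 bounds are exactly representable there.
    let scaled = (f64::from(sample) * 2_147_483_648.0).round();
    // max/min rather than clamp, so this also builds on Rust 1.41.1.
    scaled.max(f64::from(i32::MIN)).min(f64::from(i32::MAX)) as i32
}

/// Reduce a full-scale i32 to 24 significant bits, right-aligned in a
/// 32-bit word, by dropping the least significant byte. `>>` on i32 is an
/// arithmetic shift, so the sign is preserved.
fn s32_to_s24(sample: i32) -> i32 {
    sample >> 8
}
```

Note that the shift truncates rather than rounds, which is exactly "dropping the least significant byte".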

@JasonLG1979 (Contributor)

@roderickvd the problem could very well have been fixed between the time I last cloned and compiled and now. I'll have to set up a cross-compile environment. Compiling directly on a Pi Zero is for the birds.

> Alsa works great in 32-bit float, just not through the current cpal and Rodio. So be sure to launch with --backend alsa.

I compiled with the alsa backend and did use the --backend alsa option when running it.

> Sure, should be easy enough to do now that the plumbing is there. In the meantime, does adding defaults.pcm.dmix.format S24_LE to /etc/asound.conf help?

defaults.pcm.dmix.format S24_LE sets the sound card's input format, i.e. what ALSA gives to the card (ALSA's output).

aplay --dump-hw-params /usr/share/sounds/alsa/Front_Right.wav, when going through dmix, tells you what you can input to dmix. Changing defaults.pcm.dmix.format will not change the output of aplay --dump-hw-params /usr/share/sounds/alsa/Front_Right.wav.

I generally use a custom /etc/asound.conf that I wrote that allows me to set the sampling rate, format, buffer size, and software volume settings. It's basically softvol > dmix > hardware.

@JasonLG1979 (Contributor) commented Mar 13, 2021

I just cloned and built your branch on my desktop (it takes less than a minute to compile, which is nice compared to hours on the Pi Zero) and everything seems to work great. PulseAudio has no problem with 32bit float.

Edit: It runs great on my desktop. I have yet to compile on my Pi Zero.

While at it, add a small tweak when converting "silent" samples
from float to integer. This ensures 0.0 converts to 0 and vice
versa.
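One way to get that zero-preserving behaviour is to scale by 32768 with no offset and clamp the positive end. This is a sketch under that assumption, with hypothetical helper names, and not necessarily the exact tweak made in the commit:

```rust
/// Float to i16: scale by 32768 without an offset, so 0.0 maps to 0.
/// The positive end is clamped because +1.0 * 32768 exceeds i16::MAX.
fn f32_to_s16(sample: f32) -> i16 {
    let scaled = (sample * 32768.0).round();
    scaled.max(-32768.0).min(32767.0) as i16
}

/// i16 to float: the inverse scaling, so 0 maps back to exactly 0.0.
fn s16_to_f32(sample: i16) -> f32 {
    f32::from(sample) / 32768.0
}
```

With this mapping every i16 value survives a round trip through f32 unchanged, because all i16 values fit in f32's 24-bit significand and dividing or multiplying by a power of two is exact.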
@roderickvd (Member, Author)

@JasonLG1979 thanks for verifying with PulseAudio. Could you also try Alsa, now that I've added support for --format S32?

@JasonLG1979 (Contributor)

> thanks for verifying with PulseAudio.

I built it with the alsa backend. I'm not even sure what the point of a PulseAudio-specific backend is. When PulseAudio is running, it is the "default" ALSA device for compatibility reasons, so a separate PulseAudio backend seems redundant.

> Could you also try Alsa, now that I've added support for --format S32?

Sure, I'll give it a shot on my desktop first and, if that goes well, on the Pi Zero.

@JasonLG1979 (Contributor)

@roderickvd running just fine on a Pi Zero. 15 - 20% CPU during normal playback and maybe 50 - 60% while fetching the next song during playback. I'd call that a success, it's not like you're going to be multitasking on a Pi Zero.

If it matters here are the args I used:

-n Librespot --enable-volume-normalisation --normalisation-gain-type track --normalisation-pregain 3 --format S32 --initial-volume 100 --username XXXX --password XXXX --autoplay --disable-discovery --disable-audio-cache -v

@JasonLG1979 (Contributor) commented Mar 14, 2021

The limiter seems to also work pretty well. Totally unscientific A/B testing between the official Spotify desktop client and your branch on my desktop, with --normalisation-pregain 3 to match Spotify's default levels, and they sound pretty close to identical. I also ran into a track that hits the limiter pretty hard (1.94 dB), Red Hot Chili Peppers - Under The Bridge, and it didn't distort, breathe or duck.

Thanks.

Edit: I would rename "Steepness" to "Knee". Attack, Release and Knee are common words used to describe compressor/limiter parameters. I've never seen Knee called Steepness.

@roderickvd (Member, Author)

> I built it with the alsa backend. I'm not even sure what the point of a PulseAudio-specific backend is. When PulseAudio is running, it is the "default" ALSA device for compatibility reasons, so a separate PulseAudio backend seems redundant.

Well, although I'm no PulseAudio expert, I do think PA offers things like networking support and per-application volume control, right? That sets it apart from the Alsa kernel layer it builds on.

> If it matters, here are the args I used:
>
> -n Librespot --enable-volume-normalisation --normalisation-gain-type track --normalisation-pregain 3 --format S32 --initial-volume 100 --username XXXX --password XXXX --autoplay --disable-discovery --disable-audio-cache -v

So just checking: this is on Alsa, not on Rodio or PulseAudio?

> @roderickvd running just fine on a Pi Zero. 15 - 20% CPU during normal playback and maybe 50 - 60% while fetching the next song during playback. I'd call that a success, it's not like you're going to be multitasking on a Pi Zero.
>
> The limiter seems to also work pretty well. Totally unscientific A/B testing between the official Spotify desktop client and your branch on my desktop, with --normalisation-pregain 3 to match Spotify's default levels, and they sound pretty close to identical. I also ran into a track that hits the limiter pretty hard (1.94 dB), Red Hot Chili Peppers - Under The Bridge, and it didn't distort, breathe or duck.

Righteous!

> Edit: I would rename "Steepness" to "Knee". Attack, Release and Knee are common words used to describe compressor/limiter parameters. I've never seen Knee called Steepness.

That's a good suggestion.

@JasonLG1979 (Contributor)

> Well, although I'm no PulseAudio expert, I do think PA offers things like networking support and per-application volume control, right? That sets it apart from the Alsa kernel layer it builds on.

I'm not a PulseAudio expert either. Maybe? I will build the branch with the PulseAudio backend and test it on my desktop.

> So just checking: this is on Alsa, not on Rodio or PulseAudio?

Yes, ALSA. I'd never run PulseAudio on a headless Pi Zero, I'm not a masochist, lol!!! PulseAudio is not designed to run as a system service, but as a per-user service. I generally run librespot as a system-level service (with restricted permissions, ofc) on headless Pis.

> Righteous!

I thought so, lol!!!

> That's a good suggestion.

As far as the knee goes, I think the steepness is about right, nice and middle of the road. And I'd still use steepness in the description.

@JasonLG1979 (Contributor)

If they're going to do a PulseAudio backend, though, they should do it right and set the PulseAudio Application Properties.

@roderickvd (Member, Author)

> If they're going to do a PulseAudio backend, though, they should do it right and set the PulseAudio Application Properties.

With "they" you mean this project? The PA application and stream name are set at compile-time. Which properties are you missing? I'm not sure what you're getting at but you can consider opening a separate issue.

@JasonLG1979 (Contributor) commented Mar 14, 2021

> With "they" you mean this project? The PA application and stream name are set at compile-time. Which properties are you missing? I'm not sure what you're getting at but you can consider opening a separate issue.

It's not a must. PulseAudio is generally used on desktop systems with desktop environments and all the associated settings UIs and complex BS. PulseAudio properties allow things like telling the DE that librespot is a music player and having its name show up in the volume panel of the sound settings. They're also a way to send metadata like the track title when using PulseAudio as a network streamer. (Not that anyone really uses that functionality in PulseAudio, the network stuff I mean.)

Something like this; ofc it's an app with a UI and it's in Python, but anyway:

https://github.com/pithos/pithos/blob/master/pithos/application.py#L42-L46
https://github.com/pithos/pithos/blob/master/pithos/pithos.py#L747-L750

@JasonLG1979 (Contributor)

@roderickvd PulseAudio looks good to me. Seems to work as well as ALSA as far as I can tell.

@JasonLG1979 (Contributor)

One thing you might consider though is defaulting to 16bit so you don't break librespot for people that upgrade from an existing install that up until this point was 16bit. 16bit is also the most commonly supported format. In my mind that means that librespot would work out of the box for more people.

@roderickvd (Member, Author)

> @roderickvd PulseAudio looks good to me. Seems to work as well as ALSA as far as I can tell.

Great, thanks for testing. Call on other lurkers to test the other backends as well!

> One thing you might consider though is defaulting to 16bit so you don't break librespot for people that upgrade from an existing install that up until this point was 16bit. 16bit is also the most commonly supported format. In my mind that means that librespot would work out of the box for more people.

Yes I was thinking the same. Interestingly though JACK audio supports F32 and not S16. Previously, samples were reformatted from S16 to F32 without the user even knowing.

Currently in my branch --backend jackaudio --format S16 panics instructing the user to use --format F32. But it might be more intuitive to revert to the original behavior, and add a warning that the format is overridden.

What do you think?
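If the format were overridden rather than rejected, the selection logic might look like the following sketch. The enum, function name and warning text are hypothetical; only the `jackaudio` backend name and the F32-only constraint come from the discussion above.

```rust
/// Hypothetical subset of the output formats, enough for this example.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AudioFormat {
    F32,
    S16,
}

/// Pick the format a backend will actually use. A backend that only accepts
/// 32-bit float gets F32 with a warning instead of a panic.
fn effective_format(backend: &str, requested: AudioFormat) -> AudioFormat {
    if backend == "jackaudio" && requested != AudioFormat::F32 {
        eprintln!(
            "warning: JACK only supports F32 output; overriding --format {:?} with F32",
            requested
        );
        AudioFormat::F32
    } else {
        requested
    }
}
```

The trade-off is the one raised above: a warning keeps the old "it just works" behaviour for JACK users, while a panic makes the mismatch impossible to miss.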

> I would use a helper function instead of implementing TryFrom, because for PCM audio it's not about putting the same value into another type, but about shifting bytes. Or, in the case of i32 to i24, dropping the least significant byte.

Yesterday I was banging my head over why I was only hearing white noise. Finally I gave up, then in bed I realized I had actually implemented the three-byte S24_3 array instead of the four-byte S24 array (zero-shifting all bits by eight) 😑
The good news is I'm now close to adding both.
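The difference between the two 24-bit layouts is purely in how many bytes each sample occupies. A sketch, assuming the input is an i32 already holding a 24-bit value (for example after an arithmetic shift right by 8) and little-endian output; the function names are hypothetical:

```rust
/// S24: a 24-bit sample in the low three bytes of a 32-bit word, four bytes
/// per sample, little-endian. The top byte here is sign extension, as an
/// arithmetic shift produces it.
fn s24_bytes(sample: i32) -> [u8; 4] {
    sample.to_le_bytes()
}

/// S24_3: the same 24-bit sample packed into exactly three bytes.
fn s24_3_bytes(sample: i32) -> [u8; 3] {
    let b = sample.to_le_bytes();
    [b[0], b[1], b[2]]
}
```

Writing 3-byte frames to a device that expects 4-byte S24 words (or the reverse) misaligns every subsequent sample, which would plausibly come out as the white noise described above.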

@roderickvd (Member, Author)

> True, although there's something to be said about librespot being self-contained and just working out of the box on a small ARM-based board running Linux, as opposed to piping. I haven't tested how well piping actually works. I would assume it works just fine?

Yes it does.

@roderickvd (Member, Author) commented Mar 27, 2021

Can someone verify there is no regression in the JACK Audio backend? I can't get playback and am unsure if I'm doing this right. I've got a server set up in working order; running jack_simple_client plays a test tone. However running librespot with just --backend jackaudio outputs no sound (nor errors) both on my branch and on dev.

Running jackd2 (jackdmp version 1.9.12 tmpdir /dev/shm protocol 8) as: jackd -dalsa -s -r 44100
librespot as: librespot --name test --verbose --backend jackaudio --disable-audio-cache

@roderickvd (Member, Author)

I reached out for help on the JackAudio mailing list, hoping they'll chime in.

As for SDL, could you give me a free pass? From a code walkthrough I believe it should work fine, but it feels like a burden to write an entire SDL shim just to test playback.

That said I feel like this PR is ready for merge. Are there any obstacles or further points?

@sashahilton00 (Member) commented Apr 9, 2021

Have just tested the Jack audio backend on my mac. My ears aren't good enough to tell the difference, but it appears to run fine. As for SDL, I'm fine with it being merged as is, don't think it warrants making a shim as you mentioned. Happy to merge if you are and there's no further feedback.

@roderickvd (Member, Author)

That's great man. I've got one little optimisation I'll commit this evening and let you know.

@JasonLG1979 (Contributor)

@sashahilton00 How long until this ends up in a release once it's merged?

@roderickvd (Member, Author)

All done and happy to merge!

Hold my beer as I'm very close to opening a follow-up PR with configurable dithering and noise shaping.

@sashahilton00 merged commit 8fe2e01 into librespot-org:dev on Apr 10, 2021
@sashahilton00 (Member)

Merged. Please update the wiki with any new CLI args and corresponding documentation as necessary.

@roderickvd (Member, Author)

> Merged. Please update the wiki with any new CLI args and corresponding documentation as necessary.

Done!

@herrernst (Contributor)

@roderickvd Great work! Nowadays, librespot (e.g. on a cheapo Raspi Zero) is probably better than Spotify Connect on some >1000 USD/EUR AVRs. (I did the initial and very naive normalization implementation, sorry for that 😉)

@eDad2003 commented Feb 6, 2024

A voice from the future here. Thank you @JasonLG1979 and @roderickvd for recognizing the problem and spending the time to improve it. My (noob) take on this endeavor is this: "while Spotify may only transmit 16 bit information, the internal librespot code involved in gain adjustment may decrease resolution/precision of those bits resulting in sound quality degradation. Adjusting the FORMAT parameter will eliminate this decrease (provided your DAC can handle 24 or 32 bits)."
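To illustrate the point about gain adjustment, here is a sketch comparing a -40 dB attenuation (gain 0.01) that is applied and then undone while the value is stored as a 16-bit integer, versus kept as f32. It only demonstrates the rounding loss; it is not how librespot processes audio, and the function names are hypothetical.

```rust
/// Attenuate a 16-bit sample, store the result as i16, then undo the gain.
fn round_trip_via_i16(sample: i16, gain: f64) -> i16 {
    let stored = (f64::from(sample) * gain).round() as i16;
    (f64::from(stored) / gain).round() as i16
}

/// Same operation, but the attenuated value is stored as f32 in [-1.0, 1.0].
fn round_trip_via_f32(sample: i16, gain: f32) -> i16 {
    let stored = f32::from(sample) / 32768.0 * gain;
    (stored / gain * 32768.0).round() as i16
}
```

For a sample of 12345, the i16 path comes back as 12300 (the attenuated value 123.45 had to be rounded to 123), while the f32 path returns 12345: the low-order detail survives only when the intermediate format has bits to spare.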

Perhaps the documentation can reflect this better? Sounds (pun intended) kinda important. I didn't understand the config parameter and googled my way to this thread. I was relying on https://github.com/librespot-org/librespot/wiki/Options for information.

Cheers

9 participants