-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High-resolution volume control and normalisation #660
High-resolution volume control and normalisation #660
Conversation
Is Rust 1.41.1 still a target, should I work around the fact that |
1.41.1 is still a target, as I understand it is currently the version available in stable Debian |
From #652 I gather that the Also I well on track adding a |
- Store and output samples as 32-bit floats instead of 16-bit integers. This provides 24-25 bits of transparency, allowing for 42-48 dB of headroom to do volume control and normalisation without throwing away bits or dropping dynamic range below 96 dB CD quality. - Perform volume control and normalisation in 64-bit arithmetic. - Add a dynamic limiter with configurable threshold, attack time, release or decay time, and steepness for the sigmoid transfer function. This mimics the native Spotify limiter, offering greater dynamic range than the old limiter, that just reduced overall gain to prevent clipping. - Make the configurable threshold also apply to the old limiter, which is still available. Resolves: librespot-org#608
Usage: `--format {F32|S16}`. Default is F32. - Implemented for all backends, except for JACK audio which itself only supports 32-bit output at this time. Setting JACK audio to S16 will panic and instruct the user to set output to F32. - The F32 default works fine for Rodio on macOS, but not on Raspian 10 with Alsa as host. Therefore users on Linux systems are warned to set output to S16 in case of garbled sound with Rodio. This seems an issue with cpal incorrectly detecting the output stream format. - While at it, DRY up lots of code in the backends and by that virtue, also enable OggData passthrough on the subprocess backend. - I tested Rodio, ALSA, pipe and subprocess quite a bit, and call on others to join in and test the other backends.
bc05e7f
to
3837cc2
Compare
3837cc2
to
a4ef174
Compare
@JasonLG1979 continuing from #608:
Alsa works great in 32-bit float, just not through the current cpal and Rodio. So be sure to launch with
Sure, should be easy enough to do now the plumbing is there. In the meantime does adding
It's in the back of my mind. This looks like a promising route: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3d233fedc8ed595a1e88e815d23cd009 |
What's the advantage of |
Only sound card compatibility. This is for final output to the driver only, internally all samples continue to be stored in I would use a helper function instead of implementing |
@roderickvd the problem could have very well been fixed between the time I cloned and compiled last and now. I'll have to set up a cross compile environment. Compiling directly on a Pi Zero is for the birds.
I compiled with the alsa backend and did use the
I generally use a custom |
I just cloned and built your branch on my desktop (it takes less than a min to compile which is nice compared to hours for the Pi Zero) and everything seems to work great. PulseAudio has no problem with 32bit float. Edit: It runs great on my desktop. I have yet to compile on my Pi Zero. |
While at it, add a small tweak when converting "silent" samples from float to integer. This ensures 0.0 converts to 0 and vice versa.
@JasonLG1979 thanks for verifying with PulseAudio. Could you also try Alsa now I've added support for |
I built it with the alsa backend. I'm not even sure what the point of having a PulseAudio specific backend is? PulseAudio when running is the "default" ALSA device for compatibility reasons. A separate PulseAudio backend seems redundant.
Sure I'll give it a shot on my desktop 1st and if that goes well on the Pi Zero. |
@roderickvd running just fine on a Pi Zero. 15 - 20% CPU during normal playback and maybe 50 - 60% while fetching the next song during playback. I'd call that a success, it's not like you're going to be multitasking on a Pi Zero. If it matters here are the args I used:
|
The limiter seems to also work pretty well. Totally unscientific A/B testing between the official Spotify desktop client and your branch on my desktop with Thanks. Edit: I would rename "Steepness" to "Knee". Attack, Release and Knee are common words used to describe compressor/limiter parameters. I've never seen Knee called Steepness. |
Well although I'm no PulseAudio expert, I do think PA offers things like networking support and per-application volume control right? Which sets it apart from the Alsa kernel it builds on.
So just checking: this is on Alsa, not on Rodio or PulseAudio?
Righteous!
That's a good suggestion. |
I'm not a PulseAudio expert either. Maybe? I will build the branch with the PulseAudio backend and test it on my desktop.
Yes ALSA. I'd never run PulseAudio on a headless Pi Zero, I'm not a masochist,lol!!! PulseAudio is not designed to run as a system service, but as a per user service. I generally run librespot as a system level service (with restricted permissions ofc) on headless Pi's.
I thought so,lol!!!
As far as the knee goes I think the steepness is about right, nice and middle of the road. And I'd still use steepness in the description. |
If they're going to do a PulseAudio backend though they should do it right and set the Pulseaudio Application Properties |
With "they" you mean this project? The PA application and stream name are set at compile-time. Which properties are you missing? I'm not sure what you're getting at but you can consider opening a separate issue. |
It's not a must. PulseAudio is generally used on desktop systems with Desktop Environments and all the associated settings UI's and complex BS. PulseAudio properties allow things like telling the DE that librespot is a music player and have it's name show up in the sound settings volume panel. It's also a way to send metadata like track title and whatnot when using PulseAudio as a network streamer. (Not that anyone really uses that functionality in PulseAudio, the network stuff I mean) Something like this, ofc it's a app with a UI and it's in python, but anyway: https://github.com/pithos/pithos/blob/master/pithos/application.py#L42-L46 |
@roderickvd PulseAudio looks good to me. Seems to work as well as ALSA as far as I can tell. |
One thing you might consider though is defaulting to 16bit so you don't break librespot for people that upgrade from an existing install that up until this point was 16bit. 16bit is also the most commonly supported format. In my mind that means that librespot would work out of the box for more people. |
Great, thanks for testing. Call on other lurkers to test the other backends as well!
Yes I was thinking the same. Interestingly though JACK audio supports F32 and not S16. Previously, samples were reformatted from S16 to F32 without the user even knowing. Currently in my branch What do you think?
Yesterday I was slamming my head why I was only hearing white noise. Finally I gave up, then in bed I realized I had actually implemented the three-byte S24_3 array instead of the four-byte S24 array (zero-shifting all bits by eight) 😑 |
Yes it does. |
Can someone verify there is no regression in the JACK Audio backend? I can't get playback and am unsure if I'm doing this right. I've got a server set up in working order; running Running jackd2 (jackdmp version 1.9.12 tmpdir /dev/shm protocol 8) as: |
I reached out for help on the JackAudio mailing list, hoping they'll chime in. As for SDL, could you give me a free pass? From a code walkthrough I believe it should work fine, but it feels like a burden to write an entire SDL shim just to test playback. That said I feel like this PR is ready for merge. Are there any obstacles or further points? |
Have just tested the Jack audio backend on my mac. My ears aren't good enough to tell the difference, but it appears to run fine. As for SDL, I'm fine with it being merged as is, don't think it warrants making a shim as you mentioned. Happy to merge if you are and there's no further feedback. |
That's great man. I've got one little optimisation I'll commit this evening and let you know. |
@sashahilton00 How long until this ends up in a release once it's merged? |
175ba02
to
d0ea963
Compare
For Rodio, this fixes garbled sound on some but not all Alsa hosts.
All done and happy to merge! Hold my beer as I'm very close to opening a follow-up PR with configurable dithering and noise shaping. |
Merged. Please update the wiki with any new CLI args and corresponding documentation as necessary. |
Done! |
@roderickvd Great work! Nowadays, librespot (e. g. on a cheapo Raspi Zero) is probably better than Spotify Connect in some >1000 USD/EUR AVR. (I have done the initial and very naive normalization implementation, sorry for that 😉) |
A voice from the future here. Thank you @JasonLG1979 and @roderickvd for recognizing the problem and spending the time to improve it. My (noob) take on this endeavor is this: "while Spotify may only transmit 16 bit information, the internal librespot code involved in gain adjustment may decrease resolution/precision of those bits resulting in sound quality degradation. Adjusting the FORMAT parameter will eliminate this decrease (provided your DAC can handle 24 or 32 bits)." Perhaps the document can reflect this better? Sounds (pun intended) kinda important. I didn't understand the config parameter and googled my way to thread. I was relying on https://github.com/librespot-org/librespot/wiki/Options for information. Cheers |
Enhancements:
Store and handle samples as 32-bit floats instead of 16-bit integers. This provides 24-25 bits of transparency, allowing for 48-54 dB of headroom to do volume control and normalisation without throwing away bits or dropping dynamic range below 96 dB CD quality.
Perform volume control and normalisation in 64-bit arithmetic for minimum quantisation noise.
Output to 32-bit float, 32-bit integer, 24-bit integer (both in padded 32-bit words and as three-byte arrays) or 16-bit integer (default), as specified on the command line.
Add a dynamic limiter with configurable threshold, attack time, release or decay time, and steepness for the sigmoid transfer function. This mimics the native Spotify limiter, offering greater dynamic range than the old limiter, that just reduced overall gain to prevent clipping.
Make the configurable threshold also apply to the old limiter, which is still available.
DRY-ups of a lot of audio backend code, at the same time enabling
OggData
passthrough on the subprocess backend.Resolves: #608
Notes:
For the dynamic limiter, steepness between 0.5 and 2.0 work well. The default of 1.0 yields a linear function; > 1.0 rolls off softly, is steeper midway and < 1.0 has a sharp initial response, then gentler midway. Feedback on optimisation of these parameters is welcome.
Compiling
with-vorbis
works, but panics. This was already the case and is not new to this PR. See: UB with std::mem::zeroed tomaka/vorbis-rs#19To do:
normalisation-steepness
tonormalisation-knee
f32
then roundPending refactoring:
DRY up sample conversion in PortAudio and SDL backendsEdit: Not much to be gained, dropped idea
UseTryFrom
idiom instead ofAudioPacket::f32_to_<type>()
helper functionsEdit: Moved sample conversion into separate struct
Self
instead of full struct and enum names onconfig.rs
All done!
Test status:
All backends compile successfully, but require testing.
Rodio on Raspian 10 (Raspberry Pi 3 Model B+) does not open the output correctly. See issue at: Alsa output opened incorrectly RustAudio/cpal#564. This seems to be on Alsa only, no issues on macOS Big Sur, and not strictly related to this PR.
Panics on Alsa and macOS but that's already the case in
dev
.Free pass granted by @sashahilton00.