Speed changes with scaletempo2 can cause audio/video desynchronization #12028

ferreum · 2023-07-26T17:22:21Z

Important Information

mpv version: 0.36.0, also observed in 0.35.0 and current git at bc96b23
Arch Linux, up-to-date
installed from arch package repositories / AUR

Reproduction steps

Find a video where audio/video sync can be observed well (e.g. https://www.youtube.com/watch?v=jpKrrchVRGE)
Open the video with mpv --no-config --speed=1.01 <url>
Quickly change the playback speed back and forth between 1.01x and 2.02x using the } and { bindings. I had success by switching back and forth around twice a second or more, that is, at least 4 key presses a second.
Do this for a while as the video plays. Make sure speed does not go to 1x, as that resets the scaletempo2 filter. Do not seek the video, as that resets playback too.
After doing this for about 30-50 seconds of the video, let it play at 1.01x and check the a/v sync.

Expected behavior

Audio/Video stays synchronized. In the example video above, the numbers are spoken exactly when they first appear.
This does work when specifying --af=scaletempo or --af=rubberband.

Actual behavior

Audio is played back ahead of the video. In the example video above, the numbers are spoken before they appear, easily by 0.5s, but 1s or more is possible.

Note: when reverting to 1x speed, the scaletempo2 filter is removed, which fixes the desync. There are different ways this plays out:

- sometimes, the video is forwarded to the correct time, catching up to the audio position
- other times, the audio cuts out or repeats a short moment while the video plays normally

Log file

mpv-log.txt

Sample files

So far I could reproduce this in any video I want. See the countdown video linked above for easy tests.

Additional info

The scaletempo2 code went over my head, but I tried changing random things. I was lucky to find that removing this piece of code fixes the desynchronization:

diff --git audio/filter/af_scaletempo2.c audio/filter/af_scaletempo2.c
index 1a822ecd50..bbed7d3e9d 100644
--- audio/filter/af_scaletempo2.c
+++ audio/filter/af_scaletempo2.c
@@ -111,11 +111,6 @@ static void process(struct mp_filter *f)
         double pts = mp_aframe_get_pts(p->pending);
         p->frame_delay -= out_samples * p->speed;

-        if (pts != MP_NOPTS_VALUE) {
-            double delay = p->frame_delay / mp_aframe_get_effective_rate(out);
-            mp_aframe_set_pts(out, pts - delay);
-        }
-
         mp_aframe_set_size(out, out_samples);
         mp_aframe_mul_speed(out, p->speed);
         mp_pin_in_write(f->ppins[1], MAKE_FRAME(MP_FRAME_AUDIO, out));

I assume removing this code would break other situations, but I hope this helps someone find the problem.
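To make the effect of the removed lines concrete, here is a minimal standalone sketch of the arithmetic they perform: the output timestamp is the input timestamp minus the accumulated `frame_delay`, converted from samples to seconds. It is not mpv code. `toy_adjust_pts` and `TOY_NOPTS` are hypothetical names, and the sample values are made up for illustration.

```c
#include <assert.h>

/* Stand-in for mpv's MP_NOPTS_VALUE; the value here is arbitrary. */
#define TOY_NOPTS (-1e300)

/*
 * Shift a timestamp back by a delay given in samples.
 *   pts         - timestamp of the pending input frame, in seconds
 *   frame_delay - samples the filter is assumed to be holding back
 *   rate        - effective sample rate, in samples per second
 * Frames without a timestamp are passed through unchanged.
 */
static double toy_adjust_pts(double pts, double frame_delay, double rate)
{
    assert(rate > 0);
    if (pts == TOY_NOPTS)
        return pts;
    return pts - frame_delay / rate;
}
```

With these made-up numbers, `toy_adjust_pts(10.0, 24000.0, 48000.0)` yields 9.5: a `frame_delay` that is wrong by 24000 samples at 48000 samples per second moves the timestamp by 0.5 s, which is the order of magnitude of the desync described above.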

Relates to #6797 and my script https://github.com/ferreum/mpv-skipsilence, where the frequent speed changes cause this problem too.

christoph-heinrich · 2023-07-26T21:47:47Z

Can reproduce when using the linked script with linked video with pretty aggressive settings (ramp_constant = 4 to immediately hit max speed). It slowly de-syncs over time, I have to watch 10s+ for it to become very noticeable.

How quickly it gets out of sync depends on video-sync, but it ends up de-syncing with all of them.

P.S. Please add that script to the wiki. Hopefully it will eventually become as good as the feature in NewPipe.

Fixes mpv-player#12028, audio-video desynchronization caused by changing speed.

There was an additional issue that audio was always delayed by half the configured search-interval. Include `ola_hop_size` in the delay to compensate for that.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and always produces data of size `ola_hop_size` in the output buffer.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`.
- Thus, the rendered frames are always cleared immediately after processing, and `num_complete_frames` is 0 in the delay calculation.
- The input buffer expression makes sense as the header comment states, "target_block is the 'natural' continuation of the output". The delay comes from the length of audio that the filter is holding back.
- The factors contributing to delay are:
  - the pending samples in the input buffer,
  - the pending rendered samples in the output buffer, and
  - an amount of `ola_hop_size`

The frame_delay code looked like that of the rubberband filter, which might not work for scaletempo2. Sometimes a different amount of input audio was consumed by scaletempo2 than expected. It may have been caused by speed changes being a more dynamic process in scaletempo2. This can be seen by where `playback_rate` is used in `run_one_wsola_iteration`: `playback_rate` is only referenced after the iteration, when updating the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the search window is moved. That apparently averages out correctly at constant playback speed, but when the speed changes, the error in this assumption probably spikes. This error accumulated across all speed changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way for the audio to drift off. Because the delay is derived from filter buffer positions, and the buffers are filled only as much as needed, the delay always stays within buffer bounds.

Fixes mpv-player#12028, audio-video desynchronization caused by changing speed.

There was an additional issue that audio was always delayed by half the configured search-interval. Include `ola_hop_size` in the delay to compensate for that.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and always produces data of size `ola_hop_size` in the output buffer.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`.
- Thus, the rendered frames are always cleared immediately after processing, and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer according to the search block position,
  - the pending rendered samples in the output buffer, and
  - an amount of `ola_hop_size` on the output side
- Because the optimal block can be anywhere in the search block, calculate the delay according to the average position, or start of the center window.

The frame_delay code looked like that of the rubberband filter, which might not work for scaletempo2. Sometimes a different amount of input audio was consumed by scaletempo2 than expected. It may have been caused by speed changes being a more dynamic process in scaletempo2. This can be seen by where `playback_rate` is used in `run_one_wsola_iteration`: `playback_rate` is only referenced after the iteration, when updating the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the search block is moved. That apparently averages out correctly at constant playback speed, but when the speed changes, the error in this assumption probably spikes. This error accumulated across all speed changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way for the audio to drift off. Because the delay is derived from filter buffer positions, and the buffers are filled only as much as needed, the delay always stays within buffer bounds.

Fixes mpv-player#12028

There was an additional issue that audio was always delayed by half the configured search-interval. This was caused by the `out` buffer length not being included in the delay calculation.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and always produces data of size `ola_hop_size` in the output buffer.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`.
- Thus, the rendered frames are always cleared immediately after processing, and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer according to the search block position, and
  - the pending rendered samples in the output buffer (always empty in practice).

The frame_delay code looked like that of the rubberband filter, which might not work for scaletempo2. Sometimes a different amount of input audio was consumed by scaletempo2 than expected. It may have been caused by speed changes being a more dynamic process in scaletempo2. This can be seen by where `playback_rate` is used in `run_one_wsola_iteration`: `playback_rate` is only referenced after the iteration, when updating the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the search block is moved. That apparently averages out correctly at constant playback speed, but when the speed changes, the error in this assumption probably spikes. This error accumulated across all speed changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way for the audio to drift off. Because the delay is derived from filter buffer positions, and the buffers are filled only as much as needed, the delay always stays within buffer bounds.
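The drift argument in the commit message can be illustrated with a small toy model. The sketch below is not mpv code, and every name in it (`toy_filter`, `toy_step`, `toy_drift`) is hypothetical. It assumes, purely for illustration, that the filter consumes a fixed number of samples more than `hop * speed` on the first iteration after each speed change. The real mismatch is not quantified in this thread. The model compares a persistent running estimate against the amount actually held in the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the bookkeeping only; this is not mpv code. */
struct toy_filter {
    double buffered;    /* input samples the filter really holds back */
    double frame_delay; /* persistent running estimate of that amount */
};

/*
 * One iteration: the caller feeds `in` samples, the filter emits `hop`
 * output samples and really consumes `consumed` input samples.
 * The running estimate assumes it consumed exactly hop * speed.
 */
static void toy_step(struct toy_filter *f, double in, double hop,
                     double speed, double consumed)
{
    f->buffered += in - consumed;
    f->frame_delay += in;
    f->frame_delay -= hop * speed;
}

/*
 * Alternate between 1.01x and 2.02x every `period` iterations. On the
 * first iteration after each change, real consumption is off by
 * `mismatch` samples (a made-up value to make the effect visible).
 * Returns how far the running estimate ends up from the true amount.
 */
static double toy_drift(size_t iterations, size_t period, double mismatch)
{
    struct toy_filter f = {0.0, 0.0};
    const double hop = 480.0; /* arbitrary hop size for the model */
    double speed = 1.01;

    assert(period > 0);
    for (size_t i = 0; i < iterations; i++) {
        double consumed = hop * speed;
        if (i > 0 && i % period == 0) {
            speed = (speed < 1.5) ? 2.02 : 1.01;
            consumed = hop * speed + mismatch;
        }
        /* Feed exactly what is consumed, so `buffered` stays bounded. */
        toy_step(&f, consumed, hop, speed, consumed);
    }
    return f.frame_delay - f.buffered;
}
```

In this model, `toy_drift(1000, 10, 4.0)` sees 99 speed changes and ends about 396 samples away from the true buffered amount, while `buffered` itself never leaves zero. With `mismatch` set to 0.0 the two values agree, which matches the observation that constant-speed playback stays in sync.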

ferreum changed the title ~~Speed change with scaletempo2 can cause audio/video desynchronization~~ Speed changes with scaletempo2 can cause audio/video desynchronization Jul 26, 2023

ferreum mentioned this issue Jul 30, 2023

af_scaletempo2: fix audio-video de-sync caused by speed changes #12052

Merged

christoph-heinrich mentioned this issue Sep 2, 2023

gpu-next: flicker with display-resample and interpolation #12316

Closed

haasn closed this as completed in #12052 Sep 20, 2023
