Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed changes with scaletempo2 can cause audio/video desynchronization #12028

Closed
ferreum opened this issue Jul 26, 2023 · 1 comment · Fixed by #12052
Closed

Speed changes with scaletempo2 can cause audio/video desynchronization #12028

ferreum opened this issue Jul 26, 2023 · 1 comment · Fixed by #12052

Comments

@ferreum
Copy link
Contributor

ferreum commented Jul 26, 2023

Important Information

  • mpv version: 0.36.0, also observed in 0.35.0 and current git at bc96b23
  • Arch Linux, up-to-date
  • installed from arch package repositories / AUR

Reproduction steps

  1. Find a video where audio/video sync can be observed well (e.g. https://www.youtube.com/watch?v=jpKrrchVRGE)
  2. Open the video with mpv --no-config --speed=1.01 <url>
  3. Quickly change the playback speed back and forth between 1.01x and 2.02x using then } and { bindings. I had success by switching back and forth around twice a second or more, that is, at least 4 key presses a second.
  4. Do this for a while as the video plays. Make sure speed does not go to 1x, as that resets the scaletempo2 filter. Do not seek the video, as that resets playback too.
  5. After doing this for about 30-50 seconds of the video, let it play at 1.01x and check the a/v sync.

Expected behavior

Audio/Video stays synchronized. In the example video above, the numbers are spoken exactly when they first appear.
This does work when specifying --af=scaletempo or --af=rubberband.

Actual behavior

Audio is played back ahead of the video. In the example video above, the numbers are spoken before they appear, easily by 0.5s, but 1s or more is possible.

Note: when reverting to 1x speed, the scaletempo2 filter is removed, which fixes the desync. There are different ways this plays out:

  • sometimes, the video is forwarded to the correct time, catching up to the audio position
  • other times, the audio cuts out or repeats a short moment while the video plays normally

Log file

mpv-log.txt

Sample files

So far I could reproduce this in any video I want. See the countdown video linked above for easy tests.

Additional info

The scaletempo2 code went over my head, but I tried changing random things. I was lucky to find that removing this piece of code fixes the desynchronization:

diff --git audio/filter/af_scaletempo2.c audio/filter/af_scaletempo2.c
index 1a822ecd50..bbed7d3e9d 100644
--- audio/filter/af_scaletempo2.c
+++ audio/filter/af_scaletempo2.c
@@ -111,11 +111,6 @@ static void process(struct mp_filter *f)
         double pts = mp_aframe_get_pts(p->pending);
         p->frame_delay -= out_samples * p->speed;

-        if (pts != MP_NOPTS_VALUE) {
-            double delay = p->frame_delay / mp_aframe_get_effective_rate(out);
-            mp_aframe_set_pts(out, pts - delay);
-        }
-
         mp_aframe_set_size(out, out_samples);
         mp_aframe_mul_speed(out, p->speed);
         mp_pin_in_write(f->ppins[1], MAKE_FRAME(MP_FRAME_AUDIO, out));

I assume removing this code would break other situations, but I hope this helps someone find the problem.

Relates to #6797 and my script https://github.com/ferreum/mpv-skipsilence, where the frequent speed changes cause this problem too.

@ferreum ferreum changed the title Speed change with scaletempo2 can cause audio/video desynchronization Speed changes with scaletempo2 can cause audio/video desynchronization Jul 26, 2023
@christoph-heinrich
Copy link
Contributor

Can reproduce when using the linked script with linked video with pretty aggressive settings (ramp_constant = 4 to immediately hit max speed). It slowly de-syncs over time, I have to watch 10s+ for it to become very noticeable.

How quickly it gets out of sync depends on video-sync, but it ends up de-syncing with all of them.

P.S. Please add that script to the wiki. Hopefully it will eventually become as good as the feature in NewPipe.

ferreum added a commit to ferreum/mpv that referenced this issue Jul 30, 2023
Fixes mpv-player#12028, audio-video desynchronization caused by changing speed.

There was an additional issue that audio was always delayed by half the
configured search-interval. Include `ola_hop_size` in the delay
to compensate for that.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and
  produces data in the output buffer always of size `ola_hop_size`.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`
- Thus, the rendered frames are always cleared immediately after processing,
  and `num_complete_frames` is 0 in the delay calculation.
- The input buffer expression makes sense as the header comment states,
  "target_block is the 'natural' continuation of the output". The delay
  comes from the length of audio that the filter is holding back.
- The factors contributing to delay are:
  - the pending samples in the input buffer,
  - the pending rendered samples in the output buffer, and
  - an amount of `ola_hop_size`

The frame_delay code looked like that of the rubberband filter, which
might not work for scaletempo2. Sometimes a different amount of input
audio was consumed by scaletempo2 than expected. It may have been caused
by speed changes being a more dynamic process in scaletempo2. This can
be seen by where `playback_rate` is used in `run_one_wsola_iteration`:
`playback_rate` is only referenced after the iteration, when updating
the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the
search window is moved. That apparently averages out correctly at
constant playback speed, but when the speed changes, the error in this
assumption probably spikes. This error accumulated across all speed
changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way
for the audio to drift off. By deriving the delay from filter buffer
positions, and the buffers are filled only as much as needed, the delay
always stays within buffer bounds.
ferreum added a commit to ferreum/mpv that referenced this issue Aug 7, 2023
Fixes mpv-player#12028, audio-video desynchronization caused by changing speed.

There was an additional issue that audio was always delayed by half the
configured search-interval. Include `ola_hop_size` in the delay
to compensate for that.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and
  produces data in the output buffer always of size `ola_hop_size`.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`
- Thus, the rendered frames are always cleared immediately after processing,
  and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer,
  - the pending rendered samples in the output buffer, and
  - an amount of `ola_hop_size`

The frame_delay code looked like that of the rubberband filter, which
might not work for scaletempo2. Sometimes a different amount of input
audio was consumed by scaletempo2 than expected. It may have been caused
by speed changes being a more dynamic process in scaletempo2. This can
be seen by where `playback_rate` is used in `run_one_wsola_iteration`:
`playback_rate` is only referenced after the iteration, when updating
the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the
search window is moved. That apparently averages out correctly at
constant playback speed, but when the speed changes, the error in this
assumption probably spikes. This error accumulated across all speed
changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way
for the audio to drift off. By deriving the delay from filter buffer
positions, and the buffers are filled only as much as needed, the delay
always stays within buffer bounds.
ferreum added a commit to ferreum/mpv that referenced this issue Aug 7, 2023
Fixes mpv-player#12028, audio-video desynchronization caused by changing speed.

There was an additional issue that audio was always delayed by half the
configured search-interval. Include `ola_hop_size` in the delay
to compensate for that.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and
  produces data in the output buffer always of size `ola_hop_size`.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`
- Thus, the rendered frames are always cleared immediately after processing,
  and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer,
  - the pending rendered samples in the output buffer, and
  - an amount of `ola_hop_size`

The frame_delay code looked like that of the rubberband filter, which
might not work for scaletempo2. Sometimes a different amount of input
audio was consumed by scaletempo2 than expected. It may have been caused
by speed changes being a more dynamic process in scaletempo2. This can
be seen by where `playback_rate` is used in `run_one_wsola_iteration`:
`playback_rate` is only referenced after the iteration, when updating
the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the
search window is moved. That apparently averages out correctly at
constant playback speed, but when the speed changes, the error in this
assumption probably spikes. This error accumulated across all speed
changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way
for the audio to drift off. By deriving the delay from filter buffer
positions, and the buffers are filled only as much as needed, the delay
always stays within buffer bounds.
ferreum added a commit to ferreum/mpv that referenced this issue Aug 7, 2023
Fixes mpv-player#12028, audio-video desynchronization caused by changing speed.

There was an additional issue that audio was always delayed by half the
configured search-interval. Include `ola_hop_size` in the delay
to compensate for that.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and
  produces data in the output buffer always of size `ola_hop_size`.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`
- Thus, the rendered frames are always cleared immediately after
  processing, and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer according to the search
    block position,
  - the pending rendered samples in the output buffer, and
  - an amount of `ola_hop_size` on the output side
- The because the optimal block can be anywhere in the search block,
  calculate the delay according to the average position, or start of the
  center window.

The frame_delay code looked like that of the rubberband filter, which
might not work for scaletempo2. Sometimes a different amount of input
audio was consumed by scaletempo2 than expected. It may have been caused
by speed changes being a more dynamic process in scaletempo2. This can
be seen by where `playback_rate` is used in `run_one_wsola_iteration`:
`playback_rate` is only referenced after the iteration, when updating
the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the
search block is moved. That apparently averages out correctly at
constant playback speed, but when the speed changes, the error in this
assumption probably spikes. This error accumulated across all speed
changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way
for the audio to drift off. By deriving the delay from filter buffer
positions, and the buffers are filled only as much as needed, the delay
always stays within buffer bounds.
ferreum added a commit to ferreum/mpv that referenced this issue Aug 7, 2023
Fixes mpv-player#12028, audio-video desynchronization caused by changing speed.

There was an additional issue that audio was always delayed by half the
configured search-interval. Include `ola_hop_size` in the delay
to compensate for that.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and
  produces data in the output buffer always of size `ola_hop_size`.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`
- Thus, the rendered frames are always cleared immediately after
  processing, and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer according to the search
    block position,
  - the pending rendered samples in the output buffer, and
  - an amount of `ola_hop_size` on the output side
- Because the optimal block can be anywhere in the search block,
  calculate the delay according to the average position, or start of the
  center window.

The frame_delay code looked like that of the rubberband filter, which
might not work for scaletempo2. Sometimes a different amount of input
audio was consumed by scaletempo2 than expected. It may have been caused
by speed changes being a more dynamic process in scaletempo2. This can
be seen by where `playback_rate` is used in `run_one_wsola_iteration`:
`playback_rate` is only referenced after the iteration, when updating
the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the
search block is moved. That apparently averages out correctly at
constant playback speed, but when the speed changes, the error in this
assumption probably spikes. This error accumulated across all speed
changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way
for the audio to drift off. By deriving the delay from filter buffer
positions, and the buffers are filled only as much as needed, the delay
always stays within buffer bounds.
ferreum added a commit to ferreum/mpv that referenced this issue Aug 7, 2023
Fixes mpv-player#12028, audio-video desynchronization caused by changing speed.

There was an additional issue that audio was always delayed by half the
configured search-interval. Include `ola_hop_size` in the delay
to compensate for that.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and
  produces data in the output buffer always of size `ola_hop_size`.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`
- Thus, the rendered frames are always cleared immediately after
  processing, and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer according to the search
    block position,
  - the pending rendered samples in the output buffer, and
  - an amount of `ola_hop_size` on the output side
- Because the optimal block can be anywhere in the search block,
  calculate the delay according to the average position, or start of the
  center window.

The frame_delay code looked like that of the rubberband filter, which
might not work for scaletempo2. Sometimes a different amount of input
audio was consumed by scaletempo2 than expected. It may have been caused
by speed changes being a more dynamic process in scaletempo2. This can
be seen by where `playback_rate` is used in `run_one_wsola_iteration`:
`playback_rate` is only referenced after the iteration, when updating
the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the
search block is moved. That apparently averages out correctly at
constant playback speed, but when the speed changes, the error in this
assumption probably spikes. This error accumulated across all speed
changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way
for the audio to drift off. By deriving the delay from filter buffer
positions, and the buffers are filled only as much as needed, the delay
always stays within buffer bounds.
ferreum added a commit to ferreum/mpv that referenced this issue Aug 8, 2023
Fixes mpv-player#12028, audio-video desynchronization caused by changing speed.

There was an additional issue that audio was always delayed by half the
configured search-interval. Include `ola_hop_size` in the delay
to compensate for that.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and
  produces data in the output buffer always of size `ola_hop_size`.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`
- Thus, the rendered frames are always cleared immediately after
  processing, and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer according to the search
    block position,
  - the pending rendered samples in the output buffer, and
  - an amount of `ola_hop_size` on the output side
- Because the optimal block can be anywhere in the search block,
  calculate the delay according to the average position, or start of the
  center window.

The frame_delay code looked like that of the rubberband filter, which
might not work for scaletempo2. Sometimes a different amount of input
audio was consumed by scaletempo2 than expected. It may have been caused
by speed changes being a more dynamic process in scaletempo2. This can
be seen by where `playback_rate` is used in `run_one_wsola_iteration`:
`playback_rate` is only referenced after the iteration, when updating
the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the
search block is moved. That apparently averages out correctly at
constant playback speed, but when the speed changes, the error in this
assumption probably spikes. This error accumulated across all speed
changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way
for the audio to drift off. By deriving the delay from filter buffer
positions, and the buffers are filled only as much as needed, the delay
always stays within buffer bounds.
ferreum added a commit to ferreum/mpv that referenced this issue Aug 8, 2023
Fixes mpv-player#12028

There was an additional issue that audio was always delayed by half the
configured search-interval. This was caused by the `out` buffer length
not being included in the delay calculation.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and
  produces data in the output buffer always of size `ola_hop_size`.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`
- Thus, the rendered frames are always cleared immediately after
  processing, and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer according to the search
    block position, and
  - the pending rendered samples in the output buffer (always empty in
    practice).

The frame_delay code looked like that of the rubberband filter, which
might not work for scaletempo2. Sometimes a different amount of input
audio was consumed by scaletempo2 than expected. It may have been caused
by speed changes being a more dynamic process in scaletempo2. This can
be seen by where `playback_rate` is used in `run_one_wsola_iteration`:
`playback_rate` is only referenced after the iteration, when updating
the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the
search block is moved. That apparently averages out correctly at
constant playback speed, but when the speed changes, the error in this
assumption probably spikes. This error accumulated across all speed
changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way
for the audio to drift off. By deriving the delay from filter buffer
positions, and the buffers are filled only as much as needed, the delay
always stays within buffer bounds.
haasn pushed a commit that referenced this issue Sep 20, 2023
Fixes #12028

There was an additional issue that audio was always delayed by half the
configured search-interval. This was caused by the `out` buffer length
not being included in the delay calculation.

Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and
  produces data in the output buffer always of size `ola_hop_size`.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`
- Thus, the rendered frames are always cleared immediately after
  processing, and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer according to the search
    block position, and
  - the pending rendered samples in the output buffer (always empty in
    practice).

The frame_delay code looked like that of the rubberband filter, which
might not work for scaletempo2. Sometimes a different amount of input
audio was consumed by scaletempo2 than expected. It may have been caused
by speed changes being a more dynamic process in scaletempo2. This can
be seen by where `playback_rate` is used in `run_one_wsola_iteration`:
`playback_rate` is only referenced after the iteration, when updating
the time and removing old data from buffers.

In scaletempo2, the playback speed is applied by changing the amount the
search block is moved. That apparently averages out correctly at
constant playback speed, but when the speed changes, the error in this
assumption probably spikes. This error accumulated across all speed
changes because of the persistent `frame_delay` value.

With the removal of the persistent `frame_delay`, there should be no way
for the audio to drift off. By deriving the delay from filter buffer
positions, and the buffers are filled only as much as needed, the delay
always stays within buffer bounds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants