Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A/V sync regression after dynamic ad break on HLS stream #4589

Closed
fredrik-telia opened this issue Oct 18, 2022 · 47 comments · Fixed by #4870
Closed

A/V sync regression after dynamic ad break on HLS stream #4589

fredrik-telia opened this issue Oct 18, 2022 · 47 comments · Fixed by #4870
Labels
component: HLS The issue involves Apple's HLS manifest format priority: P0 Broken for everyone; no workaround; urgent status: archived Archived and locked; will not be updated type: bug Something isn't working correctly
Milestone

Comments

@fredrik-telia
Copy link
Contributor

Have you read the FAQ and checked for duplicate open issues?
yes

What version of Shaka Player are you using?
nightly

Can you reproduce the issue with our latest release version?
yes, 4.2.2

Can you reproduce the issue with the latest code from main?
yes

Are you using the demo app or your own custom app?
demo app

If custom app, can you reproduce the issue using our demo app?

What browser and OS are you using?
Chrome, macOS

For embedded devices (smart TVs, etc.), what model and firmware version are you using?

What are the manifest and license server URIs?

Will send in separate email

What configuration are you using? What is the output of player.getConfiguration()?

default demo app config

What did you do?

Add custom content, play, wait until ad break is over (there will be a timestamp on screen when the "real" program is playing)

Observe A/V sync

What did you expect to happen?
A/V to be in sync after ad break

What actually happened?

Audio and video are out of sync, and will get more out of sync after successive ad breaks.

We were hoping the issues would be solved by #4308 but unfortunately this doesn't seem to be the case.

Using an older version ( https://v3-3-11-dot-shaka-player-demo.appspot.com/demo/ ) A/V are still in sync after the ad break.
Using the nightly version ( https://nightly-dot-shaka-player-demo.appspot.com/demo/ ) A/V are out of sync after the ad break.

@fredrik-telia fredrik-telia added the type: bug Something isn't working correctly label Oct 18, 2022
@github-actions github-actions bot added this to the v4.3 milestone Oct 18, 2022
@joeyparrish joeyparrish added component: HLS The issue involves Apple's HLS manifest format priority: P1 Big impact or workaround impractical; resolve before feature release labels Oct 19, 2022
@avelad avelad modified the milestones: v4.3, v4.4 Nov 11, 2022
@fredrik-telia
Copy link
Contributor Author

Still an issue in 4.3.0

@joeyparrish
Copy link
Member

Thanks for the update. HLS issues are the top priority right now, but we are still short-handed and the list of HLS issues is long. If you want to investigate and/or send a PR, that would be most welcome. If not, we appreciate your patience.

@fredrik-telia
Copy link
Contributor Author

fredrik-telia commented Nov 21, 2022

Having done some bisecting, these are my findings:

36d0b5484fad68dc1d640fbddf2fae3e1eb7169b - this commit has audio in sync, but recurring bufferings as shaka will jump to catch up with seek range

c438e857f2f122eb45899148e067d68ffec3477c - this commit will fix the buffering from the above commit, but audio will be out of sync after the ad break

I'll see if I can find out more

@joeyparrish
Copy link
Member

Taking over for @theodab.

@joeyparrish
Copy link
Member

joeyparrish commented Dec 12, 2022

AV sync is definitely off somewhat (audio seems 0.5s late?) during main content, but it is impossible to tell if it is off or not during ad breaks. The ads mostly use voiceovers or titles, so there is not much native language spoken content in the ads to judge AV sync. It could be that AV sync is universally off, not just off after an ad break.

This content has no EXT-X-PROGRAM-DATE-TIME, which is critical for AV sync. I strongly recommend using EXT-X-PROGRAM-DATE-TIME.

I caught one instance of the ads themselves getting badly out of sync. I'm going to analyze the ad segments, and I expect to find that they are misaligned at at least one point. There was an ad in English showing interviews about 66% through the break, which was very helpful in this case.

If an ad is not aligned on the right-hand side (audio ends earlier than video or vice-versa), we might need to realign the SourceBuffers after the discontinuity. I'm seeing "physical" evidence of this, too, in the debugger:

player.mediaSourceEngine_.sourceBuffers_.audio.timestampOffset -
  player.mediaSourceEngine_.sourceBuffers_.video.timestampOffset
2.0372580000000653

This number grows over time during ad breaks, and seems like a plausible measurement of desync. This would also make sense with my theory above (that audio ends earlier than video or vice-versa in certain ads). In sequence mode, if an application does not touch timestampOffset during playback, it behaves as an accumulator, showing how much data has actually been appended to the buffers. Here we see different amounts appended to audio and video over time.

Eventually, this number is great enough that we start getting warnings from the player:

Possible encoding problem detected! Unexpected buffered range for reference...

@joeyparrish joeyparrish added priority: P0 Broken for everyone; no workaround; urgent and removed priority: P1 Big impact or workaround impractical; resolve before feature release labels Dec 12, 2022
@joeyparrish
Copy link
Member

Good news! I've watched this several times, and my Swedish is improving. 🇸🇪

It appears that desync increases by roughly 50-100 ms per ad. Joining mid-ad-break results in less desync that joining during the main program and sitting through an entire ad break.

This content appears to be a loop of the same program and ad content each time. Each ad break seems to add to the desync by a little more than 1 second.

I'm now experimenting with a change that resynchronizes the SourceBuffers across discontinuity bounds, so that we don't accumulate errors in timestampOffset over time. It should be fairly obvious if the fix is working, since 1) the effect is so pronounced and 2) the effect can be measured in the debug console.

@joeyparrish
Copy link
Member

I've watched the loop several more times testing my fix, and I'm certain I heard Ingvar Kamprad mentioned twice on the antiques show. 🫖 🛋️ 🛒

My fix appears to work. The gist is that we track discontinuity sequence numbers for segment references, and resync timestampOffset to the theoretical segment start any time the discontinuity sequence number changes. I initially did this based on init segments changing, but then sync would fail for self-initializing segments. Using the actual discontinuity sequence number is more robust, IMO.

This resync is only done at these boundaries, though, and not on every segment. This way, if segment durations in the playlist are somewhat inaccurate, we don't introduce sync issues by slightly moving the segments within discontinuity boundaries. Sequence mode is allowed to work as it should up until a discontinuity.

joeyparrish added a commit to joeyparrish/shaka-player that referenced this issue Dec 13, 2022
If server-side ad segments aren't aligned, AV could get out of sync by
accumulating errors in the timestampOffset of the SourceBuffers.

This fixes the issue by tracking discontinuity boundaries and
resetting timestampOffset to theoretical segment start times when a
boundary is crossed.

Closes shaka-project#4589
@joeyparrish
Copy link
Member

joeyparrish commented Dec 13, 2022

My fix works, however, after watching several more rounds, there is another source of accumulated error. This time, in the playlist itself.

I'm syncing with theoretical segment start times, but those are based on summed segment durations only. What we really need is to resync against PROGRAM-DATE-TIME, which this playlist doesn't have. I'm seeing something on the order of 100ms drift in this particular content over about an hour.

@fredrik-telia, can PROGRAM-DATE-TIME be added? What I've done so far is an improvement, but it's not perfect. If I have a sample with EXT-X-PROGRAM-DATE-TIME after each EXT-X-DISCONTINUITY, or even just after the discontinuity that brings us back into the main program, I can follow up on this change to make it stay in sync better over an arbitrarily long playback session.

@fredrik-telia
Copy link
Contributor Author

Thank you so much for your findings, hopefully we can add PROGRAM-DATE-TIME 🙏

@fredrik-telia
Copy link
Contributor Author

I believe the config for the test url has been updated now, might take a few minutes for the change to take effect

@joeyparrish
Copy link
Member

I'm not seeing PROGRAM-DATE-TIME in the playlists yet.

@joeyparrish
Copy link
Member

Not present when joining main content, and not added after ad discontinuities, either.

@joeyparrish
Copy link
Member

I'll go ahead and update the tests affected by the fix I have so far, and get that through review today if I can. That fix should be in an update this week.

If I still don't have DATE-TIME in the playlists at that point, I'll start working on a change to use that for continuous sync (instead of one-time startup sync as it stands today), but that change will be purely theoretical until I have content to verify it. This second fix may not make it into this week's updates, and may get pushed out to January while we are on holiday.

joeyparrish added a commit that referenced this issue Dec 13, 2022
If server-side ad segments aren't aligned, AV could get out of sync by
accumulating errors in the timestampOffset of the SourceBuffers.

This improves the issue by tracking discontinuity boundaries and
resetting timestampOffset to theoretical segment start times when a
boundary is crossed.

Issue #4589
joeyparrish added a commit that referenced this issue Jan 13, 2023
Discontinuity tracking was broken and test coverage was insufficient to
catch this. This fixes the parsing and counting of discontinuities, and
replaces two outdated and useless tests with a new one that covers the
counting properly.

One of the old tests was checking that a timestamp offset was set for
each discontinuity, but this had become irrelevant since the test was
written. Discontinuities do not have anything to do with timestamp
offsets in current versions of Shaka Player.

The other old test was checking that after a discontinuity, we didn't
fetch a segment to parse out the timestamp, but we stopped doing that
entirely in v4.

The new test checks that the initial discontinuity sequence number is
honored, and that after a discontinuity, the number goes up. It also
checks that the correct number is extracted after an update. This test
fails without the fix, and passes with it.

This bug affected v4.2.6 and v4.3.2 only.

Issue #4589
joeyparrish added a commit that referenced this issue Jan 13, 2023
After seeking or adaptation, StreamingEngine will create a new
SegmentIterator. However, if the buffered ranges are different enough
from the references in the iterator, the lookup in the index can produce
a duplicate segment. If this happens when sequence mode is active,
StreamingEngine will append the duplicate segment instead of overwriting
the end of the original one.

Any time a new iterator is created, we force the stream to resync, so
that unaligned segments or slightly-inaccurate segments are written to
the correct place in the buffer.

Issue #4589
joeyparrish added a commit that referenced this issue Jan 13, 2023
Before, tiny segments (~33ms in practice) and overwritten segments
(sometimes necessary) would register as encoding problems. This makes
the logic more robust and the error logs more reliable.

Issue #4589
joeyparrish added a commit that referenced this issue Jan 13, 2023
Make SegmentIndex robust against rounding errors so that we don't end up
with duplicate segments on merge. In sequence mode, this would cause a
massive AV sync issue of 1 segment duration.

Issue #4589
joeyparrish added a commit that referenced this issue Jan 13, 2023
Closes #4589

Backported to v4.2.x:
 - patched in lowestSyncTime_ from v4.3.x in HlsParser
joeyparrish added a commit that referenced this issue Jan 13, 2023
Closes #4589

Backported to v4.2.x:
 - patched in lowestSyncTime_ from v4.3.x in HlsParser
@joeyparrish
Copy link
Member

Perfectly in sync after 45 minutes with the latest fixes and yo.pdt=sync.

@joeyparrish
Copy link
Member

And I've seen these ad breaks so many times now, I really want to watch "Marko & Irma".

@martinstark
Copy link
Contributor

Thanks so much for the great work!

@spvishwas122
Copy link

#4589 (comment)
@joeyparrish
how do we set yo.pdt=sync?
is it a configuration parameter?

We are seeing AV sync issues in 4.3.4 as well.

@fredrik-telia
Copy link
Contributor Author

it is a query parameter on the url for this particular type of stream, it has nothing to do with shaka

Robloche pushed a commit to Robloche/shaka-player that referenced this issue Feb 8, 2023
If server-side ad segments aren't aligned, AV could get out of sync by
accumulating errors in the timestampOffset of the SourceBuffers.

This improves the issue by tracking discontinuity boundaries and
resetting timestampOffset to theoretical segment start times when a
boundary is crossed.

Issue shaka-project#4589
Robloche pushed a commit to Robloche/shaka-player that referenced this issue Feb 8, 2023
…t#4869)

Setting the timestampOffset requires an abort() in some cases on
Chromecast.

Issue shaka-project#4589
Robloche pushed a commit to Robloche/shaka-player that referenced this issue Feb 8, 2023
Discontinuity tracking was broken and test coverage was insufficient to
catch this. This fixes the parsing and counting of discontinuities, and
replaces two outdated and useless tests with a new one that covers the
counting properly.

One of the old tests was checking that a timestamp offset was set for
each discontinuity, but this had become irrelevant since the test was
written. Discontinuities do not have anything to do with timestamp
offsets in current versions of Shaka Player.

The other old test was checking that after a discontinuity, we didn't
fetch a segment to parse out the timestamp, but we stopped doing that
entirely in v4.

The new test checks that the initial discontinuity sequence number is
honored, and that after a discontinuity, the number goes up. It also
checks that the correct number is extracted after an update. This test
fails without the fix, and passes with it.

This bug affected v4.2.6 and v4.3.2 only.

Issue shaka-project#4589
Robloche pushed a commit to Robloche/shaka-player that referenced this issue Feb 8, 2023
…ject#4886)

After seeking or adaptation, StreamingEngine will create a new
SegmentIterator. However, if the buffered ranges are different enough
from the references in the iterator, the lookup in the index can produce
a duplicate segment. If this happens when sequence mode is active,
StreamingEngine will append the duplicate segment instead of overwriting
the end of the original one.

Any time a new iterator is created, we force the stream to resync, so
that unaligned segments or slightly-inaccurate segments are written to
the correct place in the buffer.

Issue shaka-project#4589
Robloche pushed a commit to Robloche/shaka-player that referenced this issue Feb 8, 2023
Before, tiny segments (~33ms in practice) and overwritten segments
(sometimes necessary) would register as encoding problems. This makes
the logic more robust and the error logs more reliable.

Issue shaka-project#4589
Robloche pushed a commit to Robloche/shaka-player that referenced this issue Feb 8, 2023
…4884)

Make SegmentIndex robust against rounding errors so that we don't end up
with duplicate segments on merge. In sequence mode, this would cause a
massive AV sync issue of 1 segment duration.

Issue shaka-project#4589
@github-actions github-actions bot added the status: archived Archived and locked; will not be updated label Mar 13, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 13, 2023
@theodab
Copy link
Contributor

theodab commented Apr 11, 2024

Unfortunately, this fix was found to cause other issues (see issue #5785).
As such, it has been reverted by #6435. We need to find a new fix that accounts for both issues, if this one is still a problem after the reversion.

@avelad
Copy link
Member

avelad commented Apr 11, 2024

@fredrik-telia Can you review if the issue persist in the main branch even with the revert? Thanks!

@avelad avelad added the status: waiting on response Waiting on a response from the reporter(s) of the issue label May 27, 2024
@shaka-bot
Copy link
Collaborator

Closing due to inactivity. If this is still an issue for you or if you have further questions, the OP can ask shaka-bot to reopen it by including @shaka-bot reopen in a comment.

@shaka-bot shaka-bot removed the status: waiting on response Waiting on a response from the reporter(s) of the issue label Jun 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
component: HLS The issue involves Apple's HLS manifest format priority: P0 Broken for everyone; no workaround; urgent status: archived Archived and locked; will not be updated type: bug Something isn't working correctly
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants