rfc(feature): Video replay envelope #129

Merged: 26 commits, Feb 9, 2024
Changes from 21 commits
1 change: 1 addition & 0 deletions README.md
@@ -61,3 +61,4 @@ This repository contains RFCs and DACIs. Lost?
- [0118-mobile-transactions-and-spans](text/0118-mobile-transactions-and-spans.md): Transactions and Spans for Mobile Platforms
- [0123-metrics-correlation](text/0123-metrics-correlation.md): This RFC addresses the high level metrics to span correlation system
- [0126-sdk-spans-aggregator](text/0126-sdk-spans-aggregator.md): SDK Span Buffer
- [0129-video-replay-envelope](text/0129-video-replay-envelope.md): Video-based replay envelope format
104 changes: 104 additions & 0 deletions text/0129-video-replay-envelope.md
@@ -0,0 +1,104 @@
- Start Date: 2024-02-06
- RFC Type: feature
- RFC PR: [#129](https://github.com/getsentry/rfcs/pull/129)
- RFC Status: draft

# Summary

In order to capture video replays and present them to the user, we need to define/decide:

- how to transport the video data to the server
- how to integrate the video data into the RRWeb JSON format
- how to combine multiple replay chunks into a single replay session

All of these influence one another and need to be considered together in a single RFC.

Note: The SDK-side implementation currently being worked on relies on taking screenshots and encoding them into a video.
This is based on an evaluation in which a video was much smaller than a sequence of images (**TODO fix these: factor of X for 720p video**).

# Motivation

We need this to capture replays on platforms where it's not possible or feasible to produce an HTML DOM (i.e. the native format supported by RRWeb). For example: mobile apps.

<!-- # Supporting Data -->
<!-- Metrics to help support your decision (if applicable). -->

# Options Considered

## Using a video, with EnvelopeItem:ReplayVideo

- From the SDK, we would send a new envelope with the following items: `Replay`, `ReplayVideo` and `ReplayRecording`.

> Review comment (Member): @billyvg How would the player like to be notified that it should download video data? A type value on the replay? The video events in the RRWeb? Infer it from the replay's platform? Something else?
>
> Reply (Member): @cmanallen Thinking of using the rrweb video event

- The newly introduced item type, [`ReplayVideo`](https://github.com/getsentry/relay/blob/5fd3969e88d3eea1f2849e55b61678cac6b14e44/relay-server/src/envelope.rs#L115C5-L115C20) is used to transport the video data.
The envelope item would consist of a single header line (JSON), followed by a newline and the raw video data.
- The header should contain at least the following metadata, which is needed to ingest the item:

```json
{
  "segment_id": 4
}
```
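
The item layout described above (one JSON header line, a newline, then the raw video bytes) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `build_replay_video_payload` and `split_replay_video_payload` are hypothetical and not part of any SDK or Relay API, and the sketch does not model the surrounding envelope.

```python
import json
from typing import Any, Dict, Tuple


def build_replay_video_payload(segment_id: int, video_bytes: bytes) -> bytes:
    """Serialize the header line, a newline, and the raw video data (hypothetical helper)."""
    # Compact JSON never contains a raw newline, so the header stays on one line.
    header = json.dumps({"segment_id": segment_id}, separators=(",", ":")).encode("utf-8")
    return header + b"\n" + video_bytes


def split_replay_video_payload(payload: bytes) -> Tuple[Dict[str, Any], bytes]:
    """Split at the first newline only; the video bytes may themselves contain 0x0A."""
    header_line, _, video_bytes = payload.partition(b"\n")
    return json.loads(header_line), video_bytes


# Stand-in bytes instead of a real video, deliberately containing a newline byte.
payload = build_replay_video_payload(4, b"\x00\x01\n\x02")
header, video = split_replay_video_payload(payload)
```

Splitting on the first newline only matters because the binary video data is not escaped and can contain newline bytes of its own.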

- Additionally, it would be accompanied by a [`ReplayRecording`](https://github.com/getsentry/relay/blob/5fd3969e88d3eea1f2849e55b61678cac6b14e44/relay-server/src/envelope.rs#L113) item containing a header, e.g. `{"segment_id": 12}`, followed by a newline and the RRWeb JSON.
- The RRWeb JSON must start with a single event of type [`EventType.Meta`](https://github.com/rrweb-io/rrweb/blob/8aea5b00a4dfe5a6f59bd2ae72bb624f45e51e81/packages/types/src/index.ts#L8-L16), carrying the viewport (screen) dimensions.

```json
{
  "type": 4,
  "timestamp": 1681846559381,
  "data": {
    "href": "",
    "height": 1920,
    "width": 1080
  }
}
```

> Note: these dimensions may differ from the video dimensions if only part of the screen is captured.
In that case, the following video event will have non-zero `data.payload.left` and `data.payload.top` fields (see below).

- The RRWeb JSON must contain a single event of type [`EventType.Custom`](https://github.com/rrweb-io/rrweb/blob/8aea5b00a4dfe5a6f59bd2ae72bb624f45e51e81/packages/types/src/index.ts#L8-L16), with `data.tag == 'video'`.
This event must be the second element of the array, right after the `EventType.Meta` event.
If the UI needs other data, we can add it to the `data` field alongside the `tag`. Because exactly one `ReplayVideo` is sent with each `ReplayRecording`, the mapping is one-to-one and the RRWeb JSON needs no further details to link them.

```json
{
  "type": 5,
  "timestamp": 1681846559381,
  "data": {
    "tag": "video",
    "payload": {
      "segmentId": 4,
      "size": 3440,
      "duration": 5000,
      "encoding": "whatever",
      "container": "whatever",
      "height": 1920,
      "width": 1080,
      "frameCount": 50,
      "frameRateType": "constant|variable",
      "frameRate": 10,
      "left": 0,
      "top": 0
    }
  }
}
```

> Note: The format is based on [RRWeb Custom event type specification](https://github.com/rrweb-io/rrweb/blob/8aea5b00a4dfe5a6f59bd2ae72bb624f45e51e81/packages/types/src/index.ts#L53-L59).
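
To make the ordering rules concrete, here is a minimal sketch that builds the two leading events and checks the constraints listed in this section. The function names are hypothetical. Two of the checks are assumptions rather than requirements stated by this RFC: that `payload.segmentId` mirrors the `ReplayVideo` header's `segment_id`, and that the captured region fits inside the viewport.

```python
from typing import Any, Dict, List

META, CUSTOM = 4, 5  # rrweb EventType.Meta / EventType.Custom, as in the examples above


def meta_event(timestamp: int, width: int, height: int) -> Dict[str, Any]:
    """Viewport (screen) dimensions; must be the first event."""
    return {"type": META, "timestamp": timestamp,
            "data": {"href": "", "height": height, "width": width}}


def video_event(timestamp: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Custom event tagged 'video'; must be the second event."""
    return {"type": CUSTOM, "timestamp": timestamp,
            "data": {"tag": "video", "payload": payload}}


def is_video(event: Dict[str, Any]) -> bool:
    return event.get("type") == CUSTOM and event.get("data", {}).get("tag") == "video"


def check_recording(events: List[Dict[str, Any]], video_segment_id: int) -> List[str]:
    """Return a list of problems; an empty list means the layout looks as described."""
    if len(events) < 2 or events[0].get("type") != META or not is_video(events[1]):
        return ["events must start with a Meta event followed by the video event"]
    problems = []
    if sum(1 for e in events if is_video(e)) != 1:
        problems.append("exactly one video event is expected")
    viewport, payload = events[0]["data"], events[1]["data"]["payload"]
    # Assumption: payload.segmentId mirrors the ReplayVideo header's segment_id.
    if payload["segmentId"] != video_segment_id:
        problems.append("segmentId does not match the ReplayVideo header")
    # Assumption: the captured region (offset + size) lies inside the viewport.
    if (payload["left"] + payload["width"] > viewport["width"]
            or payload["top"] + payload["height"] > viewport["height"]):
        problems.append("video region extends beyond the viewport")
    return problems


events = [
    meta_event(1681846559381, width=1080, height=1920),
    video_event(1681846559381, {
        "segmentId": 4, "size": 3440, "duration": 5000,
        "encoding": "whatever", "container": "whatever",  # placeholders, as in the RFC
        "height": 1920, "width": 1080,
        "frameCount": 50, "frameRateType": "constant", "frameRate": 10,
        "left": 0, "top": 0,
    }),
]
problems = check_recording(events, video_segment_id=4)
```

With the values from the examples above, `problems` is empty. Passing a different `video_segment_id` or dropping the leading Meta event produces a non-empty list.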

## Other considered options

### Using the existing RRWeb canvas replay format (image snapshots)

This would be easy to implement: the SDK already captures screenshots and RRWeb can already display them, so little additional work is needed. However, the data transfer size would be significantly larger than with video, and it should be kept as low as reasonably possible given that this is currently aimed at mobile apps. Additionally, the images would need to be base64-encoded so that they can be embedded in the RRWeb JSON.
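
The base64 cost mentioned above can be quantified independently of any image format: base64 emits 4 output characters for every 3 input bytes, so the embedded data grows by roughly a third before any JSON overhead. A small sketch follows; the 300,000-byte size is an arbitrary stand-in, not a measured screenshot size.

```python
import base64
import os

raw = os.urandom(300_000)        # stand-in for one compressed screenshot (arbitrary size)
encoded = base64.b64encode(raw)  # what would be embedded in the RRWeb JSON

overhead = len(encoded) / len(raw)  # ratio of encoded size to raw size
```

For an input whose length is a multiple of 3 there is no padding, so the ratio is exactly 4/3.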

<!--
# Drawbacks

Why should we not do this? What are the drawbacks of this RFC or a particular option if
multiple options are presented.

# Unresolved questions

- What parts of the design do you expect to resolve through this RFC?
- What issues are out of scope for this RFC but are known? -->