- Start Date: 2023-03-21
- RFC Type: initiative
- RFC PR: #81
- RFC Status: draft
We want to make processing / SourceMap-ing of JavaScript stack traces more reliable.
To achieve this, we want to uniquely identify a (minified / deployed) JavaScript file using a DebugId
.
The same DebugId
also uniquely identifies the corresponding SourceMap.
That way it should be possible to reliably look up the SourceMap corresponding to
a JavaScript file.
It is currently not possible to reliably find the associated SourceMap for a JavaScript file.
A JavaScript stack trace only points to the (minified / transformed) source file
by its URL, such as https://example.com/file.min.js
, or /path/to/local/file.min.js
.
The corresponding SourceMap is often referenced using a sourceMappingURL
comment
at the end of that file. It is also possible to have a "hidden" SourceMap that is
not referenced in such a way, but is typically found by its filename {js_filename}.map
.
However it is not guaranteed that the SourceMap found in such a way actually corresponds to the JavaScript file in which the error happened.
A classical example is caching.
- An end-user is loading version
1
ofhttps://example.com/file.min.js
. - A new app version
2
is deployed. - The user experiences an error.
- The SourceMap at
https://example.com/file.min.js.map
(version2
) at this point in time does not correspond to the code the user was running.
This problem is even worse at Sentry scale, as at any point in time, errors can come in that happened with arbitrary versions of the deployed code, sometimes even involving multiple files which might be out-of-sync with each other.
To work around this problem, Sentry has used the combination of release
and optional dist
, together with matching
paths to better associate JavaScript files from one release with SourceMaps uploaded to Sentry.
However this solution is still not reliable, as mentioned above, even two files loaded in the end-users browser can belong to a different release, due to caching or other reasons.
Using a DebugId
, which uniquely associates the JavaScript file and its corresponding SourceMap, should make source-mapping
a lot more reliable.
Sentry has used the release (+ dist) + abs_path
solution for quite some time and found it inadequate.
A lot of events are not being resolved correctly due to these mismatches, and problems with source-mapping are very
common in customer-support interactions. Up to X% of customer-support questions are related to SourceMaps.
On the other hand, using a DebugId
for symbolication of Native crashes and stack traces is working reliably both in
Sentry and in the wider native ecosystem. The Native and C# community has the concept of Symbol Servers, which can
serve any debug file based on its DebugId
, which allows reliable symbolication for any release, at any point in time.
Anecdotally, Sentry itself has recently experienced a hard-to-diagnose problem with SourceMap processing.
The root of the problem was with path-matching, which becomes unnecessary using DebugId
s.
The file https://s1.sentry-cdn.com/_static/dist/sentry/chunks/app_views_issueDetails_index_tsx.fd0d65cc90b6c116f093.js
appeared as part of an Error
stack trace. After stripping the domain name prefix, the path that is searched for
was ~/_static/dist/sentry/chunks/app_views_issueDetails_index_tsx.fd0d65cc90b6c116f093.js
. However, due to a
misconfiguration, all the uploaded files had a wrong path associated with them, in this particular case it was
~/chunks/app_views_issueDetails_index_tsx.fd0d65cc90b6c116f093.js
. With these paths not matching, Sentry was unable
to correctly process this file, as no minified file or SourceMap was found with that path.
This is a prime example that would be solved by DebugId
s.
To make DebugId
work, we need to generate one, and associate it to both the JavaScript file, and its corresponding
SourceMap.
The DebugId
should have the same format as a standard UUI, specifically:
It should be a 128 bit (16 byte), formatted to a string using base-16 hex encoding like so:
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
There is two options of choosing a DebugId
: Making it completely random, or make it reproducible / deterministic, derived
from the build.
Different approaches and examples are shown below, but in summary, we would like the DebugId
to be reproducible and
deterministic for a build. Building the exact same project and original code should always result in the same assets.
This means the same minified JS files (with embedded DebugId
), and the same SourceMap file.
As long as the DebugId
is deterministic, it does not matter how it is being derived. Different tools will use different
methods. And a DebugId
may differ from one tool to the next, just as the exact output files may differ between tools,
and even tool versions.
We do not want content hashes to be only based on the minified JavaScript file to also pick up on the original source cahnges. For a source hash only approach the solutions outlined in RFC 85 apply.
Instead we recommend to generate a new DebugId
by hashing the contents of the
SourceMap file. The reasoning for this is primarily motivated by a drawback of
the previous option. It is possible that non-essential changes to the source
file, such as whitespace or comments, will be completely removed by a minifier,
and will result in an identical minified output file. These changes would
however create different mappings in the SourceMap file, making it possible to
resolve to the correct original source.
It is also possible to use a combined content-hash of both the (minified) JavaScript file, and its corresponding SourceMap.
pros
- Generates a new
DebugId
for changes to source files that would otherwise not lead to changes in the (minified) JavaScript file. - SourceMap processing will always resolve back to the original source code.
cons
- Does lead to slightly more cache invalidation.
This option would create a new random DebugId
for each file, on each build.
pros
- Simpler server-side SourceMap processing, as one
DebugId
is only included in a single bundle, and that one bundle can serve multiple stack frames for multiple files of the same build.
cons
- Completely breaks the concept of caching, as every file is unique for every build.
Consider the following two files:
// File A
/* with some added comment*/
export function anExportedFunction() {
return anInternalInlinedFunction();
}
function anInternalInlinedFunction() {
window.foo = "side effect";
}
// File B
function aCompletelyDifferentFunction() {
// that has its own comments and everything
window.foo = "side effect";
}
function deadCodeThatsEliminated() {}
export function anExportedFunction() {
return aCompletelyDifferentFunction();
}
Both of these files will generate the exact same minified file using terser
,
which will completely inline simple tail calls, and eliminate dead code:
export function anExportedFunction(){window.foo="side effect"}
(Unfortunately, the https://try.terser.org/ REPL does not allow hotlinking, but you can try this for yourself.)
It is not even necessary that all these functions are defined in the same file. They might have been combined from
several files using a bundler like rollup
:
REPL example
As these examples show, vastly different code can produce the exact same minified output. However when doing SourceMap processing, we want to link back to the original code in every one of these cases.
We propose to add a new magic comment to the end of JavaScript files similar to the existing //# sourceMappingURL
comment. It should be at the end of the file, preferable as the line right before the sourceMappingURL
, as the
second line from the bottom
It should look like this:
someRandomJSCode();
//# debugId=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
//# sourceMappingURL=file.min.js.map
In a shiny utopian future, Browsers would directly expose builtin APIs to programmatically access each frame of an Error
s stack.
This might include the absolute path, the line and column number, and the DebugId
.
Though the reality of today is that each browser has its own text-based Error.stack
format, which might even give
completely different line and column numbers across the different browsers.
No programmatic API exists today, and might never exist. At the very least, widespread support for this is years away.
It is therefore necessary to extract this DebugId
through other means.
Current JavaScript stack traces include the absolute path (called abs_path
) of each stack frame. It should be possible
to load and inspect that file at runtime whenever an error happens.
An example of this might look like this:
// cached in the SDK:
const RESOLVED_FRAMES = new Map();
async function attachDebugMeta(event) {
for (const { abs_path } of allStackFrames(event)) {
if (!RESOLVED_FRAMES.has(abs_path)) {
const rawSource = await fetch(abs_path).then((res) => res.text());
RESOLVED_FRAMES.set(abs_path, extractDebugIdFromSource(rawSource));
}
}
event.debug_meta = resolvedFramesToImages(RESOLVED_FRAMES);
}
pros
- Does not require injecting any code into the JavaScript files.
cons
- Needs
async
code to resolveDebugId
s, and might incur some async fetching / IO when capturing an Error. - Though any source referenced from the stack trace via
abs_path
is likely in the browser cache already.
One solution here is to inject a small snippet of JS which will be executed when the JavaScript file is loaded, and adds
the DebugId
to a global map.
Any snippet would need to be injected in the top of JS files, to account for errors that happen during execution of
top-level statements. Additional care needs to be taken for any possible "use strict"
annotation on the top of the file.
Doing injection on the top of the file would also mean that changes to the SourceMap are needed, more specifically, all
the line
offsets need to shift down by 1
, for a 1-line snippet.
An example snippet is here:
!function(){try{var e="undefined"!=typeof window?window:"undefined"!=typeof global?global:"undefined"!=typeof self?self:{},f="_sentryDebugIds",n=(new Error).stack;n&&(e[f]=e[f]||{},e[f][n]="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")}catch(e){}}()
This snippet adds a complete Error.stack
to a global called _sentryDebugIds
.
Further post-processing at time of capturing an Error
is required to extract the abs_path
from that captured stack.
pros
- Does not require any async fetching at time of capturing an
Error
. - Compatible with current JS runtimes.
cons
- Injection needs to modify the mappings of the SourceMap itself.
- It does however require parsing of the
Error.stack
s in_sentryDebugIds
at time of capturing theError
. - However this should be cached and only happen once.
An alternative implementation might use the import.meta.url
property. This would avoid capturing and post-processing an Error.stack
, but does require usage of ECMAScript Modules.
((globalThis._sentryDebugIds=globalThis._sentryDebugIds||{})[import.meta.url]="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX");
pros
- More compact snippet.
- No post-processing required.
cons
- Depends on usage of ECMAScript Modules.
Deploying JavaScript applications can range from a simple copy files via ftp to a complex workflow like the following:
graph TD
transpile[Transpile source files] --> bundle[Bundle source files]
bundle --> minify[Minify bundled chunk]
minify --> fingerprint[Fingerprint minified chunks]
minify --> sentry[Upload release to Sentry]
fingerprint --> upload[Upload assets to CDN]
upload --> propagate[Wait for CDS assets to propagate]
fingerprint --> deploy[Deploy updated asset references]
propagate --> deploy
In this example, assets are fingerprinted, and after being fully propagated through a global CDN, they are starting to be referenced from the backend service via HTML.
Fingerprinting in this case means creating a unique content-hash which is then used in various of ways:
- As part of the filename, to give each file a unique and stable filename.
- Use the derived hash for Subresource Integrity (SRI).
An example may look like this, for a CDN-deployed and fingerprinted reference to katex:
<script
defer
src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js"
integrity="sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4"
crossorigin="anonymous"
></script>
Not only is the deployment pipeline very complex, it can also involve a variety of tools with varying degree of
integration between them.
The example <script>
tag shown above might be generated as part of one integrated JS bundler tool, or it might be
generated by a Rust or python backend, based on a supplied JSON file, which might look like this:
{
"assets": {
"assert_id": {
"filename": "my_asset.abcdefg.min.js",
"integrity": "sha384-VWXYZ"
}
}
}
The checksums themselves might be directly output by a JS bundler tool, or they might be generated by a completely different tool at another stage of the build pipeline.
Each application and build pipeline is unique, and there is an ever growing multitude of tools. Insert joke about a new JS bundler being created each week here.
It is therefore important that whatever comments and/or code we end up injecting into the final JavaScript assets is being injected at the right point in this pipeline. Ideally it would be injected before fingerprinting happens, and before any content-hash based naming happens.
As most JavaScript bundlers support automatic bundle-splitting, and will insert dynamic import
or require
statements
referencing those chunks by (fingerprinted) filename, a deep integration into those various bundlers might be needed.
With this, injection would happen with a new command, sentry-cli inject
. It will be the responsibility of the developer
to call this at the appropriate time depending on their unique build pipeline.
pros
- Gives full control for build pipelines that involve a heterogenous set of tools and stages.
cons
- Requires manually using this command.
- Does not work with bundlers that integrate fingerprinting.
In this scenario, injection happens at the time of sentry-cli upload
, and will also modify the files at that time.
This however is rejected due to the many downsides:
- Violates expectations of file immutability and integrity.
- Does not work with bundlers that integrate fingerprinting.
- Does not work in build pipelines where
sentry-cli upload
is not in the main deployment path.
Here, we would build DebugId
injection right into the various JavaScript bundlers.
We should validate the core ideas using third-party plugins at first, and then strive for inclusion in the
upstream bundlers directly.
Each bundler is unique though, and has different hooks at different stages of its internal pipeline. Some bundlers might not have the necessary hooks at the necessary stage at all.
Rollup has a very comprehensive plugin system, with good documentation about the various hooks and the internal pipeline: https://rollupjs.org/plugin-development/#output-generation-hooks
According to the above diagram, the appropriate plugin hook to use might be the
renderChunk
hook, which allows
access and modification of a chunks code
and map
(SourceMap) output.
This hook runs before the augmentChunkHash
and generateBundle
hooks which are responsible for fingerprinting and
generating the final output for each chunk.
Webpack documentation for plugin hooks is not as extensive, and there is no broad overview of the internal pipeline and
phases. There is a general overview of all the Compilation
hooks though:
https://webpack.js.org/api/compilation-hooks/
It might be possible to use the processAssets
hook
for this purpose. Documentation mentions the PROCESS_ASSETS_STAGE_DEV_TOOLING
phase which is responsible for
extracting SourceMaps, or the PROCESS_ASSETS_STAGE_OPTIMIZE_HASH
which looks to be responsible for generating the
final fingerprint of an asset.
As the examples above show, existing bundler hooks might not be appropriate for DebugId
injection.
Not all the bundlers offer appropriate hooks, and even if those do exist, using a plugin creates another problem:
The combination of different plugins might also mean that the order in which these plugins are invoked is not reliable,
and might depend on user configuration.
This is a less controversial part, as SourceMaps are in general not distributed to production, and are less likely to
be fingerprinted or integrity-checked. They are also plain JSON, making it trivial to inject additional fields.
We propose to add a new JSON field to the root of the SourceMap object called debugId
.
This new field should encode the DebugId
as a plain string.
As part of the original problem was matching files based on their paths, that same problem also applies to the "sources"
referenced from the SourceMap. The source file references in "sources"
are relative to the path of the SourceMap.
However this concept would just not exist anymore when using DebugId
s. The SourceMaps must therefore inline all
their "sourcesContent"
.
Additionally, to simplify DebugId
generation, and to reduce implementation complexity, we strongly encourage to use
"flat" SourceMaps, as opposed to "indexed" SourceMaps, which themselves contain other nested SourceMaps.
The Sentry protocol gains a new debug image type called sourcemap
which can be used to refer to source maps
by DebugId
in the following format:
{
"images": [
{
"type": "sourcemap",
"code_file": "http://example.com/file.min.js",
"debug_id": "dd6a14d1-fd08-48a5-b827-b87d8e9856bc"
}
]
}
The two keys defined are code_file
which needs to match the frame's abs_path
and debug_id
which needs to
match the DebugId
of the file.
The main drawback is that this might feel like an invasive change to the JavaScript ecosystem. It is a huge implementation burden, and might not be received positively by neither customers nor the wider JS tools ecosystem.
Especially injecting a piece of JavaScript into every production asset might alienate some users.
The effectiveness and success of this initiative needs to be proved out first, and is not guaranteed.
- We so far only leverage our injected code, we are not yet fetching
debugId
s via XHR. Is this a good path? - We have not yet made any advances for standardizing Debug IDs in the wider ecosystem
- The proposal does not yet support sources not being inlined in the source map.
- This RFC does not specify the changes to the artifact bundle or manifest in it.