From ce2aca0d13d14a802c474bec1e93ec30bb1f50ea Mon Sep 17 00:00:00 2001
From: Oliver Dunk
Date: Fri, 9 Feb 2024 15:56:51 +0000
Subject: [PATCH 1/5] Add content scripts section in specification

---
 specification/index.bs | 80 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 76 insertions(+), 4 deletions(-)

diff --git a/specification/index.bs b/specification/index.bs
index e11c271e..da7b0f12 100644
--- a/specification/index.bs
+++ b/specification/index.bs
@@ -7,6 +7,7 @@ Group: WECG
 URL: https://w3c.github.io/webextensions
 Editor: Mukul Purohit, Microsoft Corporation https://www.microsoft.com, mpurohit@microsoft.com
 Editor: Tomislav Jovanovic, Mozilla https://www.mozilla.org/, tjovanovic@mozilla.com
+Editor: Oliver Dunk, Google https://www.google.com, oliverdunk@chromium.org
 Abstract: [Placeholder] Abstract.
 Markup Shorthands: markdown yes
 
@@ -27,11 +28,11 @@ An optional directory containing strings as defined in l
 
 ### Other files
 
-An extension may also contain other files, such as those referenced in the content_scripts and background part of the Manifest.
+An extension may also contain other files, such as those referenced in the [[#key-content_scripts]] and [[#key-background]] part of the [=manifest=].
 
 ## Manifest
 
-A WebExtension must have a manifest file at its root directory.
+A WebExtension must have a manifest file at its root directory.
 
 ### Manifest file
 
@@ -112,7 +113,7 @@ This key may be present.
 
 #### Key `content_scripts`
 
-This key may be present.
+The `content_scripts` key is a [=list=] of items representing [=content scripts=] that should be registered.
 
 #### Key `content_security_policy`
 
@@ -154,6 +155,8 @@ Filenames beginning with an underscore (`_`) are reserved for use by user agent.
 
 ## Isolated worlds
 
+Worlds are isolated JavaScript contexts with access to the same underlying DOM tree but their own set of wrappers around those DOM objects.
+
 ## Unavailable APIs
 
 ## The `browser` global
@@ -172,6 +175,12 @@ Issue(62): Specify localization handling.
 
 ## Match patterns
 
+A match pattern is a pattern used to match URLs.
+
+## Globs
+
+A glob can be any [=string=]. It can contain any number of wildcards where * can match zero or more characters and ? matches exactly one character.
+
 ## Concepts
 
 ### Uniqueness of extension IDs
@@ -190,7 +199,70 @@ Issue(62): Specify localization handling.
 
 ### Content scripts
 
-#### Isolated worlds
+Content scripts represent a set of JS and CSS files that should be injected into pages loaded by the user agent.
+
+#### Key `matches`
+
+A [=list=] of [=match patterns=] that are used to decide where the content script runs. This key is required.
+
+#### Key `exclude_matches`
+
+A [=list=] of [=match patterns=] that should be used to exclude URLs from where the content script runs.
+
+#### Key `js`
+
+A [=list=] of file paths that should be injected as scripts.
+
+#### Key `css`
+
+A [=list=] of file paths that should be injected as stylesheets.
+
+#### Key `all_frames`
+
+If `all_frames` is true, the content script must be injected into subframes. Defaults to false.
+
+#### Key `match_about_blank`
+
+If this is `true`, the content script will also be injected into an additional user agent specified set of pages used to represent empty frames. This will only happen if the content script matches the page that embedded the frame. Defaults to `false`.
+
+#### Key `match_origin_as_fallback`
+
+Used to match frames with an opaque or otherwise missing origin. The origin to match against is determined in the following order of priority:
+
+1. If the frame has an [=opaque origin=], such as with a [=blob URLs=], use the non-opaque origin.
+1. If available, use the origin of the parent frame.
+1. Otherwise, no origin is found and this frame can never be matched.
+
+#### Key `run_at`
+
+Specifies when the content script should be injected. Valid values are `document_start`, `document_end` and `document_idle`.
+
+#### Key `include_globs`
+
+A list of [=globs=] that a page should match in addition to matches.
+
+#### Key `exclude_globs`
+
+A list of [=globs=] that should be used to exclude URLs from where the content script runs.
+
+#### Key `world`
+
+The [=world=] any JavaScript scripts should be injected into. Defaults to `ISOLATED`. Valid values are `MAIN` and `ISOLATED`.
+
+#### Injecting a content script
+
+Issue: If the same extension specifies the same script twice, what should happen? ([bug](https://crbug.com/324096753))
+
+Issue: The below algorithm needs to be updated to include `match_about_blank` and `match_origin_as_fallback`.
+
+To determine if a content script should be injected in a frame:
+
+1. If the extension does not have access to the origin, return.
+1. If the origin is not included in `matches`, return.
+1. If `include_globs` is present and the origin is not matched, return.
+1. If the origin matches an entry in `exclude_matches` or `exclude_globs`, return.
+1. If this is a frame, and `all_frames` is not `true`, return.
+1. Otherwise, inject the content script. This should be done based on the `run_at` setting.
 
 ### Extension pages
 
From b7963a951d7b9ca18a053dfa3d5d6b414624ed62 Mon Sep 17 00:00:00 2001
From: Oliver Dunk
Date: Sun, 15 Sep 2024 15:36:34 +0100
Subject: [PATCH 2/5] Address feedback

---
 specification/index.bs | 73 ++++++++++++++++++++++++++++++------------
 1 file changed, 45 insertions(+), 28 deletions(-)

diff --git a/specification/index.bs b/specification/index.bs
index fa02ba84..e84e4917 100644
--- a/specification/index.bs
+++ b/specification/index.bs
@@ -28,7 +28,7 @@ An optional directory containing strings as defined in l
 
 ## Other files
 
-An extension may also contain other files, such as those referenced in the [[#key-content_scripts]] and [[#key-background]] part of the [=manifest=].
+An extension may also contain other files, such as those referenced in the [[#key-content_scripts]] and [[#key-background]] parts of the [=manifest=].
 
 # Manifest
 
@@ -175,7 +175,7 @@ Issue(62): Specify localization handling.
 
 # Match patterns
 
-A match pattern is a pattern used to match URLs.
+A match pattern is a pattern used to match URLs. They are case-insensitive.
 
 # Globs
 
@@ -199,11 +199,11 @@ A glob can be any [=string=]. It can contain any number of wildcards
 
 ## Content scripts
 
-Content scripts represent a set of JS and CSS files that should be injected into pages loaded by the user agent.
+Content scripts represent a set of JS and CSS files that should be injected into matching pages loaded by the user agent. They are injected using the steps in [[#inject-a-content-script]].
 
 ### Key `matches`
 
-A [=list=] of [=match patterns=] that are used to decide where the content script runs. This key is required.
+A [=list=] of [=match patterns=] that are used to decide which pages the user agent injects the content script into. This key is required.
 
 ### Key `exclude_matches`
 
@@ -211,27 +211,29 @@ A [=list=] of [=match patterns=] that should be used to exclude URLs from where
 
 ### Key `js`
 
-A [=list=] of file paths that should be injected as scripts.
+A [=list=] of file paths, relative to the extension's package, that should be injected as scripts.
 
 ### Key `css`
 
-A [=list=] of file paths that should be injected as stylesheets.
+A [=list=] of file paths, relative to the extension's package, that should be injected as stylesheets.
 
 ### Key `all_frames`
 
-If `all_frames` is true, the content script must be injected into subframes. Defaults to false.
+If `all_frames` is true, the content script must be injected into any subframes that match the other matching criteria for the content script. If false, content scripts will only be injected into top-level documents. See Defaults to false.
 
 ### Key `match_about_blank`
 
-If this is `true`, the content script will also be injected into an additional user agent specified set of pages used to represent empty frames. This will only happen if the content script matches the page that embedded the frame. Defaults to `false`.
+If this is true, use the URL of the parent frame when matching a child frame whose document URL has the `about` [=scheme=]. See also [[#determine-the-url-for-content-script-matching]]. Defaults to `false`.
+
+Note: In Firefox, setting `match_about_blank` to `true` also allows injection into top-level `about:blank` pages.
 
 ### Key `match_origin_as_fallback`
 
-Used to match frames with an opaque or otherwise missing origin. The origin to match against is determined in the following order of priority:
+If this is true, use fallbacks as described in [[#determine-the-url-for-content-script-matching]].
+
+No path is available when the URL to match against falls back to an origin. Therefore, when set, the user agent must not allow [[#key-matches]] to contain entries with a path other than `/*`.
 
-1. If the frame has an [=opaque origin=], such as with a [=blob URLs=], use the non-opaque origin.
-1. If available, use the origin of the parent frame.
-1. Otherwise, no origin is found and this frame can never be matched.
+Defaults to `false`.
 
 ### Key `run_at`
 
@@ -239,7 +241,7 @@ Specifies when the content script should be injected. Valid values are `document
 
 ### Key `include_globs`
 
-A list of [=globs=] that a page should match in addition to matches.
+A list of [=globs=] that a page should match. A page matches if the URL matches both the [[#key-matches]] field and the [[#key-include_globs]] field.
 
 ### Key `exclude_globs`
 
@@ -249,21 +251,6 @@ A list of [=globs=] that should be used to exclude URLs from where the content s
 
 The [=world=] any JavaScript scripts should be injected into. Defaults to `ISOLATED`. Valid values are `MAIN` and `ISOLATED`.
 
-### Injecting a content script
-
-Issue: If the same extension specifies the same script twice, what should happen? ([bug](https://crbug.com/324096753))
-
-Issue: The below algorithm needs to be updated to include `match_about_blank` and `match_origin_as_fallback`.
-
-To determine if a content script should be injected in a frame:
-
-1. If the extension does not have access to the origin, return.
-1. If the origin is not included in `matches`, return.
-1. If `include_globs` is present and the origin is not matched, return.
-1. If the origin matches an entry in `exclude_matches` or `exclude_globs`, return.
-1. If this is a frame, and `all_frames` is not `true`, return.
-1. Otherwise, inject the content script. This should be done based on the `run_at` setting.
-
 ## Extension pages
 
 # Classes of security risk
@@ -275,3 +262,33 @@ To determine if a content script should be injected in a frame:
 ## Current behavior of cookie partitioning
 
 # Version number handling
+
+# Algorithms
+
+## Determine the URL for content script matching
+
+To determine the URL to use for a document when injecting a content script:
+
+1. Let |url| be the document's URL.
+1. If the document is within a child frame:
+    1. If the [=scheme=] of the document's URL is `about`, and `match_about_blank` or `match_origin_as_fallback` is set to true:
+        1. Set |url| to a URL based on the origin of the parent frame.
+    1. If the [=scheme=] of the document's URL is `data` and `match_origin_as_fallback` is set to true:
+        1. Set |url| to be a URL based on the origin of the parent frame.
+    1. If the [=scheme=] of the document's URL is `filesystem` or `blob` and `match_origin_as_fallback` is set to true:
+        1. Set |url| to be a URL based on the origin of the frame which created the URL.
+1. Return |url|.
+
+## Inject a content script
+
+Issue: If the same extension specifies the same script twice, what should happen? ([bug](https://crbug.com/324096753))
+
+To determine if a content script should be injected in a frame:
+
+1. Let |url| be the result of running [[#determine-the-url-for-content-script-matching]].
+1. If the extension does not have access to the origin, return.
+1. If |url| is not matched by a match pattern in `matches`, return.
+1. If `include_globs` is present and |url| is not matched by any pattern, return.
+1. If |url| matches an entry in `exclude_matches` or `exclude_globs`, return.
+1. If this is a child frame, and `all_frames` is not `true`, return.
+1. Otherwise, inject the content script. This should be done based on the `run_at` setting.
From d05985766483b6603a65b76d4c0b6161206af847 Mon Sep 17 00:00:00 2001
From: Oliver Dunk
Date: Fri, 18 Oct 2024 18:38:23 +0100
Subject: [PATCH 3/5] Fix typo

---
 specification/index.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specification/index.bs b/specification/index.bs
index 2cca14fd..82e2d2a2 100644
--- a/specification/index.bs
+++ b/specification/index.bs
@@ -219,7 +219,7 @@ A [=list=] of file paths, relative to the extension's package, that should be in
 
 ### Key `all_frames`
 
-If `all_frames` is true, the content script must be injected into any subframes that match the other matching criteria for the content script. If false, content scripts will only be injected into top-level documents. See Defaults to false.
+If `all_frames` is true, the content script must be injected into any subframes that match the other matching criteria for the content script. If false, content scripts will only be injected into top-level documents. Defaults to false.
 
 ### Key `match_about_blank`
 
From 1963f223a66084599ac9d2bb14d704433a4fe997 Mon Sep 17 00:00:00 2001
From: Oliver Dunk
Date: Fri, 18 Oct 2024 19:32:37 +0100
Subject: [PATCH 4/5] Use "may" for behavior of non-wildcard path in match_origin_as_fallback

---
 specification/index.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specification/index.bs b/specification/index.bs
index 82e2d2a2..19a4238c 100644
--- a/specification/index.bs
+++ b/specification/index.bs
@@ -231,7 +231,7 @@ Note: In Firefox, setting `match_about_blank` to `true` also allows injection in
 
 If this is true, use fallbacks as described in [[#determine-the-url-for-content-script-matching]].
 
-No path is available when the URL to match against falls back to an origin. Therefore, when set, the user agent must not allow [[#key-matches]] to contain entries with a path other than `/*`.
+No path is available when the URL to match against falls back to an origin. Therefore, when set, the user agent may treat a [[#key-matches]] with a path other than `/*` as an error.
 
 Defaults to `false`.
 
From 5d5c86de39985819a264536afe6f20944d0eaffc Mon Sep 17 00:00:00 2001
From: Oliver Dunk
Date: Fri, 18 Oct 2024 20:10:30 +0100
Subject: [PATCH 5/5] Use document rather than frame in inject a content script algorithm

---
 specification/index.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specification/index.bs b/specification/index.bs
index 19a4238c..7fcb8941 100644
--- a/specification/index.bs
+++ b/specification/index.bs
@@ -283,7 +283,7 @@ To determine the URL to use for a document when injecting a content script:
 
 Issue: If the same extension specifies the same script twice, what should happen? ([bug](https://crbug.com/324096753))
 
-To determine if a content script should be injected in a frame:
+To determine if a content script should be injected in a document:
 
 1. Let |url| be the result of running [[#determine-the-url-for-content-script-matching]].
 1. If the extension does not have access to the origin, return.
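
The two algorithms this series adds under `# Algorithms` ("Determine the URL for content script matching" and "Inject a content script", with the 5/5 wording) can be illustrated with a small, non-normative TypeScript sketch. It is not taken from any user agent. Several parts are assumptions made for illustration only:

- The `ContentScript` and `DocumentInfo` shapes.
- The `hasHostAccess` callback, which stands in for the extension's permissions.
- A simplified `matchesPattern` helper that handles only `<all_urls>` and `scheme://host/path`.
- Using `origin + "/"` as "a URL based on the origin".

```typescript
// Hypothetical, simplified model of one `content_scripts` entry.
interface ContentScript {
  matches: string[];
  excludeMatches?: string[];
  includeGlobs?: string[];
  excludeGlobs?: string[];
  allFrames?: boolean;
  matchAboutBlank?: boolean;
  matchOriginAsFallback?: boolean;
}

// Hypothetical description of the document being considered for injection.
interface DocumentInfo {
  url: string;
  isChildFrame: boolean;
  parentOrigin?: string; // origin of the parent frame, e.g. "https://example.com"
  creatorOrigin?: string; // origin of the frame that created a blob:/filesystem: URL
}

// Glob: "*" matches zero or more characters, "?" matches exactly one.
function globToRegExp(glob: string, flags = ""): RegExp {
  let source = "";
  for (const ch of glob) {
    if (ch === "*") source += ".*";
    else if (ch === "?") source += ".";
    else source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }
  return new RegExp(`^${source}$`, flags);
}

// Simplified match pattern check (assumption, not the full syntax):
// "<all_urls>" or "scheme://host/path", where scheme may be "*" (http/https),
// host may be "*" or "*.domain", and the path uses "*" wildcards.
function matchesPattern(pattern: string, url: URL): boolean {
  if (pattern === "<all_urls>") return true;
  const m = /^(\*|[a-z][a-z0-9+.-]*):\/\/([^/]*)(\/.*)$/i.exec(pattern);
  if (!m) return false;
  const [, scheme, host, path] = m;
  const urlScheme = url.protocol.slice(0, -1); // "https:" -> "https"
  const schemeOk =
    scheme === "*" ? urlScheme === "http" || urlScheme === "https" : scheme.toLowerCase() === urlScheme;
  if (!schemeOk) return false;
  const h = host.toLowerCase(); // url.hostname is already lowercased by the URL parser
  if (h.startsWith("*.")) {
    const base = h.slice(2);
    if (url.hostname !== base && !url.hostname.endsWith("." + base)) return false;
  } else if (h !== "*" && url.hostname !== h) {
    return false;
  }
  // The series describes match patterns as case-insensitive, so the path is too.
  return globToRegExp(path, "i").test(url.pathname + url.search);
}

// "Determine the URL for content script matching".
function urlForMatching(doc: DocumentInfo, cs: ContentScript): URL {
  let url = new URL(doc.url);
  if (doc.isChildFrame) {
    const scheme = url.protocol.slice(0, -1);
    const fallback = cs.matchOriginAsFallback === true;
    if (scheme === "about" && (cs.matchAboutBlank || fallback) && doc.parentOrigin) {
      url = new URL(doc.parentOrigin + "/");
    } else if (scheme === "data" && fallback && doc.parentOrigin) {
      url = new URL(doc.parentOrigin + "/");
    } else if ((scheme === "blob" || scheme === "filesystem") && fallback && doc.creatorOrigin) {
      url = new URL(doc.creatorOrigin + "/");
    }
  }
  return url;
}

// "Inject a content script": true means the caller injects according to `run_at`.
function shouldInject(doc: DocumentInfo, cs: ContentScript, hasHostAccess: (url: URL) => boolean): boolean {
  const url = urlForMatching(doc, cs);
  if (!hasHostAccess(url)) return false;
  if (!cs.matches.some((p) => matchesPattern(p, url))) return false;
  if (cs.includeGlobs && !cs.includeGlobs.some((g) => globToRegExp(g).test(url.href))) return false;
  if ((cs.excludeMatches ?? []).some((p) => matchesPattern(p, url))) return false;
  if ((cs.excludeGlobs ?? []).some((g) => globToRegExp(g).test(url.href))) return false;
  if (doc.isChildFrame && !cs.allFrames) return false;
  return true;
}

const script: ContentScript = { matches: ["https://*.example.com/*"], excludeGlobs: ["*debug*"] };
console.log(shouldInject({ url: "https://www.example.com/page", isChildFrame: false }, script, () => true)); // true
```

Every step in the algorithm can only return early, so the order of the checks does not change the final result. The sketch follows the step order anyway so that it can be read side by side with the specification text.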