userScripts API: method to get matching ids for a URL #646

Open
tophf opened this issue Jun 25, 2024 · 11 comments
Labels
needs-triage: chrome, needs-triage: firefox, neutral: safari, topic: user scripts

Comments

@tophf

tophf commented Jun 25, 2024

Something like chrome.userScripts.getMatchingIds({url: string, allFrames: boolean}) that returns a Promise resolved to an array of ids of the registered userscripts. When the allFrames parameter is omitted/null/undefined the allFrames flag in RegisteredUserScript is ignored.

Necessary to show the list of matching scripts in the popup and a number of running scripts in the icon badge (e.g. # of scripts, unique # of scripts, or # per main_frame/sub_frame).

The current workaround is to get this list from the running scripts in the tab, but it won't help in these cases:

  • intentionally reloading the tab without userscripts (a feature in the extension), while still being able to see the list of matching scripts
  • syntax error in one of the userscripts (we don't want to use eval to overcome this problem)
  • extension was just installed in Chrome but the tab wasn't yet injected - we want to show the matching scripts with a hint to reload the tab or to run them now (when chrome.userScripts.execute is implemented)
  • URL host permission is not yet granted, i.e. the extension can't show the list without asking the permission
  • userscript that runs only on demand (when chrome.userScripts.execute is implemented) and we don't want to register it to avoid unnecessarily injecting a possibly huge amount of code
  • userscript was created and registered after the tab was loaded - we want to show it in the list with a hint that it'll run after reloading the tab or with a button to run it immediately (when chrome.userScripts.execute is implemented)
  • filtering in the script management UI of the extension (dashboard) i.e. to show only scripts that match an arbitrary URL.
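To make the popup use cases above concrete, here is a minimal sketch of how the two lists could be combined. It assumes `matchingIds` comes from the proposed (not yet existing) `chrome.userScripts.getMatchingIds` and `runningIds` from the current workaround of asking the tab which scripts ran. The names `PopupRow`, `ScriptStatus`, and `buildPopupRows` are hypothetical and only serve the illustration.

```typescript
// Hypothetical row model for the extension popup.
type ScriptStatus = "running" | "matching-not-injected";

interface PopupRow {
  id: string;
  status: ScriptStatus;
  hint: string;
}

// matchingIds: ids that should run for the tab URL (assumed to come from the proposed API).
// runningIds: ids reported by the tab as actually injected (the existing workaround).
function buildPopupRows(matchingIds: string[], runningIds: string[]): PopupRow[] {
  const running = new Set(runningIds);
  return matchingIds.map((id): PopupRow =>
    running.has(id)
      ? { id, status: "running", hint: "" }
      : { id, status: "matching-not-injected", hint: "Reload the tab to run this script" }
  );
}
```

A script that was registered after the tab loaded, or that failed with a syntax error, would show up here as matching but not injected, which is the information the popup currently cannot obtain.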

Another workaround is to perform matching manually, but it's wasteful when there's a lot of scripts (some scripts may have hundreds of matching patterns) and it duplicates the existing matching mechanism in the browser.
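For reference, the manual workaround could look roughly like the sketch below. It is a deliberately simplified matcher, not the browser's algorithm: it handles only http(s) match patterns and `<all_urls>`, ignores the `allFrames` flag and glob-style includes, and recompiles every pattern on each call. `ScriptEntry`, `patternToRegExp`, and `getMatchingIds` are hypothetical names.

```typescript
// Simplified stand-in for RegisteredUserScript: only the fields used for URL matching.
interface ScriptEntry {
  id: string;
  matches: string[];
  excludeMatches?: string[];
}

// Escape regex metacharacters, then turn the wildcard "*" into ".*".
function globToRegex(glob: string): string {
  return glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
}

// Convert an http(s) match pattern such as "*://*.example.com/*" to a RegExp.
// Assumption: other schemes (file, ftp, ...) are out of scope for this sketch.
function patternToRegExp(pattern: string): RegExp {
  if (pattern === "<all_urls>") return /^https?:\/\//;
  const m = /^(\*|https?):\/\/(\*|(?:\*\.)?[^/*:]+)(\/.*)$/.exec(pattern);
  if (m === null) throw new Error(`Unsupported match pattern: ${pattern}`);
  const scheme = m[1] === "*" ? "https?" : String(m[1]);
  const rawHost = String(m[2]);
  const host =
    rawHost === "*" ? "[^/:]+"
    : rawHost.slice(0, 2) === "*." ? "(?:[^/:]+\\.)?" + globToRegex(rawHost.slice(2))
    : globToRegex(rawHost);
  // A pattern without a port is treated here as matching any port.
  return new RegExp(`^${scheme}://${host}(?::\\d+)?${globToRegex(String(m[3]))}$`);
}

// Return ids of scripts whose `matches` hit the URL and whose `excludeMatches` do not.
function getMatchingIds(scripts: ScriptEntry[], url: string): string[] {
  const target = url.replace(/#.*$/, ""); // the fragment is not part of matching
  const hits = (patterns: string[] | undefined): boolean =>
    (patterns ?? []).some(p => patternToRegExp(p).test(target));
  return scripts
    .filter(s => hits(s.matches) && !hits(s.excludeMatches))
    .map(s => s.id);
}

const scripts: ScriptEntry[] = [
  { id: "a", matches: ["*://*.example.com/*"] },
  { id: "b", matches: ["https://example.com/admin/*"] },
  { id: "c", matches: ["<all_urls>"], excludeMatches: ["*://*.example.com/*"] },
];
console.log(getMatchingIds(scripts, "https://www.example.com/page")); // [ 'a' ]
console.log(getMatchingIds(scripts, "https://other.org/"));           // [ 'c' ]
```

Even this reduced version has to touch every pattern of every script per lookup, which is the cost the paragraph above refers to when scripts carry hundreds of patterns.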

github-actions bot added the needs-triage: chrome, needs-triage: firefox, and needs-triage: safari labels Jun 25, 2024
@Rob--W
Member

Rob--W commented Jul 1, 2024

There is currently no way to disable a user script without unregistering it. How would the proposed method allow you to see which scripts run there? This also applies to your "userscript that runs only on demand" point.

Your proposed API is URL-based, which is less expressive than what the matching engines of browsers are capable of. Are you intentionally limiting the API to http(s) URLs only?
E.g. the match_origin_as_fallback option of content scripts enables injection in about:blank and about:srcdoc (which inherit the origin of the opener), blob: and filesystem:-URLs (which are associated with the origin that created these URLs), and data:. In all of these cases the internal matching engine of the browser allows extensions to match at a granular level, by the origin/precursor of the opener instead of just the URL itself. This option is currently not exposed in the user script API, but if we ever want to, this could be addressed by introducing an origin key next to url.

Matching a URL does not imply the ability to execute. If you are interested in knowing whether you could execute scripts, at the very least the input would need to be topUrl + url, but documentId, tabId+frameId may also be options. The latter could reveal information about the URL of a tab/frame, and therefore the capability should require any of these permissions: host permissions, "webNavigation" or "tabs".

userscript was created and registered after the tab was loaded - we want to show it in the list with a hint that it'll run after reloading the tab or with a button to run it immediately (when chrome.userScripts.execute is implemented)

The API you are proposing won't help with this use case. For that, you'd need a method that returns whether a script has already executed in a specific frame (given a tabId+frameId and/or documentId), and if it had not, whether it could execute there.

FYI: an alternative to querying on demand is to declaratively specify whether scripts should run in existing documents. There are feature requests for this at:

@erosman

erosman commented Jul 1, 2024

@Rob--W Out of curiosity, is there a possibility for the userScripts API to add the userScript ID(s) to a new property of tabs.Tab at the time it is injecting the userScript/userCSS, OR when a positive match is made?

PRO

  • It should not be very costly to do so

CON

@tophf
Author

tophf commented Jul 1, 2024

@Rob--W, I think you've missed the point, which implies my description isn't good. I'll try to elaborate.

Your proposed API is URL-based, which is less expressive than what the matching engines of browsers are capable of. Are you intentionally limiting the API to http(s) URLs only?

There's no need for more expressive matching. The use cases listed in the description are already implemented via URL + allFrames in Violentmonkey/Tampermonkey.

E.g. the match_origin_as_fallback [...]

This is not implemented for userScripts API, so it's not a concern.

Matching a URL does not imply the ability to execute.

It does when coupled with allFrames and a check for host permissions.

The API you are proposing won't help with this use case. For that, you'd need a method that returns whether a script has already executed in a specific frame (given a tabId+frameId and/or documentId),

We already have such a method: send a message to the controller script which tracks all executed scripts by their internal id.

and if it had not, whether it could execute there.

This is what the suggested API is for.

@tophf
Author

tophf commented Jul 1, 2024

Regarding the use case for on-demand scripts (when userScripts.execute is implemented), the extensions will either add those ids explicitly to their popup menu or register an empty/dummy userscript in userScripts.register() in case the user wants to limit such on-demand scripts to specific URL patterns (via matches etc.).

xeenon added the neutral: safari label and removed the needs-triage: safari label Sep 24, 2024
@xeenon
Collaborator

xeenon commented Sep 24, 2024

If we added this to userScripts, I would want to add this to scripting too. I don't like having two similar APIs having drastically different API surfaces.

@rdcronin
Contributor

We discussed this at TPAC this week.

As @Rob--W mentioned, there's a lot of state that goes into determining if a script will inject -- it's not just based on URL, even if additional properties like matchAboutBlank or matchOriginAsFallback are provided, since in that case, we need to know the parent or initiator frames. To get an accurate answer about which scripts injected into a certain frame, an extension would need to provide information about the frame tree, which seems arduous at best.

The use cases highlighted here ("Necessary to show the list of matching scripts in the popup and a number of running scripts in the icon badge (e.g. # of scripts, unique # of scripts, or # per main_frame/sub_frame).") aren't so much about querying which scripts would match a frame with arbitrary provided state, but rather more about which scripts, in practice, injected in a frame. With that in mind, I think it would be much more useful for developers if we provided an API that handled that directly -- a way to determine scripts injected in a given frame.

There were a few different ideas of what this API may look like in practice -- a simple one was scripting.getInjectedScripts, but @oliverdunk raised that this could potentially be incorporated into runtime.getContexts() if we included the appropriate information on the context.

Setting aside the exact signature of a new API, if we provided an API that let you determine the scripts that did inject in a given frame, would that be suitable for your use cases?

@tophf
Author

tophf commented Sep 25, 2024

there's a lot of state that goes into determining if a script will inject -- it's not just based on URL, even if additional properties like matchAboutBlank or matchOriginAsFallback are provided

No. For this idea only the URL and all_frames are necessary. This is how the userscript platform has been working for decades, as implemented by all userscript managers. There's no need for matchAboutBlank or matchOriginAsFallback because those aren't used for userscripts.

One possible addition would be whether to consider the global blacklist if it's ever implemented in #607.

aren't so much about querying which scripts would match a frame with arbitrary provided state, but rather more about which scripts, in practice, injected in a frame

No. The idea is to get a list of scripts which should be injected, because it's easy to get a list of actually injected userscripts e.g. by maintaining a registry in the controlling content script or by sending a message on injection.
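As an illustration of the "registry of actually injected scripts" mentioned above, the sketch below keeps a per-tab, per-frame record that a background script could update whenever a userscript reports its injection by message. The class and method names are hypothetical, and the message plumbing itself is left out. It relies only on the WebExtensions convention that frameId 0 is the main frame.

```typescript
// Badge-style counts for one tab.
interface InjectionCounts {
  total: number;     // every (frame, script) pair
  unique: number;    // distinct script ids across all frames
  mainFrame: number; // scripts in frameId 0
  subFrames: number; // scripts in all other frames
}

class InjectionRegistry {
  // tabId -> frameId -> ids of scripts that reported injection
  private readonly tabs = new Map<number, Map<number, Set<string>>>();

  // Call when a script reports that it ran in the given frame.
  record(tabId: number, frameId: number, scriptId: string): void {
    let frames = this.tabs.get(tabId);
    if (!frames) {
      frames = new Map<number, Set<string>>();
      this.tabs.set(tabId, frames);
    }
    let ids = frames.get(frameId);
    if (!ids) {
      ids = new Set<string>();
      frames.set(frameId, ids);
    }
    ids.add(scriptId);
  }

  // Call when a frame navigates, so stale entries do not linger.
  resetFrame(tabId: number, frameId: number): void {
    this.tabs.get(tabId)?.delete(frameId);
  }

  counts(tabId: number): InjectionCounts {
    const unique = new Set<string>();
    let total = 0;
    let mainFrame = 0;
    this.tabs.get(tabId)?.forEach((ids, frameId) => {
      total += ids.size;
      if (frameId === 0) mainFrame += ids.size;
      ids.forEach(id => unique.add(id));
    });
    return { total, unique: unique.size, mainFrame, subFrames: total - mainFrame };
  }
}
```

This covers the "did inject" side only; the list of scripts that should inject still has to come from somewhere else, which is what the requested method is about.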

if we provided an API that let you determine the scripts that did inject in a given frame, would that be suitable for your use cases?

No need (as I explained above).

One related feature is necessary though since there's no safe cross-world communication API and the bug still exists in Chrome/Firefox that allows the embedder to mutate the global JS environment of a same-origin iframe before a document_start script is executed inside it: a synchronous event in the isolated or userscript world, fired before injection occurs, whose parameter lists the ids so that the extension can prepare a safe communication channel. The event would fire for each id, I guess.

@rdcronin
Contributor

Thanks for the extra details.

If the request here is really just to perform URL and boolean matching on a script (independent of the existing frame tree or if the script injected), I don't think that's something we'll want to add as a dedicated API in Chromium. The extension can perform this matching relatively simply in its own code, since it has all the necessary information. This is called out as the existing workaround above:

Another workaround is to perform matching manually, but it's wasteful when there's a lot of scripts (some scripts may have hundreds of matching patterns) and it duplicates the existing matching mechanism in the browser.

Regardless of the number of scripts or match patterns, this matching has to happen somewhere, and it should not be dramatically slower to happen in the extension rather than natively in the browser*. The matching algorithm, though implemented in the browser, is not complex and should be straightforward to implement in JS.** Additionally, the extension could then optimize this in a number of ways, such as caching scripts for different URLs or leveraging optimized data structures to make that counting more efficient.
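The caching idea mentioned above could be sketched as follows, assuming the extension already has some function that computes matching ids for a URL. `MatchCache` and its members are hypothetical names, and the eviction policy (clear everything once a size limit is reached) is chosen only for brevity.

```typescript
// Memoizes "URL -> matching script ids" so repeated lookups skip the pattern scan.
class MatchCache {
  private readonly cache = new Map<string, string[]>();
  private readonly compute: (url: string) => string[];
  private readonly maxEntries: number;

  constructor(compute: (url: string) => string[], maxEntries: number = 500) {
    this.compute = compute;
    this.maxEntries = maxEntries;
  }

  get(url: string): string[] {
    const key = url.replace(/#.*$/, ""); // fragments do not affect matching
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;
    const ids = this.compute(key);
    if (this.cache.size >= this.maxEntries) this.cache.clear(); // crude eviction
    this.cache.set(key, ids);
    return ids;
  }

  // Must be called whenever scripts are registered, updated, or unregistered.
  invalidate(): void {
    this.cache.clear();
  }
}
```

The first lookup for each URL still pays the full matching cost, and every registration change empties the cache, so this reduces repeated work rather than the worst case.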

Given there is a functional (and reasonably straightforward) workaround for this already, and we want extension APIs to be more "fundamental building blocks" rather than providing APIs for each individual use case (which doesn't scale), I don't think this is one we'll pursue on the Chromium side, though I'm open to opinions from other browsers.

* One note that @Rob--W mentioned previously is that the fact we serialize all the code associated with a script and pass it back in RegisteredUserScripts can be inefficient due to the size; we should address this separately so that extensions can get the registered scripts without needing to serialize the full content of the script.

** We also discussed whether we should add support for e.g. a MatchPattern implementation that we can provide to extensions. I think this is worth exploring, but having a polyfill in the extension code should also be fairly lightweight.

@tophf
Author

tophf commented Sep 26, 2024

this matching has to happen somewhere,

The browser already has an implementation in place for exactly this use case, doing exactly that (URL matching + all_frames check and nothing else for userscripts), hence the suggestion to expose it. There's no need to implement anything new; it's just about exposing the existing matching code.

@Rob--W
Member

Rob--W commented Sep 26, 2024

One related feature is necessary though since there's no safe cross-world communication API

This is being designed in #678 + #679. #678 provides a synchronous code execution mechanism in another world, with its own closure and a way to pass parameters; #679 describes a synchronous messaging mechanism between the executed script and the isolated world.

We have had more discussions about this at TPAC this week; the meeting notes will be published later (and linked from #659).

My question in the context of this issue: is there anything that you cannot already do with the existing APIs? As Devlin stated in the comment above, things like matching by URL is something that can easily be implemented in JS code. Sure, it is more code, but not impossible nor causing significant implementation complexity.

@tophf
Author

tophf commented Sep 26, 2024

things like matching by URL is something that can easily be implemented in JS code.

It's not easy in practice.

Sure, it is more code, but not impossible nor causing significant implementation complexity.

It's a lot of code and a lot of complexity to implement it in a performant fashion as it requires reading potentially hundreds of scripts and potentially thousands of matches, processing which may take even a second for the first time before everything is intelligently cached.

It's unclear why it is problematic to expose the existing matching mechanism in the browser that already does exactly what is suggested here.
