Potentially standardize window.find() #3539

annevk · 2018-03-07T04:58:43Z

See:

Related #2858.

js-choi · 2018-03-07T07:52:07Z

Good news.

Is the scope of this issue simply to standardize window.find’s existing behavior in Firefox, WebKit, and Chromium? How does its matching work? Are Unicode code points matched with any normalization? Does case folding occur; if so, how? Can paragraph breaks and line breaks be matched, and by what characters? Is there any fuzzy search (e.g. between straight quotes and curly quotes as some browsers do on some systems)?

The answer to all of these probably will be: “What do current browsers do? Let’s stick with that,” but it’d be good to be explicit about the goal. And there may be some platform inconsistencies, especially in fuzzy matching.
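
To make the questions above concrete, here is a minimal sketch in plain JavaScript of how normalization, case handling, and quote folding each change whether a query matches. It says nothing about what any browser's window.find() actually does. `foldForSearch`, `naiveFind`, and the option names are hypothetical, and `toLowerCase()` is only a rough stand-in for real Unicode case folding.

```javascript
// Hypothetical helpers: not part of any browser API or spec.
function foldForSearch(text, { normalize = false, ignoreCase = false, foldQuotes = false } = {}) {
  let out = text;
  if (normalize) out = out.normalize("NFC");        // canonical composition
  if (ignoreCase) out = out.toLowerCase();           // rough stand-in for case folding
  if (foldQuotes) {
    out = out.replace(/[\u2018\u2019]/g, "'")        // curly single quotes -> straight
             .replace(/[\u201C\u201D]/g, '"');       // curly double quotes -> straight
  }
  return out;
}

// Apply the same folding to both sides, then do a plain substring test.
function naiveFind(haystack, needle, options) {
  return foldForSearch(haystack, options).includes(foldForSearch(needle, options));
}

// Curly apostrophe, and "e" followed by a combining acute accent.
const page = "It\u2019s a Cafe\u0301";

console.log(naiveFind(page, "Caf\u00E9"));                                    // false
console.log(naiveFind(page, "Caf\u00E9", { normalize: true }));               // true
console.log(naiveFind(page, "it's", { ignoreCase: true }));                   // false
console.log(naiveFind(page, "it's", { ignoreCase: true, foldQuotes: true })); // true
```

Each option flips one of the answers, which is why it matters to state which of them a standardized window.find would apply.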

See also Charmod Norm, w3c/selection-api#37, whatwg/dom#431, tc39/proposal-intl-segmenter#17, #2424, the inactive String Search API, the inactive FindText API, and the inactive RangeFinder API.

fred-wang · 2018-03-13T13:27:25Z

For the record some browsers also implement a document.execCommand("FindString", ...) command.
https://w3c.github.io/editing/execCommand.html

grantcv1 · 2018-03-30T20:08:12Z

I always find it distressing that counting the usage of public-facing websites is used to make decisions. In my experience, there are far more complex web applications with big companies and big governments that would not (should not) be included within these statistics.

window.find() is very much needed in editing-style applications, and there is a need for this feature (or a better alternative). Support for case folding, regular expressions, and other things that would help with a fuzzy search is really needed.

It seems that one possible effort to standardize this capability, the FindText API, (http://www.w3.org/TR/findtext/) has been discontinued :-(

tilgovi · 2018-06-11T17:40:08Z

One way to accommodate some of the goals of FindText without requiring standardization to take a stance on algorithm would be to specify how window.find interacts with Symbol.search (or other relevant, well-known symbols).
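
For readers unfamiliar with it, Symbol.search is the well-known symbol that String.prototype.search looks up on its argument. The sketch below shows only that standard protocol, with a custom matcher object supplying the algorithm. How window.find would consult such an object is not defined anywhere; `CaseInsensitiveMatcher` is a hypothetical name.

```javascript
// Hypothetical matcher; only the Symbol.search protocol itself is standard.
class CaseInsensitiveMatcher {
  constructor(query) {
    this.query = query.toLowerCase();
  }

  // String.prototype.search(obj) calls obj[Symbol.search](string)
  // and returns whatever index this method returns.
  [Symbol.search](string) {
    return string.toLowerCase().indexOf(this.query);
  }
}

const text = "Standardize window.find()";
console.log(text.search(new CaseInsensitiveMatcher("WINDOW")));  // 12
console.log(text.search(new CaseInsensitiveMatcher("missing"))); // -1
```

The matching algorithm lives entirely in the object passed in, so the caller never has to take a stance on it.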

vmpstr · 2020-06-12T15:03:48Z

I'm not sure if this should be a separate issue, but my proposal is to start the process of standardizing window.find by standardizing some of the aspects of find-in-page commonly used first. For instance,

Define terms like active match(?) vs potential match(?), meaning the thing that was found and highlighted vs the thing that could be found if the user or script continues searching for the same string
Perhaps also define how find-in-page interacts with things like clipped-out content, opacity 0 content, etc.
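
One way to picture the two tentative terms is a small model in which a plain string stands in for the page. This is a sketch under that assumption, not a proposed algorithm. `FindSession` and its members are hypothetical names, and it ignores clipping, opacity, and everything else DOM-related.

```javascript
// Hypothetical model of the proposed terms; a plain string stands in for the page.
class FindSession {
  constructor(pageText, query) {
    this.matches = [];       // start offsets of every occurrence of the query
    this.activeIndex = -1;   // no active match until the first findNext()
    if (query.length === 0) return;
    let from = 0;
    let at;
    while ((at = pageText.indexOf(query, from)) !== -1) {
      this.matches.push(at);
      from = at + query.length; // non-overlapping matches only
    }
  }

  // The match that was found and would be highlighted, or null.
  get activeMatch() {
    return this.activeIndex === -1 ? null : this.matches[this.activeIndex];
  }

  // Matches that could become active if searching continues.
  get potentialMatches() {
    return this.matches.filter((_, i) => i !== this.activeIndex);
  }

  // Advance to the next match, wrapping around at the end.
  findNext() {
    if (this.matches.length === 0) return null;
    this.activeIndex = (this.activeIndex + 1) % this.matches.length;
    return this.activeMatch;
  }
}

const session = new FindSession("find me, then find me again", "find");
console.log(session.findNext());        // 0
console.log(session.potentialMatches);  // [ 14 ]
console.log(session.findNext());        // 14
```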

By starting with definitions, I think we can start thinking about how to define the algorithm. However, for some features, it might already be useful to reference definitions of find-in-page (e.g. https://drafts.csswg.org/css-scroll-anchoring/#anchor-priority-candidates, where the 2nd candidate is "an element containing the current active selected match of the find-in-page user-agent algorithm", which could reference this).

As an aside, I put together a brief overview of find-in-page behaviors in different browsers (Chrome, Firefox, Safari) to see the commonalities and differences. The doc uses the find-in-page dialog, not window.find, though.

Does this seem like a good approach?

domenic · 2020-06-12T16:25:40Z

> my proposal is to start the process of standardizing window.find by standardizing some of the aspects of find-in-page commonly used first.

Interesting.

This falls into a gray area of web specs, of specifying UI. Generally we try to shy away from that, and only specify things which are observable from JavaScript. I believe nothing about find-in-page is observable, so we normally wouldn't specify it.

However, sometimes we bend this rule, when it's especially beneficial, and all the browsers are interested.

I guess I would ask what is the goal here, and for who. Are you trying to make things more predictable for web page authors? In what way, since find-in-page is not observable? Are you trying to make things easier for implementers?

If the goal is purely to work on a better spec for window.find, then I would probably treat that orthogonally to find-in-page...

vmpstr · 2020-06-12T17:14:25Z

> I guess I would ask what is the goal here, and for who

Good question. The immediate benefit from having the definitions is for spec writers and implementers so that they can agree what is meant by terms like 'active match' (e.g. the scroll anchor spec I linked, and beforematch proposal; the latter would benefit from the algorithm specified as well since the timing of the event and timing of find-in-page scroll are dependent on each other).

I think the ultimate benefit of at least partially speccing the algorithm is for users to have a consistent experience across browsers (although I'm not sure how valuable it is, since I imagine users don't typically switch browsers very often). That is, you can see in the compat doc I linked that browsers tend to do different things in a number of situations. In some cases, none of the browsers seem to do "the right thing". For instance, content clipped by overflow hidden can be found on the three browsers I tested. It is conceivable that the spec here would dictate what should and should not be found, if that makes sense.

As an aside, I assume that window.find essentially hooks into the find-in-page algorithm (maybe this is a wrong assumption), so any kind of specification for it is likely to be very similar. To put it differently, if window.find is specified and browsers update their implementations to match the spec, I suspect that they will also have to change the find-in-page behavior to simplify the code.

domenic · 2020-06-12T20:21:47Z

> The immediate benefit from having the definitions is for spec writers and implementers so that they can agree what is meant by terms like 'active match' (e.g. the scroll anchor spec I linked, and beforematch proposal; the latter would benefit from the algorithm specified as well since the timing of the event and timing of find-in-page scroll are dependent on each other).

I definitely see the benefit there. That could probably be accomplished with a fairly minimal spec, that just hand-waves at how the feature works but builds around a skeleton of some <dfn>s like "active match" that other, more observable features can reference. I'm happy to support that much, at least.

> I think the ultimate benefit of at least partially speccing the algorithm is for users to have a consistent experience across browsers (although I'm not sure how valuable it is, since I imagine users don't typically switch browsers very often). That is, you can see in the compat doc I linked that browsers tend to do different things in a number of situations. In some cases, none of the browsers seem to do "the right thing". For instance, content clipped by overflow hidden can be found on the three browsers I tested. It is conceivable that the spec here would dictate what should and should not be found, if that makes sense.

I think you're right that this would be valuable for users, in that it would guide browsers toward doing "the right thing", where "the right thing" is what domain experts (HTML spec editors, CSS WG, i18n folks, and browser engineers) can collectively get together and agree upon. Maybe we wouldn't get total agreement, e.g. maybe one browser representative has a very different philosophical stance on what a "word" means, but that's fine. Any discussion at all would likely be an opportunity to improve things in this way.

In other words, since this isn't JS-developer-observable, the goal isn't to get total interop, but instead to get the other values that the standards process brings. And I suspect that even if not all browser engines want to spend time on this, you'd be able to get good discussion from the rest of the web standards community, and from any interested web developers and users.

So, I'm sold that this is worth trying to specify.

> As an aside, I assume that window.find essentially hooks into the find-in-page algorithm (maybe this is a wrong assumption), so any kind of specification for it is likely to be very similar. To put it differently, if window.find is specified and browsers update their implementations to match the spec, I suspect that they will also have to change the find-in-page behavior to simplify the code.

Well, but as long as the result of window.find is not observable from JS, it seems like the specification could just be "calling this function does something with the user interface generally related to finding things". Although, maybe it's observable from scroll offsets? I'm not sure.

aphillips · 2020-06-13T16:29:52Z

Text search is a complex topic for reasons such as those called out in @js-choi's comment. Past attempts to write a spec at W3C failed to consider I18N basics early on and have foundered on that. The I18N WG (perhaps wisely?) shelved any attempt to work on it directly as part of Charmod-Norm by creating a separate document. Any group starting to work on this might want to have a look at string-search and at the issues we filed against FindText.

I think this is worth taking a stab at--it is possible to overcomplicate the problem, but as long as judicious choices are made (and well documented), I think it is possible to have a successful result in a finite amount of time.

annevk · 2020-06-14T09:59:59Z

@domenic it's pretty observable, no?

console.log(window.getSelection())
window.find("test");
console.log(window.getSelection())

domenic · 2020-06-15T17:03:00Z

Hmm, that appears to be a Firefox quirk where window.find() (and Ctrl+F!) actually affect window.getSelection(). That's not the case in other browsers.

domenic · 2020-06-15T17:07:20Z

For the record, I was testing something wrong; window.getSelection() is impacted by window.find() in Chrome too. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=8206

domenic · 2020-08-07T21:36:31Z

For folks watching this thread, @vmpstr has put together an initial pull request describing find-in-page in #5770 (direct preview link). It's pretty basic and, I think, should be uncontroversial. But it might provide a good place to collect some of the notes or open issues here, e.g. we could expand it to link to https://w3c.github.io/string-search/#searching, and eventually try to define window.find() as triggering that feature.

This serves as helpful structure for work such as #3539, or potentially integrating with https://github.com/WICG/scroll-to-text-fragment or https://github.com/WICG/display-locking.

xfq · 2020-08-15T08:56:15Z

FWIW, there's a CSS issue about controlling whether an element is findable/searchable: w3c/csswg-drafts#3460

petelomax · 2021-09-04T03:23:58Z

My gut instinct on this is that "find" is too generic and meaningless. Adding e.g. openFindWindow() or findTextOnPage() or highlight/selectTextOnPage() would be intuitively more distinct from querySelector() and friends, which "find" just isn't.

domenic · 2021-09-04T05:26:13Z

We don't get to choose the name; it's already in all browsers. This issue is just about writing a spec for it.

mantou132 · 2022-05-09T07:07:03Z

find just needs to return some Range that contains the specified text. Other processing, such as highlighting, should be left to the web developer, e.g. by using the CSS Custom Highlight API.
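
A rough sketch of that separation: a search step that only reports where the matches are and leaves presentation to the caller. Real Range objects need a DOM, so this assumes an array of strings standing in for consecutive Text nodes and returns node/offset pairs of the kind Range boundaries use. `findTextRanges` is a hypothetical name, and the matching is plain case-sensitive substring search.

```javascript
// Hypothetical: textNodes is an array of strings standing in for DOM Text nodes.
function findTextRanges(textNodes, query) {
  const ranges = [];
  if (query.length === 0) return ranges;

  const joined = textNodes.join("");
  // starts[i] is the offset of textNodes[i] within the joined text.
  const starts = [];
  let total = 0;
  for (const t of textNodes) {
    starts.push(total);
    total += t.length;
  }

  // Map an offset in the joined text back to { node, offset }.
  // An end boundary that falls between two nodes is placed at the
  // end of the earlier node rather than the start of the later one.
  const locate = (pos, isEnd) => {
    let i = starts.length - 1;
    while (i > 0 && (starts[i] > pos || (isEnd && starts[i] === pos))) i--;
    return { node: i, offset: pos - starts[i] };
  };

  let from = 0;
  let at;
  while ((at = joined.indexOf(query, from)) !== -1) {
    ranges.push({ start: locate(at, false), end: locate(at + query.length, true) });
    from = at + query.length;
  }
  return ranges;
}

// The first "world" spans two nodes; the second sits inside one.
const nodes = ["Hello wor", "ld, hello ", "world"];
console.log(JSON.stringify(findTextRanges(nodes, "world")));
```

In a page, each pair could be fed to Range.setStart() and Range.setEnd(), and the resulting ranges registered through the CSS Custom Highlight API (a Highlight object added to CSS.highlights). That part needs a DOM and is not shown here.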

hsivonen · 2023-09-28T12:43:34Z

> @domenic it's pretty observable, no?
>
> console.log(window.getSelection())
> window.find("test");
> console.log(window.getSelection())

It's rather unfortunate that what window.find() finds is Web-exposed when Gecko implements the search technically in a very different way from WebKit (forked to Blink), and the WebKit/Blink behavior depends on the UI language of the browser.

Specifically, Firefox operates on the Unicode Database level (in a language-independent way), whereas WebKit and Blink use collator-based search (with primary-level matching only), such that the collation data used is the CLDR search collation for the browser UI language.

As a collator implementor, I'm very skeptical of the technical merit of collator-based search compared to search implemented directly over the Unicode Database layer (possibly with hard-coded exceptions to try to reproduce the main effects of collator-based search). (When operating on the Unicode Database, you transform characters to other characters and match on the transformed stream of characters. When operating on collations, you perform a complex mapping from characters to collation units and then ignore everything but the primary weight in the collation unit and match on the primary weights. Even with today's fast computers, you can experience a performance difference by using cmd/ctrl-f on the HTML spec in Firefox and Chrome.) I also don't want to bring collator-based search into scope for Gecko or ICU4X. See a URL text fragment issue.
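
The two styles can be contrasted in a few lines of JavaScript. This is only an illustration of the general idea, not the Gecko or WebKit/Blink implementation. `foldUnicode`, `unicodeIncludes`, and `collatorIncludes` are hypothetical names. The collator variant assumes that Intl.Collator with `usage: "search"` and `sensitivity: "base"` is a fair approximation of primary-level matching, and it compares fixed-length windows, which real collation-based search does not have to do.

```javascript
// Style 1: transform characters to other characters, then match on the
// transformed stream. No locale is involved.
function foldUnicode(text) {
  return text
    .normalize("NFD")          // split base letters from combining marks
    .replace(/\p{M}/gu, "")    // drop the combining marks
    .toLowerCase();            // rough stand-in for case folding
}

function unicodeIncludes(haystack, needle) {
  return foldUnicode(haystack).includes(foldUnicode(needle));
}

// Style 2: compare candidate substrings with a locale-specific collator,
// ignoring every difference below the primary (base letter) level.
function collatorIncludes(haystack, needle, locale) {
  const collator = new Intl.Collator(locale, { usage: "search", sensitivity: "base" });
  const h = haystack.normalize("NFC");
  const n = needle.normalize("NFC");
  // Simplification: only windows with the same length as the needle are tried.
  for (let i = 0; i + n.length <= h.length; i++) {
    if (collator.compare(h.slice(i, i + n.length), n) === 0) return true;
  }
  return false;
}

const sample = "Un R\u00C9SUM\u00C9 court";
console.log(unicodeIncludes(sample, "resume"));         // true
console.log(collatorIncludes(sample, "resume", "en"));  // true
```

The `locale` argument in the second function is where language-dependent behavior enters. The first function has no such parameter.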

sideshowbarker · 2024-02-27T10:40:14Z

Given that — along with the core “highlight the active match and scroll into view” behavior — browser UIs also expose a count of the total matches for the current query, it’s imaginable that it might be useful to developers (and for testing scenarios too) to have an API which programmatically exposes that total match count to JavaScript code.

annevk added labels Mar 7, 2018: addition/proposal (New features or enhancements), i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response), interop (Implementations are not interoperable with each other)

domenic mentioned this issue Mar 9, 2018: Explain that window.find() is not relevant rakina/find-in-page-api#11 (Closed)

js-choi mentioned this issue Mar 13, 2018: Clarify relationships with other specs rakina/find-in-page-api#12 (Closed)

annevk mentioned this issue Oct 25, 2019: Scroll To Text Fragment mozilla/standards-positions#194 (Closed)

fred-wang mentioned this issue Oct 26, 2019: Clarify how scroll to fragment is performed WICG/scroll-to-text-fragment#66 (Closed)

himorin mentioned this issue Nov 12, 2019: Potentially standardize window.find() w3c/i18n-activity#811 (Open)

vmpstr mentioned this issue Jul 29, 2020: Added find-in-page definitions #5770 (Merged)

josepharhar mentioned this issue Aug 5, 2021: Support testing find-in-page web-platform-tests/wpt#29915 (Open)

aphillips mentioned this issue Sep 6, 2021: Fix matching user input to datalist values #4814 #7003 (Merged)

whatwg deleted a comment Oct 9, 2022

hsivonen mentioned this issue Sep 28, 2023: Please reconsider the use of collator-based search WICG/scroll-to-text-fragment#233 (Open)
