
Rule 2ee8b8 ("Visible label is part of accessible name"): introducing a new "label in name algorithm". #2075

Open
wants to merge 42 commits into
base: develop

dan-tripp-siteimprove
Collaborator

@dan-tripp-siteimprove dan-tripp-siteimprove commented Jun 22, 2023

<< Describe the changes >>

Closes issue(s):

Need for Call for Review:
This will require a 2 weeks Call for Review


Pull Request Etiquette

When creating PR:

  • [x] Make sure you're requesting to pull a branch (right side) to the develop branch (left side).
  • [x] Make sure you do not remove the "How to Review and Approve" section in your pull request description

After creating PR:

  • [x] Add yourself (and co-authors) as "Assignees" for PR.
  • [x] Add label to indicate if it's a Rule, Definition or Chore.
  • [x] Link the PR to any issue it solves. This will be done automatically by referencing the issue at the top of this comment in the indicated place.
  • [x] Optionally request feedback from anyone in particular by assigning them as "Reviewers".

When merging a PR:

  • Close any issue that the PR resolves. This will happen automatically upon merging if the PR was correctly linked to the issue, e.g. by referencing the issue at the top of this comment.

How to Review And Approve

  • Go to the “Files changed” tab
  • Here you will have the option to leave comments on different lines.
  • Once the review is completed, find the “Review changes” button in the top right, select “Approve” (if you are really confident in the rule) or "Request changes" and click “Submit review”.
  • Make sure to also review the proposed Call for Review period. In case of disagreement, the longer period wins.

@dan-tripp-siteimprove dan-tripp-siteimprove added the Rule Update and reviewers wanted labels Jun 22, 2023
@dan-tripp-siteimprove dan-tripp-siteimprove self-assigned this Jun 22, 2023
@dan-tripp-siteimprove dan-tripp-siteimprove changed the title Rule 2ee8b8 may 2023 Rule 2ee8b8 ("Visible label is part of accessible name"): introducing a new "label in name algorithm". Jun 22, 2023
@WilcoFiers
Member

@dan-tripp-siteimprove Since this is being worked on still by @kengdoj, can we set this to draft?

@dan-tripp-siteimprove dan-tripp-siteimprove marked this pull request as draft July 20, 2023 21:19
@dan-tripp-siteimprove
Collaborator Author

@dan-tripp-siteimprove Since this is being worked on still by @kengdoj, can we set this to draft?

Done

Jym77
Jym77 previously requested changes Nov 9, 2023
Collaborator

@Jym77 Jym77 left a comment

This looks good. I like the details and the many new examples that make explicit the decisions we've taken.

pages/glossary/visible-inner-text.md

The <dfn id="for-text">visible inner text of a [text node][]</dfn> is:
- if the [text node][] is [visible][], its visible inner text is its [data][];
- if the [text node][] is not-[visible][], [rendered][], and contains only [whitespace][], its visible inner text is the string `" "` (a single ASCII whitespace);
Collaborator

The conditional here sounds a bit weird 🤔
Notably, a text node that is not visible, rendered, and contains more than whitespace (e.g. in <span style="visibility: hidden">Hello</span>) would not trigger it and therefore have an empty string as visible inner text (rather than a whitespace).

Collaborator Author

Interesting question. I don't know the answer. But I'll note that I copied this definition from sanshikan so if it needs fixing here, it probably needs fixing there too.

Collaborator

OK, doing some archaeology, this is due to the fact that whitespace is not visible per our definition…

<button aria-label="hello world"><span>hello</span><span id="space"> </span><span>world</span></button>

The span#space is not visible (and neither is its child text node). So the first bullet doesn't apply. Without the second bullet, the visible inner text of the button would be helloworld, not matching the accessible name of hello world due to spacing…
I guess we need to add an example to show that.
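
The effect of the second bullet can be checked on a toy model. This is only a sketch: the `visible` and `rendered` flags are written by hand here (a real checker would derive them from CSS and layout), whitespace is taken to mean characters with the Unicode White_Space property, and the function names are hypothetical.

```javascript
// Toy model: each text node is { data, visible, rendered }.
// The flags are hand-written assumptions, not computed from real CSS.
const ONLY_WHITESPACE = /^\p{White_Space}+$/u;

function visibleInnerTextOfTextNode(node) {
  if (node.visible) return node.data; // first bullet
  if (node.rendered && ONLY_WHITESPACE.test(node.data)) return " "; // second bullet
  return ""; // every other case contributes nothing
}

function visibleInnerText(textNodes) {
  return textNodes.map(visibleInnerTextOfTextNode).join("");
}

// <button><span>hello</span><span id="space"> </span><span>world</span></button>
const helloWorld = [
  { data: "hello", visible: true, rendered: true },
  { data: " ", visible: false, rendered: true },
  { data: "world", visible: true, rendered: true },
];

console.log(JSON.stringify(visibleInnerText(helloWorld))); // "hello world"
```

Dropping the second bullet from `visibleInnerTextOfTextNode` makes the same input produce `helloworld`, which is the mismatch described above.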

Collaborator Author

Done in b2df021

Collaborator Author

This raises another question: what should we do with this?
<a aria-label="Download specification" href="#"><span>Download</span><span style="visibility: hidden">x</span><span>specification</span></a>
According to the current definition, because of the clause "contains only [whitespace][]", the visible inner text of the <a> element is "Downloadspecification". Visually it looks like "Download specification". So I wonder if we could remove the clause "contains only [whitespace][]". What do you think?

Collaborator

Good point 🤔 But if the span was invisible due to absolute positioning out of viewport, it should be removed:

<a aria-label="Download specification" href="#"><span>Download</span><span style="position: absolute; left: -9999px">x</span><span>specification</span></a>

I guess the true condition is whether it creates a CSS box that lies somewhere between the ones of the rest of the text taking part in the computation (and isn't fully contained in them), or something like that 🙈
Or maybe we just make the special case for visibility: hidden and assume that this is already a corner case and that it won't create too many true problems (we've been using that definition in Alfa for two years and I don't remember seeing a problem caused by it, so it may be safe to assume that it is a good enough approximation).

Collaborator Author

This has given me a lot to think about. I'll try to bring it up in our next one-on-one meeting.

…://github.com/Siteimprove/sanshikan/blob/main/terms/visible-inner-text.md)

- changing glossary links' prefixes from "./" to "#".  I don't know if the former was working or not.  but the latter is the common practice, it seems.
…placing it with a new idea: the algorithm 'return value' eg. 'returns "is contained"'.

- rewording rule expectation.  I think that 'For the target element' is better than 'For each target element' because for this rule, the computation of the expectation for each applicable target element is done in isolation from the other applicable targets on the page.  It's simpler if the "for loop" over all applicable targets is done by the tool, not the rule.
@Jym77 Jym77 dismissed their stale review November 10, 2023 09:15

Changes done

…ses.

Cases handled better as of this commit:

- <a aria-label="Download specification" href="#"><span>Download</span><span style="visibility: hidden">x</span><span>specification</span></a>
	- desired visible inner text: "Download specification"
	- visible inner text before this commit: "Downloadxspecification"
	- visible inner text as of this commit: "Download specification"
	- desired == actual.  this is good.

- <a aria-label="Download specification" href="#"><span>Download</span><span style="visibility: hidden; width: 0; display: inline-block">x</span><span>specification</span></a>
	- desired visible inner text: "Downloadspecification"
	- visible inner text before this commit: "Downloadxspecification"
	- visible inner text as of this commit: "Downloadspecification"
	- desired == actual.  this is good.

Case not changed by this commit:

- <a aria-label="Download specification" href="#"><span>Download</span><span style="position: absolute; left: -9999px">x</span><span>specification</span></a>
	- desired visible inner text: "Downloadspecification"
	- visible inner text before this commit: "Downloadxspecification"
	- visible inner text as of this commit: "Downloadxspecification"
	- desired != actual.  this is (still) bad.  it violates this assumption of the rule (visible-label-in-accessible-name-2ee8b8.md): "This rule assumes that neither the label nor the visible inner text are rearranged with CSS in some way so that they appear to the user in a different order than they do in the DOM."
@dan-tripp-siteimprove
Collaborator Author

I just pushed a commit a few minutes ago (7b2a053) which handles some more cases. Namely:

Cases handled better as of this commit:

  • <a aria-label="Download specification" href="#"><span>Download</span><span style="visibility: hidden">x</span><span>specification</span></a>

    • desired visible inner text: "Download specification"
    • visible inner text before this commit: "Downloadxspecification"
    • visible inner text as of this commit: "Download specification"
    • desired == actual. This is good.
  • <a aria-label="Download specification" href="#"><span>Download</span><span style="visibility: hidden; width: 0; display: inline-block">x</span><span>specification</span></a>

    • desired visible inner text: "Downloadspecification"
    • visible inner text before this commit: "Downloadxspecification"
    • visible inner text as of this commit: "Downloadspecification"
    • desired == actual. This is good.

Case not changed by this commit:

  • <a aria-label="Download specification" href="#"><span>Download</span><span style="position: absolute; left: -9999px">x</span><span>specification</span></a>
    • desired visible inner text: "Downloadspecification"
    • visible inner text before this commit: "Downloadxspecification"
    • visible inner text as of this commit: "Downloadxspecification"
    • desired != actual. This is (still) bad. It violates this assumption of the rule: "This rule assumes that neither the label nor the visible inner text are rearranged with CSS in some way so that they appear to the user in a different order than they do in the DOM."

I think this is enough progress that I'll take this PR out of draft.

@dan-tripp-siteimprove dan-tripp-siteimprove marked this pull request as ready for review April 10, 2024 20:15
@dan-tripp-siteimprove
Collaborator Author

dan-tripp-siteimprove commented Apr 12, 2024 via email

Collaborator

@dd8 dd8 left a comment

Definitely heading in the right direction. My main concern is the split on regex: this could be really slow on some types of input (it will usually be fast, but could run for minutes on some pages). I've suggested an alternative, which passes the unit tests and is about to go into production. There are also some internationalization issues, noted in the comments.

<button aria-label="💡 Submit 💡">&gt;&gt;&gt; ** Submit ** &lt;&lt;&lt;</button>
```


Collaborator

I think adding the "First Name:" pass example from the Understanding document would be useful (since it's very common).

Collaborator Author

It's a strong example for the SC, but I can't see how to adapt it to this ACT rule. That example (<label for="firstname">First Name:</label> <input id="firstname" type="text" name="firstname">) doesn't meet the applicability of this ACT rule because <input> doesn't support "name from content". For us to add an example which uses that visible label "First Name:" on an element which does support "name from content" (mostly <a> and <button>) seems unrealistic to me. Suggestions welcome on how to reconcile this.

- For b) Use the Unicode classes Letter, Mark, and "Number, Decimal Digit [Nd]". (This will exclude hyphens, punctuation, emoji, and more.)
- Remove all characters that are within parentheses (AKA round brackets).
- Ignore square brackets and braces.
- Split the string into a list of strings, using a greedy [whitespace][] regular expression as the separator.
Collaborator

This part depends on your implementation language having a split-on-regex method. It can also be quite slow: in Rust, regex::split has worst-case time complexity O(m * n^2). Regexes in other languages can also be very slow; see: https://swtch.com/~rsc/regexp/regexp1.html

Normalizing the strings by trimming and collapsing runs of [whitespace][] then looking for a substring match does the same thing, and is much faster - you can normalize a string in O(n) time, and the time-complexity of substring matching algorithms is usually O(n-m+1).

The algorithms in specs like the URL and HTML Standards are very detailed, but they only use very fast universally available operations like incrementing an index or comparing characters (which compile or JIT to single CPU instructions). If the algorithm was written in this style it would look something like:

A label is contained in name if:
1. Let normalizedLabel be the result of running voice-control string normalizer on label
2. Let normalizedName be the result of running voice-control string normalizer on name
3. If normalizedLabel matches a substring in normalizedName return true
4. Otherwise return false

The voice-control string normalizer takes a string input, and returns a normalized string:
1. Let output be the empty string
2. Let prev be the null code point
3. For each code point c in input:
 3.1 If c is non-text or c is whitespace let c = U+0020 Space // replace non-text with space characters
 3.2 If c == U+0020 Space
  3.2.1 If output is empty continue; // discard leading whitespace
  3.2.2 Otherwise If prev == U+0020 Space continue; // merge runs of space
 3.3 Let prev = c
 3.4 Append c to output
4. If prev == U+0020 Space remove the last code point from output // trim trailing whitespace
5. Return output
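
For illustration, the pseudocode above could be transcribed into JavaScript roughly as follows. This is a sketch, not the implementation referred to in the review: the function names are hypothetical, and the "non-text" test is an assumption that treats anything outside the Unicode Letter, Mark, and Decimal Digit classes (the classes named in the quoted rule text) as non-text. Whitespace falls outside those classes, so it is covered by the same test.

```javascript
// Assumption: "text" means Unicode Letter, Mark, or Decimal Digit; everything else
// (punctuation, emoji, whitespace) is replaced by a space.
const isText = (codePoint) => /^[\p{L}\p{M}\p{Nd}]$/u.test(codePoint);

function normalizeForVoiceControl(input) {
  let output = "";
  let prev = null;
  for (let c of input) { // for...of iterates by code point, not UTF-16 unit
    if (!isText(c)) c = " "; // replace non-text and whitespace with a space
    if (c === " ") {
      if (output === "") continue; // discard leading whitespace
      if (prev === " ") continue; // merge runs of spaces
    }
    prev = c;
    output += c;
  }
  if (prev === " ") output = output.slice(0, -1); // trim trailing whitespace
  return output;
}

function labelIsContainedInName(label, name) {
  return normalizeForVoiceControl(name).includes(normalizeForVoiceControl(label));
}

console.log(normalizeForVoiceControl(">>> ** Submit ** <<<")); // Submit
console.log(labelIsContainedInName(">>> ** Submit ** <<<", "💡 Submit 💡")); // true
```

The loop touches each code point once, which is where the O(n) normalization cost comes from. Like the pseudocode, this sketch does no case conversion or Unicode normalization.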

The algorithm should take Unicode normalization into account somewhere - the code point U+00F1 ñ [LATIN SMALL LETTER N WITH TILDE] produces an identical glyph on screen to the code points U+006E n [LATIN SMALL LETTER N] followed by U+0303 ◌̃ [COMBINING TILDE]. There are specific algorithms to produce strings suitable for comparing: https://en.wikipedia.org/wiki/Unicode_equivalence. These are typically available in a library somewhere (e.g. String.prototype.normalize() in JS)

There are also tricky corner cases, like ß and ss being equivalent and voiced exactly the same in German (this might be handled by unicode normalization)
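
The ñ case can be reproduced with `String.prototype.normalize()`. A small sketch (the helper name is hypothetical):

```javascript
// Compare two strings after applying the same Unicode normalization form.
function sameAfterNormalization(a, b, form = "NFC") {
  return a.normalize(form) === b.normalize(form);
}

const precomposed = "\u00F1"; // ñ as one code point
const decomposed = "n\u0303"; // n followed by COMBINING TILDE

console.log(precomposed === decomposed); // false
console.log(sameAfterNormalization(precomposed, decomposed)); // true
console.log(precomposed.normalize("NFD") === decomposed); // true
```

Either form works for comparison as long as both strings go through the same one.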

Collaborator Author

@dan-tripp-siteimprove dan-tripp-siteimprove Apr 15, 2024

Good call out. I'm going to keep defending regular expressions for the time being, because I can't see how to handle this case otherwise: <a href="#" aria-label="Discover Italy">Discover It</a>. [Edit: I can see now, by adapting your string normalization pseudocode to output a list of strings instead of a string. But the necessity of doing that is still unclear to me.]

Obviously we want to avoid a situation like "over sixty seconds to match a 29-character string". It seems that the article's concern only applies to regular expressions which involve backtracking, and my algorithm's regex ("a greedy whitespace regular expression") isn't one of those. In Javascript, my algorithm's regex would be this: const regex = /\p{White_Space}+/gu;, which (I think) can't result in any backtracking, because there is nothing before the "atom" \p{White_Space} to backtrack to.

As for the concern about O(m * n^2) complexity, I'm unclear on what exactly m and n are, but I'm open to doing some tests. As always with Big O notation, even if the scaling (for large m and n) is bad, the unspoken constant "C" might be small enough that it's not a real problem for our use cases.

Regarding Unicode normalization: no argument there. I'll try to add something to that end. Not sure when the best time to do that would be.
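
To make the "Discover It" case concrete, here is a minimal sketch of token-based matching with the whitespace regex mentioned above. The function names are hypothetical, and the lower-casing step is borrowed from the tokenization sub-algorithm in the rule text; the rest of that sub-algorithm is left out.

```javascript
const WHITESPACE_RUN = /\p{White_Space}+/u;

// Split on runs of whitespace and drop the empty strings that
// leading or trailing whitespace would otherwise produce.
function tokenize(text) {
  return text.toLowerCase().split(WHITESPACE_RUN).filter((token) => token !== "");
}

// True if labelTokens occurs as a contiguous run inside nameTokens.
// With an empty label the first iteration matches trivially.
function tokensContained(labelTokens, nameTokens) {
  for (let start = 0; start + labelTokens.length <= nameTokens.length; start++) {
    if (labelTokens.every((token, i) => token === nameTokens[start + i])) return true;
  }
  return false;
}

const label = "Discover It";
const name = "Discover Italy";

console.log(name.toLowerCase().includes(label.toLowerCase())); // true (substring match)
console.log(tokensContained(tokenize(label), tokenize(name))); // false (token match)
```

The substring check accepts "It" as a prefix of "Italy", while the token comparison requires whole words to match.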

Collaborator

@dd8 dd8 Apr 29, 2024

Thinking about this a bit more, a more serious issue is that the whitespace regex and my suggested algorithm only work for a restricted set of languages. Thai, Chinese and Japanese don't use whitespace to separate words.

MDN also recommends against using Regex for word breaking:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Word_boundary_assertion#examples

One way of resolving this - instead of:

Split the string into a list of strings, using a greedy [whitespace][] regular expression as the separator.

say something like

Split the string into a list of strings, using the word breaking rules for the inherited programmatic language.

This won't force a change on implementations already using regexes for European languages, but avoids incompatibility with languages like Thai, Chinese and Japanese.

Of note: CSS doesn't specify exactly where word boundaries occur in different languages but does link to additional specs that do: https://drafts.csswg.org/css-text/#soft-wrap-opportunity
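
In JavaScript, one way to get language-aware word breaking is `Intl.Segmenter`. The sketch below is an assumption about how an implementation might do it, not something the rule prescribes; the function name is hypothetical, and the locale would come from the element's inherited programmatic language.

```javascript
// Split text into words using the word-breaking rules for the given locale.
// isWordLike filters out segments that are only whitespace or punctuation.
function splitIntoWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text))
    .filter((part) => part.isWordLike)
    .map((part) => part.segment);
}

console.log(splitIntoWords("Download specification", "en")); // ["Download", "specification"]

// A whitespace split sees a Japanese sentence as a single token...
console.log("吾輩は猫である".split(/\p{White_Space}+/u).length); // 1

// ...whereas the segmenter can break it up (exact boundaries depend on the runtime's ICU data).
console.log(splitIntoWords("吾輩は猫である", "ja"));
```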

Collaborator Author

Sorry I haven't had time to work on this. I think I will later this month.

- For each character that either a) represents non-text content, or b) isn't a letter or a digit: replace that character with a space character.
- For a) Judgment of "non-text" probably can't be fully automated. For example: "X" for "close" probably can be automated, but presumably there are more cases than this.
- For b) Use the Unicode classes Letter, Mark, and "Number, Decimal Digit [Nd]". (This will exclude hyphens, punctuation, emoji, and more.)
- Remove all characters that are within parentheses (AKA round brackets).
Collaborator

Parentheses mean something very different in German, so maybe shouldn't be removed in German text:
https://translationpost.com/2019/08/06/difference-german-english-parenthesis-usage/

And in Arabic parentheses are often used as quote marks:
https://forum.wordreference.com/threads/parentheses-in-arabic.1772289/

In accounting, numbers in round brackets are negative - typically losses or debts - so these are really important.
e.g. Profit: (2,400) is a 2,400 loss, but would match aria-label='Profit'
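
The accounting case can be reproduced with a rough sketch of the tokenization steps quoted above. The function name is hypothetical, and the order of the steps (parenthesised text removed before other characters are replaced) is an assumption; the judgment of "non-text content" is not modelled at all.

```javascript
function tokenizeForLabelInName(text) {
  return text
    .toLowerCase()
    .replace(/\([^)]*\)/g, " ") // remove everything within round brackets
    .replace(/[^\p{L}\p{M}\p{Nd}]/gu, " ") // keep only letters, marks and decimal digits
    .split(/\p{White_Space}+/u)
    .filter((token) => token !== "");
}

console.log(tokenizeForLabelInName("Profit: (2,400)")); // ["profit"]
console.log(tokenizeForLabelInName("Profit")); // ["profit"]
```

Both strings reduce to the same token list, so the visible label "Profit: (2,400)" would be treated as contained in the accessible name "Profit" even though the bracketed amount is lost.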

Collaborator Author

That was educational. I admit that this PR will not handle those cases well. But at least it will handle them as "false negatives", which are inevitable in ACT rules. I think that this concern - and all of your concerns here - are valuable, but I don't think that this PR can reasonably be expected to handle them all. This PR "closes" 5 issues, explicitly "does not handle" 3 more, and (I think) doesn't do any harm. I'd like to improve the rule a little, then take it from there. This PR has been open for 10 months. "Perfect is the enemy of good" and all that.

Sub-algorithm to tokenize a string:

- Convert the string to lower case.
- For each character that either a) represents non-text content, or b) isn't a letter or a digit: replace that character with a space character.
Collaborator

@dd8 dd8 May 7, 2024

I think this can cause false positives with words that are not consistently hyphenated. For example:

'antimatter' vs 'anti-matter' (the Wikipedia antimatter article uses both)
'derisk' vs 'de-risk' (Cambridge dictionary uses first spelling, Collins dictionary uses the second)
'nonnegative' vs 'non-negative' https://math.stackexchange.com/a/3344027
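
A short sketch of why the hyphenated and unhyphenated spellings stop matching once hyphens are replaced by spaces. The function name is hypothetical, and only the two quoted steps plus a whitespace split are modelled.

```javascript
// Lower-case, replace every character that is not a letter, mark or decimal digit
// with a space, then split on whitespace.
function toWordTokens(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{M}\p{Nd}]/gu, " ")
    .split(/\p{White_Space}+/u)
    .filter((token) => token !== "");
}

console.log(toWordTokens("anti-matter")); // ["anti", "matter"]
console.log(toWordTokens("antimatter")); // ["antimatter"]
```

Neither token list contains the other, so a visible label of "anti-matter" and an accessible name of "antimatter" would be reported as a mismatch.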

Collaborator Author

You're right, and I assume the voice control software would handle it fine, but if it was a customer who was arguing this, I would say "too bad" (diplomatically, of course). This is part of a larger issue that I call "you need to play ball with the automated check". I run into customer arguments like this often enough that I have a "canned response" for it. It reads, in part: "It's not feasible for an automated checker to know any better in this situation. When you use an automated checker instead of a manual audit, in order to gain efficiency, you lose accuracy. This is an unavoidable example of that."

For the page author to "play ball" for "label in name" would be easy: use the same spelling, including hyphenation, in the aria-label that they used in the visible label. This is, I think, not a lot to ask of a page author.

I could add an assumption to the rule. Something along the lines of "This rule assumes that for any word (including any hyphenated word) that appears in both the accessible name and the visible label, the same spelling and hyphenation is used in both places. For example, using 'antimatter' in the accessible name and 'anti-matter' in the visible label would fail this rule, but arguably pass the Success Criterion."

Collaborator

@dd8 dd8 Jun 13, 2024

Bearing in mind this is for voice control, would a voice control user pronounce the hyphenated and non-hyphenated versions differently? Personally, I would pronounce both versions identically:

'antimatter' vs 'anti-matter'
'derisk' vs 'de-risk'
'nonnegative' vs 'non-negative'

and sometimes it's context specific:

'1-2' might be pronounced as 'one two' in a sports context, but as 'one minus two' in a maths context.

…t-rules#2075 (comment)

- adding Unicode case folding and normalization form KD.  because of act-rules#2075 (comment)
- making the word-breaking step of the algorithm generic to more languages than english.  because of act-rules#2075 (comment)
@dan-tripp-siteimprove
Collaborator Author

@dd8 I just pushed commit 9723ed1 which addresses some of your concerns.

@dan-tripp-siteimprove
Collaborator Author

@dd8 I learned that the conversion of the German "ß" to "ss" is not handled by normalization and is handled by "case folding".
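
A quick way to see the difference in JavaScript. JavaScript strings have no dedicated case-folding method, so the sketch below approximates folding by upper-casing and then lower-casing; that is an assumption which works for "ß" but is not guaranteed to equal full Unicode case folding for every character. The function name is hypothetical.

```javascript
// Approximate comparison key: compatibility decomposition (NFKD) followed by
// an upper-then-lower round trip as a stand-in for case folding.
function comparisonKey(text) {
  return text.normalize("NFKD").toUpperCase().toLowerCase();
}

console.log("ß".normalize("NFKD") === "ß"); // true: normalization leaves ß alone
console.log("ß".toUpperCase()); // SS
console.log(comparisonKey("Straße") === comparisonKey("STRASSE")); // true
```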

@dd8
Collaborator

dd8 commented Sep 24, 2024

@dd8 I just pushed commit 9723ed1 which addresses some of your concerns.

Ok, I think that addresses the points raised.

@dd8
Collaborator

dd8 commented Sep 24, 2024

We've just come across a new issue: you can add visual space between elements using CSS, and that space does not correspond to whitespace nodes. The algorithm considers some CSS display: values, but there are lots of ways to add visual space between elements (padding:, margin:, position:, width:, gap:, etc.).

It's not clear if there's a simple way to resolve this.

This example breaks if the algorithm does add a space between the spans in the visible label calculation because it visually renders as:

Once upon a time

<style>
.dropcap {
    font-size: 150%;
    font-family: Helvetica, sans-serif;
}
</style>
<p aria-label="Once upon a time"><span class="dropcap">O</span><span>nce upon a time...</span></p>

From here: https://www.w3.org/WAI/GL/tests/drop-caps.html

This example breaks if the algorithm does not add a space between the spans in the visible label calculation because it visually renders as:

4 Service disruption

<style>
    button { display: flex; gap: 10px; }
</style>
<button aria-label="4 Service disruption">
    <span aria-hidden="true" class="c-combined-messages__count--value">4</span><span aria-hidden="true" class="c-combined-messages__button--text">Service disruption</span>
</button>

This is a simplified version of a real world example from a customer.

- adding "whitespace via CSS" clause to address review comment act-rules#2075 (comment)
- rewording other parts.
@dan-tripp-siteimprove
Collaborator Author

@dd8 As a band-aid solution, at least, I just pushed a commit (4da5300) which adds an assumption which covers "whitespace added via CSS". In the long term, I don't know what the best solution is. It's a false positive, which is quite bad. For what it's worth, I believe that this PR doesn't create this false positive. This false positive already existed, because the rule (pre-PR) is based on DOM text nodes (where it says "all text nodes in the visible text content") and so ignores "whitespace added via CSS". So I think that this false positive isn't a reason that this PR shouldn't be merged.

@dan-tripp-siteimprove
Collaborator Author

When and if we get boundary/verbose examples, the "whitespace added via CSS" example would be a good one.

@dd8
Collaborator

dd8 commented Sep 26, 2024

@dd8 This false positive already existed, because the rule (pre-PR) is based on DOM text nodes (where it says "all text nodes in the visible text content") and so ignores "whitespace added via CSS". So I think that this false positive isn't a reason that this PR shouldn't be merged.

Yes, I agree the false positive already exists, so the PR doesn't make this any worse, and makes it slightly better due to display: handling.

One algorithm tweak which might resolve this (but it requires layout) is:

  • if prevContentBox.right == nextContentBox.left and prevContentBox.bottom == nextContentBox.bottom then don't add a space (i.e. no gap between elements on same line)
  • otherwise add a space
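
The tweak above could be sketched as a pure function over box coordinates. The box shape (`left`, `right`, `bottom` in CSS pixels), the function name, and the optional `tolerance` parameter are all assumptions added for illustration; the proposal itself uses exact equality, which is the default here. How the content boxes are obtained (own layout engine or browser APIs) is outside the sketch.

```javascript
// Returns true if a space should be added between two sibling boxes.
// No space is added only when the boxes touch horizontally and share a bottom edge.
function needsSpaceBetween(prevContentBox, nextContentBox, tolerance = 0) {
  const touching = Math.abs(prevContentBox.right - nextContentBox.left) <= tolerance;
  const sameLine = Math.abs(prevContentBox.bottom - nextContentBox.bottom) <= tolerance;
  return !(touching && sameLine);
}

// Hypothetical coordinates: two boxes that touch on the same line.
console.log(needsSpaceBetween({ left: 0, right: 12, bottom: 20 }, { left: 12, right: 120, bottom: 20 })); // false

// Hypothetical coordinates: a 4px horizontal gap between the boxes.
console.log(needsSpaceBetween({ left: 0, right: 10, bottom: 20 }, { left: 14, right: 150, bottom: 20 })); // true
```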

@Jym77
Collaborator

Jym77 commented Sep 27, 2024

I haven't followed the full discussion here 😅 but I recently stumbled upon White Space Processing Rules in CSS. Not sure if that makes things easier or harder for us, though…

@dd8
Collaborator

dd8 commented Sep 27, 2024

I haven't followed the full discussion here 😅 but I recently stumbled upon White Space Processing Rules in CSS. Not sure if that makes things easier or harder for us, though…

Yes. Implementations that do their own layout do need to take this into consideration (but implementations doing target size rules already need to do this). JS implementations that get layout from Element: getClientRects() probably don't need to consider this because the browser has done all the calcs already.

I think you can avoid a lot of complexity in the visible layout algorithm by referring to the content box(s) for the element. For the visible label calc you just need to test if the content box edges have a gap between sibling elements to determine if the visible label needs a space added between the elements.

<!-- visual label: "Once upon a time" (no gap between edges of span content boxes - no space added) -->
<p><span style="font-weight:bold">O</span><span>nce upon a time</span>

<!-- visual label: "4 Services delayed" (4px gap between edges of span content boxes, space added) -->
<p><span style="font-weight:bold; padding-right:4px;">4</span><span>Services delayed</span>

@dan-tripp-siteimprove
Collaborator Author

I hereby propose that the handling of whitespace-via-CSS should be done in a separate PR. I sincerely like the idea and I think that it should be done. I'm glad that you brought it up, because I never would have thought of it myself. But this PR has been open for 15 months. @dd8 - how's this for a plan: we get this PR merged and then you and I immediately make a PR for whitespace-via-CSS?

@dd8
Collaborator

dd8 commented Oct 14, 2024

After looking at this again I think there's a fundamental problem with any algorithm proposed for the visible label computation. The visible inner text computation has to work exactly like step F Name From Content in https://www.w3.org/TR/accname-1.2/#comp_name_from_content.

If it doesn't work identically (including the exact places whitespace is added) the rule will produce false positives when the accessible name is calculated from content because the calculated accname and calculated visible label won't match. For example, the visible inner text algorithm adds spaces for certain display: values but accname doesn't:

    <button>
        <div class="ia-control-label"><span class="title">Certification Progress</span><span class="subtitle">1/3 passed</span></div>
    </button>

If the visible label computation needs to be exactly the same then it should just reference step F in accname, and only include a subset of the recursive steps (i.e. include step G. Text Nodes and H. Recursive Name From Content (and probably some others) but exclude steps like D. AriaLabel and B. AriaLabelledBy)

Of note: there's an ongoing discussion about taking display values into account in accname to add extra spaces, but visible inner text would still need to match an updated accname calculation.
w3c/accname#225

@dan-tripp-siteimprove
Collaborator Author

@dd8 I think we might be able to dodge this case. The problem you describe would happen "when the accessible name is calculated from content". But I think that we at ACT don't need to worry about that case, because this rule's applicability covers only cases where "The element has an aria-label or aria-labelledby attribute". So if an element meets the rule's applicability, then the element's accessible name will not be calculated from content, so the problem you describe won't happen - at least, it won't happen to us. Other people (such as AT vendors, I assume) need to worry about that problem, but we at ACT won't, as long as the applicability stays like that. Any element which meets the ACT rule's applicability will, in its accessible name computation, go to Step B (LabelledBy) or Step D (AriaLabel) and will not reach Step F (Name From Content). So the <button> example you wrote doesn't meet the rule's applicability. Am I looking at this correctly?

@Jym77
Collaborator

Jym77 commented Oct 22, 2024

🤔 aria-labelledby will end up calculating name from content of the referred elements, so the case should still apply.

Additionally, it's probably a good idea to look closer at the "name from content" part of the accname computation (why didn't we think about it earlier 😓). They have probably already thought a lot about the weird cases… and if something goes wrong we can say that we do the same as accname computation…
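
To make the point about aria-labelledby concrete, here is a minimal sketch of the branch order. It is not the accname algorithm itself: the `Element` class, the flattened `content` field, and the returned list of step letters are all assumptions made for illustration. It only shows that the LabelledBy branch (step B) can still end up reading the content of the referenced elements (step F).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Element:
    # Hypothetical, heavily simplified element model.
    content: str = ""  # text content, already flattened
    aria_label: Optional[str] = None
    aria_labelledby: List[str] = field(default_factory=list)


def accessible_name(el: Element, by_id: Dict[str, Element]) -> Tuple[str, List[str]]:
    """Return (name, steps taken), using the step letters from the discussion."""
    if el.aria_labelledby:
        steps = ["B"]
        parts = []
        for ref in el.aria_labelledby:
            target = by_id.get(ref)
            if target is None:
                continue  # dangling IDREF contributes nothing
            if target.aria_label and target.aria_label.strip():
                steps.append("D")
                parts.append(target.aria_label)
            else:
                # The referenced element has no label of its own,
                # so its name comes from its content.
                steps.append("F")
                parts.append(target.content)
        return " ".join(parts), steps
    if el.aria_label and el.aria_label.strip():
        return el.aria_label, ["D"]
    return el.content, ["F"]
```

With a button that points at a plain `<span id="lbl">`, the steps come out as B then F, so any whitespace disagreement in name-from-content carries over to elements that meet the rule's applicability.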

@dd8
Collaborator

dd8 commented Oct 22, 2024

> 🤔 aria-labelledby will end up calculating name from content of the referred elements, so the case should still apply.

Yes, and can also get into problems if the aria-label exactly matches the name-from-content. You can get identical name/label pairs where one is a pass/inapplicable using name-from-content and the other fails because it uses aria-label.

> Additionally, it's probably a good idea to look closer at the "name from content" part of the accname computation (why didn't we think about it earlier 😓). They have probably already thought a lot about the weird cases… and if something goes wrong we can say that we do the same as accname computation…

I'm pretty sure the intent of the name-from-content part of the accessible name calculation is to read out the visible label - maybe someone on the ARIA group could confirm that?

The acc name computation also handles additional cases like ::before and ::after content.

I think the key thing is that the visible label calculation is a subset of the acc name calculation (i.e. just the visible parts). If an implementation wants to add extra whitespace to the visible label, it can do that in the common acc name/visible label code path so that the calculations stay consistent.

Most AT already adds additional whitespace to the accessible name in special cases, which is not currently specified in the accname recommendation (hence the discussion in w3c/accname#225)
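
One way to picture the "common code path" idea is the sketch below. Everything in it is hypothetical: the `Node` model, the `visually_hidden` flag (content exposed to AT but not rendered on screen), and the separator policy in `needs_separator` are assumptions, not anything taken from accname or from this PR. The point is only that both results come from one traversal, so a change to the whitespace policy affects them together.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str = ""
    display: str = "inline"
    visually_hidden: bool = False  # still in the accessibility tree
    children: List["Node"] = field(default_factory=list)


def needs_separator(node: Node) -> bool:
    # Hypothetical policy; swap it out here and both results change together.
    return node.display != "inline"


def flatten(node: Node, visible_only: bool) -> str:
    if visible_only and node.visually_hidden:
        return ""
    out = node.text
    for child in node.children:
        piece = flatten(child, visible_only)
        if needs_separator(child):
            piece = f" {piece} "
        out += piece
    return out


def name_from_content(node: Node) -> str:
    # Collapse runs of whitespace and trim.
    return " ".join(flatten(node, visible_only=False).split())


def visible_label(node: Node) -> str:
    return " ".join(flatten(node, visible_only=True).split())
```

Because the separator decision lives in one function, the visible label cannot gain or lose a space that the name from content handles differently.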

@dan-tripp-siteimprove
Collaborator Author

Ok I see - at least, partly. There are parts of this that are over my head, but I can see now that this identifies a real category of false positives which wasn't discussed until now.

@dan-tripp-siteimprove
Collaborator Author

Even so - again it seems to me that these whitespace cases shouldn't block this PR, because this PR didn't create them and doesn't make them worse. So using the strategy of incremental improvement (rather than perfection in one fell swoop): @dd8 what do you think of a separate PR for handling these whitespace cases?

@dd8
Collaborator

dd8 commented Oct 25, 2024

It looks like w3c/accname#205 is a blocker here. The accname 1.2 spec, current browser implementations, the current visible label rule, and this PR all disagree on where to add whitespace.

It's difficult to be sure whether this PR makes false positives better or worse without a lot more examples.

Here are some test cases:

        <!-- Example 1 - Passes current rule and PR 2075
         accname-1.2: One
         Chrome 129 accname: One
         FF 131 accname: One
         Safari 17.6 accname: One
         visible text content (current rule): One
         visible text content (PR 2075): One
         -->
        <button>One</button>

        <!-- Example 2 - Fails current rule and PR 2075 because accname computed per spec has a space that visible text content doesn't
         accname-1.2: One Two
         Chrome 129 accname: OneTwo
         FF 131 accname: OneTwo
         Safari 17.6 accname: One Two (Note: Safari 17 accname has a space, Chrome/FF doesn't add a space)
         visible text content (current rule): OneTwo
         visible text content (PR 2075): OneTwo

         see https://github.com/w3c/accname/issues/205
        -->
        <button><span>One</span><span>Two</span></button>

        <!-- Example 3 - Passes current rule and PR 2075
         accname-1.2: One Two
         Chrome 129 accname: One Two
         FF 131 accname: One Two
         Safari 17.6 accname: One Two
         visible text content (current rule): One Two
         visible text content (PR 2075): One Two
        -->
        <button><span>One</span> <span>Two</span></button>

        <!-- Example 4 - Passes current rule and PR 2075
         accname-1.2: One Two
         Chrome 129 accname: One\nTwo
         FF 131 accname: One Two
         Safari 17.6 accname: One Two
         visible text content (current rule): One Two
         visible text content (PR 2075): One Two
        -->
        <button><span>One</span><br><span>Two</span></button>

        <!-- Example 5 - Fails current rule and passes with PR 2075
         accname-1.2: One Two
         Chrome 129 accname: One Two
         FF 131 accname: One Two
         Safari 17.6 accname: One Two
         visible text content (current rule): OneTwo
         visible text content (PR 2075): One\nTwo\n
        -->
        <button><div>One</div><div>Two</div></button>
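
The pass/fail outcomes in these comments can be reproduced with a simplified containment check. This is a stand-in, not the rule's actual comparison and not the "label in name algorithm" from this PR: it assumes that both strings are lower-cased, that whitespace runs are collapsed, and that the test is a plain substring match. The input strings are the ones recorded in the examples above, with `\n` read as a newline.

```python
def normalize(text: str) -> str:
    # Lower-case, collapse every whitespace run to one space, trim the ends.
    return " ".join(text.lower().split())


def label_is_part_of_name(visible_label: str, accessible_name: str) -> bool:
    # Simplified check: the normalized label must be a substring of the name.
    return normalize(visible_label) in normalize(accessible_name)


# Example 2: spec accname "One Two" vs visible text "OneTwo"
print(label_is_part_of_name("OneTwo", "One Two"))
# Example 5 with PR 2075: visible inner text "One\nTwo\n"
print(label_is_part_of_name("One\nTwo\n", "One Two"))
```

Under these assumptions a single missing space is enough to turn a match into a mismatch, which is why the position of inserted whitespace matters more than its exact character.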

@Jym77
Collaborator

Jym77 commented Oct 25, 2024

@dd8 From these five examples, it seems that:

  • this PR improves some without making anything worse, so it probably goes in the right direction anyway;
  • the weird case that this PR doesn't handle is also something that accname is fighting with, so we probably won't solve it on our own.

Based on that, I feel that we could indeed move on with this PR; it is a step in the right direction. Worst case is that when we come up with the perfect solution we'll throw away everything from here, but in the meantime we are still in a slightly better place.

I agree we should probably take up the discussion with the rest of the CG anyway.

@dd8
Collaborator

dd8 commented Oct 25, 2024

I think the whitespace handling in the different algorithms can be summarised like so:

  • accname-1.2 always adds a space between nodes
  • browser accname implementations usually add a space between nodes (space added for all display values except display:inline in Chrome/FF, space always added in Safari)
  • visible inner text used by this PR sometimes adds a space between nodes (space added for display:block, display:table-caption, display:table-cell, display:table-row)
  • visible text content used by the previous version of this rule never adds a space between nodes
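
The four behaviours in this summary can be sketched as separator policies over a flat list of sibling nodes. This is a toy model, not any of the real algorithms: the `Node` class, the dictionary keys, and the assumption that a separator appears when either neighbour asks for one are all invented for illustration. The display values are the ones listed above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Node:
    text: str
    display: str = "inline"  # computed CSS display value


# display values for which visible inner text adds a separator (per the summary)
INNER_TEXT_BREAKS = {"block", "table-caption", "table-cell", "table-row"}

# For each algorithm: does this node force a separator next to it?
SEPARATOR_RULES: Dict[str, Callable[[Node], bool]] = {
    "accname-1.2": lambda n: True,
    "Chrome/FF accname": lambda n: n.display != "inline",
    "Safari accname": lambda n: True,
    "visible inner text (PR 2075)": lambda n: n.display in INNER_TEXT_BREAKS,
    "visible text content (current rule)": lambda n: False,
}


def join_nodes(nodes: List[Node], forces_separator: Callable[[Node], bool]) -> str:
    out = ""
    for i, node in enumerate(nodes):
        # Assumption: a separator appears if either neighbour asks for one.
        if i > 0 and (forces_separator(nodes[i - 1]) or forces_separator(node)):
            out += " "
        out += node.text
    return " ".join(out.split())


def compare(nodes: List[Node]) -> Dict[str, str]:
    return {name: join_nodes(nodes, rule) for name, rule in SEPARATOR_RULES.items()}
```

Calling `compare([Node("One"), Node("Two")])` models Example 2 (two spans), and `compare([Node("One", "block"), Node("Two", "block")])` models Example 5 (two divs). The model treats the separator as a space, so the newlines reported for visible inner text in Example 5 show up here as a single space.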

@dan-tripp-siteimprove Is my summary of visible inner text and visible text content correct?

@dan-tripp-siteimprove
Collaborator Author

> @dan-tripp-siteimprove Is my summary of visible inner text and visible text content correct?

Yes, I think so.

Labels
Agenda item reviewers wanted Rule Update Use this label for an existing rule that is being updated
Projects
None yet
4 participants