Complete i18n self-review #148

Closed
LJWatson opened this issue Apr 22, 2020 · 3 comments

@LJWatson

We need to complete an I18n self-review before we can request an official review by the I18n WG.

There is a short checklist that will help identify parts of the spec that need i18n attention. There is then a more detailed checklist to be completed.

Related issue w3c/webappswg#27

@inexorabletash

inexorabletash commented Apr 29, 2020

Language

Language basics

  1. It should be possible to associate a language with any piece of natural language text that will be read by a user. more

Not directly supported. The text fields (text and title) are not given metadata such as language.

Sharing rich text (such as HTML markup) as the content of text or one of the files is possible, if the sharing web site and share target support it, and would allow language association.
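As a rough illustration of the rich-text route, the sketch below wraps a plain string in a minimal HTML document that carries `lang` and `dir` attributes. The helper names `escapeHtml` and `buildHtmlShare` are hypothetical, and whether a given share target accepts or honours an HTML file is an assumption, not something the specification guarantees.

```typescript
// Hypothetical helper: escape characters that are significant in HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap plain text in a minimal HTML document so that a
// BCP 47 language tag and a base direction travel with the content.
function buildHtmlShare(
  text: string,
  lang: string,
  dir: "ltr" | "rtl" | "auto"
): string {
  return (
    "<!DOCTYPE html>" +
    `<html lang="${escapeHtml(lang)}" dir="${dir}">` +
    '<head><meta charset="utf-8"><title>Shared text</title></head>' +
    `<body><p>${escapeHtml(text)}</p></body>` +
    "</html>"
  );
}
```

In a browser, the resulting string could be wrapped as `new File([html], "share.html", { type: "text/html" })` and passed in the `files` member of `navigator.share()`, after checking `navigator.canShare()` for that file.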

  1. Where possible, there should be a way to label natural language changes in inline text. more

Not supported, except by sharing rich text (as above).

  1. Consider whether it is useful to express the intended linguistic audience of a resource, in addition to specifying the language used for text processing. more

Not supported.

  1. A language declaration that indicates the text processing language for a range of text must associate a single language value with a specific range of text. more

N/A

  1. Use the HTML lang and XML xml:lang language attributes where appropriate to identify the text processing language, rather than creating a new attribute or mechanism. more

N/A

  1. It should be possible to associate a metadata-type language declaration (which indicates the intended use of the resource rather than the language of a specific range of text) with multiple language values. more

Not supported.

  1. Attributes that express the language of external resources should not use the HTML lang and XML xml:lang language attributes, but should use a different attribute when they represent metadata (which indicates the intended use of the resource rather than the language of a specific range of text). more

N/A

Defining language values

  1. Values for language declarations must use BCP 47. more

N/A

  1. Refer to BCP 47, not to RFC 5646. more

N/A

  1. Be specific about what level of conformance you expect for language tags: BCP 47 defines two levels of conformance, "valid" and "well-formed".

N/A

  1. Specifications may require implementations to check if language tags are "valid", but in most circumstances should only require that the language tags be "well-formed".

N/A

  1. Specifications should require content and content authors to use "valid" language tags.

N/A

  1. Reference BCP47 for language tag matching.

N/A

Declaring language at the resource level

  1. The specification should indicate how to define the default text-processing language for the resource as a whole. more

Not supported.

  1. Content within the resource should inherit the language of the text-processing declared at the resource level, unless it is specifically overridden.

N/A - the content is not structured.

  1. Consider whether it is necessary to have separate declarations to indicate the text-processing language versus metadata about the expected use of the resource. more

N/A

  1. If there is only one language declaration for a resource, and it has more than one language tag as a value, it must be possible to identify the default text-processing language for the resource. more

N/A

Establishing the language of a content block

  1. By default, blocks of content should inherit any text-processing language set for the resource as a whole. more

N/A - the content is not structured.

  1. It should be possible to indicate a change in language for blocks of content where the language changes. more

N/A - the content is not structured.

Establishing the language of inline runs

  1. It should be possible to indicate language for spans of inline text where the language changes. more

N/A - the content is not structured.

Text direction

Basic requirements

  1. It must be possible to indicate base direction for each individual paragraph-level item of natural language text that will be read by someone. more

Text content is required to be Unicode (USVString inputs); this allows for the use of directional marks.

  1. It must be possible to indicate base direction changes for embedded runs of inline bidirectional text for all natural language text that will be read by someone. more

Text content is required to be Unicode (USVString inputs); this allows for the use of directional marks.

  1. Annotating right-to-left text must require the minimum amount of effort for people who work natively with right-to-left scripts. more

N/A - the specification does not deal with authoring the text.

Background information

  1. Do not assume that direction can be determined from language information. more

This is not assumed.

Base direction values

  1. Values for the default base direction should include left-to-right, right-to-left, and auto. more

The behavior should probably be specified; for example, sharing from an RTL web site to a share target on an LTR device. Is the user agent expected to introduce directional marks or not?

Handling direction in markup

  1. The spec should indicate how to define a default base direction for the resource as a whole, ie. set the overall base direction. more

N/A - markup is not defined by the specification.

  1. The default base direction, in the absence of other information, should be LTR. more

The behavior should probably be specified.

  1. The content author must be able to indicate parts of the text where the base direction changes. At the block level, this should be achieved using attributes or metadata, and should not rely on Unicode control characters.

N/A - the specification does not define content authoring.

  1. It must be possible to also set the direction for content fragments to auto. This means that the base direction will be determined by examining the content itself.

N/A - the specification does not define content authoring.

  1. If the overall base direction is set to auto for plain text, the direction of content paragraphs should be determined on a paragraph by paragraph basis.

N/A - the specification does not define content authoring.

  1. To indicate the sides of a block of text relative to the start and end of its contained lines, you should use 'before' and 'after' (maybe block-start/block-end – the terminology is changing), rather than 'top' and 'bottom'.

N/A - the specification does not define content authoring.

  1. To indicate the start/end of a line you should use 'start' and 'end' rather than 'left' and 'right'.

N/A - the specification does not define content authoring.

  1. Provide dedicated attributes for control of base direction and bidirectional overrides; do not rely on the user applying style properties to arbitrary markup to achieve bidi control.

N/A - the specification does not define content authoring.

Handling base direction for strings

  1. Provide metadata constructs that can be used to indicate the base direction of any natural language string. more

This is not supported.

  1. Specify that consumers of strings should use heuristics, preferably based on the Unicode Standard first-strong algorithm, to detect the base direction of a string except where metadata is provided. more

The behavior should probably be specified.
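If the behavior were specified along the lines of the first-strong heuristic, a consumer might do something like the sketch below. It is a simplified approximation: the real algorithm uses the Unicode Bidi_Class property, which JavaScript does not expose, so this version treats only letters as strong and uses a hand-picked list of right-to-left code point ranges. The names `RTL_RANGES` and `firstStrongDirection` are hypothetical.

```typescript
type Direction = "ltr" | "rtl";

// Assumption: approximate right-to-left ranges (Hebrew, Arabic and related
// blocks, presentation forms, and the RTL areas of the supplementary plane).
const RTL_RANGES: Array<[number, number]> = [
  [0x0590, 0x08ff],
  [0xfb1d, 0xfdff],
  [0xfe70, 0xfeff],
  [0x10800, 0x10fff],
  [0x1e800, 0x1efff],
];

// Built with the constructor so the Unicode property escape is only
// interpreted at run time.
const LETTER = new RegExp("\\p{L}", "u");

function isRtlCodePoint(cp: number): boolean {
  return RTL_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Returns the direction of the first letter found, or `fallback` when the
// string contains no letters (digits, punctuation, or whitespace only).
function firstStrongDirection(
  text: string,
  fallback: Direction = "ltr"
): Direction {
  for (const ch of Array.from(text)) {
    if (!LETTER.test(ch)) continue; // skip digits, marks, punctuation
    const cp = ch.codePointAt(0)!;
    return isRtlCodePoint(cp) ? "rtl" : "ltr";
  }
  return fallback;
}
```

The `fallback` parameter reflects the open question of what the default should be when no strong character is present; LTR is used here only because the checklist suggests it as the default.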

  1. Where possible, define a field to indicate the default direction for all strings in a given resource or document. more

Not supported.

  1. Do NOT assume that creating a document-level default without the ability to change direction for any string is sufficient. more

N/A - content is not structured.

  1. If metadata is not available due to legacy implementations and cannot otherwise be provided, specifications MAY allow a base direction to be interpolated from available language metadata. more

This may be relevant if there are platform conventions that would limit providing additional metadata.

  1. Specifications MUST NOT require the production or use of paired bidi controls. more

No such requirements are made.

Setting base direction for inline or substring text

  1. It must be possible to indicate spans of inline text where the base direction changes. If markup is available, this is the preferred method. Otherwise your specification must require that Unicode control characters are recognized by the receiving application, and correctly implemented.

Unicode control characters may be used.
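For instance, a sharing site that wants an embedded run to keep its own direction could wrap it in the Unicode isolate controls before passing the string to the API. This is only a sketch: the `isolate` helper is hypothetical, and it assumes the share target implements the Unicode Bidirectional Algorithm with isolate support.

```typescript
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Hypothetical helper: wrap a run of text in an isolate so that its
// direction does not affect, and is not affected by, the surrounding text.
// "auto" uses FSI, which takes the direction from the run's first strong
// character.
function isolate(text: string, dir: "ltr" | "rtl" | "auto" = "auto"): string {
  const opener = dir === "ltr" ? LRI : dir === "rtl" ? RLI : FSI;
  return opener + text + PDI;
}

// Example: a title that embeds a user-supplied name of unknown direction.
function buildTitle(userName: string): string {
  return "Shared by " + isolate(userName);
}
```

The isolates (rather than the older embedding controls RLE/LRE with PDF) are used here because they are the ones the checklist recommends.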

  1. It must be possible to also set the direction for a span to auto. This means that the base direction will be determined by examining the content itself. A typical approach here would be to set the direction based on the first strong directional character outside of any markup. more

N/A - content authoring is outside the scope of the specification.

  1. If users use Unicode bidirectional control characters, the isolating RLI/LRI/FSI with PDI characters must be supported by the application and recommended (rather than RLE/LRE with PDF) by the spec.

N/A - content authoring is outside the scope of the specification.

  1. Use of RLM/LRM should be appropriate, and expectations of what those controls can and cannot do should be clear in the spec. more

N/A - content authoring is outside the scope of the specification.

  1. For markup, provide dedicated attributes for control of base direction and bidirectional overrides; do not rely on the user applying style properties to arbitrary markup to achieve bidi control.

N/A - content authoring is outside the scope of the specification.

  1. For markup, allow bidi attributes on all inline elements in markup that contain text.

N/A - content authoring is outside the scope of the specification.

  1. For markup, provide attributes that allow the user to (a) create an embedded base direction or (b) override the bidirectional algorithm altogether; the attribute should allow the user to set the direction to LTR or RTL or the aforementioned Auto in either of these two scenarios.

N/A - content authoring is outside the scope of the specification.

Characters

Choosing a definition of 'character'

  1. Specifications SHOULD use specific terms, when available, instead of the general term 'character'. more

N/A - "character" is not used; the specification deals with complete strings.

  1. When specifications use the term 'character' the specifications MUST define which meaning they intend, and SHOULD explicitly define the term 'character' to mean a Unicode code point. more

N/A - "character" is not used; the specification deals with complete strings.

  1. Specifications, software and content MUST NOT require or depend on a one-to-one relationship between characters and units of physical storage. more

The specification makes no such requirement.

  1. Specifications, software and content MUST NOT require or depend on a one-to-one correspondence between characters and the sounds of a language. more

The specification makes no such requirement.

  1. Specifications, software and content MUST NOT require or depend on a one-to-one mapping between characters and units of displayed text. more

The specification makes no such requirement.

  1. Specifications and software MUST NOT require nor depend on a single keystroke resulting in a single character, nor that a single character be input with a single keystroke (even with modifiers), nor that keyboards are the same all over the world. more

The specification makes no such requirement.

Defining a Reference Processing Model

  1. Textual data objects defined by protocol or format specifications MUST be in a single character encoding. more

The specification requires that strings be provided as USVString (16-bit code units providing a valid sequence of Unicode scalar values when interpreted as UTF-16); this allows lossless transcoding to UTF-8 as necessary if required by the share target and/or platform.
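To make the "lossless transcoding" point concrete, the sketch below encodes a string of Unicode scalar values as UTF-8, spelled out by hand to show the mapping. The function name `utf8Encode` is hypothetical and the input is assumed to contain no lone surrogates (which is what USVString provides); real code would more likely use `TextEncoder`.

```typescript
// Encode a well-formed string as UTF-8 bytes. Each scalar value maps to a
// unique 1 to 4 byte sequence, so the transformation is reversible.
function utf8Encode(text: string): number[] {
  const bytes: number[] = [];
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(
        0xe0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return bytes;
}
```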

  1. All specifications that involve processing of text MUST specify the processing of text according to the Reference Processing Model described by the rest of the recommendations in this list. more

N/A - Text processing is not specified beyond the presence of strings.

  1. Specifications MUST define text in terms of Unicode characters, not bytes or glyphs. more

USVStrings are specified as the accepted string type.

  1. For their textual data objects specifications MAY allow use of any character encoding which can be transcoded to a Unicode encoding form. more

USVStrings are specified as the accepted string type.

  1. Specifications MAY choose to disallow or deprecate some character encodings and to make others mandatory. Independent of the actual character encoding, the specified behavior MUST be the same as if the processing happened as follows: (a) The character encoding of any textual data object received by the application implementing the specification MUST be determined and the data object MUST be interpreted as a sequence of Unicode characters - this MUST be equivalent to transcoding the data object to some Unicode encoding form, adjusting any character encoding label if necessary, and receiving it in that Unicode encoding form, (b) All processing MUST take place on this sequence of Unicode characters, (c) If text is output by the application, the sequence of Unicode characters MUST be encoded using a character encoding chosen among those allowed by the specification. more

Input to the API is USVString, which matches the platform convention for script APIs that accept valid Unicode character sequences. Encoding of strings when passed to share targets follows platform conventions; no encoding is specified.

  1. If a specification is such that multiple textual data objects are involved (such as an XML document referring to external parsed entities), it MAY choose to allow these data objects to be in different character encodings. In all cases, the Reference Processing Model MUST be applied to all textual data objects. more

N/A

Including and excluding character ranges

  1. Specifications SHOULD NOT arbitrarily exclude code points from the full range of Unicode code points from U+0000 to U+10FFFF inclusive. more

Supported by USVString type. No exclusions are made.

  1. Specifications MUST NOT allow code points above U+10FFFF. more

Implicitly enforced by USVString type; UTF-16 surrogate pairs cannot encode beyond U+10FFFF.
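The arithmetic behind that limit can be checked directly. The sketch below uses hypothetical helper names and applies the standard UTF-16 decoding formula to the largest possible surrogate pair.

```typescript
// Combine a UTF-16 high/low surrogate pair into the code point it encodes.
function decodeSurrogatePair(high: number, low: number): number {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

// The largest possible pair, U+DBFF U+DFFF, decodes to exactly U+10FFFF.
const maxCodePoint: number = decodeSurrogatePair(0xdbff, 0xdfff);

// Hypothetical helper: true when a value lies inside the Unicode code space.
function isInCodeSpace(cp: number): boolean {
  return Number.isInteger(cp) && cp >= 0 && cp <= 0x10ffff;
}
```

JavaScript enforces the same bound at run time: `String.fromCodePoint(0x110000)` throws a `RangeError`.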

  1. Specifications SHOULD NOT allow the use of codepoints reserved by Unicode for internal use. more

Following web platform conventions, this restriction is not enforced. Any code point is allowed within strings.

  1. Specifications MUST NOT allow the use of surrogate code points. more

Disallowed by the USVString type; unpaired surrogates are replaced with U+FFFD when a value is converted to a USVString.
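A USVString cannot carry lone surrogate code points: Web IDL conversion replaces each unpaired surrogate with U+FFFD. The sketch below mimics that conversion in script for illustration; the function name `toUSVString` is hypothetical, and browsers perform this step internally when a value is passed to an API that takes a USVString.

```typescript
// Replace every unpaired surrogate code unit with U+FFFD REPLACEMENT
// CHARACTER, leaving well-formed surrogate pairs untouched.
function toUSVString(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: valid only when followed by a low surrogate.
      const next = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += input.substring(i, i + 2);
        i++; // consume the low surrogate as well
      } else {
        out += "\ufffd";
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      out += "\ufffd";
    } else {
      out += input.charAt(i);
    }
  }
  return out;
}
```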

  1. Specifications SHOULD exclude compatibility characters in the syntactic elements (markup, delimiters, identifiers) of the formats they define. more

N/A - no markup/etc is defined.

  1. Specifications SHOULD allow the full range of Unicode for user-defined values. more

Following web platform conventions, any code point is allowed within strings.

Using the Private Use Area

  1. Specifications MUST NOT require the use of private use area characters with particular assignments. more

No such requirement is made.

  1. Specifications MUST NOT require the use of mechanisms for defining agreements of private use code points. more

No such requirement is made.

  1. Specifications and implementations SHOULD NOT disallow the use of private use code points by private agreement. more

Following web platform conventions, any code point is allowed within strings.

  1. Specifications MAY define markup to allow the transmission of symbols not in Unicode or to identify specific variants of Unicode characters. more

No such support is provided. Content is limited to Unicode characters.

  1. Specifications SHOULD allow the inclusion of or reference to pictures and graphics where appropriate, to eliminate the need to (mis)use character-oriented mechanisms for pictures or graphics. more

Supported via files; either by sharing image files, or sharing rich text with embedded images.

Choosing character encodings

  1. Specifications MUST either specify a unique character encoding, or provide character encoding identification mechanisms such that the encoding of text can be reliably identified. more

The use of USVString follows platform convention; this is implicitly UTF-16. Encoding of strings before passing to the share target follows platform conventions.

  1. When designing a new protocol, format or API, specifications SHOULD require a unique character encoding. more

Platform conventions are followed, and USVString is mandated.

  1. When basing a protocol, format, or API on a protocol, format, or API that already has rules for character encoding, specifications SHOULD use rather than change these rules. more

Platform conventions are followed, and USVString is mandated.

  1. When a unique character encoding is required, the character encoding MUST be UTF-8, UTF-16 or UTF-32. more

Platform conventions are followed, and USVString (implicitly UTF-16) is mandated.

  1. Specifications SHOULD avoid using the terms 'character set' and 'charset' to refer to a character encoding, except when the latter is used to refer to the MIME charset parameter or its IANA-registered values. The term 'character encoding', or in specific cases the terms 'character encoding form' or 'character encoding scheme', are RECOMMENDED. more

A non-normative note refers to "Unicode encoding".

  1. If the unique encoding approach is not taken, specifications SHOULD require the use of the IANA charset registry names, and in particular the names identified in the registry as 'MIME preferred names', to designate character encodings in protocols, data formats and APIs. more

N/A

  1. Character encodings that are not in the IANA registry SHOULD NOT be used, except by private agreement. more

N/A

  1. If an unregistered character encoding is used, the convention of using 'x-' at the beginning of the name MUST be followed. more

N/A

  1. If the unique encoding approach is not chosen, specifications MUST designate at least one of the UTF-8 and UTF-16 encoding forms of Unicode as admissible character encodings and SHOULD choose at least one of UTF-8 or UTF-16 as required encoding forms (encoding forms that MUST be supported by implementations of the specification). more

N/A

  1. Specifications that require a default encoding MUST define either UTF-8 or UTF-16 as the default, or both if they define suitable means of distinguishing them. more

N/A

Identifying character encodings

  1. Specifications MUST NOT propose the use of heuristics to determine the encoding of data. more

No such requirement is made.

  1. Specifications MUST define conflict-resolution mechanisms (e.g. priorities) for cases where there is multiple or conflicting information about character encoding. more

N/A

Designing character escapes

  1. Specifications should provide a mechanism for escaping characters, particularly those which are invisible or ambiguous. more

N/A - the specification does not define content authoring requiring escaping.

  1. Specifications SHOULD NOT invent a new escaping mechanism if an appropriate one already exists. more

N/A - the specification does not define content escaping.

  1. The number of different ways to escape a character SHOULD be minimized (ideally to one). more

N/A - the specification does not define content escaping.

  1. Escape syntax SHOULD require either explicit end delimiters or a fixed number of characters in each character escape. Escape syntaxes where the end is determined by any character outside the set of characters admissible in the character escape itself SHOULD be avoided. more

N/A - the specification does not define content escaping.

  1. Whenever specifications define character escapes that allow the representation of characters using a number, the number MUST represent the Unicode code point of the character and SHOULD be in hexadecimal notation. more

N/A - the specification does not define content escaping.

  1. Escaped characters SHOULD be acceptable wherever their unescaped forms are; this does not preclude that syntax-significant characters, when escaped, lose their significance in the syntax. In particular, if a character is acceptable in identifiers and comments, then its escaped form should also be acceptable. more

N/A - the specification does not define content escaping.

Storing text

  1. Protocols, data formats and APIs MUST store, interchange or process text data in logical order. more

Strings are not processed unit by unit, only as whole strings.

  1. Independent of whether some implementation uses logical selection or visual selection, characters selected MUST be kept in logical order in storage. more

N/A - selection is not supported.

  1. Specifications of protocols and APIs that involve selection of ranges SHOULD provide for discontiguous logical selections, at least to the extent necessary to support implementation of visual selection on screen on top of those protocols and APIs. more

N/A - selection is not supported.

Defining 'string'

  1. Specifications SHOULD NOT define a string as a 'byte string'. more

No such requirement is made.

  1. The 'character string' definition SHOULD be used by most specifications. more

Close enough.

Referring to Unicode characters

  1. Use U+XXXX syntax to represent Unicode code points in the specification. more

N/A - no specific code point references are made.

Referencing the Unicode Standard

  1. Since specifications in general need both a definition for their characters and the semantics associated with these characters, specifications SHOULD include a reference to the Unicode Standard, whether or not they include a reference to ISO/IEC 10646. more

N/A - the Unicode standard is not referenced.

UTF-8 and UTF-16 are referenced, but via specific RFCs. Is there a better reference these days?

  1. A generic reference to the Unicode Standard MUST be made if it is desired that characters allocated after a specification is published are usable with that specification. A specific reference to the Unicode Standard MAY be included to ensure that functionality depending on a particular version is available and will not change over time. more

N/A

  1. All generic references to the Unicode Standard MUST refer to the latest version of the Unicode Standard available at the date of publication of the containing specification. more

N/A

  1. All generic references to ISO/IEC 10646 MUST refer to the latest version of ISO/IEC 10646 available at the date of publication of the containing specification. more

N/A

Text-processing

Choosing text units for segmentation, indexing, etc.

  1. The character string is RECOMMENDED as a basis for string indexing. more

N/A - Text segmentation/indexing is not used in the specification.

  1. Grapheme clusters MAY be used as a basis for string indexing in applications where user interaction is the primary concern. more

N/A - Text segmentation/indexing is not used in the specification.

  1. Specifications that define indexing in terms of grapheme clusters MUST either: (a) define grapheme clusters in terms of extended grapheme clusters as defined in Unicode Standard Annex #29, Text Boundaries [UTR #29], or (b) define specifically how tailoring is applied to the indexing operation. more

N/A - Text segmentation/indexing is not used in the specification.

  1. The use of byte strings for indexing is NOT RECOMMENDED. more

N/A - Text segmentation/indexing is not used in the specification.

  1. A UTF-16 code unit string is NOT RECOMMENDED as a basis for string indexing, even if this results in a significant improvement in the efficiency of internal operations when compared to the use of character string. more

N/A - Text segmentation/indexing is not used in the specification.

  1. Specifications that need a way to identify substrings or point within a string SHOULD consider ways other than string indexing to perform this operation. more

N/A - Text segmentation/indexing is not used in the specification.

  1. Specifications SHOULD understand and process single characters as substrings, and treat indices as boundary positions between counting units, regardless of the choice of counting units. more

N/A - Text segmentation/indexing is not used in the specification.

  1. Specifications of APIs SHOULD NOT specify single characters or single 'units of encoding' as argument or return types. more

No such requirement is made. Only handling of strings as complete units is specified.

  1. When the positions between the units are counted for string indexing, starting with an index of 0 for the position at the start of the string is the RECOMMENDED solution, with the last index then being equal to the number of counting units in the string. more

N/A - Text segmentation/indexing is not used in the specification.

Matching string identity for identifiers and syntactic content

  1. String identity matching for identifiers and syntactic content should involve the following steps: (a) Ensure the strings to be compared constitute a sequence of Unicode code points (b) Expand all character escapes and includes (c) Perform any appropriate case-folding and Unicode normalization step (d) Perform any additional matching tailoring specific to the specification, and (e) Compare the resulting sequences of code points for identity. more

N/A - String comparison/matching is not required by the specification.

  1. The default recommendation for matching strings in identifiers and syntactic content is to do no normalization (ie. case folding or Unicode Normalization) of content. more

N/A - String comparison/matching is not required by the specification.

  1. 'ASCII case fold' and 'Unicode canonical case fold' approaches should only be used in special circumstances. more

N/A - String comparison/matching is not required by the specification.

  1. A 'Unicode compatibility case fold' approach should not be used. more

N/A - String comparison/matching is not required by the specification.

  1. Specifications of vocabularies MUST define the boundaries between syntactic content and character data as well as entity boundaries (if the language has any include mechanism). more

N/A

Working with Unicode Normalization

  1. Specifications SHOULD NOT specify a Unicode normalization form for encoding, storage, or interchange of a given vocabulary. more

Normalization is not specified.

  1. Implementations MUST NOT alter the normalization form of syntactic or natural language content being exchanged, read, parsed, or processed except when required to do so as a side-effect of text transformation such as transcoding the content to a Unicode character encoding, case folding, or other user-initiated change, as consumers or the content itself might depend on the de-normalized representation. more

Normalization is not specified.

  1. Specifications SHOULD NOT specify compatibility normalization forms (NFKC, NFKD). more

Normalization is not specified.

  1. Specifications MUST document or provide a health-warning if canonically equivalent but disjoint Unicode character sequences represent a security issue. more

The specification notes: The data passed to navigator.share() might be used to exploit buffer overflow or other remote code execution vulnerabilities in native applications that receive shares. There is no general way to guard against this, but implementors will want to be aware that it is a possibility. This includes vulnerabilities in native applications as a result of improper handling of textual content, e.g. assumptions about normalization.
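As an example of the kind of normalization assumption a receiving application might get wrong, the sketch below shows two canonically equivalent strings that compare unequal code unit by code unit. The helper name `canonicallyEquivalent` is hypothetical; `String.prototype.normalize` is the standard ECMAScript method.

```typescript
// Two spellings of the same user-perceived text.
const composed: string = "\u00e9"; // single code point, already in NFC
const decomposed: string = "e\u0301"; // "e" + COMBINING ACUTE ACCENT (NFD)

// Compare two strings after bringing both to the same normalization form.
function canonicallyEquivalent(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

// A plain comparison sees different sequences of different lengths.
const plainEqual: boolean = composed === decomposed; // false
const lengths: [number, number] = [composed.length, decomposed.length]; // [1, 2]
```

A share target that sizes buffers or checks deny-lists against one form while storing the other could be caught out by exactly this difference.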

  1. Where operations can produce denormalized output from normalized text input, specifications MUST define whether the resulting output is required to be normalized or not. Specifications MAY state that performing normalization is optional for some operations; in this case the default SHOULD be that normalization is performed, and an explicit option SHOULD be used to switch normalization off. more

N/A

  1. Specifications that require normalization MUST NOT make the implementation of normalization optional. more

No such requirement is made.

  1. Normalization-sensitive operations MUST NOT be performed unless the implementation has first either confirmed through inspection that the text is in normalized form or it has re-normalized the text itself. Private agreements MAY be created within private systems which are not subject to these rules, but any externally observable results MUST be the same as if the rules had been obeyed. more

No such requirement is made.

  1. A normalizing text-processing component which modifies text and performs normalization-sensitive operations MUST behave as if normalization took place after each modification, so that any subsequent normalization-sensitive operations always behave as if they were dealing with normalized text. more

N/A

Case folding

  1. Specifications and implementations that define string matching as part of the definition of a format, protocol, or formal language (which might include operations such as parsing, matching, tokenizing, etc.) MUST define the criteria and matching forms used. These MUST be one of: (a) case-sensitive (b) Unicode case-insensitive using Unicode full case-folding (c) ASCII case-insensitive.

N/A - String comparison/matching is not required by the specification.

  1. Case-sensitive matching is RECOMMENDED for matching syntactic content, including user-defined values. more

N/A - String comparison/matching is not required by the specification.

  1. Specifications that define case-insensitive matching in vocabularies that include more than the Basic Latin (ASCII) range of Unicode MUST specify Unicode full casefold matching. more

N/A - String comparison/matching is not required by the specification.

  1. Specifications that define case-insensitive matching in vocabularies limited to the Basic Latin (ASCII) subset of Unicode MAY specify ASCII case-insensitive matching. more

N/A - String comparison/matching is not required by the specification.

  1. If language-sensitive case-sensitive matching is specified, Unicode case mappings SHOULD be tailored according to language and the source of the language used for each tailoring MUST be specified. more

N/A - String comparison/matching is not required by the specification.

  1. Specifications that define case-insensitive matching in vocabularies SHOULD NOT specify language-sensitive case-insensitive matching. more

N/A - String comparison/matching is not required by the specification.

Truncating or limiting the length of strings

  1. Specifications SHOULD NOT limit the size of data fields unless there is a specific practical or technical limitation.

No such requirement is made by the specification.

  1. Specifications that limit the length of a string MUST specify which type of unit (extended grapheme clusters, Unicode code points, or code units) the length limit uses.

N/A - No such requirement is made by the specification.

  1. Specifications that limit the length of a string SHOULD specify the length in terms of Unicode code points.

N/A - No such requirement is made by the specification.

  1. If a specification sets a length limit in code units (such as bytes), it MUST specify that truncation can only occur on code point boundaries.

N/A - No such requirement is made by the specification.

  1. If a specification specifies a length limit, it SHOULD specify that any string that is truncated include an indicator, such as ellipses, that the string has been altered.

N/A - No such requirement is made by the specification.

  1. When specifying a length limitation in code units (such as bytes), specifications SHOULD set the maximum length in a way that accommodates users whose language requires multibyte code unit sequences.

N/A - No such requirement is made by the specification.

Specifying sort and search functionality

  1. Software that sorts or searches text for users SHOULD do so on the basis of appropriate collation units and ordering rules for the relevant language and/or application. more

N/A - The specification does not define sorting or searching.

  1. Where searching or sorting is done dynamically, particularly in a multilingual environment, the 'relevant language' SHOULD be determined to be that of the current user, and may thus differ from user to user. more

N/A - The specification does not define sorting or searching.

  1. Software that allows users to sort or search text SHOULD allow the user to select alternative rules for collation units and ordering. more

N/A - The specification does not define sorting or searching.

  1. Specifications and implementations of sorting and searching algorithms SHOULD accommodate text that contains any character in Unicode. more

N/A - The specification does not define sorting or searching.

Resource identifiers

Basics

  1. Resource identifiers must permit the use of characters outside those of plain ASCII. discussion

N/A - The specification does not define resource identifiers.

  1. Specifications MUST define when the conversion from IRI references to URI references (or subsets thereof) takes place, in accordance with Internationalized Resource Identifiers (IRIs). more

N/A - The specification does not define resource identifiers.

Markup & syntax

Defining elements and attributes

  1. Do not define attribute values that will contain user readable content. Use elements for such content. more

N/A - The specification does not define markup or attributes.

  1. If you do define attribute values containing user readable content, provide a means to indicate directional and language information for that text separately from the text contained in the element.

N/A - The specification does not define markup or attributes.

  1. Provide a way for authors to annotate arbitrary inline content using a span-like element or construct. more

N/A - The specification does not define markup or attributes.

Defining identifiers

  1. Identifiers should be case-sensitive.

N/A - The specification does not define identifiers.

Working with plain text

  1. Avoid natural language text in elements that only allow for plain text and in attribute values.

N/A - The specification does not define markup.

  1. Provide a span-like element that can be used for any text content to apply information needed for internationalization. more

N/A - The specification does not define markup.

Typographic support

Text decoration

  1. Text decoration such as underline and overline should allow lines to skip ink.

N/A - The specification does not define markup or text rendering.

  1. It should be possible to specify the distance of overlines and underlines from the text. more

N/A - The specification does not define markup or text rendering.

Vertical text

  1. It should be possible to render text vertically for languages such as Japanese, Chinese, Korean, Mongolian, etc.

N/A - The specification does not define markup or text rendering.

  1. Vertical text must support line progression from LTR (eg. Mongolian) and RTL (eg. Japanese)

N/A - The specification does not define markup or text rendering.

  1. By default, text decoration, ruby, and the like in vertical text where lines are stacked from left to right (eg. Mongolian) should appear on the same side as for CJK vertical text. Placement should not rely on the before and after line locations.

N/A - The specification does not define markup or text rendering.

  1. Vertical writing modes that are equivalent to the vertical- values in CSS (only) should use UTR50 to apply default text orientation of characters. (This does not apply to writing modes that are equivalent to sideways- in CSS.)

N/A - The specification does not define markup or text rendering.

  1. By default, glyphs of scripts that are normally horizontal should run along a line in vertical text such that the top of the character is toward the right side of the vertical line, but there should also be a mechanism to allow them to progress down the line in upright orientation. Such a mechanism should use grapheme clusters as a minimum text unit, but where necessary allow syllabic clusters to be treated as a unit when they involve more than one grapheme cluster.

N/A - The specification does not define markup or text rendering.

  1. Upright Arabic text in vertical lines should use isolated letter forms and the order of text should read top to bottom.

N/A - The specification does not define markup or text rendering.

  1. It should be possible for some sequences of characters (particularly digits) to run horizontally within vertical lines (tate chu yoko).

N/A - The specification does not define markup or text rendering.

  1. Writing modes should provide values like sideways-lr and sideways-rl in CSS to allow for vertical rotation of lines of horizontal script text. UTR50 is not applicable for these cases.

N/A - The specification does not define markup or text rendering.

Setting box positioning coordinates when text direction varies

  1. Box positioning coordinates must take into account whether the text is horizontal or vertical. more

N/A - The specification does not define markup or text rendering.

Ruby text annotations

  1. 'Ruby' style annotations alongside base text should be supported for Chinese, Japanese, Korean and Mongolian text, in both horizontal and vertical writing modes.

N/A - The specification does not define markup or text rendering.

  1. Ruby implementations should support zhuyin fuhao (bopomofo) ruby for Traditional Chinese.

N/A - The specification does not define markup or text rendering.

  1. Ruby implementations should support a tabular content model (such that ruby contents can be arranged in a sequence approximating to rb rb rt rt).

N/A - The specification does not define markup or text rendering.

  1. Ruby implementations should make it possible to use an explicit rb tag for ruby bases.

N/A - The specification does not define markup or text rendering.

  1. Ruby implementations should allow annotations to appear on either or both sides of the base text.

N/A - The specification does not define markup or text rendering.

Miscellaneous

  1. Line heights must allow for characters that are taller than English.

N/A - The specification does not define markup or text rendering.

  1. Box sizes must allow for text expansion in translation.

N/A - The specification does not define markup or text rendering.

  1. Line wrapping should take into account the special rules needed for non-Latin scripts. more

N/A - The specification does not define markup or text rendering.

  1. Avoid specifying presentational tags, such as b for bold, and i for italic. more

N/A - The specification does not define markup or text rendering.

Local dates, times and formats

Working with time

  1. When defining calendar and date systems, be sure to allow for dates prior to the common era, or at least define handling of dates outside the most common range.

N/A - The specification does not define support for dates/times.

  1. When defining time or date data types, ensure that the time zone or relationship to UTC is always defined.

N/A - The specification does not define support for dates/times.

  1. Provide a health warning for conversion of time or date data types that are "floating" to/from incremental types, referring as necessary to the Time Zones WG Note. more

N/A - The specification does not define support for dates/times.

  1. Allow for leap seconds in date and time data types. more

N/A - The specification does not define support for dates/times.

  1. Use consistent terminology when discussing date and time values. Use 'floating' time for time zone independent values.

N/A - The specification does not define support for dates/times.

  1. Keep separate the definition of time zone from time zone offset.

N/A - The specification does not define support for dates/times.

  1. Use IANA time zone IDs to identify time zones. Do not use offsets or LTO as a proxy for time zone.

N/A - The specification does not define support for dates/times.

  1. Use a separate field to identify time zone.

N/A - The specification does not define support for dates/times.

  1. When defining rules for a "week", allow for culturally specific rules to be applied. more

N/A - The specification does not define support for dates/times.

  1. When defining rules for week number of year, allow for culturally specific rules to be applied.

N/A - The specification does not define support for dates/times.

  1. When non-Gregorian calendars are permitted, note that the "month" field can go to 13 (undecimber).

N/A - The specification does not define support for dates/times.

Working with personal names

  1. Check whether you really need to store or access given name and family name separately. more

N/A - The specification does not define support for names.

  1. Avoid placing limits on the length of names, or if you do, make allowance for long strings. more

N/A - The specification does not define support for names.

  1. Try to avoid using the labels 'first name' and 'last name' in non-localized contexts. more

N/A - The specification does not define support for names.

  1. Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where users can provide part(s) of their name that you need to use for a specific purpose. more

N/A - The specification does not define support for names.

  1. Allow for users to be asked separately how they would like to be addressed when someone contacts them. more

N/A - The specification does not define support for names.

  1. If parts of a person's name are captured separately, ensure that the separate items can capture all relevant information. more

N/A - The specification does not define support for names.

  1. Be careful about assumptions built into algorithms that pull out the parts of a name automatically. more

N/A - The specification does not define support for names.

  1. Don't assume that a single letter name is an initial. more

N/A - The specification does not define support for names.

  1. Don't require that people supply a family name. more

N/A - The specification does not define support for names.

  1. Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names. more

N/A - The specification does not define support for names.

  1. Don't require names to be entered all in upper case. more

N/A - The specification does not define support for names.

  1. Allow the user to enter a name with spaces. more

N/A - The specification does not define support for names.

  1. Don't assume that members of the same family will share the same family name. more

N/A - The specification does not define support for names.

  1. It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'. more

N/A - The specification does not define support for names.

  1. You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, as separate items. more

N/A - The specification does not define support for names.

Designing forms

  1. When defining email field validation, allow for EAI (smtputf8) names.

N/A - The specification does not define support for forms.

Working with numbers

  1. When parsing user input of numeric values, allow for digit shaping (non-ASCII digits).

N/A - The specification does not define support for numbers.

  1. When formatting numeric values for display, allow for culturally sensitive display, including the use of non-ASCII digits (digit shaping).

N/A - The specification does not define support for numbers.

Navigation

Providing for content negotiation based on language

  1. In a multilingual environment it must be possible for the user to receive text in the language they prefer. This may depend on implicit user preferences based on the user's system or browser setup, or on user settings explicitly negotiated with the user.

The sharing web site has access to navigator.language as a way of customizing the data it makes available.
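
As a rough sketch of what that customization might look like, a sharing site could pick localized `title` and `text` strings based on the user's preferred language before sharing. Everything named here (`ShareText`, `catalog`, `pickShareText`, the sample strings, and the `"en"` fallback) is hypothetical and not defined by the specification. The lookup is a simplified prefix fallback, not a full BCP 47 matching implementation.

```typescript
// Hypothetical shape for the text fields a site would share.
interface ShareText {
  title: string;
  text: string;
}

// Hypothetical catalog of localized strings, keyed by language tag.
const catalog: Record<string, ShareText> = {
  en: { title: "Trip photos", text: "Have a look at these photos." },
  fr: { title: "Photos de voyage", text: "Regarde ces photos." },
  "pt-BR": { title: "Fotos da viagem", text: "Veja estas fotos." },
};

// Try the full tag first, then drop trailing subtags one at a time
// (e.g. "fr-CA" -> "fr"), and finally use the fallback entry.
// Tags are compared case-insensitively.
function pickShareText(
  preferred: string,
  entries: Record<string, ShareText>,
  fallback: string = "en",
): ShareText {
  const subtags = preferred.split("-");
  while (subtags.length > 0) {
    const candidate = subtags.join("-").toLowerCase();
    const key = Object.keys(entries).find((k) => k.toLowerCase() === candidate);
    if (key !== undefined) {
      return entries[key];
    }
    subtags.pop();
  }
  return entries[fallback];
}
```

In a browser, the site would presumably call something like `navigator.share(pickShareText(navigator.language, catalog))` in response to a user gesture. Note that this only selects which strings are shared; the shared `title` and `text` still carry no language metadata of their own.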

@marcoscaceres

Closing as complete. The only relevant issue is #6.

@marcoscaceres

And a huge thanks to @inexorabletash for diligently going through the i18n self-review 🙏
