Base direction for annotations #224

r12a · 2016-05-17T13:02:45Z

In addition to language information, each annotation may need an optional indicator of overall base direction.

For example, the following annotation will not display correctly unless the application doing the display knows that the base direction needs to be rtl. (As it is, the 'W3C' will appear to the right, as shown here, rather than to the left of the Hebrew.)

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "http://example.org/anno5",
  "type":"Annotation",
  "body": {
    "type" : "TextualBody",
    "text" : "<p>פעילות הבינאום, W3C</p>",
    "format" : "text/html",
    "language" : "he",
    "direction" : "rtl"
  },
  "target": "http://example.org/photo1"
}
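As a rough illustration of how a displaying application might use such a property, the sketch below wraps the body in a container that carries the language and base direction as HTML `lang` and `dir` attributes. The function name `render_body` and the choice of a `div` wrapper are assumptions made for this example, not something the thread or the model prescribes.

```python
from html import escape


def render_body(body: dict) -> str:
    """Wrap an annotation body in a <div> carrying lang and dir (hypothetical helper)."""
    attrs = []
    if "language" in body:
        attrs.append(f'lang="{escape(body["language"], quote=True)}"')
    if "direction" in body:
        attrs.append(f'dir="{escape(body["direction"], quote=True)}"')
    attr_text = (" " + " ".join(attrs)) if attrs else ""
    # The body format is text/html, so the text is inserted unchanged here.
    # A real client would sanitise untrusted markup before doing this.
    return f"<div{attr_text}>{body['text']}</div>"


body = {
    "type": "TextualBody",
    "text": "<p>פעילות הבינאום, W3C</p>",
    "format": "text/html",
    "language": "he",
    "direction": "rtl",
}
html_out = render_body(body)
```

The body text itself is left untouched; only the wrapper carries the direction.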


iherman · 2016-05-17T15:33:03Z

Can't we rely on the (HTML) content of the "text" to provide anything that is needed? (Discussed at the F2F meeting, 17.05.2016)

azaroth42 · 2016-05-17T15:33:06Z

Can we add the information to additional html tags in the text field? e.g.

<p><span xml:lang="...">...</span> <span xml:lang="en">W3C</span></p>

csarven · 2016-05-17T15:59:47Z

@azaroth42 That's interesting to explore, but I think it might put additional stress (a requirement?) on normalisation of the data. If HTML is used, they'll go with lang="en", XHTML would go with xml:lang="en", and polyglot markup would perhaps need both. This is on top of the possibility that rdf:HTML and rdf:XMLLiteral blocks should not be preserved as such. The same goes for the direction.

There is a tradeoff somewhere :) I have some preference for "don't touch the source" but "enrich" it by adding language and direction to the annotation. I realise that comes across as a bit clumsy.

csarven · 2016-05-17T16:03:29Z

I think it is worthwhile to also keep in mind future hashing of the original content and matching that with what's in the annotation. They won't match if the annotation is adding information into the text that's not at the source. In that case, it would require integrity checks to also normalize before comparing the hashes.
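A minimal sketch of the hashing concern, assuming SHA-256 over the UTF-8 bytes of the text (the thread does not name a hash function, and `sha256_hex` is a helper invented for this example). Injecting a `dir` attribute into the markup changes the digest, while carrying the direction as a separate property leaves the text comparable to the source.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex digest of the UTF-8 encoding of text (hash choice is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


source_text = "<p>פעילות הבינאום, W3C</p>"

# Option 1: direction is injected into the copied markup.
injected_text = source_text.replace("<p>", '<p dir="rtl">', 1)

# Option 2: the text is copied as-is and direction travels alongside it.
annotation_body = {"text": source_text, "direction": "rtl"}

injected_matches = sha256_hex(injected_text) == sha256_hex(source_text)
untouched_matches = sha256_hex(annotation_body["text"]) == sha256_hex(source_text)
```

With option 1 an integrity check would first have to normalise the markup; with option 2 the digests can be compared directly.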

iherman · 2016-05-18T07:56:34Z

Discussed F2F 18.05.2016: accept by adding the relevant term for directionality.

RESOLUTION: Add a direction property to the vocabulary, to be associated with any content resource (body or target) with three possible values, auto, rtl and ltr (in JSON-LD) and define URIs to identify the concepts. Refer back to HTML5 document for the definitions.

See: http://www.w3.org/2016/05/18-annotation-irc#T07-56-24
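For illustration only, a consumer could check the resolved property against the three values named in the resolution. The helper name `check_direction` and the decision to raise on unknown values are assumptions of this sketch; the resolution itself only lists the values.

```python
from typing import Optional

# The three values named in the resolution.
ALLOWED_DIRECTIONS = {"auto", "ltr", "rtl"}


def check_direction(resource: dict) -> Optional[str]:
    """Return the direction of a body or target, or None if it is absent."""
    value = resource.get("direction")
    if value is None:
        return None  # the property is optional
    if value not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unsupported direction: {value!r}")
    return value
```

For example, `check_direction({"direction": "rtl"})` returns `"rtl"`, while a resource without the property yields `None`.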

azaroth42 · 2016-05-19T16:22:16Z

Also CSS for the same values: https://developer.mozilla.org/en/docs/Web/CSS/direction

CSVW defines the ltr/rtl instances: https://www.w3.org/ns/csvw#instance-definitions
And an unusable predicate: https://www.w3.org/ns/csvw#textDirection (due to the overly restrictive domain in the ontology). So we need to duplicate it.

So I propose additions to context and vocab:

"csvw" : "https://www.w3.org/ns/csvw#",
"direction": "wa:textDirection",
"ltr" : "csvw:ltr",
"rtl": "csvw:rtl",
"auto": "csvw:auto"

and to decrement 🍻 owed to @gkellogg by one :)

Also, please note there is a discrepancy between: https://www.w3.org/TR/tabular-metadata/ which says the range of textDirection is a string, and the actual vocabulary, which defines a range of csvw:Direction (of which rtl etc are instances).

🍻 owed-- again
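To show what the proposed context entries would mean for the values, here is a much simplified sketch of term and compact-IRI expansion. It is not the JSON-LD expansion algorithm, `expand_term` is a name made up for this example, and the `direction` mapping is left out because the `wa` prefix is not defined anywhere in this thread.

```python
# Context entries taken from the proposal above (the "direction" term is omitted).
CONTEXT = {
    "csvw": "https://www.w3.org/ns/csvw#",
    "ltr": "csvw:ltr",
    "rtl": "csvw:rtl",
    "auto": "csvw:auto",
}


def expand_term(term: str, context: dict) -> str:
    """Resolve a term or prefix:local value to a full IRI (simplified sketch)."""
    value = context.get(term, term)
    prefix, sep, local = value.partition(":")
    # Only expand when the prefix is defined and the value is not already an absolute IRI.
    if sep and prefix in context and not local.startswith("//"):
        return context[prefix] + local
    return value
```

Under these assumptions the JSON string `"rtl"` resolves to `https://www.w3.org/ns/csvw#rtl`, which is the instance CSVW defines.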

iherman · 2016-05-19T18:41:30Z

On 19 May 2016, at 18:22, Rob Sanderson wrote:

Also CSS for the same values: https://developer.mozilla.org/en/docs/Web/CSS/direction
CSVW defines the ltr/rtl instances: https://www.w3.org/ns/csvw#instance-definitions
And an unusable predicate: https://www.w3.org/ns/csvw#textDirection (due to the overly restrictive domain in the ontology). So we need to duplicate it.

So I propose additions to context and vocab:

"csvw" : "https://www.w3.org/ns/csvw#",
"direction": "wa:textDirection",
"ltr" : "csvw:ltr",
"rtl": "csvw:rtl",
"auto": "csvw:auto"

This is a bit of bikeshedding, but I am not sure whether it is a good idea to bring in a new namespace for these three. Yes, I know: reuse namespaces where we can, it makes no difference for the JSON-LD, etc. Nevertheless, I am not 100% sure it is worth bringing a new one into the Turtle. I have a slight (emphasis: slight) preference to have these three values duplicated in our own namespace.

Ivan

and to decrement 🍻 owed to @gkellogg by one :)

Also, please note there is a discrepancy between: https://www.w3.org/TR/tabular-metadata/ which says the range of textDirection is a string, and the actual vocabulary, which defines a range of csvw:Direction (of which rtl etc are instances).

🍻 owed-- again


azaroth42 · 2016-05-20T07:36:12Z

I'm easy either way. If we didn't have to mint a new predicate I would be more strongly in favor of reuse, but as we need our own copy of textDirection, we can just as easily redefine the values as well.

r12a · 2016-05-20T08:26:59Z

[removed incorrect comment]

CSVW defines the ltr/rtl instances: https://www.w3.org/ns/csvw#instance-definitions

i don't remember seeing this before. I have some issues with the definitions (apart from the typos 'Determins' and 'Indiects'). Do you need me to elaborate them here?

iherman · 2016-05-20T08:46:38Z

On 20 May 2016, at 09:36, Rob Sanderson wrote:

I'm easy either way. If we didn't have to mint a new predicate I would be more strongly in favor of reuse, but as we need our own copy of textDirection, we can just as easily redefine the values as well.

Then I would definitely propose to do that.

iherman · 2016-05-20T09:05:47Z

On 20 May 2016, at 10:27, r12a wrote:

CSVW defines the ltr/rtl instances: https://www.w3.org/ns/csvw#instance-definitions
i don't remember seeing this before. I have some issues with the definitions (apart from the typos 'Determins' and 'Indiects'). Do you need me to elaborate them here?

Well… if there are issues with the CSVW stuff, then they should be filed as errata on the CSVW errata page. But I think it is now accepted that we would use our own terms in the case of annotation, so it is irrelevant for this thread...

iherman · 2016-05-20T09:07:42Z

On 20 May 2016, at 11:05, Ivan Herman wrote:

On 20 May 2016, at 10:27, r12a wrote:

CSVW defines the ltr/rtl instances: https://www.w3.org/ns/csvw#instance-definitions
i don't remember seeing this before. I have some issues with the definitions (apart from the typos 'Determins' and 'Indiects'). Do you need me to elaborate them here?

Well… if there are issues with the CSVW stuff, then they should be filed as errata on the CSVW errata page. But I think it is now accepted that we would use our own terms in the case of annotation, so it is irrelevant for this thread...

Oops, I just see that you have done that, sorry for the noise!

gkellogg · 2016-05-20T12:20:51Z

@azaroth42 said:

Also, please note there is a discrepancy between: https://www.w3.org/TR/tabular-metadata/ which says the range of textDirection is a string, and the actual vocabulary, which defines a range of csvw:Direction (of which rtl etc are instances).

Note that the CSVW metadata document is JSON, compatible with JSON-LD, so in that document, the values for direction must be specified as strings. However, this is not inconsistent with the RDF interpretation being an object; thus the instance definitions and range limitation.

gsergiu · 2016-05-24T08:34:43Z

@r12a shouldn't the text direction be easily identifiable from the script codes?
http://unicode.org/iso15924/iso15924-codes.html

gsergiu · 2016-05-24T08:46:18Z

It seems to me that this ticket started from a wrong example:

"text" : "<p>פעילות הבינאום, W3C</p>",
"format" : "text/html",
"language" : "en",
"direction" : "rtl"

In the provided example, the language is not "en" as far as I understand, but Hebrew, with some English text (unless the Hebrew text itself means W3C, in which case we would have 50% English and 50% Hebrew).

I think it is dangerous to make standardization decisions starting from wrong examples, and to propose incomplete solutions.

Probably you want to say that W3C should be understood as "en" text, even though it would match any Latin-based language and script, but that doesn't change the true nature of the text!
The text is written in 2 languages and 2 scripts, and the proposed "base direction" change doesn't solve the problem of not being able to represent the text correctly. See also the analysis I submitted to exactly 0 or 1 language(s) #213.

r12a · 2016-05-24T09:56:40Z

the use of 'en' in the example was a mistake. I have corrected it (to 'he').

r12a · 2016-05-24T10:02:37Z

@r12a shouldn't the text direction be easily identifiable from the script codes?

It's a question we're asked a lot, and the answer is no. As a hack it sometimes works to infer the base direction from the language information when no real direction information is available, but it's unreliable. Note in particular that BCP47 strongly encourages you not to use script tags unless necessary to distinguish usages, and in fact has a mechanism to indicate that script should not be used with certain language tags. Moreover, language and direction are not the same thing. For example, how would you express 'auto' with language tags?
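The `auto` value has no language-tag equivalent because it is resolved from the text itself. The sketch below is a simplified first-strong heuristic in the spirit of HTML's `dir="auto"`: it scans for the first character with a strong bidirectional type. It works on plain text only, ignores markup and isolates, and the function name `first_strong_direction` is an assumption of this example.

```python
import unicodedata


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Guess base direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
    # No strong character found (digits, punctuation, empty string).
    return default
```

Applied to the text of the example at the top of this issue (without the markup), the heuristic reports `rtl`; the same words with "W3C" first would report `ltr`, which is why an explicit `rtl` or `ltr` is still needed in some cases.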

gsergiu · 2016-05-24T10:04:57Z

If the language is "he", don't we already know that the text is RTL?
http://www.i18nguy.com/temp/rtl.html

Are there any languages that use both RTL and LTR?
Probably not, but even if there are, wouldn't this be solved by the script code?

gsergiu · 2016-05-24T10:16:07Z

Well, obviously there are some languages for which you need to know the script for correct representation. Japanese is probably the best-known example. That means there will be implementations that use the script part of the language encoding. I think this is a fact. We cannot and should not prevent this.
I admit that I don't like adding the script to the language either. One could consider whether it makes more sense to use only language and country codes in the language field and to put the script in its own field.
However, my basic question is whether the script code is not the "richer" information for correct representation of the texts.

If the language + script code clearly indicates the writing direction, I would suggest adding the script code information to the annotation rather than the "text direction", which would in this case be redundant information.

r12a · 2016-05-24T10:16:27Z

If the language is "he", don't we already know that the text is RTL?

Not if the text is transcribed in Latin script or some other script (the authors of BCP47 were explaining how that works to someone just last week, as it happens). Yes, people should use a script tag in that case, but nothing forces them to.

Are there any languages that use both RTL and LTR?

Yes. For example, Azerbaijani.

Language is not the same thing semantically as direction: there are different parameters to its use, and the places where you need to use it are different. We have been going around this tree for years, please just trust us.

gsergiu · 2016-05-24T10:27:44Z

Well, I recognize that I'm not an expert in the field; that's why I ask questions.
I agree with you that the language is not what dictates the direction, but I assume that the scripts do.

If I understood correctly, there are languages that use both RTL scripts (mainly based on the Arabic alphabet) and LTR scripts (probably all others).
https://en.wikipedia.org/wiki/Azerbaijani_alphabet

So this is my basic question: does the language + script clearly identify the direction of the text?
If yes, I consider the textDirection to be redundant information. I don't mean that we shouldn't have such a field, but I want the standard to state clearly what the relationship is and what to do if both exist but are inconsistent.

My personal preference would be that language+script should be "the master", as this is already an ISO standard.

gsergiu · 2016-05-25T08:22:54Z

@r12a .. just to conclude the analysis, not the solution .. I would like to ask the following question:

Are the language + script codes (e.g. az-Arab | az-Cyrl | az-Latn) sufficient for correct representation of the text, including the font selection and the direction of the text?
(If yes, then the main question of this issue turns into: do we need redundancy for easier processing of the annotations?)

(Here are some references for others who want to provide feedback:
Script codes: http://unicode.org/iso15924/iso15924-codes.html
i18n QA: http://www.i18nguy.com/temp/rtl.html
W3C i18n script subtag recommendations: https://www.w3.org/International/articles/language-tags/#script
IANA subtag registry: http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
W3C i18n recommendations/QA on language tags: https://www.w3.org/International/questions/qa-choosing-language-tags )

r12a · 2016-05-25T09:44:59Z

@gsergiu please see my earlier comments. Here are four reasons I mentioned why you can't conflate language and direction (I may be able to come up with more): (1) you can't produce the auto value with language tags, (2) BCP47 recommends that you do not use script tags for languages like Hebrew (Suppress-Script: Hebr), (3) you won't be able to rely on people supplying script tags as part of the language information in order to influence direction, (4) these are semantically separate concepts. Another reason that I alluded to but didn't expand on is that if you apply direction to inline content, it becomes even clearer that we are dealing with different things, because the usage patterns don't overlap.

Are the language + script codes (e.g. az-Arab | az-Cyrl | az-Latn) sufficient for correct representation of the text, including the font selection and the direction of the text?

that would be a no, then.
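To make the gap concrete, here is a naive sketch of deriving direction from a language tag. It only looks for a four-letter script subtag and consults a small, deliberately incomplete set of right-to-left scripts; both the `RTL_SCRIPTS` set and the function name `direction_from_tag` are assumptions of this example, and the tag parsing is far simpler than full BCP47.

```python
from typing import Optional

# A few right-to-left ISO 15924 script codes; not a complete list.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo"}


def direction_from_tag(tag: str) -> Optional[str]:
    """Return 'rtl' or 'ltr' if the tag carries a script subtag, else None."""
    for sub in tag.split("-")[1:]:
        if len(sub) == 1:
            break  # extension or private-use singleton: stop looking
        if len(sub) == 4 and sub.isalpha():
            return "rtl" if sub.title() in RTL_SCRIPTS else "ltr"
    return None  # no script subtag, so the tag says nothing about direction
```

With this sketch `az-Arab` gives `rtl` and `az-Latn` gives `ltr`, but plain `az` and plain `he` give `None`, and no tag can ever yield `auto`.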

gsergiu · 2016-05-25T10:12:51Z

Well, I just tried to analyse the issue.

So with the current specification we have a way to represent the text correctly, but you say that this is not the recommended way of doing it (and I agree that it was a bad idea from the beginning to mix the language and script concepts).
However, if we don't want to include the script tag in the language, I would say that the script should have its own property, in which case the textDirection is redundant, as it can be unambiguously derived from the script code (and possibly the language information). Additionally, the script code can help clients choose the correct fonts for representing the text, while the text direction does not help in this matter.
Moreover, I think that the default script for each language can be derived from the "Suppress-Script" field in the IANA registry: http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

PS:
I'm not trying to impose a solution; I just wanted to analyse the problem and the existing solutions/standards (and I'm trying to form my own unbiased opinion).
The community can adopt the most appropriate solution, but I claim that the solution must solve both problems: text direction and font selection. It should also take into account the reuse of standards and best practices.

iherman · 2016-05-26T15:13:20Z

At the meeting with the I18N WG (2016-05-26), the Anno WG reiterated that it intends to follow the direction advised by the I18N WG. The decision in #224 (comment) stands.

See http://www.w3.org/2016/05/26-i18n-irc#T15-13-10

r12a added the i18n-review label May 17, 2016

r12a mentioned this issue May 17, 2016

Base direction for annotations w3c/i18n-activity#136

Closed

iherman added the editor_action label May 18, 2016

azaroth42 self-assigned this May 20, 2016

azaroth42 added the pending label May 23, 2016

azaroth42 mentioned this issue May 23, 2016

Rob Editor Actions #242

Merged

gsergiu mentioned this issue May 24, 2016

exactly 0 or 1 language(s) #213

Closed

azaroth42 closed this as completed Jun 3, 2016

azaroth42 removed editor_action pending labels Jun 3, 2016

BigBlueHat mentioned this issue Jul 26, 2016

Why exclude markup from name? w3c/activitystreams#338

Closed


r12a mentioned this issue Mar 16, 2017

Standardized way to add indications of text direction w3c/data-shapes#40

Closed

js-choi mentioned this issue Nov 11, 2017

Add text language / direction attributes w3c/web-share#6

Closed

plehegar added the i18n-tracker label and removed the i18n-review label Mar 11, 2021
