Base direction for annotations #224
Comments
Can't we rely on the (HTML) content of the "text" to provide anything that is needed? (Discussed on the F2F meeting 17.05.2016) |
Can we add the information to additional html tags in the
|
@azaroth42 That's interesting to explore, however I think that might put additional stress (requirement?) on normalisation of the data. If HTML is used, they'll go with There is a tradeoff somewhere :) I have some preference to: "don't touch the source" but "enrich" via adding language and direction to the annotation. I realise that comes across a bit clumsy. |
I think it is worthwhile to also keep in mind future hashing of the original content and matching that with what's in the annotation. They won't match if the annotation is adding information into the text that's not at the source. In that case, it would require integrity checks to also normalize before comparing the hashes. |
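The hashing concern above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample text, the `<span dir="rtl">` wrapper, and the helper names `strip_markup` and `digest` are hypothetical and do not come from the annotation model or from this thread.

```python
import hashlib
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and ignores all tags (a crude normaliser)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html_text):
    # Hypothetical normalisation step: keep only the text content.
    parser = _TextOnly()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


def digest(text):
    # SHA-256 over the UTF-8 bytes of the text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Illustrative stand-in for the source text.
source = "פעילות הבינאום, W3C"
# The same text after direction markup has been added inside the annotation.
annotated = '<span dir="rtl">' + source + "</span>"

hashes_match_raw = digest(source) == digest(annotated)
hashes_match_normalised = digest(source) == digest(strip_markup(annotated))
```

Under these assumptions the raw hashes differ, and they only agree once the annotation text has been normalised, which is the extra step an integrity check would have to perform.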
Discussed F2F 18.05.2016: accept by adding the relevant term for directionality. RESOLUTION: Add a |
Also CSS for the same values: https://developer.mozilla.org/en/docs/Web/CSS/direction CSVW defines the ltr/rtl instances: https://www.w3.org/ns/csvw#instance-definitions So I propose additions to context and vocab:
and to decrement 🍻 owed to @gkellogg by one :) Also, please note there is a discrepancy between: https://www.w3.org/TR/tabular-metadata/ which says the range of textDirection is a string, and the actual vocabulary, which defines a range of csvw:Direction (of which rtl etc are instances). 🍻 owed-- again |
Ivan
Ivan Herman, W3C |
I'm easy either way. If we didn't have to mint a new predicate I would be more strongly in favor of reuse, but as we need our own copy of |
[removed incorrect comment]
i don't remember seeing this before. I have some issues with the definitions (apart from the typos 'Determins' and 'Indiects'). Do you need me to elaborate them here? |
Then I would definitely propose to do that. |
|
Oops, I just see that you have done that, sorry for the noise! |
@azaroth42 said:
Note that the CSVW metadata document is JSON, compatible with JSON-LD, so in that document, the values for direction must be specified as strings. However, this is not inconsistent with the RDF interpretation being an object; thus the instance definitions and range limitation. |
@r12a shouldn't the text direction be easily identified from the script codes? |
It seems to me that this ticket started from a wrong example:
In the provided example, the language is not "en" according to my understanding, but Hebrew, with some English text (unless the provided Hebrew text means W3C ..., in which case we would have 50% en and 50% Hebrew). I think it is dangerous to make standardization decisions starting from wrong examples and proposing incomplete solutions.
|
the use of 'en' in the example was a mistake. I have corrected it (to 'he'). |
It's a question we're asked a lot, and the answer is no. As a hack it sometimes works to infer the base direction from the language information when no real direction information is available, but it's unreliable. Note in particular that BCP47 strongly encourages you not to use script tags unless necessary to distinguish usages, and in fact has a mechanism to indicate that script should not be used with certain language tags. Moreover, language and direction are not the same thing. For example, how would you express 'auto' with language tags? |
If the language is "he", don't we already know that the text is RTL? Are there any languages that use both RTL and LTR? |
If the language + script code clearly indicates the writing direction, I would suggest adding the script code information to the annotation and not the "text direction", which will, in this case, be redundant information. |
Not if the text is transcribed in Latin script or some other script (the authors of BCP47 were explaining how that works to someone just last week, as it happens). Yes, people should use a script tag in that case, but nothing forces them to.
Yes. For example, Azerbaijani. Language is not the same thing semantically as direction: there are different parameters to its use, and the places where you need to use it are different. We have been going around this tree for years, please just trust us. |
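A rough sketch of why guessing direction from a language tag is unreliable. The function `guess_direction` and the two lookup sets are hypothetical and deliberately incomplete; they only mimic the kind of hack described above and are not part of any specification.

```python
# ISO 15924 script codes for some right-to-left scripts (not exhaustive).
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo"}
# Languages that are usually, but not always, written right-to-left.
USUALLY_RTL_LANGS = {"ar", "he", "fa", "ur"}


def guess_direction(tag):
    """Guess 'rtl' or 'ltr' from a BCP47-style tag. A hack, not a rule."""
    subtags = tag.split("-")
    lang = subtags[0].lower()
    # A script subtag is four letters, e.g. 'Arab' in 'az-Arab'.
    script = next(
        (s.title() for s in subtags[1:] if len(s) == 4 and s.isalpha()),
        None,
    )
    if script is not None:
        return "rtl" if script in RTL_SCRIPTS else "ltr"
    return "rtl" if lang in USUALLY_RTL_LANGS else "ltr"


guesses = {tag: guess_direction(tag) for tag in ("he", "he-Latn", "az", "az-Arab")}
```

With these assumptions, plain `he` is guessed as `rtl` even if the text was transcribed in Latin script without a script subtag, and plain `az` is guessed as `ltr` even if the text is in Arabic script. The guess is only right when the author happened to supply a script subtag, and the function has no way to return `auto`.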
Well ... I recognize that I'm not the expert in the field; that's why I ask questions. If I understood correctly, we have languages that use both RTL scripts (mainly based on the Arabic alphabet) and LTR scripts (probably all others). So this is my basic question: does the language + script clearly identify the direction of the text? My personal preference would be that language + script should be "the master", as this is already an ISO standard. |
@r12a .. just to conclude the analysis, not the solution .. I would like to ask the following question: Are the language + script codes (e.g. az-Arab | az-Cyrl | az-Latn) sufficient for correct representation of the text, including the font selection and the direction of the text? (here some references for others that want to provide feedback: |
@gsergiu please see my earlier comments. Here are four reasons i mentioned why you can't conflate language and direction, i may be able to come up with more: (1) you can't produce the
that would be a no, then. |
Well .. I just tried to make the analysis of the issue.
PS: |
At the meeting with the I18N WG (2016-05-26), the Anno WG reiterated that it intends to follow the direction advised by the I18N WG. The decision in #224 (comment) stands. |
In addition to language information, each annotation may need an optional indicator of overall base direction.
For example, the following annotation will not display correctly unless the application doing the display knows that the base direction needs to be rtl. (As it is, the 'W3C' will appear to the right, as shown here, rather than to the left of the Hebrew.)
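As a minimal sketch of what a displaying application could do with such an indicator: the body below is an illustrative stand-in, not the original example, and the key name `textDirection` and the helper `with_base_direction` are assumptions made for this sketch. The isolate characters themselves are the standard Unicode directional isolate controls.

```python
# Unicode directional isolate controls.
LRI = "\u2066"  # left-to-right isolate
RLI = "\u2067"  # right-to-left isolate
FSI = "\u2068"  # first-strong isolate (direction detected from content)
PDI = "\u2069"  # pop directional isolate

_OPENERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def with_base_direction(body):
    """Wrap the body text in isolates matching its declared base direction."""
    text = body["value"]
    direction = body.get("textDirection")  # assumed key name
    if direction is None:
        # No direction information: leave the text untouched.
        return text
    return _OPENERS[direction] + text + PDI


# Illustrative annotation body: Hebrew text followed by 'W3C'.
body = {
    "type": "TextualBody",
    "value": "פעילות הבינאום, W3C",
    "language": "he",
    "textDirection": "rtl",
}
display_text = with_base_direction(body)
```

In a plain-text context, the wrapped string should lay out with an rtl base direction, so 'W3C' ends up to the left of the Hebrew. In HTML, the equivalent would be a `dir` attribute on the containing element rather than control characters.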