The textDirection and processingLanguage properties are not needed #335
Comments
I personally agree with this. |
Respectfully, these were added as the result of a LOT of discussion with the internationalization group as to their real utility and requirements. Unless you can get them to agree with your position, there's no new information to reconsider their inclusion. They're not mandatory, so if you don't like them, don't use them :) |
Is this discussion documented anywhere? On 2 Aug 2016 1:21 pm, "Rob Sanderson" wrote:
|
Are there test cases for these properties to verify implementations of them and ensure interoperability? Are they "at risk" or are they required of all implementations? |
From what I understand, the i18n group was not recommending these properties specifically. They were pointing out the need to support bidirectional text. My current understanding of Unicode and UTF-8 is that there are a handful of control characters that can address the text direction issues that were raised. |
@tantek there are not yet test cases but yes, they will be tested. We do not consider these particular properties to be "features" as such, but will use the testing to be able to assess whether the various optional properties are in use in the various implementations under test. @aaronpk yes it is possible to embed text direction in Unicode strings. The basic natural language of an annotation could be useful in that a client could select comments on something only in languages they understand. |
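As a rough illustration of the language-selection use case mentioned above, here is a minimal sketch of a client keeping only annotations whose body is in a language the reader understands. The sample annotations, their ids, and the helper names are hypothetical; only the `language`, `processingLanguage` and `textDirection` property names come from the draft under discussion, and the matching on the primary language subtag is an assumption, not something the spec prescribes.

```python
def body_languages(body):
    """Return the primary language subtags declared on an annotation body."""
    declared = body.get("language", [])
    if isinstance(declared, str):
        # A single language may be given as a bare string rather than a list.
        declared = [declared]
    return {tag.split("-")[0].lower() for tag in declared}


def annotations_in_languages(annotations, understood):
    """Keep annotations whose body declares at least one understood language."""
    wanted = {tag.split("-")[0].lower() for tag in understood}
    kept = []
    for annotation in annotations:
        body = annotation.get("body")
        if isinstance(body, dict) and body_languages(body) & wanted:
            kept.append(annotation)
    return kept


# Hypothetical sample data, for illustration only.
SAMPLE = [
    {"id": "http://example.org/anno1",
     "body": {"type": "TextualBody", "value": "Nice point.",
              "language": "en-GB", "processingLanguage": "en",
              "textDirection": "ltr"}},
    {"id": "http://example.org/anno2",
     "body": {"type": "TextualBody", "value": "שלום",
              "language": ["he"], "textDirection": "rtl"}},
    {"id": "http://example.org/anno3",
     "body": {"type": "TextualBody", "value": "no language declared"}},
]

english_only = annotations_in_languages(SAMPLE, ["en"])
```

Annotations that declare no language at all are dropped by this sketch; a real client might prefer to keep them and let the reader decide.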
Could you clarify what is the alternative you are proposing? |
// chair hat off Note that Section 3.2.1 does not refer to the body of the Annotation itself. It contains a set of attributes that can be applied to an external resource. That is, to a separate file---not the Annotation itself. This is an important distinction, since in many cases the given attributes do not have an effect on the processing or presentation because the resource itself takes care of providing the necessary information. That said, there exists an important type of resource where this is not true: plain text formats. @kevinmarks Yes, text can have multiple directions. However, the Unicode Bidirectional Algorithm (TR9, which you mention) requires, among other things, a base direction with which to start. By default this is usually left-to-right (LTR) because most scripts are LTR. However, for documents containing primarily right-to-left (RTL) content, it is useful to have a way to set the base direction to RTL in cases in which the resource cannot itself set the base direction, such as plain text. @halindrome That's correct, there are Unicode controls that can be used (and are sometimes required) in plain text to control and manage directionality. However, these controls may not be present in a resource if the base direction is supplied externally. Thus, providing a base direction externally may be necessary in order to ensure proper presentation. @kevinmarks Regarding It is the case that most document formats contain language and direction information (in which case, any external attributes should be ignored). But the current zoo of attributes exists to serve the subset of resources that cannot help themselves. |
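To make the base direction point concrete, here is a simplified sketch of the "first strong character" heuristic that a consumer of plain text might fall back on, and of how an externally supplied hint could override it. It is not a full implementation of the Unicode Bidirectional Algorithm (it ignores isolates, for instance), and the function names and the `external_hint` parameter are assumptions made for illustration.

```python
import unicodedata


def first_strong_direction(text):
    """Guess a base direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None  # no strong character found (empty text, digits only, ...)


def resolve_base_direction(text, external_hint=None):
    """Prefer an explicit external hint; otherwise fall back to the heuristic."""
    if external_hint in ("ltr", "rtl"):
        return external_hint
    return first_strong_direction(text) or "ltr"


# A mostly Hebrew sentence that happens to start with a Latin acronym:
# the heuristic alone picks LTR, an external hint can correct it.
mixed = "W3C הוא ארגון"
guessed = resolve_base_direction(mixed)
hinted = resolve_base_direction(mixed, external_hint="rtl")
```

The mixed example is the kind of plain-text case where a resource cannot state its own base direction and the heuristic guesses wrong.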
I also had several complaints about processingLanguage, and I still have the feeling that this is not documented well enough in the standard, and that concrete use cases are needed to understand their meaning.
again my feedback on the 2 scenario types:
|
Some more links for those who care to dig and understand: Given that the default text direction is not always properly set within a
|
@BigBlueHat So there is a distinction to make between the external resources and the textual body, and this should probably be explained in a non-normative note. (This should always be the case when redundant information is included in the annotations. It must be stated clearly which is the master source of information in case of conflicts.) 2 . exactly this kind of information is missing in the current draft
|
Unless there's a proposal that hasn't already been discussed in the links referenced above, I propose close wontfix. |
+1 for close/wontfix |
My point is that there are unicode control characters that can already accomplish setting the base text direction. From https://www.w3.org/International/questions/qa-bidi-unicode-controls
The example given is setting the base direction of the text in an HTML Having a separate property outside of the string itself for specifying the base text direction is fragile and will likely lead to loss of that information as it is propagated between systems. My proposal is to drop the textDirection property and add a note recommending including the Unicode control characters in the string if it is needed. |
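As a sketch of what "including the Unicode control characters in the string" could look like, the snippet below wraps a value in directional isolate controls and checks that the marking survives a JSON round trip. Using the isolates (U+2066, U+2067, U+2069) rather than other controls is an assumption made here, and `embed_base_direction` is a hypothetical helper, not something defined by any of the specs discussed.

```python
import json

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def embed_base_direction(text, direction):
    """Wrap text in isolate controls so the direction travels with the string."""
    if direction == "rtl":
        return RLI + text + PDI
    if direction == "ltr":
        return LRI + text + PDI
    return text  # unknown direction: leave the string untouched


marked = embed_base_direction("שלום עולם", "rtl")

# The controls are ordinary characters, so they survive JSON serialization.
round_tripped = json.loads(json.dumps({"value": marked}))["value"]
```

This only helps for strings the annotation itself carries; it says nothing about external resources in encodings that cannot represent these characters.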
Well ... obviously someone needs this piece of information, but in the current version of the draft the properties are a bit confusing. I think a better explanation should be added (at least as non-normative notes) in order to:
|
Okay, but that's not /this/ issue. Please create a new issue, with a proposal for improved descriptions. Thanks! |
Are there any other JSON-based specs that have textDirection and processingLanguage properties , and if so (links?), any implementation experience with them (links?), any publishing / consuming sites/code experience with them? (links?) And if not, (that is, this spec is the first to attempt this solution, which is my guess) then can we at a minimum mark textDirection and processingLanguage as "at risk" since they are more in a state of incubation, rather than having been proven? Ok for each property to have different answers to the above questions, and thus different conclusions per above reasoning. |
@azaroth42 |
@aaronpk I don't think that implementations should do (complex) Unicode text processing to extract information like textDirection, which is obvious and explicitly available in the annotation editors. Yes, I do agree that there is a level of redundancy in the processingLanguage and textDirection information, and these fields shouldn't be considered the "master" source of information. From my side, this is what has to be documented. It should also be made clear who needs these fields and when (basically the use cases that are missing in the specification). However, I agree with one point: it is a very bad idea to mix up the real payload with the presentation metadata (i.e. in HTML everyone uses CSS nowadays). But I bet this improvement could only happen in the second version of the standard. |
@gsergiu this current issue is "The textDirection and processingLanguage properties are not needed." The editors on this issue--based on past discussions with @r12a and @aphillips as well as their current points--feel that this particular issue should be marked as There's also a wide-ranging assumption in this current discussion that everything on the web is being stored and distributed in Unicode, UTF-8, or at least something that supports the same or similar direction-specifying characters. Given that annotation bodies and targets can both be remote resources, it seems prudent that we give those who want it the ability to state these things, as it was deemed during our face-to-face when discussing I18N issues (with several RTL language authors and speakers present) that these would be useful to people working with these languages. Please reference this issue from any new issues created that have direct action to be taken that solves for these scenarios in what you feel is a better way. Thanks. |
Yes, I expressed a similar opinion: some people need them, and they should remain in the standard as long as no better solution is proposed. However, the people who didn't participate in the past discussions are quite right to say that the fields are not needed, as there is no real explanation in the standard of who needs these fields, given the existence of the language tags and the implications of redundant information. Given this, I claim that this ticket is more about improper documentation of the 2 fields than about their exclusion from the standard. And I hope that we have an agreement on this point; otherwise we again spend time on long discussions that end up with "won't fix". I find it more appropriate from the process point of view to accept that the ticket is partially valid, create a new ticket for that part, and then close the current ticket. (I cannot enforce this work process, but it is a little bit frustrating to invest time in discussions that end up with the conclusion "you are right but we won't fix".) It would be great if the community members had more time to contribute very concrete/valid solutions, but this is very hard when we are not aware of past discussions. So I hope I can create new tickets tomorrow, so that it will be fine for me to close this issue. |
If some people need it, then as @BigBlueHat said, this issue can be closed -- the claim is that they are /not/ needed. If we were to remove them, that would not be just an editorial change, it would be a normative one (hence @tantek's question about whether they're marked at-risk, given that we're in CR). If the solution is to provide a better explanation, that is just an editorial issue that we can take care of before the PR phase. My request for a proposal as to a solution is because I want folks who claim the documentation is insufficient to not just complain, but actually come up with something better 😄 "I don't like the way you wrote that" is just not helpful at this stage. |
+1 |
@azaroth42 agreed that "I don't like the way you wrote that" is just not helpful at this stage. I believe the larger problem is one of not just "not needed", but rather, as this thread has uncovered: unproven, untested, and likely insufficient. A broken feature is typically worse than none. Since apparently no other JSON-based spec uses such an approach (sideband properties per text property), textDirection and processingLanguage are a first time "hypothetical" and definitely aspirational proposal themselves. I'm worried that they will give the appearance of satisfying i18n requirements, when in practice they won't (we don't know, and the burden of proof is on prototyping/implementability/usability, not on the absence thereof), and that will put us in a worse position (broken features, backcompat headaches) than if they were absent. Aside: In general W3C work (web platform in particular) is frowning on anything aspirational being REC-track at this point. Not completely consistently across W3C yet, but more and more, and this (Annotations) may be an instance worth paying attention to in that regard. A concrete proposal would be to drop these two aspirational properties, and instead provide a note explaining the limitations (as uncovered by i18n folks) in this version of the spec. Additional optional details:
|
@tantek: I agree with your points as well, and the somewhat last-minute addition of the properties is unfortunate. As with AS2, we left i18n review until we were happy with the rest of the work rather than engaging early. Hindsight being what it is, we certainly would have done that differently, and hopefully future WGs can learn from it rather than repeat. That said, the review did reveal needs that aren't solved by unicode. The properties are not only for embedded strings (which in JSON we can expect to be unicode) but for arbitrary resources with URIs. I have no idea how PDFs store text strings (for example) and how well implemented the control characters are in those strings, but I can point you to many instances of older or just badly implemented XML documents in a huge variety of encodings. As these resources can take the role of the body of the Annotation, the unicode proposal isn't sufficient to address the requirements. |
@azaroth42
I see it exactly the opposite.
|
Discussed on the telco of 2016-08-05. The resolution was that there is no new information that wasn't already discussed. The proposal does not address the established need to cover non unicode content, however much we might like to simply require unicode everywhere, retroactively. However, we fully acknowledge that the i18n group are the experts in this matter. If @r12a @fsasaki @aphillips would please weigh in to clarify, we're happy to go with whatever those recommendations are. We've tagged this as If the text is unclear, we continue to seek explicit proposals for how to improve it and would very much welcome i18n review of that text to ensure that we are correctly representing the requirements and usage. [which does not have a separate issue] For the resolution of "It is not the responsibility to correct wrong html/pdf/xml", we have opened Issue #339 to clarify that annotation born descriptive features are hints and not to be considered authoritative information. This covers much much more than just this one feature, and will be prominent at the top of the document. Reference: http://www.w3.org/2016/08/05-annotation-irc#T15-30-52 (and above) |
@azaroth42 appreciate the thoughtful consideration. I can understand the desire to at least try something (even if novel/untested) rather than nothing, and yes, defer that preference to WG consensus. My only requests (to "accept" this resolution to keep these features) is to both 1&2 (optionally also 3):
|
While I would be fine with something like #1 referring to future versions of the spec, from an administrative point of view I am afraid #2 and #3 are not really possible. We are already in CR; setting/changing exit criteria or turning a feature to be 'at risk' is not possible at this point… Introducing this would trigger a new CR round and, beyond the extra time required it would be possible only with the Director's approval. |
@iherman I don't understand what you mean by "#1 referring to future versions of the spec". The problem is in this version of the spec, and thus the note makes sense inline to refer to this version. Re: would be possible only with the Director's approval. Regardless, worse than "extra time required" or "possible only with the Director's approval", if there are features in a spec which are known to either not have test cases, or not have test cases that test the functionality for which the features were added (in this case, the i18n requirements), or not have implementations that pass those test cases in a way that demonstrates interoperable user functionality from the i18n requirements, then those features MUST NOT advance to PR, whether or not explicitly noted in CR exit requirements. My suggestion above was more to be explicit about it in the spec rather than having it be implied. If untested or unimplemented or uninteroperable features (these properties) were explicitly at-risk, the group may drop them to help transition to PR. Otherwise untested/unimplemented/uninteroperable features (especially a novel approach as documented) must block a CR from transitioning to PR. |
@tantek you are not wrong. But let's not put too much importance on this "feature". It isn't a "feature" in the classic sense of the word. These are optional properties in a data model that might be present in content. There are no requirements that they be present, nor that they be interpreted if they are present. It's just advice. There are LOTS of such properties in this data model. The presence or absence of them in annotations generated by clients is something we will evaluate in the test cases. If they are present, we will test the values to ensure they conform to the requirements of the spec. But that doesn't really mean anything. At least, that's my interpretation. |
@tantek, just to put the admin issue at rest:
See https://www.w3.org/2015/Process-20150901/#revised-cr As the text says, the Director's approval is probably quicker than for the first round, but we cannot just re-issue a document without further ado. We need to get approval, republish, and all that jazz. (I have also checked the 2016 version of the process and, as far as I can see, there is no difference.) Anyway. We should not get bogged down in admin issues, but we should also avoid overcomplicating our lives. |
I assume
is before the initial CR rather than as a silent change to an existing document in CR. The wording probably should have been:
(metaspecifications are always fun! :) ) With that reading, we can't just mark them at risk whenever we want (which would have been useful, in this case). We have the testing process in hand, given that (a) the purpose is to verify that it has been implemented, not to validate the implementations and (b) they're optional parts of a feature, not individual features themselves. I think all of the concerns have been addressed, and it's okay to close the issue? |
My reading is that the group may identify (possibly new) "at risk" features before re-issuing a CR, which is what this section is dealing with. That is identical to what the group is allowed to do before issuing the original CR. Removing "at risk" features happens when going to PR, and is the natural possible action with or without a new CR publication.
|
Thank you for the good hint ... We have a feature that is not a feature according to the text above. What speaks against moving these two "non-features" to the annex with the extensions, which is not normative? I do support @tantek's point of view:
|
@halindrome this is not completely true:
Yes, these fields are optional, but their meaning has to be interpreted and used by "text processors". Taking into account that the indexing of annotations is already a text-processing step, I expect that most annotation systems will involve text processing, and consequently they should use "processingLanguage" in order to be fully compliant with the standard! |
As I also documented in the related issue#337 (comment), there is a big discrepancy between what is discussed in the related tickets and the presented use cases, and what we can find in the current version of the draft. |
Addison and I are investigating direction questions with the Activity Streams folks at the moment, and on Thursday we are due to meet with them to discuss language. We'd like to work through these topics carefully with them before determining whether there are implications for Web Annotations. So please continue with your ongoing CR work for now and look out for more information from us shortly. |
Web App Manifest contains both |
Activity Streams went the other way on dir, documenting how to include bidi signalling in the text itself, which is more robust: http://w3c.github.io/activitystreams/core/#biditext |
@kevinmarks there are still issues, however, with signalling in the text itself. It's not an easy problem to solve. See my notes at http://w3c.github.io/i18n-discuss/notes/json-bidi.html |
@kevinmarks @r12a
So: However, the big problem that originated this set of issues is the non-internationalized external resources. And none of the discussed solutions addresses this problem! Unfortunately the definition of the fields was changed too much: it no longer reflects their initial purpose, and it tries to claim more than these fields can actually do. |
Closing, as this has been split into other more specific issues. |
These are both simplistic assertions about an external resource that provide no useful information to a user or a user-agent.
An external resource can have multiple text directions, and languages; attempting to boil these down to one is not practical in general. See http://unicode.org/reports/tr9/ for the nuances of text direction