Clarify case sensitivity of Short Form licenses - for list and tools. #63
Comments
On Tue, Jan 02, 2018 at 08:11:00PM +0000, Kate Stewart wrote:
as-is means 'it IS case-sensitive'
Is this reflected by current spec wording? I haven't been able to
turn it up.
I'm in favor of requiring license/exception IDs to be case sensitive,
but explicitly allowing tools to accept case-insensitive matches if
they want. In order to avoid confusion there, we'd also want to warn
LicenseRef… authors to avoid case-insensitive overlap and not mint
LicenseRefAlice and LicenseRefalice, etc. With this approach, a
license expression like ‘mit’ would be non-compliant but might work
anyway (and if it worked, the semantics would be well-defined).
|
The previous case-insensitive matching was removed in e5f46fa (test required spdx-ids against data from spdx, 2016-05-25, github#418). That commit was designed [1] to allow case-sensitive matching as discussed in [2]. But while I'm in favor of case-sensitive keys in spdx_list, the case-sensitive match breaks script/check-approval, which has downcased its argument since it was added in 8e56bb8 (add script/check-approval, 2016-01-18, github#318).
There are more notes on SPDX's plans for case sensitivity in [3], so we should see a clearer policy there soon. I'm arguing for case-sensitive *display* with optional case-insensitive matching. I am optimistic that the SPDX will at least agree not to register short IDs that only differ by case, which is all we need to make this case-insensitive match safe here.
[1]: github#418 (comment)
[2]: licensee/licensee#72
[3]: spdx/spdx-spec#63
In practice case does not matter and every id is unique case-wise. Mandating a certain case is a hindrance to adoption IMHO and serves absolutely no good purpose. |
If we explicitly allow tools to accept case-insensitive matches, then tool maintainers who feel that case is not important can ignore it.
I think preserving case is useful for readability, and we should at least SHOULD authors to use the right case even if we don't MUST them. For example, I think If we MUST case preservation for authors, then tools can choose not to support case-insensitive matching as well. That simplifies the parsing logic. Downcasing the whole license expression before parsing it would be a straightforward way to do case-insensitive parsing, but then it's a bit tedious to get back to the original case if you want to warn about an unrecognized identifier. |
I am not sure that the mostly all uppercase approach we have today helps with readability. @jeffmcaffer ping |
On Thu, Feb 15, 2018 at 08:38:46AM +0000, Philippe Ombredanne wrote:
Instead we should define a canonical form and allow case-insensitive
IDs and operators in license expressions.
I'm ok with this. The difference vs. my proposal [1] is whether we
require tools to support case-insensitive matches (easier for SPDX
authors?) or not (easier for tool maintainers). Since matching the
canonical case doesn't seem particularly difficult, my preference is
to allow, but not require, tools to support noncanonical casing. But
if getting the case right is somehow a hindrance to adoption [2],
then putting the burden on the tool maintainers is acceptable.
[1]: #63 (comment)
[2]: #63 (comment)
|
as a user and relatively new to expressions I can definitely say that allowing people to do whatever case they want is a win. Consider
The latter is more approachable. More approachable == simpler. Simpler == more interest in using.
Seems like that would be a problem, as you might use non-canonical casing and then my tool may not understand it. So the interchange format is not all that interchangeable. I likely just missed it, but what is the argument for dictating casing? |
The case sensitivity concerns were more with the actual short form identifiers. If there is quorum to permit the license expression operators to be case insensitive, that is less impact. Note this would probably mean that AND, and, And, aNd, etc. variants would all be recognized (impact is AND, OR, WITH keywords). |
As one example of a tool implementation of license expression, it would be quite easy to make the entire expression case insensitive, quite easy to make the entire expression case sensitive and just a little bit of work to have different case sensitivity for the operators and license ID's. The most important thing is to specify and document case sensitivity so that all tools behave consistently. You wouldn't want one tool to accept a license expression and a different tool to reject it due to different decisions on how to treat case sensitivity. Allowing tools to treat the case differently would make the interchange less reliable and defeat one of the goals of SPDX IMHO. I would support whatever is easiest for the user. |
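To make the "different case sensitivity for the operators and license ID's" option concrete, here is a minimal Python sketch. It is not taken from the SPDX tools: the function name `normalize_operators` and the simple tokenizer are hypothetical, and the operator set is limited to the AND, OR, and WITH keywords named in this thread. Operators are matched ignoring case and rewritten in uppercase, while identifiers are passed through exactly as written.

```python
import re

# Operator keywords named in the discussion above (AND, OR, WITH).
OPERATORS = {"AND", "OR", "WITH"}

# Hypothetical tokenizer: a parenthesis, or a run of characters that are
# neither whitespace nor parentheses.
_TOKEN = re.compile(r"[()]|[^\s()]+")


def normalize_operators(expression: str) -> str:
    """Uppercase operator keywords; leave identifiers exactly as written."""
    out = []
    for token in _TOKEN.findall(expression):
        if token.upper() in OPERATORS:
            out.append(token.upper())  # operator: case-insensitive match
        else:
            out.append(token)          # identifier or parenthesis: untouched
    text = " ".join(out)
    # Remove the spaces that the join introduced inside parentheses.
    return text.replace("( ", "(").replace(" )", ")")


print(normalize_operators("MIT and (Apache-2.0 oR GPL-2.0+)"))
```

With this split, a misspelled identifier such as `mit` is still passed through unchanged, so a stricter later stage can reject it or warn about it using the author's original casing.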
“more approachable” probably depends on your past experience. For example,
If the spec says that canonical casing is required (my preference), then yeah, that's a risk you'd take by using non-canonical casing. I don't think that's a compatibility issue, because you'll have the same issue if you break any of the other SPDX rules. |
"more approachable", yes, it is subjective. However, I was speaking as a human reader, not a programmer or technical person. Simply put, many people find reading a long string of all uppercase text hard. If these expressions are to show up in a human context (e.g., SPDX identifier tags in readmes) then the more human-readable, the better. For tool compatibility, the spec should not IMHO bail on taking a position. Saying that "tools SHOULD tolerate different casing" (for example) is not really helpful as users still don't know with confidence what they can do. So anyone who cares about using different tools (which presumably is the point of an interchange format standard) will then read that as they MUST use the canonical casing if they want interchange. |
@jeffmcaffer, we are also looking at
and any other combination. At least we are still using ASCII for now :) I assume all this discussion is only about tag-value representation? In RDF, we will always have case-sensitive URIs:
|
I personally prefer having a single way of specifying these things.
Me too. And I think everyone here agrees that whatever position the
spec takes, that position should be spelled out clearly.
Saying that "tools SHOULD tolerate different casing" (for example)
is not really helpful as users still don't know with confidence what
they can do.
Right. “tools SHOULD …” would be a tool constraint. The SPDX author
constraint would be [1]:
License expressions are case sensitive, and expression authors MUST
use the correct case for all identifiers and operators within the
expression.
That makes author decisions very easy. And it matches the current
ABNF [2] as well. The ABNF would get pretty hairy if we addressed
case-insensitivity there; if we end up allowing authors to alter case,
I think we'll want to make the entire license expression case
insensitive so we can require downcasing before feeding it into the
ABNF.
[1]: #63 (comment)
[2]: https://github.com/spdx/spdx-spec/blame/4aa389d9c643ac082a0b49c11cc59dd7123fabc3/chapters/appendix-IV-SPDX-license-expressions.md#L24
|
This is a good reason to require case-sensitive IDs. With that, a tool can easily construct the URI for a given license ID. Without a case requirement, tools would need to build in a complete (for a given release?) copy of the license list if they wanted to be able to canonicalize the case to get the license URI. To mitigate that problem if we decide to allow case-insensitive IDs, I think we would want to add resources for downcased identifiers that redirect to the canonical URI. |
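The URI argument above can be illustrated with a short Python sketch. The base URI `http://spdx.org/licenses/` and the helper name `license_uri` are assumptions made for illustration and should be checked against the RDF chapter of the spec version in use. With case-sensitive IDs, building the URI is plain concatenation; a non-canonical ID produces a different URI, which is why a case-insensitive tool would need a copy of the license list (or server-side redirects) to recover the canonical one.

```python
# Assumed base URI for listed licenses; verify against the spec in use.
LICENSE_BASE_URI = "http://spdx.org/licenses/"


def license_uri(license_id: str) -> str:
    """Build a license URI by concatenation; no license list is needed."""
    return LICENSE_BASE_URI + license_id


canonical = license_uri("Apache-2.0")
downcased = license_uri("apache-2.0")

# URIs are compared case-sensitively, so these identify different resources.
print(canonical)
print(downcased)
print(canonical == downcased)
```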
I always receive complaints about this case sensitivity. From Fedora people, from others. Case does not matter at all since every ID is unique ignoring case and the keyword case does not matter either. We should have a canonical representation of an expression (which can be specific case-wise) but mandating using a certain case for something that does not need it is just a barrier to use and adoption IMHO. |
The current style for case-sensitivity is a major annoyance for identifying tags and expressions. From my point of view, I want expressions to be clearly distinct from license identifiers (many of which are initialisms, so are in all capitals, or at least begin with a capital letter). Thus, my preference is that expression terms (such as |
On Sat, Feb 17, 2018 at 09:33:02AM -0800, Neal Gompa (ニール・ゴンパ) wrote:
The case-sensitivity is a major annoyance for identifying tags and
expressions.
From my point of view, I want expressions to be clearly distinct
from license identifiers (many of which are initialisms, so are in
all capitals, or at least begin with a capital letter). Thus, my
preference is that expression terms (such as `and`, `or`, `with`, or
`without`, etc.) should be lowercase while license tags are either
title case (if they are words) or all caps (if they are
initialisms/acronyms).
I expect folks are much more likely to be reading someone else's
license expression than to be reading one they wrote themselves. If
the average reader finds lowercase operators easier (which is what you
and @jeffmcaffer [1] seem to be suggesting), I think that's an
argument for “require lowercase operators”, not for “let authors
choose whatever casing they like, including ‘aNd’, etc.”. Of course,
you couldn't flip that switch immediately, we'd want to deprecate the
uppercase operators and SHOULD the lowercase operators for some long
transition period. But we could get to MUSTing lowercase operators
eventually. Are lowercase operators so much easier to read that a
transition like that is justified? Personally I'm fine with both
‘AND’ and ‘and’, but would rather not have to flip back and forth.
Human readers aside, allowing both uppercase and lowercase operators
(for a transition period or permanently) is technically easier than
allowing case-insensitive identifiers because the set of operators is
much more stable. You don't have the “what is the canonical case for
this identifier which I don't recognize?” issue [2] for operators.
[1]: #63 (comment)
[2]: #63 (comment)
|
An additional data point for tools developer impact. I created a pull request for the SPDX tools to make the entire license expression parsing case insensitive. See spdx/tools#153 Bottom line from the work: completely ignoring case would be a moderate amount of work for any tool that would like to preserve the proper case for license ID's for human readable purposes or to comply with the RDF spec. Details: The tools already allow all uppercase and all lower case operators (e.g. It was easy to update the operators to completely ignore case. It was a moderate amount of work to ignore case on listed license ID's. I had to maintain a map of lowercase to SPDX license ID's and translate back and forth when displaying or interpreting licenses. Not a big deal, but a couple dozen lines of code which make the code a bit more complex. Similar to the listed licenses, local document license-ref's needed a hashmap from the lowercase to proper (or original) cased ID's. In the case of the SPDX tools, there was already a map of ID's to the extracted license objects, so it was a bit easier to update. |
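The lowercase-to-canonical map described above might look roughly like the following Python sketch. This is not the code from spdx/tools#153; the names `LISTED_IDS` and `canonical_id` are hypothetical, and the ID list is a tiny illustrative subset rather than the real SPDX License List.

```python
from typing import Optional

# Tiny illustrative subset, not the real SPDX License List.
LISTED_IDS = ["MIT", "Apache-2.0", "GPL-2.0", "BSD-3-Clause"]

# Map from the lowercased form back to the canonical casing.
_CANONICAL = {license_id.lower(): license_id for license_id in LISTED_IDS}


def canonical_id(candidate: str) -> Optional[str]:
    """Return the canonical casing for a listed ID, or None if unknown."""
    return _CANONICAL.get(candidate.lower())


for written in ["mit", "APACHE-2.0", "Bsd-3-clause", "NotALicense"]:
    found = canonical_id(written)
    if found is None:
        # The original casing is still available for the warning message.
        print(f"unrecognized identifier: {written}")
    else:
        print(f"{written} -> {found}")
```

Because only the lookup key is lowercased, the author's original spelling is still on hand when reporting an unrecognized identifier, which avoids the problem of downcasing the whole expression before parsing.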
@goneall @kestewart has this issue been resolved? If not, can I work on it? |
@salicodes We should wait until we have consensus on the specification before working on the solution. There is probably enough discussion to add this as an agenda topic to an upcoming SPDX technical meeting. Once resolved, we would welcome the help in the spec and also the tools. |
Discussed on tech call on 6/5/2018: Need to respect the case for the license ID's since they translate to URI's in RDF. There are also other use cases that may break other parsers. Note: license identifiers must be unique ignoring case. The spec can be strict on operator case sensitivity, but tool implementations are suggested to allow case insensitivity. Operators will be case sensitive in the spec. TODO: Create a pull request to update the spec, just adding a sentence (the ABNF is already case sensitive). |
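The "unique ignoring case" requirement noted on the call is easy to check mechanically. The Python sketch below is an illustration only: the function name `case_collisions` is hypothetical, and the sample input reuses the LicenseRefAlice / LicenseRefalice example from earlier in this thread. It groups identifiers by their lowercased form and reports any group that contains more than one distinct spelling.

```python
from collections import defaultdict
from typing import Iterable, List


def case_collisions(identifiers: Iterable[str]) -> List[List[str]]:
    """Return groups of identifiers that differ only by case."""
    groups = defaultdict(list)
    for identifier in identifiers:
        groups[identifier.lower()].append(identifier)
    # A collision is a group holding more than one distinct spelling.
    return [group for group in groups.values() if len(set(group)) > 1]


ids = ["MIT", "Apache-2.0", "LicenseRefAlice", "LicenseRefalice"]
print(case_collisions(ids))
```

A check like this could run over the license list plus a document's own LicenseRef identifiers before a tool enables case-insensitive matching.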
Encouraging case preservation while allowing for case-insensitive comparison matches the spec position discussed in [1]. Note that this is just the list commitment. It *allows* the spec, tools, and other list consumers to decide to be case-insensitive, but does not require them to be either case-sensitive or case-insensitive. [1]: spdx/spdx-spec#63
This is a large diff, but I aimed for restructuring/polishing without changing the end result (much). I did make a few intentional changes:
* Extended license-id to include appendix I.3 (deprecated licenses). We don't want folks using these in license expressions (because they're deprecated), but they are valid (or we would have removed them instead of just deprecating them). That means that in some cases the nature of a string is unclear. For example 'GPL-1.0+' could be the deprecated license-id, or it could be a simple-expression using the more-recently-deprecated GPL-1.0 license-id and the + operator. I don't think that's a problem though, because I can't think of a case where the ambiguity would matter.
* I've allowed + for license-ref (it used to be only for license-id). There could be external licenses which offer a choice between only-this-version and or-later grants, and allowing + for license-ref makes it easier to support those licenses as they transition into the SPDX License List. This isn't a big deal, but it avoids needing separate license-refs for the only-this-version and or-later grants if you need both.
* I've added explicit whitespace handling, vs. the previous version which just discussed it in the text. That way the ABNF is the sole source of normative syntax information.
* I've added a paragraph addressing casing, based on discussion in [1].
* I've added enclosed-license-expression, so consumers like the tag:value format can suggest/require it. This allows for more precision in consumers (e.g. appendix V should be updated to require enclosed-license-expression), but I've left those other sections alone for this commit. Ideally the tag:value line would be moved to a separate section that defined the tag-value format, but we don't have such a section yet [2].
* I've added Gary's documentation for spdx:OrLaterOperator [3]; previously there was no way to represent the + operator in RDF/XML.
* I've added Gary's documentation for spdx:WithExceptionOperator [4]. I think it's a bit odd that the XML operator representations are using URLs instead of the SPDX IDs that the license expression syntax calls for. That means you cannot convert between the two representations without an ID <-> URL map. But we can address that later.
* I've removed spdx:LicenseException, because we currently provide no other way for authors to define license exceptions. We do define a way for them to define their own licenses [5], and currently authors have to use that to give a LicenseRef to a license+exception pair if their exception is not in our list. Gary feels like we may return to this later (and I'd be happy giving users a way to define their own exceptions), but we're removing it for now [6].
* I've fleshed out the documentation for the + operator to explain how it works with the AGPL-1.0. Without this explanation, I think there's a risk that folks misinterpret ${ROOT}-${BASE_VERSION}+ as "allows ${ROOT}-${VERSION} for any ${VERSION} >= ${BASE_VERSION}", but that's not true. Instead the proper interpretation is "allows ${ROOT}-${VERSION} and any other licenses allowed by ${ROOT-VERSION} which are based on 'any later version' grant". For example, if the AGPL-2.0 had not been released, you could distribute AGPL-1.0-or-later code under the GPL-3.0-or-later, but *not* under the AGPL-3.0-or-later.
The HTML comment avoids the ambiguous four-space indent after the list. Without the comment, it could be parsed as a code block (which is what we want) [7] or a second paragraph of the final list entry [8] (which is not what we want). The HTML comment closes the list to resolve the ambiguity.
[1]: spdx#63
[2]: spdx#22 (comment)
[3]: spdx#37 (comment)
[4]: spdx#37 (comment)
[5]: https://github.com/spdx/spdx-spec/blob/cfa1b9d08903befdf03e669da6472707b7b60cb9/chapters/6-other-licensing-information-detected.md#6.1
[6]: spdx#37 (comment)
[7]: https://daringfireball.net/projects/markdown/syntax#precode "To produce a code block in Markdown, simply indent every line of the block by at least 4 spaces or 1 tab"
[8]: https://daringfireball.net/projects/markdown/syntax#list "Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab"
We (Haskell's |
Reading more carefully, the matching guidelines say
I'm slightly confused. I guess always producing identifiers as they are written in the License List, but being lenient in the parser, is a safer approach for tooling. I.e.
|
@phadej I think you were correct on your first assumption that the matching guidelines should not be used when parsing license expressions or short identifiers. |
I'll submit a PR to add the sentence clarifying that it needs to be case-sensitive, per @goneall's comment at #63 (comment) |
This commit attempts to reflect the outcome of the discussion at spdx#63 regarding whether license expression operators and identifiers should be matched in a case-sensitive manner. Specifically it attempts to reflect the comment at spdx#63 (comment) regarding the outcome of the tech team discussion on 2018-06-05. Signed-off-by: Steve Winslow <[email protected]>
This has been moved from bugzilla: https://bugs.linuxfoundation.org/show_bug.cgi?id=1327
Kate Stewart 2015-11-19 19:00:15 UTC
Jilayne wrote:
in http://lists.spdx.org/pipermail/spdx-tech/2015-November/002905.html
in http://wiki.spdx.org/view/Technical_Team/Minutes/2014-09-16#Case_sensitivity_for_license_information - the tech team discussed this on 16 Sept 2014, note saying “License ID’s case sensitive”
and then the legal team discussed it - http://wiki.spdx.org/view/Legal_Team/Minutes/2014-09-18 - and concluded:
• Mark raised issue of whether SPDX License List short identifiers and (new) license expression operators should be case sensitive with the Tech Team and discussed further here: decided that for purposes of spec, in terms of a legitimate value, both could be case insensitive (but best practice would be to display with precise capitalization). Mark to go back to tech team with this decision.
So… looks like maybe we didn’t really capture this elsewhere? In any case, I don’t see a reason to have them be case sensitive in terms of matching (for tools), but have them display with the upper/lower case as they are shown in the SPDX License List - it’s easier for humans to read/spot :)
Kate Stewart 2015-11-19 19:01:50 UTC
I'll add it to the 2.1 version of the spec. Also consider adding this as an addendum/erratum for 2.0.
Kate Stewart 2015-12-22 18:13:49 UTC
Discussed on 12/22 - no concerns, going forward with documenting.
Bill Schineller 2016-05-10 17:53:56 UTC
didn't jump out at me where / if we made edit yet to SPDX 2.2
todo
Kate Stewart 2016-05-17 17:01:29 UTC
Have proposed edit to 6.1, and Appendix I. Let's review.
Kate Stewart 2016-05-17 17:14:40 UTC
In discussion, some concern about other tools and matching in future.
Circling back this discussion to include Mark Gisi.
Bill Schineller 2016-05-17 17:15:33 UTC
fwiw:
from http://lists.w3.org/Archives/Public/www-rdf-interest/2003Aug/0002.html
RDF is case-sensitive. From the last call Concepts working draft:
An upper-case 'A' and a lower-case 'a' are different Unicode characters.
Bill Schineller 2016-05-24 17:13:32 UTC
Kate / Jilayne agreed to leave the Spec language as-is for 2.1
as-is means 'it IS case-sensitive'
leaving ticket open with Version 'unspecified' in case we want to revisit in the future.
We were reluctant to make case-insensitive now for 2.1 without understanding the impacts case might have on URIs (website, other tools, RDF graphs, ...)