input: Change titles to objects #323
This makes titles objects, with properties for full, main, sub, sub-sub, alternate and short. Also removes -short title variants.
Added.
If we're going to go this route, we might as well do it right, and an alternate title is not a subtitle. Practically speaking, a) they're formatted differently, and b) what happens if you have a title and subtitles and an alternate title? It's certainly feasible. |
Also, c) what if alternate has one or two subtitles? |
Yeah, I wondered about that :-) I made the decision, perhaps unwise (IDK), that this would be unlikely. WDYT? |
I don't really know. May be unlikely enough, but if we decided that these things are not subtitles or second subtitles, but fall rather in a third category, then this category should be parallel to normal titles and similar in structure.
title-ver1:
  main: First main title
  sub: First subtitle
  alternate-main: First alternate title
  alternate-sub: First alternate subtitle
title-ver2:
  main: First main title
  sub: First subtitle
  alternate:
    main: First alternate title
    sub: First alternate subtitle
title-ver3:
  - main: First main title
    sub: First subtitle
  - main: First alternate title
    sub: First alternate subtitle
Or even:
title:
  main: First main title
  sub: First subtitle
alternate-title:
  main: First alternate title
  sub: First alternate subtitle
None of these are really pretty I have to admit... Other options? |
In my reading of it, once this hits the processor for citation formatting, the "full" isn't really there to support unparsed titles. It's there for rendering in styles that don't normalize the punctuation. With this model, unparsed titles are handled at the preprocessor stage and parsed into their components. In APA, for example, the processor would:
Another option would be to add
I'm just really not sure (a) is the case. Looking at MLA and Chicago, such "alternate" titles are formatted just like subtitles. The only difference is the odd delimiter (b) and (c) can be addressed by allowing
or:
|
Yet another option, with specified punctuation:
title:
  main:
    content: "Finis Coronat Opus"
    delimiter: ": "
  sub:
    - content: "A Curious Reciprocity"
      delimiter: "; "
    - content: "Shelley’s “When the Lamp Is Shattered”"
Or, as said before, just include the punctuation in the text fields:
title:
  main: "Finis Coronat Opus: "
  sub:
    - "A Curious Reciprocity; "
    - "Shelley’s “When the Lamp Is Shattered”" |
If we did the punctuation at the end of the titles, then we could specify that |
But, of course, that would again add complexity for citeprocs as they would have to extract the punctuation. |
That said, it seems like a favorable balance between data model complexity and processor complexity. Processors need to be able to substitute delimiters anyway, and this is simpler than needing to compare full to main/sub for capitalization. |
I just posted a note on the main issue asking us to clarify the requirements, which might help us settle this.
Exclamation and question marks would be exceptions though. In this approach, processors would still need to deal with trailing punctuation. |
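To make the "extract the punctuation" step concrete, here is a minimal sketch of what a processor might do if delimiters were stored as trailing punctuation. The function name `split_trailing_delimiter` and the set of characters treated as delimiters are assumptions for illustration, not part of any proposal in this thread. Question and exclamation marks are left in the content, in line with the exception noted above.

```python
def split_trailing_delimiter(text):
    """Split a field such as "Finis Coronat Opus: " into (content, delimiter).

    Hypothetical helper: only space, colon, semicolon, period and comma count
    as delimiter characters. "?" and "!" are not stripped, so they stay in
    the content.
    """
    content = text.rstrip(" :;.,")
    return content, text[len(content):]


print(split_trailing_delimiter("Finis Coronat Opus: "))
print(split_trailing_delimiter("Who Cares? "))
```

One known weakness of this naive approach: a title that legitimately ends in a period (an abbreviation, say) would lose it, so a real processor would need a more careful rule.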
It’s pretty similar to existing double-punctuation requirements (e.g., to not add a period at the end of a title). If we specified default delimiters and made normalize-title-delimiters a Boolean attribute, that makes a common set of requirements that line up with the existing double-punctuation step. E.g.,
|
So am I correct you guys are thinking we want to remove |
A list/object, not an array, but yes. |
We're talking json schema, which doesn't have lists.
Array; right?
|
I mean Not |
That's not valid JSON ;-) The second is. An object is an unordered list of key values. An array is an ordered list. The YAML examples that Denis posted are arrays. |
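For anyone following along, the object/array distinction can be checked quickly with Python's standard `json` module. The two sample documents below are made up for illustration (they reuse the placeholder titles from earlier in the thread) and are not the snippets that were being debated.

```python
import json

# A JSON object: key/value pairs, each key appears once.
as_object = json.loads('{"main": "First main title", "sub": "First subtitle"}')

# A JSON array: an ordered sequence, so the same key can recur across entries.
as_array = json.loads(
    '[{"main": "First main title"}, {"main": "First alternate title"}]'
)

print(type(as_object).__name__)  # dict
print(type(as_array).__name__)   # list
print(as_array[1]["main"])       # First alternate title
```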
On this, my thinking now is we should have the My suspicion is it won't be needed though. |
Given where we are ATM, what are we doing with the Keeping it? Modifying the description? |
To answer that, I think we need to answer the last question here: #324 How do we declare that a title variable is fully parsed vs needs preprocessing still? |
In a style, would alternate then be rendered with title form="long" and title form="sub"? In the unlikely event that both a sub and alternate title are present, what then? Finally, I am concerned about the label "alternate" and that it is likely to be confused by users (e.g., with translated-title). How about "second" instead? |
I was thinking we'd add an "alternate" title form:
<text variable="title" form="alternate"/>
<text variable="title" form="sub"/>
<text variable="title" form="alternate" prefix=", or "/>
Here's how I ended up there. Both MLA and Chicago talk in terms of "double titles," but the MLA website description begins "For an alternative or double title ..." And the convention to preface the second one with the "or" in English clearly signals that it's an "alternate." It's "second" only in the context of output formatting. What does APA say on this? Do you have any view on this question @denismaier? |
I am not sure this is ideal. That would mean you will have to have three forms in a style.
<text variable="title" form="main"/>
<text variable="title" form="sub"/>
<text variable="title" form="alternate" prefix=", or "/>
And |
That's why I suggested we could just treat titles as array:
title:
  - main: First main title
    sub: First subtitle
  - main: First alternate title
    sub: First alternate subtitle
Question is though if we really need the full semantic representation. |
I really don’t think we should add a form="alternate". That is way too big a burden on style authoring. The most typical way for these to be cited is “Dr. Strangelove, or How I learned to stop worrying and love the bomb.” form=“long” should render that. I’m going to reiterate my point here that it is only a few styles (Chicago, MLA) making any point of this at all. There, this second title is functionally equivalent to a subtitle with a special delimiter: “or” with some punctuation. These titles are capitalized and otherwise formatted like a subtitle. APA doesn’t really treat this case specially, and the capitalization of “or” is inconsistent. Most examples have “or” after a colon, so it gets capitalized. In modern applications of this convention, such as journal articles using the “or how I learned to stop worrying and love the ...”, it’s just treated as a subtitle. https://scholar.google.fr/scholar?hl=en&as_sdt=0%2C10&q=or+how+I+learned+to+stop+worrying+and+&btnG= I suggest we just not cover this at all. Drop “alternate” and let the “double” title get parsed into main or sub as follows the general punctuation rules. In the vast majority of cases, this will work out fine. If a user needs to, they can manually indicate a subtitle split to control capitalization/etc. Very few styles seem to normalize away from a colon, so I don’t think we need to worry about the case of |
I'm with @bwiernik here regarding "alternate". I think adding "alternate" would not have any practical benefits, on the contrary. Basically, I'd say we're just dealing with a list of "title-parts", of which we will want to render all or only a subset.
title:
  parts:
    - content: Main
      delimiter: ": "
    - content: Subtitle
      delimiter: "; or: "
    - content: "Some Alt Title"
  short: Main
If we want to align the input schema with the styles schema then let's use this:
title:
  main:
    content: "Finis Coronat Opus"
    delimiter: ": "
  sub:
    - content: "A Curious Reciprocity"
      delimiter: "; "
    - content: "Shelley’s “When the Lamp Is Shattered”"
  short: "Opus"
(Not sure regarding the punctuation yet; we could just include it in the text fields to reduce the complexity of the schema, but ending the field with a space looks a bit odd.) |
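As a rough illustration of the "list of title-parts" idea, the sketch below renders either all parts, a leading subset, or the short form. The function name `render_parts`, its `count` and `form` parameters, and the fallback delimiter of ": " are assumptions made for the example; only the `parts`/`content`/`delimiter`/`short` keys come from the YAML above.

```python
def render_parts(title, count=None, form="long"):
    """Render a title stored as a list of parts (hypothetical helper).

    count limits rendering to the first N parts; form="short" returns the
    short title when one is present.
    """
    if form == "short" and "short" in title:
        return title["short"]
    parts = title["parts"][:count]
    pieces = []
    for i, part in enumerate(parts):
        pieces.append(part["content"])
        if i < len(parts) - 1:
            # A part's own delimiter wins; ": " is an assumed fallback.
            pieces.append(part.get("delimiter", ": "))
    return "".join(pieces)


title = {
    "parts": [
        {"content": "Main", "delimiter": ": "},
        {"content": "Subtitle", "delimiter": "; or: "},
        {"content": "Some Alt Title"},
    ],
    "short": "Main",
}

print(render_parts(title))                # Main: Subtitle; or: Some Alt Title
print(render_parts(title, count=2))       # Main: Subtitle
print(render_parts(title, form="short"))  # Main
```

Note that the delimiter of the last rendered part is dropped, which is what makes rendering a subset work without leaving dangling punctuation.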
Just a couple of questions/remarks to make sure we're on the same page:
I don't think that's an issue.
Just to make that clear: My suggestion was not to allow delimiters as trailing punctuation, but for this to work they must be accessible somehow.
What do you mean with that? What kind of hacks? I don't think this was suggested somewhere...
So, say we don't add support for alternate this time: What will that look like in practical terms?
Or just this:
? |
Maybe English language styles don't do that, but as soon as you switch to a German style that's a different story. What speaks against just treating those "or"-constructs as a special character just like "?" and "!" ? |
Any parsing by a preprocessor would happen based on the punctuation. For German styles, would it really be wrong for the colon to be replaced by a period in the Dr. Strangelove title? If so, in that unusual situation, the user can ensure that the whole title is entered into main.
To be clear, Bruce, I was never suggesting these be treated any differently than any other subtitle. I was suggesting we treat them exactly like any other subtitle. |
If splitting happens on a ":", and ", or:" is not treated as a single token then this may lead to strange results because the colon may then possibly be normalized to a period. If we added ", or : " as a whole to the list of splitting characters that should not be normalized (just like question marks and exclamation marks),
I thought so too. |
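To illustrate the single-token concern, here is a naive preprocessor sketch that tries the three "or" delimiters mentioned in this thread before it falls back to splitting on a plain colon, and marks "or" splits as protected from normalization. Everything here (`split_title`, `OR_TOKENS`, the `protected` flag) is hypothetical and only meant to show the ordering issue, not how any citeproc actually parses titles.

```python
# "Or" delimiters are tried before the plain colon, so ", or: " is consumed
# as one token instead of being split at its colon.
OR_TOKENS = ["; or, ", ", or: ", ", or "]
PLAIN_TOKENS = [": "]


def split_title(full):
    """Split a full title into main/sub on the first matching token."""
    for token in OR_TOKENS:
        if token in full:
            main, sub = full.split(token, 1)
            return {"main": main, "sub": sub, "delimiter": token, "protected": True}
    for token in PLAIN_TOKENS:
        if token in full:
            main, sub = full.split(token, 1)
            return {"main": main, "sub": sub, "delimiter": token, "protected": False}
    return {"main": full}


print(split_title("Dr. Strangelove, or: How I Learned to Stop Worrying"))
print(split_title("Finis Coronat Opus: A Curious Reciprocity"))
```

The obvious limitation is that ", or " also occurs in ordinary lists ("Apples, oranges, or pears"), so matching on words rather than punctuation would produce false positives without stricter rules.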
That’s my question. Is that strange? Dr. Strangelove, or. How I Learned to Stop Worrying. Is that different from They seem similar to me, both have sentence fragments following a period |
And the simplest way for a user to avoid a colon becoming a period would be to enter the title in the Chicago-preferred format with commas instead. That wouldn't get normalized in any case. |
Having a period after "or" looks definitely strange to me, and much more than in the other case. |
The practical import of what I was saying about not explicitly supporting alternate is that we would not include any "or" examples in documentation, but people can figure out on their own it worked for that purpose.
Treating it as subtitle means, by definition, "or" is treated as a string, or at least as a string delimiter. You couldn't localize that without special, localized and brittle, parsing rules. Treating it as a delimiter that links subtitle and alternate means you can localize. Or do you think it's not a practical issue; that people won't care? I'll try to catch up on the punctuation issue. |
Exactly. I don't think you would ever want to localize that delimiter: |
On punctuation, I just want to point out that you guys are getting tripped up on a feature you said we're not formally supporting. It's straightforward to say in the spec something like:
But it seems another thing to look beyond punctuation to actual words, which may be in different languages. Do we have a consensus of where we've ended up on this, that one of you could summarize? Or do you want to think about this a bit more? I'd like to figure this out this week, but don't see a huge hurry. Better to get it right. And, of course, we will have a comment period to hopefully get feedback. |
I think these cases are rare enough that we can just not worry about it. If a user is really concerned, they can structure their data to ensure no punctuation changes or incorrect capitalization: Given the rarity of these cases where this would be necessary, I think avoiding the complexity is a bigger benefit than trying to accommodate it. If we accommodated it across styles versus as a configurable feature, I can also imagine user confusion. |
LGTM. The only problem with this is that we have no control over how calling applications will pass their data to a citeproc. Will So to summarize this using the examples from the Chicago Manual: Titles like those will just be parsed like so, right?
a) ", or "
b) "; or, "
c) ", or: "
a) is entirely unproblematic. Is that all correct? |
From the standpoint of CSL,
Yes, but in the end, it would be a user choice. The only wrinkle with this choice is if you need to print a short variant, you probably want to drop the alternate. But again: if we're not covering this case, I don't think we should use these examples to see how well it works. It seems you're trying to slip the feature back in, convinced it will work, but I think you're mistaken. Why I introduced
title:
  main: "A title, "
  alternate: "How I learned to stop worrying and love ..."
This would (assuming rules for dropping of trailing punctuation) require no user tweaking you noted below, and does not have the problem I note with the above. Why is this a bad idea again? ...
Not entirely, per my point above.
Mostly. |
These cases are rare enough that I don’t think we need to trouble ourselves over them.
A user who is concerned here would enter the “or How I learned” part in sub (using whatever mechanism the application provides to do that: fields, syntax parsing, etc.; that’s not our concern). The comma wouldn’t be normalized.
I don’t think this is a problem. I’ve definitely seen “Title. Or other title” in German journals. The “or” part can be case-protected with Markup if needed.
The user would place the “or” into the The user intervention is to place “or” in the main
I have a few issues.
Given the inconsistent treatment of these across styles and their rarity, I don’t think the complexity is worth it. |
Exactly. That was what I was trying to say. This is beyond CSL. But we need to assume that it will be possible somehow.
Well, I'm just trying to figure out how these titles will be processed. My understanding is those titles will arrive either only in "main" (as in this case "A title, or How I learned to stop worrying and love ...") or the "alternate" part appear in "sub". (I assume some automatic pre-processing will happen somewhere along the way.)
I tend to agree with the points mentioned by @bwiernik. |
Automatic preprocessing will happen somewhere in the chain. If a user finds that a problem, they can resort to manually splitting the fields, either in separate application fields, with a splitting syntax, etc. That’s all beyond the scope of this PR. The upshot is that a user can accommodate the rare case if needed. |
I just pushed a commit that removes Final issue to discuss: How do we see this interacting with things like |
That would actually be a good fit for |
This is RTM then. I'll give @cormacrelf a bit of time if he wants to weigh in. |
Do we still need |
That was actually only partly a joke... I imagine we could do something like this:
title:
  main: "Война и миръ"
  alternate:
    - main: "War and peace"
      type: "translated"
    - main: "Vojna i mir"
      type: "transliteration" |
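A quick sketch of how a consumer might pick one of those typed alternates, assuming the structure suggested just above. The helper name `pick_alternate` is made up; the `alternate`, `main` and `type` keys and their values are taken from that YAML.

```python
def pick_alternate(title, kind):
    """Return the main text of the first alternate with the given type, or None."""
    for alt in title.get("alternate", []):
        if alt.get("type") == kind:
            return alt["main"]
    return None


title = {
    "main": "Война и миръ",
    "alternate": [
        {"main": "War and peace", "type": "translated"},
        {"main": "Vojna i mir", "type": "transliteration"},
    ],
}

print(pick_alternate(title, "translated"))       # War and peace
print(pick_alternate(title, "transliteration"))  # Vojna i mir
```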
@@ -20,20 +20,22 @@
{
  "full": "Finis Coronat Opus: A Curious Reciprocity; Shelley’s “When the Lamp Is Shattered”",
  "main": "Finis Coronat Opus",
Where have we settled regarding punctuation? Shouldn't we include the colon in "main"?
"main": "Finis Coronat Opus: "
(Of course only if the colon appears in the original...)
Keeping with the "let's keep simple things simple" approach, I think there should be a convention, for styles that require "original punctuation", to add a default delimiter if one doesn't exist, so that adding colons and such (by FAR the most common convention) shouldn't be necessary.
I think this should still work elegantly if someone wants to maintain their bibliographic database in CSL YAML.
Ok. That might mean we will want to add an option normalize-delimiters="false". Then you could still add title-delimiter=": " to a style, which will then add a colon if no punctuation is present, but it will not override an existing period. (Exclamation marks and question marks will be protected anyway.)
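The behavior described here could be sketched roughly as follows. This is only an illustration under assumptions: `join_title` is a made-up function, its keyword arguments mirror the `title-delimiter` and `normalize-delimiters` options named above, and the set of characters counted as existing punctuation is a guess.

```python
PROTECTED = ("?", "!")


def join_title(main, sub, title_delimiter=": ", normalize_delimiters=True):
    """Join main title and subtitle (hypothetical helper).

    - "?" and "!" at the end of main are always kept, followed by a space.
    - Punctuation already in the data is kept when normalize_delimiters is False.
    - Otherwise the style's title_delimiter is used.
    """
    core = main.rstrip(" :;.,")
    own = main[len(core):]  # punctuation supplied with the data, possibly ""
    if core.endswith(PROTECTED):
        return core + " " + sub
    if own.strip() and not normalize_delimiters:
        return core + own.rstrip() + " " + sub
    return core + title_delimiter + sub


# No punctuation in the data: the style delimiter is added.
print(join_title("Finis Coronat Opus", "A Curious Reciprocity"))
# Existing period is kept when normalization is off.
print(join_title("Finis Coronat Opus. ", "A Curious Reciprocity",
                 normalize_delimiters=False))
```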
Should title delimiters also be defined in locales?
Like, in English, it would be a colon.
Better not do this. If I use Chicago, I'll always want to have colons as title-subtitle-delimiter, even when writing in German. But an institutional style from a German university will perhaps use periods.
We can allow it in a locale, but in one locale only so that there really is one standard delimiter (but one per locale). Styles that require a different delimiter can still override this.
In that case, we can just define colon as the default delimiter, and semi-colon as the default sub delimiter.
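For what it's worth, the precedence being floated (built-in defaults, optionally a locale value, with the style having the last word) might look like the sketch below. The key name `sub-delimiter` and the function `resolve_delimiters` are placeholders invented for the example; only `title-delimiter` and the colon/semicolon defaults come from this thread, and whether locales should be involved at all is still an open question here.

```python
DEFAULTS = {"title-delimiter": ": ", "sub-delimiter": "; "}


def resolve_delimiters(locale=None, style=None):
    """Merge delimiter settings: defaults, then locale, then style (style wins)."""
    resolved = dict(DEFAULTS)
    resolved.update(locale or {})
    resolved.update(style or {})
    return resolved


print(resolve_delimiters())
print(resolve_delimiters(locale={"title-delimiter": ". "}))
print(resolve_delimiters(locale={"title-delimiter": ". "},
                         style={"title-delimiter": ": "}))
```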
It seems his concern was about user experience; unexpected behavior.
I'd agree that should be an important consideration, but that this (default delimiters) should be a part of it.
@georgd What do you think about this? I remember you've had doubts regarding localizing punctuation...
My impression is that the delimiter is a locale-independent property of the style – at least I’ve never come across anything else. A style that requires colon in English will be used with colon in a German document as well, as @denismaier said. Localisation for special typographical conventions (like the space before colon in French) is not restricted to this case, likewise script specific localisation (e.g. ano teleia instead of colon in Greek script). That’s why I thought that a simple global property would be enough.
But what do you expect to do with non-latin titles? I suppose they’re currently not normalised?
Seems better and more consistent to just do this?
title:
  main: "whatever"
  translated:
    main: "War and peace"
  transliterated:
    main: "Vojna i mir"
Might need a language property though? Regardless of the details, should I include this too, or just keep them separate? I don't care; whatever seems best/most consistent. Or we can just not worry about it now. Edit: let's just keep as is. We can revisit later, if necessary. |
Lgtm. |
The thread I posted over here is based on an idea that maybe we don't have to choose between the two options we considered for this feature. |
Description
As an alternative to title parsing to support independent formatting of main and subtitles, this converts title strings to objects, with the following properties:
sub-sub
alternate
I added embedded examples to the schema, drawn directly from style guides (MLA and Chicago), and here's example docs generated from the schema.
addresses #310.
Alternatives
Alternatives I considered but rejected (though I don't feel strongly against them): using arrays for parts.
But, after subsequent discussion below, sub is now an array.
See Also
https://discourse.citationstyles.org/t/design-principles-for-csl-json/1671
Type of change