Add sub/main forms to json schema #310
Comments
For the prefixes, can we refer to an enumerated list of the title variables defined elsewhere in the schema? |
I think we should defer this.
|
Fine, but till when? But concerning the tests/documentation: Can I expect |
I think we assume available in styles via `@form`; so extracted from a full title.
|
Oh, and when: IDK.
Until we have some experience, from users and developers?
|
Yes, but
Depending on the settings, a citeproc will (incorrectly) produce:
So, we'll need to supply the main form explicitly in addition to the full form:
|
I understand that. But that's a hypothetical example. I'm saying I'd prefer to see what happens in the wild before requiring all titles to be splittable, in the data, upfront. It's just my impulse; if others feel strongly, we can consider those arguments. It does feel somehow wrong to have four different title variants in the actual data. To repeat history, we introduced the short variant, I am 99% certain, to handle main titles. Yes, it can be used for other purposes, but that was the primary idea, with it being flexible. |
I understand your point. Having used biblatex before, I'd rather just have I think for this feature to work reliably, we'd need to have this overriding mechanism. I think most users will not need to supply
Yes, but as, e.g. @adam3smith has already pointed out, and I completely agree with him, In any case, users being able to supply title parts in some way was a basic assumption of what @bwiernik and I have worked out. |
This is another one of those cases, like the debate about label and citekey, where whether that's the case is almost irrelevant. We do have this legacy that was based on this logic, so we have to design with it in mind. By far the most common example is this sort of pattern:
For that, main title = short title. So at minimum, we need to explain the difference in documentation. |
I'm not so sure most users will have the main part of the title in
Grazer, Brian, and Charles Fishman. A Curious Mind: The Secret to a Bigger Life. New York: Simon & Schuster, 2015.
Borel, Brooke. The Chicago Guide to Fact-Checking. Chicago: University of Chicago Press, 2016.
Keng, Shao-Hsun, Chun-Hung Lin, and Peter F. Orazem. “Expanding College Access in Taiwan, 1978–2014: Effects on Graduate Quality and Income Inequality.” Journal of Human Capital 11, no. 1 (Spring 2017): 1–34. https://doi.org/10.1086/690235
Mead, Rebecca. “The Prophet of Dystopia.” New Yorker, April 17, 2017.
Rutz, Cynthia Lillian. “King Lear and Its Folktale Analogues.” PhD diss., University of Chicago, 2013.
Of course, this needs to be documented accordingly. |
The other option I was wondering about is whether it'd be feasible to add some sort of split instruction to the full title, as part of the sub-field formatting, akin to preserving case. I'm not thrilled with the idea, but think it worth considering, given the need to override auto-splitting should be rare. |
So in that particular case, there's no need to change anything! |
In the proposal for this feature, we proposed to split multiple subtitles in |
So in the common (perhaps 99% or more) case, titles stay the same, and in the other case, one could just do ... title: Some Weird Title ... || With a Subtitle ...? |
Hopefully, yes.
|
Yes. That is also similar to citeproc-js existing syntax for separating family and given names when names are entered as a key value pair: author: Jones || Davey |
Just that this would be used on the standard @bwiernik Was there a reason we did not consider this option in the first place? |
So then details depend on the spec language. We could have something like:
So a processor would be splitting on whatever default split characters, and/or what is defined in locale and/or style, or ||? Does that mean a full title is not rendered directly, but is always reassembled from the split title? |
Current proposal here says (point 3):
So, the answer to your question is yes. Citeprocs will always split titles into main and sub, and then reassemble. We could add a new option for |
Split characters are defined with |
Then a processor will just, for example, internally have some variable of characters to split on, and So the splitting can be auto or manual, but not both? And what did we decide about sub-sub titles? Is this valid to do internally?
>>> import re
>>> split_characters = re.compile(r'[?,:]')
>>> split_characters.split("One: Two? Three")
['One', ' Two', ' Three'] |
And yes, concerning sub-sub titles: that's mainly it. "One: Two? Three" will be split into:
|
So the split characters are either defined in the style OR In any case, this seems like the direction that's sensible. It would mean just a small change to the spec, and no change to the input schema. |
Yeah, there was no reason to not do this in the first place. Just didn’t occur to me. This is better. |
Yes, processors will need to check if there are explicit split-points defined with |
So to be more precise, it would split on ||, and only do a second pass if no array output.
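The two-pass idea described here could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `split_title` and the constants are hypothetical, the default split characters are copied from the regex earlier in this thread, and a real processor would presumably take them from the style or locale instead.

```python
import re

# Assumption: default split characters as in the regex earlier in this thread.
DEFAULT_SPLIT = re.compile(r"[?,:]")
# Manual split marker discussed in this thread.
MANUAL_MARKER = "||"


def split_title(title: str) -> list[str]:
    """Hypothetical helper: split on the manual marker first and only fall
    back to the default split characters if no marker is present."""
    if MANUAL_MARKER in title:
        # First pass: explicit split points win and suppress auto-splitting.
        parts = title.split(MANUAL_MARKER)
    else:
        # Second pass: heuristic split on the default characters.
        parts = DEFAULT_SPLIT.split(title)
    # Trim surrounding whitespace and drop empty fragments.
    return [part.strip() for part in parts if part.strip()]
```

Note that this sketch discards the split characters themselves (for example the `?`), so a processor that must reproduce the original punctuation would need to keep them.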
|
So, summing this up: I will start drafting the documentation and the test based on these assumptions:
|
I put this placeholder in this PR I just pushed, where we can include this. |
I like one pattern per line, so you can have |
Just noting here another possibility that people might wonder about:
Allowing titles to be an object (sub/main) or array.
So, per discussion in the rich text issue, expect the apps to create the pre-parsed data.
I'm not saying we should do this, but it does occur to me we'll need to provide for a longer review period for 1.1 in general, so people can consider all of these changes.
And you might try to prepare the PR in a way that it would be easier to change, if people would prefer that option.
|
We considered this already, but @dstillman was not really in favour. This makes things really complicated on that side. Also, how will users override the heuristics then? |
That would perhaps be an option with Zotero and pandoc, but I'm less optimistic with other apps. That's why we thought it best to implement this in the processor. |
Probably the best approach for zotero et al is to do titles like they do names. |
I think both default parsing behavior and providing a common syntax for users to override default parsing behavior are necessary in the processor for CSL to be at its best with typical bibliographic data in the wild. |
In the end, the decision will come down to who's responsible for parsing: user, client app, intermediate tools, or CSL processor.
I also raise this now because I think it overlaps with the rich text discussion (#315).
|
I think those issues really aren't that related. "Parsing" involves many things, and not all of them have the answer. I'm coming around to thinking that rich text markup might be something the processor needn't necessarily worry about (I'm doing some investigation into what journals provide there). But parsing of titles is much more similar to testing So, for the many places where CSL is used outside of a person writing a manuscript, such as Cite this For Me or Open Science Framework, the only tool in the chain is the citation processor. Asking every potential little application adopting CSL to roll their own title parser, name parser, etc. seems like a huge barrier to entry. |
It actually is connected, because titles would no longer just be strings. The processing thus would necessarily change.
But in the end, we really need feedback from developers, rather than to speculate. Why I started the thread on discourse, even if it's not that active these days.
|
Could we revise the issue description to include a concise list of requirements? I think that would help us make final decisions. For example:
Are those two correct? And then what about the other wrinkle that is making this so difficult? Is it that some styles require printing full titles without modifying the sub-component punctuation? So in those styles, would one also need the 1 requirement above to access components? Is this the only other requirement; so three? |
Yes. Chicago modifies punctuation, APA and Vancouver do not. Both types are common.
I don’t think we identified any style where separate formatting of main/sub AND keeping original punctuation were needed. We had planned for the CSL style syntax to not permit that—separate formatting of main and sub is accomplished using a The data model thus needs to provide:
Processors need to be able to:
|
If the way I stated the third requirement is correct (and you confirmed it is), isn't it more simple than this? Isn't it that the processor needs access to the full title, full stop? If yes, then your second requirement is not needed; what is needed is for the full title property to be filled. So some styles require title decomposition and recomposition, and some don't? Why I'm asking this question. |
I don’t understand what you are saying your “third requirement” is. Another way to put the requirement:
The casing requirements are why I'm not a big fan of including "full" as an element--that would require text comparison of "full" to "main" and "sub" to determine what to capitalize. |
To put these together with examples:
|
Just to clarify this part, I meant this from above:
But your explanation here further clarifies. |
OK, this is a key piece I was missing. So among styles which do not specify decomposition, if we have a full title, some will specify to modify casing, and others will specify to leave it alone. The problem this presents is with a full title, a processor won't have access to the sub-components, so it won't be able to modify the casing. @denismaier, do you think you could modify the main post to reflect this as clearly as possible, for the record? |
Yeah, if citeprocs needed to compare "full" to "main" and "sub" to capitalize properly they just could do the whole splitting operation on their own, which is what using objects here tries to avoid. |
You mean the original PR? |
I mean this thread; the top post.
… |
To support independent formatting of main and subtitles, this converts title strings to objects, with "full" and "main" string properties and a "sub" array (to support multiple subtitles). Also, moves "short" title variants to the new object. addresses #310
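As a rough illustration of the shape this description implies, a title could be modelled like this. The property names `full`, `main`, `sub` and `short` come from the description above; the `TitleObject` type and the `normalize_title` helper are hypothetical names used only for this sketch and are not part of the schema.

```python
from typing import List, TypedDict, Union


class TitleObject(TypedDict, total=False):
    """Assumed shape of a title object; every key is optional in this sketch."""
    full: str        # the complete title as entered
    main: str        # the main title
    sub: List[str]   # zero or more subtitles, in order
    short: str       # the short title variant, moved into the object


def normalize_title(title: Union[str, TitleObject]) -> TitleObject:
    """Hypothetical helper: wrap a plain string so callers always get an object."""
    if isinstance(title, str):
        return {"full": title}
    return title


example: TitleObject = {
    "full": "A Curious Mind: The Secret to a Bigger Life",
    "main": "A Curious Mind",
    "sub": ["The Secret to a Bigger Life"],
}
```

Accepting both plain strings and objects in the helper is a design choice made for the sketch, so that existing string-only data keeps working.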
This looks promising. This will work for title, right? What's the best way to add the prefix patterns? Does that work?
Or should we add one pattern per title?
Readability is better here, but it is redundant, of course. Maybe something like this?
Originally posted by @denismaier in #271 (comment)
It seems that we currently deal with sub/main forms in the rnc schema, but there's nothing on the input side. Shouldn't we add those?
(I was about to start writing the documentation for the split-title feature and also to prepare some tests. But it looks like there are still some open questions...)
Edit: Currently, it looks like we'll support this by changing titles to objects.