Add sub/main form values to cs:text #271

Merged
merged 4 commits into from
Jun 24, 2020

bdarcus
Member

@bdarcus bdarcus commented Jun 24, 2020

Description

This moves specification of split title components from dedicated variables to new "sub" and "main" @form values.

Removes the -short title variable variants as well.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

@bwiernik
Member

bwiernik commented Jun 24, 2020

Good call. cs:text is the only place where title variables can currently be rendered, so that's fine.

Is it possible to validate a style so that, when form="main" or form="sub" is used, the variable attribute on cs:text must be a title variable? More generally, what is the expected fallback behavior? To form="long" if form="main" is empty, and to empty if form="sub" is empty?

I think that the rendering-attribute.form.values, which always contains "long" and "short", makes the schema less clear. Is this change just a Don't Repeat Yourself idea in case we ever had another standard form?

@bdarcus
Member Author

bdarcus commented Jun 24, 2020

Is it possible to validate a style so that, when form="main" or form="sub" is used, the variable attribute on cs:text must be a title variable?

Yes it is. It's just not easy, so I want confirmation before doing it.

More generally, what is the expected fallback behavior? To form="long" if form="main" is empty, and to empty if form="sub" is empty?

I think so. The case that would be relevant here is a simple title like "A Book."
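
A rough sketch of that fallback, purely for illustration: the function name and the `forms` mapping are hypothetical and are not part of the schema or of any processor.

```python
from typing import Optional


def resolve_title_form(forms: dict, form: str) -> Optional[str]:
    """Return the text to render for a requested title form.

    `forms` is a hypothetical mapping of already-derived title parts,
    e.g. {"long": "A Book"}; it is not defined by the CSL schema.
    """
    value = forms.get(form)
    if value:
        return value
    if form == "main":
        # A title with no separate main part falls back to the full title.
        return forms.get("long") or None
    # An empty "sub" (or any other missing form) renders nothing.
    return None


simple = {"long": "A Book"}
print(resolve_title_form(simple, "main"))  # A Book
print(resolve_title_form(simple, "sub"))   # None
```

So for a simple title like "A Book", form="main" renders the whole title and form="sub" renders nothing.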

I think that the rendering-attribute.form.values, which always contains "long" and "short", makes the schema less clear. Is this change just a Don't Repeat Yourself idea in case we ever had another standard form?

Yes, but also, these two issues are, I think, related. I was planning to build on this to resolve the first point.

But I can do that other ways; it's not necessary.

@bwiernik
Member

I would like the validator to reject something like <text variable="volume" form="main"/>, so I think restricting the main and sub forms to the title variables would be good if it's a reasonable effort. If rendering-attribute.form.values helps with that, great.

This allows specification of formatting for split title parts; like
subtitles.
@bdarcus
Member Author

bdarcus commented Jun 24, 2020

I would like the validator to reject something like <text variable="volume" form="main"/>, so I think restricting the main and sub forms to the title variables would be good if it's a reasonable effort. If rendering-attribute.form.values helps with that, great.

Should work as expected now. Relevant change is here.

I removed the additional stuff I did earlier, so this is mostly just adding this.
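
To make the intended restriction concrete, here is a small sketch of the check outside the RNC. It is not the schema change itself. It assumes un-namespaced `text` elements and takes the title variables from the variables.titles list given later in this thread; `invalid_text_elements` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: the title variables listed under variables.titles in this thread.
TITLE_VARIABLES = {
    "collection-title", "container-title", "original-title", "part-title",
    "reviewed-title", "title", "translated-title", "volume-title",
}
TITLE_ONLY_FORMS = {"main", "sub"}


def invalid_text_elements(xml_fragment: str) -> list:
    """Return (variable, form) pairs that use main/sub on a non-title variable."""
    root = ET.fromstring(xml_fragment)
    problems = []
    # iter() also visits the root element itself if it is a <text> element.
    for element in root.iter("text"):
        variable = element.get("variable")
        form = element.get("form")
        if form in TITLE_ONLY_FORMS and variable not in TITLE_VARIABLES:
            problems.append((variable, form))
    return problems


layout = (
    '<layout>'
    '<text variable="title" form="main"/>'
    '<text variable="volume" form="main"/>'
    '</layout>'
)
print(invalid_text_elements(layout))  # [('volume', 'main')]
```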

@bdarcus bdarcus added this to the CSL 1.1 milestone Jun 24, 2020
@bdarcus
Member Author

bdarcus commented Jun 24, 2020

I also just removed the sub/main title variables.

Which raises a question I think I should ask at this point: should we move "-short" titles to this mechanism too?

@denismaier
Member

Yes. Short should be moved to this too. But short is relevant for more than just titles.
Also: how will we deal with this in the JSON?

@bdarcus
Member Author

bdarcus commented Jun 24, 2020 via email

@bdarcus
Member Author

bdarcus commented Jun 24, 2020

Actually, short variables are only on titles, so I think this only applies to titles?

@denismaier
Member

But the plan was to extend short to other variables as well, right?

@bdarcus
Member Author

bdarcus commented Jun 24, 2020

So two options (I don't care):

  1. keep as is, and have a separate PR specific to the broader "short" question
  2. add "short" specific to titles here, and the rest on a separate PR

Preference?

@bdarcus
Member Author

bdarcus commented Jun 24, 2020

In any case, when you guys are ready to LGTM, please approve the PR.

@bwiernik
Member

Let’s remove short from the RNC. It should stay in some form in the JSON. I can’t see how to access the review system on mobile, but this looks good to me.

@denismaier
Member

denismaier commented Jun 24, 2020

Git question: how can I checkout a PR for testing?

@bdarcus
Member Author

bdarcus commented Jun 24, 2020

Git question: how can I checkout a PR for testing?

Here's one, direct, way.

https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/checking-out-pull-requests-locally

Can also fetch the branch, then check it out.

@bwiernik
Member

git fetch origin pull/<PR number>/head:<local branch name>
git checkout <local branch name>

Member

@bwiernik bwiernik left a comment

Let's drop -short and then LGTM

@denismaier
Member

denismaier commented Jun 24, 2020

Concerning json input: @dhimmel has indicated elsewhere that we might find a solution that doesn't involve polluting the json schema with -short variables. Maybe that can be of use here?

I was only able to find this comment by @dhimmel on this topic, but maybe he can give some more hints on what would be involved, and on potential drawbacks.

Besides that, looks good to me.

The @form='short' variant can be used to access this, and so this
removes all the -short title variants.
@@ -122,10 +116,7 @@ div {
| "section"
| "source"
| "status"
| "title"
| "title-short"
| "translated-title"
Member

translated-title should be under "variables.titles", shouldn't it?

Member

@denismaier denismaier left a comment

shouldn't translated-title be moved to variables.titles?

@bwiernik
Member

Should original-title, part-title, translated-title be listed under title variables?

@bdarcus
Member Author

bdarcus commented Jun 24, 2020

Should original-title, part-title, translated-title be listed under title variables?

I was assuming no, so in the process of removing them from there.

@denismaier?

@dhimmel
Contributor

dhimmel commented Jun 24, 2020

I was only able to find this comment by @dhimmel on this topic, but maybe he can give some more hints on what would be involved, and on potential drawbacks.

@denismaier I don't understand this PR (and the RNC side of things) sufficiently to weigh in here.

It looks like there are four possible forms: { "short" | "long" | "sub" | "main" }. Anywhere I can read what these mean?

we might find a solution that doesn't involve polluting the json schema with -short variables.

As far as the CSL JSON goes, given that -short fields are already part of it, I'm not sure it makes sense to pursue an alternative. There are possibly some backwards-compatible alternatives, though, like allowing the following syntax:

container-title:
  long: PLOS Computational Biology
  short: PLOS Comp Bio

As opposed to the traditional:

container-title: PLOS Computational Biology
container-title-short: PLOS Comp Bio

But keeping this traditional syntax valid (at least for container-title) makes things more complicated for the citeproc processor.
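
To give a sense of that extra complexity, here is a minimal sketch of a lookup that would have to accept both shapes. `get_form` is a hypothetical helper, not an existing citeproc API, and the nested syntax is only the proposal above.

```python
def get_form(item: dict, variable: str, form: str = "long"):
    """Look up one form of a variable in either nested or flat CSL JSON."""
    value = item.get(variable)
    if isinstance(value, dict):
        # Proposed nested syntax: {"container-title": {"long": ..., "short": ...}}
        if form in value:
            return value[form]
    elif form == "long" and value is not None:
        # Traditional syntax: the bare field holds the long form.
        return value
    # Traditional flat syntax for other forms, e.g. "container-title-short".
    return item.get(f"{variable}-{form}")


nested = {"container-title": {"long": "PLOS Computational Biology",
                              "short": "PLOS Comp Bio"}}
flat = {"container-title": "PLOS Computational Biology",
        "container-title-short": "PLOS Comp Bio"}
print(get_form(nested, "container-title", "short"))  # PLOS Comp Bio
print(get_form(flat, "container-title", "short"))    # PLOS Comp Bio
```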

@denismaier
Member

Should original-title, part-title, translated-title be listed under title variables?

I was assuming no, so in the process of removing them from there.

Why no? As they are clearly titles, I guess they should be there as well.

@bdarcus
Member Author

bdarcus commented Jun 24, 2020

Should original-title, part-title, translated-title be listed under title variables?

I was assuming no, so in the process of removing them from there.

Why no? As they are clearly titles, I guess they should be there as well.

Because they were listed under the "strings" list, not under titles.

See the latest commit and confirm that looks right?

@bdarcus
Member Author

bdarcus commented Jun 24, 2020

I was only able to find this comment by @dhimmel on this topic, but maybe he can give some more hints on what would be involved, and on potential drawbacks.

@denismaier I don't understand this PR (and the RNC side of things) sufficiently to weigh in here.

It looks like there are four possible forms: { "short" | "long" | "sub" | "main" }. Anywhere I can read what these mean?

Not written yet. But, examples:

title: "Some Title: And a Subtitle"
main: Some Title
sub: And a Subtitle

"Short": when we originally defined it, I thought of it as an alias for what we are now introducing as "main." But it seems that for very long titles, they may differ.

These are style details, BTW; not necessarily translating to the input json, which would stay the same. We'll define heuristics to derive these subfields.
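
Since the heuristics are not written yet, the following is only a guess at the simplest possible one: split at the first colon followed by a space. `split_title` and its `separator` parameter are hypothetical names.

```python
def split_title(title: str, separator: str = ": ") -> dict:
    """Derive main/sub parts by splitting at the first separator (assumed heuristic)."""
    main, found, sub = title.partition(separator)
    # With no separator, the whole title is the main part and sub is empty.
    return {"long": title, "main": main, "sub": sub if found else ""}


print(split_title("Some Title: And a Subtitle"))
# {'long': 'Some Title: And a Subtitle', 'main': 'Some Title', 'sub': 'And a Subtitle'}
print(split_title("A Book"))
# {'long': 'A Book', 'main': 'A Book', 'sub': ''}
```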

@denismaier
Member

"Short": when we originally defined it, I thought of it as an alias for what we are now introducing as "main." But it seems that for very long titles, they may differ.

Yes, they are not the same. short can sometimes be equivalent to main, or be a shorter version of main. Or they are almost the same, but short drops a leading article:

title: "An important title: and a subtitle"
main: "An important title"
short: "important title"
sub: and a Subtitle

@denismaier
Member

These are style details, BTW; not necessarily translating to the input json.

Yes, but since main and sub are derived by citeprocs from the full title, we need them in the input JSON to provide a way to override the heuristics.

@bdarcus bdarcus merged commit 447e7dd into v1.1 Jun 24, 2020
@bdarcus bdarcus deleted the split-title-forms branch June 24, 2020 21:15
@denismaier
Member

I was hoping that we could get away with some sort of addition to the specs: "processors should accept -short variants to all variables." Or something similar? Wdyt @dhimmel? Would that be reasonable?

@dhimmel
Contributor

dhimmel commented Jun 25, 2020

I was hoping that we could get away with some sort of addition to the specs: "processors should accept -short variants to all variables." Or something similar?

And then also add title-main and title-sub? Is it just titles that will have main & sub, or other variables like container-title might have container-title-sub?

On one hand, we could keep the number of CSL JSON top-level fields minimal with a syntax like:

title:
  long: "An important title: and a subtitle"
  main: "An important title"
  short: "important title"
  sub: and a Subtitle

On the other hand, there seems to be a preference for flat CSL JSON (ignoring the funky date-variables nesting). And some of the flat fields already exist like container-title-short and backcompat would be good.

If going with the flat structure, I think it's best to update the CSL JSON schema to include these properties. I don't think it makes sense to add -short variants where there is not a documented or theoretical need.

To enable the flexibility to add suffixes to CSL fields, we could look into JSON Schema pattern properties. This would allow us to keep the number of schema definitions from exploding:

{
  "type": "object",
  "patternProperties": {
    "^title(-long|-sub|-main|-short)?$": { "type": "string" }
  },
  "additionalProperties": false
}
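
For anyone who wants to see which keys such a schema would accept, here is a stdlib-only sketch that mimics just the key matching of patternProperties together with additionalProperties: false. It is not a JSON Schema validator, and `validate_keys` is a hypothetical name. JSON Schema patterns are not implicitly anchored, which is why the `^` and `$` matter; `re.search` mirrors that here.

```python
import re

# Pattern taken from the schema fragment above; values must be strings.
PATTERN_PROPERTIES = {
    r"^title(-long|-sub|-main|-short)?$": str,
}


def validate_keys(item: dict) -> list:
    """Return the keys a schema with additionalProperties: false would reject."""
    rejected = []
    for key, value in item.items():
        matched = [t for p, t in PATTERN_PROPERTIES.items() if re.search(p, key)]
        # Reject keys matching no pattern, or whose value has the wrong type.
        if not matched or not all(isinstance(value, t) for t in matched):
            rejected.append(key)
    return rejected


print(validate_keys({"title": "An important title: and a subtitle",
                     "title-sub": "and a Subtitle"}))       # []
print(validate_keys({"title-foo": "x", "subtitle": "y"}))  # ['title-foo', 'subtitle']
```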

@bdarcus
Member Author

bdarcus commented Jun 25, 2020

And then also add title-main and title-sub?

I don't believe that will be necessary, as these can be derived from the full title.

To enable the flexibility to add suffixes to CSL fields, we could look into JSON Schema pattern properties.

It also occurred to me recently that we could use this mechanism, and possibly a similar one in rng, to allow prefix extension data; like law:foobar.

That would be a big decision to make, but it seems easy enough technically. And we could couple it, say, with a wiki to collect the extensions.

@denismaier
Member

I was hoping that we could get away with some sort of addition to the specs: "processors should accept -short variants to all variables." Or something similar?

And then also add title-main and title-sub? Is it just titles that will have main & sub, or other variables like container-title might have container-title-sub?

Yes, also -main and -sub. This should apply to all variables under variables.titles, so

collection-title
container-title
original-title
part-title
reviewed-title
title
translated-title
volume-title

Each of those will have -short, -main, and -sub

On one hand, we could keep the number of CSL JSON top-level fields minimal with a syntax like:

title:
  long: "An important title: and a subtitle"
  main: "An important title"
  short: "important title"
  sub: and a Subtitle

On the other hand, there seems to be a preference for flat CSL JSON (ignoring the funky date-variables nesting). And some of the flat fields already exist like container-title-short and backcompat would be good.

If going with the flat structure, I think it's best to update the CSL JSON schema to include these properties. I don't think it makes sense to add -short variants where there is not a documented or theoretical need.

To enable the flexibility to add suffixes to CSL fields, we could look into JSON Schema pattern properties. This would allow us to keep the number of schema definitions from exploding:

{
  "type": "object",
  "patternProperties": {
    "^title(-long|-sub|-main|-short)?$": { "type": "string" }
  },
  "additionalProperties": false
}

This looks promising. This will work for title, right?
What's the best way to add the prefix patterns? Would something like this work?

{
  "type": "object",
  "patternProperties": {
    "^(container-|collection-)?title(-long|-sub|-main|-short)?$": { "type": "string" }
  },
  "additionalProperties": false
}

Or should we add one pattern per title?

{
  "type": "object",
  "patternProperties": {
    "^title(-long|-sub|-main|-short)?$": { "type": "string" },
    "^container-title(-long|-sub|-main|-short)?$": { "type": "string" },
    "^collection-title(-long|-sub|-main|-short)?$": { "type": "string" }
  },
  "additionalProperties": false
}

Readability is better here, but it is redundant, of course. Maybe something like this?

{
  "type": "object",
  "patternProperties": {
    "^(container-
        |collection-
        |volume-)?
        title(-long|-sub|-main|-short)?$": { "type": "string" }
  },
  "additionalProperties": false
}
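
One caveat on the multi-line variant: a JSON string cannot contain literal line breaks, so the pattern has to end up on a single line in the schema file. A possible workaround, sketched here under the assumption that the schema is generated by a script (the list names are made up), is to keep the readable lists in code and join them:

```python
import json
import re

# Hypothetical lists; the pieces contain no regex metacharacters,
# so they can be joined without escaping.
TITLE_PREFIXES = ["container-", "collection-", "volume-"]
FORM_SUFFIXES = ["-long", "-sub", "-main", "-short"]

pattern = "^({})?title({})?$".format(
    "|".join(TITLE_PREFIXES),
    "|".join(FORM_SUFFIXES),
)
schema = {
    "type": "object",
    "patternProperties": {pattern: {"type": "string"}},
    "additionalProperties": False,
}
print(json.dumps(schema, indent=2))

# Quick sanity check of the generated pattern.
print(bool(re.search(pattern, "volume-title-sub")))  # True
print(bool(re.search(pattern, "part-title")))        # False
```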

@denismaier
Member

And then also add title-main and title-sub?

I don't believe that will be necessary, as these can be derived from the full title.

Sure we need them. Heuristics can fail and we need to provide some way to override those derived variables.

@bdarcus
Member Author

bdarcus commented Jun 25, 2020

We were cross-posting.

Sure we need them. Heuristics can fail and we need to provide some way to override those derived variables.

What applications or legacy data formats will allow a user to manually specify these though? E.g. how would that "override" process actually work?

Biblatex has some of it, but handles it in a flat representation, which is analogous to allowing -sub, etc.

@Book{Williams2002,
 Title                    = {Free as in Freedom},
 Author                   = {Williams, Sam},
 Publisher                = {O'Reilly Media},
 Year                     = {2002},
 ISBN                     = {0-596-00287-4},
 Subtitle                 = {Richard Stallman's Crusade for Free Software}
}

@denismaier
Member

What applications or legacy data formats will allow a user to manually specify these though?

In Zotero you will just add title-main to the extra field.

The assumption is: Users enter the full title in the title field. The processor splits the title up into title-main and title-sub. Now: If you have a complicated title where that does not work, you will add title-main via extra so you can indicate where the processor should split.
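
A sketch of how that override could work, assuming a colon-plus-space split as the default heuristic (the real heuristics are not defined yet). `title_parts` is a hypothetical helper, and the title below is invented to show a case where the default split picks the wrong place.

```python
def title_parts(item: dict, variable: str = "title", separator: str = ": ") -> dict:
    """Derive main/sub, letting an explicit <variable>-main say where to split."""
    full = item.get(variable, "")
    explicit_main = item.get(f"{variable}-main")
    if explicit_main and full.startswith(explicit_main):
        # The supplied main title tells us where the split belongs.
        main = explicit_main
        sub = full[len(main):]
        if sub.startswith(separator):
            sub = sub[len(separator):]
    else:
        # Default heuristic: split at the first separator.
        main, _, sub = full.partition(separator)
        if explicit_main:
            main = explicit_main
    # An explicit sub field, if present, wins over the derived one.
    sub = item.get(f"{variable}-sub", sub)
    return {"main": main, "sub": sub}


tricky = {"title": "Mission: Impossible: An Oral History"}  # made-up title
print(title_parts(tricky))
# {'main': 'Mission', 'sub': 'Impossible: An Oral History'}
print(title_parts({**tricky, "title-main": "Mission: Impossible"}))
# {'main': 'Mission: Impossible', 'sub': 'An Oral History'}
```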

@denismaier
Member

Concerning biblatex: Biblatex's title is title-main in CSL.
We can't expect end-user-oriented applications to switch to this two-field solution. That's why we came up with this splitting mechanism in the first place.

@dhimmel
Contributor

dhimmel commented Jun 25, 2020

Heuristics can fail and we need to provide some way to override those derived variables.

I agree that there should be a way to manually specify sub- and main- to address situations where the heuristic fails.

What applications or legacy data formats will allow a user to manually specify this though?

None yet, but perhaps in the future some. For Manubot we edit CSL JSON by hand when automated generation fails. Our reference manager is a bunch of persistent identifiers for which CSL JSON is automatically generated. And when that fails, the user manually provides the CSL JSON.

Or should we add one pattern per title?

{
  "type": "object",
  "patternProperties": {
    "^title(-long|-sub|-main|-short)?$": { "type": "string" },
    "^container-title(-long|-sub|-main|-short)?$": { "type": "string" },
    "^collection-title(-long|-sub|-main|-short)?$": { "type": "string" }
  },
  "additionalProperties": false
}

I like this format because we should be defining JSON Schema's title and description differently for each of these variables (see #190, we should confirm title and description work with pattern properties). I think it's okay that title-long and title-short and the rest of the title family share documentation. Also it's helpful in terms of looking through the schema to see what variables are available.

@bwiernik
Member

In Zotero you will just add title-main to the extra field.

To elaborate on this: citeproc-js extracts CSL variables from the top of the note field (which is Extra in Zotero and Note in Mendeley). I've opened an issue to discuss whether we might want to add this behavior formally to the spec. citation-style-language/documentation#97
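
As a rough illustration of the idea, and not of citeproc-js's actual parsing rules, the sketch below assumes one `name: value` pair per line at the top of the note and stops at the first line that does not fit. `extract_note_variables` is a hypothetical name, and the note text is invented.

```python
import re

# Assumed line syntax: a lowercase, hyphenated variable name, a colon, a value.
LINE = re.compile(r"^([a-z][a-z-]*):\s*(.+)$")


def extract_note_variables(note: str):
    """Split leading 'name: value' lines off a note; return (variables, rest)."""
    variables = {}
    lines = note.splitlines()
    index = 0
    for index, line in enumerate(lines):
        match = LINE.match(line.strip())
        if not match:
            break  # first non-matching line ends the variable block
        variables[match.group(1)] = match.group(2).strip()
    else:
        # Every line was a variable line (or the note was empty).
        index = len(lines)
    return variables, "\n".join(lines[index:])


note = "title-main: Mission: Impossible\ntitle-short: M:I\nRead in 2019."
print(extract_note_variables(note))
# ({'title-main': 'Mission: Impossible', 'title-short': 'M:I'}, 'Read in 2019.')
```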

bwiernik pushed a commit to bwiernik/schema that referenced this pull request Jul 8, 2020
This uses @form to allow specification of formatting for split title
parts; like subtitles.

Also, removes -sub/-main and -short title variables, which are 
no longer needed.
bdarcus added a commit that referenced this pull request Jul 26, 2020
This uses @form to allow specification of formatting for split title
parts; like subtitles.

Also, removes -sub/-main and -short title variables, which are 
no longer needed.