input: Align date-parts and edtf string models #301

Merged
merged 14 commits into from
Jul 10, 2020

Member

@bdarcus bdarcus commented Jul 6, 2020

This modifies the structured date object to align with the EDTF model.

It moves the qualifiers to the date object, and then defines a date
range as an array of these date objects.

The result is that dates can either be represented as an EDTF string,
or as a structured object date, or a date range array.

The intention is this structured variant will go away in time, so that
in the future the only option will be the EDTF string.

Closes #300


In CSL JSON 1.0, dates are either a two-level nested array ...

  "issued": {
    "date-parts": [
      [
        2000,
        3,
        15
      ],
      [
        2000,
        3,
        17
      ]
    ],
    "circa": "true"
  }

... or a raw string, which is an EDTF-like date (note, however, that the example isn't compliant because the months are not two digits, as they are in ISO 8601).

"accessed": {
     "raw": "2005-4-12"
},
"issued": {
     "raw": "2000-3-15/2000-3-17"
}
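
To make the relationship between the two 1.0 forms concrete, here is a minimal sketch that turns a `date-parts` value into a compliant, zero-padded EDTF string. The function name `date_parts_to_edtf` is made up for illustration, and the sketch deliberately ignores `circa`, since how to map it is still an open question in this PR.

```python
def date_parts_to_edtf(date):
    """Convert a CSL JSON 1.0 date (nested "date-parts") to an EDTF string."""
    def one(parts):
        year, *rest = (int(p) for p in parts)
        # ISO 8601 / EDTF use four-digit years and two-digit months and days.
        return "-".join([f"{year:04d}"] + [f"{p:02d}" for p in rest])

    # One inner array is a single date; two inner arrays form a range.
    return "/".join(one(parts) for parts in date["date-parts"])


issued = {"date-parts": [[2000, 3, 15], [2000, 3, 17]]}
print(date_parts_to_edtf(issued))
```

Years before 0001 and any qualifiers are out of scope for this sketch.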

In CSL JSON 1.1, we move raw to an explicit EDTF string option on the property itself.

With this option, the above becomes:

"accessed": "2005-4-12",
"issued":  "2000-3-15/2000-3-17"

This PR defines:

  1. a date as an object, and moves the qualifiers onto the object, to make it consistent with EDTF (where you can have different qualifiers on the begin and end of a range)
  2. a date range as a two-item array of date objects

Also, removes date-parts.

A single date would be:

  "issued": {
      "year": "2000",
      "month":  "3",
      "day": "15",
      "approximate": "true"
  }

... and the first range example becomes this ("approximate" is the EDTF term for what we have called "circa"):

  "issued": [
    {
      "year": "2000",
      "month":  "3",
      "day": "15"
    },
    {
      "year": "2000",
      "month":  "3",
      "day": "17"
    }
  ]
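
As a rough illustration of how the structured form lines up with the EDTF string form, the following sketch serializes a date object or a two-item range array. The helper names are hypothetical, and it assumes the string-valued `year`, `month`, `day`, and `approximate` properties shown in the examples above; it does not handle open-ended ranges.

```python
def date_to_edtf(date):
    """Serialize one structured date object as an EDTF string."""
    text = f"{int(date['year']):04d}"
    for key in ("month", "day"):
        if key not in date:
            break  # lower-precision date: stop at the last known component
        text += f"-{int(date[key]):02d}"
    # The examples store the qualifier as the string "true".
    if str(date.get("approximate", "")).lower() == "true":
        text += "~"
    return text


def issued_to_edtf(issued):
    """Accept either a single date object or a two-item range array."""
    if isinstance(issued, list):
        return "/".join(date_to_edtf(d) for d in issued)
    return date_to_edtf(issued)


date_range = [
    {"year": "2000", "month": "3", "day": "15"},
    {"year": "2000", "month": "3", "day": "17"},
]
print(issued_to_edtf(date_range))
```

Because the qualifier lives on each date object, the two ends of a range can carry different qualifiers, which is the point of moving it off the container.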

Note: an open question is how to handle the mismatch between circa in 1.0 and this.

And we can then do things like season ranges (not formally supported by EDTF, but easy for us to add):

  "issued": [
    {
      "year": "2000",
      "month":  "21"
    },
    {
      "year": "2000",
      "month":  "22"
    }
  ]
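
For readers unfamiliar with the convention: EDTF uses the values 21 to 24 in the month position for spring, summer, autumn, and winter. A small sketch of how a consumer might interpret the `month` property under that convention follows; the function name and the label format are illustrative only.

```python
# EDTF season codes that may appear in the month position.
SEASONS = {21: "Spring", 22: "Summer", 23: "Autumn", 24: "Winter"}


def describe_month(value):
    """Return a season name or a plain month label for a month-position value."""
    number = int(value)
    if number in SEASONS:
        return SEASONS[number]
    if 1 <= number <= 12:
        return f"month {number}"
    raise ValueError(f"not a month or season code: {value!r}")


print(describe_month("21"), "to", describe_month("22"))
```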

This would seem a better long-term approach, one that would allow us to painlessly drop the object/array representation at some point in the future.

@bdarcus bdarcus added the input label Jul 6, 2020
@bdarcus bdarcus force-pushed the csl-etdf-date branch 5 times, most recently from e8b5c0b to 5b7239c Compare July 6, 2020 14:20
@dhimmel
Contributor

dhimmel commented Jul 8, 2020

So the idea is to modify date-parts to make it better at handling a parsed EDTF string?

Aside from the need to review the specifics of the schema, this also leaves open whether and how to use it in the csl-data.json schema.

As per https://json-schema.org/understanding-json-schema/structuring.html, I think you can use $ref to load this schema via its URL.

@bdarcus
Member Author

bdarcus commented Jul 8, 2020

So the idea is to modify date-parts to make it better at handling a parsed EDTF string?

Yes; to make them consistent.

It's based on the EDTF.js library's modeling, though there what we have as date-parts is simply values.

Aside from the need to review the specifics of the schema, this also leaves open whether and how to use it in the csl-data.json schema.

As per https://json-schema.org/understanding-json-schema/structuring.html, I think you can use $ref to load this schema via its URL.

Thanks. Do you have a view on whether we should?

@dhimmel
Contributor

dhimmel commented Jul 8, 2020

Thanks. Do you have a view on whether we should?

Is there any reason this whole file shouldn't go in the CSL Data schema as a variable?

Also are users creating CSL JSON Items supposed to use this schema or just processors?

@bdarcus
Member Author

bdarcus commented Jul 8, 2020 via email

@bdarcus
Member Author

bdarcus commented Jul 8, 2020

Is there any reason this whole file shouldn't go in the CSL Data schema as a variable?

I just pushed a commit that moves it back into csl-data.json.

@bdarcus bdarcus changed the title from "input: Add csl-edtf-date.json" to "input: Align date-parts and edtf string models" Jul 8, 2020
},
"date-range": {
"title": "Date Range",
"description": "An EDTF range is a two-item array. An open end or begin of a range can be represented with a null.",

Member Author

@bdarcus bdarcus Jul 8, 2020

The null option for open currently isn't implemented. Not sure how to do this, or whether this is the best approach.

But if I remove the required "date-parts" on the date object, we can just have an empty object (as below).

I've just pushed this change, but would appreciate feedback. Is this OK? I think it is, so am planning to merge this.

  {
    "id": "four",
    "type": "book",
    "title": "The Title",
    "issued": [
      { "date-parts": [2000] },
      {}
    ]
  }

Member

@bwiernik bwiernik Jul 9, 2020

Would this or permitting an empty date-parts array be better? I'm leaning toward this empty object solution.

Member Author

@bdarcus bdarcus Jul 9, 2020

IDK; if I don't hear from any developers, am just going to go with this and figure we can correct it before merging to master.

But I was looking for the most minimal representation of the open item.

I did also consider this:

 {
    "id": "four",
    "type": "book",
    "title": "The Title",
    "issued": [
      { "date-parts": [2000] },
      { "open": "true" }
    ]
  }
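
To compare the two candidate encodings of an open end, here is a minimal sketch that serializes a range in the intermediate model (a flat `date-parts` array inside each date object). It rests on two assumptions that are not settled in this thread: that an endpoint without `date-parts` means "open", and that an open end should be written as `..`, as in the current EDTF specification. The function names are hypothetical.

```python
def endpoint_to_edtf(endpoint):
    """Serialize one range endpoint; an endpoint without date-parts is open."""
    parts = endpoint.get("date-parts")
    if not parts:
        # Assumption: {} and {"open": "true"} both mark an open end.
        return ".."
    year, *rest = (int(p) for p in parts)
    return "-".join([f"{year:04d}"] + [f"{p:02d}" for p in rest])


def range_to_edtf(date_range):
    """Serialize a two-item range array as an EDTF interval string."""
    if len(date_range) != 2:
        raise ValueError("a date range must have exactly two items")
    return "/".join(endpoint_to_edtf(e) for e in date_range)


print(range_to_edtf([{"date-parts": [2000]}, {}]))
print(range_to_edtf([{"date-parts": [2000]}, {"open": "true"}]))
```

Under these assumptions both encodings serialize identically, so the choice between them is mostly about which is clearer to write and validate.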

This modifies the structured date object to align with the EDTF model.
It moves the qualifiers to the date object, and then defines a date
range as an array of these date objects.
@bwiernik
Member

bwiernik commented Jul 9, 2020

This LGTM. We may wish to include some guidance in the processor addendum to the spec about how to process the old date model in CSL 1.1+ (e.g., what to do if circa is encountered).
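
One possible policy a processor could follow for old-model input is sketched below. It is only an assumption for discussion, not something this PR decides: the single 1.0 `circa` flag is copied onto every endpoint as `approximate`. The function name `upgrade_date` is hypothetical, and treating `"true"`, `True`, and `1` as "set" is likewise an assumption about how 1.0 data encodes the flag.

```python
def upgrade_date(old):
    """Convert a CSL JSON 1.0 date (nested "date-parts") to the structured model."""
    # Assumption: "true", True, and 1 all mean the flag is set.
    circa = str(old.get("circa", "")).lower() in ("true", "1")
    dates = []
    for parts in old["date-parts"]:
        # Pair year/month/day with however many components are present.
        date = dict(zip(("year", "month", "day"), (str(p) for p in parts)))
        if circa:
            # Assumed policy: the range-wide flag applies to each endpoint.
            date["approximate"] = "true"
        dates.append(date)
    # A single date stays an object; two dates become a range array.
    return dates[0] if len(dates) == 1 else dates


old = {"date-parts": [[2000, 3, 15], [2000, 3, 17]], "circa": "true"}
print(upgrade_date(old))
```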

Co-authored-by: Brenton M. Wiernik <[email protected]>
@bdarcus
Member Author

bdarcus commented Jul 9, 2020

This LGTM. We may wish to include some guidance in the processor addendum to the spec about how to process the old date model in CSL 1.1+ (e.g., what to do if circa is encountered).

We'll in general need to figure out that last issue.

In a style, for example, what to do with these?

issued: 1945-04-05~/1945-04-07
issued: 1945-04-05/1945-04-07~
issued: 1945-04-05~/1945-04-07~
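
To see what a style would actually receive in these three cases, here is a minimal sketch that splits such a string into per-endpoint objects with their own `approximate` flags. It handles only the simple shape shown above (a trailing `~` on a year-month-day date); the function names are illustrative, and this is not a general EDTF parser.

```python
def parse_endpoint(text):
    """Split one simple EDTF date into components plus an approximate flag."""
    approximate = text.endswith("~")
    fields = text.rstrip("~").split("-")
    date = dict(zip(("year", "month", "day"), fields))
    if approximate:
        date["approximate"] = "true"
    return date


def parse_range(text):
    """Parse "start/end" into a two-item list of date objects."""
    return [parse_endpoint(part) for part in text.split("/")]


for example in ("1945-04-05~/1945-04-07",
                "1945-04-05/1945-04-07~",
                "1945-04-05~/1945-04-07~"):
    print(parse_range(example))
```

Each case yields a different combination of flags, which is exactly the information a single range-level `circa` cannot express.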

@bdarcus bdarcus mentioned this pull request Jul 9, 2020
@bdarcus
Member Author

bdarcus commented Jul 9, 2020

I think I should mention this here as well, since I mentioned it elsewhere:

I think date-parts is better represented as an object, as it is in pandoc-citeproc.

I can see no advantage at all to the array, given what we're modeling here, other than that it fits with how EDTF.js (and CSL JSON 1.0) models it.

But unless developers want to argue the case, it seems we should defer to status quo on this.

In any case, just to illustrate what I mean, date-parts would no longer be needed, so the result would be:

{
 "id": "four",
 "type": "book",
 "title": "The Title",
 "issued": {
   "year": "1932",
   "month": "10",
   "day": "22",
   "approximate": "true"
  }
}

But that's an incompatible change of course, and it may not be worth it given the option to deprecate this entirely longer term in favor of the EDTF string.

@bwiernik
Member

bwiernik commented Jul 9, 2020

Compatibility with edtf.js is a major advantage I think. It's not clear to me how the object format you describe handles date ranges, which I think is the major motivation behind the array structure?

@bdarcus
Member Author

bdarcus commented Jul 9, 2020

It's not clear to me how the object format you describe handles date ranges, which I think is the major motivation behind the array structure?

Easy; the range is still an array.

{
   "id":"four",
   "type":"book",
   "title":"The Title",
   "issued":[
      {
         "year":"2012",
         "month":"21"
      },
      {
         "year":"2012",
         "month":"22"
      }
   ]
}

I'm wondering whether @inukshuk's decision to represent dates as a values array was for compatibility with CSL JSON 1.0, or whether there's some other reason?

@bwiernik
Member

bwiernik commented Jul 9, 2020

Oh, I see. You mean that the date itself is an object, whereas the date range is an array.

@bdarcus
Member Author

bdarcus commented Jul 9, 2020

Oh, I see. You mean that the date itself is an object, whereas the date range is an array.

Yes, but just to be super clear for anyone coming late to this ...

Date is an object in this PR, and date-range is a two-item array of date objects.

Currently, and in 1.0, the date-parts property within the date object is also an array. This is what I was focusing on in my latest comments; that this property was not necessary (at least not now), and in hindsight, not the best decision.
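
For anyone coming late, the three shapes a date variable can take under this PR can be told apart by JSON type alone. The following is a small sketch of that dispatch; the function name and the returned labels are made up for illustration and are not schema terms.

```python
def classify_date(value):
    """Name the shape of a date value: EDTF string, date object, or range array."""
    if isinstance(value, str):
        return "edtf-string"
    if isinstance(value, dict):
        return "date"
    if (isinstance(value, list) and len(value) == 2
            and all(isinstance(item, dict) for item in value)):
        return "date-range"
    raise TypeError(f"unsupported date value: {value!r}")


print(classify_date("2000-03-15/2000-03-17"))
print(classify_date({"year": "2000", "month": "3", "day": "15"}))
print(classify_date([{"year": "2000"}, {"year": "2001"}]))
```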

@inukshuk
Member

inukshuk commented Jul 9, 2020

@bdarcus the CSL date-parts may have influenced the values array initially but mainly it can be used very efficiently internally: on the one hand, the array's length corresponds to the date's 'precision' and it's easy to feed the array values to JavaScript's Date constructor; approximate, undefined and uncertain values are stored as bitmasks so representing dates as an array of numbers makes computing and applying those masks relatively easy, regardless of the date's precision level.
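
The idea described here can be sketched roughly as follows. This is not edtf.js code: the one-bit-per-component layout and all names are hypothetical, chosen only to show why an array of numbers plus a mask works at any precision.

```python
# Precision follows directly from how many values are present.
PRECISION = {1: "year", 2: "month", 3: "day"}

# Hypothetical mask layout: one bit per date component.
YEAR, MONTH, DAY = 0b100, 0b010, 0b001


def precision(values):
    """Return the precision implied by the length of the values array."""
    return PRECISION[len(values)]


def flagged_components(values, mask):
    """List the components that are present in values and set in the mask."""
    names = ("year", "month", "day")[:len(values)]
    return [name for name, bit in zip(names, (YEAR, MONTH, DAY)) if mask & bit]


print(precision([2000, 3]))
print(flagged_components([2000, 3, 15], YEAR | DAY))
```

The same mask can be applied to a year-only, year-month, or full date without special cases, since components beyond the array's length are simply ignored.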

Overall, I'd say the values array owes much to edtf.js implementation details and JavaScript in general. I think that using a single object, without a date-parts array, is probably much easier to read and write.

@bdarcus
Member Author

bdarcus commented Jul 9, 2020

Thanks much @inukshuk.

FYI, I just pushed a change (that will fail CI) to remove date-parts and add year, month, day, so people can see the concrete change.

I can remove the commit if there are objections.

This removes the date-parts array and adds year, month, day integer
properties.
"type": "book",
"title": "The Title",
"issued": {
"date-parts": [2000, 3, 10]

Member Author

@bdarcus bdarcus Jul 9, 2020

If we remove date-parts:

Suggested change
- "date-parts": [2000, 3, 10]
+ "year": "2000",
+ "month": "3",
+ "day": "10"

"id": "range-1",
"type": "book",
"title": "The Title",
"issued": [{ "date-parts": [2000] }, { "date-parts": [2001] }]

Member Author

@bdarcus bdarcus Jul 9, 2020

If we remove date-parts:

Suggested change
- "issued": [{ "date-parts": [2000] }, { "date-parts": [2001] }]
+ "issued": [{ "year": "2000" }, { "year": "2001" }]

"id": "range-2",
"type": "book",
"title": "The Title",
"issued": [{ "date-parts": [2000] }, {}]

Member Author

@bdarcus bdarcus Jul 9, 2020

If we remove date-parts:

Suggested change
- "issued": [{ "date-parts": [2000] }, {}]
+ "issued": [{ "year": "2000" }, {}]

@bdarcus
Member Author

bdarcus commented Jul 10, 2020

So should we just GTM?

@bwiernik
Member

So long as we give processor instructions for handling the old model, LGTM.

Now tests for a date object without the required year or with undefined
properties.
@bdarcus
Member Author

bdarcus commented Jul 10, 2020

This is basically ready, but the CI script is killing me. The test on push passes, but the test on pull doesn't.

@dhimmel
Contributor

dhimmel commented Jul 10, 2020

The error at https://github.com/citation-style-language/schema/pull/301/checks?check_run_id=859242336#step:5:38 makes me think the CI run is for the old commit where the code has true and not 'true' (although you may want True).

Perhaps try rerunning CI... maybe some github actions problem where it ran on an old commit

@bdarcus
Member Author

bdarcus commented Jul 10, 2020

although you may want True

This turned out to be the issue.

I had rerun the job, and it had still failed.

In any case, thank you!
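
For readers puzzling over the `true` / `'true'` / `True` exchange: assuming the CI test script is written in Python (which the capitalized `True` suggests, though the script itself is not shown here), the distinction looks roughly like this. The sample data is invented for illustration.

```python
import json

# In JSON text, the boolean literal is lowercase `true`.
parsed = json.loads('{"year": "2000", "approximate": true}')

# Python's json module maps it to the built-in True.
print(parsed["approximate"] is True)

# In Python source, a bare `true` is an undefined name, and the string
# 'true' is a different value from the boolean True.
as_boolean = {"year": "2000", "approximate": True}
as_string = {"year": "2000", "approximate": "true"}
print(parsed == as_boolean, parsed == as_string)
```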

@bdarcus bdarcus merged commit 892334e into v1.1 Jul 10, 2020
@bdarcus
Member Author

bdarcus commented Jul 10, 2020 via email

@bdarcus bdarcus deleted the csl-etdf-date branch August 7, 2020 17:28
4 participants