This document specifies a date representation for exchanging dates associated with genealogical data, and requests discussion and suggestions for improvements.
This document is currently a "stable draft". As such, it may be subject to limited changes, BUT NOT backwards-incompatible changes, in response to discussion and suggestions for improvement.
Copyright Intellectual Reserve, Inc.
This document is distributed under a Creative Commons Attribution-ShareAlike license. For details, see:
http://creativecommons.org/licenses/by-sa/3.0/
The GEDCOM X Date Format spec specifies a mechanism for representing dates, especially as they pertain to the need to represent genealogical dates. The spec includes definitions, date ranges, date formats and URI representation examples.
The GEDCOM X Date specification defines a way of representing dates associated with genealogical data.
This specification is heavily based on the ISO 8601 standard, the RFC 3339 proposal, and [W3C's profile](http://www.w3.org/TR/NOTE-datetime) of ISO 8601. Concepts from the Dublin Core Date and Time Requirements Wiki were also leveraged.
This specification has been provided because each of these standards or proposals individually has limitations or omissions that do not fulfill the requirements identified for genealogical date representations.
- 1. Introduction
- 2. Terms and Definitions
- 3. Scope
- 4. Calendaring System
- 5. Format
- 6. URI Representation
- APPENDIX A: Implementation Hints and Observation
The identifier for this specification is:
http://gedcomx.org/date/v1
For convenience, the GEDCOM X date format may be referred to as "GEDCOM X Date 1.0". This specification uses "GEDCOM X Date" internally.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC2119, as scoped to those conformance targets.
For the purpose of this document, the following terms and definitions apply in addition to those defined by ISO 8601.
Portion of a date representing a particular day by specifying its calendar year, its calendar month and the ordinal number of the day within its calendar month.
Portion remaining from a date if the calendar date portion is ignored, represented in units of hours, minutes, and seconds.
NOTE: By implication, time of day must be less than 24 hours.
Abbreviation of "Common Era", "Current Era", or "Christian Era". Equivalent to "Anno Domini", or "AD".
Abbreviation for "Before the Common Era", "Before the Current Era", or "Before the Christian Era". The designation "BCE" is to "CE", as "BC" is to "AD".
NOTE: The year preceding 1 CE is identified as "1 BCE". Neither designation uses year 0 (zero).
A calendar introduced in 1582 by Pope Gregory XIII that enhanced the Julian calendar with improved leap year rules.
NOTE: The proleptic Gregorian calendar includes dates prior to 1582 using this calendaring system.
A date representing a single calendar date, and optionally including a time of day. This term is used to clarify the distinction between a generic date and the aggregation of the GEDCOM X Date types of values (e.g. simple date, date range, open-ended date range, and approximate date).
A time interval can be specified by a start date and an end date (both instances of simple date) or by specifying a start date (a simple date) and a duration. Date ranges MAY be either "closed" (both end points are specified or can be calculated) or "open-ended" (only one end-point is specified).
Examples of closed date range:
- From December 14, 1642 CE to January 1863 CE
Examples of open-ended date range:
- Before January 1863 CE
- After December 14, 1642 CE
A series of discrete dates, separated by a specified duration.
Examples:
- 10 leap years beginning 1924 CE
- at the same time every day for a week starting at June 18, 1937 CE 10 AM local time
- every 10 years starting 1820 CE
An indeterminate date with a single occurrence roughly centered on a specified simple date.
Examples:
- About January 1777
- Around 1590
- Sometime in 1920
An indeterminate date with a single occurrence within a specified date range.
Examples:
- Sometime between December 6, 1940 and December 8, 1940
The GEDCOM X Date represents one of the following:
- a simple date
- a date range
- a recurring date
- an approximate date
- an approximate date range
The precision of a simple date is based on the smallest provided unit of measure.
The GEDCOM X Date units of measurement include, and are limited to, year, month, day, hour, minute, and second. For a given simple date, all units of measurement larger than the smallest unit specified MUST be provided.
A date range MUST be either a closed date range or an open-ended date range.
A closed date range MUST be one of the following:
- start date and end date
- start date and duration
An open-ended date range MUST include either the start date or the end date, but NOT both.
A recurring date is represented by a closed date range providing the following:
- REQUIRED: a start date (or reference date)
- REQUIRED: the time interval between occurrences (calculated as the interval between the start date and the end date, or as the interval specified by the duration)
- OPTIONAL: the number of recurrences
NOTE: If no recurrence count is provided, the recurrences are considered perpetual.
An approximate date is represented by providing all of the following:
- an indicator that the date is approximate
- a simple date
An approximate date range is represented by providing all of the following:
- an indicator that the date is approximate
- a date range
In order to provide consistency in interpretation of a date, a common calendaring and time system is specified as follows:
- Dates MUST be specified using the proleptic Gregorian calendar.
- The earliest representable date is January 1, 10000 BCE.
- The latest representable date is December 31, 9999 CE.
- Years are provided as follows:
  - The year prior to 1 CE ("Common Era" or "AD") MUST be represented as the year 0.
  - Any year prior to year 0 MUST be represented as a negative number.
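The year numbering above can be illustrated with a minimal Python sketch. The function names `describe_year` and `format_year` are hypothetical and are not defined by this specification; the sketch only assumes the rules stated here (year 0 is 1 BCE, negative years precede year 0, and the range is -9999 to +9999).

```python
def describe_year(year: int) -> str:
    """Return a CE/BCE label for a GEDCOM X Date year value (hypothetical helper)."""
    if not -9999 <= year <= 9999:
        raise ValueError("year is outside the representable range")
    if year > 0:
        return f"{year} CE"
    # Year 0 is 1 BCE, year -1 is 2 BCE, and so on.
    return f"{1 - year} BCE"


def format_year(year: int) -> str:
    """Render a year as a sign followed by four zero-padded digits."""
    if not -9999 <= year <= 9999:
        raise ValueError("year is outside the representable range")
    sign = "-" if year < 0 else "+"
    return f"{sign}{abs(year):04d}"
```

For example, `describe_year(-9999)` yields "10000 BCE", which matches the earliest representable year.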
The following letters are used as value designators, and precede the value:
- [A] - designates an approximate date
- [P] - designates the component as a duration
- [R] - designates a recurrence count for a recurring date range
The following characters are used as value separators:
- [T] - separates the calendar date portion of a date or duration from the time of day portion
- [Z] - designates the time is in UTC time
- [-] - separates the values of the calendar date portion's units of a date
- [:] - separates the values of the time of day portion's units of a date
- [/] - separates the components of a date range or recurring date range
In the format for a simple date
, letters are used to represent digits of the date as follows:
- [Y] - digit used in the year
- [M] - digit used in the month
- [D] - digit used in the day of month
- [h] - digit used in the hour
- [m] - digit used in the minute
- [s] - digit used in the second
- [z] - digit used in the local time offset
- [±] represents a plus sign [+] if the following element's value is positive or zero, or a minus sign [-] if the following element's value is negative.
The format for a complete simple date is defined as follows:
±YYYY[-MM[-DD[Thh[:mm[:ss]][±hh[:mm]|Z]]]]
The complete simple date format specifies the format of all components and their order (largest to smallest units). Unit components MAY be truncated right-to-left, to indicate precision level of the date.
The year component is defined as a REQUIRED [+] or [-] and four digits, left-padded with zeros as needed. Valid values range from -9999 to +9999. The year component MUST always be present as part of a simple date, and is the maximal unit of precision.
The month component MUST be 2 digits when present, with values between 01 and 12.
The day of month component MUST be 2 digits when present. The range of valid values is determined by the number of days in that proleptic Gregorian calendar month, with the first day of the month designated as 01.
If any time component is present, the character [T] MUST precede the time of day component.
Hours are based on a 24-hour day, and MUST have a value between 00 and 23. In the special case where the minute and second components have zero values, the value 24 is valid, representing midnight at the end of the calendar day. Likewise, if all three components have the value 00, it represents midnight at the beginning of the specified calendar day.
When any time of day is specified, there are three options for specifying its geographical reference:
- No specifier implies local time
- [Z] specifies UTC
- four digits (with a colon separator), preceded by a [+] or [-] indicates the shift of local time from UTC
  - This is usually referred to as the local "time zone"
  - The [+] or [-] character is REQUIRED
  - The first 2 digits represent the hours
  - The last 2 digits represent minutes, and MAY be omitted if zero
example | textual description |
---|---|
+1752-01-18T22:14:03Z | January 18, 1752 CE 10:14 and 3 seconds PM UTC |
+1964-11-14T10-07:00 | November 14, 1964 CE 10 AM, Mountain Standard Time |
+1889-05-17T14:23 | May 17, 1889 CE 2:23 PM |
+1492-07-27 | July 27, 1492 CE (presumed to be "local time", honoring the International Date Line) |
+0186-03 | March 186 CE |
-1321 | 1322 BCE |
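The simple date rules can be sketched as a regular expression plus a few range checks. This is an illustrative parser, not a reference implementation: the name `parse_simple_date` and the returned dictionary layout are hypothetical. It assumes that the hour may appear without minutes (as in the example +1964-11-14T10-07:00), that minutes and seconds are limited to 00 through 59, and that an absent minute or second counts as zero for the hour-24 rule.

```python
import re

# ±YYYY, then optional -MM, -DD, Thh, :mm, :ss, and an optional Z or ±hh[:mm] offset.
_SIMPLE_DATE = re.compile(
    r"(?P<year>[+-][0-9]{4})"
    r"(?:-(?P<month>[0-9]{2})"
    r"(?:-(?P<day>[0-9]{2})"
    r"(?:T(?P<hour>[0-9]{2})"
    r"(?::(?P<minute>[0-9]{2})"
    r"(?::(?P<second>[0-9]{2}))?)?"
    r"(?P<zone>Z|[+-][0-9]{2}(?::[0-9]{2})?)?"
    r")?)?)?"
)


def _days_in_month(year: int, month: int) -> int:
    """Days in a proleptic Gregorian month (year 0 counts as a leap year)."""
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def parse_simple_date(text: str) -> dict:
    """Parse a simple date into its numeric components (hypothetical helper)."""
    match = _SIMPLE_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a simple date: {text!r}")
    fields = {}
    for name in ("year", "month", "day", "hour", "minute", "second"):
        if match.group(name) is not None:
            fields[name] = int(match.group(name))
    if "month" in fields and not 1 <= fields["month"] <= 12:
        raise ValueError("month must be between 01 and 12")
    if "day" in fields:
        if not 1 <= fields["day"] <= _days_in_month(fields["year"], fields["month"]):
            raise ValueError("day is not valid for that month")
    if "hour" in fields:
        hour = fields["hour"]
        minute = fields.get("minute", 0)
        second = fields.get("second", 0)
        # Hour 24 is only valid as midnight at the end of the day.
        end_of_day = hour == 24 and minute == 0 and second == 0
        if not (end_of_day or (hour <= 23 and minute <= 59 and second <= 59)):
            raise ValueError("time of day is out of range")
    if match.group("zone") is not None:
        fields["zone"] = match.group("zone")
    return fields
```

Only the components that are present appear in the result, so the precision of the date can be read from the smallest key returned.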
The initial [P] designates the value is a duration. The part including time components MUST be preceded by [T].
In the format representations for a duration, a digit is represented by the letter [n]. Letters have specific meaning, are literal, and represent the following units:
- [Y] The number of years
- [M] The number of months or minutes (determined by context)
- [D] The number of days
- [H] The number of hours
- [S] The number of seconds
The format for a complete duration is defined as follows:
PnnnnYnnMnnDTnnHnnMnnS
A date duration can be represented by a combination of components/units with designators, with the following guidelines and restrictions:
- Each component is OPTIONAL, and MAY be omitted.
- If any time component is present, the [T] MUST precede the time of day part.
- All components present MUST appear in hierarchical order, largest to smallest units.
- Components are NOT REQUIRED to be normalized.
- Any non-normalized unit MAY be represented with up to four digits.
  - For example, the descriptive values "13 months" and "2 years, 52 days" each contain non-normalized values and both are considered acceptable.
NOTE: For a duration, local time and UTC distinction is meaningless.
NOTE: A GEDCOM X Date MAY contain a duration, but MUST NOT solely represent a duration itself.
example | textual description |
---|---|
P17Y6M2D | duration of 17 years, 6 months, and 2 days |
P186D | duration of 186 days |
PT5H17M | lapsed time: 5 hours 17 minutes |
P1000Y18M72DT56H10M1S | 1000 years 18 months 72 days 56 hours 10 minutes 1 second |
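A duration can be recognized in a similar way. The sketch below is illustrative only: `parse_duration` is a hypothetical name, and the pattern assumes that every unit accepts one to four digits and that a duration must contain at least one component. The [M] designator is read as months before the [T] and as minutes after it.

```python
import re

# P, optional date units, then an optional T followed by time units.
_DURATION = re.compile(
    r"P(?:(?P<years>[0-9]{1,4})Y)?"
    r"(?:(?P<months>[0-9]{1,4})M)?"
    r"(?:(?P<days>[0-9]{1,4})D)?"
    r"(?:T(?:(?P<hours>[0-9]{1,4})H)?"
    r"(?:(?P<minutes>[0-9]{1,4})M)?"
    r"(?:(?P<seconds>[0-9]{1,4})S)?)?"
)


def parse_duration(text: str) -> dict:
    """Parse a duration into its components, without normalizing them."""
    match = _DURATION.fullmatch(text)
    # A trailing T means the time part was announced but left empty.
    if match is None or text.endswith("T"):
        raise ValueError(f"not a duration: {text!r}")
    parts = {
        name: int(value)
        for name, value in match.groupdict().items()
        if value is not None
    }
    if not parts:
        raise ValueError("a duration needs at least one component")
    return parts
```

Values are returned as written, so "P18M" stays 18 months rather than being rewritten as 1 year and 6 months.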
The format for a complete date range is a start date and an end date (both simple dates), separated by a [/]:
±YYYY-MM-DDThh:mm:ss[±hh[:mm]|Z]/±YYYY-MM-DDThh:mm:ss[±hh[:mm]|Z]
or a start date (a simple date) and a duration, separated by a [/]:
±YYYY-MM-DDThh:mm:ss[±hh[:mm]|Z]/PnnnnYnnMnnDTnnHnnMnnS
In either format, the presence of the slash character [/] indicates the date is a date range.
The start date (the simple date preceding the slash) MUST NOT be greater than the maximum simple date (+9999-12-31T23:59:59) and MUST be earlier than or equivalent to the end date (the simple date following the slash).
NOTE: It is not required that the precision of the two simple dates be the same.
The duration MUST be such that the calculated end date is earlier than or equivalent to the maximum simple date (+9999-12-31T23:59:59).
NOTE: It is not required that the precision of the start date and the duration be the same.
The precision of the equivalent end date is the coarser precision of the start date and the duration.
example | textual description |
---|---|
+1752/+1823 | from 1752 CE to 1823 CE |
+1825-04-13/+1825-11-26 | from April 13, 1825 to November 26, 1825 |
+1933-02-19/P74Y | 74 years, starting on February 19, 1933, i.e. from February 19, 1933 to February 19, 2007 |
An open-ended date range MUST be a date range where either the start date or end date is explicitly missing.
A leading slash character [/] is used to specify a date range before the provided end date:
/±YYYY-MM-DDThh:mm:ss[±hh[:mm]|Z]
A trailing slash character [/] is used to specify a date range after the provided start date:
±YYYY-MM-DDThh:mm:ss[±hh[:mm]|Z]/
example | textual description |
---|---|
/+1887-03 | before March, 1887 CE |
+1976-07-11/ | after July 11, 1976 CE |
/-1287 | before 1288 BCE |
/+0000 | before 1 BCE |
-0001-04/ | after April, 2 BCE |
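Closed and open-ended ranges can both be split on the single slash. The following sketch uses a hypothetical `split_date_range` function and returns the raw text of each part; it assumes any leading [A] or [R] designator has already been removed, and it does not validate the simple dates or the duration themselves.

```python
def split_date_range(text: str) -> dict:
    """Split a date range into start, end, and duration strings (None if absent)."""
    if text.count("/") != 1:
        raise ValueError("a date range contains exactly one slash")
    start, tail = text.split("/")
    if start.startswith("P"):
        raise ValueError("a duration cannot be the start of a range")
    if not start and (not tail or tail.startswith("P")):
        raise ValueError("an open-ended range needs a start date or an end date")
    is_duration = tail.startswith("P")
    return {
        "start": start or None,
        "end": None if is_duration else (tail or None),
        "duration": tail if is_duration else None,
    }
```

A result with only "start" set is a range after that date, and a result with only "end" set is a range before that date.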
The format for a recurring date is defined as either:
R[n]/±YYYY-MM-DDThh:mm:ss[±hh[:mm]|Z]/±YYYY-MM-DDThh:mm:ss[±hh[:mm]|Z]
or
R[n]/±YYYY-MM-DDThh:mm:ss[±hh[:mm]|Z]/PnnnnYnnMnnDTnnHnnMnnS
The recurring date is defined in terms of a closed date range (where the start date is the reference date and the recurring interval is calculated as the interval between the start date and the end date or the interval specified by the duration), prepended with an [R], an OPTIONAL recurrence count, and a slash [/].
example | descriptive use case |
---|---|
R4/+1776-04-02/+1776-04-09 | every week, for 4 weeks starting on April 2, 1776 CE |
R/+2000/P12Y | the Chinese Year of the Dragon occurs every 12 years (perpetually), including the year 2000 CE |
R100/+1830/+1840 | the US census occurs every 10 years starting in 1830, for 100 repetitions |
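As a rough illustration, a recurring date can be taken apart into its recurrence count and the two halves of its closed date range. `parse_recurring` is a hypothetical name; the sketch returns `None` for the count when it is omitted (perpetual recurrence) and leaves the start date and the end date or duration as unvalidated text.

```python
import re

# R, an optional count, then exactly two slash-separated, non-empty parts.
_RECURRING = re.compile(r"R(?P<count>[0-9]*)/(?P<start>[^/]+)/(?P<rest>[^/]+)")


def parse_recurring(text: str) -> tuple:
    """Return (count or None, start date text, end date or duration text)."""
    match = _RECURRING.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recurring date: {text!r}")
    count = int(match.group("count")) if match.group("count") else None
    return count, match.group("start"), match.group("rest")
```

Because both parts after the count must be non-empty, an open-ended range such as "R/+2000/" is rejected, which reflects the requirement that a recurring date be built on a closed date range.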
The format for an approximate date is defined as a simple date prepended by the character [A].
example | unit of approx | textual description |
---|---|---|
A+1680 | year | about 1680 CE |
A-1400 | year | about 1401 BCE |
A+1980-05-18T18:53Z | minutes | about 6:53 PM [UTC], May 18, 1980 |
A+2014-08-19 | days | about August 19, 2014 CE |
The format for an approximate date range is defined as a date range prepended by the character [A].
example | description, textual equivalent |
---|---|
A+1752/+1823 | approximately between 1752 CE and 1823 CE |
A+1825-04-13/+1825-11-26 | approximately between April 13, 1825 CE and November 26, 1825 CE |
A+1633-02-19/P74Y | approximately within 74 years after February 19, 1633 CE |
A/+1887-03 | approximately before March, 1887 CE |
A+1976-07-11/ | approximately after July 11, 1976 CE |
A/-1287 | approximately before 1288 BCE |
A/+0000 | approximately before 1 BCE |
A-0001-04/ | approximately after April, 2 BCE |
A GEDCOM X Date MAY be identified using a Uniform Resource Identifier (URI) as defined by RFC-2396. A URI that identifies a GEDCOM X Date is of the following format:
gedcomx-date:<GEDCOM X Date value>
NOTE: The URI scheme is gedcomx-date and the scheme-specific part is the representation of the date as defined by this specification.
example type | description | applicable URI |
---|---|---|
simple date | Sept 14, 1863 | gedcomx-date:+1863-09-14 |
approx. date | about 1742 | gedcomx-date:A+1742 |
date range | between October 1834 and May 1835 | gedcomx-date:+1834-10/+1835-05 |
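Converting between a date value and its URI is a matter of adding or removing the scheme prefix. The helper names `to_uri` and `from_uri` below are hypothetical. The sketch splits on the first colon only, since a value with a time of day contains further colons, and it compares the scheme case-insensitively, which is an assumption based on general URI conventions rather than on this specification.

```python
SCHEME = "gedcomx-date"


def to_uri(value: str) -> str:
    """Wrap a GEDCOM X Date value in a gedcomx-date URI."""
    return f"{SCHEME}:{value}"


def from_uri(uri: str) -> str:
    """Extract the GEDCOM X Date value from a gedcomx-date URI."""
    # Split on the first colon only; the date value may contain more.
    scheme, separator, value = uri.partition(":")
    if separator != ":" or scheme.lower() != SCHEME or not value:
        raise ValueError(f"not a {SCHEME} URI: {uri!r}")
    return value
```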
The following summaries may be beneficial in parsing and composing GEDCOM X Dates using this specification:
- Any value that begins with a [+] or a [-] must be a simple date.
  - The [-] will only affect the year component.
  - A negative simple date year component can always be converted to a BCE Gregorian year by adding 1 to the absolute value.
- Any value that begins with a [P] must be a duration.
- A leading [A] is always an approximate date, and must be followed by either a simple date or a date range.
- A leading [R] is always a recurring date range.
- A slash [/] always separates values, and its presence always indicates a date range (including open-ended and recurring date ranges).
- A [T] always separates the calendar date (calendar units) from the time of day (time units).
- Each component of a simple date has a fixed width, always preceded by a designated character in the set [±,-,T,:].
  - All components, except the year component, have a length of 3, including the delimiting prefix character.
  - The year component has a length of 5 (and the prefix is always [+] or [-]).
  - When provided, local time offset has a length of 6, two components (hours and minutes) each of length 3.
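These hints suggest a simple first pass that decides which kind of GEDCOM X Date a string is before parsing it in detail. The sketch below is one possible reading of the hints, with a hypothetical function name `classify`; it inspects only the leading character and the presence of a slash, and performs no further validation.

```python
def classify(value: str) -> str:
    """Name the kind of GEDCOM X Date a string appears to be."""
    if value.startswith("P"):
        # A duration may appear inside a date but cannot stand alone.
        raise ValueError("a duration alone is not a GEDCOM X Date")
    if value.startswith("R"):
        if "/" not in value:
            raise ValueError("a recurring date must contain a date range")
        return "recurring date"
    approximate = value.startswith("A")
    body = value[1:] if approximate else value
    if "/" in body:
        kind = "date range"
    elif body.startswith(("+", "-")):
        kind = "simple date"
    else:
        raise ValueError(f"unrecognized value: {value!r}")
    if approximate:
        return "approximate date" if kind == "simple date" else "approximate date range"
    return kind
```

A fuller implementation would pass the remaining text to parsers for simple dates, durations, and ranges; this sketch stops at the dispatch step.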