A NodeJS-based converter for translating GEDCOM files into JSON, with Linked Data context as well.
The GEDCOM genealogy data file format is text-based but defines a hierarchical structure: the first value on each line is the "indent level" of that line. It therefore translates very easily into a JSON structure, which, in this booming age of REST APIs, far more services understand than understand GEDCOM files.
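The following is a minimal sketch, not the converter's own code, of how the level numbers map onto nesting. It assumes the common `LEVEL [@XREF@] TAG [VALUE]` line layout. The `parseGedcom` name and the `{ tag, id, value, children }` node shape are hypothetical.

```javascript
// Sketch: build a nested tree from GEDCOM lines using each line's level number.
// Assumes "LEVEL [@XREF@] TAG [VALUE]" lines; parseGedcom is a hypothetical helper.
function parseGedcom(text) {
  const root = { tag: "ROOT", children: [] };
  const stack = [root]; // stack[level] is the parent for a line at that level
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line) continue;
    const match = line.match(/^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$/);
    if (!match) throw new Error(`Unparseable line: ${line}`);
    const [, levelText, xref, tag, value] = match;
    const level = Number(levelText);
    const parent = stack[level];
    if (!parent) throw new Error(`Level ${level} has no parent: ${line}`);
    const node = { tag, children: [] };
    if (xref) node.id = xref;
    if (value !== undefined) node.value = value;
    parent.children.push(node);
    stack.length = level + 1; // forget deeper levels from earlier records
    stack.push(node); // this node is the parent for level + 1
  }
  return root.children;
}

// Usage: two small records.
const sample = [
  "0 @I101@ INDI",
  "1 NAME John /Smith/",
  "1 BIRT",
  "2 DATE 1 APR 1900",
  "0 @F101@ FAM",
  "1 HUSB @I101@",
].join("\n");
console.log(JSON.stringify(parseGedcom(sample), null, 2));
```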
The JSON-LD specification extends JSON with a context that links the data into the Semantic Web / Linked Data web. This converter maps a few ontologies to various GEDCOM tags:
- Friend of a Friend (`foaf`): People and common relations between them.
- Relationship (`rel`): Deeper relationship terms for relating two people.
- Biography (`bio`): Vocabulary for enumerating events in a person's life and participants in those events (GitHub Source).
- Dublin Core (`dc`): Vocabulary for citing sources and dates.
Output JSON:

```sh
node convert.js myFamilyTree.ged
```

Save JSON to a file:

```sh
node convert.js myFamilyTree.ged > myFamilyTree.json
```
The output structure of the `convert.js` script looks like:

```json
{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rel": "http://purl.org/vocab/relationship",
    "bio": "http://purl.org/vocab/bio/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/"
  },
  "@graph": [
    {
      "@id": "_:I101",
      "@type": "foaf:Person",
      "foaf:name": "John /Smith/",
      "foaf:gender": "M",
      "bio:event": {
        "@type": "bio:Birth",
        "DATE": "1 APR 1900",
        "bio:principal": {
          "@id": "_:I101"
        }
      },
      "bio:relationship": {
        "@id": "_:F101"
      }
    },
    {
      "@id": "_:F101",
      "@type": "bio:Relationship",
      "bio:participant": [
        {
          "@id": "_:I101"
        },
        {
          "@id": "_:I102"
        }
      ]
    }
  ]
}
```
To be parsed into RDF, it will need an output structure like:

```json
[
  {
    "@context": {
      "foaf": "http://xmlns.com/foaf/0.1/",
      "rel": "http://purl.org/vocab/relationship",
      "bio": "http://purl.org/vocab/bio/0.1/",
      "dc": "http://purl.org/dc/elements/1.1/"
    },
    "@id": "_:I101",
    "@type": "foaf:Person",
    "bio:relationship": {
      "@id": "_:F101"
    },
    "foaf:gender": "F",
    "foaf:name": "Jane /Smith/"
  },
  {
    "@context": {
      "foaf": "http://xmlns.com/foaf/0.1/",
      "rel": "http://purl.org/vocab/relationship",
      "bio": "http://purl.org/vocab/bio/0.1/",
      "dc": "http://purl.org/dc/elements/1.1/"
    },
    "@id": "_:I102",
    "@type": "foaf:Person",
    "bio:child": {
      "@id": "_:I103"
    },
    "bio:relationship": {
      "@id": "_:F101"
    },
    "foaf:gender": "F",
    "foaf:name": "Betty /Smith/"
  }
]
```
That is, the script should return a list of objects, each with its own `@context` set. A converter such as `riot --output=RDF/XML ged.jsonld` could then convert it to RDF/XML. (TODO)
Grab the `@graph` property from the resulting JSON, which is an array of JSON objects. Objects with an `@type` of `foaf:Person` are `INDI` records in the original GEDCOM, and objects with an `@type` of `bio:Relationship` are `FAM` records in the original file. Between those two types, all the properties of the original data file should be present.
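The sketch below shows one way to consume that output: it splits `@graph` by `@type` and resolves each family's `bio:participant` references to person names. The function names are hypothetical. It assumes `bio:participant` may be either a single reference or an array.

```javascript
// Sketch: split the converter's @graph into INDI-like and FAM-like records by @type.
function splitRecords(doc) {
  const graph = doc["@graph"] || [];
  return {
    individuals: graph.filter((node) => node["@type"] === "foaf:Person"),
    families: graph.filter((node) => node["@type"] === "bio:Relationship"),
  };
}

// Resolve each family's bio:participant references to the participants' foaf:name.
function familyMembers(doc) {
  const { individuals, families } = splitRecords(doc);
  const byId = new Map(individuals.map((person) => [person["@id"], person]));
  return families.map((fam) => {
    const refs = [].concat(fam["bio:participant"] || []); // single object or array
    return {
      family: fam["@id"],
      // undefined when a referenced person is missing from @graph
      participants: refs.map((ref) => (byId.get(ref["@id"]) || {})["foaf:name"]),
    };
  });
}
```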
- `CONT` items are concatenated onto their parent items with a line break.
- `TIME` items are concatenated onto their parent `DATE` items with a space.
- Events on an `INDI` have that individual as `bio:principal`.
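The first two rules can be sketched as a recursive pass over a parsed GEDCOM node. The `{ tag, value, children }` node shape and the `foldNode` name are assumptions made for illustration. `convert.js` may implement this differently.

```javascript
// Sketch: fold CONT children into their parent's value (joined with "\n") and
// TIME children into a parent DATE (joined with " "). Node shape is assumed.
function foldNode(node) {
  let value = node.value;
  const children = [];
  for (const child of node.children || []) {
    const folded = foldNode(child);
    if (folded.tag === "CONT") {
      value = (value ?? "") + "\n" + (folded.value ?? "");
    } else if (folded.tag === "TIME" && node.tag === "DATE") {
      value = (value ?? "") + " " + (folded.value ?? "");
    } else {
      children.push(folded); // any other child is kept as-is (already folded)
    }
  }
  const result = { ...node, children };
  if (value !== undefined) result.value = value;
  return result;
}
```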
| GEDCOM | Linked Data | Note |
|---|---|---|
| `INDI` | `foaf:Person` | |
| `INDI.NAME` | `foaf:name` | |
| `INDI.SEX` | `foaf:gender` | |
| `INDI.BIRT` | `bio:Birth` | |
| `INDI.CHR` | `bio:Baptism` | |
| `INDI.CHRA` | `bio:Baptism` | |
| `INDI.BAPM` | `bio:Baptism` | |
| `INDI.BLES` | `bio:Baptism` | |
| `INDI.DEAT` | `bio:Death` | |
| `INDI.BURI` | `bio:Burial` | |
| `INDI.CREM` | `bio:Cremation` | |
| `INDI.ADOP` | `bio:Adoption` | |
| `INDI.BARM` | `bio:BarMitzvah` | |
| `INDI.BASM` | `bio:BasMitzvah` | |
| `INDI.CONF` | `bio:IndividualEvent` | Confirmation |
| `INDI.FCOM` | `bio:IndividualEvent` | First Communion |
| `INDI.ORDN` | `bio:Ordination` | |
| `INDI.NATU` | `bio:Naturalization` | |
| `INDI.EMIG` | `bio:Emigration` | |
| `INDI.IMMI` | `bio:IndividualEvent` | Immigration |
| `INDI.CENS` | `bio:GroupEvent` | Census |
| `INDI.PROB` | `bio:IndividualEvent` | Probate |
| `INDI.WILL` | `bio:IndividualEvent` | Will |
| `INDI.GRAD` | `bio:Graduation` | |
| `INDI.RETI` | `bio:Retirement` | |
| `INDI.EVEN` | `bio:IndividualEvent` | |
| `FAM` | `bio:Relationship` | |
| `FAM.HUSB` | `bio:participant` | Both husband and wife become `bio:participant`s on the `FAM` Relationship; to find the gender, reference the related `foaf:Person`. |
| `FAM.WIFE` | `bio:participant` | Both husband and wife become `bio:participant`s on the `FAM` Relationship; to find the gender, reference the related `foaf:Person`. |
| `FAM.ANUL` | `bio:Annulment` | |
| `FAM.CENS` | `bio:GroupEvent` | Census |
| `FAM.DIV` | `bio:Divorce` | |
| `FAM.DIVF` | `bio:GroupEvent` | Divorce filed |
| `FAM.ENGA` | `bio:GroupEvent` | Engagement |
| `FAM.MARR` | `bio:Marriage` | |
| `FAM.MARB` | `bio:GroupEvent` | Marriage Announcement |
| `FAM.MARC` | `bio:GroupEvent` | Marriage Contract |
| `FAM.MARL` | `bio:GroupEvent` | Marriage License |
| `FAM.MARS` | `bio:GroupEvent` | Marriage Settlement |
| `FAM.EVEN` | `bio:GroupEvent` | |
| `DATE` | `dc:date` | |
| `SOUR` | `dc:source` | Property on an object that points to the Source object |
| `SOUR` | `dc:BibliographicResource` | Class that the above points to |
| `SOUR.DATA` | `dc:coverage` | |
| `SOUR.DATA.DATE` | `dc:temporal` | |
| `SOUR.AUTH` | `dc:creator` | |
| `SOUR.TITL` | `dc:title` | |
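The event rows of the table can be expressed as a lookup. The sketch below covers only a few `INDI` rows. It turns one event into a node that has the individual as `bio:principal`, the date under `dc:date` per the `DATE` row, and optional note text. `INDI_EVENTS` and `toEvent` are hypothetical names. Storing the table's "Note" text in `dc:description` is an assumption for illustration, not documented converter behaviour.

```javascript
// Sketch: partial GEDCOM-tag -> bio class lookup taken from the table above.
// Putting the "Note" column into dc:description is an assumption.
const INDI_EVENTS = {
  BIRT: { type: "bio:Birth" },
  CHR: { type: "bio:Baptism" },
  BAPM: { type: "bio:Baptism" },
  DEAT: { type: "bio:Death" },
  BURI: { type: "bio:Burial" },
  CONF: { type: "bio:IndividualEvent", note: "Confirmation" },
  IMMI: { type: "bio:IndividualEvent", note: "Immigration" },
  CENS: { type: "bio:GroupEvent", note: "Census" },
  EVEN: { type: "bio:IndividualEvent" },
};

// Build one event node with the individual as bio:principal.
function toEvent(tag, personId, date) {
  const mapping = INDI_EVENTS[tag];
  if (!mapping) return null; // tag not covered by this partial lookup
  const event = { "@type": mapping.type, "bio:principal": { "@id": personId } };
  if (mapping.note) event["dc:description"] = mapping.note;
  if (date) event["dc:date"] = date;
  return event;
}
```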
The GEDCOM format links individuals through `FAM` objects, with the `HUSB`, `WIFE`, and `CHIL` references pointing to the various individuals, rather than individuals referencing each other. This is useful for drawing family tree diagrams, as the parents are usually arranged horizontally and joined to a central node, which the children's lines sprout from.
But for traversing person-to-person relationships, it adds a needless step. The conversion script adds `rel:childOf`, `rel:siblingOf`, `rel:spouseOf`, and `rel:parentOf` to the individual (`foaf:Person`) objects, so `FAM`/`bio:Marriage` objects can be bypassed if desired. Where applicable, the stricter `bio:child`, `bio:father`, and `bio:mother` are used instead.
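This sketch shows how such person-to-person links could be derived from `FAM` records. The `{ husb, wife, chil }` input shape and the `deriveRelations` name are hypothetical. For brevity the sketch adds `rel:spouseOf` unconditionally and ignores the `MARR`/`ENGA`/`ANUL` and pedigree rules that follow.

```javascript
// Sketch: derive rel:* links between people from FAM-like records
// shaped as { husb, wife, chil: [] } (an assumed input shape).
function deriveRelations(families) {
  const links = new Map(); // personId -> { predicate: Set of target ids }
  const add = (from, predicate, to) => {
    if (!from || !to || from === to) return; // skip missing people and self-links
    if (!links.has(from)) links.set(from, {});
    const props = links.get(from);
    if (!props[predicate]) props[predicate] = new Set();
    props[predicate].add(to);
  };

  for (const fam of families) {
    const parents = [fam.husb, fam.wife].filter(Boolean);
    const children = fam.chil || [];
    add(fam.husb, "rel:spouseOf", fam.wife);
    add(fam.wife, "rel:spouseOf", fam.husb);
    for (const child of children) {
      for (const parent of parents) {
        add(child, "rel:childOf", parent);
        add(parent, "rel:parentOf", child);
      }
      for (const sibling of children) add(child, "rel:siblingOf", sibling);
    }
  }

  // Convert Sets to sorted arrays of { "@id" } references for JSON-LD output.
  const result = {};
  for (const [person, props] of links) {
    result[person] = {};
    for (const [predicate, targets] of Object.entries(props)) {
      result[person][predicate] = [...targets].sort().map((id) => ({ "@id": id }));
    }
  }
  return result;
}
```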
- `CHIL` tags are left on the `FAM` (`bio:Relationship`) object to preserve which marriage a child came from.
- If the `FAM` object has an `ANUL` tag, no `rel:spouseOf` relations are generated. (TODO)
- If the `FAM` object has an `ENGA` tag but no `MARR` tag, `rel:engagedTo` is used instead of `rel:spouseOf`.
- If the `FAM` object has neither an `ENGA` nor a `MARR` tag, no `rel:spouseOf` or `rel:engagedTo` relations are created between the parents, but any children still get the proper `rel:childOf` and `rel:siblingOf` relations.
- If the `INDI` object has a `FAMC` tag with `PEDI` set to 'natural' or 'birth', `bio:child`/`bio:father`/`bio:mother` are used instead of `rel:childOf`/`rel:parentOf`.
- If the `FAM.CHIL` object has `_MREL` or `_FREL` attributes (used by Family Tree Maker software to indicate pedigree) set to 'natural', `bio:child`/`bio:father`/`bio:mother` are used instead of `rel:childOf`/`rel:parentOf`.
- If an `ANUL`, `DIV`, or `DIVF` exists on a `FAM` object, the `bio:concludingEvent` of that `bio:Marriage` is set to that event. If both `DIV` and `DIVF` exist, `DIV` takes precedence as the concluding event. (TODO)
- If one of the partners in a `bio:Marriage` has a Death event (or, if both do, whichever death occurred first), that Death event is set as the `bio:concludingEvent` of the `bio:Marriage`, provided no `ANUL`, `DIV`, or `DIVF` exists. (TODO)
- If `DEAT` and either `BURI` or `CREM` exist, `bio:followingEvent` and `bio:precedingEvent` relationships are added. (TODO)
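The marriage rules above (which spouse predicate to use, and which event concludes the marriage) can be sketched as two small functions. This is a sketch under several assumptions. A family's events arrive as a `{ TAG: eventId }` object, and partner deaths arrive as `{ event, date }` with ISO 8601 date strings, so GEDCOM dates such as `1 APR 1900` would need converting first. `spousePredicate` and `concludingEvent` are hypothetical names. The rules don't say whether `ANUL` outranks `DIV`, and this sketch arbitrarily checks it first.

```javascript
// Sketch: choose the spouse predicate for a FAM from the tags it carries.
// events is an assumed { TAG: eventId } object, e.g. { MARR: "_:E1" }.
function spousePredicate(events) {
  if ("ANUL" in events) return null; // annulled: no spouse relation
  if ("MARR" in events) return "rel:spouseOf";
  if ("ENGA" in events) return "rel:engagedTo"; // engaged but never married
  return null; // neither ENGA nor MARR: parents are not linked directly
}

// Sketch: pick the bio:concludingEvent of a marriage.
// partnerDeaths: [{ event, date }] with ISO 8601 date strings (an assumption).
function concludingEvent(events, partnerDeaths = []) {
  // ANUL, DIV, or DIVF ends the marriage; DIV is checked before DIVF so it wins.
  for (const tag of ["ANUL", "DIV", "DIVF"]) {
    if (tag in events) return events[tag];
  }
  // Otherwise the earliest partner death concludes it.
  const deaths = partnerDeaths.filter((d) => d && d.date);
  if (deaths.length === 0) return null;
  deaths.sort((a, b) => a.date.localeCompare(b.date)); // ISO dates sort chronologically
  return deaths[0].event;
}
```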
A few places in the GEDCOM structure break the standard node-to-node linkage that an RDF graph has. Namely, the `INDI.FAMC.PEDI` (Pedigree) and `INDI.FAMC.STAT` (Status) tags break the standard `INDI.FAMC` linkage. `PEDI` and `STAT` are not attributes of the `FAM` referenced by the `FAMC` ID. They are attributes of the link between that individual and that family, which doesn't map cleanly onto JSON-LD. Technically, representing them requires a reification of the link.
`SOUR` tags are in the same situation: they are added onto a link to another node and modify the link, rather than either of the nodes.
So, to make this work, when an object (e.g. a `foaf:name` property on a `foaf:Person`) has a `SOUR` property, the parent object (the `foaf:Person` in this example) gets a `GEDREIF` property with a value of:

```json
{
  "@type": "rdf:Statement",
  "rdf:subject": "_:I101",
  "rdf:predicate": "foaf:name",
  "rdf:object": "John Smith",
  "dc:source": "_:S101"
}
```
If there are multiple `SOUR` references for that object, the `GEDREIF` property becomes an array of objects. If multiple `SOUR` references have the same ID, the `rdf:predicate` for that `SOUR` becomes an array of the properties that source affects. (TODO)
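Here is a sketch of building the `GEDREIF` value under these rules, including the planned (TODO) merge for repeated source IDs. The input shape `[{ predicate, object, source }]` and the `buildGedreif` name are hypothetical. The rules don't say what happens to `rdf:object` when statements merge. As an assumption, this sketch turns it into an array that stays parallel to `rdf:predicate`.

```javascript
// Sketch: build GEDREIF reification statements for one subject's sourced properties.
// citations: [{ predicate, object, source }] (an assumed input shape).
function buildGedreif(subject, citations) {
  const bySource = new Map(); // source id -> statement, in first-cited order
  for (const { predicate, object, source } of citations) {
    const existing = bySource.get(source);
    if (!existing) {
      bySource.set(source, {
        "@type": "rdf:Statement",
        "rdf:subject": subject,
        "rdf:predicate": predicate,
        "rdf:object": object,
        "dc:source": source,
      });
    } else {
      // Same source cited again: predicate (and, by assumption, object) become arrays.
      existing["rdf:predicate"] = [].concat(existing["rdf:predicate"], predicate);
      existing["rdf:object"] = [].concat(existing["rdf:object"], object);
    }
  }
  const statements = [...bySource.values()];
  // One statement stays a plain object; several become an array.
  return statements.length === 1 ? statements[0] : statements;
}
```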
For pedigree information on an `INDI.FAMC`, the `INDI` object gets a `GEDREIF` attribute, which is set to: (TODO)

```json
{
  "@type": "rdf:Statement",
  "rdf:subject": "_:I101",
  "rdf:predicate": "FAMC",
  "rdf:object": "_:F101",
  "dc:description": "natural"
}
```
The GEDCOM specification also defines sub-tags that break an `INDI.NAME` down further. For example, for an `INDI` whose `NAME` has additional `GIVN` and `SURN` tags:

```json
{
  "@type": "rdf:Statement",
  "rdf:subject": "_:I101",
  "rdf:predicate": "foaf:name",
  "rdf:object": "John Smith",
  "GIVN": "John",
  "SURN": "Smith"
}
```
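A sketch of building that name statement follows. GEDCOM marks the surname with slashes (`John /Smith/`), so removing them gives the plain `rdf:object` text. The sketch copies whichever GEDCOM 5.5.1 name-piece sub-tags are present (`NPFX`, `GIVN`, `NICK`, `SPFX`, `SURN`, `NSFX`). The `nameStatement` name and its `parts` argument are hypothetical.

```javascript
// Sketch: build the INDI.NAME reification statement shown above.
// parts holds any name-piece sub-tag values, e.g. { GIVN: "John", SURN: "Smith" }.
function nameStatement(subject, nameValue, parts = {}) {
  // Drop the surname slashes and collapse leftover whitespace.
  const plain = nameValue.replace(/\//g, "").replace(/\s+/g, " ").trim();
  const statement = {
    "@type": "rdf:Statement",
    "rdf:subject": subject,
    "rdf:predicate": "foaf:name",
    "rdf:object": plain,
  };
  // Copy only the name-piece tags that were actually present.
  for (const tag of ["NPFX", "GIVN", "NICK", "SPFX", "SURN", "NSFX"]) {
    if (parts[tag]) statement[tag] = parts[tag];
  }
  return statement;
}
```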
- Pedigree tree: D3 "elbow dendrogram" using the "tree" D3 layout.
- D3 smart force labels: Adding functionality so labels "orbit" their node and repel each other, staying out of each other's way.