Data Model for Digital Object Specification

marlip edited this page Feb 17, 2022 · 1 revision

This page describes the data model for digital objects chosen by the National Library of Sweden (KB). The model serves preservation of and access to our digital objects. It is part of building a data platform that encompasses multiple services and provides common ground for current and future development in projects such as "Öppna", "Tidningar" and "Visa".

The model seeks to follow the principles of linked data and the PREMIS Data Dictionary for digital preservation. It uses the linked library data of the National Library of Sweden, where new terms have been introduced for that purpose.

The model aims to provide a uniform basis for describing and preserving all kinds of material, from old manuscripts and music recordings to statistical data and tv-programs.

PREMIS

To differentiate, group and structure the different preservation formats and relations mentioned above, we use PREMIS semantics. According to PREMIS, the subjects of digital preservation are objects representing units of information. Objects are divided into four subcategories: Intellectual Entity, Representation, File and Bitstream. These four always stand in relation to one another, as pictured in Figure 1.

Figure 1: PREMIS conceptual view. Source: PREMIS Data Dictionary, p. 9 (https://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf)

Short definitions of these categories are as follows:

  • Intellectual Entity - intellectual or artistic creation
  • Representation - usable form of the Intellectual Entity
  • File - a set of bytes known to an operating system
  • Bitstream - a set of bits within a File.

Description of a digital object

To describe which files constitute a digital object and how they are connected to each other, we use FilePackage and Representation. The FilePackage specifies all the files a digital object consists of, together with their metadata, while the Representation describes the structure and order in which the files should be consumed. Every FilePackage and every Representation is connected to an Intellectual Entity.

In our model, the Intellectual Entity is equated with the digital entity in the Libris catalogue of the National Library of Sweden (KB). Unlike the concrete manifestations of the creation presented by FilePackage and Representation, the Intellectual Entity is seen as independent of them, as the abstract creation. It is referred to from FilePackage with a relatedTo property and from Representation with a representationOf property, and it is the main source of external descriptive information on the creation.

The description of a digital object in our data model must consist of a FilePackage and one or more Representations.

If the digital object has more than one Representation, they will be listed under a Representation-type object (see Codex Aureus example below).
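
The rule above can be sketched as a small check. This is a minimal sketch, not KB's implementation: the function name check_description and the flat list-of-dicts layout are hypothetical, and reading "a FilePackage" as "exactly one" is an interpretation. Only the type names FilePackage and Representation come from the model.

```python
def check_description(objects):
    """Return a list of problems found in a digital object description.

    `objects` is a list of dicts, each carrying an "@type" key.
    The flat-list layout is an assumption made for this sketch.
    """
    packages = [o for o in objects if o.get("@type") == "FilePackage"]
    representations = [o for o in objects if o.get("@type") == "Representation"]

    problems = []
    # Assumption: "a FilePackage" is read as exactly one FilePackage.
    if len(packages) != 1:
        problems.append(f"expected exactly one FilePackage, found {len(packages)}")
    if not representations:
        problems.append("expected at least one Representation, found none")
    return problems


description = [
    {"@type": "FilePackage", "@id": "https://data.kb.se/{package id}"},
    {"@type": "Representation", "@id": "https://data.kb.se/{package id}.representation"},
]
print(check_description(description))  # prints []
```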

FilePackage

We introduce FilePackage to meet the need for descriptive metadata about the information package, and to list and describe the files that are part of the information package and its description.

A FilePackage consists of Files and the metadata describing each File, e.g. encodingFormat and size.

Each File refers to the FilePackage it is included in with a describedBy property.

Type of relationship that can be expressed on a FilePackage:

  • includes - an unsorted list of one or more Files
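
As an illustration of how a FilePackage and its Files refer to each other, here is a minimal sketch. The property names includes, describedBy, relatedTo, encodingFormat and size come from the text above; the exact JSON shape, the helper name build_file_package and the example.org identifiers are assumptions and may differ from what data.kb.se actually returns.

```python
def build_file_package(package_id, intellectual_entity_id, files):
    """Build a FilePackage dict plus the File dicts that point back to it.

    `files` is a list of (file_id, encoding_format, size_in_bytes) tuples.
    The dict layout is an assumption; only the property names are from the model.
    """
    file_objects = [
        {
            "@id": file_id,
            "@type": "File",
            "encodingFormat": encoding_format,
            "size": size,
            # Each File refers to its FilePackage with describedBy.
            "describedBy": {"@id": package_id},
        }
        for file_id, encoding_format, size in files
    ]
    package = {
        "@id": package_id,
        "@type": "FilePackage",
        "relatedTo": {"@id": intellectual_entity_id},
        # includes is unsorted: consumers should not rely on this order.
        "includes": [{"@id": f["@id"]} for f in file_objects],
    }
    return package, file_objects


package, file_objects = build_file_package(
    "https://example.org/package-1",  # hypothetical ids
    "https://example.org/entity-1",
    [
        ("https://example.org/package-1/data.csv", "text/csv", 2048),
        ("https://example.org/package-1/mets.xml", "application/xml", 512),
    ],
)
```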

Representation

A Representation describes the structural relationships between Representations, Files or Bitstreams necessary for using or consuming an Intellectual Entity. A Representation is a single preserved digital instance of an Intellectual Entity.

An Intellectual Entity may consist of more than one Representation.

  • For instance, if we have a digital object in both tiff and jp2 formats, we are dealing with two separate Representations of the same Intellectual Entity. This enables the user to consume the Intellectual Entity with preferred encodingFormat (mimetype).

When an Intellectual Entity has more than one Representation, we place them in a Representation-type object that serves only as an umbrella. It lists the different Representations of the creation through a hasPart relation and does not itself refer to an Intellectual Entity with a representationOf property.

Types of relationship that can be expressed on a Representation:

  • hasPart - an unsorted list of two or more Representations of an Intellectual Entity. The list is unsorted to avoid setting an order or hierarchy between different Representations of a creation.

  • hasPartList - a sorted list of Representations that make up the Representation of an Intellectual Entity.

  • includes - an unsorted list of two or more Files that make up a Representation. The Files are of different formats, and it is the combination of all of them that makes up a complete Representation.

  • includesList - a sorted list of one or more Files and/or Bitstreams that make up the Representation of an Intellectual Entity. The Files, if more than one, are of the same format and are all needed for a successful representation. Bitstreams are parts of a File and point to the File they are part of with an includedIn relationship.
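
The four relationships can be walked recursively to find the Files a Representation depends on. The following is a sketch under stated assumptions: the function name ordered_files and the short file ids are hypothetical, and for the unsorted hasPart and includes it simply keeps whatever order the members were serialized in, which the model does not guarantee to be meaningful.

```python
def ordered_files(representation):
    """Collect File ids reachable from a Representation, depth first.

    hasPartList and includesList are sorted, so their order is kept.
    hasPart and includes are unsorted in the model; this sketch keeps
    the order in which the members happen to be serialized.
    """
    file_ids = []
    for key in ("hasPartList", "hasPart"):
        for part in representation.get(key, []):
            file_ids.extend(ordered_files(part))
    for key in ("includesList", "includes"):
        for member in representation.get(key, []):
            if member.get("@type") == "Bitstream":
                # A Bitstream points to its containing File via includedIn.
                file_ids.append(member["includedIn"]["@id"])
            else:
                file_ids.append(member["@id"])
    return file_ids


# Hypothetical two-page newspaper: each page pairs an image with its OCR file.
newspaper = {
    "@type": "Representation",
    "hasPartList": [
        {"@type": "Representation",
         "includes": [{"@id": "page1.jp2"}, {"@id": "page1.xml"}]},
        {"@type": "Representation",
         "includes": [{"@id": "page2.jp2"}, {"@id": "page2.xml"}]},
    ],
}
print(ordered_files(newspaper))
# prints ['page1.jp2', 'page1.xml', 'page2.jp2', 'page2.xml']
```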

Examples

The examples below show how we apply the data model to our digital objects and serve to clarify the relationships between them.

The categories describing the objects that make up an information entity in our catalogues are demonstrated using five examples, each embodying different types of relationships necessary for understanding the data model.

The FilePackage for the available examples can be retrieved with: curl -X 'GET' 'https://data.kb.se/{package id}' -H 'accept: application/json'

The Representation for the available examples can be retrieved with: curl -X 'GET' 'https://data.kb.se/{package id}.representation' -H 'accept: application/json'
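
The same two requests can be prepared from Python's standard library. This is only a sketch: the helper name package_request is hypothetical, the URL pattern and Accept header are taken from the curl commands above, and the shape of the JSON that comes back is not assumed here. The request is built but not sent.

```python
from urllib.request import Request

BASE_URL = "https://data.kb.se"


def package_request(package_id, representation=False):
    """Build (but do not send) a GET request for a FilePackage or its Representation."""
    suffix = ".representation" if representation else ""
    return Request(
        f"{BASE_URL}/{package_id}{suffix}",
        headers={"Accept": "application/json"},
        method="GET",
    )


request = package_request("smdb-001451702", representation=True)
print(request.full_url)  # prints https://data.kb.se/smdb-001451702.representation
```

To actually fetch the data, the request can be passed to urllib.request.urlopen and the response body parsed with json.load.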

Lånestatistik från Kungliga biblioteket 1999

Statistical data on library loans from the National Library of Sweden in 1999, available at https://data.kb.se/dark-17709827. Package id: dark-17709827.

Digital object consisting of:

  • csv-file
  • files with descriptive and structural metadata (e.g. METS metadata)

Digital object description consisting of:

  • FilePackage with an includes relationship consisting of a File in csv-format and the metadata Files
  • Representation with an includesList relationship consisting of a File in csv-format

La donna e mobile

A digitized phonograph record from the 1800s, available at https://data.kb.se/smdb-001451702. Package id: smdb-001451702.

Digital object consisting of:

  • audio file in mp3-format
  • file with descriptive and structural metadata

Digital object description consisting of:

  • FilePackage with an includes relationship consisting of a File in mp3-format and the metadata File
  • Representation of the record with an includesList relationship consisting of a File in mp3-format

Aftonbladet 1873-07-11

A digital newspaper edition, available at https://data.kb.se/dark-100474. Package id: dark-100474.

Digital object consisting of:

  • image files in JP2-format
  • OCR files in alto-xml format
  • files with descriptive and structural metadata

Digital object description consisting of:

  • FilePackage with an includes relationship consisting of Files in JP2-format, Files in alto-xml format and the metadata Files
  • Representation of the newspaper edition with a hasPartList relationship consisting of:
    • Representation of a Page of the newspaper edition with an includes relationship consisting of a File in JP2-format and a File in alto-xml format representing the same newspaper page. The rendition of the page using both files is the complete Representation of the page.

Codex Aureus

A digital edition of a manuscript, available at https://data.kb.se/dark-17756808. Package id: dark-17756808.

Digital object consisting of:

  • image files in TIFF-format
  • image files in JP2-format
  • files with descriptive and structural metadata

Digital object description consisting of:

  • FilePackage with an includes relationship consisting of Files in JP2-format, Files in TIFF-format and the metadata Files
  • Representation with a hasPart relationship consisting of:
    • Representation of the manuscript with an includesList relationship consisting of Files in JP2-format
    • Representation of the manuscript with an includesList relationship consisting of Files in TIFF-format

TV-schedule for SVT1, 1/1/2001

A digital edition of the tv-schedule broadcast from 00:00 to 23:59. This is part of a planned web service and is therefore not currently available, either as data or as an existing digital object model. However, since our information package model needs to be generic and useful for different types of digital object formats, we have begun preparations for this model.

Digital object consisting of:

  • twenty-four video Files in mp4-format, each one hour in duration
  • file with descriptive and structural metadata

Digital object description consisting of:

  • FilePackage with an includes relationship consisting of Files in mp4-format and the metadata File
  • Representation of the schedule broadcast with a hasPartList relationship consisting of:
    • Representation of a tv-program, part of the broadcast, with an includesList relationship consisting of one or more Bitstreams or Files, followed by Representations of the rest of the tv-programs in the broadcast.

An example of a Representation of a TV schedule, limited to two broadcast programs, "Nyheter" and "Jurassic Park", can be seen below.

In this example, the representationOf property of each program does not link to an Intellectual Entity in the Libris catalogue, as it is not yet clear whether each unique broadcast tv-segment would in fact be catalogued. As that scenario does not seem probable at this point in the development of the model, we settle for a title for now.

{
    "@context": "https://id.kb.se/context.jsonld",
    "@id": "http://data.kb.se/{package id}.representation",
    "@type": "Representation",
    "meta": {"derivedFrom": {"@id": "https://data.kb.se/{package id}"}},
    "representationOf": {"@id": "http://libris.kb.se/{intellectual entity}"},
    "hasPartList": [
        {
            "@id": "{id}",
            "@type": "Representation",
            "representationOf": {"title": "Nyheter"},
            "startDate": "2001-01-01T00:00:00Z",
            "endDate": "2001-01-01T00:15:00Z",
            "includesList": [
                {
                    "@id": "{id}",
                    "@type": "Bitstream",
                    "startTime": "00:00:00",
                    "endTime": "00:15:00",
                    "includedIn": {
                        "@id": "{File id}"
                    }
                }
            ]
        },
        {
            "@id": "{URI}",
            "@type": "Representation",
            "representationOf": {"title": "Jurassic Park"},
            "startDate": "2001-01-01T00:15:00Z",
            "endDate": "2001-01-01T02:05:00Z",
            "includesList": [
                {
                    "@id": "{URI}",
                    "@type": "Bitstream",
                    "startTime": "00:15:00",
                    "endTime": "00:59:59",
                    "includedIn": {
                        "@id": "{File id}"
                    }
                },
                {
                    "@id": "{File id}"
                },
                {
                    "@id": "{URI}",
                    "@type": "Bitstream",
                    "startTime": "00:00:00",
                    "endTime": "00:05:00",
                    "includedIn": {
                        "@id": "{File id}"
                    }
                }
            ]
        }
    ]
}
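
To show how a consumer might read such a program, the sketch below turns an includesList into a list of playback segments. It assumes that startTime and endTime are offsets within the containing File (the third Bitstream starting at 00:00:00 suggests this, but the model does not state it), and the names to_seconds, playlist and the hour-NN.mp4 file ids are hypothetical.

```python
def to_seconds(clock):
    """Convert an "HH:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def playlist(program):
    """Return (file_id, start_seconds, end_seconds) per includesList member.

    A Bitstream yields the time range inside its File (via includedIn).
    A plain File reference yields (file_id, None, None): the whole File.
    """
    segments = []
    for member in program.get("includesList", []):
        if member.get("@type") == "Bitstream":
            segments.append((
                member["includedIn"]["@id"],
                to_seconds(member["startTime"]),
                to_seconds(member["endTime"]),
            ))
        else:
            segments.append((member["@id"], None, None))
    return segments


# Placeholder ids: hour-00.mp4 etc. are hypothetical file names.
jurassic_park = {
    "@type": "Representation",
    "representationOf": {"title": "Jurassic Park"},
    "includesList": [
        {"@type": "Bitstream", "startTime": "00:15:00", "endTime": "00:59:59",
         "includedIn": {"@id": "hour-00.mp4"}},
        {"@id": "hour-01.mp4"},
        {"@type": "Bitstream", "startTime": "00:00:00", "endTime": "00:05:00",
         "includedIn": {"@id": "hour-02.mp4"}},
    ],
}
print(playlist(jurassic_park))
# prints [('hour-00.mp4', 900, 3599), ('hour-01.mp4', None, None), ('hour-02.mp4', 0, 300)]
```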

Further Development

As our data model for information packages and digital objects is ongoing work, it will necessarily be adjusted and updated as it is applied to a broader set of data. The aim is a model so general that no changes are necessary, but it is too early to claim that this goal has been reached.

In accordance with PREMIS, we aim in time to be able to describe and trace occurrences when the material gets updated with a supplement or a previously unavailable segment, while preserving all of its versions.

Disclaimer:

The statistical data example given above is in the process of being remodelled. The curl command will therefore not return the expected result just yet.