Cookbook: Templated Metadata Parser

Mark Jordan edited this page Jun 24, 2017 · 12 revisions

Overview

The Templated metadata parser creates MODS or DC XML from Twig templates. It differs from the InsertXmlFromTemplate metadata manipulator used with the CSV and CONTENTdm toolchains in that it generates an entire MODS or DC XML file, whereas the InsertXmlFromTemplate metadata manipulator only generates a single top-level MODS element.

The Templated metadata parser is a drop-in replacement for the mods\CsvToMods and mods\CdmToMods metadata parsers, and can be used with any toolchain that supports those two parsers. It does not use a mappings file; instead, it inserts CSV or CONTENTdm metadata values directly into the template. This metadata parser has some advantages over mods\CsvToMods and mods\CdmToMods, but it also has some limitations:

| Parser | Pros | Cons |
| --- | --- | --- |
| mods\CsvToMods and mods\CdmToMods | Can use configure-and-run metadata manipulators | Require mappings files, which can be tricky to create |
| mods\CsvToMods and mods\CdmToMods | Use simple one-to-one mappings between source and output metadata structures | |
| templated\Templated | Avoids mappings files | Cannot use configure-and-run metadata manipulators (other than the SimpleReplaceTemplated manipulator) |
| templated\Templated | Allows the use of Twig's control structures and filters | |

Example usage

Using CSV input data like this:

Identifier,File,Title,Creator,Description
"image01","IMG_1410.JPG","Small boats in Havana Harbour on a sunney day","Jordan, Mark","Taken on vacation in Cuba."
"image02","IMG_2549.JPG","Manhatten Island","Jordan, Mark","Taken from the ferry from downtown New York to Highlands, NJ. Weather was windy."
"image03","IMG_2940.JPG","Looking across Burrard Inlet","Jordan, Mark","View from Deep Cove to Burnaby Mountain. Simon Fraser University is visible on the top of the mountain in the distance."
"image04","IMG_2958.JPG","Amsterdam waterfront in a picture","Jordan, Mark","Amsterdam waterfront on an overcast day."
"image05","IMG_5083.JPG","Alcatraz Island from Fisherman's Wharf","Jordan, Mark","Taken from Fisherman's Wharf, San Francisco."

and a Twig template like this:

<?xml version="1.0"?>
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
  <titleInfo>
     <title>{{ Title }}</title>
  </titleInfo>
  <name type="personal">
    <namePart>{{ Creator }}</namePart>
    <role>
      <roleTerm type="text">photographer</roleTerm>
    </role>
  </name>
  <abstract>{{ Description }}</abstract>
  <identifier type="local" displayLabel="Local identifier">{{ Identifier }}</identifier>
  <typeOfResource>still image</typeOfResource>
</mods>

this metadata parser can generate MODS XML files like this:

<?xml version="1.0"?>
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
  <titleInfo>
     <title>Small boats in Havana Harbour on a sunney day</title>
  </titleInfo>
  <name type="personal">
    <namePart>Jordan, Mark</namePart>
    <role>
      <roleTerm type="text">photographer</roleTerm>
    </role>
  </name>
  <abstract>Taken on vacation in Cuba.</abstract>
  <identifier type="local" displayLabel="Local identifier">image01</identifier>
  <typeOfResource>still image</typeOfResource>
</mods>
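
The substitution shown above can be sketched in a few lines of PHP. This is not MIK's own code: it assumes Twig 3.x has been installed with Composer (`composer require twig/twig`) so that `vendor/autoload.php` exists next to the script, and it uses a shortened template. The point it illustrates is that each CSV column heading becomes a Twig variable of the same name.

```php
<?php
use Twig\Environment;
use Twig\Loader\ArrayLoader;

// Assumption: Twig 3.x installed via Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

// The header row and the first record from the sample CSV.
$lines = [
    'Identifier,File,Title,Creator,Description',
    '"image01","IMG_1410.JPG","Small boats in Havana Harbour on a sunney day","Jordan, Mark","Taken on vacation in Cuba."',
];

// Key each value by its column heading: ['Identifier' => 'image01', ...].
$header = str_getcsv($lines[0], ',', '"', '\\');
$record = array_combine($header, str_getcsv($lines[1], ',', '"', '\\'));

// A shortened version of the template; variable names match CSV headings.
$template = <<<'XML'
<titleInfo>
  <title>{{ Title }}</title>
</titleInfo>
<identifier type="local">{{ Identifier }}</identifier>
XML;

// Register the template under an arbitrary name and render it with the record.
$twig = new Environment(new ArrayLoader(['mods' => $template]));
$xml = $twig->render('mods', $record);
echo $xml, "\n";
```

With Twig's default settings, output is autoescaped using the HTML strategy, so characters such as `&` and `<` in a field value are escaped in the rendered XML. Whether MIK configures Twig the same way is not stated here, so check the generated files if your metadata contains such characters.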

Twig features such as control structures, filters, functions, and whitespace control are available within the templates. For example, the template below uses Twig's if/elseif/else control structure, its length, trim, and slice filters, and a test for an empty string (elseif not Title|length).

<?xml version="1.0"?>
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">

  <titleInfo>
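  {# Require a non-empty title here so that the [no title] branch below can be reached. #}
  {% if Title|length > 0 and Title|length < 256 %}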
     <title>{{ Title|trim }}</title>
  {% elseif not Title|length %}
     <title>[no title]</title>
  {% else %}
     <title>{{ Title|slice(0,255) | trim }} [...]</title>
  {% endif %}
  </titleInfo>

  <name type="personal">
    <namePart>{{ Creator }}</namePart>
    <role>
      <roleTerm type="text">photographer</roleTerm>
    </role>
  </name>
  <abstract>{{ Description }}</abstract>
  <identifier type="local" displayLabel="Local identifier">{{ Identifier }}</identifier>
  <typeOfResource>still image</typeOfResource>
</mods>

Using this metadata parser requires a couple of specific settings in your .ini file:

  • The [METADATA_PARSER] class value must be templated\Templated
  • Instead of the [METADATA_PARSER] mapping_csv_path setting used by the mods\CsvToMods and mods\CdmToMods metadata parsers, this one uses the [METADATA_PARSER] template setting, whose value is the path to the Twig template.

For example:

[FETCHER]
class = Csv
input_file = templated_metadata.csv
temp_directory = /tmp/templated_temp
record_key = Identifier

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; This is the only section of the .ini file that is
; specific to this metadata parser.
[METADATA_PARSER]
class = templated\Templated
template = templated_mods_twig.xml
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

[FILE_GETTER]
class = CsvSingleFile
input_directory = "/home/mark/Downloads/mik_tutorial_data"
temp_directory = /tmp/templated_temp
file_name_field = File

[WRITER]
class = CsvSingleFile
preserve_content_filenames = false
output_directory = /tmp/templated_output
datastreams[] = MODS

Metadata manipulators used by the mods\CsvToMods and mods\CdmToMods metadata parsers are not available to the templated\Templated metadata parser. The only metadata manipulator that is available is SimpleReplaceTemplated, which, as the name suggests, lets you perform simple search-and-replace operations on the generated XML. It is registered in the .ini file like this:

[MANIPULATORS]
metadatamanipulators[] = "SimpleReplaceTemplated|/Island/|Peninsula"

; This metadata manipulator logs its operations, so you must include the path
; to the log in your .ini file.
[LOGGING]
path_to_manipulator_log = /tmp/templated_output/manipulator.log
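
To make the registration string concrete, here is a minimal sketch of how a "Name|pattern|replacement" value could be applied to generated XML. It is not MIK's implementation: `applySimpleReplace()` is a hypothetical helper, and it assumes the second parameter is a PCRE pattern handed to `preg_replace()`, as the slash delimiters in `/Island/` suggest.

```php
<?php
// Hypothetical helper, not part of MIK: applies one registration string
// of the form "Name|pattern|replacement" to a block of XML.
function applySimpleReplace(string $registration, string $xml): string
{
    // Split into at most three parts so the replacement may contain "|".
    [$name, $pattern, $replacement] = explode('|', $registration, 3);

    // Assumption: the pattern is a PCRE regular expression.
    $result = preg_replace($pattern, $replacement, $xml);

    // preg_replace() returns null on a regex error; fall back to the input.
    return $result === null ? $xml : $result;
}

$xml = '<title>Manhatten Island</title>';
echo applySimpleReplace('SimpleReplaceTemplated|/Island/|Peninsula', $xml), "\n";
// Prints: <title>Manhatten Peninsula</title>
```

Because the replacement runs over the whole XML document as a string, a pattern such as /Island/ matches every occurrence, in any element. Under this sketch's splitting rule, a pattern that itself contains a `|` character would be split in the wrong place.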

If you want to write a custom metadata manipulator that uses PHP's DOM interface (or SimpleXML, or XSLT) to modify the XML, SimpleReplaceTemplated can be used as a model. Your manipulator's ->manipulate() method would look like this:

    /**
     * @param string $input
     *     The XML to be manipulated, as a string.
     *
     * @return string
     *     The manipulated XML.
     */
    public function manipulate($input)
    {
        $dom = new \DOMDocument();
        $dom->loadXML($input);
        // Manipulate the XML using DOM here.
        return $dom->saveXML();
    }

Put your manipulator class file in src/metadatamanipulators, run composer dump-autoload, and register the manipulator in your .ini file:

[MANIPULATORS]
metadatamanipulators[] = "MyCustomMetadataManipulator"
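
As an illustration of what the body of such a manipulate() method might contain, the sketch below collapses extra whitespace in every MODS title element using DOM and XPath. The class is shown standalone so that it runs on its own; the MIK-specific wiring (namespace, parent class, constructor, logging) is deliberately omitted and should be taken from SimpleReplaceTemplated. The whitespace cleanup is only an example task chosen for this sketch.

```php
<?php
// Hypothetical standalone class; MIK-specific wiring is omitted.
class MyCustomMetadataManipulator
{
    public function manipulate($input)
    {
        $dom = new DOMDocument();
        // Return the input unchanged if it is empty or not well-formed XML.
        if ($input === '' || !@$dom->loadXML($input)) {
            return $input;
        }

        // The templates declare MODS as the default namespace, so XPath
        // needs a prefix bound to that namespace URI.
        $xpath = new DOMXPath($dom);
        $xpath->registerNamespace('mods', 'http://www.loc.gov/mods/v3');

        foreach ($xpath->query('//mods:title') as $title) {
            // Collapse runs of whitespace and trim both ends.
            $clean = trim(preg_replace('/\s+/', ' ', $title->textContent));

            // Replace the element's children with a single text node;
            // createTextNode() escapes special characters on output.
            while ($title->firstChild) {
                $title->removeChild($title->firstChild);
            }
            $title->appendChild($dom->createTextNode($clean));
        }

        return $dom->saveXML();
    }
}

$mods = '<mods xmlns="http://www.loc.gov/mods/v3"><titleInfo>'
    . '<title>  Manhatten   Island </title></titleInfo></mods>';
$output = (new MyCustomMetadataManipulator())->manipulate($mods);
echo $output;
```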

Cookbook table of contents
