Metadata manipulator: SimpleReplace

Mark Jordan edited this page Apr 4, 2017 · 13 revisions

Overview

This metadata manipulator performs simple search and replace on MODS data.

Toolchains

This manipulator can be used within any toolchain that uses a MODS metadata parser (i.e., it is not specific to CONTENTdm CSV, etc.).

Configuration

To register this manipulator in your toolchain, add an entry similar to the following to the "[MANIPULATORS]" section of your .ini file. The manipulator's configuration signature is:

metadatamanipulators[] = "SimpleReplace|/pattern/|replacement text"

For example, to replace the word "Page" with the word "Part" if it immediately follows the MODS markup <title>, use this configuration:

metadatamanipulators[] = "SimpleReplace|/<title>Page/|<title>Part"

Some additional examples include:

  • metadatamanipulators[] = "SimpleReplace|/<roleTerm\stype=\"text\">photographer/|<roleTerm type=\"text\">Creator"
  • metadatamanipulators[] = 'SimpleReplace|/<identifier\stype="local"\sdisplayLabel="Local\sidentifier">image(\d\d)<\/identifier>/|<identifier type="local" displayLabel="Local number">Number $1</identifier>'
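To preview what entries like these do to a MODS fragment, the substitution can be approximated outside MIK. The sketch below uses Python's re module as a stand-in for PHP's preg_replace(), so it only mirrors simple patterns. The sample fragments and variable names are made up for illustration.

```python
import re

# Hypothetical MODS fragments, made up for illustration.
title_xml = "<titleInfo><title>Page 1</title></titleInfo>"
role_xml = '<role><roleTerm type="text">photographer</roleTerm></role>'

# Patterns are written without the surrounding / delimiters that PCRE requires,
# because Python's re module does not use delimiters.
new_title = re.sub(r"<title>Page", "<title>Part", title_xml)
new_role = re.sub(
    r'<roleTerm\stype="text">photographer',
    '<roleTerm type="text">Creator',
    role_xml,
)

print(new_title)
print(new_role)
```

Trying a pattern against a few representative fragments this way is a cheap check before running a full toolchain, although PCRE and Python regular expressions differ in some details.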

Because MIK uses the pipe (|) as a delimiter between manipulator parameters, that character cannot be used in patterns. To work around this, you can apply multiple instances of the SimpleReplace manipulator to account for multiple replacement matches. For example, if you wanted to replace both "TO" and "toronto" with "Toronto", you would use this configuration:

metadatamanipulators[] = "SimpleReplace|/TO/|Toronto"
metadatamanipulators[] = "SimpleReplace|/toronto/|Toronto"
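The effect of the pipe delimiter can be sketched as follows. This is an illustration in Python, not MIK's PHP code, and it assumes the parameters are separated by a plain split on "|". The function names and the sample fragment are hypothetical.

```python
import re

def split_entry(entry):
    # Assumption: the parameters are obtained by splitting the entry on "|".
    return entry.split("|")

# A pattern that uses regex alternation is cut in two by the delimiter.
broken = split_entry("SimpleReplace|/TO|toronto/|Toronto")

# Two separate entries, applied one after the other, avoid the problem.
rules = [
    split_entry("SimpleReplace|/TO/|Toronto"),
    split_entry("SimpleReplace|/toronto/|Toronto"),
]

def apply_rules(text):
    for _name, pattern, replacement in rules:
        # Strip the / delimiters, which Python's re module does not use.
        text = re.sub(pattern.strip("/"), replacement, text)
    return text

sample = "<placeTerm>TO</placeTerm><placeTerm>toronto</placeTerm>"
print(broken)
print(apply_rules(sample))
```

Note that /TO/ is unanchored, so it also matches "TO" inside longer strings; adding surrounding markup to the pattern narrows the match.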

A common use for this manipulator is to remove characters you don't want to show up in your Islandora MODS files. For example, if you have migrated your metadata from a legacy database, you may have Unicode replacement characters () in your data. You can use the following regular expression to replace the character with nothing:

metadatamanipulators[] = "SimpleReplace|/\x{FFFD}/u|"
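A rough Python equivalent of this removal is shown below as a sanity check. Python's re module does not accept the PCRE \x{FFFD} syntax, so the sketch uses the "\ufffd" string escape instead. The sample title is made up.

```python
import re

# Hypothetical fragment containing a Unicode replacement character (U+FFFD).
raw = "<title>Caf\ufffd menu</title>"

# Replace every U+FFFD with the empty string.
cleaned = re.sub("\ufffd", "", raw)

print(cleaned)
```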

Parameters

This manipulator takes two parameters:

  • The first parameter (required) is the pattern to match on. This pattern is a PHP Perl Compatible Regular Expression, without any leading or trailing quotation marks.
  • The second parameter (optional) is the replacement text. If you want to remove the text matched by the regular expression, omit the second parameter, e.g., "SimpleReplace|/foo/|".

Backreferences (e.g., $1, $2, etc.) are allowed.
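The two parameters, including a backreference and an omitted replacement, can be tried out with the sketch below. It is a Python approximation of preg_replace(), not MIK's own code: simple_replace is a hypothetical helper, and it rewrites PHP-style $1 into Python's \g<1> form before substituting. The input fragments are made up.

```python
import re

def simple_replace(pattern, xml, replacement=""):
    # Rewrite PHP-style backreferences ($1, $2, ...) as Python's \g<1>, \g<2>, ...
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.sub(pattern, template, xml)

source = '<identifier type="local" displayLabel="Local identifier">image07</identifier>'

# The pattern is given without PCRE's / delimiters, so the closing tag's slash needs no escape.
renumbered = simple_replace(
    r'<identifier\stype="local"\sdisplayLabel="Local\sidentifier">image(\d\d)</identifier>',
    source,
    '<identifier type="local" displayLabel="Local number">Number $1</identifier>',
)

# With no replacement text, the matched text is simply removed.
removed = simple_replace(r"foo", "<note>foobar</note>")

print(renumbered)
print(removed)
```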

Functionality

Unlike other metadata manipulators that apply to MODS toolchains, this metadata manipulator does not manipulate the PHP DOM. Instead, it performs preg_replace() operations directly on the incoming XML fragment. Therefore, to make matches (and corresponding replacements) as precise as possible, you should include in your pattern any contextual string data that will limit the search and replace to only the elements you want to modify. In the first configuration example above ("SimpleReplace|/<title>Page/|<title>Part"), Page will only be replaced with Part if it occurs immediately following the <title> markup.
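The difference that contextual markup makes can be seen in a small sketch. It uses Python's re.sub() as a stand-in for preg_replace(), and the fragment, including its <note> element, is made up for illustration.

```python
import re

# Hypothetical fragment in which "Page" appears in two different elements.
fragment = "<titleInfo><title>Page 1</title></titleInfo><note>Page torn</note>"

# A bare pattern rewrites every occurrence, including the one in <note>.
loose = re.sub(r"Page", "Part", fragment)

# Including the <title> markup limits the replacement to the title.
precise = re.sub(r"<title>Page", "<title>Part", fragment)

print(loose)
print(precise)
```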

If this manipulator modifies an XML fragment, it writes an entry to the manipulator log indicating the record key plus the fragment before and after modification, like this:

[2016-09-20 14:53:02] config.INFO: SimpleReplace {"Record key":"90","Input":"<titleInfo><title>Page 1</title></titleInfo>","Modified version":"<titleInfo><title>Part 1</title></titleInfo>"} []
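If you want to review many such entries, the JSON portion of each line can be pulled out with a short script. The sketch below assumes every entry has exactly the shape of the sample line above (timestamp, channel and level, the word SimpleReplace, a JSON object, then a trailing "[]"); the regular expression and variable names are illustrative only.

```python
import json
import re

log_line = (
    '[2016-09-20 14:53:02] config.INFO: SimpleReplace '
    '{"Record key":"90","Input":"<titleInfo><title>Page 1</title></titleInfo>",'
    '"Modified version":"<titleInfo><title>Part 1</title></titleInfo>"} []'
)

# Assumed layout: [timestamp] channel.LEVEL: SimpleReplace {json} []
entry_re = re.compile(
    r"\[(?P<time>[^\]]+)\] (?P<channel>\w+)\.(?P<level>\w+): "
    r"SimpleReplace (?P<context>\{.*\}) \[\]$"
)

match = entry_re.match(log_line)
entry = json.loads(match.group("context"))

print(match.group("time"), entry["Record key"])
print(entry["Input"], "->", entry["Modified version"])
```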