File getter manipulator: FilterCdmNewspaperMasterPaths

Marcus Emmanuel Barnes edited this page Feb 4, 2016 · 3 revisions

Overview

This file getter manipulator filters paths to the master (OBJ) files for CONTENTdm newspapers.

Toolchains

This file getter manipulator applies to the CONTENTdm newspapers toolchain.

Configuration

To register this manipulator in your toolchain, add the following line to the "[MANIPULATORS]" section of your .ini file:

filegettermanipulators[] = "FilterCdmNewspaperMasterPaths|out|foo|baz"

Note that the [FILE_GETTER] section of your .ini file must contain at least one 'input_directories[]' entry, e.g.,

input_directories[] = "/path/to/master_tiffs"

Parameters

This manipulator takes at least two parameters:

  1. 'in' or 'out'. This parameter indicates the 'direction' of the filter. 'in' filters paths into the list of paths that may contain master files, and 'out' filters paths out of the list.
  2. A pipe-separated list of patterns that the manipulator will test each path against, without the leading and trailing regex delimiters. These patterns cannot contain pipes.
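
As an illustration of how the registration string breaks down into these parameters, here is a minimal Python sketch. It is not MIK's own code (MIK is written in PHP); the function name parse_manipulator_setting is hypothetical, and the validation rules are assumptions based only on the description above.

```python
def parse_manipulator_setting(setting):
    """Split a filegettermanipulators[] value into its parts.

    Assumed layout, per the parameter list above:
    ManipulatorName|direction|pattern[|pattern...]
    """
    parts = setting.split("|")
    # A name, a direction, and at least one pattern are required.
    if len(parts) < 3:
        raise ValueError("expected 'Name|in-or-out|pattern[|pattern...]'")
    name, direction, patterns = parts[0], parts[1], parts[2:]
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    return name, direction, patterns


name, direction, patterns = parse_manipulator_setting(
    "FilterCdmNewspaperMasterPaths|out|foo|baz"
)
```

Because the pipe is the separator, a pattern such as "foo|bar" would be read as two separate patterns, which is why the patterns themselves cannot contain pipes.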

Functionality

The purpose of this file getter manipulator is to provide fine-grained control over the size of the list of all possible paths that MIK is to use for master files for CONTENTdm newspaper pages. The smaller this list, the faster MIK will process the newspapers.

For 'out' filtering this manipulator removes paths below the directory defined in the CONTENTdm newspaper toolchain's [FILE_GETTER] input_directories[] setting. For example, if you had the following in your .ini file:

[FILE_GETTER]
input_directories[] = "/data/master_tiffs"
[MANIPULATORS]
filegettermanipulators[] = "FilterCdmNewspaperMasterPaths|out|qa_bad"

this manipulator would tell MIK not to look in "/data/master_tiffs/january/qa_bad" for master files.
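
The effect of the two filter directions can be sketched in Python as follows. This is an illustration only, not MIK's implementation: the function name filter_paths and the candidate paths are hypothetical, and it assumes each pattern is applied as an unanchored regular-expression search against the full path.

```python
import re


def filter_paths(paths, direction, patterns):
    """Return the paths that survive an 'in' or 'out' filter.

    Assumption: a path matches if any pattern is found anywhere in it.
    """
    compiled = [re.compile(p) for p in patterns]

    def matches(path):
        return any(rx.search(path) for rx in compiled)

    if direction == "out":
        # Drop every path that matches at least one pattern.
        return [p for p in paths if not matches(p)]
    if direction == "in":
        # Keep only the paths that match at least one pattern.
        return [p for p in paths if matches(p)]
    raise ValueError("direction must be 'in' or 'out'")


# Hypothetical directories found below "/data/master_tiffs".
candidates = [
    "/data/master_tiffs/january/qa_bad",
    "/data/master_tiffs/january/qa_good",
    "/data/master_tiffs/february/qa_good",
]

kept = filter_paths(candidates, "out", ["qa_bad"])
```

Under these assumptions, kept holds only the two "qa_good" directories, so the "qa_bad" directory is never searched for master files.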

For 'in' filtering this manipulator adds paths below the directory defined in the CONTENTdm newspaper toolchain's [FILE_GETTER] input_directories[] setting. For example, the following in your .ini file:

[FILE_GETTER]
input_directories[] = "/data/master_tiffs"
[MANIPULATORS]
filegettermanipulators[] = "FilterCdmNewspaperMasterPaths|in|qa_good"

will tell MIK to only look in "/data/master_tiffs/january/qa_good", "/data/master_tiffs/february/qa_good", etc. for master files. Note that 'in' filters achieve the same outcome as registering multiple input_directories[]; the following can be used instead of the previous combination of .ini settings:

[FILE_GETTER]
input_directories[] = "/data/master_tiffs/january/qa_good"
input_directories[] = "/data/master_tiffs/february/qa_good"
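
The equivalence between the two configurations can be checked with a short Python sketch. The directory listing is hypothetical, and the sketch assumes the 'in' pattern is applied as an unanchored regular-expression search against each path; it does not reproduce MIK's own code.

```python
import re

# Hypothetical directory listing below "/data/master_tiffs".
candidates = [
    "/data/master_tiffs/january/qa_good",
    "/data/master_tiffs/january/qa_bad",
    "/data/master_tiffs/february/qa_good",
    "/data/master_tiffs/february/qa_bad",
]

# Option 1: one input directory plus the 'in' filter pattern "qa_good".
in_filtered = [p for p in candidates if re.search("qa_good", p)]

# Option 2: one explicit input_directories[] entry per wanted directory.
explicit_dirs = [
    "/data/master_tiffs/january/qa_good",
    "/data/master_tiffs/february/qa_good",
]

same = in_filtered == explicit_dirs
```

With many subdirectories the 'in' filter is the more compact choice, since one pattern replaces a long list of input_directories[] entries.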