preserve-cat edited this page Mar 15, 2022 · 11 revisions

API Use

FITS_HOME is the directory containing the XML and tools directories. It can either be passed into the Fits() constructor, or it can be set as a system environment variable. All required FITS jars need to be on the classpath. The xml/nlnz directory also has to be added to the classpath.

The FITS XML can be retrieved from FitsOutput as a JDOM object: FitsOutput.getFitsXml()

Several convenience methods are provided (this is not a complete list).

  • Test for an element with name: boolean hasElement = fitsOutput.hasMetadataElement("name");
  • Look up a single element by name: FitsMetadataElement element = fitsOutput.getMetadataElement("name");
  • Look up a list of all elements by name: List<FitsMetadataElement> elements = fitsOutput.getMetadataElements("name");
  • Get a list of all elements in the FileInfo section: List<FitsMetadataElement> elements = fitsOutput.getFileInfoElements();
  • Get a list of all elements in the FileStatus section: List<FitsMetadataElement> elements = fitsOutput.getFileStatusElements();
  • Get the technical metadata type of the output (image, text, audio, document): String type = fitsOutput.getTechMetadataType();
  • Get a list of all elements in the technical metadata section: List<FitsMetadataElement> elements = fitsOutput.getTechMetadataElements();
  • Test for conflicting metadata values by name: Boolean hasConflict = fitsOutput.hasConflictingMetadataElements("name");

FITS processing

  1. Configuration load
  • The FITS_HOME environment variable is set up
  • The fits.xml configuration file is loaded
  • Tool wrappers are created
  • The output consolidator is configured
  2. For each tool wrapper:
  • The tool is executed on the input file, creating a ToolOutput object containing a FITS XML document
  • Each tool gets a custom classloader that isolates the tool and its dependencies. This is especially important for Java-based tools that invoke third-party classes, which may be different versions of the same classes used by other tools, each with a different public API. For the implementation, see the Java class ParentLastClassLoader.java.
  • In addition, each tool is started within its own thread
  • If necessary, XSLT is applied to the tool output to create FITS-compatible XML
  • The FITS mapping file is applied (xml/fits_xml_map.xml)
  3. Consolidation
  • Format identities are consolidated
  • The format tree (xml/fits_format_tree.xml) is consulted
  • Output from tools that could not identify the file, or that identified a less specific type, is thrown out
  • FileInfo sections are merged
  • FileStatus sections are merged
  • Metadata sections are merged
  4. Output
  • The consolidated FITS XML is written to a file or the console
  • If using the API, a FitsOutput object is returned
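
As a rough illustration of the per-tool step, the sketch below runs each tool in its own thread and collects whatever output each one produces. SketchTool and runAll are hypothetical names invented for this example and are not part of the FITS codebase; real tool wrappers return ToolOutput objects rather than strings, and the classloader isolation is not shown.

```java
import java.util.ArrayList;
import java.util.List;

final class PerToolThreadSketch {
    /** Hypothetical stand-in for a tool wrapper; NOT the FITS Tool interface. */
    interface SketchTool {
        String examine(String inputPath) throws Exception;
    }

    /** Runs every tool on the input in its own thread and returns the outputs in tool order. */
    static List<String> runAll(List<SketchTool> tools, String inputPath) {
        final String[] outputs = new String[tools.size()];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tools.size(); i++) {
            final int slot = i;
            final SketchTool tool = tools.get(i);
            Thread t = new Thread(() -> {
                try {
                    outputs[slot] = tool.examine(inputPath);
                } catch (Exception e) {
                    outputs[slot] = null; // assumption: a failing tool simply contributes no output
                }
            }, "tool-" + i);
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // join() also makes each thread's write to outputs visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tools", e);
        }
        List<String> results = new ArrayList<>();
        for (String output : outputs) {
            if (output != null) {
                results.add(output);
            }
        }
        return results;
    }
}
```

Each thread writes only to its own array slot, so no extra locking is needed; the consolidation step would then start from the returned list.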

Interfaces

The Tool interface defines the following methods:

  • ToolOutput extractInfo(File file) -- Called by FITS to invoke the tool against the input file. It must return a ToolOutput object containing valid FITS XML.
  • boolean isIdentityKnown(FileIdentity identity) -- Logic for determining whether the tool was able to identify the input file.
  • ToolInfo getToolInfo() -- Returns a ToolInfo object describing the tool.
  • Boolean canIdentify() -- Indicates whether or not the tool can identify file formats. This is important to know for output consolidation purposes.
  • void addExcludedExtension(String ext) -- Adds a file extension for which FITS should not use this tool wrapper (set from the tool definitions in xml/fits.xml).
  • boolean hasExcludedExtension(String ext) -- Checks whether the provided file extension has been excluded for this tool, in which case the tool should not process such files.
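
To make the excluded-extension pair concrete, here is a minimal sketch of how a wrapper might track exclusions. It assumes the extension is passed as a string; the class name, the shouldProcess helper, and the case-insensitive matching are assumptions made for this example, not behavior confirmed from ToolBase.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class ExcludedExtensionSketch {
    private final Set<String> excluded = new HashSet<>();

    /** Registers an extension this tool should skip (e.g. read from a config file). */
    void addExcludedExtension(String ext) {
        excluded.add(normalize(ext));
    }

    /** True when the extension has been excluded for this tool. */
    boolean hasExcludedExtension(String ext) {
        return excluded.contains(normalize(ext));
    }

    /** Hypothetical helper: true when a caller should hand this file to the tool. */
    boolean shouldProcess(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return true; // no extension, so nothing can be excluded
        }
        return !hasExcludedExtension(fileName.substring(dot + 1));
    }

    // Assumption: extensions compare case-insensitively and without a leading dot.
    private static String normalize(String ext) {
        String bare = ext.startsWith(".") ? ext.substring(1) : ext;
        return bare.toLowerCase(Locale.ROOT);
    }
}
```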

The ToolOutputConsolidator interface defines one method:

  • FitsOutput processResults(List<ToolOutput> results)

Classes implementing ToolOutputConsolidator must accept a list of ToolOutput objects, merge them, and return a FitsOutput object.
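
The merge contract can be pictured with a much simplified model. In the sketch below, SketchToolOutput, processResults, and hasConflict are stand-ins written for this example: outputs from tools that did not identify the file are dropped, and the remaining values are grouped by name so that a name with more than one distinct value counts as a conflict. The real consolidator also consults the format tree and produces a FitsOutput, neither of which is modeled here.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ConsolidatorSketch {
    /** Hypothetical stand-in for one tool's result; NOT the FITS ToolOutput class. */
    static final class SketchToolOutput {
        final String format;                 // null when the tool could not identify the file
        final Map<String, String> metadata;  // metadata name -> reported value

        SketchToolOutput(String format, Map<String, String> metadata) {
            this.format = format;
            this.metadata = metadata;
        }
    }

    /** Groups the distinct values reported per name, ignoring tools that did not identify the file. */
    static Map<String, Set<String>> processResults(List<SketchToolOutput> results) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (SketchToolOutput result : results) {
            if (result.format == null) {
                continue; // unidentified output is thrown out
            }
            merged.computeIfAbsent("format", k -> new LinkedHashSet<>()).add(result.format);
            for (Map.Entry<String, String> entry : result.metadata.entrySet()) {
                merged.computeIfAbsent(entry.getKey(), k -> new LinkedHashSet<>()).add(entry.getValue());
            }
        }
        return merged;
    }

    /** A name is in conflict when the surviving tools reported more than one distinct value. */
    static boolean hasConflict(Map<String, Set<String>> merged, String name) {
        Set<String> values = merged.get(name);
        return values != null && values.size() > 1;
    }
}
```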


Tool Addition

Any type of tool, whether it's based on Perl, Java, or something else entirely, can be added to FITS. Certain tools can extract technical metadata for files (e.g., Jhove, Exiftool, NLNZ ME), while others can only identify file formats (Droid, FFident, File Utility). In addition, different tools support different formats. Jhove and NLNZ ME support a small set of popular preservation formats, while Exiftool and File Utility support a wide range.

A tool wrapper must be created for the tool that encapsulates the complexities of invoking the tool, capturing the output, and converting it to FITS XML. A tool wrapper must implement the Tool.java interface and extend the ToolBase.java base class. The implementing class has two options for its constructor. The first is a simple no-argument constructor. Alternatively, it's possible to create a constructor with Fits.java as its sole argument should the tool need access to data from within the Fits instance. In either case the constructor should call super() on ToolBase. These two alternatives are implemented via Java Reflection in ToolBelt.createToolClassInstance(...). See existing tools in the codebase to use as examples.
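
The two constructor options can be illustrated with plain Java reflection. This is only a sketch: SketchFits, NoArgTool, FitsAwareTool, and create are hypothetical names for this example, and the order in which the constructors are tried is an assumption rather than a description of ToolBelt.createToolClassInstance(...).

```java
final class ToolInstantiationSketch {
    /** Hypothetical stand-in for the Fits instance handed to tools. */
    public static final class SketchFits {
        final String home;
        public SketchFits(String home) { this.home = home; }
    }

    /** Example wrapper using the simple no-argument constructor. */
    public static class NoArgTool {
        public NoArgTool() { }
    }

    /** Example wrapper that needs data from the Fits instance. */
    public static class FitsAwareTool {
        final SketchFits fits;
        public FitsAwareTool(SketchFits fits) { this.fits = fits; }
    }

    /** Tries the single-argument constructor first, then falls back to the no-argument one. */
    static Object create(Class<?> toolClass, SketchFits fits) {
        try {
            try {
                return toolClass.getDeclaredConstructor(SketchFits.class).newInstance(fits);
            } catch (NoSuchMethodException noFitsConstructor) {
                return toolClass.getDeclaredConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + toolClass.getName(), e);
        }
    }
}
```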

It is the responsibility of the tool wrapper to convert the tool output into FITS XML and return a valid ToolOutput object. ToolOutput must contain a valid FITS XML JDOM object.

If the tool depends on a specific operating system, the necessary checks should be made within the tool wrapper to prevent execution on incompatible systems. For example, Exiftool is written in Perl. The Exiftool tool wrapper checks the operating system type and whether or not Perl is installed. It can then decide whether to use the standard Perl version of Exiftool or the Windows executable.
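
A check of this kind might look like the following sketch. The Variant enum, the choose method, and the exact decision rule (Windows uses the executable, other systems use Perl when it is available) are assumptions for illustration; how the Exiftool wrapper actually probes for Perl is not shown in this document, so the result of that probe is passed in as a parameter.

```java
import java.util.Locale;

final class OsCheckSketch {
    /** Hypothetical outcomes of the platform check. */
    enum Variant { PERL_SCRIPT, WINDOWS_EXECUTABLE, UNAVAILABLE }

    /** Recognizes Windows from an os.name value such as "Windows 10". */
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).contains("windows");
    }

    /** Assumed rule: Windows uses the bundled executable, other systems need Perl. */
    static Variant choose(String osName, boolean perlInstalled) {
        if (isWindows(osName)) {
            return Variant.WINDOWS_EXECUTABLE;
        }
        return perlInstalled ? Variant.PERL_SCRIPT : Variant.UNAVAILABLE;
    }

    public static void main(String[] args) {
        // os.name is a standard JVM system property; the Perl flag is hard-coded for the demo.
        String osName = System.getProperty("os.name");
        System.out.println(osName + " -> " + choose(osName, true));
    }
}
```

Returning an explicit UNAVAILABLE value lets the wrapper skip the tool cleanly instead of failing at execution time.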

For tools that natively return XML, XSLT can be used to convert the output to FITS XML. For tools that do not return XML, the output can either be a) converted directly to FITS XML, or b) converted to a basic intermediate XML format and then converted using XSLT.

Tools can output values that look conflicting but actually mean the same thing. For example, one tool could report the format of a PNG image as “Portable Network Graphics”, while another may report “PNG”. One tool could report a sampling frequency unit of “2”, while another may report the text string “inches”. If left alone, these would cause false positive conflicts to appear in the FITS consolidated output. These differences are resolved in the XSLT that converts the native tool output into FITS XML. In general, FITS prefers text strings to numeric values (“inches” instead of “2”), and complete format names to abbreviations (“Portable Network Graphics” instead of “PNG”).

If new tools or formats are being added to FITS, thorough testing should be done to ensure that any false positive conflicts are resolved. If a tool outputs numeric values, these can be converted either using the FITS mapping file or during the conversion from the native tool output to FITS XML.
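
The kind of normalization described above could be expressed in a stylesheet along these lines. The sketch uses the JDK's javax.xml.transform API with an inline stylesheet; the input and output element names (toolReport, format, unit, normalized, samplingFrequencyUnit) are invented for the example and do not follow the FITS schema or any real tool's output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NormalizeWithXsltSketch {
    // Hypothetical stylesheet: maps an abbreviation and a numeric code to the preferred text.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/toolReport'>"
      + "<normalized>"
      + "<format><xsl:choose>"
      + "<xsl:when test=\"format='PNG'\">Portable Network Graphics</xsl:when>"
      + "<xsl:otherwise><xsl:value-of select='format'/></xsl:otherwise>"
      + "</xsl:choose></format>"
      + "<samplingFrequencyUnit><xsl:choose>"
      + "<xsl:when test=\"unit='2'\">inches</xsl:when>"
      + "<xsl:otherwise><xsl:value-of select='unit'/></xsl:otherwise>"
      + "</xsl:choose></samplingFrequencyUnit>"
      + "</normalized>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to a native tool report and returns the normalized XML as text. */
    static String transform(String nativeXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(nativeXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<toolReport><format>PNG</format><unit>2</unit></toolReport>"));
    }
}
```

Values that already use the preferred form pass through unchanged, so two tools reporting “PNG” and “Portable Network Graphics” end up with the same text before consolidation.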

The ToolBase abstract class implements the Tool interface and provides methods for applying XSLT transforms and for checking for unknown identities and excluded extensions. The current tool wrappers all extend ToolBase.

Each tool's output is validated against the local FITS XML schema (xml/fits_output.xsd) when the ToolOutput object is created.

Any new tool wrappers must be added to the xml/fits.xml configuration file. FITS handles initializing the wrapper and sending the input file to it. See FITS configuration files for how to configure elements within the fits.xml file.

Note: Any Java-based tool should have its JAR files placed within a sub-directory of the ‘lib’ directory. This needs to be configured in the relevant element in fits.xml. See FITS configuration files for how to add a tool to FITS.
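
Keeping each tool's JARs in their own sub-directory goes hand in hand with the per-tool classloader mentioned under FITS processing. The sketch below shows the general parent-last technique built on the JDK's URLClassLoader; it is a generic illustration under that assumption, not a copy of ParentLastClassLoader.java, which may differ in detail.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class ParentLastSketchLoader extends URLClassLoader {
    ParentLastSketchLoader(URL[] toolJars, ClassLoader parent) {
        super(toolJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                try {
                    // Look in the tool's own JARs first, so its library versions win.
                    loaded = findClass(name);
                } catch (ClassNotFoundException notInToolJars) {
                    // Fall back to the normal parent-first lookup (JDK classes end up here).
                    loaded = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

A wrapper would typically build the URL array from the JAR files in its lib sub-directory, so two tools that bundle different versions of the same library do not see each other's classes.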

If in addition to format identification the tool can be used for technical metadata extraction, additional steps need to be implemented:

  1. Decide whether the new format fits into one of the already supported format genres: image, text, audio, document
  2. If it does not, request that the new metadata genre be added to the FITS schemas.
  3. Decide whether XSLT will be used to transform the tool output into FITS XML
  • If so, create an XSLT file to transform the technical metadata output into FITS XML.
  • If an XSLT to format mapping file exists, add the new format to it. For example: xml/exiftool/exiftool_xslt_map.xml
