
201411HierarchicalFacetSupport

jonescc edited this page Dec 9, 2014 · 22 revisions
Date: November, 2014
Contacts: Craig Jones
Status: Approved
Release: 2.12
Resources: Resources available
Ticket #: 679
Source code: Pull request
Funding: Integrated Marine Observing System

Overview

GeoNetwork currently supports only simple faceted searching: facets can be built from terms marked up in the metadata or from simple translations of these terms. This proposal adds back-end support for more advanced faceted searching. In particular, it adds:

  • indexing of multi-level category paths generated using a hierarchical classification process
  • search summaries broken down hierarchically according to the indexed category hierarchy
  • drilling down on a path or paths through the hierarchy as a part of a search request

It includes support for:

  • hierarchical classification using classification schemes loaded into GeoNetwork (using broader term relationships)
  • plugging in custom classification methods using spring
  • language translation for categories sourced from GeoNetwork classification schemes

This proposal does not include:

  • user interface support for display of hierarchical facet summaries/drill-down. The existing simple faceted searching functionality (indexing, summarisation and search) is still supported and no changes to the GeoNetwork client code have been made or are required to continue using this functionality.
  • hierarchical facet drilldown or summaries as a custom extension to the CSW service

An example application utilising this support in a customised 2.10 instance can be found at https://imos.aodn.org.au/imos123/home . Refer to the faceted search for collections in step 1. The Measured parameter facet consists of over a hundred possible parameters. Previously these were all displayed as a single long list of possible selections, making it difficult to find the relevant option or an area of interest such as temperature.

Technical Details:

Configuration

Configuration of available facets and the content of summary responses is now specified separately. Configuration is still performed in WEB-INF/config-summary.xml although custom spring elements are now utilised.

Facet Configuration

Available facets, how they should be indexed and default formatting options are now defined in a facets element. Each available facet is configured in facet children using the following attributes:

attribute | description
name | the name of the facet
indexKey | the name of a field returned for indexing
label | a label to use for the facet (used in response summaries)
classifier | (optional) a reference to a classifier to use to generate facet values (see Configuring Classifiers below). The default is a classifier that simply returns the value of the indexKey for indexing.

Example:

<facets>
  <facet name="keyword" label="keywords" indexKey="keyword"/>
  <facet name="createDateYear" label="createDateYears" indexKey="createDateYear"/>
  ...
  <facet name="parameter" label="Parameter" indexKey="longParamName" classifier="parameterClassifier"/>
  <facet name="platform" label="Platform" indexKey="platform" classifier="platformClassifier"/>
</facets>

Summary Types

The content and format of predefined summary types are now configured using the summaryTypes element.

Each available summary type is configured in summaryType children using the following attributes:

attribute | description
name | the name of the summary type
format | (optional) the format to use in the response. Default is the current format ('FACET_NAME'). 'DIMENSION' can also be specified; refer to Dimension Response Format below for more details.

Each facet included in the summary type is configured in item children using the following attributes:

attribute | description
facet | the name of the facet to include
sortBy | (optional) the ordering for the facet. Default is by count.
sortOrder | (optional) asc or desc. Default is descending.
max | (optional) the number of values to be returned for the facet. Default is 10.
depth | (optional) the maximum depth to which sub-categories should be returned. Default is 1. Other values only make sense for multi-level facets.
translator | (optional) code of the translator to use to translate facet values into language-specific labels. See Language Translation below for configuring the new translator included in this proposal for classification scheme terms.

Example:

<summaryTypes>
  <summaryType name="hits">
    <item facet="keyword" max="15"/>
    <item facet="inspireTheme" sortBy="value" sortOrder="asc" max="35"/>
    ...
  </summaryType>
  <summaryType name="hierarchical_facets" format="DIMENSION">
    <item facet="parameter" max="10" depth="3" translator="term:http://vocab.aodn.org.au/classificationSchemes/parameterDiscovery"/>
    <item facet="organisation" max="10"/>
    <item facet="platform" max="10" depth="2" translator="term:http://vocab.aodn.org.au/classificationSchemes/platformDiscovery"/>
  </summaryType>
</summaryTypes>

Dimension Response Format

A new summary response format has been added for summarising hierarchical facet counts. This format can be used to simplify creation of drill down search requests. It can be configured for a service by specifying 'DIMENSION' for the format attribute of the summaryType element used for that service as shown above for the "hierarchical_facets" summaryType.

Example response for this format:

<response from="1" to="5" selected="0">
  <summary count="5" type="local">
    <dimension name="regionKeyword" label="Region">
      <category value="http://geonetwork-opensource.org/regions#country" label="country" count="5">
        <category value="http://geonetwork-opensource.org/regions#10" label="Australia" count="2" />
        <category value="http://geonetwork-opensource.org/regions#181" label="Zimbabwe" count="1" />
        <category value="http://geonetwork-opensource.org/regions#1220" label="All fishing areas" count="1" />
        <category value="http://geonetwork-opensource.org/regions#68" label="France" count="1" />
      </category>
      <category value="http://geonetwork-opensource.org/regions#ocean" label="ocean" count="1">
        <category value="http://geonetwork-opensource.org/regions#1220" label="All fishing areas" count="1" />
      </category>
    </dimension>
  </summary>
  <metadata>
  ...
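
A client could walk a DIMENSION summary like the one above with any XML parser. The following is a minimal sketch using the JDK DOM parser. The class and method names are hypothetical and not part of GeoNetwork, and the code assumes only the element and attribute names shown in the example response.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical client-side helper; not part of GeoNetwork.
final class DimensionSummaryWalker {
    private DimensionSummaryWalker() {}

    /** Returns one "label path (count)" line per category, depth first. */
    static List<String> describe(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, since the response comes from a remote server.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
            List<String> lines = new ArrayList<>();
            NodeList dimensions = doc.getElementsByTagName("dimension");
            for (int i = 0; i < dimensions.getLength(); i++) {
                Element dimension = (Element) dimensions.item(i);
                walk(dimension, dimension.getAttribute("label"), lines);
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse summary response", e);
        }
    }

    // Visits direct <category> children only, then recurses into each of them.
    private static void walk(Element parent, String prefix, List<String> lines) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "category".equals(n.getNodeName())) {
                Element category = (Element) n;
                String path = prefix + " > " + category.getAttribute("label");
                lines.add(path + " (" + category.getAttribute("count") + ")");
                walk(category, path, lines);
            }
        }
    }
}
```

The same traversal could collect the value attributes instead of the labels in order to build drill down paths.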

Drilldown Search Request

A new search parameter 'facet.q' has been added that allows drill down queries to be added to a search request. A drill down path is constructed as follows:

<dimension_name>{"/"<category_value>}

For example, to drill down on the country category above:

http://localhost:8080/geonetwork/srv/eng/xml.search.facet?fast=index&from=1&to=50&facet.q=regionKeyword/http%253A%252F%252Fgeonetwork-opensource.org%252Fregions%2523country

Note that drill down paths use '/' as the separator between categories in the path, so any '/' characters embedded in a category value must be escaped as %2F. Alternatively, URL-encode each category in the path in addition to the normal parameter encoding. For example, to drill down on Australia above:

http://localhost:8080/geonetwork/srv/eng/xml.search.facet?fast=index&from=1&to=50&facet.q=regionKeyword/http%253A%252F%252Fgeonetwork-opensource.org%252Fregions%2523country%2Fhttp%253A%252F%252Fgeonetwork-opensource.org%252Fregions%252310

Multiple drill down queries can be specified either by providing multiple facet.q parameters or by combining them in a single facet.q parameter separated by '&', which must itself be URL-encoded (as %26) within the parameter value.
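
The two encoding layers described above can be sketched as follows. This is an illustrative client-side helper under stated assumptions, not GeoNetwork code: the class and method names are hypothetical, the dimension name is assumed to contain no '/', and each category is URL-encoded once before the whole path is encoded again as a parameter value.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper for building facet.q values; not part of GeoNetwork.
final class DrillDown {
    private DrillDown() {}

    /** First layer: encode one category so embedded '/' cannot be mistaken for a separator. */
    static String encodeCategory(String value) {
        // URLEncoder writes spaces as '+'; use %20 so the result does not rely on form decoding.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds <dimension_name>{"/"<category_value>} with each category encoded once. */
    static String path(String dimension, List<String> categories) {
        StringBuilder sb = new StringBuilder(dimension);
        for (String category : categories) {
            sb.append('/').append(encodeCategory(category));
        }
        return sb.toString();
    }

    /** Second layer: join paths with '&' and encode the result as a query parameter value. */
    static String facetQueryParam(List<String> paths) {
        return "facet.q=" + URLEncoder.encode(String.join("&", paths), StandardCharsets.UTF_8);
    }
}
```

For the Australia example, facetQueryParam(List.of(path("regionKeyword", List.of(country, australia)))) produces the same value as the URL above except that the first separator is also written as %2F, which decodes to the same path.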

Configuring Classifiers

Classifiers implement the org.fao.geonet.kernel.search.classifier.Classifier interface, which has a single method:

public List<CategoryPath> classify(String value);

They take the value of an index field provided to the GeoNetwork indexing engine and return a list of category paths that should be indexed for that value.
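
To illustrate the contract, here is a minimal sketch of a custom classifier that turns a delimited value into a single category path. The Classifier and CategoryPath types below are local stand-ins declared only so the example compiles on its own; a real classifier would implement the GeoNetwork interface and use the CategoryPath type from its signature. RegexSplit is a hypothetical name and is not the provided Split classifier.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class SplitClassifierSketch {
    private SplitClassifierSketch() {}

    /** Stand-in for the CategoryPath type in the real interface (assumption). */
    static final class CategoryPath {
        final List<String> components;

        CategoryPath(List<String> components) {
            this.components = Collections.unmodifiableList(new ArrayList<>(components));
        }
    }

    /** Local copy of the one-method contract described above. */
    interface Classifier {
        List<CategoryPath> classify(String value);
    }

    /** Hypothetical classifier: splits the indexed value into the categories of one path. */
    static final class RegexSplit implements Classifier {
        private final Pattern separator;

        RegexSplit(String regex) {
            this.separator = Pattern.compile(regex);
        }

        @Override
        public List<CategoryPath> classify(String value) {
            List<String> parts = new ArrayList<>();
            for (String part : separator.split(value)) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    parts.add(trimmed);
                }
            }
            if (parts.isEmpty()) {
                return Collections.emptyList(); // nothing to index for this value
            }
            return Collections.singletonList(new CategoryPath(parts));
        }
    }
}
```

A classifier written this way would be registered as a spring bean and referenced from a facet in the same way as the provided classifiers.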

Classifiers are configured using spring bean configuration e.g.

<bean id="regionKeywordClassifier" class="org.fao.geonet.kernel.search.classifier.TermLabel" lazy-init="true">
    <constructor-arg name="finder" ref="ThesaurusManager"/>
    <constructor-arg name="conceptScheme" value="http://geonetwork-opensource.org/regions"/>
    <constructor-arg name="langCode" value="eng"/>
</bean>

The bean reference is used when configuring the facet to use this classifier:

  <facet name="region" label="regions" indexKey="region" classifier="regionKeywordClassifier"/>

Note: the above assumes region keywords are passed to the indexing engine using the region field.

Provided Classifiers

Four classifiers are included in this proposal:

class | description
org.fao.geonet.kernel.search.classifier.Value | the default classifier. Returns a single category path containing one category: the value passed.
org.fao.geonet.kernel.search.classifier.TermLabel | returns a list of category paths created by looking up broader terms for the value passed in a classification scheme. The value passed is assumed to be the preferred label of a term in the classification scheme.
org.fao.geonet.kernel.search.classifier.TermUri | returns a list of category paths created by looking up broader terms for the value passed in a classification scheme. The value passed is assumed to be the identifier (URI) of a term in the classification scheme.
org.fao.geonet.kernel.search.classifier.Split | returns a category path containing categories created by splitting the passed value using a regular expression.

Note: TermLabel and TermUri classifiers may return multiple category paths if there are several possible parent paths (e.g. a term, or a term in its parent hierarchy, has more than one parent).

As an example, looking up the 'Practical salinity of the water body' in a parameter thesaurus using a TermLabel classifier may return the following category path:

http://vocab.aodn.org.au/def/ClassScheme/parameter1/Category/56,
http://vocab.aodn.org.au/def/ClassScheme/parameter1/Category/50,
http://vocab.nerc.ac.uk/collection/P01/current/PSLTZZ01

Or, using preferred labels in English: Physical-Water, Salinity, Practical salinity of the water body
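
The broader-term lookup can be pictured as a walk from a term up to its top-level categories, producing one path per distinct chain of parents. The sketch below illustrates that idea on a plain in-memory map. It is not the TermLabel or TermUri implementation; the class name and the map-based representation of broader relationships are assumptions, and the relationships are assumed to contain no cycles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of broader-term expansion; not GeoNetwork code.
final class BroaderTermPaths {
    private BroaderTermPaths() {}

    /**
     * Returns every path from a top-level term down to the given term.
     * The map holds the direct broader terms of each term; terms without an
     * entry are treated as top-level. Assumes the relationships are acyclic.
     */
    static List<List<String>> pathsTo(String term, Map<String, List<String>> broader) {
        List<String> parents = broader.getOrDefault(term, Collections.emptyList());
        List<List<String>> paths = new ArrayList<>();
        if (parents.isEmpty()) {
            List<String> root = new ArrayList<>();
            root.add(term);
            paths.add(root);
            return paths;
        }
        for (String parent : parents) {
            // Each path to a parent extends to a path to this term.
            for (List<String> parentPath : pathsTo(parent, broader)) {
                List<String> path = new ArrayList<>(parentPath);
                path.add(term);
                paths.add(path);
            }
        }
        return paths;
    }
}
```

With "Salinity" as the broader term of "Practical salinity of the water body" and "Physical-Water" as the broader term of "Salinity", this yields the single path shown above. A term with two broader terms, such as "All fishing areas" under both "country" and "ocean" in the earlier response example, yields two paths.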

Language Translation

This proposal includes a term URI language translator for translating URI categories returned by the TermUri and TermLabel classifiers to the detected or requested language for the search response. The translator is specified using a 'term:' prefix on the translator specification and the identifier (URI) of the classification scheme to use to lookup labels.

For example, to return labels in French for region keywords indexed above use the following in the summaryType configuration for the service:

<item facet="regionKeyword" translator="term:http://geonetwork-opensource.org/regions"/>
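
Because the classification scheme identifier is itself a URI containing ':' characters, a translator specification of this form has to be split at the first ':' only. The sketch below shows one way to do that. It is an assumption about how such a specification could be parsed, not the GeoNetwork implementation, and the class name is hypothetical.

```java
// Hypothetical parser for "<prefix>:<argument>" translator specifications.
final class TranslatorSpec {
    final String type;     // e.g. "term"
    final String argument; // e.g. the classification scheme URI

    private TranslatorSpec(String type, String argument) {
        this.type = type;
        this.argument = argument;
    }

    /** Splits at the first ':' so that colons inside the URI are preserved. */
    static TranslatorSpec parse(String spec) {
        int colon = spec.indexOf(':');
        if (colon <= 0 || colon == spec.length() - 1) {
            throw new IllegalArgumentException("Expected <prefix>:<argument> but got: " + spec);
        }
        return new TranslatorSpec(spec.substring(0, colon), spec.substring(colon + 1));
    }

    boolean isTermTranslator() {
        return "term".equals(type);
    }
}
```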

TermLabel Classifier Constructor Arguments

Argument | Type | Description
finder | org.fao.geonet.kernel.ThesaurusFinder | the thesaurus finder to use to find the classification scheme
conceptScheme | java.lang.String | the identifier (URI) of the classification scheme to be used for term classification
langCode | java.lang.String | the language of preferred labels passed to the classify method

TermUri Classifier Constructor Arguments

Argument | Type | Description
finder | org.fao.geonet.kernel.ThesaurusFinder | the thesaurus finder to use to find the classification scheme
conceptScheme | java.lang.String | the identifier (URI) of the classification scheme to be used for term classification

Proposal Type:

  • Type:
  • Module:

Voting History

  • Vote Proposed: 2/12/14
  • +1 Patrizia, +1 Francois, +1 Jose, +1 Jesse

Participants

  • Craig Jones
  • Angus Scheibner