Database harvester
josegar74 committed Jul 8, 2024
1 parent b5c29b8 commit 6b7fb4d
Showing 20 changed files with 1,546 additions and 39 deletions.
43 changes: 43 additions & 0 deletions docs/manual/docs/user-guide/harvesting/harvesting-database.md
@@ -0,0 +1,43 @@
# Database Harvesting {#database_harvester}

This harvesting type uses a database connection to harvest metadata stored in a database table.

## Adding a Database harvester

To create a Database harvester, go to `Admin console` > `Harvesting` and select `Harvest from` > `Database`:

![](img/add-database-harvester.png)

Provide the following information:

- **Identification**
- *Node name and logo*: A unique name for the harvester and optionally a logo to assign to the harvester.
- *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester.
- *User*: User who owns the harvested records.

- **Schedule**: Scheduling options for executing the harvester. If disabled, the harvester must be run manually from the harvesters page. If enabled, a schedule expression using cron syntax must be configured ([see examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)); an illustrative example is given below this list.

- **Configure connection to Database**
- *Server*: The database server IP/Hostname.
    - *Port*: The database port (for example, 5432 for PostgreSQL).
    - *Database name*: The name of the database to connect to.
    - *Table name*: The name of the table that contains the metadata. The name must begin with a letter (a-z) or underscore (_). Subsequent characters can be letters, digits (0-9), or underscores.
    - *Metadata field name*: The name of the table field that contains the metadata. The name must begin with a letter (a-z) or underscore (_). Subsequent characters can be letters, digits (0-9), or underscores.
    - *Database type*: The database type. Currently PostgreSQL and Oracle are supported.
- *Remote authentication*: Credentials to connect to the database.

- **Search filter**: Allows defining a simple field condition to filter the results.
    - *Filter field*: The name of the table field used to filter the results. The name must begin with a letter (a-z) or underscore (_). Subsequent characters can be letters, digits (0-9), or underscores.
    - *Filter value*: The value used to filter the results. It can contain wildcards (%); see the example below this list.

- **Configure response processing for database**
    - *Action on UUID collision*: When the harvester finds a record whose UUID is already used by a record collected by another method (another harvester, importer, dashboard editor, ...), this option controls whether the record is skipped (default), overridden, or assigned a new UUID.
- *Validate records before import*: If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped.
- *XSL filter name to apply*: (Optional) The XSL filter is applied to each metadata record. The filter is a process which depends on the metadata schema (see the `process` folder of the metadata schemas).

        It can be composed of parameters which will be passed to the XSL transformation using the following syntax: `anonymizer?protocol=MYLOCALNETWORK:FILEPATH&[email protected]&thesaurus=MYORGONLYTHEASURUS`

    - *Batch edits*: (Optional) Allows updating the harvested records using XPath syntax. It can be used to add, replace or delete elements.
    - *Translate metadata content*: (Optional) Allows translating metadata elements. It requires a translation service provider to be configured in the system settings.

- **Privileges** - Assign privileges to harvested metadata.
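
As a purely illustrative example (the field name and values below are hypothetical, not defaults), a Quartz cron expression that runs the harvester every day at 01:00, combined with a filter that keeps only records whose `title` field contains the word "ocean", could be configured as:

```
Schedule:     0 0 1 * * ?
Filter field: title
Filter value: %ocean%
```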
3 changes: 2 additions & 1 deletion docs/manual/docs/user-guide/harvesting/index.md
@@ -17,7 +17,8 @@ The following sources can be harvested:
- [GeoPortal REST Harvesting](harvesting-geoportal.md)
- [THREDDS Harvesting](harvesting-thredds.md)
- [WFS GetFeature Harvesting](harvesting-wfs-features.md)
- [Z3950 Harvesting](harvesting-z3950.md)
- [Z3950 Harvesting](harvesting-z3950.md)
- [Database Harvesting](harvesting-database.md)

## Mechanism overview

1 change: 1 addition & 0 deletions docs/manual/mkdocs.yml
@@ -303,6 +303,7 @@ nav:
- user-guide/harvesting/harvesting-webdav.md
- user-guide/harvesting/harvesting-wfs-features.md
- user-guide/harvesting/harvesting-z3950.md
- user-guide/harvesting/harvesting-database.md
- user-guide/export/index.md
- 'Administration':
- administrator-guide/index.md
@@ -28,6 +28,7 @@
import org.fao.geonet.domain.AbstractMetadata;
import org.fao.geonet.domain.MetadataCategory;
import org.fao.geonet.kernel.DataManager;
import org.fao.geonet.kernel.GeonetworkDataDirectory;
import org.fao.geonet.kernel.SchemaManager;
import org.fao.geonet.kernel.datamanager.IMetadataManager;
import org.fao.geonet.kernel.harvest.harvester.AbstractHarvester;
@@ -199,4 +200,38 @@ public Element translateMetadataContent(ServiceContext context,
return md;
}


/**
* Filters the metadata if a process parameter is set and the corresponding XSL transformation
* exists in xsl/conversion/import.
*
* @param context       service context used to look up the GeoNetwork data directory
* @param md            metadata element to transform
* @param processName   name of the XSL conversion to apply
* @param processParams parameters passed to the XSL transformation
* @param log           logger used to report whether the metadata was filtered
* @return the transformed metadata, or the original element if the transformation is missing or fails
*/
protected Element applyXSLTProcessToMetadata(ServiceContext context,
Element md,
String processName,
Map<String, Object> processParams,
org.fao.geonet.Logger log) {
Path filePath = context.getBean(GeonetworkDataDirectory.class).getXsltConversion(processName);
if (!Files.exists(filePath)) {
log.debug(" processing instruction " + processName + ". Metadata not filtered.");
} else {
Element processedMetadata;
try {
processedMetadata = Xml.transform(md, filePath, processParams);
log.debug(" metadata filtered.");
md = processedMetadata;
} catch (Exception e) {
log.warning(" processing error " + processName + ": " + e.getMessage());
}
}
return md;
}


}
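
For reference, a rough, hypothetical sketch (not the harvester's actual parsing code) of how an XSL filter string such as `anonymizer?protocol=...&email=...`, as described in the documentation above, could be split into the `processName` and `processParams` arguments expected by `applyXSLTProcessToMetadata`:

```java
// Hypothetical helper, for illustration only: split an XSL filter string such as
// "anonymizer?protocol=MYLOCALNETWORK:FILEPATH&[email protected]"
// into a process name ("anonymizer") and a parameter map. This is not the
// parsing code used by the harvester itself.
import java.util.HashMap;
import java.util.Map;

public class XslFilterStringSketch {
    public static void main(String[] args) {
        String xslfilter = "anonymizer?protocol=MYLOCALNETWORK:FILEPATH&[email protected]";

        int mark = xslfilter.indexOf('?');
        String processName = mark == -1 ? xslfilter : xslfilter.substring(0, mark);

        Map<String, Object> processParams = new HashMap<>();
        if (mark != -1) {
            for (String pair : xslfilter.substring(mark + 1).split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    processParams.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }

        System.out.println(processName + " -> " + processParams);
        // The resulting name and map would then be passed to
        // applyXSLTProcessToMetadata(context, md, processName, processParams, log).
    }
}
```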
@@ -45,7 +45,9 @@
import org.fao.geonet.exceptions.UnknownHostEx;
import org.fao.geonet.kernel.DataManager;
import org.fao.geonet.kernel.MetadataIndexerProcessor;
import org.fao.geonet.kernel.datamanager.IMetadataIndexer;
import org.fao.geonet.kernel.datamanager.IMetadataManager;
import org.fao.geonet.kernel.datamanager.IMetadataSchemaUtils;
import org.fao.geonet.kernel.datamanager.IMetadataUtils;
import org.fao.geonet.kernel.harvest.Common.OperResult;
import org.fao.geonet.kernel.harvest.Common.Status;
@@ -128,6 +130,8 @@ public abstract class AbstractHarvester<T extends HarvestResult, P extends Abstr
protected DataManager dataMan;
protected IMetadataManager metadataManager;
protected IMetadataUtils metadataUtils;
protected IMetadataSchemaUtils metadataSchemaUtils;
protected IMetadataIndexer metadataIndexer;

protected P params;
protected T result;
@@ -172,6 +176,8 @@ protected void setContext(ServiceContext context) {
this.harvesterSettingsManager = context.getBean(HarvesterSettingsManager.class);
this.settingManager = context.getBean(SettingManager.class);
this.metadataManager = context.getBean(IMetadataManager.class);
this.metadataSchemaUtils = context.getBean(IMetadataSchemaUtils.class);
this.metadataIndexer = context.getBean(IMetadataIndexer.class);
}

public void add(Element node) throws BadInputEx, SQLException {
@@ -1,5 +1,5 @@
//=============================================================================
//=== Copyright (C) 2001-2007 Food and Agriculture Organization of the
//=== Copyright (C) 2001-2024 Food and Agriculture Organization of the
//=== United Nations (FAO-UN), United Nations World Food Programme (WFP)
//=== and United Nations Environment Programme (UNEP)
//===
@@ -311,7 +311,7 @@ private void addMetadata(RecordInfo ri, String uuidToAssign) throws Exception {
// use that uuid (newMdUuid) for the new metadata to add to the catalogue.
String newMdUuid = null;
if (!params.xslfilter.equals("")) {
md = processMetadata(context, md, processName, processParams);
md = applyXSLTProcessToMetadata(context, md, processName, processParams, log);
schema = dataMan.autodetectSchema(md);
// Get new uuid if modified by XSLT process
newMdUuid = metadataUtils.extractUUID(schema, md);
@@ -465,7 +465,7 @@ boolean updatingLocalMetadata(RecordInfo ri, String id, Boolean force) throws Ex

boolean updateSchema = false;
if (!params.xslfilter.equals("")) {
md = processMetadata(context, md, processName, processParams);
md = applyXSLTProcessToMetadata(context, md, processName, processParams, log);
String newSchema = dataMan.autodetectSchema(md);
updateSchema = !newSchema.equals(schema);
schema = newSchema;
@@ -485,9 +485,11 @@ boolean updatingLocalMetadata(RecordInfo ri, String id, Boolean force) throws Ex
metadata.getHarvestInfo().setUuid(params.getUuid());
metadata.getSourceInfo().setSourceId(params.getUuid());
}

if (updateSchema) {
metadata.getDataInfo().setSchemaId(schema);
}

metadataManager.save(metadata);
}

@@ -619,36 +621,6 @@ private boolean foundDuplicateForResource(String uuid, Element response) {
return false;
}

/**
* Filter the metadata if process parameter is set and corresponding XSL transformation
* exists in xsl/conversion/import.
*
* @param context
* @param md
* @param processName
* @param processParams
* @return
*/
private Element processMetadata(ServiceContext context,
Element md,
String processName,
Map<String, Object> processParams) {
Path filePath = context.getBean(GeonetworkDataDirectory.class).getXsltConversion(processName);
if (!Files.exists(filePath)) {
log.debug(" processing instruction " + processName + ". Metadata not filtered.");
} else {
Element processedMetadata;
try {
processedMetadata = Xml.transform(md, filePath, processParams);
log.debug(" metadata filtered.");
md = processedMetadata;
} catch (Exception e) {
log.warning(" processing error " + processName + ": " + e.getMessage());
}
}
return md;
}

/**
* Retrieves the list of metadata uuids that have the same dataset identifier.
*
@@ -0,0 +1,73 @@
//=============================================================================
//=== Copyright (C) 2001-2024 Food and Agriculture Organization of the
//=== United Nations (FAO-UN), United Nations World Food Programme (WFP)
//=== and United Nations Environment Programme (UNEP)
//===
//=== This program is free software; you can redistribute it and/or modify
//=== it under the terms of the GNU General Public License as published by
//=== the Free Software Foundation; either version 2 of the License, or (at
//=== your option) any later version.
//===
//=== This program is distributed in the hope that it will be useful, but
//=== WITHOUT ANY WARRANTY; without even the implied warranty of
//=== MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
//=== General Public License for more details.
//===
//=== You should have received a copy of the GNU General Public License
//=== along with this program; if not, write to the Free Software
//=== Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
//===
//=== Contact: Jeroen Ticheler - FAO - Viale delle Terme di Caracalla 2,
//=== Rome - Italy. email: [email protected]
//==============================================================================

package org.fao.geonet.kernel.harvest.harvester.database;

import org.fao.geonet.Logger;
import org.fao.geonet.kernel.harvest.harvester.AbstractHarvester;
import org.fao.geonet.kernel.harvest.harvester.HarvestResult;

import java.sql.SQLException;

public class DatabaseHarvester extends AbstractHarvester<HarvestResult, DatabaseHarvesterParams> {
private static final String TABLE_NAME_PATTERN = "([_a-zA-Z]+[_a-zA-Z0-9]*)";
private static final String FIELD_NAME_PATTERN = "([_a-zA-Z]+[_a-zA-Z0-9]*)";

@Override
protected DatabaseHarvesterParams createParams() {
return new DatabaseHarvesterParams(dataMan);
}

@Override
protected void storeNodeExtra(DatabaseHarvesterParams params, String path, String siteId, String optionsId) throws SQLException {
// Remove non-valid characters
params.setTableName(params.getTableName().replaceAll("[^" + TABLE_NAME_PATTERN + "]", ""));
params.setMetadataField(params.getMetadataField().replaceAll("[^" + FIELD_NAME_PATTERN + "]", ""));
params.setFilterField(params.getFilterField().replaceAll("[^" + FIELD_NAME_PATTERN + "]", ""));

setParams(params);

harvesterSettingsManager.add("id:" + siteId, "icon", params.getIcon());
harvesterSettingsManager.add("id:" + siteId, "server", params.getServer());
harvesterSettingsManager.add("id:" + siteId, "port", params.getPort());
harvesterSettingsManager.add("id:" + siteId, "username", params.getUsername());
harvesterSettingsManager.add("id:" + siteId, "password", params.getPassword());
harvesterSettingsManager.add("id:" + siteId, "database", params.getDatabase());
harvesterSettingsManager.add("id:" + siteId, "databaseType", params.getDatabaseType());
harvesterSettingsManager.add("id:" + siteId, "tableName", params.getTableName());
harvesterSettingsManager.add("id:" + siteId, "metadataField", params.getMetadataField());
harvesterSettingsManager.add("id:" + siteId, "xslfilter", params.getXslfilter());

String filtersID = harvesterSettingsManager.add(path, "filter", "");
harvesterSettingsManager.add("id:" + filtersID, "field", params.getFilterField());
harvesterSettingsManager.add("id:" + filtersID, "value", params.getFilterValue());
}

@Override
protected void doHarvest(Logger l) throws Exception {
log.info("Database harvester start");
DatabaseHarvesterAligner h = new DatabaseHarvesterAligner(cancelMonitor, log, context, params);
result = h.harvest(log);
log.info("Database harvester end");
}
}
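
The `DatabaseHarvesterAligner` used in `doHarvest` is not part of this excerpt. As a rough illustration only, and assuming hypothetical connection details, table and field names (this is not the actual aligner code), the kind of query it would issue against the configured table, metadata field and filter might look like:

```java
// Illustrative JDBC sketch only: fetch metadata XML from the configured table,
// applying the optional filter field/value (which may contain SQL wildcards, %).
// Connection URL, credentials, table and column names below are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DatabaseHarvestQuerySketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/metadata_db";
        try (Connection conn = DriverManager.getConnection(url, "harvester", "secret")) {
            // Table and field names are assumed to contain only letters, digits and
            // underscores, as enforced by the sanitization in storeNodeExtra above.
            String sql = "SELECT metadata_xml FROM metadata_table WHERE title LIKE ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "%ocean%"); // filter value with wildcards
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        String xml = rs.getString(1); // one metadata record per row
                        // ... hand the XML to the aligner for schema detection,
                        // optional XSL filtering, and saving into the catalog
                    }
                }
            }
        }
    }
}
```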