This software tool validates large numbers of metadata records using the API of the INSPIRE Reference Validator. It was developed to support INSPIRE Monitoring & Reporting activities. The tool was built with the Pentaho Data Integration Community Edition platform, which is required to run it.
- One or more instances of the latest release of the INSPIRE Reference Validator.
- Pentaho Data Integration (PDI) Community Edition (CE). The suggested PDI CE version is 9.0 or 8.2; version 8.3 suffers from JSON Input step performance deterioration and is not recommended. If the download is slow, click "Problems downloading?" and try an alternative download mirror.
- Apache HttpClient components 4.5.12.
- Source metadata compiled according to the INSPIRE Technical Guidelines (TG) version 2.0 and available as XML files with a single metadata record per file.
- unzip PDI,
- copy all .jar files from Apache HttpClient to your PDI lib folder,
- copy inspire-validator.jar to your PDI lib folder,
- in validation.bat insert the path to your PDI data-integration folder.
In pdi/config.properties update the following items:
- endpoint - endpoint id, used to create file and folder names [use only characters valid for a filename],
- source_folder - folder where source metadata are located (including subfolders) [use forward slashes "/" in the path],
- results_folder - folder where results will be written [use forward slashes "/" in the path],
- source_suffix - source metadata files suffix, used to filter the files to validate,
- validator_nodes - number of validator instances to use, validator_url_X needs to be provided for each instance,
- validator_url_X - URLs for each validator instance, up to "/v2/" [http://.../v2/],
- authorization_token - authorization token to include in the header of "TestRuns" validator API POST request,
- queue_max_size - maximum number of test runs that can be run in parallel on each validator instance.
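The constraints on the configuration items can be checked before a run. The following Python sketch is only an illustration and is not part of the tool: the `key=value` syntax is assumed from the `.properties` extension, the numbering of `validator_url_X` from 1 is an assumption, and the helper names `parse_properties` and `check_config` as well as all sample values are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_config(props):
    """Return a list of problems found in the parsed properties."""
    problems = []
    # Folder paths are expected to use forward slashes.
    for key in ("source_folder", "results_folder"):
        if "\\" in props.get(key, ""):
            problems.append(f"{key}: use forward slashes in the path")
    # One URL per validator instance (numbering from 1 is an assumption).
    nodes = int(props.get("validator_nodes", "0"))
    for i in range(1, nodes + 1):
        url = props.get(f"validator_url_{i}")
        if url is None:
            problems.append(f"validator_url_{i}: missing")
        elif not url.endswith("/v2/"):
            problems.append(f"validator_url_{i}: must end with /v2/")
    return problems


# Hypothetical example values, not taken from the tool's distribution.
SAMPLE = """
endpoint=example_endpoint
source_folder=C:/data/metadata
results_folder=C:/data/results
source_suffix=.xml
validator_nodes=2
validator_url_1=http://localhost:8080/validator/v2/
validator_url_2=http://localhost:8081/validator/v2
"""

if __name__ == "__main__":
    for problem in check_config(parse_properties(SAMPLE)):
        print(problem)
```

With the sample above, the only reported problem is the second URL, which lacks the trailing slash after `v2`.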
Run the validation.bat script. It performs preprocessing, validation and results generation as described below:
- Preprocessing:
- read all files with the given <source_suffix> located in <source_folder> (including subfolders) that were not validated before;
- identify records with missing or unknown type;
- identify duplicate records using MD5 hash values;
- create <endpoint>.md.json metadata summary (after completed preprocessing of all records).
- Validation:
- validate each record using <validator_nodes> number of instances of the INSPIRE Reference Validator with <validator_url_X> URLs; 3 different conformance classes (as specified in the configuration file) are used for the validation of:
- data sets and data set series,
- network services,
- invocable spatial data services (identified by the value other for serviceType XML element);
- save validation reports for each record in <results_folder>/<endpoint> folder:
- the subfolder structure of <source_folder> is preserved,
- filenames correspond to those of source metadata with <source_suffix> removed,
- each report is saved in two versions: .html and .json;
- add results for each record to CSV results <endpoint>.csv, detailed below.
- Results:
If the validation does not complete for all source metadata (due to errors, user interruption, etc.), running the transformation again for the same endpoint continues with the source metadata that were not processed before and are therefore not yet included in the CSV results. To re-validate an endpoint that was validated before, rename the CSV results file or move it out of the results folder.
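The resume behaviour can be pictured as filtering the source files against the `file_id` values already present in the CSV results. The Python sketch below is an illustration only, not the PDI implementation: it assumes a comma-delimited CSV with a header row, and the function names, sample ids and result values are hypothetical.

```python
import csv
import io


def already_validated(csv_text, id_column="file_id"):
    """Collect the file ids that already have a row in the CSV results."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column] for row in reader}


def remaining(source_ids, csv_text):
    """Return the source ids, in order, that have no row in the CSV results."""
    done = already_validated(csv_text)
    return [file_id for file_id in source_ids if file_id not in done]


# Hypothetical CSV results left behind by an interrupted run.
RESULTS = "file_id,result\na/record1,PASSED\na/record2,FAILED\n"

if __name__ == "__main__":
    print(remaining(["a/record1", "a/record2", "b/record3"], RESULTS))
```

Removing or renaming the CSV file corresponds to passing an empty result set, in which case every source file is processed again.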
Alternatively, the procedure can be run from the PDI user interface (Spoon), which provides more control and feedback and allows for modifications. To do so, run Spoon.bat, then open and run the pdi/validation.kjb job.
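The file discovery and duplicate detection of the preprocessing step can be illustrated outside PDI. The following Python sketch is a minimal approximation under stated assumptions: it hashes the full file content with MD5 and treats every later file with an already seen hash as a duplicate. How the tool itself orders files or which copy it keeps is not specified here, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def find_sources(source_folder, source_suffix):
    """List files under source_folder (including subfolders) with the suffix."""
    root = Path(source_folder)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith(source_suffix)
    )


def md5_of(path):
    """Return the hexadecimal MD5 digest of the file content."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def find_duplicates(paths):
    """Return the paths whose content hash was seen earlier in the list."""
    seen = {}
    duplicates = []
    for path in paths:
        digest = md5_of(path)
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return duplicates
```

A typical call would be `find_duplicates(find_sources("C:/data/metadata", ".xml"))`, where the folder and suffix stand in for the `source_folder` and `source_suffix` settings.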
All result files are saved in <results_folder>:
1. <endpoint> - folder where validation reports for each metadata record are saved,
2. <endpoint>.md.json - source metadata summary,
3. <endpoint>.csv - validation results for each metadata record, detailed below,
4. <endpoint>.json - validation results summary and source metadata summary,
5. <endpoint>.services.zip - validation reports for service metadata records that failed validation,
6. <endpoint>.dataset.zip - validation reports for dataset, series, missing and unknown metadata records that failed validation,
7. validation.csv - validation results summary and source metadata summary for each validation run.
File 2 is produced only after completed preprocessing of all metadata records.
Files 4, 5, 6 and 7 are produced/updated only after completed validation of all metadata records.
The CSV results file <endpoint>.csv contains the following columns:
- file_id - identifies source metadata file and validation reports,
- md_id - metadata identifier (from source XML),
- type - metadata type (from source XML),
- result - validation result,
- MD5 - metadata file hash value, used to detect duplicates,
- error_count - number of failed assertions,
- errors - ids of failed assertions.
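As a possible way to inspect the CSV results, the following Python sketch counts records per metadata type and validation result. It assumes a comma-delimited file with a header row that uses the column names listed above. The values shown for `type` and `result` are hypothetical placeholders, not the tool's documented vocabulary.

```python
import csv
import io
from collections import Counter


def summarise(csv_text):
    """Count rows per (type, result) pair in the CSV results."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["type"], row["result"]) for row in reader)


# Hypothetical rows; real values of "type" and "result" may differ.
SAMPLE_CSV = (
    "file_id,md_id,type,result,MD5,error_count,errors\n"
    "a/r1,id-1,dataset,PASSED,0cc1,0,\n"
    "a/r2,id-2,dataset,FAILED,9f2b,2,e1;e2\n"
    "b/r3,id-3,service,PASSED,77aa,0,\n"
)

if __name__ == "__main__":
    for (md_type, result), count in sorted(summarise(SAMPLE_CSV).items()):
        print(md_type, result, count)
```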
The metadata Conformity Indicators MDi1.1 and MDi1.2 can be calculated by dividing the number of passed data set metadata records and the number of passed service metadata records, both found in the validation results summary (JSON file), by the total number of available data sets (indicator DSi1.1) and the total number of available services (indicator DSi1.2), respectively. Both totals are retrieved from the Harvest Console (see Article 4 of ID M&R below), i.e.:
MDi1.1 = dataset_metadata_passed / DSi1.1
MDi1.2 = service_metadata_passed / DSi1.2
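The two formulas can be applied as in the following Python sketch. The function name and the example numbers are made up for illustration, and returning `None` for a zero denominator is a choice of this sketch rather than something prescribed by the indicator definitions.

```python
def conformity_indicators(dataset_metadata_passed, service_metadata_passed,
                          dsi1_1, dsi1_2):
    """Return (MDi1.1, MDi1.2) as ratios; None when a denominator is zero."""
    mdi1_1 = dataset_metadata_passed / dsi1_1 if dsi1_1 else None
    mdi1_2 = service_metadata_passed / dsi1_2 if dsi1_2 else None
    return mdi1_1, mdi1_2


if __name__ == "__main__":
    # Made-up counts: 80 of 100 data sets and 15 of 20 services passed.
    print(conformity_indicators(80, 15, 100, 20))
```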
If you experience any issue in the setup and/or use of the software, please open an issue in the INSPIRE Validator helpdesk.
This software tool was developed with contributions by:
This work was supported by the Interoperability solutions for public administrations, businesses and citizens programme through Action 2016.10: European Location Interoperability Solutions for e-Government (ELISE).
Copyright 2020 EUROPEAN UNION
Licensed under the EUPL, Version 1.2 or - as soon as they will be approved by the European Commission - subsequent versions of the EUPL (the "Licence").
You may not use this work except in compliance with the Licence.
You may obtain a copy of the Licence at:
https://ec.europa.eu/isa2/solutions/european-union-public-licence-eupl_en
Unless required by applicable law or agreed to in writing, software distributed under the Licence is distributed on an "AS IS" basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the Licence for the specific language governing permissions and limitations under the Licence.
Date: 2020/06/08
Authors: European Commission, Joint Research Centre - [email protected]