Historically, metadata type documents supported fairly unrestricted metadata mapping schemes.
For EO3 documents, their function is limited to defining custom search fields for which the active index driver maintains optimised search indexes.
A metadata type document is a YAML or JSON document that conforms to the EO3 Metadata Type JSON Schema at:
https://github.com/opendatacube/eo3/blob/develop/eo3/schema/metadata-type-schema.yaml
The top level of a metadata type document consists of a name, a description and a dataset section:
name: eo3_minimal
description: Minimal EO3 compatible
dataset:
    ...
name cannot contain whitespace or punctuation: only alphanumeric characters and underscores are allowed. name is required and must be unique within a given index.
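The character rule for name can be checked locally before a document is submitted to an index. The following is a minimal sketch, not the check ODC itself performs: the helper name is_valid_name is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits. Uniqueness within an index cannot be checked this way and is left to the index driver.

```python
import re

# Assumption: "alphanumeric characters (or underscores) only" is read here as
# ASCII letters, digits and underscores. The official schema may differ in detail.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_name(name: str) -> bool:
    """Return True if name is non-empty and uses only letters, digits and underscores."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("eo3_minimal"))  # True
print(is_valid_name("eo3 minimal"))  # False: contains whitespace
print(is_valid_name("eo3-minimal"))  # False: contains punctuation
```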
description is a string. It is required but may have any value.
As of datacube-1.8.x, most of the contents of dataset are no longer used by ODC; the only portion still used is the search_fields section described below. The remainder of the section is required by the schema but is mostly ignored by ODC and/or assumed to have the following canonical values:
dataset:
    id: [id]
    sources: [lineage, source_datasets]
    grid_spatial: [grid_spatial, projection]
    measurements: [measurements]
    creation_dt: [properties, 'odc:processing_datetime']
    label: [label]
    format: [properties, 'odc:file_format']
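The canonical values above lend themselves to a simple comparison against an already-parsed dataset section. The sketch below is illustrative only: non_canonical_keys is a hypothetical helper, not part of datacube, and it assumes the document has been loaded into plain Python dictionaries and lists.

```python
# Canonical EO3 values for the "dataset" section, copied from the listing above.
CANONICAL_DATASET = {
    "id": ["id"],
    "sources": ["lineage", "source_datasets"],
    "grid_spatial": ["grid_spatial", "projection"],
    "measurements": ["measurements"],
    "creation_dt": ["properties", "odc:processing_datetime"],
    "label": ["label"],
    "format": ["properties", "odc:file_format"],
}


def non_canonical_keys(dataset_section: dict) -> list:
    """Return the keys that are missing or differ from the canonical value.

    search_fields is deliberately ignored: it is the one part of the
    section that is expected to vary between metadata types.
    """
    return [
        key
        for key, expected in CANONICAL_DATASET.items()
        if dataset_section.get(key) != expected
    ]


section = dict(CANONICAL_DATASET, search_fields={})
print(non_canonical_keys(section))  # []

section["label"] = ["ga_label"]
print(non_canonical_keys(section))  # ['label']
```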
id and sources are enforced to exist in the 1.8.x schema but are not used. In 1.9 they will become optional in the schema, then later be deprecated, and finally be dropped from the schema in 2.0.
label and creation_dt are enforced to exist in the 1.8.x schema and must match the above values for EO3 compatibility. In 1.9 these values will become optional in the schema, defaulting to the above values, then later be deprecated, and finally be dropped from the schema in 2.0.
format, grid_spatial, and measurements are optional in the schema. For an EO3-compliant geospatial metadata type, however, these fields must all be present and have the values shown above.
In 1.9 these values will become optional in the schema, defaulting to the above values, then later be deprecated, and finally be dropped from the schema in 2.0.
The legacy postgres ODC index driver in datacube 1.8 supports both EO3 and non-EO3 metadata types, and both geospatial and non-geospatial metadata types, but "EO3-compatible" is largely assumed to imply a geospatial metadata type.
With support for non-EO3-compatible metadata types being dropped in datacube-2.0, support for non-geospatial metadata types will also vanish.
New structures may be introduced at a later date to support EO3-compliant non-geospatial metadata types (e.g. EO3 telemetry or for non-raster geospatial data).
Any such future extension will be optional and may not be supported by all index drivers.
The search_fields section contains a collection of search fields. The index driver is responsible for ensuring that efficient search queries can be performed against all declared search fields.
The search_fields section is a dictionary (i.e. an associative array, or an "object" in JSON terminology) whose keys are the names of the search fields. Search field names can only contain alphanumeric characters and underscores.
Each value in the search_fields dictionary is a section that may contain the following fields:
description
is an optional free-text description of the search field. It is for informational purposes only.
indexed is an optional boolean field that defaults to True. If False, the field is not indexed by the index driver (i.e. it is a search field that cannot be searched).
indexed may be deprecated, or required to be True, in future releases.
type is an optional string field that describes the data type of the search field. If not specified, type defaults to "string". The allowed values for type are:
Scalar types:
- string
- double
- integer
- numeric
- datetime
Range types:
- double-range
- integer-range
- numeric-range
- float-range
- datetime-range
float-range is a synonym for numeric-range and may be deprecated and removed in future releases.
Some index drivers may treat some combination of integer, double and numeric types as interchangeable for indexing purposes.
A search field with a scalar type must have an offset (and may not have a min_offset or max_offset).
A search field with a range type must have a min_offset and a max_offset (and may not have an offset).
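The pairing between types and offset keys can be expressed as a small check on a single search field definition. This is a sketch under stated assumptions rather than the validation ODC performs: offset_key_problems is a hypothetical helper, the type sets are copied from the lists above, and a missing type is treated as "string" as described earlier.

```python
SCALAR_TYPES = {"string", "double", "integer", "numeric", "datetime"}
RANGE_TYPES = {
    "double-range",
    "integer-range",
    "numeric-range",
    "float-range",
    "datetime-range",
}


def offset_key_problems(field: dict) -> list:
    """List the offset-key problems found in one search field definition."""
    field_type = field.get("type", "string")  # type defaults to "string"
    present = {k for k in ("offset", "min_offset", "max_offset") if k in field}

    if field_type in SCALAR_TYPES:
        required = {"offset"}
    elif field_type in RANGE_TYPES:
        required = {"min_offset", "max_offset"}
    else:
        return [f"unknown type {field_type!r}"]

    problems = [f"missing {key}" for key in sorted(required - present)]
    problems += [f"unexpected {key}" for key in sorted(present - required)]
    return problems


# A scalar field with no explicit type: an offset alone is correct.
print(offset_key_problems({"offset": [["properties", "eo:platform"]]}))  # []

# A range field that wrongly uses a single offset.
print(offset_key_problems({"type": "double-range", "offset": [["extent", "lat"]]}))
```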
An offset is a sequence of one or more sequences of strings and describes where the value for that search field can be found in a Dataset document. Range type search fields have two offsets: one for the lower limit of the range and one for the upper limit of the range.
Each sequence of strings in an offset represents the keys used to find the search field's value in the dataset metadata document. If multiple such sequences are specified, they are checked in order, falling back to the second set of keys if the first does not exist in the dataset document, and so on.
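The fallback behaviour described above can be illustrated with a short lookup routine over a parsed dataset document. This is a minimal sketch, not the index driver's implementation: resolve_offset and its default parameter are hypothetical, the sample dataset document is invented for illustration, and a flat list of strings is assumed to stand for a single key sequence.

```python
_MISSING = object()  # marks "key path does not exist" (distinct from a stored None)


def _follow(doc, keys):
    """Follow one sequence of keys into nested dictionaries."""
    current = doc
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def resolve_offset(doc, offset, default=None):
    """Return the value at the first key sequence that exists in doc."""
    # Assumption: a flat list of strings such as ["crs"] is one key sequence.
    if offset and all(isinstance(item, str) for item in offset):
        offset = [offset]
    for keys in offset:
        value = _follow(doc, keys)
        if value is not _MISSING:
            return value
    return default


# Invented dataset document with only a single "datetime" property.
dataset_doc = {"crs": "EPSG:32655", "properties": {"datetime": "2020-01-01T00:00:00Z"}}

# The first key sequence is absent, so the second one is used.
min_offset = [["properties", "dtr:start_datetime"], ["properties", "datetime"]]
print(resolve_offset(dataset_doc, min_offset))  # 2020-01-01T00:00:00Z
print(resolve_offset(dataset_doc, ["crs"]))     # EPSG:32655
```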
For EO3 compatibility the following restrictions apply to offsets:
- The first offset element MUST be "properties".
- The offset can only be two elements long.
- The second offset element must be a series of alphanumeric (plus underscore) only strings, separated by colons, e.g. "eo:instrument", "odc:file_format", etc.
I.e. all search offsets must be stored in dataset documents below "properties" with no nesting.
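The three restrictions above can be checked for a single key sequence with one regular expression. The sketch below is an illustration only: is_eo3_offset is a hypothetical helper, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re

# One or more alphanumeric/underscore words separated by single colons,
# e.g. "datetime", "eo:instrument", "odc:file_format".
SECOND_ELEMENT = re.compile(r"[A-Za-z0-9_]+(?::[A-Za-z0-9_]+)*")


def is_eo3_offset(keys) -> bool:
    """Check one key sequence against the EO3 offset restrictions."""
    return (
        len(keys) == 2
        and keys[0] == "properties"
        and isinstance(keys[1], str)
        and SECOND_ELEMENT.fullmatch(keys[1]) is not None
    )


print(is_eo3_offset(["properties", "eo:instrument"]))     # True
print(is_eo3_offset(["properties", "eo", "instrument"]))  # False: nested below properties
print(is_eo3_offset(["extent", "lat", "begin"]))          # False: not under properties
```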
For historical reasons these restrictions are not enforced on some search fields:
1. lat and lon
For EO3 compatibility the "lat" and "lon" search fields MUST have the following values:
search_fields:
    lon:
        description: Longitude range
        type: double-range
        min_offset:
            - [extent, lon, begin]
        max_offset:
            - [extent, lon, end]
    lat:
        description: Latitude range
        type: double-range
        min_offset:
            - [extent, lat, begin]
        max_offset:
            - [extent, lat, end]
"lat" and "lon" will be deprecated in v1.9 and removed in v2.
2. time
"time" is STRONGLY recommended to have the following value:
time:
    description: Acquisition time range
    type: datetime-range
    min_offset:
        - [properties, 'dtr:start_datetime']
        - [properties, datetime]
    max_offset:
        - [properties, 'dtr:end_datetime']
        - [properties, datetime]
These values may be enforced in future releases. "time" may be deprecated and removed in future releases.
3. crs_raw
The raw EO3 native CRS (stored at [crs]) may be indexed as a search field called "crs_raw":
crs_raw:
    offset: [crs]
    indexed: False
    description: The raw CRS string as it appears in metadata
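Since a metadata type document may be written as JSON as well as YAML, the fragments shown in this document can be assembled into one structure and serialised with the standard library. The following is a sketch only: build_eo3_metadata_type is a hypothetical helper, every value is copied from the listings above, and the result has not been validated against the official schema.

```python
import json


def build_eo3_metadata_type(name: str, description: str) -> dict:
    """Assemble a metadata type document from the fragments shown above."""
    search_fields = {
        "lon": {
            "description": "Longitude range",
            "type": "double-range",
            "min_offset": [["extent", "lon", "begin"]],
            "max_offset": [["extent", "lon", "end"]],
        },
        "lat": {
            "description": "Latitude range",
            "type": "double-range",
            "min_offset": [["extent", "lat", "begin"]],
            "max_offset": [["extent", "lat", "end"]],
        },
        "time": {
            "description": "Acquisition time range",
            "type": "datetime-range",
            "min_offset": [["properties", "dtr:start_datetime"], ["properties", "datetime"]],
            "max_offset": [["properties", "dtr:end_datetime"], ["properties", "datetime"]],
        },
        "crs_raw": {
            "offset": ["crs"],
            "indexed": False,
            "description": "The raw CRS string as it appears in metadata",
        },
    }
    return {
        "name": name,
        "description": description,
        "dataset": {
            "id": ["id"],
            "sources": ["lineage", "source_datasets"],
            "grid_spatial": ["grid_spatial", "projection"],
            "measurements": ["measurements"],
            "creation_dt": ["properties", "odc:processing_datetime"],
            "label": ["label"],
            "format": ["properties", "odc:file_format"],
            "search_fields": search_fields,
        },
    }


doc = build_eo3_metadata_type("eo3_minimal", "Minimal EO3 compatible")
print(json.dumps(doc, indent=2))
```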