initial dataset record proposal (WIP) #130

Draft
wants to merge 1 commit into
base: master
70 changes: 70 additions & 0 deletions proposals/dataset-record/README.md
@@ -0,0 +1,70 @@
## Overview

This document describes the fields needed for an OGC Record to describe a 'dataset'. A dataset is a
"collection of data, published or curated by a single agent, and available for access or download in
one or more serializations or formats" (from [DCAT](https://www.w3.org/TR/vocab-dcat-2/#dcat-scope)). In the
geospatial domain, datasets are typically defined with the same properties and share higher-level metadata. In GIS a
dataset typically corresponds to a 'layer', and in the satellite world a dataset would be all the scene captures that
come from the same sensor or constellation. It corresponds directly to what others call a "dataset series" (ESA, ISO 19115),
"collection" (CNES, NASA), and "dataset" (JAXA, DCAT).

The Dataset Record is the metadata needed for users to actually find the data they need. The data itself may be available as
an OGC API Service, an older OGC W\*S Service, or an actual data file.

A dataset record is an [OGC Record](ogc-record-geojson-spec.md), and uses all the exact same fields, but makes
more of the fields required, in order to more fully describe the metadata users need to understand the dataset.

A Record is the GeoJSON equivalent of an [OGC Dataset Collection](https://github.com/cholmes/ogc-collection/blob/main/ogc-collection-spec.md)
(TODO: port this to be a proposal in Features API) that includes 'Dataset Fields', and shares almost all of the same fields.

Dataset Records are represented in JSON format and are very flexible. Any JSON object that contains all the
required fields is a valid Record.

- Examples:
  - See this [example](./examples/record-meetlocaties-example.json) that contains more fields and links.
- JSON Schema: TODO
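
Until a JSON Schema exists, the "contains all the required fields" rule can be sketched as a small check. This is only an illustration: the field names are taken from the "Required in Dataset Record" column of the property table in this document, while the helper name `missing_dataset_fields` and the choice to treat `null` values as missing are assumptions, not part of the proposal.

```python
from typing import List

# Property names marked "M" in the "Required in Dataset Record" column.
# The constant and function names are hypothetical, chosen for this sketch.
REQUIRED_DATASET_PROPERTIES = (
    "type", "title", "description", "keywords", "publisher",
    "created", "contactPoint", "license", "extent", "associations",
)


def missing_dataset_fields(record: dict) -> List[str]:
    """Return the required property names that are absent or null in a record."""
    properties = record.get("properties") or {}
    missing = [name for name in REQUIRED_DATASET_PROPERTIES
               if properties.get(name) is None]
    # The table also requires the resource type to be exactly "dataset".
    if properties.get("type") not in (None, "dataset"):
        missing.append("type (must be 'dataset')")
    return missing
```

Applied to the example record in this proposal, a check like this would flag `keywords`, `publisher`, and `license`, which that example does not currently carry.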


## Dataset Record Fields

The core Record fields for a 'Dataset Record' remain the same as in the core [OGC Record](ogc-record-geojson-spec.md), with the
exact same Item fields as [specified there](ogc-record-geojson-spec.md#item-fields). (TODO: Link to main spec when Peter's refactor lands)

### Dataset Record Property Fields

The property fields are where the Dataset Record has more requirements. It uses all the same core Record definitions, but adds in
more requirements and a couple defaults.
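
As a rough sketch of how a client might apply the defaults named in the table below, the snippet fills in `language` and `crs` when they are absent. The concrete default strings are assumptions: the table only says "english" and "latlong", so `"en"` and the CRS84 URI (borrowed from the example record's extent) are illustrative guesses. The `keywordsCodespace` default is still a placeholder in the table and is left out.

```python
# Assumed default values: illustrative guesses, not values fixed by the proposal.
ASSUMED_DEFAULTS = {
    "language": "en",
    "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
}


def with_dataset_defaults(properties: dict) -> dict:
    """Return a copy of the properties with missing defaulted fields filled in."""
    filled = dict(properties)  # shallow copy, the caller's dict is left untouched
    for name, value in ASSUMED_DEFAULTS.items():
        filled.setdefault(name, value)
    return filled
```

Returning a copy rather than mutating the input keeps the stored record distinguishable from the defaulted view a client works with.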

| Field Name | Required in Core Record | Required in Dataset Record | Description |
|-------------------|-------------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type | M | M | Denotes the resource type of the record. For the dataset record this is **required** to be `dataset`. |
| title | M | M | A short descriptive human-readable one-line title for the Collection. |
| description | O | M | Detailed multi-line description to fully explain the Collection. |
| keywords | O | M | List of keywords describing the Collection. |
| keywordsCodespace | O | O (defaults to XXX) | A reference to a controlled vocabulary used for the keywords property. |
| language | O | O (defaults to English) | The natural language used for textual values (e.g. titles, descriptions) in which the collection information is given. |
| externalId | O | O | Identifier for the Collection that is unique across the provider. |

This is a comment regarding the draft Records API, but is there a reason this can't be `id`? The prefix "external" is confusing unless it has some other meaning that's not stated.

And I suggest it should be M

Member Author

Fully agree a core ID should be mandatory. I think that since record inherits from Feature it has an ID field. But it would probably be worth being explicit about that in this table, along with 'links'.

| publisher | O | M | The entity making the resource available. |
| created | O | M | The date-time the collection represented by this record was created, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). |
| updated | O | O | The date-time this collection represented by this record was updated, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). |
| themes | O | O | A knowledge organization system used to classify the resource. |

is themes necessary vs. keywords or categories (RSS2)

| formats | O | O | A list of available distributions for the resource. |

how about distribution to align with DCAT. That can then also include dataset formats as well as metadata records. (though as I type this it may overlap with links)

if there is formats what is the object definition to designate url, type, etc.

| contactPoint | O | M | An entity to contact about the resource. |

Can this just be contact? I generally recommend against multiWord-variable_names.
(but I do see that DCAT uses contactPoint. If this is going to align with DCAT then that should apply to other attributes like spatial and distribution)

Also, does it overlap with publisher

What's the object definition (name, email, links, ...)

Member Author

It does seem to overlap with publisher. I think the current best (only?) way to get a sense of the object definitions is to go to the examples. I think the spec would be a lot more usable if it described the fields in much more depth, and included all the info on the object structures, with in-line examples of the relevant snippet.

| license | O | M | A legal document under which the resource is made available. |
| rights | O | O | A statement that concerns all rights not addressed by the license such as a copyright statement. |
| extent | O | M | Spatial and temporal extents. |
| associations | O | M | A list of links for accessing the resource, links to other resources associated with this resource, etc. |

what is the difference between associations and links?

| crs | O | O (default to latlong) | Coordinate reference system of the data represented by this collection. |

from records: what about a more complete spatialReference so it's less obscure for non-GIS users.


### Associations

At least one association is required. This should link to the actual dataset. It could link to OGC API (or OGC W\*S) interfaces to the data, it
could link directly to the source format file for the dataset, or ideally it is a combination: several OGC services and a link to the source data.
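
The "at least one association" rule can be sketched as follows. The object shape is an assumption based on the example record in this proposal, where each association carries `href`, `rel`, and `type`; the spec text does not yet define it, and `association_problems` is a hypothetical helper name.

```python
def association_problems(properties: dict) -> list:
    """List problems found in the 'associations' array of a Dataset Record."""
    associations = properties.get("associations")
    if not isinstance(associations, list) or not associations:
        return ["at least one association is required"]
    problems = []
    for index, link in enumerate(associations):
        # Assumed shape: every association is an object with a non-empty href.
        if not isinstance(link, dict) or not link.get("href"):
            problems.append(f"association {index} has no href")
    return problems
```

An empty result means the record links to the data in at least one way; it says nothing about whether the `rel` values are meaningful, since the common rel types are still to be defined.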

TODO: Flesh out common rel types for ogc api links, source data file links, etc.

### Dataset definition ideas (Work in Progress)

From STAC: a set of assets that are defined with the same properties and share higher level metadata. In the satellite world these would typically all come from the same sensor or constellation. It corresponds directly to what others call a "dataset series" (ESA, ISO 19115), "collection" (CNES, NASA), and "dataset" (JAXA, DCAT). So if all your Items have the same properties, they probably belong in the same Collection.

We should also reference vector dataset ideas, how it maps to a 'layer', can be a coverage, etc.
99 changes: 99 additions & 0 deletions proposals/dataset-record/examples/record-meetlocaties-example.json
@@ -0,0 +1,99 @@
{
  "id": "de0b0f94-aadb-4db4-a1b5-9a656810682c",
  "type": "Feature",
  "created": "2021-02-05",
  "updated": "2021-02-21T00:14:33Z",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          4.52767,
          51.58673
        ],
        [
          4.52767,
          52.0393
        ],
        [
          6.18931,
          52.0393
        ],
        [
          6.18931,
          51.58673
        ],
        [
          4.52767,
          51.58673
        ]
      ]
    ]
  },
  "properties": {
    "created": "2018-02-05T08:15:56Z",
    "updated": "2021-02-21T00:14:33Z",
    "type": "dataset",
    "title": "Meetlocaties waterkwantiteit Waterschap Rivierenland",
    "description": "Binnen het waterschap worden veel oppervlaktewaterpeilen en grondwaterstanden gemeten. De kaart toont waar metingen plaatsvinden en wat voor type meting plaats vindt. Voor meetgegevens of overige informatie over waterkwantiteit kunt u contact opnemen met het waterschap via [email protected]",
    "contactPoint": "Waterschap Rivierenland, [email protected]",
    "associations": [
      {
        "href": "https://kaarten.wsrl.nl/arcgis/services/Kaarten/Meetlocaties_Waterkwantiteit_WMS_WFS_OD/MapServer/WMSServer?request=GetCapabilities&service=WMS",
        "rel": "item",
        "type": "OGC:WMS"
      },
      {
        "href": "https://kaarten.wsrl.nl/arcgis/services/Kaarten/Meetlocaties_Waterkwantiteit_WMS_WFS_OD/MapServer/WFSServer?request=GetCapabilities&service=WFS",
        "rel": "item",
        "type": "OGC:WFS"
      }
    ],
    "externalId": "meetlocaties-waterschap-rivierenland",
    "themes": [
      {
        "concepts": [
          "waterkwaniteit peil grondwaterstand peilbuis waterpeil peilbesluit waterschap rivierenland"
        ],
        "scheme": null
      }
    ],
    "extent": {
      "spatial": {
        "bbox": [
          [
            [
              4.52767,
              51.58673,
              6.18931,
              52.0393
            ]
          ]
        ],
        "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
      },
      "temporal": {
        "interval": [
          null,
          null
        ],
        "trs": "http://www.opengis.net/def/uom/ISO-8601/0/Gregorian"
      }
    }
  },
  "links": [
    {
      "rel": "alternate",
      "type": "text/html",
      "title": "This document as HTML",
      "href": "./meetlocaties.html"
    },
    {
      "rel": "alternate",
      "type": "application/json",
      "title": "This document as an OGC Collection",
      "href": "./collection-meetlocaties-example.json"
    }
  ]
}