initial dataset record proposal (WIP) #130

Draft
wants to merge 1 commit into
base: master

cholmes
Member

@cholmes cholmes commented Jul 22, 2021

This PR starts a proposal for a 'Dataset Record', which aims to describe 'datasets'. It does not specify any particular dataset metadata standard, such as ISO 19115, though it will hopefully be fully compatible with ISO 19115 and any other dataset metadata.

This is still very much a draft, but I wanted to start to iterate in the open. So far it doesn't add any new fields to the core Record specification, but it does make more required, as most dataset metadata does have more requirements.

TODOs:

  • port to asciidoc (the other proposal was in markdown, so seemed ok to start it there)
  • Align to main document when Refactor the document ... #129 lands
  • Make JSON schema / openapi definition

@ajturner

@cholmes @tomkralidis Is this proposal still WIP? This use case fits our need of searching for datasets (e.g. an open data portal)

@cholmes
Member Author

cholmes commented May 25, 2022

Sorta? Just very, very slow progress. I was hoping that there'd be some feedback on it, and I'm not sure what's next to actually get it incorporated. I don't have much time for it these days, but would be happy to help move it along if others were helping out. I still feel it and a dataset collection are really important missing pieces.

@cnreediii

The CDB SWG will be interested in how this effort plays out. The current draft of the CDB 2.0 standard has a core metadata requirements clause that specifies a minimum set of mandatory, optional, or conditional metadata elements for a dataset. Obviously, other elements may be required based on the domain (such as energy) or datasets such as imagery and sensor observations. https://github.com/opengeospatial/cdbswg/blob/master/cdb-2.0/cdb-x-core-metadata.adoc

@ajturner

@cnreediii that repo isn’t publicly visible. Can you share it?

@cholmes sorry I haven’t been involved in this spec for a while. But getting back involved and supporting “datasets” (conceptually at least) is a near-term priority (1-3 months). So I would like to start with what would be expected to work, implement it, and then provide feedback and work through details for full adoption.

@cnreediii

@ajturner - You just need to opt into the CDB SWG as an observer: https://portal.ogc.org/files/?artifact_id=63904 . You can then access the Git repo.

@cnreediii

@ajturner - Go to the CDB SWG Project on the portal (https://portal.ogc.org/index.php?m=projects&a=view&project_id=466) and then on the lower right side you can opt into any of the CDB Git repos. For the CDB 2.0 work, you need to opt into the cdbswg repo.

@pvretano
Contributor

30-MAY-2022 SWG Meeting: SWG discussion points:

@ajturner

ajturner commented Jun 3, 2022

> For the CDB 2.0 work, you need to opt into the cdbswg repo.

I've opted in but still blocked from access.

image

@cnreediii

@ajturner - If you navigate to the CDB SWG Project page, on the lower right side you can opt into any of the CDB Git repos. https://portal.ogc.org/?m=projects&a=view&project_id=466 For the CDB 2.0 work, you need to opt into the cdbswg repo.

If you still have access issues, we will need to contact Greg Buehler or Kevin Stegemoler.

@ajturner ajturner left a comment

As I was making comments on specific attributes I realized these are already present on the draft Records API. There is a lot of inconsistency there where sometimes it aligns with DCAT and then deviates - and same with Common.

| license | O | M | A legal document under which the resource is made available. |
| rights | O | O | A statement that concerns all rights not addressed by the license such as a copyright statement. |
| extent | O | M | Spatial and temporal extents. |
| associations | O | M | A list of links for accessing the resource, links to other resources associated with this resource, etc. |

What is the difference between associations and links?

| rights | O | O | A statement that concerns all rights not addressed by the license such as a copyright statement. |
| extent | O | M | Spatial and temporal extents. |
| associations | O | M | A list of links for accessing the resource, links to other resources associated with this resource, etc. |
| crs | O | O (default to latlong) | Coordinate reference system of the data represented by this collection. |

From records: what about a more complete spatialReference, so it's less obscure for non-GIS users?

| updated | O | O | The date-time this collection represented by this record was updated, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). |
| themes | O | O | A knowledge organization system used to classify the resource. |
| formats | O | O | A list of available distributions for the resource. |
| contactPoint | O | M | An entity to contact about the resource. |

Can this just be contact? I generally recommend against multiWord-variable_names.
(but I do see that DCAT uses contactPoint. If this is going to align with DCAT then that should apply to other attributes like spatial and distribution)

Also, does it overlap with publisher?

What's the object definition (name, email, links, ...)?

Member Author

It does seem to overlap with publisher. I think the current best (only?) way to get a sense of the object definitions is to go to the examples. I think the spec would be a lot more usable if it described the fields in much more depth, and included all the info on the object structures, with in-line examples of the relevant snippet.

| created | O | M | The date-time the collection represented by this record was created, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). |
| updated | O | O | The date-time this collection represented by this record was updated, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). |
| themes | O | O | A knowledge organization system used to classify the resource. |
| formats | O | O | A list of available distributions for the resource. |

How about distribution, to align with DCAT? That can then also include dataset formats as well as metadata records (though as I type this, it may overlap with links).

If there is a formats field, what is the object definition to designate url, type, etc.?

| publisher | O | M | The entity making the resource available. |
| created | O | M | The date-time the collection represented by this record was created, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). |
| updated | O | O | The date-time this collection represented by this record was updated, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). |
| themes | O | O | A knowledge organization system used to classify the resource. |

Is themes necessary vs. keywords or categories (RSS2)?

| keywords | O | M | List of keywords describing the Collection. |
| keywordsCodespace | O | O (defaults to XXX) | A reference to a controlled vocabulary used for the keywords property. |
| language | O | O (defaults to english) | The natural language used for textual values (i.e. titles, descriptions, etc) that the collection information is given in. |
| externalId | O | O | Identifier for the Collection that is unique across the provider. |

This is a comment regarding the draft Records API, but is there a reason this can't be id? The prefix external is confusing unless it has some other meaning that's not stated.

And I suggest it should be M

Member Author

Fully agree a core ID should be mandatory. I think that since record inherits from Feature it has an ID field. But it would probably be worth being explicit about that in this table, along with 'links'.

@cholmes
Member Author

cholmes commented Jun 3, 2022

> As I was making comments on specific attributes I realized these are already present on the draft Records API. There is a lot of inconsistency there where sometimes it aligns with DCAT and then deviates - and same with Common.

I agree there's a lot that could be improved in the core Records API fields, indeed most of them could use more complete descriptions and examples - some of them I don't even really understand what they're intended for. For this PR I aimed to just 'use' the ones that were there, and mark up what would be required for a dataset record.

I think (most) all your comments probably make sense as new issues / PRs, suggesting changes & more descriptions on the core record spec. I'm a bit out of my depth on what should actually improve, but could at least contribute what I've been confused by - just won't be able to offer great answers. I do think having the core set of record fields be really clear and tight would be a big win, and ideally they'd be reflected directly in a dataset collection.
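
To make the "mark up what would be required for a dataset record" idea concrete, here is a minimal Python sketch. The requirement levels are copied from the table rows quoted in the review comments on this PR (M = mandatory, O = optional); rows that are not quoted there are left out. The function name `missing_mandatory` and the assumption that the fields live in a flat `properties` dict (as in a GeoJSON Feature) are illustrative only and are not part of the proposal.

```python
from __future__ import annotations

# Dataset-record requirement levels as quoted in the review snippets of this PR.
# "M" = mandatory, "O" = optional. Rows not quoted in the review are omitted.
DATASET_REQUIREMENTS = {
    "license": "M",
    "rights": "O",
    "extent": "M",
    "associations": "M",
    "crs": "O",
    "publisher": "M",
    "created": "M",
    "updated": "O",
    "themes": "O",
    "formats": "O",
    "contactPoint": "M",
    "keywords": "M",
    "keywordsCodespace": "O",
    "language": "O",
    "externalId": "O",
}


def missing_mandatory(properties: dict) -> list[str]:
    """Return the sorted names of mandatory fields absent from `properties`.

    Hypothetical helper: it only checks presence, not the structure of values.
    """
    return sorted(
        name
        for name, level in DATASET_REQUIREMENTS.items()
        if level == "M" and name not in properties
    )


if __name__ == "__main__":
    # Illustrative record fragment; the values are made up.
    record_properties = {"license": "CC-BY-4.0", "extent": {}, "keywords": ["roads"]}
    print(missing_mandatory(record_properties))
```

A presence check like this is deliberately shallow. Validating the object structures themselves would need the JSON schema listed in the TODOs.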

@pvretano
Contributor

pvretano commented Jun 3, 2022

@ajturner @cholmes
see #158
see record schema
see these tables

  • Keep in mind that there are lots of stakeholders each wanting record property names to match their domain; herding cats!
  • No difference between links and associations. Associations (as a construct) are gone. Everything goes into links. Originally wanted to distinguish between links for navigating the API (previous, next, parent, etc.) and links to associated resources (i.e. links to download the resource or parts thereof, links to other related resources and records, etc.). Didn't quite pan out. Everything is now just links.
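
With everything in a single links array, a client that still wants the navigation vs. associated-resource distinction could recover it from each link's relation type. The Python sketch below is one way that might look. It assumes each link is a dict with `href` and `rel` keys; the `NAVIGATION_RELS` set, the `split_links` helper, and the example URLs are hypothetical and not taken from the spec.

```python
# Hypothetical set of relation types treated as API navigation. The comment
# above names previous, next, and parent as examples; "self" and "prev" are
# added here as assumptions.
NAVIGATION_RELS = {"self", "prev", "previous", "next", "parent"}


def split_links(links):
    """Partition links into (navigation, associated) lists by their `rel` value.

    Links without a `rel` key are treated as associated resources.
    """
    navigation = [link for link in links if link.get("rel") in NAVIGATION_RELS]
    associated = [link for link in links if link.get("rel") not in NAVIGATION_RELS]
    return navigation, associated


if __name__ == "__main__":
    # Made-up links for illustration only.
    example_links = [
        {"href": "https://example.com/collections/c1/items?page=2", "rel": "next"},
        {"href": "https://example.com/collections/c1", "rel": "parent"},
        {"href": "https://example.com/data/roads.gpkg", "rel": "enclosure"},
    ]
    nav, assoc = split_links(example_links)
    print(len(nav), "navigation links;", len(assoc), "associated links")
```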
