initial dataset record proposal (WIP) #130
@cholmes @tomkralidis Is this proposal still WIP? This use case fits our need of searching for datasets (e.g. an open data portal).
Sorta? Just very, very slow progress. I was hoping that there'd be some feedback on it, and I'm not sure what's next to actually get it incorporated. I don't have much time for it these days, but would be happy to help move it along if others were helping out. I still feel it and a dataset collection are really important missing pieces.
The CDB SWG will be interested in how this effort plays out. The current draft of the CDB 2.0 standard has a core metadata requirements clause that specifies a minimum set of mandatory, optional, or conditional metadata elements for a dataset. Obviously, other elements may be required based on the domain (such as energy) or datasets such as imagery and sensor observations. https://github.com/opengeospatial/cdbswg/blob/master/cdb-2.0/cdb-x-core-metadata.adoc
@cnreediii that repo isn’t publicly visible. Can you share it? @cholmes sorry I haven’t been involved in this spec for a while. But getting back involved and supporting “datasets” (conceptually at least) is a near-term priority (1-3 months). So I would like to start with what would be expected to work, implement, and then provide feedback and work through details for full adoption.
@ajturner - You just need to opt into the CDB SWG as an observer: https://portal.ogc.org/files/?artifact_id=63904 . You can then access the Git repo.
@ajturner - Go to the CDB SWG Project on the portal (https://portal.ogc.org/index.php?m=projects&a=view&project_id=466) and then on the lower right side you can opt into any of the CDB Git repos. For the CDB 2.0 work, you need to opt into the cdbswg repo.
30-MAY-2022 SWG Meeting: SWG discussion points:
@ajturner - If you navigate to the CDB SWG Project page, on the lower right side you can opt into any of the CDB Git repos. https://portal.ogc.org/?m=projects&a=view&project_id=466 For the CDB 2.0 work, you need to opt into the cdbswg repo. If you still have access issues, we will need to contact Greg Buehler or Kevin Stegemoler.
As I was making comments on specific attributes I realized these are already present on the draft Records API. There is a lot of inconsistency there where sometimes it aligns with DCAT and then deviates - and same with Common.
| license | O | M | A legal document under which the resource is made available. | |
| rights | O | O | A statement that concerns all rights not addressed by the license such as a copyright statement. | |
| extent | O | M | Spatial and temporal extents. | |
| associations | O | M | A list of links for accessing the resource, links to other resources associated with this resource, etc. | |
What is the difference between `associations` and `links`?
| rights | O | O | A statement that concerns all rights not addressed by the license such as a copyright statement. | |
| extent | O | M | Spatial and temporal extents. | |
| associations | O | M | A list of links for accessing the resource, links to other resources associated with this resource, etc. | |
| crs | O | O (default to latlong) | Coordinate reference system of the data represented by this collection. | |
From records: what about a more complete `spatialReference`, so it's less obscure for non-GIS users?
| updated | O | O | The date-time this collection represented by this record was updated, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). | |
| themes | O | O | A knowledge organization system used to classify the resource. | |
| formats | O | O | A list of available distributions for the resource. | |
| contactPoint | O | M | An entity to contact about the resource. | |
Can this just be `contact`? I generally recommend against `multiWord-variable_names`.
(But I do see that DCAT uses `contactPoint`. If this is going to align with DCAT, then that should apply to other attributes like `spatial` and `distribution`.)
Also, does it overlap with `publisher`?
What's the object definition (name, email, links, ...)?
It does seem to overlap with publisher. I think the current best (only?) way to get a sense of the object definitions is to go to the examples. I think the spec would be a lot more usable if it described the fields in much more depth, and included all the info on the object structures, with in-line examples of the relevant snippet.
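To make the object-definition question concrete, here is a minimal Python sketch of what a contact object could look like. The member names `name`, `email`, and `links` come from the question above; the `Link` shape (`href`, `rel`), the default `rel` value, and all example values are assumptions for illustration only, not something the draft defines.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Link:
    # Hypothetical link shape; not taken from the draft spec.
    href: str
    rel: str = "related"


@dataclass
class Contact:
    # Hypothetical contact object using the members floated in the review:
    # name, email, links.
    name: str
    email: str = ""
    links: List[Link] = field(default_factory=list)


contact = Contact(
    name="Example Data Team",
    email="data@example.org",
    links=[Link(href="https://example.org/contact")],
)

# asdict() converts nested dataclasses too, giving a JSON-ready dict.
contact_json = asdict(contact)
```

Writing the structure down like this, next to the field table, is the kind of in-line snippet that would answer the question without a trip to the examples.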
| created | O | M | The date-time the collection represented by this record was created, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). | |
| updated | O | O | The date-time this collection represented by this record was updated, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). | |
| themes | O | O | A knowledge organization system used to classify the resource. | |
| formats | O | O | A list of available distributions for the resource. | |
How about `distribution`, to align with DCAT? That can then also include dataset formats as well as metadata records (though as I type this, it may overlap with links).
If there is `formats`, what is the object definition to designate url, type, etc.?
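As a rough illustration of the question, the sketch below models each distribution as an object with the two members named above, `url` and `type`. Treating `type` as a media type, the helper name `urls_for_type`, and the example URLs are all assumptions; the draft does not define this structure.

```python
from typing import Dict, List

# Hypothetical distribution entries. Only "url" and "type" are named in the
# review comment; interpreting "type" as a media type is an assumption.
distributions: List[Dict[str, str]] = [
    {"url": "https://example.org/data/roads.geojson", "type": "application/geo+json"},
    {"url": "https://example.org/data/roads.csv", "type": "text/csv"},
]


def urls_for_type(items: List[Dict[str, str]], media_type: str) -> List[str]:
    """Return the URLs of all distributions offered in the given media type."""
    return [item["url"] for item in items if item["type"] == media_type]


geojson_urls = urls_for_type(distributions, "application/geo+json")
```

A client looking for a specific format could then filter on `type` rather than guessing from file extensions.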
| publisher | O | M | The entity making the resource available. | |
| created | O | M | The date-time the collection represented by this record was created, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). | |
| updated | O | O | The date-time this collection represented by this record was updated, formatted to [RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). | |
| themes | O | O | A knowledge organization system used to classify the resource. | |
Is `themes` necessary vs. `keywords` or `categories` (RSS2)?
| keywords | O | M | List of keywords describing the Collection. | |
| keywordsCodespace | O | O (defaults to XXX) | A reference to a controlled vocabulary used for the keywords property. | |
| language | O | O (defaults to english) | The natural language used for textual values (i.e. titles, descriptions, etc) that the collection information is given in. | |
| externalId | O | O | Identifier for the Collection that is unique across the provider. | |
This is a comment regarding the draft Records API, but is there a reason this can't be `id`? The prefix `external` is confusing unless it has some other meaning that's not stated.
And I suggest it should be M.
Fully agree a core ID should be mandatory. I think that since record inherits from Feature it has an ID field. But it would probably be worth being explicit about that in this table, along with 'links'.
I agree there's a lot that could be improved in the core Records API fields, indeed most of them could use more complete descriptions and examples - some of them I don't even really understand what they're intended for. For this PR I aimed to just 'use' the ones that were there, and mark up what would be required for a dataset record. I think (most) all your comments probably make sense as new issues / PRs, suggesting changes & more descriptions on the core record spec. I'm a bit out of my depth on what should actually improve, but could at least contribute what I've been confused by - just won't be able to offer great answers. I do think having the core set of record fields be really clear and tight would be a big win, and ideally they'd be reflected directly in a dataset collection.
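For readers trying out the proposal, here is a small Python sketch of checking a record against the fields marked M for datasets. The list only covers the table rows quoted in this review (the full table in the PR may mark others), and the helper name `missing_mandatory` and the example record are assumptions for illustration.

```python
from typing import Any, Dict, List

# Fields marked "M" in the dataset column of the rows quoted in this review.
# This is not necessarily the complete list from the PR's table.
DATASET_MANDATORY = [
    "license",
    "extent",
    "associations",
    "contactPoint",
    "publisher",
    "created",
    "keywords",
]


def missing_mandatory(properties: Dict[str, Any]) -> List[str]:
    """Return the mandatory dataset fields that are absent or empty."""
    return [name for name in DATASET_MANDATORY if not properties.get(name)]


# Hypothetical, deliberately incomplete record properties.
record_properties = {
    "license": "CC-BY-4.0",
    "keywords": ["roads"],
    "created": "2022-05-30T00:00:00Z",
}

missing = missing_mandatory(record_properties)
```

A check along these lines would flag `extent`, `associations`, `contactPoint`, and `publisher` for the example record; if `id` becomes mandatory as suggested above, it would be added to the list.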
@ajturner @cholmes
This PR starts a proposal for a 'Dataset Record', which aims to describe 'datasets'. It does not specify any particular dataset metadata, like iso19115, though it hopefully will be fully compatible with any iso19115, and any other dataset metadata.
This is still very much a draft, but I wanted to start to iterate in the open. So far it doesn't add any new fields to the core Record specification, but it does make more required, as most dataset metadata does have more requirements.
TODO's: