Multiple representations of the same data #116

Closed
jerstlouis opened this issue Mar 27, 2020 · 19 comments
Labels
Collections Applicable to Collections (consider to use Part 2 instead) Progress: solution merged

Comments

@jerstlouis
Member

jerstlouis commented Mar 27, 2020

At the Toulouse TC, the concept of multiple representations of the same data (identified by a single collection) was proposed (e.g. here: #62 (comment)) and welcomed by many. As an example, the same set of observations could be available as Features, as a Coverage, or through the SensorThings API.

Another example is 3D buildings data, which could be available as Features (e.g. based on the CityGML data model), as 3D meshes (whether organized in a bounding volume hierarchy as in 3D Tiles and i3s, or following a fixed tiling scheme as in CDB), or as a point cloud (e.g. as LAS, or as 3D Tiles .pnts or i3s). A service could provide one or more of these representations, and a client could choose the one it supports best or the one most appropriate for a specific task. For example, a 2D-only client could show buildings in 2D, while a 3D client pointed at the same dataset would display them in 3D.
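A client's choice among the representations a service offers can be sketched as a simple preference match. This is a minimal illustration, not part of any OGC API: the data-model labels ("features", "3d-mesh", etc.) and the function name are hypothetical.

```python
from typing import Iterable, Optional


def choose_representation(available: set, preferences: Iterable[str]) -> Optional[str]:
    """Return the first data model in the client's preference order that the service offers.

    The data-model labels are hypothetical; no OGC API standard defines them.
    """
    for model in preferences:
        if model in available:
            return model
    return None  # nothing this client can handle


# Hypothetical service offering 3D buildings as features and as 3D meshes.
offered = {"features", "3d-mesh"}

# A 2D-only client and a 3D client pointed at the same dataset.
print(choose_representation(offered, ["features", "coverage"]))                 # features
print(choose_representation(offered, ["3d-mesh", "point-cloud", "features"]))   # 3d-mesh
```

Keeping the preference order on the client side means the service only has to advertise what it offers; each client decides what "best" means for its own task.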

I believe this concept is highly valuable; however, Chapter 9 (Collections) does not currently appear to describe or reflect these more recent developments.

@cportele
Member

I don't think there is anything in the current Common that excludes that, except perhaps the fuzziness of the terminology in the current draft. This comment is about terminology, too, but with architectural implications.

Since a key point of the OGC API standards is alignment with the Web, we should be consistent with Web terminology. Different "representations" of the same data/resource are selected using content negotiation (different media type, different language, etc.). See HTTP.

At least at first glance, this seems to be different from "representation" as used in this issue, since there is an underlying assumption that the different representations use different URIs, i.e. are different resources. However, what we have been doing in OGC API Features is splitting the collection into two resources, at least for feature collections: /collections/{collectionId} is the Feature Collection, but it does not embed the features in its representations (JSON, XML, HTML); instead, it has typed links to each representation of the features at /collections/{collectionId}/items, which is a sub-resource. See also #105. Conceptually, the two resources might also be seen as a single resource.

The current design with the split resources was selected for practical reasons during the March 2018 WFS3/STAC hackathon (see the issue). Before that we simply had the landing page (which included the list of collections and links to them), the feature collection at /{collectionId} and the feature at /{collectionId}/{featureId} based on the WFS 2.x REST binding and the work in the Geonovum testbed.
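The move from the original path structure to the current one can be illustrated with a small path translation. This is a sketch under the assumption that a legacy /{collectionId} request was asking for the features themselves (in the current structure the collection description lives at /collections/{collectionId}, while the features are at /items); the function name is hypothetical.

```python
import re

# Original structure: /{collectionId} and /{collectionId}/{featureId}
_LEGACY = re.compile(r"^/(?P<collection>[^/]+)(?:/(?P<feature>[^/]+))?/?$")


def to_current_path(legacy_path: str) -> str:
    """Translate an original-structure path into the current /collections/... structure."""
    m = _LEGACY.match(legacy_path)
    if m is None:
        raise ValueError(f"not a legacy collection or feature path: {legacy_path!r}")
    collection, feature = m.group("collection"), m.group("feature")
    if feature is None:
        # Assumption: the caller wants the features; the description is now /collections/{id}.
        return f"/collections/{collection}/items"
    return f"/collections/{collection}/items/{feature}"


print(to_current_path("/my-observations"))     # /collections/my-observations/items
print(to_current_path("/my-observations/42"))  # /collections/my-observations/items/42
```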

If we ignore the practical reasons for the change mentioned in the issue for a moment and assume we had stuck to the original path structure, then we could have requested, for example, the following representations from /my-observations:

  • application/ogcapi-metadata+json (or similar, we are just using application/json for now): information about the collection
  • application/geo+json: the collection as features in GeoJSON
  • application/gml+xml: the collection as features in GML
  • image/tiff; application=geotiff: the collection as a GeoTIFF coverage
  • text/html: the collection so that a human can understand it (and crawlers)
  • etc.
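To illustrate HTTP-style content negotiation on the single /my-observations resource, here is a minimal server-side selection sketch. It assumes the media types listed above (with application/json standing in for the collection information), matches only on type/subtype, and ignores media-type parameters other than q, so it simplifies the full HTTP rules; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Representations of /my-observations from the list above; application/json stands in
# for the collection information media type.
OFFERED = [
    "application/json",
    "application/geo+json",
    "application/gml+xml",
    "image/tiff; application=geotiff",
    "text/html",
]


def _parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; other parameters are ignored."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        if not pieces[0]:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((pieces[0].lower(), q))
    return ranges


def _quality(media_type: str, ranges: List[Tuple[str, float]]) -> float:
    """q of the most specific range matching media_type (exact > type/* > */*), else 0."""
    mtype, _, msub = media_type.split(";")[0].strip().lower().partition("/")
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        rtype, _, rsub = media_range.partition("/")
        if (rtype, rsub) == (mtype, msub):
            specificity = 2
        elif rtype == mtype and rsub == "*":
            specificity = 1
        elif (rtype, rsub) == ("*", "*"):
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept: str, offered: List[str] = OFFERED) -> Optional[str]:
    """Pick the offered representation with the highest q; ties go to the earlier entry."""
    ranges = _parse_accept(accept)
    if not ranges:
        return offered[0]  # no usable Accept header: any representation is acceptable
    best, best_q = None, 0.0
    for media_type in offered:
        q = _quality(media_type, ranges)
        if q > best_q:
            best, best_q = media_type, q
    return best


print(negotiate("application/geo+json"))                          # application/geo+json
print(negotiate("text/html;q=0.9, application/geo+json;q=0.5"))   # text/html
print(negotiate("image/*"))                                       # image/tiff; application=geotiff
```

The point of the sketch is that all of these are one URI; only the Accept header changes which representation comes back, which is the HTTP sense of "representation".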

In that sense, the use of the term "representation" is correct, but since we are using a different resource structure, we need to be careful how we use the term. Maybe we should distinguish "views" (feature view, coverage view, tile/container view, etc.) and then distinguish the different representations of each data view resource, consistent with the HTTP spec?

Note that the practical reasons that led to the current resource structure are still valid. In fact, there would be even more reasons for collections with multiple views since, for example, paging/the limit parameter makes sense for feature content types, but less for coverage content types.
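The point about paging can be made concrete with a per-view table of accepted query parameters. The table below is hypothetical: limit, bbox and datetime are query parameters of /items in OGC API - Features Part 1, but the coverage entry is only an assumption for illustration.

```python
# Hypothetical: query parameters a server accepts on each view of a collection.
VIEW_PARAMETERS = {
    "items": {"limit", "bbox", "datetime"},  # paging makes sense for feature content
    "coverage": {"bbox", "datetime"},        # assumption: no paging for a coverage
}


def unsupported_parameters(view, query):
    """Return, sorted, the query parameter names that the given view does not accept."""
    if view not in VIEW_PARAMETERS:
        raise ValueError(f"unknown view: {view!r}")
    return sorted(set(query) - VIEW_PARAMETERS[view])


print(unsupported_parameters("items", {"limit": "10", "bbox": "0,0,1,1"}))     # []
print(unsupported_parameters("coverage", {"limit": "10", "bbox": "0,0,1,1"}))  # ['limit']
```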

@dblodgett-usgs

Great summary @cportele --

@jerstlouis You say:

the concept of multiple representations of the same data (identified by a single collection)

(emphasis on single collection mine)

I don't think the idea that a collection is analogous to a dataset has been established other than in the zeitgeist. Not saying it's totally wrong -- I see the logic -- but it has not been decided.

If an OGC-API instance were to provide access to one abstract dataset, then each OGC API-standard it conforms to would logically provide a different view of that dataset.

For now, since OGC API-Features provides access to one dataset (see the Note in 7.1), I think this is a non-issue. In a sense, each collection is already a thematic view of a dataset. As more OGC API standards are finalized, we may reach a point where we need to deal with this complexity in them, but I don't think handling this complexity right now is justified.

We certainly may find ourselves in need of a way to support multiple datasets, but I think there is plenty of (more valuable) scope to take on in defining building blocks focused on the distribution of a single dataset. I use "distribution" intentionally, as I think it's really important to remember that there will be other distributions of any dataset, i.e. an OGC API instance with multiple views of a dataset is just one distribution. If an OGC API endpoint were to encompass multiple datasets, listing the other distributions of each dataset would get annoying and require a bunch of hacks.

@jerstlouis
Member Author

@cportele @dblodgett-usgs This issue is submitted in regard to what Chapter 9 - Collections currently says, where /collections/{collectionID} is still Common (Collections), and where one such collection available as vector features at /collections/{collectionID}/items could be alternatively available as a coverage (at /collections/{collectionID}/coverage), but another collection might not. It would thus not be correct to say that /collections/{collectionID} is (only) the Feature collection.

By representation, I meant the underlying data model, which I believe differs significantly from the media type, Accept-Encoding, etc. This probably does need a better term to differentiate it from that other meaning, but I am not convinced that 'view' is the best term. Examples of such data models would be:

  • Vector features
  • Gridded raster coverage
  • 3D Meshes

In addition, the data could be available in multiple specifications following the same data model (e.g. i3s and 3D Tiles are two community standards providing 3D Meshes). In addition to this, specific resources could be available as different media types (e.g. GeoTIFF, JPEG2K).
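The three levels described here (data model, then specification, then media type) can be sketched as a small data structure. The class and function names are hypothetical, and media types are only filled in where the discussion names a format (GeoTIFF, JPEG 2000); the 3D specifications are listed without media types on purpose.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Specification:
    """A specification (community or OGC standard) that follows a given data model."""
    name: str
    media_types: List[str] = field(default_factory=list)


@dataclass
class DataModel:
    """An underlying data model, e.g. vector features or 3D meshes."""
    name: str
    specifications: List[Specification] = field(default_factory=list)


def media_types_for(models: List[DataModel], model_name: str) -> List[str]:
    """All media types offered, across every specification, for one data model."""
    for model in models:
        if model.name == model_name:
            return [mt for spec in model.specifications for mt in spec.media_types]
    return []


# Hypothetical catalogue for one collection.
catalogue = [
    DataModel("Vector features"),
    DataModel("Gridded raster coverage", [
        Specification("OGC API - Coverages", ["image/tiff; application=geotiff", "image/jp2"]),
    ]),
    DataModel("3D Meshes", [Specification("i3s"), Specification("3D Tiles")]),
]

print(media_types_for(catalogue, "Gridded raster coverage"))
print([s.name for s in catalogue[2].specifications])  # ['i3s', '3D Tiles']
```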

This issue is mostly about the discussion in Toulouse in the Coverage SWG where it was proposed that a collection could be available as multiple representations of the same data, but which is not explicitly mentioned in Chapter 9. This came up in the 3D Pilot where it was pointed out that nothing in there says that all resources after the /collections/{collectionID}/ portion of the path are representations or views of a specific data source (although one could infer that from Features). A Collection available using multiple specifications (representation / view) would have different resources available for each, e.g. /items for vector features, and a number of resources for coverages under /coverage, as in the current OGC API - Coverages draft.

I really disagree with 'tiles' being considered a separate view. Tiles split the data (whatever its data model) for caching and optimization purposes; they are not themselves a specific representation of the data (i.e. the data is still raster, vector or a 3D mesh), and a non-developer user really should not have to understand or have visibility into Tiles unless they insist on looking under the hood and learning the mystical arts of TileMatrixSets etc. Tiles is a building block that can be used together with multiple data models. You can still make a service serving only tiles, but if you do so you are not embracing the concept of an integrated and unified OGC API. E.g. if you serve vector data, you would normally also offer items. If you serve raster data, you would offer e.g. a PNG output by BBOX/resolution (I am arguing that this is not OGC API - Maps if the service is not doing, or pretending to be doing, server-side rendering), and potentially also a more advanced OGC API - Coverages interface. I have also been suggesting that tiles can be used within a process daisy chain without the client having visibility into it, even if the client did not explicitly request tiles from the first hop along the chain.

@dblodgett-usgs

@jerstlouis Do you have a proposal here that doesn't include the assumption that collections are analogous to dataset? If no, I think we should table this issue until we've settled the numerous open issues around collections.

@jerstlouis
Member Author

@dblodgett-usgs As said above, this issue is filed in relation to the current Common draft, which does assume that up to /collections/{collectionId} we are still in 'Common'. I agree we need to settle the numerous issues around collections, but I think the aspects identified here are important to consider while settling them.

@joanma747
Contributor

I fully support @cportele's idea and would include the term "resource view" to express that a resource can be retrieved using "related resources" (often sub-resources in the path) that provide other views of the same geospatial resource (e.g. features, maps, tiles, ...) or descriptions of the geospatial resource (metadata, schemas, ...).

@jerstlouis
Member Author

@joanma747 I see all the examples you mentioned as different from 'underlying representations' (which might be what we want to call views).
e.g. if the underlying representation is simple vector features (points, lines & polygons):

  • You can render this on a map, but you are just using the vector features representation (and could be layering this with other data sources on your map)
  • You could be retrieving these vector features as tiles
  • You could be accessing the metadata or schema for those vector features.

With the multiple underlying representations concept, you could access the data either as a feature collection or as a coverage, or as CityGML features or 3D Tiles meshes, and the client would not really know what the underlying representation is (e.g. whether the source of truth is vector or raster data).

@joanma747 joanma747 added Resources of Collections type Issues related to the /collections path Resources types Issues related to resource types and taxonomy labels Apr 20, 2020
@cmheazel cmheazel added the Collections Applicable to Collections (consider to use Part 2 instead) label May 11, 2020
@dblodgett-usgs

We bottomed this out in #140. Need to reflect that outcome in the spec then can close.

@cmheazel cmheazel self-assigned this Jul 27, 2020
@cmheazel
Contributor

@dblodgett-usgs #140 was closed through a pull request. Were the changes made sufficient to close this issue as well?

@dblodgett-usgs

I think so -- but that's just my personal opinion. I think closing this and encouraging @jerstlouis to open a new issue in the context of the emerging part-2 would be the right path for this if you think it needs to be discussed more.

@cmheazel cmheazel added the Close label Aug 16, 2020
@jerstlouis
Member Author

@cmheazel @dblodgett-usgs
The Part 2 specifications should probably explicitly state in an informative manner that this approach is possible, i.e. that multiple "views" or "access mechanisms" using more than one OGC API (e.g. Features & Tiles, or Features & Coverages -- accessing the data as features, vector tiles, coverage tiles or as a coverage) is possible for the same data.

This is what this issue is about, so I would argue for simply making the change before closing it.

@cmheazel
Contributor

cmheazel commented Sep 11, 2020

@jerstlouis I will make the recommended updates then close.

@cmheazel cmheazel added Progress: resolution agreed and removed Close Resources of Collections type Issues related to the /collections path Resources types Issues related to resource types and taxonomy labels Nov 16, 2020
@cmheazel
Contributor

cmheazel commented Jun 4, 2021

Given that we are developing standards for modular APIs, it is possible for an API implementation to conform to both API-Features and API-Coverages. If both Feature and Coverage views of a collection are supported, then /collections/{collectionId} will not be a unique path. So how does the client know which /collections/{collectionId} is a Feature Collection and which is a Coverage?

@cmheazel cmheazel added the help wanted Extra attention is needed label Jun 4, 2021
@jerstlouis
Member Author

jerstlouis commented Jun 4, 2021

@cmheazel The resolution of #140 is that the very same collection at /collections/mycollection can potentially be accessed both as a Coverage (/collections/mycollection/coverage) and as a Feature Collection (/collections/mycollection/items), i.e. using both OGC API - Features and OGC API - Coverages (and OGC API - Tiles, OGC API - Maps, OGC API - GeoVolumes...). The collection will have links with the appropriate relation type for each access mechanism supported by that particular collection.
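How a client might discover the access mechanisms offered by one collection from its links can be sketched as follows. The collection document is hypothetical; "items" is the link relation OGC API - Features uses for the features, while the coverage relation URI is an assumption based on the OGC link-relation naming pattern and should be checked against the Coverages draft.

```python
from typing import Dict

# Assumed relation URI for the coverage access mechanism (verify against the Coverages draft).
COVERAGE_REL = "http://www.opengis.net/def/rel/ogc/1.0/coverage"

# Maps a link relation type to a (hypothetical) access-mechanism label.
ACCESS_RELS = {"items": "features", COVERAGE_REL: "coverage"}


def access_mechanisms(collection: dict) -> Dict[str, str]:
    """Return {access mechanism: href} for each recognised link of a collection document."""
    found: Dict[str, str] = {}
    for link in collection.get("links", []):
        mechanism = ACCESS_RELS.get(link.get("rel"))
        if mechanism is not None and mechanism not in found:
            found[mechanism] = link["href"]  # keep the first link of each kind
    return found


# Hypothetical /collections/mycollection document offering both access mechanisms.
mycollection = {
    "id": "mycollection",
    "links": [
        {"rel": "self", "href": "/collections/mycollection", "type": "application/json"},
        {"rel": "items", "href": "/collections/mycollection/items", "type": "application/geo+json"},
        {"rel": COVERAGE_REL, "href": "/collections/mycollection/coverage"},
    ],
}

print(access_mechanisms(mycollection))
```

A collection that only supports Features would simply omit the coverage link, so the same client code reports only the "features" mechanism for it.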

@cmheazel
Contributor

cmheazel commented Jun 15, 2021

@jerstlouis So to be clear, the resource at /collections/mycollection is independent of the type of collection (feature, coverage, map, etc.). The path /collections/mycollection/type (where type = coverage, items, etc.) returns a specific type of collection. This requires that:

  1. we have a standard taxonomy of OGC resource collection types (coverage, feature, map, etc.)
  2. we have a standard path element assigned to each resource collection type
  3. each path element is assigned to one and only one resource collection type (subtypes are allowed)
  4. we come up with a standard term for the concept of resource collection type
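Requirements 2 and 3 amount to a registry in which each path element maps to one and only one resource collection type. A minimal sketch, with hypothetical class and type names; it does not model the subtypes that requirement 3 allows.

```python
from typing import Dict, Optional


class ViewRegistry:
    """Maps a path element under /collections/{collectionId}/ to exactly one collection type.

    Hypothetical sketch of requirement 3 above; subtypes are not modelled.
    """

    def __init__(self) -> None:
        self._by_segment: Dict[str, str] = {}

    def register(self, segment: str, view_type: str) -> None:
        existing = self._by_segment.get(segment)
        if existing is not None and existing != view_type:
            raise ValueError(
                f"path element {segment!r} already assigned to {existing!r}, not {view_type!r}"
            )
        self._by_segment[segment] = view_type  # re-registering the same pair is a no-op

    def view_type(self, segment: str) -> Optional[str]:
        return self._by_segment.get(segment)


registry = ViewRegistry()
registry.register("items", "feature")
registry.register("coverage", "coverage")
try:
    registry.register("items", "coverage")  # clash: rejected
except ValueError as err:
    print(err)
```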

@jerstlouis
Member Author

jerstlouis commented Jun 15, 2021

@cmheazel I prefer to call it access mechanism for a collection, or a view on the collection of data, since a collection could support multiple access mechanisms/views.

I think what is important is that resources defined by OGC API specifications which can be attached to a collection are properly registered to avoid clashes when implementing them together in the same API.

Some resources (e.g. /collections/{collectionId}/metadata) might be useful with more than a single OGC API specification.

The Features Part for Schema will also define /schema, in addition to /items.

EDR already defines multiple queryType resources as /collections/{collectionId}/{queryType} (in a sense, each of these is probably a different access mechanism).
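Whatever term is chosen, a server combining several OGC APIs has to dispatch on the path element that follows the collection id, whether it is /items, /coverage, /schema or an EDR queryType such as /position. A minimal parsing sketch; the regular expression and names are hypothetical, and percent-encoding is ignored.

```python
import re
from typing import NamedTuple, Optional

# /collections/{collectionId}[/{view}[/...]]
_COLLECTION_PATH = re.compile(
    r"^/collections/(?P<cid>[^/]+)(?:/(?P<view>[^/]+)(?P<rest>/.*)?)?/?$"
)


class CollectionPath(NamedTuple):
    collection_id: str
    view: Optional[str]  # e.g. "items", "coverage", "schema", or an EDR queryType
    remainder: str       # anything after the view segment, e.g. "/42" for a feature


def parse_collection_path(path: str) -> Optional[CollectionPath]:
    """Split a collection path into id, view and remainder; None if it is not one."""
    m = _COLLECTION_PATH.match(path)
    if m is None:
        return None
    return CollectionPath(m.group("cid"), m.group("view"), m.group("rest") or "")


print(parse_collection_path("/collections/obs/position"))
print(parse_collection_path("/collections/obs/items/42"))
print(parse_collection_path("/processes"))  # None
```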

@cmheazel
Contributor

June 15, 2021 - What comes after the collection ID is a "View" or "access mechanism" for the resource. That is out of scope for Part 2. Resolve the proper term in a sprint - use the most intuitive term.

NOTUC

@cmheazel
Contributor

Re-wrote Section 2 - Scope
Added Section 7.2 - Views
Removed ATS for the /items path.

@cmheazel
Contributor

June 26, 2021 - close - NOTUC

5 participants