edit revisions #1

Merged
merged 2 commits into from
Nov 8, 2015
15 changes: 8 additions & 7 deletions README.md
@@ -2,14 +2,14 @@

Authors: Edzer Pebesma, Roger Bivand, ...

[Simple features](https://en.wikipedia.org/wiki/Simple_Features) (officially: _simple feature access_) is an open ([OGC](http://www.opengeospatial.org/standards/sfa) and [ISO](http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=40114)) standard for access and manipulation of spatial vector data (points, lines, polygons). It includes a standard [SQL schema](http://www.opengeospatial.org/standards/sfs) that supports storage, retrieval, query and update of feature collections via a SQL interface. All commonly used databases provide this interface. [GeoJSON](http://geojson.org/) is a standard for encoding simple features in JSON, and is used in JavaScript and MongoDB. Well-known-text ([WKT](https://en.wikipedia.org/wiki/Well-known_text)) is a text representation of simple features used often in linked data; well-known-binary is a standard binary representation used in databases. _Simple Feature Access_ defines coordinate reference systems, and makes it easy to move data from longitude-latitude to projections back and forth in a standardized way.
[Simple features](https://en.wikipedia.org/wiki/Simple_Features) (officially: _simple feature access_) is an open ([OGC](http://www.opengeospatial.org/standards/sfa) and [ISO](http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=40114)) standard for access and manipulation of spatial vector data (points, lines, polygons). It includes a standard [SQL schema](http://www.opengeospatial.org/standards/sfs) that supports storage, retrieval, query and update of feature collections via a SQL interface. All commonly used databases provide this interface. [GeoJSON](http://geojson.org/) is a standard for encoding simple features in JSON, and is used in JavaScript and MongoDB. Well-known-text ([WKT](https://en.wikipedia.org/wiki/Well-known_text)) is a text representation of simple features used often in linked data; well-known-binary ([WKB](https://en.wikipedia.org/wiki/Well-known_text)) is a standard binary representation used in databases. _Simple Feature Access_ defines coordinate reference systems, and makes it easy to move data from longitude-latitude to projections back and forth in a standardized way.


[GDAL](http://gdal.org/) is an open source C++ library for reading and writing both raster and vector data with 224 drivers (supported file formats, database connectors, web service interfaces). GDAL is used by practically all open source geospatial projects and by many industry products (including ESRI's ArcGIS, ERDAS, and FME). It provides coordinate transformations (built on top of PROJ.4) and geometric operations (e.g. polygon intersections, unions, buffers and distance). Standards for coordinate transformations change over time; such changes are typically adopted directly in GDAL/PROJ.4 but do not easily find their way into R-only packages such as `mapproj`.

Since [2005](https://stat.ethz.ch/pipermail/r-sig-geo/2005-April/000378.html), CRAN has package [sp](https://cran.r-project.org/web/packages/sp/) which provides classes and methods for spatial (point, line, polygon and raster) data. The approach `sp` takes is similar to how `xts` and `zoo` handle the time index of time series data: objects store spatial geometries separately from associated attribute data, matching by order. Package [spacetime](https://cran.r-project.org/web/packages/spacetime/index.html), on CRAN since 2010, extends both `sp` and `xts` to handle data that varies over both space and time.

Today, 221 CRAN packages depend on, import or link to `sp`, 259 when including _Suggests_; when including recursive dependencies these numbers are 376 and 5040. The implementation of `sp` does not follow simple features, but rather the practice used at the time of release, following how ESRI shapefiles are implemented.
Today, 221 CRAN packages depend on, import or link to `sp`, 259 when including _Suggests_; when including recursive dependencies these numbers are 376 and 5040. The implementation of `sp` does not follow simple features, but rather the practice used at the time of release, following how ESRI shapefiles are implemented. The cluster of packages around `sp` is shown in green in [Andrie de Vries' blog](http://blog.revolutionanalytics.com/2015/07/the-network-structure-of-cran.html).

Off-CRAN package [rgdal2](https://github.com/thk686/rgdal2) is an interface to GDAL 2.0, which uses raw pointers to interface features, but does not import any data in R, using GDAL to handle everything. CRAN package [wkb](https://cran.r-project.org/web/packages/wkb/index.html), contributed by Tibco Software, converts between WKB representations of several simple feature classes and corresponding classes in `sp`, and seems to be needed for Tibco Software purposes.

@@ -22,14 +22,15 @@ _What problem do you want to solve? Why is it a problem? Who does it affect? Wha

The problems we want to solve are:

1. R can currently not represent simple features. It can read most simple feature classes in `sp` classes, but uses its own representation for this, and cannot write them back without loss of information (it does for instance internally not distinguish between `POLYGON` and `MULTIPOLYGON`, and cannot deal with several simple feature classes, including `TIN` and `GEOMETRYCOLLECTION`).
2. The current implementation of lines and vector data in package `sp` is partly ambiguous (does slot `ringDir` or slot `hole` indicate whether a polygon is a hole?), complicated (to which exterior polygon does a hole belong?), and by some considered difficult to work with (S4).
3. The lack of simple features makes current interfaces to open source libraries (GDAL/OGR and PROJ.4: rgdal, GEOS: rgeos) difficult to understand and maintain.
4. Several packages (e.g. `ggmap`, `ggplot2`) tend to favor non-standardised, R-only, and partly outdated libraries for coordinate transformations, or tend to make simplifying assumptions (e.g., all spatial data come as longitude/latitude using datum `WGS84`; all web maps use [_web Mercator_](https://en.wikipedia.org/wiki/Web_Mercator)).
1. R cannot currently represent simple features directly. It can read most simple feature classes into `sp` classes, but uses its own representation for this, and can only write data back without loss of information if it is furnished with ancillary metadata encoded in a comment attribute on each `Polygons` object. For instance, it does not internally distinguish between `POLYGON` and `MULTIPOLYGON`, nor deal with several simple feature classes, including `TIN` and `GEOMETRYCOLLECTION`, nor handle `CURVE` geometries.
2. The current implementation of lines and vector data in package `sp` is partly ambiguous (both slot `ringDir` and slot `hole` indicate whether a `Polygon` is a hole, but both are superseded by the comment attribute), complicated (to which exterior polygon does a hole belong? This is handled by the comment attribute), and considered by some to be difficult to work with (S4). The current implementation is hard to maintain because it contains incremental changes from a baseline that predated the industry-standard OGC SFS representation.
3. The lack of simple features makes current interfaces to open source libraries (GDAL/OGR and PROJ.4: rgdal, GEOS: rgeos) difficult to understand and maintain, even though they work to specification.
4. The current implementation has no scale model for coordinates.
5. It is desirable that other R packages are offered the opportunity to migrate to more up-to-date libraries for coordinate transformations (providing proper support for datum transformation), and to avoid having to make simplifying assumptions (e.g., all spatial data come as longitude/latitude using datum `WGS84`; all web maps use [_web Mercator_](https://en.wikipedia.org/wiki/Web_Mercator)).
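
As a minimal illustration of the `POLYGON` / `MULTIPOLYGON` distinction raised in point 1, the sketch below builds WKT strings by hand. It is written in Python with the standard library only, purely for illustration; the helper names `ring_wkt`, `polygon_wkt` and `multipolygon_wkt` are hypothetical and belong to no existing package. In WKT, nesting alone states which hole belongs to which exterior ring: the first ring of a polygon is its exterior, and any further rings are its holes.

```python
# Hypothetical helpers for illustration only; not part of sp, rgdal, or any other package.

def ring_wkt(ring):
    """Format one closed ring as '(x y, x y, ...)'."""
    return "(" + ", ".join(f"{x:g} {y:g}" for x, y in ring) + ")"

def polygon_wkt(rings):
    """POLYGON: the first ring is the exterior, any further rings are its holes."""
    return "POLYGON (" + ", ".join(ring_wkt(r) for r in rings) + ")"

def multipolygon_wkt(polygons):
    """MULTIPOLYGON: a list of polygons, each carrying its own exterior and holes."""
    parts = ["(" + ", ".join(ring_wkt(r) for r in rings) + ")" for rings in polygons]
    return "MULTIPOLYGON (" + ", ".join(parts) + ")"

# A square with one square hole, plus a separate smaller square.
outer = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (2, 4), (4, 4), (4, 2), (2, 2)]
island = [(20, 0), (25, 0), (25, 5), (20, 5), (20, 0)]

single = polygon_wkt([outer, hole])
multi = multipolygon_wkt([[outer, hole], [island]])

print(single)
print(multi)
```

The hole is attached to its exterior ring by position inside the same polygon, so no separate flag or comment is needed to recover that relationship.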

Solving this problem will mainly affect those who use databases or modern JavaScript-based web APIs, as these have largely converged on adopting simple features, and those who need a simpler and more lightweight handling of spatial data in R. It will also reduce the effort for users and developers to understand the way spatial information is represented in R, make it easier to build upon and reuse the R code for this, and lead to a good, sustainable shared R code base.

On the longer run it will affect all packages currently using sp, when we manage to migrate sp to exclusively use the simple feature classes. Since the recent [2.0](http://www.gdal.org/index.html) release of GDAL integrates raster and vector data, having an R package that mirrors its classes makes it possible to implement operations in-database (similar to what `DBI`, `RPostgreSQL` and `dplyr` do), making it possible to work with data that do not fit in memory.
In the longer run it will affect all packages currently using sp, when we manage to migrate sp to exclusively use the simple feature classes for representing vector data. Since the recent [2.0](http://www.gdal.org/index.html) release of GDAL integrates raster and vector data, having an R package that mirrors its classes makes it possible to implement operations in-database (similar to what `DBI`, `RPostgreSQL` and `dplyr` do), making it possible to work with data that do not fit in memory.

## The plan
_How are you going to solve the problem? Include the concrete actions you will take and an estimated timeline. What are likely failure modes and how will you recover from them?_