Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRS header #317

Closed
Nowosad opened this issue Sep 6, 2021 · 7 comments
Closed

CRS header #317

Nowosad opened this issue Sep 6, 2021 · 7 comments

Comments

@Nowosad
Copy link
Contributor

Nowosad commented Sep 6, 2021

With the recent changes in PROJ, CRS information in spatial file formats is stored as WKT2, and no longer as proj4string. However, the terra package still shows the proj4string to the users in its objects' headers. This could be confusing to many people. @rhijmans would you consider switching to some other header style? It could, for example, use some parts of the crs(r, describe = TRUE) function.

Below, you can find a simple comparison between terra ( coord. ref. : +proj=longlat +datum=WGS84 +no_defs) and stars/sf (Geodetic CRS: WGS 84).

library(terra)
library(stars)
library(sf)
f = system.file("ex/elev.tif", package = "terra")
f2 = system.file("shapes/world.gpkg", package = "spData")

r = rast(f)
r
#> class       : SpatRaster 
#> dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
#> resolution  : 0.008333333, 0.008333333  (x, y)
#> extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
#> source      : elev.tif 
#> name        : elevation 
#> min value   :       141 
#> max value   :       547

s = read_stars(f)
s
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>           Min. 1st Qu. Median     Mean 3rd Qu. Max. NA's
#> elev.tif   141     291    333 348.3366     406  547 3942
#> dimension(s):
#>   from to  offset       delta refsys point values x/y
#> x    1 95 5.74167  0.00833333 WGS 84 FALSE   NULL [x]
#> y    1 90 50.1917 -0.00833333 WGS 84 FALSE   NULL [y]

v = vect(f2)
v
#>  class       : SpatVector 
#>  geometry    : polygons 
#>  dimensions  : 177, 10  (geometries, attributes)
#>  extent      : -180, 180, -89.9, 83.64513  (xmin, xmax, ymin, ymax)
#>  coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
#>  names       : iso_a2      name_long continent region_un       subregion
#>  type        :  <chr>          <chr>     <chr>     <chr>           <chr>
#>  values      :     FJ           Fiji   Oceania   Oceania       Melanesia
#>                    TZ       Tanzania    Africa    Africa  Eastern Africa
#>                    EH Western Sahara    Africa    Africa Northern Africa
#>               type  area_km2       pop lifeExp gdpPercap
#>              <chr>     <num>     <num>   <num>     <num>
#>  Sovereign country 1.929e+04 8.858e+05   69.96      8222
#>  Sovereign country 9.327e+05 5.223e+07   64.16      2402
#>      Indeterminate 9.627e+04         0       0         0

v2 = read_sf(f2)
v2
#> Simple feature collection with 177 features and 10 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
#> Geodetic CRS:  WGS 84
#> # A tibble: 177 x 11
#>    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
#>    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
#>  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove…   1.93e4  8.86e5    70.0
#>  2 TZ     Tanzania  Africa    Africa    Eastern … Sove…   9.33e5  5.22e7    64.2
#>  3 EH     Western … Africa    Africa    Northern… Inde…   9.63e4 NA         NA  
#>  4 CA     Canada    North Am… Americas  Northern… Sove…   1.00e7  3.55e7    82.0
#>  5 US     United S… North Am… Americas  Northern… Coun…   9.51e6  3.19e8    78.8
#>  6 KZ     Kazakhst… Asia      Asia      Central … Sove…   2.73e6  1.73e7    71.6
#>  7 UZ     Uzbekist… Asia      Asia      Central … Sove…   4.61e5  3.08e7    71.0
#>  8 PG     Papua Ne… Oceania   Oceania   Melanesia Sove…   4.65e5  7.76e6    65.2
#>  9 ID     Indonesia Asia      Asia      South-Ea… Sove…   1.82e6  2.55e8    68.9
#> 10 AR     Argentina South Am… Americas  South Am… Sove…   2.78e6  4.30e7    76.3
#> # … with 167 more rows, and 2 more variables: gdpPercap <dbl>,
#> #   geom <MULTIPOLYGON [°]>

Created on 2021-09-06 by the reprex package (v2.0.1)

@rhijmans
Copy link
Member

rhijmans commented Sep 7, 2021

I am not convinced that what you suggest is a good approach. For example, taking the raster data from issue #314

library(terra)
r <- rast(ncols=349, nrows=277, nlyrs=1, xmin=-16231.4942528736, xmax=11313351.4942529, ymin=-16231.5, ymax=8976019.5, names=c('vwsh_1'), crs="+proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84")

r
#class       : SpatRaster 
#dimensions  : 277, 349, 1  (nrow, ncol, nlyr)
#resolution  : 32462.99, 32463  (x, y)
#extent      : -16231.49, 11313351, -16231.5, 8976020  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84 +units=m +no_defs 

crs(r, describe=TRUE)
#     name EPSG area         extent
#1 unknown <NA> <NA> NA, NA, NA, NA

It seems to me that, especially for a CRS like this, the PROJ.4 notation is the only available notation that is easily interpretable by a human. (You can do cat(crs(r)) to read the WKT.)

My understanding from this lengthy discussion: r-spatial/discuss#43 is that PROJ.4 strings are perfectly valid, as long as you use the WGS84, NAD83 or NAD27 datums.

I have now added to terra a warning (that should become an error) if you try to use another datum with a PROJ.4 string:

crs(r) <- "+proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +datum=GGRS87 +units=m"
Warning message:
#[crs<-] Only the WGS84, NAD83 and NAD27 datums can be used with a PROJ.4 string. Use WKT2, authority:code, or +towgs84= instead 

CRS information in spatial file formats is stored as WKT2, and no longer as proj4string.

I am not sure if that is generally true, as it depends who produces the file. But obviously the recent changes do lead to confusion. I do not have a good solution, but I am hopeful that the PROJ developers will revert back to more general support for the PROJ.4 string, or at least provide a viable alternative that can be used in interactive environments such as R and python.

@Nowosad
Copy link
Contributor Author

Nowosad commented Sep 10, 2021

Hi @rhijmans - I agree that taking the crs(r, describe=TRUE) directly is the perfect solution. It would be (maybe) preferable to have some sets of functions to derive some information from WKT2, such as the used ellipsoid, datum, units, etc.

I also try to understand all of the PROJ6+ impacts, and yes - it seems that the proj4string representation can be used for coordinate operations as long as you want to use the WGS84, NAD83 or NAD27 datums.
However, the use of proj4strings is discouraged for specifying CRSs ("The use of proj-string to describe a CRS is discouraged. It is a legacy means of conveying CRS descriptions: use of object codes (EPSG:XXXX typically) or WKT description is recommended for better expressivity." - https://proj.org/development/reference/functions.html#c.proj_create).

Also, maybe this is only my wrong assumption (please correct me @rsbivand), but spatial files created with the current (PROJ6+) versions of PROJ store their CRSs using the WKT2 representation. Showing proj4strings to the terra users gives an impression that the CRSs in this package are stored with proj4string.

As I mentioned at the beginning - I do not think we have a perfect solution here. However, the current state of things could be improved without taking major effort. One possible solution is shown below:

library(terra)
#> terra version 1.4.1

# example 1 ----------------------------
f <- system.file("ex/elev.tif", package="terra")
r <- rast(f)
crs(r, describe = TRUE)
#>     name EPSG area         extent
#> 1 WGS 84 4326 <NA> NA, NA, NA, NA

# header:
# coord. ref. : WGS 84 ("EPSG:4326")

# example 2 ----------------------------
r2 <- project(r, "EPSG:3035")
crs(r2, describe = TRUE)
#>                            name EPSG
#> 1 ETRS89-extended / LAEA Europe 3035
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           area
#> 1 Europe - European Union (EU) countries and candidates. Europe - onshore and offshore: Albania; Andorra; Austria; Belgium; Bosnia and Herzegovina; Bulgaria; Croatia; Cyprus; Czechia; Denmark; Estonia; Faroe Islands; Finland; France; Germany; Gibraltar; Greece; Hungary; Iceland; Ireland; Italy; Kosovo; Latvia; Liechtenstein; Lithuania; Luxembourg; Malta; Monaco; Montenegro; Netherlands; North Macedonia; Norway including Svalbard and Jan Mayen; Poland; Portugal including Madeira and Azores; Romania; San Marino; Serbia; Slovakia; Slovenia; Spain including Canary Islands; Sweden; Switzerland; Turkey; United Kingdom (UK) including Channel Islands and Isle of Man; Vatican City State
#>                        extent
#> 1 -35.58, 44.83, 84.17, 24.60

# header:
# coord. ref. : ETRS89-extended / LAEA Europe ("EPSG:3035")

# example 3 ----------------------------
r3 <- rast(ncols=349, nrows=277, nlyrs=1, xmin=-16231.4942528736, xmax=11313351.4942529, ymin=-16231.5, ymax=8976019.5, names=c('vwsh_1'), crs="+proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84")
crs(r3, describe = TRUE)
#>      name EPSG area         extent
#> 1 unknown <NA> <NA> NA, NA, NA, NA
is.lonlat(r3)
#> [1] FALSE

# header:
# coord. ref. : Unknown projected coordinate reference system
# example 4 ----------------------------
r4 <- rast(ncols=349, nrows=277, nlyrs=1, xmin=-16231.4942528736, xmax=11313351.4942529, ymin=-16231.5, ymax=8976019.5, names=c('vwsh_1'), crs="+proj=longlat")
crs(r4, describe = TRUE)
#>      name EPSG area         extent
#> 1 unknown <NA> <NA> NA, NA, NA, NA
is.lonlat(r4)
#> [1] TRUE

# header:
# coord. ref. : Unknown geographic (geodetic) coordinate reference system

Created on 2021-09-10 by the reprex package (v2.0.1)

@rhijmans
Copy link
Member

The most important reason for using the PROJ.4 string in show(object) is to provide clear information about the CRS on one line. I understand why you say it is suggestive of more that that. ​I suppose terra could use the patterns you suggest if the there is a name available and use PROJ.4 if there is not. I will consider that, but I am hesitant because to me, this:

+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m

is much more informative then

ETRS89-extended / LAEA Europe ("EPSG:3035")

Although for some it might be meaningful to see the name or EPSG code.

A hybrid, something like this:

EPSG:3035 / +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m

could also be an option

I think this is not very clear:

WGS 84 ("EPSG:4326")

Because instead of a name, a datum name is now given, and you have to understand that this implies lon/lat.

I know that the use of PROJ-string to describe a CRS is discouraged. But the stated reason that this improves expressivity is odd. WKT2 is surely more expressive for machines, but this is not the case for people.

PROJ.4 notation is the only available practical system that can be used in scripting. I think that should be a good enough reason to actively encourage its use and hope that the PROJ developers change their minds and again provide full support for it (perhaps in extended form).

As for what is stored in files; you may be right about GDAL, but not all files are written with GDAL.

@rsbivand
Copy link
Contributor

The sp "CRS" class has a projargs= slot, which is still a PROJ4 string, and a comment. If the comment is not NULL, it should contain a WKT2_2019 string, which is used if available. If not, sp workflows fall back to the PROJ4 string and issue a warning. Only the WKT2 string is "reliable", but the actual instantiation of both strings also may depend on the version of proj.db. The PROJ developers actively discourage the use of PROJ4 strings, as they simply cannot handle what is required now, and it gets worse with datum ensembles and epochs. Users rarely need more than a few EPSG codes for their study areas. Beyond that, it looks as though axis order is becoming more of a problem, as 4326 is Northing, Easting by specification, and GDAL/PROJ may have difficulty disambiguating data stored Easting, Northing but declaring 4326.

@edzer
Copy link
Contributor

edzer commented Sep 13, 2021

sf shows OGRSpatialReference::GetName() if this is not identical to "unknown", and the user input otherwise. I think that works pretty well: many parameterized PROJ.4 strings have no name so you see that string, many other ones have a name:

> st_crs(read_sf(system.file("gpkg/nc.gpkg", package="sf")))$Name
	- 'VirtualXPath'	[XML Path Language - XPath]
[1] "NAD27"
> st_crs(stars::L7_ETMs)$Name
[1] "UTM Zone 25, Southern Hemisphere"
> st_crs(4326)$Name
[1] "WGS 84"
> st_crs("EPSG:27700")$Name
[1] "OSGB 1936 / British National Grid"
> st_crs("+proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +datum=GGRS87 +units=m")$Name
[1] "unknown"

@rhijmans
Copy link
Member

With the dev version, I now get this behaviour:


library(terra)
#terra version 1.4.5
r <- rast()
r
#class       : SpatRaster 
#dimensions  : 180, 360, 1  (nrow, ncol, nlyr)
#resolution  : 1, 1  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 

crs(r) <- "EPSG:4326"
r
#class       : SpatRaster 
#dimensions  : 180, 360, 1  (nrow, ncol, nlyr)
#resolution  : 1, 1  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (EPSG:4326) 

crs(r) <- "EPSG:27700"
r
#class       : SpatRaster 
#dimensions  : 180, 360, 1  (nrow, ncol, nlyr)
#resolution  : 1, 1  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : OSGB 1936 / British National Grid (EPSG:27700) 

crs(r) <- "+proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +datum=WGS84 +units=m"
r
#class       : SpatRaster 
#dimensions  : 180, 360, 1  (nrow, ncol, nlyr)
#resolution  : 1, 1  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 

crs(r) <- "+proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +datum=GGRS87 +units=m"
#Warning message:
#[crs<-] Only the WGS84, NAD83 and NAD27 datums can be used with a PROJ.4 string. Use WKT2, authority:code, or +towgs84= instead 
r
#class       : SpatRaster 
#dimensions  : 180, 360, 1  (nrow, ncol, nlyr)
#resolution  : 1, 1  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs 

@Nowosad
Copy link
Contributor Author

Nowosad commented Sep 30, 2021

Thank you, Robert. I think these changes will be helpful to many users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants