Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error Reading Shapefile with st_read since version 1.09... #2046

Closed
WayneGitShell opened this issue Nov 24, 2022 · 17 comments
Closed

Error Reading Shapefile with st_read since version 1.09... #2046

WayneGitShell opened this issue Nov 24, 2022 · 17 comments

Comments

@WayneGitShell
Copy link

WayneGitShell commented Nov 24, 2022

Description
As part of the GWSDAT R package - see www.gwsdat.net we ship out an example data set with a shapefile located here:
https://github.com/WayneGitShell/GWSDAT/tree/master/data/GIS_Files

To Reproduce

dat <- sf::st_read("./GIS_Files/GWSDATex2.shp")
Reading layer GWSDATex2' from data source C:\Users\Wayne.W.Jones\GitHub\GWSDAT_v3.12\data\GIS_Files\GWSDATex2.shp' using driver `ESRI Shapefile'
Error in CPL_get_z_range(obj, 2) : z error - expecting three columns;

Has been working fine for all previous versions. Nothing in the news to suggest this change of behaviour.

R version 4.2.2 (2022-10-31 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8 LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] sf_1.0-9

loaded via a namespace (and not attached):
[1] compiler_4.2.2 magrittr_2.0.3 class_7.3-20 tools_4.2.2 DBI_1.1.3 units_0.8-0 proxy_0.4-27 Rcpp_1.0.9 KernSmooth_2.23-20
[10] grid_4.2.2 e1071_1.7-12 classInt_0.4-8

Paste the output of your sessionInfo() and sf::sf_extSoftVersion()

sf::sf_extSoftVersion()
GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H PROJ
"3.9.3" "3.5.2" "8.2.1" "true" "true" "8.2.1"

@rsbivand
Copy link
Member

> o <- rgdal::readOGR("GWSDATex2.shp")
OGR data source with driver: ESRI Shapefile 
Source: "/home/rsb/tmp/GWSDATex2.shp", layer: "GWSDATex2"
with 415 features
It has 32 fields
Warning messages:
1: OGR support is provided by the sf and terra packages among others 
2: OGR support is provided by the sf and terra packages among others 
3: OGR support is provided by the sf and terra packages among others 
4: OGR support is provided by the sf and terra packages among others 
5: OGR support is provided by the sf and terra packages among others 
6: OGR support is provided by the sf and terra packages among others 
7: OGR support is provided by the sf and terra packages among others 
8: In rgdal::readOGR("GWSDATex2.shp") : Dropping null geometries: 415
9: In rgdal::readOGR("GWSDATex2.shp") : Z-dimension discarded
> o1 <- sf::st_read("GWSDATex2.shp")
Reading layer `GWSDATex2' from data source `/home/rsb/tmp/GWSDATex2.shp' using driver `ESRI Shapefile'
Error in CPL_get_z_range(obj, 2) : z error - expecting three columns;

The features seem to be line strings, which for the sp class drops any Z-dimension. There is no *.prj file, the geometries seem to be planar. One feature appears to have a NULL geometry, Handle "10000940".

Which of these oddities in the very broken shapefile causes problems is unknown. terra says:

> o2 <- terra::vect("GWSDATex2.shp")
Warning message:
[vect] Z coordinates ignored 
> o2
 class       : SpatVector 
 geometry    : lines 
 dimensions  : 415, 32  (geometries, attributes)
 extent      : 551627.6, 551710.7, 224411.1, 224493.2  (xmin, xmax, ymin, ymax)
 source      : GWSDATex2.shp
 coord. ref. :  
 names       :  FID_   Entity Handle Layer LyrFrzn LyrLock LyrOn LyrVPFrzn
 type        : <int>    <chr>  <chr> <chr>   <int>   <int> <int>     <int>
 values      :     0 Polyline    35E  KERB       0       0     1         0
                   0 Polyline    364  KERB       0       0     1         0
                   0 Polyline    368  KERB       0       0     1         0
 LyrHandle Color (and 22 more)
     <chr> <int>              
        7D     7              
        7D     7              
        7D     7              

so does not try to access the Z dimension. My guess would be that sf is now trying harder (or GDAL is trying harder, I'm using GDAL 3.6.0, the GDAL version you are using should be reported) to pick up inconsistencies. Please convert to GPKG and add a relevant CRS.

@WayneGitShell
Copy link
Author

Hi Roger,

I'm using "3.5.2" of GDAL - have updated original message to make my set-up visible.

These files were not generated by me and I'm no expert with Shapefiles. Any advice on tools to convert this shapefile to a geopackage, preferably with R?

@rsbivand
Copy link
Member

  • First you need the CRS of the data if the position of the coordinates is supposed to mean anything.
  • Was the z-dimension supposed to mean anything?
  • Was the null-geometry feature supposed to be there?

@WayneGitShell
Copy link
Author

Thanks for the pointers Roger - will investigate and get back to you.

@edzer
Copy link
Member

edzer commented Nov 24, 2022

This was mentioned and #1683 and re-opened in #1592, where I am still waiting for a solution from @dcooley who introduced CPL_get_z_range.

@rsbivand
Copy link
Member

In this case the shapefile is declared as PolylineZ but none of the matrices of vertices have three columns, all have two columns:

o <- maptools:::read.shape("GWSDATex2.shp") # legacy raw shp reader - defunct
sapply(o$Shapes, "[[", "shp.type")
sapply(o$Shapes, function(x) ncol(x[["verts"]]))

So I guess that the Z dimension is non-operative. In the attribute data, I see DocType is DXF for all features, suggesting that the spurious Z is coming from the example file having been converted at some point from AutoCAD. So CPL_get_z_range is not tolerant of an erroneously declared Z dimension when the data are 2D.

If the indications in the attribute data "Bishop's Stortford" indicate a site close to Stanstead airport, the CRS might be EPSG:27700, but that ends up in a field, so maybe a local surveying CRS was used. terra can read the file and can save it as a 2D GPKG, but a valid CRS would be very useful to be able to place the lines on a contextual map.

@WayneGitShell
Copy link
Author

Roger,

You are correct - the shapefile was generated from aAutoCAD .

When I read the Shapefile via terra and then write to a new Shapefile - the resulting shapefile can then be readin via the sf:st_read function:

library(terra); library(sf)
myshap<-vect("data\GIS_Files\GWSDATex2.shp")
Warning message:
[vect] Z coordinates ignored

writeVector(myshap,'shptest.shp' , overwrite=TRUE)
dat<-st_read("Documents/shptest.shp")
Reading layer shptest' from data source Documents\shptest.shp' using driver `ESRI Shapefile'
Simple feature collection with 415 features and 32 fields (with 1 geometry empty)
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: 551627.6 ymin: 224411.1 xmax: 551710.7 ymax: 224493.2
CRS: NA

This will serve as a temporary fix for me - but I will look in to this in more detail to avoid such issues arising again in the future,.

@kadyb
Copy link
Contributor

kadyb commented Nov 24, 2022

@WayneGitShell, you can also try to write these files without Z dimension using vectortranslate from {sf}:

gdal_utils(util = "vectortranslate",
           source = "GWSDATex2.shp",
           destination = "test.shp",
           options = c("-dim", "XY"))

@ar-puuk
Copy link

ar-puuk commented Dec 3, 2022

I tried using QGIS to export the shapefile to GeoPackage (.gpkg) and then imported it, and it worked perfectly fine. At least a temporary solution, I guess.

@aaronschiff
Copy link

I ran into this issue today too, for a .shp file that I previously had no trouble reading with st_read(). My temporary solution is to read the offending .shp file with readOGR() from the rgdal package and then convert it with st_as_sf().

@rsbivand
Copy link
Member

@aaronschiff Did rgdal::readOGR() report "Z-dimension discarded"? Please try to read using terra::vect(), and see whether it reports "Z coordinates ignored". Did you expect 3D objects? What kinds of objects did you expect - points, lines, polygons? Avoid using rgdal, it will be retired during 2023.

@aaronschiff
Copy link

@rsbivand yes, I did get those messages from readOGR() and vect(). I did not expect 3D objects, the shapefile should be 2D polygons only, but I believe it has (null?) Z values included. Here's a link to (one of) the shapefiles with this issue: https://koordinates.com/layer/105481-nz-police-area-boundaries-29-april-2021/

@amb26
Copy link

amb26 commented Dec 23, 2022

We have the same issue. Would be happy with a workaround that didn't involve deprecated libraries. Do indeed receive "Z-dimension discarded" for the shapefile at https://github.com/IMERSS/maxwell/tree/main/spatial_data/vectors/Shp_files/Watershed_CRD which was exported from QGIS.

@kadyb
Copy link
Contributor

kadyb commented Dec 23, 2022

@amb26, did you try to save geometry without Z dimension (#2046 (comment))?

@amb26
Copy link

amb26 commented Dec 23, 2022

@amb26, did you try to save geometry without Z dimension (#2046 (comment))?

Thanks, that is fine, but not an effective workaround - we need to work with the shapefiles as provided, so I would appreciate a scheme to handle this within R without the use of deprecated libraries or hitting the filesystem again.

@rsbivand
Copy link
Member

So read with terra::vect() and coerce to an ""sf" object, as in #2046 (comment).

Tell your upstream data provider to leave ESRI Shapefile in the last century where it belongs. GeoPackage is the choice for this century. Their malformed files state that three dimensions are provided then fail to provide them. In rgdal the workaround was to read all the geometries checking each for the correct # dimensions, go for 2D if some were 2D, and re-read all the geometries. With larger data sets now, re-reading the geometries is a waste of time, but because the format itself doesn't enforce 3D if declared 3D, we are stuck for this driver.

> o <- maptools:::read.shape("Wastersheds_CRD.shp")
Shapefile type: PolygonZ, (15), # of Shapes: 136
Warning messages:
1: shapelib support is provided by GDAL through the sf and terra paackages among others 
2: shapelib support is provided by GDAL through the sf and terra paackages among others 
3: shapelib support is provided by GDAL through the sf and terra paackages among others 
> sapply(o$Shapes, "[[", "shp.type")
  [1] 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15
 [26] 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15
 [51] 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15
 [76] 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15
[101] 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15
[126] 15 15 15 15 15 15 15 15 15 15 15
> sapply(o$Shapes, function(x) ncol(x[["verts"]]))
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [38] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[112] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Type 15 is PolygonZ.

@edzer
Copy link
Member

edzer commented Dec 23, 2022

This also happens in GeoJSON files, see #1592; this fix should close the problem, read the mixes, then st_zm() does the rest:

library(sf)
# Linking to GEOS 3.11.1, GDAL 3.6.1, PROJ 9.1.1; sf_use_s2() is TRUE
read_sf("Wastersheds_CRD.shp")
# Simple feature collection with 136 features and 30 fields
# Geometry type: MULTIPOLYGON
# Dimension:     XY, XYZ
# Bounding box:  xmin: 455935 ymin: 5395725 xmax: 473082.9 ymax: 5421613
# z_range:       zmin: 0 zmax: 0
# Projected CRS: NAD83 / UTM zone 10N
# # A tibble: 136 × 31
#      fid OBJECTID Subtype LifeCycleS Name  AliasNa…¹ Alias…² Strea…³ ID    CRDID
#    <dbl>    <dbl>   <dbl> <chr>      <chr> <chr>     <chr>     <dbl> <chr> <chr>
#  1     1        1       1 ACT        <NA>  <NA>      <NA>         NA TEMP… 11693
#  2     2        2       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11701
#  3     3      724       1 ACT        <NA>  <NA>      <NA>         NA TEMP… 11508
#  4     4     1021       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11780
#  5     5     1042       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11776
#  6     6     1074       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11716
#  7     7     1076       3 ACT        <NA>  <NA>      <NA>         NA 8369  11719
#  8     8     1112       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11507
#  9     9     1128       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11468
# 10    10     1139       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11545
# # … with 126 more rows, 21 more variables: MgmtUnit <chr>, Comments <chr>,
# #   DataSource <chr>, DataProjec <chr>, DataDate <chr>, DataSuppli <chr>,
# #   DataAccura <chr>, DataCustod <chr>, MetadataCo <chr>, last_edite <chr>,
# #   last_edi_1 <chr>, created_da <chr>, created_us <chr>, Text1 <chr>,
# #   Text2 <chr>, Num1 <dbl>, Num2 <dbl>, CRDDischag <chr>, Shape.area <dbl>,
# #   Shape.len <dbl>, geometry <MULTIPOLYGON [m]>, and abbreviated variable
# #   names ¹​AliasName1, ²​AliasName2, ³​StreamOrde
read_sf("Wastersheds_CRD.shp") |> st_zm()
# Simple feature collection with 136 features and 30 fields
# Geometry type: MULTIPOLYGON
# Dimension:     XY
# Bounding box:  xmin: 455935 ymin: 5395725 xmax: 473082.9 ymax: 5421613
# Projected CRS: NAD83 / UTM zone 10N
# # A tibble: 136 × 31
#      fid OBJECTID Subtype LifeCycleS Name  AliasNa…¹ Alias…² Strea…³ ID    CRDID
#  * <dbl>    <dbl>   <dbl> <chr>      <chr> <chr>     <chr>     <dbl> <chr> <chr>
#  1     1        1       1 ACT        <NA>  <NA>      <NA>         NA TEMP… 11693
#  2     2        2       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11701
#  3     3      724       1 ACT        <NA>  <NA>      <NA>         NA TEMP… 11508
#  4     4     1021       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11780
#  5     5     1042       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11776
#  6     6     1074       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11716
#  7     7     1076       3 ACT        <NA>  <NA>      <NA>         NA 8369  11719
#  8     8     1112       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11507
#  9     9     1128       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11468
# 10    10     1139       3 ACT        <NA>  <NA>      <NA>         NA TEMP… 11545
# # … with 126 more rows, 21 more variables: MgmtUnit <chr>, Comments <chr>,
# #   DataSource <chr>, DataProjec <chr>, DataDate <chr>, DataSuppli <chr>,
# #   DataAccura <chr>, DataCustod <chr>, MetadataCo <chr>, last_edite <chr>,
# #   last_edi_1 <chr>, created_da <chr>, created_us <chr>, Text1 <chr>,
# #   Text2 <chr>, Num1 <dbl>, Num2 <dbl>, CRDDischag <chr>, Shape.area <dbl>,
# #   Shape.len <dbl>, geometry <MULTIPOLYGON [m]>, and abbreviated variable
# #   names ¹​AliasName1, ²​AliasName2, ³​StreamOrde

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants