Commit

[DOCS] pre-commit: remove trailing whitespace from .md and .Rmd files (…)
jbampton authored and jiayuasu committed Jan 8, 2024
1 parent 3988e96 commit 49e936a
Showing 43 changed files with 533 additions and 533 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -36,7 +36,7 @@ repos:
   - id: end-of-file-fixer
     exclude: \.svg$|^docs/image|^spark/common/src/test/resources
   - id: trailing-whitespace
-    files: \.(ipynb|java|py|R|scala|sh|xml|yaml|yml)$
+    exclude: ^docs-overrides/main\.html$|\.Rd$
 - repo: https://github.com/igorshubovych/markdownlint-cli
   rev: v0.38.0
   hooks:
74 changes: 37 additions & 37 deletions R/vignettes/articles/apache-sedona.Rmd
@@ -135,8 +135,8 @@ data_tbl <- copy_to(sc, data)
 data_tbl
 
-data_tbl %>%
-  transmute(geometry = st_geomfromtext(X1)) %>%
+data_tbl %>%
+  transmute(geometry = st_geomfromtext(X1)) %>%
   sdf_schema()
 ```
 
 No automatic translation of `{sf}` objects is provided; they need to be converted to text (or binary) format before copying to Spark.
@@ -146,18 +146,18 @@ data <- sf::st_read(here::here("../spark/common/src/test/resources/testPolygon.json" …
 data %>% glimpse()
 
-data_tbl <-
+data_tbl <-
   copy_to(
-    sc,
-    data %>%
-      mutate(geometry_wkb = geometry %>% sf::st_as_text()) %>%
+    sc,
+    data %>%
+      mutate(geometry_wkb = geometry %>% sf::st_as_text()) %>%
       sf::st_drop_geometry(),
     name = "data",
     overwrite = TRUE
   )
 
-data_tbl %>%
-  transmute(geometry = st_geomfromtext(geometry_wkb)) %>%
+data_tbl %>%
+  transmute(geometry = st_geomfromtext(geometry_wkb)) %>%
   sdf_schema()
 ```
@@ -168,20 +168,20 @@ Loading data in R and then copying it to Spark will most likely not be the optimal …
 ```{r}
 data_tbl <- spark_read_geojson(sc, path = here::here("../spark/common/src/test/resources/testPolygon.json"), name = "data")
 
-data_tbl %>%
+data_tbl %>%
   glimpse()
 
-data_tbl %>%
-  # select(geometry) %>%
-  sdf_schema() %>%
+data_tbl %>%
+  # select(geometry) %>%
+  sdf_schema() %>%
   lobstr::tree()
 ```
 
 
 ## Manipulating
 
 The dbplyr interface transparently translates dbplyr workflows into SQL, and gives access to all Apache Sedona SQL functions:
 
 * [Vector functions](../../../api/sql/Function/)
 * [Vector predicates](../../../api/sql/Predicate/)
 * [Vector aggregate functions](../../../api/sql/AggregateFunction/)
@@ -191,45 +191,45 @@ Results are then collected back into R with `collect`.
 
 ```{r}
 ## ! ST_transform uses lon/lat order since v1.5.0. Before, it used lat/lon order.
-data_tbl %>%
+data_tbl %>%
   mutate(
     ALAND = ALAND %>% as.numeric(),
     AWATER = AWATER %>% as.numeric(),
     area = ALAND + AWATER,
     geometry_proj = st_transform(geometry, "epsg:4326", "epsg:5070", TRUE),
     area_geom = st_area(geometry_proj)
-  ) %>%
-  select(STATEFP, COUNTYFP, area, area_geom) %>%
-  head() %>%
+  ) %>%
+  select(STATEFP, COUNTYFP, area, area_geom) %>%
+  head() %>%
   collect()
 ```
 
 Geometries need to be converted to a serializable (text or binary) format before `collect` is called:
 ```{r}
 ## Setting the CRS in R post-collect
-data_tbl %>%
+data_tbl %>%
   mutate(
     area = st_area(st_transform(geometry, "epsg:4326", "epsg:5070", TRUE)),
     geometry_wkb = geometry %>% st_asBinary()
-  ) %>%
-  select(COUNTYFP, geometry_wkb) %>%
-  head() %>%
-  collect() %>%
+  ) %>%
+  select(COUNTYFP, geometry_wkb) %>%
+  head() %>%
+  collect() %>%
   sf::st_as_sf(crs = 4326)
 ```
 
 
 ```{r}
 ## Setting the CRS in Spark (and using EWKT to keep it)
-data_tbl %>%
+data_tbl %>%
   mutate(
     area = st_area(st_transform(geometry, "epsg:4326", "epsg:5070", TRUE)),
     geometry_ewkt = geometry %>% st_setsrid(4326) %>% st_asewkt()
-  ) %>%
-  select(COUNTYFP, geometry_ewkt) %>%
-  head() %>%
-  collect() %>%
-  sf::st_as_sf(wkt = "geometry_ewkt")
+  ) %>%
+  select(COUNTYFP, geometry_ewkt) %>%
+  head() %>%
+  collect() %>%
+  sf::st_as_sf(wkt = "geometry_ewkt")
 ```
 
 
@@ -238,8 +238,8 @@ Collected results can be saved from R. In many cases it will be more efficient to …
 
 ```{r}
 dest_file <- tempfile() ## Destination folder
-data_tbl %>%
-  filter(str_sub(COUNTYFP, 1, 2) == "00") %>%
+data_tbl %>%
+  filter(str_sub(COUNTYFP, 1, 2) == "00") %>%
   spark_write_geoparquet(path = dest_file)
 
 dest_file %>% dir(recursive = TRUE)
@@ -249,8 +249,8 @@ The output can be partitioned by the columns present in the data:
 
 ```{r}
 dest_file <- tempfile() ## Destination folder
-data_tbl %>%
-  filter(str_sub(COUNTYFP, 1, 2) == "00") %>%
+data_tbl %>%
+  filter(str_sub(COUNTYFP, 1, 2) == "00") %>%
   spark_write_geoparquet(path = dest_file, partition_by = "COUNTYFP")
 
 dest_file %>% dir(recursive = TRUE)
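## Aside, not part of this commit: the GeoParquet folder written above can be
## read straight back into Spark. This sketch assumes spark_read_geoparquet(),
## the reader counterpart of spark_write_geoparquet(), is available in the
## installed apache.sedona version, and it reuses `sc` and `dest_file` from above.
data_back_tbl <- spark_read_geoparquet(sc, path = dest_file, name = "data_back")
data_back_tbl %>% sdf_schema()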
@@ -439,21 +439,21 @@ sc <- spark_connect(master = "local", config = config)
 
 Check
 ```{r}
-invoke_new(sc, "org.apache.sedona.core.utils.SedonaConf", invoke(spark_session(sc), "conf"))
+invoke_new(sc, "org.apache.sedona.core.utils.SedonaConf", invoke(spark_session(sc), "conf"))
 ```
 (Still true)
 
 Or change at runtime:
 ```{r}
-spark_session(sc) %>%
-  invoke("conf") %>%
+spark_session(sc) %>%
+  invoke("conf") %>%
   invoke("set", "sedona.global.index","false")
 
-invoke_new(sc, "org.apache.sedona.core.utils.SedonaConf", invoke(spark_session(sc), "conf"))
+invoke_new(sc, "org.apache.sedona.core.utils.SedonaConf", invoke(spark_session(sc), "conf"))
 ```
 
 ```{r}
-invoke_new(sc, "org.apache.sedona.core.utils.SedonaConf", invoke(spark_session(sc), "conf"))
+invoke_new(sc, "org.apache.sedona.core.utils.SedonaConf", invoke(spark_session(sc), "conf"))
 ```
 
 
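As an aside, not part of the diff above: the vignette's claim that dbplyr pipelines are translated transparently into Sedona SQL can be checked with `show_query()`. A minimal sketch, assuming the `sc` connection and the `data_tbl` table created earlier in the vignette:

```r
## Illustrative sketch only, not from this commit. Assumes `sc` and `data_tbl`
## from the vignette, with apache.sedona and dplyr attached.
data_tbl %>%
  mutate(
    geometry_proj = st_transform(geometry, "epsg:4326", "epsg:5070", TRUE),
    area_geom = st_area(geometry_proj)
  ) %>%
  select(COUNTYFP, area_geom) %>%
  show_query()  # prints the generated Spark SQL instead of executing the pipeline
```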
42 changes: 21 additions & 21 deletions R/vignettes/articles/raster.Rmd
@@ -36,11 +36,11 @@ sc <- spark_connect(master = "local")
 
 data_tbl <- spark_read_binary(sc, dir = here::here("/../spark/common/src/test/resources/raster/"), name = "data")
 
-raster <-
-  data_tbl %>%
+raster <-
+  data_tbl %>%
   mutate(raster = RS_FromGeoTiff(content))
 
-raster
+raster
 
 raster %>% sdf_schema()
 ```
@@ -55,34 +55,34 @@ Functions taking in `raster: Raster` arguments are meant to be used with data loaded …
 
 For example, getting the number of bands:
 ```{r}
-raster %>%
+raster %>%
   mutate(
     nbands = RS_NumBands(raster)
-  ) %>%
-  select(path, nbands) %>%
-  collect() %>%
+  ) %>%
+  select(path, nbands) %>%
+  collect() %>%
   mutate(path = path %>% basename())
 ```
 
 Or getting the envelope:
 ```{r}
-raster %>%
+raster %>%
   mutate(
     env = RS_Envelope(raster) %>% st_astext()
-  ) %>%
-  select(path, env) %>%
-  collect() %>%
+  ) %>%
+  select(path, env) %>%
+  collect() %>%
   mutate(path = path %>% basename())
 ```
 
 Or getting values at specific points:
 ```{r}
-raster %>%
+raster %>%
   mutate(
     val = RS_Value(raster, ST_Point(-13077301.685, 4002565.802))
-  ) %>%
-  select(path, val) %>%
-  collect() %>%
+  ) %>%
+  select(path, val) %>%
+  collect() %>%
   mutate(path = path %>% basename())
 ```
 
@@ -96,8 +96,8 @@ To write a Sedona binary DataFrame to external storage using Sedona's built-in `…
 
 ```{r}
 dest_file <- tempfile()
-raster %>%
-  mutate(content = RS_AsGeoTiff(raster)) %>%
+raster %>%
+  mutate(content = RS_AsGeoTiff(raster)) %>%
   spark_write_raster(path = dest_file)
 
 dir(dest_file, recursive = TRUE)
@@ -112,10 +112,10 @@ Available options see [Raster writer](../../../api/sql/Raster-writer/):
 
 ```{r}
 dest_file <- tempfile()
-raster %>%
-  mutate(content = RS_AsArcGrid(raster)) %>%
-  spark_write_raster(path = dest_file,
-                     options = list("rasterField" = "content",
+raster %>%
+  mutate(content = RS_AsArcGrid(raster)) %>%
+  spark_write_raster(path = dest_file,
+                     options = list("rasterField" = "content",
                                     "fileExtension" = ".asc",
                                     "pathField" = "path"
                      ))
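## Aside, not part of this commit: the raster accessors above compose with the
## vector predicates, so a point query can be limited to rasters whose envelope
## actually covers the point. Assumes `sc` and the `raster` table built with
## RS_FromGeoTiff() earlier in this vignette.
raster %>%
  filter(ST_Contains(RS_Envelope(raster), ST_Point(-13077301.685, 4002565.802))) %>%
  mutate(val = RS_Value(raster, ST_Point(-13077301.685, 4002565.802))) %>%
  select(path, val) %>%
  collect() %>%
  mutate(path = path %>% basename())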
10 changes: 5 additions & 5 deletions docs/api/flink/Constructor.md
@@ -102,7 +102,7 @@ Example:
 SELECT ST_GeomFromGML('
     <gml:LineString srsName="EPSG:4269">
         <gml:coordinates>
-            -71.16028,42.258729
+            -71.16028,42.258729
             -71.160837,42.259112
             -71.161143,42.25932
         </gml:coordinates>
@@ -131,7 +131,7 @@ Example:
 SELECT ST_GeomFromKML('
     <LineString>
         <coordinates>
-            -71.1663,42.2614
+            -71.1663,42.2614
             -71.1667,42.2616
         </coordinates>
     </LineString>
@@ -397,10 +397,10 @@ POINT (1.2345 2.3456)
 
 ## ST_PointZ
 
-Introduction: Construct a Point from X, Y and Z and an optional srid. If srid is not set, it defaults to 0 (unknown).
+Introduction: Construct a Point from X, Y and Z and an optional srid. If srid is not set, it defaults to 0 (unknown).
 Must use ST_AsEWKT function to print the Z coordinate.
 
-Format:
+Format:
 
 `ST_PointZ (X: Double, Y: Double, Z: Double)`
 
@@ -444,7 +444,7 @@ POINT (40.7128 -74.006)
 
 Introduction: Construct a Polygon from MinX, MinY, MaxX, MaxY.
 
-Format:
+Format:
 
 `ST_PolygonFromEnvelope (MinX: Double, MinY: Double, MaxX: Double, MaxY: Double)`
 