GH-348: [Release] Update versions for 16.0.0 (#357)
Closes #348

This PR:

- Updates versions from 15.0.0 to 16.0.0
- Updates the R cookbook to pass under 16.0.0
- Updates the C++ cookbook to pass under 16.0.0
- Changes `test_cpp_cookbook.yml` to always test against stable libarrow.
  This is a workaround for not having an up-to-date conda-nightlies
  channel; see apache/arrow#41856
- Adds a note in the top-level README about the status of the dev cookbooks
amoeba authored Oct 31, 2024
1 parent 51e2880 commit 11e08a2
Showing 12 changed files with 398 additions and 434 deletions.
9 changes: 0 additions & 9 deletions .github/workflows/test_cpp_cookbook.yml
@@ -48,16 +48,7 @@ jobs:
        path: ~/conda_pkgs_dir
        key:
          ${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}-${{ hashFiles('cpp/conda-linux-64.lock') }}
-    - name: Setup latest environment
-      if: github.event.pull_request.base.ref == 'main'
-      uses: conda-incubator/setup-miniconda@v2
-      with:
-        auto-update-conda: true
-        activate-environment: cookbook-cpp-dev
-        environment-file: cpp/dev.yml
-        auto-activate-base: false
    - name: Setup stable environment
-      if: github.event.pull_request.base.ref == 'stable'
      uses: conda-incubator/setup-miniconda@v2
      with:
        auto-update-conda: true
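With the `latest` setup step gone and the `stable` guard dropped, the remaining step runs unconditionally. A sketch of what the resulting step plausibly looks like — the fields below `auto-update-conda` are assumptions carried over from the deleted step, not verbatim file contents:

```yaml
- name: Setup stable environment
  uses: conda-incubator/setup-miniconda@v2
  with:
    auto-update-conda: true
    # Assumed values; the real workflow may name the environment/file differently.
    activate-environment: cookbook-cpp
    environment-file: cpp/environment.yml
    auto-activate-base: false
```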
2 changes: 2 additions & 0 deletions README.rst
@@ -44,6 +44,8 @@
See https://arrow.apache.org/cookbook/ for the latest published version using the
latest stable version of Apache Arrow.
See https://arrow.apache.org/cookbook/dev for the latest published version using
the development version of Apache Arrow.
+Please note that the development version of the cookbook will be out of date
+while we work to rebuild some of our nightly build infrastructure.

Building All Cookbooks
----------------------
1 change: 0 additions & 1 deletion cpp/code/CMakeLists.txt
@@ -77,7 +77,6 @@ recipe(flight)


# Add protobuf to flight
-find_package(Protobuf REQUIRED)
find_package(gRPC CONFIG REQUIRED)
find_package(Threads)

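Dropping `find_package(Protobuf REQUIRED)` suggests the Protobuf dependency now arrives transitively, e.g. via gRPC's CMake package config. A minimal sketch of that pattern, with a hypothetical target name:

```cmake
# find_package(gRPC CONFIG REQUIRED) defines imported targets such as
# gRPC::grpc++ that already carry their Protobuf usage requirements,
# so no separate find_package(Protobuf) call is needed.
find_package(gRPC CONFIG REQUIRED)
add_executable(flight_recipe flight.cc)  # hypothetical recipe target
target_link_libraries(flight_recipe PRIVATE gRPC::grpc++)
```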
252 changes: 128 additions & 124 deletions cpp/conda-linux-64.lock

Large diffs are not rendered by default.

253 changes: 128 additions & 125 deletions cpp/conda-linux-aarch64.lock

Large diffs are not rendered by default.

240 changes: 123 additions & 117 deletions cpp/conda-osx-arm64.lock

Large diffs are not rendered by default.

5 changes: 3 additions & 2 deletions cpp/environment.yml
@@ -20,10 +20,11 @@ channels:
dependencies:
  - python=3.9
  - compilers
-  - libarrow==15.0.2
+  - libarrow==16.0.0
+  - libarrow-flight==16.0.0
  - sphinx
  - gtest
  - gmock
-  - pyarrow==15.0.2
+  - pyarrow==16.0.0
  - clang-tools
  - zlib
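To recreate the pinned environment locally, the standard conda invocation should apply (the CI itself goes through the lock files above):

```sh
# Creates the environment declared in cpp/environment.yml
# (the environment name comes from the file itself).
conda env create -f cpp/environment.yml
```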
4 changes: 2 additions & 2 deletions java/source/conf.py
@@ -38,9 +38,9 @@
author = 'The Apache Software Foundation'
arrow_nightly=os.getenv("ARROW_NIGHTLY")
if arrow_nightly and arrow_nightly != '0':
-    version = "16.0.0-SNAPSHOT"
+    version = "17.0.0-SNAPSHOT"
else:
-    version = "15.0.2"
+    version = "16.0.0"
print(f"Running with Arrow version: {version}")

# -- General configuration ---------------------------------------------------
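The version switch keys off the `ARROW_NIGHTLY` environment variable. As an illustration only — the build entry point shown is an assumption, and the repo may wrap it in a Makefile target:

```sh
# Unset or "0" selects the stable 16.0.0 docs; anything else selects the snapshot.
ARROW_NIGHTLY=1 sphinx-build java/source java/build
```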
2 changes: 1 addition & 1 deletion java/source/demo/pom.xml
@@ -41,7 +41,7 @@
    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
-        <arrow.version>15.0.2</arrow.version>
+        <arrow.version>16.0.0</arrow.version>
    </properties>
    <dependencies>
        <dependency>
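The `arrow.version` property is presumably referenced by the dependency entries cut off above; a typical entry would look like this (artifact chosen purely for illustration):

```xml
<dependency>
  <groupId>org.apache.arrow</groupId>
  <artifactId>arrow-vector</artifactId>
  <version>${arrow.version}</version>
</dependency>
```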
2 changes: 1 addition & 1 deletion python/requirements.txt
@@ -1,5 +1,5 @@
Sphinx>=4.0.2
-pyarrow==15.0.2
+pyarrow==16.0.0
pandas>=1.2.5
opentelemetry-api>=1.0.0
opentelemetry-sdk>=1.0.0
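Installing the updated pins is the usual pip call:

```sh
python -m pip install -r python/requirements.txt
```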
56 changes: 8 additions & 48 deletions r/content/specify_data_types_and_schemas.Rmd
@@ -21,19 +21,19 @@

## Introduction

As discussed in previous chapters, Arrow automatically infers the most
appropriate data type when reading in data or converting R objects to Arrow
objects. However, you might want to manually tell Arrow which data types to
use, for example, to ensure interoperability with databases and data warehouse
systems. This chapter includes recipes for:

* changing the data types of existing Arrow objects
* defining data types during the process of creating Arrow objects

A table showing the default mappings between R and Arrow data types can be found
in [R data type to Arrow data type mappings](https://arrow.apache.org/docs/r/articles/arrow.html#r-to-arrow).

A table containing Arrow data types and their R equivalents can be found in
[Arrow data type to R data type mapping](https://arrow.apache.org/docs/r/articles/arrow.html#arrow-to-r).
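As a quick editorial illustration of those default mappings (not part of the diff; assuming the arrow R package's `Array$create()` as used later in this chapter):

```r
library(arrow)

# Arrow infers a type when converting R objects:
# R double -> float64, R character -> utf8 (string)
Array$create(c(1.5, 2.5))$type   # Float64
Array$create(c("a", "b"))$type   # Utf8
```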

## Update data type of an existing Arrow Array
@@ -63,7 +63,7 @@ test_that("cast_array works as expected", {

### Discussion

There are some data types which are not compatible with each other. Errors will
occur if you try to cast between incompatible data types.

```{r, incompat, eval = FALSE}
@@ -120,48 +120,9 @@ test_that("cast_table works as expected", {
})
```

-### Discussion {#no-compat-type}
-
-There are some Arrow data types which do not have any R equivalent. Attempting
-to cast to these data types or using a schema which contains them will result in
-an error.
-
-```{r, float_16_conversion, error=TRUE, eval=FALSE}
-# Set up a tibble to use in this example
-oscars <- tibble::tibble(
-  actor = c("Katharine Hepburn", "Meryl Streep", "Jack Nicholson"),
-  num_awards = c(4, 3, 3)
-)
-# Convert tibble to an Arrow table
-oscars_arrow <- arrow_table(oscars)
-# Set up schema with "num_awards" as float16 which doesn't have an R equivalent
-oscars_schema_invalid <- schema(actor = string(), num_awards = float16())
-# The default mapping from numeric column "num_awards" is to a double
-oscars_arrow$cast(target_schema = oscars_schema_invalid)
-```
-
-```{r}
-## Error: NotImplemented: Unsupported cast from double to halffloat using function cast_half_float
-```
-
-```{r, test_float_16_conversion, opts.label = "test"}
-test_that("float_16_conversion works as expected", {
-  oscars_schema_invalid <- schema(actor = string(), num_awards = float16())
-  expect_error(
-    oscars_arrow$cast(target_schema = oscars_schema_invalid),
-    "NotImplemented: Unsupported cast from double to halffloat using function cast_half_float"
-  )
-})
-```

## Specify data types when creating an Arrow table from an R object

You want to manually specify Arrow data types when converting an object from a
data frame to an Arrow object.
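The recipe's actual solution is collapsed in this view; as an illustrative sketch only (data and column names are invented, and it assumes the `schema()`/`arrow_table()` arguments used elsewhere in this chapter):

```r
library(arrow)

share_data <- data.frame(
  company = c("AMZN", "GOOG", "BKNG"),
  price = c(3463.12, 2884.38, 2300.46)
)
# Override the inferred float64 for "price" with an explicit float32
share_schema <- schema(company = utf8(), price = float32())
share_table <- arrow_table(share_data, schema = share_schema)
share_table$schema
```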

### Solution
@@ -226,4 +187,3 @@ test_that("use_schema_dataset works as expected", {
```{r, include=FALSE}
unlink("oscars_data", recursive = TRUE)
```

6 changes: 2 additions & 4 deletions r/content/tables.Rmd
@@ -191,9 +191,7 @@ test_that("dplyr_func_warning", {
    arrow_table(starwars) %>%
      mutate(name_split = str_split_fixed(name, " ", 2)) %>%
      collect(),
-    'Expression str_split_fixed(name, " ", 2) not supported in Arrow; pulling data into R',
-    fixed = TRUE
-  )
+    'In str_split_fixed\\(name, " ", 2\\):.*Expression not supported in Arrow.*Pulling data into R')
})
```
@@ -415,4 +413,4 @@ You can perform these window aggregate operations on Arrow tables by:
- Computing the aggregation separately, and joining the result
- Passing the data to DuckDB, and using the DuckDB query engine to perform the operations

Arrow supports zero-copy integration with DuckDB, and DuckDB can query Arrow datasets directly and stream query results back to Arrow. This integration uses zero-copy streaming of data between DuckDB and Arrow in both directions, so you can compose a query using both together without paying any cost to (re)serialize the data when you pass it back and forth. This is especially useful in cases where something is supported in one of the Arrow or DuckDB query engines but not the other. You can find more information about this integration in the [Arrow blog post](https://arrow.apache.org/blog/2021/12/03/arrow-duckdb/).
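A sketch of that round trip, assuming the `to_duckdb()`/`to_arrow()` helpers from the arrow package (the duckdb package must be installed; the window function is arbitrary):

```r
library(arrow)
library(dplyr)

# to_duckdb() hands the Arrow table to DuckDB without copying;
# to_arrow() streams the DuckDB result back as an Arrow object.
arrow_table(mtcars) %>%
  to_duckdb() %>%
  group_by(cyl) %>%
  mutate(mpg_rank = min_rank(desc(mpg))) %>%  # window function runs in DuckDB
  to_arrow() %>%
  collect()
```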
