GH-348: [Release] Update versions for 16.0.0 (#357)
Closes #348

This PR:

- Updates versions from 15.0.0 to 16.0.0
- Updates the R cookbook to pass under 16.0.0
- Updates the C++ cookbook to pass under 16.0.0
- Changes `test_cpp_cookbook.yml` to always test against stable libarrow.
  This is a workaround for not having an up-to-date conda-nightlies
  channel; see apache/arrow#41856
- Adds a note in the top-level README about the status of the dev cookbooks
amoeba authored Oct 31, 2024
1 parent 51e2880 commit 11e08a2
Showing 12 changed files with 398 additions and 434 deletions.
9 changes: 0 additions & 9 deletions .github/workflows/test_cpp_cookbook.yml
@@ -48,16 +48,7 @@ jobs:
        path: ~/conda_pkgs_dir
        key:
          ${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}-${{ hashFiles('cpp/conda-linux-64.lock') }}
-    - name: Setup latest environment
-      if: github.event.pull_request.base.ref == 'main'
-      uses: conda-incubator/setup-miniconda@v2
-      with:
-        auto-update-conda: true
-        activate-environment: cookbook-cpp-dev
-        environment-file: cpp/dev.yml
-        auto-activate-base: false
    - name: Setup stable environment
-      if: github.event.pull_request.base.ref == 'stable'
      uses: conda-incubator/setup-miniconda@v2
      with:
        auto-update-conda: true
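With the `latest` setup step gone and the `stable` guard dropped, the remaining step runs unconditionally. A sketch of what the resulting step plausibly looks like — the fields below `auto-update-conda` are assumptions carried over from the deleted step, not verbatim file contents:

```yaml
- name: Setup stable environment
  uses: conda-incubator/setup-miniconda@v2
  with:
    auto-update-conda: true
    # Assumed values; the real workflow may name the environment/file differently.
    activate-environment: cookbook-cpp
    environment-file: cpp/environment.yml
    auto-activate-base: false
```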
2 changes: 2 additions & 0 deletions README.rst
@@ -44,6 +44,8 @@
See https://arrow.apache.org/cookbook/ for the latest published version using the
latest stable version of Apache Arrow.
See https://arrow.apache.org/cookbook/dev for the latest published version using
the development version of Apache Arrow.
+Please note that the development version of the cookbook will be out of date
+while we work to rebuild some of our nightly build infrastructure.

Building All Cookbooks
----------------------
1 change: 0 additions & 1 deletion cpp/code/CMakeLists.txt
@@ -77,7 +77,6 @@ recipe(flight)


# Add protobuf to flight
-find_package(Protobuf REQUIRED)
find_package(gRPC CONFIG REQUIRED)
find_package(Threads)

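Dropping `find_package(Protobuf REQUIRED)` suggests the Protobuf dependency now arrives transitively, e.g. via gRPC's CMake package config. A minimal sketch of that pattern, with a hypothetical target name:

```cmake
# find_package(gRPC CONFIG REQUIRED) defines imported targets such as
# gRPC::grpc++ that already carry their Protobuf usage requirements,
# so no separate find_package(Protobuf) call is needed.
find_package(gRPC CONFIG REQUIRED)
add_executable(flight_recipe flight.cc)  # hypothetical recipe target
target_link_libraries(flight_recipe PRIVATE gRPC::grpc++)
```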
252 changes: 128 additions & 124 deletions cpp/conda-linux-64.lock

Large diffs are not rendered by default.

253 changes: 128 additions & 125 deletions cpp/conda-linux-aarch64.lock

Large diffs are not rendered by default.

240 changes: 123 additions & 117 deletions cpp/conda-osx-arm64.lock

Large diffs are not rendered by default.

5 changes: 3 additions & 2 deletions cpp/environment.yml
@@ -20,10 +20,11 @@ channels:
dependencies:
  - python=3.9
  - compilers
-  - libarrow==15.0.2
+  - libarrow==16.0.0
+  - libarrow-flight==16.0.0
  - sphinx
  - gtest
  - gmock
-  - pyarrow==15.0.2
+  - pyarrow==16.0.0
  - clang-tools
  - zlib
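To recreate the pinned environment locally, the standard conda invocation should apply (the CI itself goes through the lock files above):

```sh
# Creates the environment declared in cpp/environment.yml
# (the environment name comes from the file itself).
conda env create -f cpp/environment.yml
```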
4 changes: 2 additions & 2 deletions java/source/conf.py
@@ -38,9 +38,9 @@
author = 'The Apache Software Foundation'
arrow_nightly=os.getenv("ARROW_NIGHTLY")
if arrow_nightly and arrow_nightly != '0':
-    version = "16.0.0-SNAPSHOT"
+    version = "17.0.0-SNAPSHOT"
else:
-    version = "15.0.2"
+    version = "16.0.0"
print(f"Running with Arrow version: {version}")

# -- General configuration ---------------------------------------------------
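The version switch keys off the `ARROW_NIGHTLY` environment variable. As an illustration only — the build entry point shown is an assumption, and the repo may wrap it in a Makefile target:

```sh
# Unset or "0" selects the stable 16.0.0 docs; anything else selects the snapshot.
ARROW_NIGHTLY=1 sphinx-build java/source java/build
```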
2 changes: 1 addition & 1 deletion java/source/demo/pom.xml
@@ -41,7 +41,7 @@
    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
-        <arrow.version>15.0.2</arrow.version>
+        <arrow.version>16.0.0</arrow.version>
    </properties>
    <dependencies>
        <dependency>
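The `arrow.version` property is presumably referenced by the dependency entries cut off above; a typical entry would look like this (artifact chosen purely for illustration):

```xml
<dependency>
  <groupId>org.apache.arrow</groupId>
  <artifactId>arrow-vector</artifactId>
  <version>${arrow.version}</version>
</dependency>
```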
2 changes: 1 addition & 1 deletion python/requirements.txt
@@ -1,5 +1,5 @@
Sphinx>=4.0.2
-pyarrow==15.0.2
+pyarrow==16.0.0
pandas>=1.2.5
opentelemetry-api>=1.0.0
opentelemetry-sdk>=1.0.0
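Installing the updated pins is the usual pip call:

```sh
python -m pip install -r python/requirements.txt
```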
56 changes: 8 additions & 48 deletions r/content/specify_data_types_and_schemas.Rmd
@@ -21,19 +21,19 @@

## Introduction

As discussed in previous chapters, Arrow automatically infers the most
appropriate data type when reading in data or converting R objects to Arrow
objects. However, you might want to manually tell Arrow which data types to
use, for example, to ensure interoperability with databases and data warehouse
systems. This chapter includes recipes for:

* changing the data types of existing Arrow objects
* defining data types during the process of creating Arrow objects

A table showing the default mappings between R and Arrow data types can be found
in [R data type to Arrow data type mappings](https://arrow.apache.org/docs/r/articles/arrow.html#r-to-arrow).

A table containing Arrow data types and their R equivalents can be found in
[Arrow data type to R data type mapping](https://arrow.apache.org/docs/r/articles/arrow.html#arrow-to-r).
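As a quick editorial illustration of those default mappings (not part of the diff; assuming the arrow R package's `Array$create()` as used later in this chapter):

```r
library(arrow)

# Arrow infers a type when converting R objects:
# R double -> float64, R character -> utf8 (string)
Array$create(c(1.5, 2.5))$type   # Float64
Array$create(c("a", "b"))$type   # Utf8
```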

## Update data type of an existing Arrow Array
@@ -63,7 +63,7 @@ test_that("cast_array works as expected", {

### Discussion

There are some data types which are not compatible with each other. Errors will
occur if you try to cast between incompatible data types.

```{r, incompat, eval = FALSE}
@@ -120,48 +120,9 @@ test_that("cast_table works as expected", {
})
```

-### Discussion {#no-compat-type}
-
-There are some Arrow data types which do not have any R equivalent. Attempting
-to cast to these data types or using a schema which contains them will result in
-an error.
-
-```{r, float_16_conversion, error=TRUE, eval=FALSE}
-# Set up a tibble to use in this example
-oscars <- tibble::tibble(
-  actor = c("Katharine Hepburn", "Meryl Streep", "Jack Nicholson"),
-  num_awards = c(4, 3, 3)
-)
-# Convert tibble to an Arrow table
-oscars_arrow <- arrow_table(oscars)
-# Set up schema with "num_awards" as float16 which doesn't have an R equivalent
-oscars_schema_invalid <- schema(actor = string(), num_awards = float16())
-# The default mapping from numeric column "num_awards" is to a double
-oscars_arrow$cast(target_schema = oscars_schema_invalid)
-```
-
-```{r}
-## Error: NotImplemented: Unsupported cast from double to halffloat using function cast_half_float
-```
-
-```{r, test_float_16_conversion, opts.label = "test"}
-test_that("float_16_conversion works as expected", {
-  oscars_schema_invalid <- schema(actor = string(), num_awards = float16())
-  expect_error(
-    oscars_arrow$cast(target_schema = oscars_schema_invalid),
-    "NotImplemented: Unsupported cast from double to halffloat using function cast_half_float"
-  )
-})
-```

## Specify data types when creating an Arrow table from an R object

You want to manually specify Arrow data types when converting an object from a
data frame to an Arrow object.
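The recipe's actual solution is collapsed in this view; as an illustrative sketch only (data and column names are invented, and it assumes the `schema()`/`arrow_table()` arguments used elsewhere in this chapter):

```r
library(arrow)

share_data <- data.frame(
  company = c("AMZN", "GOOG", "BKNG"),
  price = c(3463.12, 2884.38, 2300.46)
)
# Override the inferred float64 for "price" with an explicit float32
share_schema <- schema(company = utf8(), price = float32())
share_table <- arrow_table(share_data, schema = share_schema)
share_table$schema
```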

### Solution
@@ -226,4 +187,3 @@ test_that("use_schema_dataset works as expected", {
```{r, include=FALSE}
unlink("oscars_data", recursive = TRUE)
```

6 changes: 2 additions & 4 deletions r/content/tables.Rmd
@@ -191,9 +191,7 @@ test_that("dplyr_func_warning", {
    arrow_table(starwars) %>%
      mutate(name_split = str_split_fixed(name, " ", 2)) %>%
      collect(),
-    'Expression str_split_fixed(name, " ", 2) not supported in Arrow; pulling data into R',
-    fixed = TRUE
-  )
+    'In str_split_fixed\\(name, " ", 2\\):.*Expression not supported in Arrow.*Pulling data into R')
})
```
@@ -415,4 +413,4 @@ You can perform these window aggregate operations on Arrow tables by:
- Computing the aggregation separately, and joining the result
- Passing the data to DuckDB, and using the DuckDB query engine to perform the operations

Arrow supports zero-copy integration with DuckDB, and DuckDB can query Arrow datasets directly and stream query results back to Arrow. This integration uses zero-copy streaming of data between DuckDB and Arrow in both directions, so you can compose a query using both together without paying any cost to (re)serialize the data when you pass it back and forth. This is especially useful in cases where something is supported in one of the Arrow or DuckDB query engines but not the other. You can find more information about this integration in the [Arrow blog post](https://arrow.apache.org/blog/2021/12/03/arrow-duckdb/).
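A sketch of that round trip, assuming the `to_duckdb()`/`to_arrow()` helpers from the arrow package (the duckdb package must be installed; the window function is arbitrary):

```r
library(arrow)
library(dplyr)

# to_duckdb() hands the Arrow table to DuckDB without copying;
# to_arrow() streams the DuckDB result back as an Arrow object.
arrow_table(mtcars) %>%
  to_duckdb() %>%
  group_by(cyl) %>%
  mutate(mpg_rank = min_rank(desc(mpg))) %>%  # window function runs in DuckDB
  to_arrow() %>%
  collect()
```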
