ARROW-16510: [R] Add bindings for GCS filesystem (#13404)
This adds basic bindings for GcsFileSystem to R, enables it in the macOS, Windows, and Linux packaging (handled the same way as ARROW_S3), and adds basic R tests.

Followups: 

- Bindings for FromImpersonatedServiceAccount (ARROW-16885)
- Set up testbench for fuller tests, like how we do with minio (ARROW-16879)
- GcsFileSystem::Make should return Result (ARROW-16884)
- Explore auth integration/compatibility with `gargle`, `googleAuthR`, etc.: can we pick up the same credentials they use? (ARROW-16880)
- macOS binary packaging: push dependencies upstream (ARROW-16883)
- Windows binary packaging: push dependencies upstream (ARROW-16878)
- Update cloud/filesystem documentation (ARROW-16887)

Lead-authored-by: Neal Richardson
Co-authored-by: Sutou Kouhei
Signed-off-by: Neal Richardson
nealrichardson and kou authored Jun 26, 2022
1 parent 65a6929 commit 3ac0959
Showing 26 changed files with 627 additions and 234 deletions.
8 changes: 6 additions & 2 deletions .github/workflows/cpp.yml
@@ -276,8 +276,12 @@ jobs:
ARROW_DATASET: ON
ARROW_FLIGHT: ON
ARROW_GANDIVA: ON
- # google-could-cpp uses _dupenv_s() but it can't be used with msvcrt.
- # We need to use ucrt to use _dupenv_s().
+ # With GCS on,
+ # * MinGW 32 build OOMs (maybe turn off unity build?)
+ # * MinGW 64 fails to compile the GCS filesystem tests, some conflict
+ # with boost. First error says:
+ # D:/a/_temp/msys64/mingw64/include/boost/asio/detail/socket_types.hpp:24:4: error: #error WinSock.h has already been included
+ # TODO(ARROW-16906)
# ARROW_GCS: ON
ARROW_HDFS: OFF
ARROW_HOME: /mingw${{ matrix.mingw-n-bits }}
2 changes: 1 addition & 1 deletion .github/workflows/r.yml
@@ -165,7 +165,7 @@ jobs:
name: AMD64 Windows C++ RTools ${{ matrix.config.rtools }} ${{ matrix.config.arch }}
runs-on: windows-2019
if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
- timeout-minutes: 60
+ timeout-minutes: 90
strategy:
fail-fast: false
matrix:
5 changes: 5 additions & 0 deletions ci/scripts/PKGBUILD
@@ -25,6 +25,7 @@ arch=("any")
url="https://arrow.apache.org/"
license=("Apache-2.0")
depends=("${MINGW_PACKAGE_PREFIX}-aws-sdk-cpp"
+ "${MINGW_PACKAGE_PREFIX}-curl" # for google-cloud-cpp bundled build
"${MINGW_PACKAGE_PREFIX}-libutf8proc"
"${MINGW_PACKAGE_PREFIX}-re2"
"${MINGW_PACKAGE_PREFIX}-thrift"
@@ -79,11 +80,13 @@ build() {
export PATH="/C/Rtools${MINGW_PREFIX/mingw/mingw_}/bin:$PATH"
export CPPFLAGS="${CPPFLAGS} -I${MINGW_PREFIX}/include"
export LIBS="-L${MINGW_PREFIX}/libs"
+ export ARROW_GCS=OFF
export ARROW_S3=OFF
export ARROW_WITH_RE2=OFF
# Without this, some dataset functionality segfaults
export CMAKE_UNITY_BUILD=ON
else
+ export ARROW_GCS=ON
export ARROW_S3=ON
export ARROW_WITH_RE2=ON
# Without this, some compute functionality segfaults in tests
@@ -101,6 +104,7 @@ build() {
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_FILESYSTEM=ON \
+ -DARROW_GCS="${ARROW_GCS}" \
-DARROW_HDFS=OFF \
-DARROW_JEMALLOC=OFF \
-DARROW_JSON=ON \
@@ -112,6 +116,7 @@ build() {
-DARROW_SNAPPY_USE_SHARED=OFF \
-DARROW_USE_GLOG=OFF \
-DARROW_UTF8PROC_USE_SHARED=OFF \
+ -DARROW_VERBOSE_THIRDPARTY_BUILD=ON \
-DARROW_WITH_LZ4=ON \
-DARROW_WITH_RE2="${ARROW_WITH_RE2}" \
-DARROW_WITH_SNAPPY=ON \
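In the PKGBUILD hunks, `ARROW_GCS` is switched on and off in the same branches as `ARROW_S3` and then forwarded to CMake. The following is a minimal sketch of that "same handling as ARROW_S3" pattern, not the PKGBUILD's actual code: the diff does not show the condition that selects the branch, so `legacy_toolchain` is a hypothetical stand-in, and `cloud_cmake_flags` is a hypothetical helper. The diff only shows the `-DARROW_GCS` flag; emitting `-DARROW_S3` the same way is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: GCS follows the same ON/OFF decision as S3, and both
# are rendered as CMake -D options. `legacy_toolchain` is a stand-in for the
# PKGBUILD's real branch condition, which the diff does not show.
cloud_cmake_flags() {
  local legacy_toolchain=$1  # "yes" selects the reduced-feature branch
  local gcs s3
  if [ "$legacy_toolchain" = "yes" ]; then
    gcs=OFF
    s3=OFF
  else
    gcs=ON
    s3=ON
  fi
  # One flag per line, mirroring -DARROW_GCS="${ARROW_GCS}" in the PKGBUILD
  printf '%s\n' "-DARROW_GCS=${gcs}" "-DARROW_S3=${s3}"
}

cloud_cmake_flags yes
cloud_cmake_flags no
```

Keeping both toggles behind one decision means a toolchain that cannot build one cloud filesystem gets neither, which matches how the diff pairs `ARROW_GCS` with `ARROW_S3` in each branch.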
6 changes: 3 additions & 3 deletions ci/scripts/r_windows_build.sh
@@ -87,7 +87,7 @@ if [ -d mingw64/lib/ ]; then
# These may be from https://dl.bintray.com/rtools/backports/
cp $MSYS_LIB_DIR/mingw64/lib/lib{thrift,snappy}.a $DST_DIR/${RWINLIB_LIB_DIR}/x64
# These are from https://dl.bintray.com/rtools/mingw{32,64}/
- cp $MSYS_LIB_DIR/mingw64/lib/lib{zstd,lz4,brotli*,crypto,utf8proc,re2,aws*}.a $DST_DIR/lib/x64
+ cp $MSYS_LIB_DIR/mingw64/lib/lib{zstd,lz4,brotli*,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/x64
fi

# Same for the 32-bit versions
@@ -97,15 +97,15 @@ if [ -d mingw32/lib/ ]; then
mkdir -p $DST_DIR/lib/i386
mv mingw32/lib/*.a $DST_DIR/${RWINLIB_LIB_DIR}/i386
cp $MSYS_LIB_DIR/mingw32/lib/lib{thrift,snappy}.a $DST_DIR/${RWINLIB_LIB_DIR}/i386
- cp $MSYS_LIB_DIR/mingw32/lib/lib{zstd,lz4,brotli*,crypto,utf8proc,re2,aws*}.a $DST_DIR/lib/i386
+ cp $MSYS_LIB_DIR/mingw32/lib/lib{zstd,lz4,brotli*,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/i386
fi

# Do the same also for ucrt64
if [ -d ucrt64/lib/ ]; then
ls $MSYS_LIB_DIR/ucrt64/lib/
mkdir -p $DST_DIR/lib/x64-ucrt
mv ucrt64/lib/*.a $DST_DIR/lib/x64-ucrt
- cp $MSYS_LIB_DIR/ucrt64/lib/lib{thrift,snappy,zstd,lz4,brotli*,crypto,utf8proc,re2,aws*}.a $DST_DIR/lib/x64-ucrt
+ cp $MSYS_LIB_DIR/ucrt64/lib/lib{thrift,snappy,zstd,lz4,brotli*,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/x64-ucrt
fi

# Create build artifact
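The updated `cp` lines add `curl` and `ss*` to the brace list. Bash performs brace expansion before pathname expansion, so `lib{...,curl,ss*,...}.a` first becomes separate words such as `libcurl.a` and `libss*.a`, and only then is `libss*.a` matched against the directory. The sketch below demonstrates this in a scratch directory with made-up archive names. Which real archives `ss*` is meant to catch (presumably SSL/SSH libraries that a static libcurl links against, such as `libssl.a` or `libssh2.a`) is an assumption; the diff does not say.

```shell
#!/usr/bin/env bash
# Demonstrate brace expansion followed by globbing, as in the
# r_windows_build.sh cp lines. All archive names here are hypothetical.
demo_dir=$(mktemp -d)

# Fake static archives; libfoo.a should NOT be selected by the pattern.
touch "$demo_dir"/lib{curl,crypto,ssl,ssh2,zstd,foo}.a

# lib{zstd,crypto,curl,ss*}.a -> libzstd.a libcrypto.a libcurl.a libss*.a,
# then libss*.a is globbed against the files in $demo_dir.
selected=$(cd "$demo_dir" && printf '%s\n' lib{zstd,crypto,curl,ss*}.a | LC_ALL=C sort | paste -sd ' ' -)

echo "$selected"  # prints: libcrypto.a libcurl.a libssh2.a libssl.a libzstd.a
rm -rf "$demo_dir"
```

Note that without `nullglob`, a glob that matches nothing is passed through literally, so `cp` would fail on a missing `libss*.a` rather than silently skipping it.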
31 changes: 31 additions & 0 deletions cpp/build-support/google-cloud-cpp-curl-static-windows.patch
@@ -0,0 +1,31 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

diff -ru google_cloud_cpp_ep.orig/cmake/FindCurlWithTargets.cmake google_cloud_cpp_ep/cmake/FindCurlWithTargets.cmake
--- google_cloud_cpp_ep.orig/cmake/FindCurlWithTargets.cmake 2022-04-05 06:00:53.000000000 +0900
+++ google_cloud_cpp_ep/cmake/FindCurlWithTargets.cmake 2022-06-24 10:06:00.177969962 +0900
@@ -68,6 +68,10 @@
TARGET CURL::libcurl
APPEND
PROPERTY INTERFACE_LINK_LIBRARIES crypt32 wsock32 ws2_32)
+ set_property(
+ TARGET CURL::libcurl
+ APPEND
+ PROPERTY INTERFACE_COMPILE_DEFINITIONS "CURL_STATICLIB")
endif ()
if (APPLE)
set_property(