GH-41952: [R] Turn S3 and ZSTD on by default for macOS #42210

Merged 6 commits on Jun 23, 2024

Member

@jonkeane jonkeane commented Jun 19, 2024

Change up nixlibs.R so that we enable S3 and ZSTD by default on CRAN. I've checked this against the CRAN macbuilder to confirm it does work (that's not a guarantee it'll work on other CRAN-maintained macOS builders, but a good indication).

It also removes GCS from the list of features that we expect to be on and that we warn folks about when they are not.

Rationale for this change

So that our builds are more fully featured.

What changes are included in this PR?

Enable ARROW_S3 by default when building from source on macOS.

Are these changes tested?

Existing CI should not fail.

Are there any user-facing changes?

Getting Arrow from the canonical repository would be more fully featured.

⚠️ GitHub issue #41952 has been automatically assigned in GitHub to PR creator.

@@ -536,7 +536,7 @@ build_libarrow <- function(src_dir, dst_dir) {
   }
   cleanup(build_dir)

-  env_var_list <- c(
+  env_var_list <- list(
Member Author

This isn't technically necessary, but it was the easiest way to do what I wanted to in is_feature_requested(). Further, it's a bit strange that this thing called X_X_list is not a list.
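
A minimal sketch of why the container type matters for `[[` lookups. The variable names and values below are made up for illustration and are not taken from nixlibs.R. In R, indexing a list by a name it does not contain yields `NULL`, while a named character vector built with `c()` raises an error for the same lookup.

```r
# Illustrative values only; these are not the real contents of env_var_list.
as_vector <- c(ARROW_S3 = "ON")
as_list <- list(ARROW_S3 = "ON")

# A list returns NULL for a name it does not contain.
missing_from_list <- as_list[["ARROW_GCS"]]
is.null(missing_from_list)

# A named character vector raises an error instead, so we catch it here.
missing_from_vector <- tryCatch(
  as_vector[["ARROW_GCS"]],
  error = function(e) "error"
)
missing_from_vector
```

With a list, a caller can test the result with `is.null()` instead of having to guard every lookup against an error.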

@github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label on Jun 19, 2024
Comment on lines 819 to 825
is_feature_requested <- function(env_varname, env_var_list, default = env_is("LIBARROW_MINIMAL", "false")) {
  # look in our env_var_list first, if it's not found there go to
  # the actual environment
  env_value <- tolower(env_var_list[[env_varname]])
  if (is.null(env_value)) {
    env_value <- tolower(Sys.getenv(env_varname))
  }
Member Author

This is a change in behavior that warrants some thought. Though I will say this is how I thought this worked already — or rather I thought is_feature_requested() was looking at env_var_list and not only looking at the literal environment. It's possible this was a mistake / it should have never looked at the environment but instead be looking at the env_var_list. But I kept it for now in case this is actually intended.

Member

The original intent of the function was to check whether the user requested a feature. env_var_list doesn't contain any user-requested features, at least not in the active sense. But now that you're putting some default selections in there, it makes sense to check it here. The function is currently only used to check for S3/GCS, to know whether we need to have openssl and curl.
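
The lookup order discussed in this thread (defaults list first, then the real environment) can be sketched as follows. `lookup_feature_value` and the `DEMO_*` variable names are hypothetical and are not part of nixlibs.R. The `NULL` check happens before `tolower()` because `tolower(NULL)` returns `character(0)`, which `is.null()` does not treat as `NULL`.

```r
# Hypothetical helper, not the code from nixlibs.R: consult a list of default
# selections first and fall back to the process environment.
lookup_feature_value <- function(env_varname, env_var_list) {
  value <- env_var_list[[env_varname]]
  if (is.null(value)) {
    # Sys.getenv() returns "" when the variable is unset.
    value <- Sys.getenv(env_varname)
  }
  tolower(value)
}

# Assumed defaults for illustration only.
demo_defaults <- list(DEMO_S3 = "ON")

Sys.setenv(DEMO_GCS = "OFF")
from_list <- lookup_feature_value("DEMO_S3", demo_defaults)  # found in the list
from_env <- lookup_feature_value("DEMO_GCS", demo_defaults)  # found in the environment
Sys.unsetenv("DEMO_GCS")
```

In this sketch the list takes priority, so a value in the environment is only used when the list has no entry for that name.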

Member Author

AAAAH I was missing the user- part there, that makes it make more sense now.

Member

See, this is why passive voice is bad.

@github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on Jun 19, 2024
@jonkeane changed the title from "GH-41952: [R] Turn S3 on by default for macOS" to "GH-41952: [R] Turn S3, GCS, and ZSTD on by default for macOS" on Jun 19, 2024
@jonkeane
Member Author

I also noticed two other optional libraries that were being flagged as not a full build. I added those, and ran mac builder: https://mac.r-project.org/macbuilder/results/1718828010-3fa0fa8b00b3b11c/

Member

@nealrichardson nealrichardson left a comment

This seems like the least bad option for turning these features on. The other places where we handle defaults, like by the value of LIBARROW_MINIMAL, aren't good for S3/GCS because we have to check up front for openssl/curl.

r/tools/nixlibs.R (outdated)

Co-authored-by: Neal Richardson <[email protected]>
@github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label on Jun 20, 2024
@jonkeane
Member Author

jonkeane commented Jun 20, 2024

> I also wonder whether GCS is worth including.

I agree + thought similarly when I checked whether S3 alone was sufficient for quieting the message. I'm happy to pull GCS out of this, though IMO if we do that we should also add GCS to the block list so that it's no longer being checked.

I'm happy to do this in this PR if we want.

@nealrichardson
Member

> > I also wonder whether GCS is worth including.
>
> I agree + thought similarly when I checked whether S3 alone was sufficient for quieting the message. I'm happy to pull GCS out of this, though IMO if we do that we should also add GCS to the block list so that it's no longer being checked.
>
> I'm happy to do this in this PR if we want.

Right, we would want to change that message.

On the one hand, users won't generally experience this because they'll get a binary from CRAN. On the other, GCS increases build time there and increases the BDR attack surface. I don't know how big that risk is and if demand for GCS outweighs it--I haven't noticed demand for GCS, but maybe I'm not looking in the right places.

I guess we could leave it on, and if we get dinged on CRAN, we know how to turn it back off?

@jonkeane
Member Author

MM, I think you've convinced me that we should pull GCS back for now (and see if someone requests it before we open up our attack surface more). At the very least, it would be nice to send something with a more minimal expansion at first: if the CRAN macOS builders are different in key ways, we'll know that the issue is with S3 or zstd. And maybe add GCS in 18 (or only when someone asks for it in a CRAN release). Like you said, it's easy enough to see how to change these now.

@github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on Jun 20, 2024
@jonkeane changed the title from "GH-41952: [R] Turn S3, GCS, and ZSTD on by default for macOS" to "GH-41952: [R] Turn S3 and ZSTD on by default for macOS" on Jun 20, 2024
@jonkeane
Member Author

@github-actions crossbow submit -g r

Revision: 548b6b2

Submitted crossbow builds: ursacomputing/crossbow @ actions-e5cdc79a11

Task Status
r-binary-packages GitHub Actions
test-r-arrow-backwards-compatibility GitHub Actions
test-r-clang-sanitizer GitHub Actions
test-r-depsource-bundled Azure
test-r-depsource-system GitHub Actions
test-r-dev-duckdb GitHub Actions
test-r-devdocs GitHub Actions
test-r-gcc-11 GitHub Actions
test-r-gcc-12 GitHub Actions
test-r-install-local GitHub Actions
test-r-install-local-minsizerel GitHub Actions
test-r-linux-as-cran GitHub Actions
test-r-linux-rchk GitHub Actions
test-r-linux-valgrind GitHub Actions
test-r-minimal-build Azure
test-r-offline-maximal GitHub Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-release-latest Azure
test-r-rocker-r-ver-latest Azure
test-r-rstudio-r-base-4.1-opensuse155 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 GitHub Actions
test-r-versions GitHub Actions
test-ubuntu-r-sanitizer GitHub Actions

@assignUser
Member

> I also wonder whether GCS is worth including.

IMO the CRAN builds should be as feature-complete as possible, as they are (well, after the issue with the last two released binaries, maybe used to be...) the way to get the package for Mac users, and there shouldn't be a reason for users to use alternative sources like r-universe or a source build, but:

> On the other, GCS increases build time there and increases the BDR attack surface.

While I would have disagreed previously, reality has beaten me down and eh... ok.

@jonkeane
Member Author

@github-actions crossbow submit -g r

Revision: b23878a

Submitted crossbow builds: ursacomputing/crossbow @ actions-f75e6668f4

Task Status
r-binary-packages GitHub Actions
test-r-arrow-backwards-compatibility GitHub Actions
test-r-clang-sanitizer GitHub Actions
test-r-depsource-bundled Azure
test-r-depsource-system GitHub Actions
test-r-dev-duckdb GitHub Actions
test-r-devdocs GitHub Actions
test-r-gcc-11 GitHub Actions
test-r-gcc-12 GitHub Actions
test-r-install-local GitHub Actions
test-r-install-local-minsizerel GitHub Actions
test-r-linux-as-cran GitHub Actions
test-r-linux-rchk GitHub Actions
test-r-linux-valgrind GitHub Actions
test-r-minimal-build Azure
test-r-offline-maximal GitHub Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-release-latest Azure
test-r-rocker-r-ver-latest Azure
test-r-rstudio-r-base-4.1-opensuse155 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 GitHub Actions
test-r-versions GitHub Actions
test-ubuntu-r-sanitizer GitHub Actions

@jonkeane
Member Author

This is probably paranoid, but I'm going to run one last mac builder run before I merge.

Comment on lines +830 to +834
env_var_list_value <- env_var_list[[env_varname]]
if (is.null(env_var_list_value)) {
  env_var_list_value <- ""
}
env_value <- tolower(Sys.getenv(env_varname, env_var_list_value))
Member

Is this equivalent?

Suggested change
- env_var_list_value <- env_var_list[[env_varname]]
- if (is.null(env_var_list_value)) {
-   env_var_list_value <- ""
- }
- env_value <- tolower(Sys.getenv(env_varname, env_var_list_value))
+ env_value <- tolower(Sys.getenv(env_varname, env_var_list[[env_varname]]))

Member Author

Well I feel slightly better having thought the exact same thing. But turns out: no! env_var_list[[env_varname]] is NULL if it doesn't exist (which is what we want) but:

Sys.getenv("env_varname", NULL)
Error in Sys.getenv("env_varname", NULL) : wrong type for argument

Though I only found that out when I pushed (+ got a little too eager and triggered all of crossbow r) and then saw lots of red.

This would be a perfect place for %||% or the native equivalent, but I didn't want to require rlang here (I mean we require it for the package, so maybe that's ok?)
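
A minimal sketch of the null-coalescing approach mentioned above, under stated assumptions: the `%||%` operator is defined locally here so that no package is required, and `feature_value` plus the `DEMO_*` names are hypothetical, not code from nixlibs.R. The idea is to turn a missing list entry into `""` before it reaches the `unset` argument of `Sys.getenv()`, which does not accept `NULL`.

```r
# Local stand-in for a null-coalescing operator: use y only when x is NULL.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical helper: the environment wins, the list supplies the fallback,
# and a missing list entry becomes "" so that `unset` is always a string.
feature_value <- function(env_varname, env_var_list) {
  tolower(Sys.getenv(env_varname, env_var_list[[env_varname]] %||% ""))
}

# Assumed defaults for illustration only.
demo_defaults <- list(DEMO_S3 = "ON")

Sys.unsetenv(c("DEMO_S3", "DEMO_GCS"))
fallback_from_list <- feature_value("DEMO_S3", demo_defaults)
not_set_anywhere <- feature_value("DEMO_GCS", demo_defaults)

Sys.setenv(DEMO_S3 = "OFF")
overridden_by_env <- feature_value("DEMO_S3", demo_defaults)
Sys.unsetenv("DEMO_S3")
```

This keeps the one-line shape of the suggested change while avoiding the error that a `NULL` default triggers.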

@github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label on Jun 21, 2024
@jonkeane
Member Author

I'm glad I submitted to mac builder again, cause I noticed there that we're doing something wrong with linking snappy: ... -lparquet -larrow_acero -larrow -larrow_bundled_dependencies /opt/homebrew/lib/libsnappy.1.1.9.dylib ...

Which then means when you try to load without having snappy from homebrew it fails. I will look into why this is and push it to this branch.

@github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on Jun 21, 2024
@@ -65,6 +65,7 @@ esac
 mkdir -p "${BUILD_DIR}"
 pushd "${BUILD_DIR}"
 ${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \
+         -DARROW_SNAPPY_USE_SHARED=OFF \
Member Author

This prevents shared (homebrew) snappy from being used on the builder. Are there other dependencies we should add here?

Member

ARROW_DEPENDENCY_USE_SHARED should be the default for all the more specific versions, so just setting that should work?

@github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label on Jun 21, 2024
@jonkeane jonkeane merged commit 67bbf84 into apache:main Jun 23, 2024
12 checks passed
@jonkeane removed the "awaiting changes" label on Jun 23, 2024
@jonkeane jonkeane deleted the macos_default_s3 branch June 23, 2024 21:03
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 67bbf84.

There were 4 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 88 possible false positives for unstable benchmarks that are known to sometimes produce them.
