-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41952: [R] Turn S3 and ZSTD on by default for macOS #42210
Conversation
|
@@ -536,7 +536,7 @@ build_libarrow <- function(src_dir, dst_dir) { | |||
} | |||
cleanup(build_dir) | |||
|
|||
env_var_list <- c( | |||
env_var_list <- list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't technically necessary, but it was the easiest way to do what I wanted to in is_feature_requested()
. Further, it's a bit strange that this thing called X_X_list
is not a list.
r/tools/nixlibs.R
Outdated
is_feature_requested <- function(env_varname, env_var_list, default = env_is("LIBARROW_MINIMAL", "false")) { | ||
# look in our env_var_list first, if it's not found there go to | ||
# the actual environment | ||
env_value <- tolower(env_var_list[[env_varname]]) | ||
if (is.null(env_value)) { | ||
env_value <- tolower(Sys.getenv(env_varname)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a change in behavior that warrants some thought. Though I will say this is how I thought this worked already — or rather I thought is_feature_requested()
was looking at env_var_list
and not only looking at the literal environment. It's possible this was a mistake / it should have never looked at the environment but instead be looking at the env_var_list
. But I kept it for now in case this is actually intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original intent of the function was to check whether the user requested a feature. env_var_list
doesn't contain any user-requested features, at least not in the active sense. But now that you're putting some default selections in there, it makes sense to check it here. The function is currently only used to check for S3/GCS, to know whether we need to have openssl and curl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AAAAH I was missing the user-
part there, that makes it make more sense now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See, this is why passive voice is bad.
I also noticed two other optional libraries that were being flagged as not a full build. I added those, and ran mac builder: https://mac.r-project.org/macbuilder/results/1718828010-3fa0fa8b00b3b11c/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like the least bad option for turning these features on. The other places where we handle defaults, like by the value of LIBARROW_MINIMAL
, aren't good for S3/GCS because we have to check up front for openssl/curl.
r/tools/nixlibs.R
Outdated
is_feature_requested <- function(env_varname, env_var_list, default = env_is("LIBARROW_MINIMAL", "false")) { | ||
# look in our env_var_list first, if it's not found there go to | ||
# the actual environment | ||
env_value <- tolower(env_var_list[[env_varname]]) | ||
if (is.null(env_value)) { | ||
env_value <- tolower(Sys.getenv(env_varname)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original intent of the function was to check whether the user requested a feature. env_var_list
doesn't contain any user-requested features, at least not in the active sense. But now that you're putting some default selections in there, it makes sense to check it here. The function is currently only used to check for S3/GCS, to know whether we need to have openssl and curl.
Co-authored-by: Neal Richardson <[email protected]>
I agree + though similar when I checked if S3 alone was sufficient for quieting the message. I'm happy to pull GCS out of this, though IMO we should also add GCS to the block list so that it's not being checked if we do that too. I'm happy to do this in this PR if we want. |
Right, we would want to change that message. On the one hand, users won't generally experience this because they'll get a binary from CRAN. On the other, GCS increases build time there and increases the BDR attack surface. I don't know how big that risk is and if demand for GCS outweighs it--I haven't noticed demand for GCS, but maybe I'm not looking in the right places. I guess we could leave it on, and if we get dinged on CRAN, we know how to turn it back off? |
MM, I think you've convinced me that we should pull GCS back for now (and see if someone requests it before we open up our attack surface more). At the very least, it would be nice to send something with a more minimal expansion at first in case the CRAN macOS builders are different in key ways, we'll know that it's just an issue with S3 or zstd now. And maybe add GCS in 18 (or only when someone asks for it in a CRAN release). Like you said, it's easy enough to see how to change these now. |
@github-actions crossbow submit -g r |
Revision: 548b6b2 Submitted crossbow builds: ursacomputing/crossbow @ actions-e5cdc79a11 |
IMO the CRAN builds should be as feature complete as possible as they are (well after the issue with the last two released binaries maybe used to be...) the way to get the package for mac users and there shouldn't be a reason for users to user alternative sources like r-universe or source build but:
While I would have disagreed previously reality has beaten me down and eh... ok. |
@github-actions crossbow submit -g r |
Revision: b23878a Submitted crossbow builds: ursacomputing/crossbow @ actions-f75e6668f4 |
This is probably paranoid, but I'm going to run one last mc builder run before I merge. |
env_var_list_value <- env_var_list[[env_varname]] | ||
if (is.null(env_var_list_value)) { | ||
env_var_list_value <- "" | ||
} | ||
env_value <- tolower(Sys.getenv(env_varname, env_var_list_value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this equivalent?
env_var_list_value <- env_var_list[[env_varname]] | |
if (is.null(env_var_list_value)) { | |
env_var_list_value <- "" | |
} | |
env_value <- tolower(Sys.getenv(env_varname, env_var_list_value)) | |
env_value <- tolower(Sys.getenv(env_varname, env_var_list[[env_varname]])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I feel slightly better having thought the exact same thing. But turns out: no! env_var_list[[env_varname]]
is NULL
if it doesn't exist (which is what we want) but:
Sys.getenv("env_varname", NULL)
Error in Sys.getenv("env_varname", NULL) : wrong type for argument
Though I only found that out when I pushed (+ got a little too eager and triggered all of crossbow r) and then saw lots of red.
This would be a perfect place for %||%
or the native equivalent, but I didn't want to require rlang here (I mean we require it for the package, so maybe that's ok?)
I'm glad I submitted to mac builder again, cause I noticed there that we're doing something wrong with linking snappy: Which then means when you try to load without having snappy from homebrew it fails. I will look into why this is and push it to this branch. |
@@ -65,6 +65,7 @@ esac | |||
mkdir -p "${BUILD_DIR}" | |||
pushd "${BUILD_DIR}" | |||
${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \ | |||
-DARROW_SNAPPY_USE_SHARED=OFF \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prevents shared (homebrew) snappy from being used on the builder. Are there other dependencies we should add here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARROW_DEPENDENCY_USE_SHARED
should be the default for all the more specific versions, so just setting that should work?
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 67bbf84. There were 4 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 88 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Changeup
nixlibs.R
so that we enable S3 and ZSTD by default on CRAN. I've checked this against the CRAN macbuilder to confirm it does work (that's not a guarantee it'll work on other CRAN maintained macOS builders, but a good indication).It also removes gcs from the list of features we expect to be on and warn folks about if it is not.
Rationale for this change
So that our builds are more fully featured.
What changes are included in this PR?
Enable
ARROW_S3
by default when building form source on macOS.Are these changes tested?
Existing CI should not fail.
Are there any user-facing changes?
Getting Arrow from the canonical repository would be more fully featured.