Bsweger/skip data checks option #47
Merged: annakrystalli merged 11 commits into hubverse-org:main from bsweger:bsweger/skip_data_checks_option on Jul 26, 2024
Commits on Jul 25, 2024
Add skip_checks parameter to hub connection functions
This changeset adds an optional skip_checks parameter to connect_hub.R and connect_model_output.R, per the requirements outlined in hubverse-org#37. When working with hub data on a local filesystem, the behavior is unchanged. When working with hub data in an S3 bucket, the connect functions now skip data checks by default to improve performance. The former connection behavior for S3-based hubs can be obtained by explicitly setting skip_checks = FALSE. This commit fixes the test suite to work when using skip_checks = FALSE to force the previous behavior. The next commit will add new tests to ensure the new behavior works as intended.
Commit 2795333
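As a rough illustration of the default described in commit 2795333, the sketch below resolves the effective skip_checks value from whether the hub lives in S3. The helper `resolve_skip_checks()` and its `hub_is_s3` argument are hypothetical, not part of the package API, and the sketch assumes that `NULL` means "use the default" and that an explicit TRUE/FALSE is honoured for both local and S3 hubs.

```r
# Hypothetical helper (not part of the package): resolve the effective value
# of skip_checks. An explicit TRUE/FALSE from the caller always wins (assumption);
# otherwise S3-based hubs default to skipping checks for performance, and
# local hubs keep the previous, checked behavior.
resolve_skip_checks <- function(hub_is_s3, skip_checks = NULL) {
  if (!is.null(skip_checks)) {
    stopifnot(is.logical(skip_checks), length(skip_checks) == 1, !is.na(skip_checks))
    return(skip_checks)
  }
  isTRUE(hub_is_s3)
}

resolve_skip_checks(hub_is_s3 = TRUE)                       # TRUE
resolve_skip_checks(hub_is_s3 = TRUE, skip_checks = FALSE)  # FALSE: the former S3 behavior
resolve_skip_checks(hub_is_s3 = FALSE)                      # FALSE: local default unchanged
```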
Test S3-based hubs with skip_checks = TRUE
This changeset updates the test suite to test the behavior of skip_checks = TRUE (the default for S3-based hubs). However, the code as written will not work when there are multiple file types (e.g., csv and parquet), because it performs an Arrow open_dataset for each file type. That doesn't work when exclude_invalid_files is FALSE, because open_dataset then grabs every file every time it is run.
Commit 255b505
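The failure mode described in commit 255b505 can be simulated without Arrow. The sketch below is a hypothetical stand-in for one open_dataset pass per file format: `files_for_pass()` and the example file paths are invented for illustration, and the "take every file when exclusion is off" rule is taken from the commit message rather than from Arrow's source.

```r
# Hypothetical simulation (no arrow calls): which files a single
# open_dataset()-style pass would pick up for one file format.
# With exclusion on, only files with that extension are kept;
# with exclusion off, every file in the directory is returned.
files_for_pass <- function(files, format, exclude_invalid_files) {
  if (exclude_invalid_files) {
    files[tools::file_ext(files) == format]
  } else {
    files
  }
}

model_output <- c("team1/2024-07-01-team1.csv", "team2/2024-07-01-team2.parquet")

# One pass per file format, then combine the results
passes_checked <- lapply(c("csv", "parquet"), function(fmt) {
  files_for_pass(model_output, fmt, exclude_invalid_files = TRUE)
})
passes_skipped <- lapply(c("csv", "parquet"), function(fmt) {
  files_for_pass(model_output, fmt, exclude_invalid_files = FALSE)
})

length(unlist(passes_checked))  # 2: each file is read once, by the matching format
length(unlist(passes_skipped))  # 4: each file is picked up by both passes
```

In the second case the csv pass also receives the parquet file (and vice versa), which is why the combined dataset breaks when queried.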
Commit 6ae859d
Commit 2d11380
Disallow skip_checks = TRUE when hub has multiple file formats
Because connect_hub and connect_model_output rely on the use of "exclude_invalid_files=TRUE" when making multiple passes of arrow::open_dataset (one for each file format), we cannot allow skip_checks=TRUE for hubs that contain more than one model-output format. Otherwise, open_dataset would grab all the files every time and cause errors when a user tries to run queries against the resulting arrow table.
Commit 090f78c
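The rule introduced in commit 090f78c amounts to a guard run before the per-format passes. The sketch below is a hypothetical version of that guard, not the package's actual code: `check_skip_checks()` is an invented name, and its return value simply mirrors the `exclude_invalid_files` entry that arrow's R `open_dataset()` accepts through its `factory_options` list.

```r
# Hypothetical guard (not the package's actual code): refuse
# skip_checks = TRUE when the hub mixes model-output formats, because the
# per-format open_dataset() passes rely on exclude_invalid_files = TRUE.
check_skip_checks <- function(skip_checks, file_formats) {
  formats <- unique(file_formats)
  if (isTRUE(skip_checks) && length(formats) > 1) {
    stop(
      "skip_checks = TRUE is not supported for hubs with multiple ",
      "model-output file formats (found: ", paste(formats, collapse = ", "), ").",
      call. = FALSE
    )
  }
  # Options for each pass: exclude invalid files whenever checks are not skipped
  list(exclude_invalid_files = !isTRUE(skip_checks))
}

check_skip_checks(skip_checks = TRUE, file_formats = "parquet")
check_skip_checks(skip_checks = FALSE, file_formats = c("csv", "parquet"))
try(check_skip_checks(skip_checks = TRUE, file_formats = c("csv", "parquet")))
```

Failing early with a clear message is preferable to letting the connection succeed and surfacing the problem only when a user queries the resulting Arrow dataset.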
Commit 0bee551
Commit 59d8bc6
Commit 3dbe9e9
Commit b6ecaae
Commit 250301f
Commit 762a001