Bsweger/skip data checks option #47

Merged

Commits on Jul 25, 2024

  1. Add skip_checks parameter to hub connection functions

    This changeset adds an optional skip_checks parameter to
    connect_hub.R and connect_model_output.R per the requirements
    outlined in hubverse-org#37.
    
    When working with hub data on a local filesystem, the behavior
    is unchanged. When working with hub data in an S3 bucket, the
    connect functions will now skip data checks by default to
    improve performance. The former connection behavior for
    S3-based hubs can be obtained by explicitly setting
    skip_checks=FALSE.
    
    This commit fixes the test suite to work when using
    skip_checks=FALSE to force the previous behavior. The
    next commit will add new tests to ensure the new behavior
    works as intended.
    bsweger committed Jul 25, 2024
    2795333
  2. Test S3-based hubs with skip_checks = TRUE

    This changeset updates the test suite to test the behavior
    of skip_checks = TRUE (which is the default for S3-based hubs).
    However, the code as written will not work when there are multiple file
    types (e.g., csv and parquet), because it performs an Arrow
    open_dataset for each file type. That doesn't work when
    exclude_invalid_files is FALSE because open_dataset will then
    grab every file every time it is run.
    bsweger committed Jul 25, 2024
    255b505
  3. 6ae859d
  4. 2d11380
  5. Disallow skip_checks = TRUE when hub has multiple file formats

    Because connect_hub and connect_model_output rely on the use of
    "exclude_invalid_files=TRUE" when making multiple passes of
    arrow::open_dataset (one for each file format), we cannot allow
    skip_checks=TRUE for hubs that contain more than one model-output
    format. Otherwise, open_dataset would grab all the files every
    time and cause errors when a user tries to run queries against
    the resulting arrow table.
    bsweger committed Jul 25, 2024
    090f78c
  6. Documentation updates

    bsweger committed Jul 25, 2024
    0bee551
  7. Appease the linter

    bsweger committed Jul 25, 2024
    59d8bc6
  8. Updates from code review

    bsweger committed Jul 25, 2024
    3dbe9e9
  9. b6ecaae
  10. 250301f
  11. 762a001
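
The rules described in commits 1 and 5 can be sketched in a few lines of base R. This is only an illustration of the decision logic, not the code from connect_hub.R: `resolve_skip_checks()`, its arguments, and the error text are hypothetical names introduced here. The commit messages also do not say what happens when the S3 default meets a hub with several file formats; the sketch assumes the same error is raised in that case.

```r
# Hypothetical helper (not part of the package): models the skip_checks
# rules described in the commit messages above.
resolve_skip_checks <- function(is_s3, file_formats, skip_checks = NULL) {
  # Commit 1: when the caller does not choose, S3-based hubs skip the data
  # checks and local hubs keep them.
  if (is.null(skip_checks)) {
    skip_checks <- is_s3
  }

  # Commit 5: with more than one model-output format, open_dataset is run
  # once per format and depends on exclude_invalid_files = TRUE to ignore
  # files of the other formats, so skipping checks is not allowed.
  # Assumption: the S3 default is treated the same as an explicit TRUE.
  if (isTRUE(skip_checks) && length(unique(file_formats)) > 1L) {
    stop(
      "skip_checks = TRUE is not supported for hubs with multiple file formats.",
      call. = FALSE
    )
  }

  skip_checks
}

# S3 hub with a single format: checks are skipped by default.
resolve_skip_checks(is_s3 = TRUE, file_formats = "parquet")

# Forcing the previous behavior on an S3 hub with two formats.
resolve_skip_checks(
  is_s3 = TRUE,
  file_formats = c("csv", "parquet"),
  skip_checks = FALSE
)
```

Under these assumptions, passing skip_checks = FALSE always succeeds, which matches the commit 1 note that the former S3 behavior remains available.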