Adding parallel, filtering by county, fixing CI, and R CMD check fixes #50

Merged May 10, 2022 (11 commits)

1beb commented May 6, 2022

  1. Added parallel processing to census_geo_api to speed up the API calls made via the get_census_api function. The implementation uses a county-per-thread model, and the user is expected to specify their own plan, for example: library(future); plan(multisession). It currently uses furrr::future_map_dfr with a progress indicator. This reduces the processing time for a state from more than a day to less than an hour.
  2. Converted print to message so that status output follows the stdout/stderr convention (message writes to stderr).
  3. Set the default number of retries to 3, so that minor internet instability no longer requires a process restart.
  4. Added log messages to identify the start of a long process.
  5. Updated DESCRIPTION to alphabetize imports and add future, furrr, and purrr.
  6. Updated roxygen docs.
  7. Added wru.png to .Rbuildignore to silence an R CMD check note.
  8. Integrated changes with hwru branch
  9. Implemented piggyback for last_c, mid_c, and first_c files, temporarily using private repo "solivella/wruData"
  10. Added a "use_counties = TRUE" flag so that census data is fetched only for those counties that appear in the voter.file data.
  11. Added functionality for reading CENSUS_API_KEY from env var, to align with tidycensus. Updated readme with instructions.
  12. Fixed undocumented or missing arguments in a number of functions.
  13. Fixed tests.
  14. Fixed CI.
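
For readers unfamiliar with the pattern in items 1, 3, and 11, here is a minimal sketch of how a user-selected future plan, a per-county furrr::future_map_dfr call, a retry wrapper, and an API key read from the CENSUS_API_KEY environment variable might fit together. It is not the code from this PR: fetch_county and with_retries are hypothetical stand-ins, the county codes are illustrative, and treating "retries" as extra attempts after the first is an assumption.

```r
library(future)
library(furrr)

# Hypothetical stand-in for a per-county Census API request.
# This is NOT wru's real function; it only returns a one-row data frame.
fetch_county <- function(county, key) {
  data.frame(county = county, has_key = nzchar(key), stringsAsFactors = FALSE)
}

# Hypothetical retry helper: calls f() once, then up to `retries` more times
# if it throws an error (assumption: "retries" means extra attempts).
with_retries <- function(f, retries = 3) {
  for (attempt in seq_len(retries + 1)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
  }
  stop(result)  # re-signal the last error once all attempts are used up
}

# Read the key from the environment; Sys.getenv returns "" if it is unset.
key <- Sys.getenv("CENSUS_API_KEY")

# The user picks the plan; each county is handled by one worker.
plan(multisession, workers = 2)

counties <- c("001", "003", "005")  # illustrative county codes
message("Fetching census data for ", length(counties), " counties...")

# Row-bind the per-county results, showing a progress indicator.
census <- future_map_dfr(
  counties,
  function(cty) with_retries(function() fetch_county(cty, key)),
  .progress = TRUE
)

plan(sequential)  # restore the default plan
```

Leaving the plan to the caller means the same function can run sequentially in tests and in parallel for a full state without any change to the package code.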

1beb commented May 7, 2022

Notes:

  • In sample data (data(voters)) when running by block with surname = F
  • In new branch, data(voters) includes a duplicate column named "last" where "surname" already exists. "first" was also added.
  • When using the new use_counties flag, there is an unexpected variation in results, which I noted in Slack.
  • 12x speed up using use_counties flag.
  • Tests are passing locally (with CENSUS_API_KEY in .Rprofile). However, two probabilities have changed and need to be verified.

1beb requested a review from solivella May 7, 2022 15:55

1beb commented May 7, 2022

@solivella ready for review.

1beb changed the title from "WIP: Updates for wru" to "Adding parallel, filtering by county, fixing CI, and R CMD check fixes" May 7, 2022
solivella left a comment

Awesome. Thank you, @1beb!

solivella merged commit 0b4c4d0 into kosukeimai:hwru May 10, 2022
.hasData()

## Preliminary Data quality checks
wru_data_preflight()

@solivella here.
