# Running the ETL Script

8/23/24

The following instructions describe how to run the ETL scripts. The last step includes CSV export(s) to `data-app`, so that people who don't want to interact with our Postgres database can still build apps and data visualizations.
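As a rough illustration of that last step, a CSV export in R might look like the sketch below. The data frame, columns, and path here are hypothetical placeholders; the real export happens in `04z_app_load.R` (described in the table further down):

```r
library(readr)

# Hypothetical example of the final export step: the table name, columns,
# and destination path are placeholders, not the project's real ones.
app_table <- data.frame(bill_id = c(101L, 102L), status = c("passed", "pending"))
write_csv(app_table, "data-app/app_table.csv")
```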

## Initial Setup

Prior to running these scripts, you'll need to complete the setup steps below.

### Developing the pipeline

See the ETL and Web App Development procedures for more guidance:

1. Switch to the appropriate branch (`feature/*`, `develop`, `staging`, `production`).
2. Configure `setting_env` in `functions_database.R` to specify whether you're developing the staging or ETL pipeline (a sketch follows this list).
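
A minimal sketch of what that configuration might look like. The variable values shown are assumptions; check `functions_database.R` itself for the accepted options:

```r
# Hypothetical sketch: near the top of functions_database.R, setting_env
# selects which pipeline you are developing. The exact accepted values
# are assumptions here -- confirm them in the script itself.
setting_env <- "staging"  # or "etl"
```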

## Running the pipeline

1. Configure `setting_env` in `functions_database.R` to specify whether you're developing the staging or ETL pipeline.
2. Run `scripts/etl_main.R`, which calls the following scripts in sequence (a sketch of this sequence follows the table):
| Script | Description |
| --- | --- |
| `functions_database.R` | functions to connect to Postgres, write tables, and test inputs |
| `01_request_api_legiscan.R` | requests data from LegiScan via API |
| `02a_raw_parse_legiscan.R` | parses LegiScan JSON data |
| `02b_raw_read_csvs.R` | reads CSV files, including user-entered data and exported Dave's Redistricting data |
| `02z_raw_load.R` | saves all acquired data into Postgres as the raw layer |
| `03a_process.R` | organizes and adds calculations to parsed and user-entered data |
| `03z_process_load.R` | writes organized data frames (processed layer) to Postgres |
| `04a_app_settings.R` | creates views based on settings |
| `04b_app_prep.R` | prepares and filters data for web apps |
| `04z_app_load.R` | writes app data to Postgres, and exports data to CSV |
| `qa_checks.R` | reviews raw and processed data frames for missing records and other anomalies |
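
For orientation, here is a minimal sketch of how `scripts/etl_main.R` might source these scripts in order. This is an assumption about the file's structure (including the `scripts/` paths), not its actual contents:

```r
# Hypothetical sketch of scripts/etl_main.R -- the real script may pass
# arguments, log progress, or handle errors differently.
source("scripts/functions_database.R")      # Postgres connection helpers
source("scripts/01_request_api_legiscan.R") # pull data from LegiScan
source("scripts/02a_raw_parse_legiscan.R")  # parse LegiScan JSON
source("scripts/02b_raw_read_csvs.R")       # read user-entered / DRA CSVs
source("scripts/02z_raw_load.R")            # load raw layer to Postgres
source("scripts/03a_process.R")             # organize and add calculations
source("scripts/03z_process_load.R")        # load processed layer
source("scripts/04a_app_settings.R")        # create settings-based views
source("scripts/04b_app_prep.R")            # prepare/filter app data
source("scripts/04z_app_load.R")            # load app layer, export CSVs
source("scripts/qa_checks.R")               # QA on raw and processed layers
```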