# Running the ETL Script

8/23/24

The following instructions describe how to run the ETL scripts. The last step includes CSV export(s) to `data-app`, so that people who don't want to interact with our Postgres database can still build apps and data visualizations.
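As a rough illustration of that last step, a CSV export in R might look like the sketch below. The data frame, columns, and path here are hypothetical placeholders; the real export happens in `04z_app_load.R` (described in the table further down):

```r
library(readr)

# Hypothetical example of the final export step: the table name, columns,
# and destination path are placeholders, not the project's real ones.
app_table <- data.frame(bill_id = c(101L, 102L), status = c("passed", "pending"))
write_csv(app_table, "data-app/app_table.csv")
```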

## Initial Setup

Prior to running these scripts, you'll need to complete the setup steps below.

### Developing the pipeline

See the ETL and Web App Development procedures for more guidance:

1. Switch to the appropriate branch (`feature/*`, `develop`, `staging`, `production`).
2. Configure `setting_env` in `functions_database.R` to specify whether you're developing the staging or ETL pipeline (a sketch follows this list).
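
A minimal sketch of what that configuration might look like. The variable values shown are assumptions; check `functions_database.R` itself for the accepted options:

```r
# Hypothetical sketch: near the top of functions_database.R, setting_env
# selects which pipeline you are developing. The exact accepted values
# are assumptions here -- confirm them in the script itself.
setting_env <- "staging"  # or "etl"
```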

## Running the pipeline

1. Configure `setting_env` in `functions_database.R` to specify whether you're developing the staging or ETL pipeline.
2. Run `scripts/etl_main.R`, which calls the following scripts in sequence (a sketch of this sequence follows the table):
| Script | Description |
| --- | --- |
| `functions_database.R` | functions to connect to Postgres, write tables, and test inputs |
| `01_request_api_legiscan.R` | requests data from LegiScan via API |
| `02a_raw_parse_legiscan.R` | parses LegiScan JSON data |
| `02b_raw_read_csvs.R` | reads CSV files, including user-entered data and exported Dave's Redistricting data |
| `02z_raw_load.R` | saves all acquired data into Postgres as the raw layer |
| `03a_process.R` | organizes and adds calculations to parsed and user-entered data |
| `03z_process_load.R` | writes organized data frames (processed layer) to Postgres |
| `04a_app_settings.R` | creates views based on settings |
| `04b_app_prep.R` | prepares and filters data for web apps |
| `04z_app_load.R` | writes app data to Postgres, and exports data to CSV |
| `qa_checks.R` | reviews raw and processed data frames for missing records and other anomalies |
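
For orientation, here is a minimal sketch of how `scripts/etl_main.R` might source these scripts in order. This is an assumption about the file's structure (including the `scripts/` paths), not its actual contents:

```r
# Hypothetical sketch of scripts/etl_main.R -- the real script may pass
# arguments, log progress, or handle errors differently.
source("scripts/functions_database.R")      # Postgres connection helpers
source("scripts/01_request_api_legiscan.R") # pull data from LegiScan
source("scripts/02a_raw_parse_legiscan.R")  # parse LegiScan JSON
source("scripts/02b_raw_read_csvs.R")       # read user-entered / DRA CSVs
source("scripts/02z_raw_load.R")            # load raw layer to Postgres
source("scripts/03a_process.R")             # organize and add calculations
source("scripts/03z_process_load.R")        # load processed layer
source("scripts/04a_app_settings.R")        # create settings-based views
source("scripts/04b_app_prep.R")            # prepare/filter app data
source("scripts/04z_app_load.R")            # load app layer, export CSVs
source("scripts/qa_checks.R")               # QA on raw and processed layers
```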