This is a compilation of my data pipeline scripts written in Python.
Each pipeline function is an executable Python file that accepts command-line flags to modify that pipeline's specific configuration (e.g. MSSQL database name, GCS bucket name).
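As a sketch of how such a flag-driven entry point might look (the file name, flag names, and behavior below are assumptions for illustration, not an actual script in functions/):

```python
# functions/example_pipeline.py (hypothetical) -- illustrates the flag pattern only.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Example pipeline: copy data from an MSSQL database into a GCS bucket."
    )
    parser.add_argument("--db-name", required=True, help="MSSQL database to pull from")
    parser.add_argument("--bucket-name", required=True, help="GCS bucket to write to")
    args = parser.parse_args()

    # The real extract/load logic would live in reusable code under functions/utils/.
    print(f"Would copy data from {args.db_name} into gs://{args.bucket_name}/")


if __name__ == "__main__":
    main()
```

Such a script would be invoked as, for example, `python functions/example_pipeline.py --db-name Sales --bucket-name my-landing-bucket` (both values hypothetical).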
When loading data from a third-party source, you can stage it temporarily in the data/ folder. Once the data has been successfully ingested, remove the file from data/.
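A minimal sketch of that landing-zone lifecycle, assuming a hypothetical file name and a placeholder ingest step:

```python
# Hypothetical ingestion step: stage a third-party file under data/, remove it after a successful load.
from pathlib import Path

DATA_DIR = Path("data")


def ingest(staged_file: Path) -> None:
    """Placeholder for the real load into the target (e.g. MSSQL or GCS)."""
    print(f"Ingesting {staged_file} ...")


def run() -> None:
    DATA_DIR.mkdir(exist_ok=True)
    staged = DATA_DIR / "third_party_export.csv"  # assumed file name
    staged.write_text("id,value\n1,example\n")    # stand-in for the real download

    ingest(staged)
    # Remove the file only after a successful ingest, per the convention above.
    staged.unlink(missing_ok=True)


if __name__ == "__main__":
    run()
```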
- config/
- contains any specific configurations that need to be modified within the Docker container
- data/
- a temporary landing zone for any data that is ingested from a third-party source
- functions/
- contains all pipeline functions
- functions/utils/
- contains all reusable code, which can be organized further as either a Source (where data is pulled from) or a Target (where data is placed); see the sketch after this list
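One way that Source/Target split could look in code (the class and function names are assumptions, not the actual contents of functions/utils/):

```python
# functions/utils/ -- hypothetical shape of the Source/Target split.
from typing import Iterable, Protocol


class Source(Protocol):
    """Anything data is pulled from (e.g. an MSSQL database)."""

    def read(self) -> Iterable[dict]: ...


class Target(Protocol):
    """Anywhere data is placed (e.g. a GCS bucket)."""

    def write(self, rows: Iterable[dict]) -> None: ...


def run_pipeline(source: Source, target: Target) -> None:
    """Reusable glue that any pipeline function can call."""
    target.write(source.read())
```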
- Create and activate a Python virtual environment.
python3 -m venv venv
source venv/bin/activate
- Install Python dependencies.
pip install -r requirements.txt