
LSEP-coding-challenge

Overview

This project is a two-tier application: an ETL (Extract, Transform, Load) pipeline that processes tweet data and loads it into a PostgreSQL database, and a Flask-based web tier for interacting with that data. The pipeline includes modules for extracting data from a JSON file, transforming it, and loading it into the database.

Project Structure

.
├── etl-pipeline
│   ├── config.py
│   ├── db_utils.py
│   ├── etl_extract.py
│   ├── etl_transform.py
│   ├── etl_load.py
│   └── app.py
├── web-tier
│   ├── app
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── routes.py
│   │   ├── .gitignore
│   ├── run.py
├── .gitignore
├── README.md
└── requirements.txt

Prerequisites

  • Python 3.8+
  • PostgreSQL
  • Virtual Environment (optional but recommended)

Setup

Step 1: Clone the Repository

git clone https://github.com/mugemanebertin2001/LSEP-coding-challenge.git
cd LSEP-coding-challenge

Step 2: Set Up Virtual Environment

On Windows

python -m venv .venv
.venv\Scripts\activate

On macOS/Linux

python3 -m venv .venv
source .venv/bin/activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Configure Database

Ensure your PostgreSQL server is running and accessible. Modify etl-pipeline/config.py and web-tier/app/config.py with your database credentials. For example:

# config.py
DB_CONFIG = {
    'host': 'localhost',
    'port': '5432',
    'dbname': 'yourdbname',
    'user': 'yourdbuser',
    'password': 'yourdbpassword'
}
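
For reference, here is a minimal sketch of how these settings can be turned into a database connection. It assumes the project uses psycopg2, a common PostgreSQL driver for Python (check requirements.txt for the actual dependency):

# Sketch: opening a connection from DB_CONFIG (assumes the psycopg2 driver)
import psycopg2

from config import DB_CONFIG  # the dict shown above

conn = psycopg2.connect(**DB_CONFIG)  # DB_CONFIG keys match connect()'s kwargs
with conn, conn.cursor() as cur:
    cur.execute('SELECT version();')  # quick sanity check of the connection
    print(cur.fetchone())
conn.close()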

Step 5: Run the ETL Pipeline

python etl-pipeline/app.py
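
This script wires the three stages together. The following is a self-contained sketch of that extract, transform, load flow; the function bodies are illustrative stand-ins, since the real logic lives in etl_extract.py, etl_transform.py, and etl_load.py:

# Sketch of the extract -> transform -> load flow (illustrative stand-ins only;
# the repository's actual logic lives in the etl_* modules)
import json

def extract(path):
    """Read raw tweet records from a JSON file."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)

def transform(records):
    """Keep only the fields the target table needs (field names assumed here)."""
    return [
        {'id': r.get('id'), 'user_id': r.get('user_id'), 'text': r.get('text')}
        for r in records
    ]

def load(rows):
    """Insert rows into PostgreSQL (stubbed; see etl_load.py for the real step)."""
    print(f'would insert {len(rows)} rows')

if __name__ == '__main__':
    load(transform(extract('query2_ref.json')))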

Step 6: Run the Web Application

cd web-tier
python run.py
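
run.py is the entry point shown in the project structure. A minimal sketch of what it might contain, assuming app/__init__.py exposes an application factory named create_app (an assumed name, not confirmed by the repository):

# run.py (sketch) -- assumes a create_app() factory in app/__init__.py
from app import create_app  # hypothetical factory name; check app/__init__.py

app = create_app()

if __name__ == '__main__':
    app.run(debug=True)  # serves on http://127.0.0.1:5000/ by default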

Accessing the Web Application

The web application will be accessible at http://127.0.0.1:5000/.

API Endpoints

Initial route

GET /

Returns a welcome message.

Trigger ETL to load data

GET /run_etl

This endpoint runs the ETL pipeline and returns a success confirmation message after the data has been loaded into the database.
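
As a usage example, the endpoint can be triggered from Python with only the standard library (equivalent to opening the URL in a browser; assumes the web tier is already running locally):

# Trigger the ETL load via HTTP using only the standard library
from urllib.request import urlopen

with urlopen('http://127.0.0.1:5000/run_etl') as resp:
    print(resp.status, resp.read().decode())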

Query Tweets

GET /q2?user_id=<user_id>&type=<type>&phrase=<phrase>&hashtag=<hashtag>

Queries tweets based on user_id, type, phrase, and hashtag. Any of these parameters can be omitted.
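
For example, a filtered query can be built and sent from Python like this (the parameter values below are illustrative; omit any filter you do not need):

# Query tweets with a subset of the supported filters (values are examples)
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({'user_id': '12345', 'type': 'reply', 'hashtag': 'python'})
with urlopen(f'http://127.0.0.1:5000/q2?{params}') as resp:
    print(resp.read().decode())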

Additional Notes

  • Ensure PostgreSQL is properly installed and running on your machine.

  • Modify the JSON file path in etl-pipeline/app.py if necessary:

    file_path = os.path.join('D:\\', 'query2_ref.json')  # Modify as needed ('D:' alone would yield a drive-relative path on Windows)

    (Sample screenshot for testing the endpoints: Screenshot 2024-07-23 015758)

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any inquiries or support, please contact [email protected].
