Spotify ETL Project Using AWS Serverless Architecture

Introduction

This project demonstrates how to build a robust, automated, and scalable data pipeline for extracting, transforming, and loading Spotify data using AWS serverless architecture.

Data was extracted from the Spotify API and stored in an S3 bucket. The extraction process was automated using CloudWatch Events and scheduled to run every 24 hours. After extraction, the data was transformed and stored back in S3, with an S3 event trigger automating the cleaning step as well.

To let the analytics team consume the data in a structured format, AWS Glue was used to crawl the data in S3 and save it as a table inside the spotify database in the AWS Glue Data Catalog.

Finally, AWS Athena was used to query the data and run analytics, deriving actionable insights to support data-driven decisions.

AWS Services and Development Environment

Jupyter Notebook
AWS Lambda
S3
CloudWatch
AWS Glue
AWS Athena

Libraries

pip install numpy pandas spotipy

import numpy as np
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

Architecture Diagram

Work Flow

I first developed the code in a Jupyter notebook, where the required data was extracted from the Spotify API and transformed to meet the business requirements, ready to be deployed to AWS Lambda.
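
As a rough sketch of the notebook-stage extraction, the function below collects every item of a playlist by following Spotify's paging links. Treating a playlist as the data source is an assumption (the README does not say which endpoint was used), and `fetch_playlist_items` is a hypothetical helper name. The client is passed in as a parameter so the function can be tried with a stub before real credentials are available.

```python
def fetch_playlist_items(spotify, playlist_id, page_size=100):
    """Collect all items of a playlist, one page at a time.

    `spotify` is expected to behave like a spotipy.Spotify client:
    playlist_items() returns a paged dict, and next() fetches the
    following page for as long as the "next" link is set.
    """
    page = spotify.playlist_items(playlist_id, limit=page_size)
    items = list(page["items"])
    while page.get("next"):
        page = spotify.next(page)
        items.extend(page["items"])
    return items


# In the notebook, the real client could be created roughly like this
# (SpotifyClientCredentials reads SPOTIPY_CLIENT_ID and
# SPOTIPY_CLIENT_SECRET from the environment when called without arguments):
#
#   import spotipy
#   from spotipy.oauth2 import SpotifyClientCredentials
#   sp = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials())
#   items = fetch_playlist_items(sp, "<playlist id>")
```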

Once the code worked in the dev environment, I deployed the extraction code to AWS Lambda and added a CloudWatch Events trigger to schedule the extraction every 24 hours.
The extracted data was then stored in an S3 bucket so that it could be transformed to meet the business requirements.
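
The extraction Lambda and its daily schedule might look roughly like the sketch below. The folder name `raw_data/to_process/`, the environment variable names, the rule name, and the use of a playlist as the data source are all assumptions made for illustration; none of them come from the project itself. The AWS and Spotify clients are created inside the handler so the helper functions can be exercised without credentials.

```python
import json
from datetime import datetime, timezone

RAW_PREFIX = "raw_data/to_process/"  # hypothetical folder name


def build_raw_key(now=None, prefix=RAW_PREFIX):
    """Return a timestamped key so daily runs never overwrite each other."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}spotify_raw_{now.strftime('%Y%m%dT%H%M%SZ')}.json"


def store_raw(s3, bucket, payload, now=None):
    """Serialise the API response and write it to the raw folder in S3."""
    key = build_raw_key(now)
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload).encode("utf-8"))
    return key


def lambda_handler(event, context):
    # Third-party imports are kept local to the handler (an assumption of
    # this sketch, not a Lambda requirement).
    import os
    import boto3
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth = SpotifyClientCredentials(
        client_id=os.environ["SPOTIFY_CLIENT_ID"],          # assumed variable name
        client_secret=os.environ["SPOTIFY_CLIENT_SECRET"],  # assumed variable name
    )
    spotify = spotipy.Spotify(client_credentials_manager=auth)
    payload = spotify.playlist_items(os.environ["PLAYLIST_ID"])
    key = store_raw(boto3.client("s3"), os.environ["RAW_BUCKET"], payload)
    return {"statusCode": 200, "body": key}


def schedule_daily(events, lambda_client, function_name, function_arn,
                   rule_name="spotify-extract-daily"):
    """Create a 24-hour schedule rule and point it at the Lambda function.

    `events` and `lambda_client` are boto3 clients for "events" and "lambda".
    """
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(24 hours)",
        State="ENABLED",
    )["RuleArn"]
    # Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events.put_targets(Rule=rule_name, Targets=[{"Id": "spotify-extract", "Arn": function_arn}])
    return rule_arn
```

Writing each run to a new timestamped key keeps earlier raw files intact, which makes it possible to re-run the transformation later if its logic changes.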

Next, I deployed the transformation code to Lambda. It takes the raw data from the S3 bucket and transforms it before storing it in another folder in the same S3 bucket. This step was also automated by adding an S3 event trigger to the Lambda function.
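
The S3-triggered transformation could be sketched as follows. The output folder `transformed_data/`, the chosen columns, and the CSV output format are assumptions; the project's actual transformation may differ, and it may use pandas rather than the standard-library `csv` module used here to keep the example dependency-free.

```python
import csv
import io
import json
from urllib.parse import unquote_plus

FIELDS = ["track_id", "track_name", "album_name", "artists", "popularity", "added_at"]


def iter_s3_objects(event):
    """Yield (bucket, key) for every record in an S3 event notification."""
    for record in event.get("Records", []):
        # Object keys arrive URL-encoded in S3 events.
        yield record["s3"]["bucket"]["name"], unquote_plus(record["s3"]["object"]["key"])


def flatten_tracks(payload):
    """Turn a playlist-items response into one flat row per track."""
    rows = []
    for item in payload.get("items", []):
        track = item.get("track") or {}
        if not track:
            continue  # skip entries whose track is missing
        rows.append({
            "track_id": track.get("id"),
            "track_name": track.get("name"),
            "album_name": (track.get("album") or {}).get("name"),
            "artists": ", ".join(a["name"] for a in track.get("artists") or []),
            "popularity": track.get("popularity"),
            "added_at": item.get("added_at"),
        })
    return rows


def rows_to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def transform_object(s3, bucket, key, out_prefix="transformed_data/"):
    """Read one raw JSON object, flatten it, and write a CSV next to it."""
    payload = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
    stem = key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    out_key = f"{out_prefix}{stem}.csv"
    s3.put_object(Bucket=bucket, Key=out_key, Body=rows_to_csv(flatten_tracks(payload)).encode("utf-8"))
    return out_key


def lambda_handler(event, context):
    import boto3  # created here so the helpers above need no AWS setup

    s3 = boto3.client("s3")
    written = [transform_object(s3, bucket, key) for bucket, key in iter_s3_objects(event)]
    return {"statusCode": 200, "body": json.dumps(written)}
```

Because the function writes back to the bucket that triggers it, the S3 trigger should be limited to the raw folder (for example with a prefix filter); otherwise each output file would fire the function again.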

The data is now clean and ready for the analytics team to consume. To make it easy to run analytics on the data in the S3 bucket, I deployed a Glue crawler, which infers the schema from the data in S3 and automatically registers it as a table in the spotify database in the Glue Data Catalog.
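
For reference, a crawler like this can also be created from code rather than the console. The sketch below uses the boto3 Glue client; the crawler name, IAM role ARN, and S3 path shown in the usage comment are placeholders, and only the database name `spotify` comes from the description above.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Assemble the arguments for glue.create_crawler()."""
    return {
        "Name": name,
        "Role": role_arn,                           # IAM role the crawler runs as
        "DatabaseName": database,                   # Data Catalog database for the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue, name, role_arn, database, s3_path):
    """Create the crawler, then kick off its first run."""
    request = build_crawler_request(name, role_arn, database, s3_path)
    glue.create_crawler(**request)
    glue.start_crawler(Name=name)
    return request


# Example call with placeholder values:
#
#   import boto3
#   create_and_start_crawler(
#       boto3.client("glue"),
#       name="spotify-crawler",
#       role_arn="arn:aws:iam::<account-id>:role/<glue-role>",
#       database="spotify",
#       s3_path="s3://<bucket>/transformed_data/",
#   )
```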

Finally, AWS Athena was used to run analytics on the data with SQL queries.
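
Queries can be run in the Athena console, or programmatically as in the sketch below, which uses the boto3 Athena client to start a query, poll until it finishes, and return the first page of results as dictionaries. The table name `transformed_data`, its `artists` column, and the results location are hypothetical; only the `spotify` database name is taken from the workflow above.

```python
import time

# Hypothetical example query; table and column names are assumptions.
TOP_ARTISTS_SQL = """
SELECT artists, COUNT(*) AS track_count
FROM transformed_data
GROUP BY artists
ORDER BY track_count DESC
LIMIT 10
"""


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Run a SELECT in Athena and return the first page of rows as dicts."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll for a terminal state.
    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} did not finish in time")

    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # For SELECT queries the first row holds the column names.
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, [cell.get("VarCharValue") for cell in row["Data"]]))
        for row in rows[1:]
    ]


# Example call with placeholder values:
#
#   import boto3
#   rows = run_athena_query(boto3.client("athena"), TOP_ARTISTS_SQL,
#                           database="spotify",
#                           output_location="s3://<bucket>/athena-results/")
```

All values come back as strings, and `get_query_results` returns results in pages, so larger result sets would need to follow the `NextToken` field.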

Conclusion

This project demonstrated how to build a robust, automated, and scalable end-to-end data pipeline using AWS serverless architecture.

jaykay04/Spotify-ETL-Project-Using-AWS-Serverless-Architecture
