Project Objective:
Develop a big data recommender system using Spark ML on an AWS EMR cluster. This repo consists of the scripts used to run Spark ML.
Steps and process:
- The initial sandbox notebook code was tested on Google Colab.
- The notebook was converted to a .py file, which was run on AWS EMR. The script was part of a data pipeline that automatically trains on 1.5M beer reviews and generates the top 10 recommendations for each user.
- The data was to be pushed to AWS S3 for storage, which also triggers a Lambda function that automatically saves the information in DynamoDB.
- DynamoDB was used for data retrieval directly from the web via API Gateway.
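
The training step in the list above (fit on the beer reviews, then produce the top 10 recommendations per user) could look roughly like the sketch below. It is not the repo's actual script: it assumes Spark ML's ALS estimator is the model in use, and the column names `user_id`, `beer_id`, and `rating` and both function names are hypothetical. ALS expects numeric user and item IDs, so the sketch assumes the columns are already integer-indexed.

```python
def flatten_recommendations(rows):
    """Turn (user_id, [(item_id, score), ...]) rows into one flat dict per recommendation.

    Works on plain tuples and on collected Spark Row objects (Row is a tuple subclass).
    """
    flat = []
    for user_id, recs in rows:
        # rank starts at 1 for the highest-scored item
        for rank, (item_id, score) in enumerate(recs, start=1):
            flat.append(
                {"user_id": user_id, "rank": rank, "item_id": item_id, "score": float(score)}
            )
    return flat


def train_and_recommend(ratings_df, top_n=10):
    """Fit ALS on a ratings DataFrame and return the top_n items for every user.

    Assumption: ratings_df has integer columns user_id and beer_id and a
    numeric rating column (hypothetical names, not taken from the repo).
    """
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.ml.recommendation import ALS

    als = ALS(
        userCol="user_id",
        itemCol="beer_id",
        ratingCol="rating",
        coldStartStrategy="drop",  # drop NaN predictions for unseen users/items
    )
    model = als.fit(ratings_df)
    # Returns a DataFrame with columns: user_id, recommendations (array of structs)
    return model.recommendForAllUsers(top_n)
```

On a small sample, `flatten_recommendations(train_and_recommend(df).collect())` would give one record per user and rank. For the full dataset it is likely better to write the resulting DataFrame out from Spark rather than collect it on the driver.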
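
The S3-to-DynamoDB step in the list above could be handled by a Lambda function along these lines. This is a minimal sketch under stated assumptions, not the project's handler: the table name `beer-recommendations`, the helper names, and the JSON Lines file layout (one object per line with `user_id` and `recommendations` fields) are all hypothetical.

```python
import json
from urllib.parse import unquote_plus

TABLE_NAME = "beer-recommendations"  # hypothetical table name


def s3_objects_from_event(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    return [
        # Object keys arrive URL-encoded in S3 events, so decode them.
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


def to_item(line):
    """Convert one JSON line into a DynamoDB item (assumed file layout)."""
    rec = json.loads(line)
    return {
        "user_id": str(rec["user_id"]),
        # Stored as strings to sidestep DynamoDB's float/Decimal handling.
        "recommendations": [str(b) for b in rec["recommendations"]],
    }


def lambda_handler(event, context):
    # boto3 is provided by the Lambda Python runtime; imported here so the
    # helpers above can be exercised without it.
    import boto3

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    written = 0
    for bucket, key in s3_objects_from_event(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        # batch_writer buffers put requests and sends them in batches
        with table.batch_writer() as batch:
            for line in body.splitlines():
                if line.strip():
                    batch.put_item(Item=to_item(line))
                    written += 1
    return {"written": written}
```

With `user_id` as the table's partition key, the web-facing endpoint would only need a single-key lookup per user to return that user's stored recommendations.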