Databricks Notebook Samples
===========================
This repository contains a collection of production-level Databricks notebooks developed as part of various data analysis projects. They are provided as sample code to showcase my coding skills, problem-solving abilities, and familiarity with Databricks and related technologies. Some data has been removed.
Each notebook is documented with comments explaining the purpose of the code and the approach taken. Here is a brief overview of what each notebook demonstrates:
MailChimp Dashboard Pipeline
----------------------------
- Purpose: This notebook outlines the process for collecting and processing data from MailChimp to feed an interactive analysis dashboard.
- Technologies used: SQL, PySpark, Databricks, MailChimp API
- Key Concepts: Data collection, API integration, data transformation.
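The collection step described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the endpoint path, field names, and the sample payload are assumptions based on the shape of the public MailChimp Marketing API.

```python
import json
import urllib.request

# Hypothetical sketch: pull campaign stats from the MailChimp API and
# flatten them into rows suitable for a Spark/SQL table.  The endpoint,
# field names, and sample payload below are illustrative assumptions.

API_ROOT = "https://us1.api.mailchimp.com/3.0"  # data-center prefix varies


def fetch_campaigns(api_key: str) -> dict:
    """Fetch the campaign list (not executed here; requires a real key)."""
    req = urllib.request.Request(
        f"{API_ROOT}/campaigns",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def flatten_campaigns(payload: dict) -> list[dict]:
    """Flatten nested campaign JSON into one flat record per campaign."""
    rows = []
    for c in payload.get("campaigns", []):
        rows.append({
            "campaign_id": c.get("id"),
            "subject": c.get("settings", {}).get("subject_line"),
            "opens": c.get("report_summary", {}).get("opens", 0),
            "clicks": c.get("report_summary", {}).get("clicks", 0),
        })
    return rows


# Illustrative payload shaped like a MailChimp /campaigns response.
sample = {
    "campaigns": [
        {
            "id": "abc123",
            "settings": {"subject_line": "March newsletter"},
            "report_summary": {"opens": 120, "clicks": 34},
        }
    ]
}

rows = flatten_campaigns(sample)
```

In the notebook itself, flat rows like these would typically be loaded into a Spark DataFrame (e.g. `spark.createDataFrame(rows)`) for SQL-based transformation.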
S3-to-Delta Streaming
---------------------
- Purpose: This notebook demonstrates how to stream data from an S3 bucket, process it in real time, and load it into a Delta table for downstream dashboard analysis.
- Technologies used: SQL, PySpark, Databricks Delta Lake, AWS S3
- Key Concepts: Real-time data streaming, Delta Lake integration, data visualization for client engagement analysis.
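The notebook itself uses Spark Structured Streaming (`spark.readStream` / `writeStream` with a checkpoint location). As a self-contained illustration of the checkpointed, exactly-once file-ingestion semantics that pattern relies on, here is a pure-Python sketch; all names are hypothetical stand-ins (a dict for the S3 prefix, a list for the Delta table).

```python
# Pure-Python illustration of exactly-once, checkpointed file ingestion,
# the guarantee Structured Streaming provides when reading S3 files into
# a Delta table.  All names here are hypothetical stand-ins.


def ingest_new_files(landing: dict[str, list[dict]],
                     checkpoint: set[str],
                     table: list[dict]) -> set[str]:
    """Append records from files not yet recorded in the checkpoint.

    `landing` maps file names to parsed records (standing in for an S3
    prefix), `checkpoint` is the set of already-processed file names, and
    `table` stands in for the Delta table being appended to.  Returns the
    names of the files processed in this micro-batch.
    """
    batch = {name for name in landing if name not in checkpoint}
    for name in sorted(batch):
        table.extend(landing[name])
    checkpoint.update(batch)  # commit only after the append succeeds
    return batch


# One "micro-batch": two files are present, one was already processed.
landing = {
    "events-001.json": [{"user": "a", "clicks": 3}],
    "events-002.json": [{"user": "b", "clicks": 1}],
}
checkpoint = {"events-001.json"}
table: list[dict] = []
processed = ingest_new_files(landing, checkpoint, table)
```

Re-running the same micro-batch is a no-op, because every ingested file is recorded in the checkpoint before the next batch starts; that idempotence is what makes restart-after-failure safe.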
Query Backup and Recovery System
--------------------------------
- Purpose: This system automates the backup of Databricks data queries to Amazon S3 and provides reliable data-recovery functionality, ensuring data safety and accessibility for analytics and operational continuity.
- Technologies used: Python, Pandas, Databricks API, Amazon S3, boto3
- Key Concepts: Automated data backups, cloud-based storage solutions, data integrity and recovery, secure and scalable data handling.
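The backup-and-recovery core described above can be sketched as serialize-with-checksum on backup and verify-before-restore on recovery. This is a hedged illustration, not the system's actual code: the assumption that query results arrive as lists of dicts, the bucket/key names, and all function names are mine. The boto3 upload step is shown but not exercised.

```python
import hashlib
import io
import json

# Hedged sketch of the backup/restore core: serialize query results to
# bytes with a SHA-256 checksum, and verify that checksum on recovery.
# The representation of results as lists of dicts is an assumption.


def serialize_backup(rows: list[dict]) -> tuple[bytes, str]:
    """Return (JSON-lines payload, sha256 hex digest) for a result set."""
    buf = io.StringIO()
    for row in rows:
        buf.write(json.dumps(row, sort_keys=True) + "\n")
    payload = buf.getvalue().encode("utf-8")
    return payload, hashlib.sha256(payload).hexdigest()


def restore_backup(payload: bytes, expected_digest: str) -> list[dict]:
    """Verify integrity, then parse the payload back into rows."""
    if hashlib.sha256(payload).hexdigest() != expected_digest:
        raise ValueError("backup integrity check failed")
    return [json.loads(line) for line in payload.decode("utf-8").splitlines()]


def upload_backup(s3_client, bucket: str, key: str, rows: list[dict]) -> None:
    """Hypothetical upload step: store payload with its digest as metadata."""
    payload, digest = serialize_backup(rows)
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload,
                         Metadata={"sha256": digest})


rows = [{"query": "daily_users", "count": 42}]
payload, digest = serialize_backup(rows)
restored = restore_backup(payload, digest)
```

Storing the digest alongside the object means a recovery job can detect a truncated or corrupted backup before loading it, which is the "data integrity" half of the design.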
These notebooks are intended for display purposes only: they depend on the specific environment configuration of the Databricks workspace in which they were developed and are not set up to run elsewhere.
While this repository is primarily for showcasing purposes, feedback and suggestions are welcome.
This project is licensed under the MIT License.
If you have any questions about the notebooks or would like to contact me regarding job opportunities or collaborations, please email me at [[email protected]].