This repository hosts our collaborative project for the "Big Data" course. We used Spark to analyse the Spotify top100 dataset. We are trying to answer the following questions:
- Which artist has the most top-rankings?
- Who is the most popular artist in the respective regions?
- Which song stays in the top ranking the longest?
- Which song is on the top 50 list but never on the top 10?
- Which song has the most streams in the last two years?
- How long does a top-ranking song take to reach other countries?
- In which region do the top 10 change the most?
- For which artists is the variance of streams (per day) the lowest?
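
To make two of the questions above concrete, here is a minimal plain-Python sketch of the logic. It is not the Spark code from this project: the row layout `(date, region, position, artist, track, streams)`, the sample values, and the function names `most_number_one_artist` and `top50_never_top10` are all assumptions made for illustration. It also assumes "top-ranking" means position 1, which is only one possible reading. In Spark, the same idea would likely be expressed as a grouped aggregation.

```python
from collections import Counter

# Hypothetical chart rows: (date, region, position, artist, track, streams).
# Field order and values are illustrative, not taken from the real dataset.
rows = [
    ("2017-01-01", "us", 1, "Artist A", "Song X", 900),
    ("2017-01-01", "us", 12, "Artist B", "Song Y", 400),
    ("2017-01-02", "us", 1, "Artist A", "Song X", 950),
    ("2017-01-02", "us", 30, "Artist B", "Song Y", 300),
    ("2017-01-02", "de", 1, "Artist C", "Song Z", 500),
]


def most_number_one_artist(rows):
    """Return (artist, count) for the artist with the most position-1 rows."""
    counts = Counter(artist for _, _, pos, artist, _, _ in rows if pos == 1)
    return counts.most_common(1)[0]


def top50_never_top10(rows):
    """Return tracks whose best position ever is between 11 and 50."""
    best = {}
    for _, _, pos, artist, track, _ in rows:
        key = (artist, track)
        # Keep the best (lowest) position seen for this track in any region.
        best[key] = min(pos, best.get(key, pos))
    return sorted(key for key, pos in best.items() if 10 < pos <= 50)


print(most_number_one_artist(rows))
print(top50_never_top10(rows))
```

The second function tracks only the best position per song, so a song qualifies when that best position is worse than 10 but still within the top 50.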