This is a Mini-Project for SC1015 (Introduction to Data Science and Artificial Intelligence) which uses a diamond dataset from Kaggle.
School of Computer Science and Engineering Nanyang Technological University Lab: A136 Group: 8
Name | GitHub Acount | Main Contributions |
---|---|---|
Juan Frederick | @dreonic | Data Resampling, Machine Learning, Hyperparameter Tuning |
Hilarius Jeremy | @HilariusJeremy | Data Visualisation, Problem Definition, Data Driven Insight |
Poh Jing Ting, Felicia | @feliciapjt | Exploratory Data Analysis, Machine Learning, Problem Definition |
- Building an accurate model for estimation and prediction diamond price
- Determine related factors to diamond price to suggest further action from buyers and sellers.
- Identifying Data and Data Cleaning
- Exploratory Data Analysis
- Machine Learning
- Hyperparametric Tuning and Optimization
- Data Driven Insights
- Linear Regression > A simple model to predict the dependent variable based on the independent variable by fitting a line
- Random Forest > A classification algorithm that involves many decision trees, using bagging and feature randomness to build each tree
- XGBoost > An implmentation of gradient-boosting decision trees. It uses a gradient descent algorithm to minimize the loss function and it includes regularization techniques to avoid overfitting.
- Price of diamonds can reliably be predicted using their attributes
- XGBoost is the best model we found to predict price
- Main variables that affect price: carat weight, cut, lab certificate
- Color, culet size, and cut quality is irrelevant to predict diamond price
- How to handle missing values using imputation
- Types of visualizations to better understand relationship between variables
- Encoding numerical variables to get better results
- Handling random forest and XGBoost models along with their parameters
- Hyperparametric tuning and how to choose the parameters
- How to use GitHub and markdown syntax
- 123dartist. (n.d.). Luxury Shinning Diamond Balanced on Rippled Water Surface with Reflected Sky Background, Diamond Wallpaper, [Image]. Adobe Stock. https://stock.adobe.com/sg/images/luxury-shinning-diamond-balanced-on-rippled-water-surface-with-reflected-sky-background-diamond-wallpaper/452760672?prev_url=detail
- How to Deal with Missing Data. (n.d.). Master’s in Data Science. https://www.mastersindatascience.org/learning/how-to-deal-with-missing-data/
- Upgrade Your Diamond. (n.d.). Long Jewelers. https://www.longjewelers.net/Upgrade-Your-Diamond/3001108/EN
- HROKR. (2023). The largest diamond dataset currently on Kaggle. Kaggle. https://www.kaggle.com/code/hrokrin/kaggle-s-largest-dataset-of-diamonds-eda
- Kulkarni, M. (n.d.). Machine Learning Notebook. GitHub. https://github.com/maykulkarni/Machine-Learning-Notebooks/tree/master/02.%20Regression
- Yiu, T. (2019). Understanding Random Forest. Towards Data Science. https://towardsdatascience.com/understanding-random-forest-58381e0602d2
- What is linear regression? (n.d.). IBM. https://www.ibm.com/topics/linear-regression#:~:text=Resources-,What%20is%20linear%20regression%3F,is%20called%20the%20independent%20variable.
- Brownlee, J. (2016). A Gentle Introduction to XGBoost for Applied Machine Learning. Machine Learning Mastery. https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/