In this project I will show how choosing a probability threshold from an ROC curve can be useful for binary classification on unbalanced datasets. I'll use Logistic Regression to demonstrate this. I'm aware there are many methods, and combinations of methods, for dealing with an unbalanced dataset, but the sole purpose of this notebook is to show the pros and cons of using an ROC curve to pick the threshold.
Dataset: https://www.kaggle.com/datasets/yashpaloswal/fraud-detection-credit-card
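
To make the idea concrete before touching the real data, here is a minimal sketch in plain Python. It is not the notebook's actual pipeline: the labels and scores are made-up toy values standing in for a classifier's predicted probabilities, the helper names `roc_points` and `best_threshold` are hypothetical, and using Youden's J (TPR minus FPR) to pick the threshold is just one common criterion that I'm assuming here.

```python
def roc_points(y_true, scores):
    """Return (threshold, fpr, tpr) for each distinct score, highest threshold first."""
    pos = sum(y_true)            # number of positive (fraud) samples
    neg = len(y_true) - pos      # number of negative (legit) samples
    points = []
    for t in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score is >= the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def best_threshold(points):
    """Pick the ROC point that maximises Youden's J = TPR - FPR (an assumed criterion)."""
    return max(points, key=lambda p: p[2] - p[1])


def recall_at(y_true, scores, threshold):
    """Fraction of true positives caught at a given threshold."""
    pos = sum(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    return tp / pos


# Toy, made-up data: 8 legitimate transactions (0) and 2 frauds (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.30, 0.45, 0.35, 0.80]

threshold, fpr, tpr = best_threshold(roc_points(y_true, scores))
default_recall = recall_at(y_true, scores, 0.5)

print(f"ROC-based threshold: {threshold}, FPR: {fpr}, TPR: {tpr}")
print(f"Recall at the default 0.5 threshold: {default_recall}")
```

On this toy data the default 0.5 cut-off catches only one of the two frauds, while the ROC-based threshold catches both at the cost of one false alarm. That trade-off is the pro and con in miniature: better recall on the rare class, paid for with more false positives.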