Project 1: Predict Churn for Bank Customers
Python code available on my GitHub
Techniques I practiced in this industrial project:
- Load and read data using Google Colab
- Data exploration: check general information about the dataset; understand the features, e.g. check for missing values and identify the categorical features to one-hot encode.
- Feature preprocessing: drop uninformative features, identify column types, split the dataset, encode categorical features, and standardize or normalize the data.
- Model training and result evaluation: train random forest, logistic regression, and k-nearest neighbors models; use grid search to find optimal hyperparameters; build a confusion matrix to calculate precision, recall, and accuracy; compare models using the ROC curve and AUC.
- Feature importance discussion: feature importances from the random forest model
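
The evaluation step above can be illustrated with a small, dependency-free sketch. It is not the project's code (which is on GitHub and most likely relies on a library such as scikit-learn); the labels, scores, 0.5 threshold, and function names below are made up for illustration. It shows how precision, recall, and accuracy follow from the four confusion-matrix counts, and how AUC can be read as the chance that a randomly chosen churner receives a higher score than a randomly chosen non-churner.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 means 'churned'."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall_accuracy(tn, fp, fn, tp):
    """Derive the three metrics from the confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return precision, recall, accuracy


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth and predicted churn probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
precision, recall, accuracy = precision_recall_accuracy(tn, fp, fn, tp)
auc = roc_auc(y_true, scores)

print(tn, fp, fn, tp)                # 3 1 1 3
print(precision, recall, accuracy)   # 0.75 0.75 0.75
print(auc)                           # 0.875
```

Note that precision, recall, and accuracy depend on the chosen threshold, while AUC is computed from the raw scores and summarizes ranking quality across all thresholds. The pairwise AUC loop is quadratic in the number of samples, which is fine for a sketch but not for a large dataset.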