This project starts with data preprocessing. The dataset comes from the banking industry of Nepal, and it was cleaned and processed before the loans were labeled as defaulters. After preprocessing and cleaning, it’s important to understand how the dataset is distributed and which features matter for the task. Most errors are made in these phases, and the accuracy of the model depends on how well you understand the data.

The most important part of data preprocessing is cleaning the dataset and filling in missing values. When filling, we need to make sure each gap is filled with the correct value or object. I would suggest using the describe function and the correlation function before filling the data so that you have a proper understanding of the dataset.

The second important step before modeling is visualizing the data, feature by feature. I have not used high-level visualization here because I already had a proper understanding of the dataset. Some of the visualizations are shown below.
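As a rough illustration of the describe-and-correlate check suggested above before filling, here is a minimal sketch with pandas. The toy data and column names (`loan_amount`, `interest_rate`, `loan_type`) are hypothetical and not the real schema, and filling with the median or the most frequent value is only one possible choice, not necessarily what was done in this project.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are assumptions, not the real schema.
df = pd.DataFrame({
    "loan_amount": [500000.0, 2500000.0, np.nan, 7500000.0, 1200000.0],
    "interest_rate": [11.5, 13.0, 12.5, np.nan, 14.0],
    "loan_type": ["home", "business", None, "business", "auto"],
})

# Summary statistics and missing-value counts, inspected before filling anything.
print(df.describe())
print(df.isna().sum())

# Pairwise correlation between the numeric columns.
print(df.corr(numeric_only=True))

# Assumed fill strategy: median for numeric columns, most frequent value otherwise.
filled = df.copy()
for col in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[col]):
        filled[col] = filled[col].fillna(filled[col].median())
    else:
        filled[col] = filled[col].fillna(filled[col].mode().iloc[0])
```

Looking at the summary first makes it easier to judge whether a median, a mean, or a domain-specific constant is the right fill for each column.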
The figure here shows the distribution of the loans. We can see that around 30% of the loans are bad loans, or default loans. Now let’s visualize how the features are correlated with each other.
Some of the features are correlated with each other, whereas most of the features are only weakly correlated with the defaulter loans. The higher a feature’s correlation with the default label, the more likely it is to be important to the classifier. We also visualized how the interest rate relates to the defaulter loans.
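To make the idea of ranking features by their correlation with the default label concrete, here is a small sketch. The data and the column names, including the `default` label column, are made up for illustration. Pearson correlation only captures linear relationships, so a low value does not prove a feature is useless.

```python
import pandas as pd

# Hypothetical toy data; "default" is an assumed name for the label column.
df = pd.DataFrame({
    "interest_rate": [11.0, 13.5, 12.5, 10.0, 13.0, 9.5, 12.0, 14.0],
    "loan_amount_lakhs": [5, 30, 25, 8, 40, 3, 60, 35],
    "tenure_months": [12, 36, 24, 60, 48, 12, 36, 24],
    "default": [0, 1, 1, 0, 1, 0, 0, 1],
})

# Correlation of every feature with the label, strongest (by absolute value) first.
corr_with_target = (
    df.corr()["default"]
    .drop("default")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```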
Visualizing the data, we can observe that loans issued at interest rates of 12% to 14% have a higher default rate. Around 99.13% of loans have defaulted within this range.
The highest default rate is for loans of 20 lakhs to 50 lakhs, followed by loans of 50 lakhs to 1 crore.
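Default rates per range like the ones above can be computed by binning a column and averaging the label within each bin. The sketch below does this for interest rate on made-up data. The bin edges, the labels, and the column names are assumptions for illustration, and the same pattern would apply to loan amount bands.

```python
import pandas as pd

# Hypothetical toy data; "default" is 1 for a defaulted loan, 0 otherwise.
loans = pd.DataFrame({
    "interest_rate": [10.5, 12.5, 13.0, 13.8, 15.0, 11.0, 12.2, 16.0],
    "default": [0, 1, 1, 0, 0, 0, 1, 0],
})

# Assumed bands: up to 12%, above 12% up to 14%, above 14%.
bins = [0, 12, 14, 100]
labels = ["<=12%", "12-14%", ">14%"]
loans["rate_band"] = pd.cut(loans["interest_rate"], bins=bins, labels=labels)

# The mean of a 0/1 label within a band is that band's default rate.
default_rate = loans.groupby("rate_band", observed=False)["default"].mean()
print(default_rate)
```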
Let’s dive into the machine learning code used here. We started by splitting the dataset into a training set and a test set with
from sklearn.model_selection import train_test_split
and used the standard scaler to standardize the datasets. Note that the dataset has a lot of object (categorical) features, and one-hot encoding is used to deal with those. The dataset is split into 70% training data and 30% test data.
from sklearn.preprocessing import StandardScaler
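Put together, the encoding, splitting, and scaling steps described above could look roughly like this. The data is synthetic, the column names are hypothetical, and `random_state=42` is just an arbitrary choice for reproducibility. The scaler is fitted on the training split only so that no information from the test split leaks into training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset; column names are assumptions.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "loan_amount": rng.uniform(1, 100, n),
    "interest_rate": rng.uniform(9, 16, n),
    "loan_type": rng.choice(["home", "business", "auto"], n),
    "default": rng.integers(0, 2, n),
})

# One-hot encode the object (categorical) feature.
X = pd.get_dummies(df.drop(columns="default"), columns=["loan_type"], dtype=float)
y = df["default"]

# 70% training data, 30% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the scaler on the training data, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```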
After the usual encoding and splitting, I started with the Logistic Regression classifier.
from sklearn.linear_model import LogisticRegression
The output of this classifier is
This means this supervised machine learning classifier can classify good loans and bad loans with 88.37% accuracy on unseen data. This is a good result considering that the dataset has very few features.
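For readers who want to see how such a test accuracy is obtained, here is a minimal sketch on synthetic data generated with `make_classification`. The class balance of roughly 70/30 mimics the loan distribution mentioned earlier, but the data, the `max_iter` setting, and the random seeds are assumptions, so the number it prints will not match the 88.37% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with roughly 30% positive (default) labels.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.7, 0.3], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Standardize using statistics from the training split only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit logistic regression and score it on the held-out test split.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_scaled, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test_scaled))
print(f"Test accuracy: {test_accuracy:.4f}")
```

With about 30% of loans in the positive class, a model that always predicts "good loan" would already reach around 70% accuracy, so precision and recall on the default class are worth checking alongside accuracy.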
Next in line is the Random Forest classifier. Let’s jump into the result of the classifier.
The accuracy is very high in the case of random forest, which is excellent for this dataset, but random forest has some limitations when used on its own.
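When a random forest reports very high accuracy, it helps to compare training accuracy against test accuracy, since a large gap hints at overfitting. The sketch below shows that check on synthetic data. The data, `n_estimators=100`, and the seeds are assumptions, and this is not the exact configuration used for the result above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with roughly 30% positive (default) labels.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.7, 0.3], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Tree-based models do not need feature scaling, so it is skipped here.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

train_accuracy = accuracy_score(y_train, rf.predict(X_train))
test_accuracy = accuracy_score(y_test, rf.predict(X_test))
print(f"Train accuracy: {train_accuracy:.4f}")
print(f"Test accuracy:  {test_accuracy:.4f}")

# Impurity-based importance of each feature; the values sum to 1.
importances = rf.feature_importances_
print(importances)
```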
Now let’s train a deep learning model to see how our training data behaves on neural networks. TensorFlow is used as the backend with Keras.
import tensorflow.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras.layers import Dropout
A sequential neural network is designed with dense layers. The dropout used is 30%, and the activation functions used are ReLU and sigmoid. The details of the model used are
We used four hidden layers with 639,450 parameters. The batch size I used is very low, i.e., 10, the total number of epochs is only 100, and the optimizer I prefer is Adam.
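A network of the kind described above could be sketched as follows. The layer widths, the L2 strength, the placement of the dropout layers, and the synthetic data are all assumptions, so this small model does not reproduce the 639,450 parameters of the real one. ReLU on the hidden layers and sigmoid on the output is the usual arrangement for binary classification and is assumed here. The epoch count is cut to 3 to keep the sketch quick, whereas the actual run used 100.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

# Synthetic stand-in data: 200 samples, 20 already-scaled features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 20)).astype("float32")
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype("float32")

# Four hidden Dense layers with ReLU, 30% dropout, and a sigmoid output.
# The widths (64, 64, 32, 16) and the l2 factor are hypothetical.
model = Sequential([
    Input(shape=(20,)),
    Dense(64, activation="relu", kernel_regularizer=l2(1e-4)),
    Dropout(0.3),
    Dense(64, activation="relu", kernel_regularizer=l2(1e-4)),
    Dropout(0.3),
    Dense(32, activation="relu", kernel_regularizer=l2(1e-4)),
    Dropout(0.3),
    Dense(16, activation="relu", kernel_regularizer=l2(1e-4)),
    Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Batch size 10 as in the text; epochs reduced from 100 to 3 for this sketch.
history = model.fit(
    X_train, y_train,
    validation_split=0.2, batch_size=10, epochs=3, verbose=0,
)
print(model.count_params())
```

The `history.history` dictionary returned by `fit` holds the per-epoch loss and accuracy values, which is what accuracy and loss curves over the epochs are plotted from.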
I trained this model on my cloud GPU, which took around 2-3 minutes, and the result is very promising considering the short training time.
The validation accuracy I got was 87.13% and the test loss was around 36.19%, which is still high. I will add some more layers and change the activation function to get better results.
Now, let’s visualize how our training accuracy and test loss performed over the epochs.