PREDICTING NEPALESE BANK LOAN DEFAULT RATE POST PANDEMIC USING A.I.

Introduction

End-to-end learning of features from data has driven advances in prediction and classification, to the point where models outperform expert humans at game play. AI is now used across industries for many purposes. This post focuses on the banking industry of Nepal, and specifically on helping credit departments predict loan defaults, which are expected to rise after the pandemic. I built an AI module that uses deep learning neural networks and classifiers to predict which loans may default in the future. Researchers in finance and AI engineers have done extensive work to bridge the gap between the two fields, and we will see many more effects of AI on business within a few years.

Goal:

To predict loans that might default in the future.

Requirements:

1. Python 3 or higher
2. Jupyter Notebook or a GPU processor
3. Intermediate knowledge of Python libraries
4. Working knowledge of machine learning libraries
5. Working knowledge of deep learning libraries
6. Labeled dataset

Process:

The project starts with data preprocessing. The dataset comes from the banking industry of Nepal, and it was cleaned and processed before loans were labeled as defaulters. After cleaning, it is important to understand how the data is distributed and which features matter for the task. Most errors are made in these phases, and the accuracy of the model depends on how well you understand the data. The most important part of preprocessing is cleaning the dataset and filling in missing values. When filling, make sure each gap gets a correct value or object; I suggest running the describe and correlation functions before filling so that you understand the dataset properly.

The second important step before modeling is data visualization, where you look at the data from the point of view of each feature. I have not used advanced visualization here because I already understood the dataset well. Some of the visualizations are shown below.
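To illustrate the inspect-then-fill step described above, here is a minimal sketch on a small made-up table. The column names (`loan_amount`, `interest_rate`, `sector`, `default`) and the helper `fill_missing` are assumptions for illustration, not the columns or code of the bank dataset used in this post. Filling numeric gaps with the median and text gaps with the most frequent value is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real bank dataset is not reproduced here.
df = pd.DataFrame({
    "loan_amount": [2_000_000, 5_000_000, np.nan, 800_000, 12_000_000],
    "interest_rate": [12.5, 13.0, 11.0, np.nan, 14.0],
    "sector": ["retail", None, "agriculture", "retail", "retail"],
    "default": [1, 1, 0, 0, 1],
})

# Summary statistics, missing-value counts and correlations
# help decide how each column should be filled.
print(df.describe())
print(df.isna().sum())
print(df.corr(numeric_only=True))


def fill_missing(frame):
    """Fill numeric gaps with the median and text gaps with the mode."""
    filled = frame.copy()
    for col in filled.columns:
        if pd.api.types.is_numeric_dtype(filled[col]):
            filled[col] = filled[col].fillna(filled[col].median())
        else:
            filled[col] = filled[col].fillna(filled[col].mode().iloc[0])
    return filled


clean = fill_missing(df)
print(clean)
```

Looking at `describe()` first shows whether a column is skewed, which is a common reason to prefer the median over the mean when filling.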


The figure here shows the distribution of the loans. Around 30% of the loans are bad (default) loans. Next, let's look at how the features are correlated with each other.
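The share of default loans can also be checked numerically. The sketch below uses a made-up label column in which 1 marks a default and 0 a good loan; the encoding and the name `default` are assumptions.

```python
import pandas as pd

# Hypothetical labels: 1 = default (bad loan), 0 = good loan.
labels = pd.Series([0, 0, 1, 0, 0, 1, 0, 0, 1, 0], name="default")

# normalize=True returns shares instead of raw counts.
shares = labels.value_counts(normalize=True)
default_share = shares.loc[1]
print(f"Share of default loans: {default_share:.0%}")
```

With roughly 30% defaults, a model that always answers "good loan" would already be right about 70% of the time, so that is the baseline any classifier has to beat.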

Some features are correlated with each other, but most are only weakly correlated with defaulter loans. The higher a feature's correlation with the default label, the more likely it is to be important to the classifier. We also visualized how the interest rate relates to defaulter loans.
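One way to turn the correlation picture into a ranking is to sort features by the absolute value of their correlation with the label. This is a sketch on made-up numbers; the column names and the helper `rank_by_target_correlation` are hypothetical. Pearson correlation only captures linear relationships, so a low value does not prove a feature is useless.

```python
import pandas as pd

# Made-up values; loan_amount is in lakhs.
df = pd.DataFrame({
    "interest_rate": [12.5, 13.0, 11.0, 10.5, 14.0, 9.5],
    "loan_amount": [20, 50, 8, 30, 45, 10],
    "tenure_months": [12, 24, 36, 12, 24, 36],
    "default": [1, 1, 0, 0, 1, 0],
})


def rank_by_target_correlation(frame, target):
    """Return features sorted by absolute correlation with the target."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    return corr.abs().sort_values(ascending=False)


ranking = rank_by_target_correlation(df, "default")
print(ranking)
```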


The visualization shows that loans issued at interest rates of 12% to 14% have a higher default rate. Around 99.13% of loans have defaulted within this range.


The highest default rate is for loans of 20 lakhs to 50 lakhs, followed by loans of 50 lakhs to 1 crore.
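Default rates per range, like the ones above, can be computed by cutting a numeric column into buckets and averaging the 0/1 label inside each bucket. The sketch below uses made-up loans and hypothetical column names; the same pattern would work for interest-rate ranges. For readers outside South Asia, 1 lakh is 100,000 and 1 crore is 10,000,000.

```python
import pandas as pd

LAKH = 100_000        # 1 lakh = 100,000
CRORE = 10_000_000    # 1 crore = 100 lakhs

# Made-up loans; amounts are in Nepalese rupees.
loans = pd.DataFrame({
    "loan_amount": [5 * LAKH, 25 * LAKH, 40 * LAKH, 30 * LAKH,
                    60 * LAKH, 80 * LAKH, 2 * CRORE, 15 * LAKH],
    "default": [0, 1, 1, 0, 1, 0, 0, 0],
})

bins = [0, 20 * LAKH, 50 * LAKH, CRORE, float("inf")]
names = ["up to 20 lakhs", "20-50 lakhs", "50 lakhs-1 crore", "above 1 crore"]

# Each loan is assigned to the bucket whose range contains its amount.
loans["bucket"] = pd.cut(loans["loan_amount"], bins=bins, labels=names)

# The mean of a 0/1 label is the default rate within each bucket.
rate_by_bucket = loans.groupby("bucket", observed=False)["default"].mean()
print(rate_by_bucket)
```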

Let's dive into the machine learning code. We started by splitting the dataset into a training set and a test set with

from sklearn.model_selection import train_test_split

The dataset is split into 70% training data and 30% test data. It contains many object (text) features, which are handled with one-hot encoding, and a standard scaler is used to standardize the features.

from sklearn.preprocessing import StandardScaler
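Put together, the encoding, the 70/30 split and the scaling might look like the sketch below. The data and column names are made up, and `stratify=y` and `random_state=42` are my own assumptions rather than settings confirmed by the post. The scaler is fitted on the training split only, so no information from the test split leaks into training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data with one text ("object") feature.
df = pd.DataFrame({
    "loan_amount": [20, 50, 8, 30, 45, 10, 60, 15, 25, 35],
    "interest_rate": [12.5, 13.0, 11.0, 10.5, 14.0,
                      9.5, 13.5, 10.0, 12.0, 11.5],
    "sector": ["retail", "agri", "retail", "hydro", "agri",
               "retail", "hydro", "agri", "retail", "hydro"],
    "default": [1, 1, 0, 0, 1, 0, 1, 0, 1, 0],
})

# One-hot encode the text column; numeric columns pass through unchanged.
X = pd.get_dummies(df.drop(columns="default"), columns=["sector"], dtype=float)
y = df["default"]

# 70% training data, 30% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
print(X_train_scaled.shape, X_test_scaled.shape)
```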

After encoding and splitting, I started with the Logistic Regression classifier.
from sklearn.linear_model import LogisticRegression

The output of this classifier is



This means the supervised classifier separates good loans from bad loans with 88.37% accuracy on unseen data. This is a good result for a dataset with very few features.
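For readers who want to reproduce this kind of measurement, here is a minimal sketch of fitting a logistic regression and scoring it on held-out data. It runs on synthetic data from `make_classification`, not on the bank dataset, so its accuracy will not match the 88.37% reported above; `max_iter=1000` and the random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data, with roughly 30% positives.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

# Standardize using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression(max_iter=1000)
clf.fit(scaler.transform(X_train), y_train)

# Accuracy on data the model has not seen during training.
y_pred = clf.predict(scaler.transform(X_test))
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.2%}")
```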

Next in line is the Random Forest classifier. Let's jump to its result.


The accuracy of the random forest is very high, which is excellent for this dataset, but a random forest has some limitations when used on its own.
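When accuracy is this high, it can help to look at more than one number. The sketch below trains a random forest on synthetic data (again not the bank dataset) and prints a confusion matrix and per-class precision and recall, which show whether the default class itself is being caught. `n_estimators=200` and the seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data, with roughly 30% positives.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=["good", "default"]))
```

If the test accuracy is much lower than the training accuracy, or if recall on the default class is poor, the headline accuracy is hiding a problem.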

Now let's train a deep learning module to see how our training data behaves on neural networks. Keras is used with TensorFlow as the backend.

import tensorflow.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras.layers import Dropout

A sequential neural network is built from dense layers with 30% dropout, and the activation functions used are ReLU and sigmoid. The details of the module are



We used four hidden layers with 639,450 parameters. The batch size is very low, i.e. 10, the total number of epochs is only 100, and the optimizer I prefer is Adam.
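As a rough sketch of a network of this kind, the code below builds a sequential model with four hidden dense layers, 30% dropout, ReLU in the hidden layers, a sigmoid output and the Adam optimizer. The layer widths, the L2 strength, the loss function and the random training data are all assumptions, so the parameter count here is far smaller than the 639,450 mentioned above. The epoch count is cut from 100 to 5 only to keep the example quick.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2


def build_model(n_features):
    """Four hidden Dense layers with 30% dropout; widths are placeholders."""
    model = Sequential([
        Input(shape=(n_features,)),
        Dense(64, activation="relu", kernel_regularizer=l2(1e-4)),
        Dropout(0.3),
        Dense(32, activation="relu", kernel_regularizer=l2(1e-4)),
        Dropout(0.3),
        Dense(16, activation="relu"),
        Dropout(0.3),
        Dense(8, activation="relu"),
        Dense(1, activation="sigmoid"),  # outputs a probability of default
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


# Random stand-in data: 200 loans with 20 already-scaled features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

model = build_model(n_features=20)
# Batch size 10 as in the post; 5 epochs instead of 100 for speed.
history = model.fit(X, y, batch_size=10, epochs=5,
                    validation_split=0.2, verbose=0)
loss, accuracy = model.evaluate(X, y, verbose=0)
print(f"parameters: {model.count_params()}, accuracy: {accuracy:.2%}")
```

Each dense layer has (inputs + 1) x units parameters, which is how a total such as 639,450 arises from the chosen layer widths.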

I trained this module on my cloud GPU, which took around 2-3 minutes, and the result is very promising considering the short training time.


The validation accuracy I got was 87.13% and the test loss was around 36.19%, which is still high. I will add more layers and change the activation function to get better results.

Now, let's visualize how the training accuracy and test loss behaved over the epochs.
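Curves like these can be drawn from the dictionary that Keras stores in `history.history` after `fit`, which holds one value per epoch under keys such as `accuracy`, `val_accuracy`, `loss` and `val_loss` when accuracy is tracked and validation data is supplied. The helper `plot_history` and the numbers fed into it below are made up for illustration.

```python
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot accuracy and loss per epoch from a Keras-style history dict."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

    ax_acc.plot(history["accuracy"], label="training")
    ax_acc.plot(history["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    ax_loss.plot(history["loss"], label="training")
    ax_loss.plot(history["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    fig.tight_layout()
    return fig


# Made-up values standing in for history.history from model.fit(...).
example_history = {
    "accuracy": [0.80, 0.84, 0.86, 0.87],
    "val_accuracy": [0.79, 0.83, 0.86, 0.86],
    "loss": [0.52, 0.43, 0.39, 0.37],
    "val_loss": [0.54, 0.45, 0.41, 0.40],
}
fig = plot_history(example_history)
```

A validation curve that flattens or turns upward while the training curve keeps improving is the usual sign of overfitting.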





Result

The accuracy fluctuates between 86% and 88%, which shows that the module needs more training; I will add dense layers to address this. In conclusion, the module has very good accuracy, and we have achieved the goal of training a module that can predict default loans. The overall accuracy is around 87% with the deep learning neural network and 99% with the Random Forest classifier. With a few added features and training layers, this module can be used in production.
