Agent-Based Modeling Simulation and Its Application to Ecommerce
Data
Real data on the LC business model are available via its platform. Data on arrival patterns
and arrival intervals are generated stochastically according to the data collected for the years
2013 and 2014. There were 235,629 accepted loan requests during the period of interest.
Table 1 summarizes descriptive statistics for variables relating
to the funded (accepted) borrowers within the time period.
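One way to picture the stochastic generation of arrival intervals is sketched below. The text does not say which distribution is used, so the exponential inter-arrival times (a Poisson arrival process) are purely an assumption made for illustration, as are the names `RATE_PER_DAY` and `arrival_times`. Only the loan count and the two-year period come from the text.

```python
import random

ACCEPTED_LOANS = 235_629        # accepted requests in 2013-2014 (from the text)
DAYS_IN_PERIOD = 2 * 365        # 2013 and 2014 are both non-leap years
RATE_PER_DAY = ACCEPTED_LOANS / DAYS_IN_PERIOD  # average arrivals per day


def arrival_times(n, rate=RATE_PER_DAY, seed=None):
    """Return n cumulative arrival times in days.

    ASSUMPTION: gaps between requests are exponentially distributed;
    the chapter only says the intervals are generated stochastically.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n):
        t += rng.expovariate(rate)  # gap to the next loan request
        times.append(t)
    return times


times = arrival_times(1000, seed=42)
print(f"average rate: {RATE_PER_DAY:.2f} requests per day")
print(f"time of 1000th arrival: {times[-1]:.2f} days")
```

A model calibrated on the actual data could replace the exponential draw with an empirical distribution of observed intervals; the surrounding loop would stay the same.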
Table 1. Borrower Profiles
Variable name      Minimum    Maximum    Mean     Std. Deviation
funded_amnt ($)    1000       35000      14870    8438
int_rate (%)       6.00       26.06      13.78    4.32
annual_inc ($)     3000       7500000    74854    55547
dti                0          39.99      18.04    8.02
delinq_2yrs        0          22.00      0.34     0.89
inq_last_6mths     0          6.00       0.76     1.03
revol_util (%)     0          892.30     55.69    23.10
total_acc          2.00       156.00     26.01    11.89
Variables of interest include the loan amount (funded_amnt), the interest rate assigned based on
borrower characteristics (int_rate), the annual income of the borrower (annual_inc), the
debt-to-income ratio (dti), the number of delinquencies in the past 2 years (delinq_2yrs), the
number of inquiries in the past 6 months (inq_last_6mths), the revolving utilization ratio
(revol_util), the verification status of the user, the number of accounts open in the last
2 years (total_acc) and the term of the loan (36 or 60 months).
The loan status takes one of the following values: Charged Off, Current, Default, Fully Paid,
In Grace Period, Late (16-30 days) and Late (31-120 days). Only completed loans are considered,
i.e., those that have been fully paid or charged off.
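The restriction to completed loans amounts to a simple filter on the status field. The following is a minimal sketch. The field name `loan_status`, the helper `completed_loans` and the four sample records are hypothetical and serve only to show the rule.

```python
# Statuses that mark a loan as completed, per the text.
COMPLETED_STATUSES = {"Fully Paid", "Charged Off"}


def completed_loans(records):
    """Keep only records whose status is Fully Paid or Charged Off."""
    return [r for r in records if r["loan_status"] in COMPLETED_STATUSES]


# Illustrative records, not taken from the real dataset.
sample = [
    {"id": 1, "loan_status": "Fully Paid"},
    {"id": 2, "loan_status": "Current"},
    {"id": 3, "loan_status": "Charged Off"},
    {"id": 4, "loan_status": "Late (31-120 days)"},
]

kept = completed_loans(sample)
print([r["id"] for r in kept])
```

Loans that are still running (Current, In Grace Period, Late, Default) are dropped because their final outcome is not yet known.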
Neural Network
The neural network (NN) is used to map the characteristics of users to different risk
decisions and to emulate trust. Profiles of completed loans, drawn from the combined datasets
of accepted and rejected loans, are used to build the NN model. A random
sample of 2062 data points from the combined dataset forms the training data used in the
learning process. The inputs are normalized by dividing the amount requested by 3.5, the FICO
score by 850 and the employment length by 10.
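The normalization step can be written out directly from the three divisors given in the text. This is a minimal sketch: the function name `normalize_inputs` and the example values are hypothetical, and the unit of the requested amount is not stated in the text, so the divisor 3.5 is applied as given.

```python
AMOUNT_DIVISOR = 3.5      # from the text; unit of the amount is not stated
FICO_DIVISOR = 850        # from the text
EMP_LENGTH_DIVISOR = 10   # from the text, employment length in years


def normalize_inputs(amount, fico, emp_length):
    """Scale the three raw inputs by the divisors quoted in the text."""
    return (
        amount / AMOUNT_DIVISOR,
        fico / FICO_DIVISOR,
        emp_length / EMP_LENGTH_DIVISOR,
    )


# Hypothetical applicant: values chosen only to illustrate the scaling.
scaled = normalize_inputs(amount=1.75, fico=680, emp_length=5)
print(scaled)
```

Dividing by a fixed constant keeps each input on a comparable scale, which generally helps gradient-based training converge.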
The network structure consists of four layers (Fig. 2). The first layer has 4 neurons,
one for each of the following variables: amount, FICO, dti and employment length.