Page 4 - Konect Science & Technology Magazine Cover
Enter data science to the rescue. The digitization of banking has generated an enormous amount of data that can be used to analyze such fraudulent behavior. Not only that, data science can proactively estimate the probability of fraud even before it occurs (reminds me of Minority Report too!).

Understanding the fraudster's mindset can help the fraud strategist devise effective rules and models. The biggest task here is to identify the data trails that such behaviors leave and translate them into meaningful features for the fraud models. Several variables can be important in such models. Examples include the number of inquiries made when the customer is not present, a withdrawal amount significantly above or below the customer's average withdrawal pattern, and a teller working outside office hours. A customer's deviation from their preferred bank branch or spending channel is another. And just like any ML model, a fraud model is only as good as the quality of the variables used to create it.

Only one in a million transactions is a fraud (literally!), so a supervised approach will break down on such a highly skewed event. And that is not even the biggest limitation. We can still create a supervised model if we know the fraud cases. However, the model will only be able to predict the known frauds it was trained on, and it will not be able to identify new types of fraud. In fraud, the landscape never stays the same, and fraudsters keep adapting to every new rule and model in order to beat it.

Hence, unsupervised techniques can be used to cast a wide net and identify anomalous behavior that has not been seen before. This method of finding abnormal data points in a dataset is also called “Anomaly Detection”. Some of the techniques that I have successfully used in my experience are:

1. Multivariate Gaussian Model: It's a parametric modeling technique, meaning the independent variables are assumed to follow a specific distribution (in this case, a Gaussian distribution). It is also used to identify faulty airplane jet engines. It identifies outliers in the n-dimensional distribution space (n being the number of variables) by estimating how likely each point is under the fitted distribution, just as we get the p-value of a point on the normal curve. The lower the value, the higher the probability of fraud.

2. Isolation Forest: It's a tree-based, non-parametric modeling technique, meaning no assumption is made about the distribution of the independent variables. It identifies anomalies by segregating them into isolated branches using random splits. The shorter a point's path from the root node, the higher the probability of fraud.

3. One-Class SVM: A non-parametric modeling technique that tries to learn a boundary around the cluster of normal data. It then identifies any data point outside this boundary as an anomaly.

4. Autoencoder Neural Network: Based on human-like learning of identifying and labeling objects, this technique compresses the data into fewer dimensions and then reconstructs it. In the process, it identifies the anomalies as the points it reconstructs poorly.

Although unsupervised algorithms are powerful at detecting anomalies, they are prone to overfitting and unstable results. The solution is to train multiple models and then aggregate their scores. Deployed with the firepower of big data, all these methodologies can be a lethal weapon against such internal frauds. Needless to say, once you have invested in such infrastructure, it can pay off in other internal fraud areas like insider threat, internal security, etc. I hope this has piqued your interest in fraud and given you a new perspective on how to fight it.
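The article lists several candidate features. A minimal Python sketch of three of them follows:

* how far a withdrawal deviates from the customer's average,
* whether a teller action happened outside office hours, and
* whether a transaction left the customer's preferred branch.

Several details are hypothetical assumptions for illustration: the 09:00 to 17:00 office window, the function names and the branch codes. This is not a production feature set.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical office window (assumption): teller actions from 09:00 up to 16:59 count as in-office.
OFFICE_START_HOUR = 9
OFFICE_END_HOUR = 17


def withdrawal_deviation(history: list[float], amount: float) -> float:
    """Absolute z-score of `amount` against the customer's past withdrawals."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Every past withdrawal was identical, so any different amount is maximally unusual.
        return 0.0 if amount == mu else float("inf")
    return abs(amount - mu) / sigma


def outside_office_hours(ts: datetime) -> bool:
    """True when a teller action falls outside the assumed office window."""
    return not (OFFICE_START_HOUR <= ts.hour < OFFICE_END_HOUR)


def left_preferred_branch(past_branches: list[str], branch: str) -> int:
    """1 if `branch` differs from the customer's most frequent (preferred) branch, else 0."""
    preferred, _ = Counter(past_branches).most_common(1)[0]
    return int(branch != preferred)


# Usage with made-up values
history = [200, 220, 180, 210, 190]
print(withdrawal_deviation(history, 500))  # far above the usual pattern
print(outside_office_hours(datetime(2024, 1, 15, 22, 30)))  # True
print(left_preferred_branch(["B01", "B01", "B02", "B01"], "B07"))  # 1
```

Each function returns one numeric or boolean column per transaction. These columns are the kind of input the anomaly-detection models in the article would consume.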
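The multivariate Gaussian model (technique 1 above) can be sketched in a few lines of NumPy. First, estimate a mean vector and covariance matrix from historical transactions. Then score new transactions by their log-density, where lower means more unusual. Working in log space avoids numerical underflow when there are many variables.

The following are assumptions chosen for illustration: the two features, the synthetic data and the 1st-percentile cut-off. In practice, the threshold would be tuned on investigated cases.

```python
import numpy as np


def fit_gaussian(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Estimate the mean vector and covariance matrix of the training rows."""
    return X.mean(axis=0), np.cov(X, rowvar=False)


def log_density(X: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Log of the multivariate normal pdf N(mu, sigma) evaluated at each row of X."""
    n = X.shape[1]
    diff = X - mu
    # Squared Mahalanobis distance diff^T sigma^-1 diff, computed with a linear solve
    # instead of an explicit matrix inverse.
    maha_sq = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + maha_sq)


# Synthetic "normal" history with two hypothetical features: amount and transactions per day.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[100.0, 5.0], scale=[10.0, 1.0], size=(1000, 2))
mu, sigma = fit_gaussian(X_train)

# Assumed cut-off: flag anything less likely than the least likely 1% of training rows.
threshold = np.percentile(log_density(X_train, mu, sigma), 1)

X_new = np.array([[102.0, 5.2], [400.0, 20.0]])
scores = log_density(X_new, mu, sigma)
flags = scores < threshold
print(flags.tolist())  # [False, True]
```

Using the full covariance matrix, rather than one Gaussian per feature, lets the model notice combinations that are unusual even when each value looks ordinary on its own.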
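The Isolation Forest idea (technique 2 above) is that anomalies get separated by fewer random splits. A small from-scratch sketch follows the standard recipe:

* Grow each tree on a random subsample using random feature and split choices.
* Average a point's path length across trees.
* Normalize that average by c(n), the expected path length for n points.

The result is a score in (0, 1], where values near 1 suggest an anomaly. The tree count, subsample size and synthetic data are illustrative assumptions. A production system would more likely use a library implementation such as scikit-learn's `IsolationForest`.

```python
import numpy as np


def c(n: int) -> float:
    """Expected path length for n points, using the harmonic-number approximation ln(i) + Euler's gamma."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    n = len(X)
    if depth >= max_depth or n <= 1:
        return {"size": n}
    feature = rng.integers(X.shape[1])
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:  # simplification: treat a node that is constant on the chosen feature as a leaf
        return {"size": n}
    split = rng.uniform(lo, hi)
    mask = X[:, feature] < split
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(X[mask], depth + 1, max_depth, rng),
        "right": build_tree(X[~mask], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for the points still sharing that leaf."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_scores(X_train, X_test, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1]; values near 1 mean the point was isolated quickly."""
    rng = np.random.default_rng(seed)
    psi = min(sample_size, len(X_train))
    max_depth = int(np.ceil(np.log2(psi)))
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(len(X_train), size=psi, replace=False)
        trees.append(build_tree(X_train[idx], 0, max_depth, rng))
    avg_path = np.array([np.mean([path_length(x, t) for t in trees]) for x in X_test])
    return 2.0 ** (-avg_path / c(psi))


rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 2))
scores = isolation_scores(X_train, np.array([[0.0, 0.0], [8.0, 8.0]]))
print(scores)  # the far-away point gets the higher score
```

No distribution is fitted anywhere in this code, which is what makes the method non-parametric. It relies only on how quickly random splits separate a point from the rest.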
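Training a real autoencoder (technique 4 above) needs a deep-learning framework. As a lightweight stand-in, the sketch below uses a linear autoencoder instead. Under squared error, the best linear encoder/decoder spans the same subspace as the top principal components, so it can be computed directly with NumPy's SVD.

This is a swapped-in simplification, not the nonlinear network the article describes. The scoring idea is the same, though:

1. Compress each row.
2. Reconstruct it.
3. Treat a large reconstruction error as a sign of an anomaly.

The synthetic data and the choice of k = 2 are assumptions.

```python
import numpy as np


def fit_linear_autoencoder(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Return the data mean and an (n_features, k) weight matrix built from the top-k principal directions."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k].T


def reconstruction_error(X: np.ndarray, mu: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Squared error between each row and its encode-then-decode reconstruction."""
    codes = (X - mu) @ W       # encoder: project down to k dimensions
    X_hat = codes @ W.T + mu   # decoder: map back to the original feature space
    return np.sum((X - X_hat) ** 2, axis=1)


# Synthetic history where the third feature is roughly the sum of the first two.
rng = np.random.default_rng(1)
base = rng.normal(size=(500, 2))
X_train = np.column_stack([base, base.sum(axis=1) + rng.normal(scale=0.05, size=500)])

mu, W = fit_linear_autoencoder(X_train, k=2)
X_new = np.array([[1.0, 1.0, 2.0],    # follows the usual relationship
                  [1.0, 1.0, -5.0]])  # breaks it
errors = reconstruction_error(X_new, mu, W)
print(errors)  # the second row has a much larger error
```

A nonlinear autoencoder replaces the two matrix products with trained neural-network layers. The anomaly score remains the reconstruction error.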
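The closing advice to train several models and aggregate their scores needs one extra step. Each detector reports on its own scale and in its own direction. A Gaussian log-density or a One-Class SVM decision value is lower for anomalies, while an isolation score near 1 signals one.

A simple way to combine them is to convert each detector's scores into ranks oriented so that 1 means most anomalous, then average the ranks. The three score arrays below are made-up numbers for five hypothetical transactions, not real model output. Rank averaging is only one of several reasonable aggregation choices.

```python
import numpy as np


def to_rank(scores, higher_is_anomalous: bool) -> np.ndarray:
    """Map raw detector scores to [0, 1] ranks where 1 is the most anomalous point (ties broken arbitrarily)."""
    s = np.asarray(scores, dtype=float)
    if not higher_is_anomalous:
        s = -s
    ranks = s.argsort().argsort()  # position of each score in ascending order
    return ranks / (len(s) - 1)


def aggregate(score_sets, orientations) -> np.ndarray:
    """Average the oriented ranks of several detectors into one ensemble score."""
    return np.mean([to_rank(s, o) for s, o in zip(score_sets, orientations)], axis=0)


# Made-up scores for five transactions from three hypothetical detectors.
gaussian_logpdf = [-2.1, -2.3, -9.8, -2.0, -2.5]  # lower = more anomalous
isolation_score = [0.45, 0.48, 0.71, 0.44, 0.50]  # higher = more anomalous
ocsvm_decision = [0.30, 0.20, -1.50, 0.35, 0.10]  # negative = outside the learned boundary

combined = aggregate(
    [gaussian_logpdf, isolation_score, ocsvm_decision],
    [False, True, False],
)
print(combined.tolist())  # [0.25, 0.5, 1.0, 0.0, 0.75]
print(int(np.argmax(combined)))  # 2
```

Averaging ranks rather than raw scores keeps one detector with a wide numeric range from dominating the ensemble. It also dampens the instability of any single model.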
DARVIX CONNECT | 4 www.konectmag.com