Page 4 - Konect Science & Technology Magazine Cover

Enter data science to the rescue. The digitization of banking has generated an enormous amount of data that can be used to analyze such fraudulent behavior. Not only that, it can proactively estimate the probability of fraud even before it occurs (reminds me of Minority Report too!).

Understanding the fraudster's mindset can help the fraud strategist devise effective rules and models. The biggest task here is to identify the data trails of such behavior and translate them into meaningful features that can be used in fraud models. For example, the number of inquiries made when the customer is not present, a withdrawal amount significantly above or below the customer's average withdrawal pattern, a teller working outside office hours, a customer's deviation from their preferred bank branch or channel of spending, etc. can be important variables in such models. And just like any ML model, fraud models are only as good as the quality of the variables used to create them. Only one in a million transactions is a fraud (literally!).

A supervised approach will break down on such a highly skewed fraud event. And that is not even the biggest limitation. We can still create a supervised model if we know the fraud cases. But the model will only be able to predict the known frauds on which it was trained. It will not be able to identify new types of fraud. Generally, in fraud, the landscape never remains the same, and fraudsters keep up with new rules and models to beat them at their own game.

Hence, unsupervised techniques can be used to cast a wide net and identify anomalous behavior that has not been seen previously. This method of finding abnormal data points in datasets is also called “Anomaly Detection”. Some of the techniques that I have used successfully in my experience are:

1. Multivariate Gaussian Model: It’s a parametric modeling technique (where the independent variables are assumed to follow some specific distribution, e.g., in this case, a Gaussian distribution), which is also used to identify faulty jet engines of airplanes. It identifies outliers in the n-dimensional distribution space (one dimension per variable), just like we get the p-value of a point on the normal curve. The lower the value, the higher the probability of fraud.

2. Isolation forest: It’s a tree-based non-parametric modeling technique (where there is no assumption about the distribution of the independent variables) that identifies anomalies by segregating them in isolated branches. The shorter the distance from the root node, the higher the probability of fraud.

3. One-class SVM: A non-parametric modeling technique that tries to identify boundaries around a cluster of data and then flags any data point outside these boundaries as an anomaly.

4. Autoencoder Neural Network: Based on human-like learning of identifying and labeling objects, this technique tries to reduce the dimensionality of the data and in the process identifies the anomalies: data points that cannot be reconstructed well from the compressed representation stand out.

Although unsupervised algorithms are powerful in detecting anomalies, they are prone to overfitting and unstable results. The solution is to train multiple models and then aggregate their scores. All these methodologies, deployed with the firepower of big data, can be a lethal weapon against such internal frauds. Needless to say, an investment in such infrastructure can also reap benefits in other internal fraud areas like insider threats, internal security, etc. I hope this has piqued your interest in fraud and given you a new perspective on fighting it.
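
For readers who want to see the multivariate Gaussian idea in action, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a production fraud model: the two features (withdrawal amount and hour of day), the sample data, the function names and the cutoff rule are all hypothetical, and it is restricted to two variables so the covariance matrix can be inverted in closed form. It fits a Gaussian to one customer's history and flags a new transaction when its log-density is lower than anything seen in that history.

```python
import math
from statistics import fmean


def fit_gaussian_2d(points):
    """Estimate the mean vector and 2x2 covariance from (x, y) pairs."""
    n = len(points)
    mx = fmean(p[0] for p in points)
    my = fmean(p[1] for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def log_density(point, mean, cov):
    """Log of the bivariate Gaussian density at `point`."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - math.log(2 * math.pi) - 0.5 * math.log(det)


# Hypothetical history for one customer: (withdrawal amount, hour of day).
history = [(100, 10), (120, 11), (90, 9), (110, 14),
           (105, 13), (95, 12), (130, 15), (115, 10)]
mean, cov = fit_gaussian_2d(history)

# Assumed cutoff: the lowest log-density observed in the history itself.
threshold = min(log_density(p, mean, cov) for p in history)

# One ordinary-looking transaction and one large withdrawal at 3 a.m.
new_transactions = [(108, 12), (900, 3)]
scores = [log_density(p, mean, cov) for p in new_transactions]
flags = [s < threshold for s in scores]

for txn, score, flagged in zip(new_transactions, scores, flags):
    print(txn, round(score, 2), "ANOMALY" if flagged else "ok")
```

In practice the cutoff would be tuned on held-out data rather than taken from the training history, and the same scoring idea extends to many variables with a full covariance matrix.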


