Page 55 - ASBIRES-2017_Preceedings

AVOID EMAIL SPAMMING BY SERVER AUTHENTICATION AND SAILJS ORM



2.2 Naïve Bayes Classifier Method

In 1998, the Naïve Bayes classifier was proposed for the recognition of spam. The Bayesian classifier works on dependent events: the probability of an event occurring in the future can be estimated from previous occurrences of the same event.

This technique can be used to classify spam emails; word probabilities play the main role here. If the combined word probabilities exceed a certain limit, the filter assigns the email to one of the categories. Here, only two categories are needed: spam or ham. Almost all statistics-based spam filters use a Bayesian probability calculation to combine individual token statistics into an overall score. The following equation is used to calculate the spam probability (spamminess) of a token T:

$$S[T] = \frac{C_{Spam}(T)}{C_{Spam}(T) + C_{Ham}(T)}$$

where C_Spam(T) and C_Ham(T) are the numbers of spam and ham messages, respectively, that contain the token T. To calculate the spam probability of a message M with tokens {T1, ..., TN}, it is necessary to combine the spamminess of the individual tokens to evaluate the overall spamminess of the message.

A simple way to classify a message is to calculate the product of the individual tokens' spamminess and compare it with the product of the individual tokens' hamminess (Rao & Reiley, 2012).

2.3 Artificial Immune System Classifier Method

The biological immune system has succeeded in protecting the human body against a wide variety of foreign pathogens. One role of the immune system is to protect the body from infectious agents such as viruses and bacteria. On the surface of these agents are antigens that allow the invading agents to be identified, thereby provoking an immune response. Recognition in the immune system is performed by lymphocytes.

3 METHODOLOGY

Figure 1: System diagram

3.1 Preprocessing Email

The content of the email is received through our software. The information is then extracted as mentioned above, and the extracted information (feature) is stored in a corresponding database. Each message was converted into a feature vector with 21700 attributes (approximately the number of different words in all corpus messages). An attribute n was set to 1 if the corresponding word was present in the message and to 0 otherwise.

This feature extraction scheme was used for all of the spam filtering algorithms.

3.2 Description of the Extracted Feature

The email sending feature mainly consisted of lossless data compression. Lossless data compression algorithms usually exploit statistical redundancy to represent data without losing any information, so the process is reversible. Lossless compression is possible because most real-world data show statistical redundancy.

For example, an image may have color areas that do not change over several pixels; instead of encoding "red pixel, red pixel, ...", the data can be encoded as "279 red pixels".
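
The "279 red pixels" example corresponds to run-length encoding, one of the simplest lossless schemes. The sketch below is illustrative only: the paper does not state which compression algorithm is used, and the function names `rle_encode` and `rle_decode` are hypothetical. It shows that a run of repeated values can be stored as a count plus a value and then restored exactly.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse each run of equal values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    return [value for count, value in pairs for _ in range(count)]

# Hypothetical pixel row: 279 red pixels followed by 3 blue pixels.
pixels = ["red"] * 279 + ["blue"] * 3
encoded = rle_encode(pixels)
print(encoded)                        # [(279, 'red'), (3, 'blue')]
assert rle_decode(encoded) == pixels  # lossless: the round trip is exact
```

The encoding only saves space when the input contains long runs, which is the statistical redundancy referred to above.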


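
The binary feature vector of Section 3.1 and the token statistics of Section 2.2 can be sketched together as follows. This is a minimal illustration, not the authors' implementation. It assumes that token spamminess is estimated as the number of spam messages containing the token divided by the number of all messages containing it, that tokens are split on whitespace, and that scores are clamped to [0.01, 0.99] so that one token cannot force a product to zero. The function names and the tiny corpus are hypothetical.

```python
from math import prod

def to_feature_vector(message, vocabulary):
    """Binary vector: 1 if the vocabulary word occurs in the message, else 0."""
    words = set(message.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

def token_spamminess(token, spam_msgs, ham_msgs):
    """Share of the messages containing the token that are spam."""
    c_spam = sum(token in msg for msg in spam_msgs)
    c_ham = sum(token in msg for msg in ham_msgs)
    total = c_spam + c_ham
    # Assumption: a token never seen in training is treated as neutral (0.5).
    return c_spam / total if total else 0.5

def classify(message, spam_msgs, ham_msgs, low=0.01, high=0.99):
    """Compare the product of token spamminess with the product of hamminess."""
    tokens = set(message.lower().split())
    # Assumption: clamp scores so a single 0 or 1 does not decide the result.
    scores = [min(high, max(low, token_spamminess(t, spam_msgs, ham_msgs)))
              for t in tokens]
    spam_score = prod(scores)
    ham_score = prod(1 - s for s in scores)
    return "spam" if spam_score > ham_score else "ham"

# Hypothetical training corpus: each message is a set of tokens.
spam_msgs = [{"win", "money", "now"}, {"free", "money"}]
ham_msgs = [{"meeting", "at", "noon"}, {"project", "money", "report"}]

print(to_feature_vector("Free money now", ["free", "meeting", "money"]))  # [1, 0, 1]
print(classify("free money now", spam_msgs, ham_msgs))           # spam
print(classify("project meeting at noon", spam_msgs, ham_msgs))  # ham
```

With a vocabulary of about 21700 words, the feature vector would have one entry per word. For long messages, sums of logarithms are usually preferred to raw products to avoid numerical underflow.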
