Page 29 - Data Science Algorithms in a Week

14                               Ramazan Ünlü

                       Mainly, after they have created some base partitions, an improved similarity matrix is
                       created to get an optimal partition by using spectral clustering. An improved version of
                       LCE is proposed by (Iam-On, Boongoen, & Garrett, 2010) with the goal of using
                       additional information by implementing 'Weighted Triple Uniqueness (WTU)'. An
                       iterative consensus clustering is applied to complex networks by (Lancichinetti &
                       Fortunato, 2012). Lancichinetti and Fortunato stress that the consensus graph may
                       contain noisy connections that should be removed. They therefore refined the
                       consensus graph by removing edges whose weight is below a threshold and reconnecting
                       any node left disconnected to its closest neighbor, until a block diagonal matrix is
                       obtained. At the end, a graph-based algorithm is applied to the consensus graph to
                       get the final partition. To efficiently find the
                       similarity between two data points, which can be interpreted as the probability of
                       their being in the same cluster, a new index called the Probabilistic Rand Index
                       (PRI) was developed by (Carpineto & Romano, 2012). According to the authors, it gave
                       better results than existing methods. One possible problem in the consensus
                       framework is its inability to handle uncertain data points, which are assigned to
                       the same cluster in about half of the partitions and to different clusters in the
                       rest. This can yield a final partition of poor quality. To overcome this
                       limitation, (Yi, Yang, Jin, Jain, &
                       Mahdavi, 2012) proposed an ensemble clustering method based on the technique of
                       matrix completion. The proposed algorithm constructs a partially observed similarity
                       matrix from the pairs of samples that most of the clustering algorithms assign to
                       the same cluster (entry 1) or to different clusters (entry 0); the remaining,
                       uncertain pairs are left unobserved. The similarity matrix therefore contains three
                       kinds of entries: 0, 1, and unobserved. It is then passed to a matrix completion
                       algorithm to fill in the unobserved entries. The final data partition is obtained by
                       applying a spectral clustering algorithm to the completed matrix (Yi et al., 2012).
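
The construction of the partially observed similarity matrix described above can be illustrated with a minimal sketch. It is not the implementation of (Yi et al., 2012): the function names, the toy partitions, and the two thresholds `low` and `high` are assumptions chosen for illustration, and the matrix completion and spectral clustering steps are left out. A pair of points is marked 1 when it shares a cluster in at least a fraction `high` of the base partitions, 0 when it does so in at most a fraction `low`, and `None` (unobserved) otherwise.

```python
from itertools import combinations


def coassociation(partitions):
    """Fraction of base partitions in which each pair of points shares a cluster."""
    n_points = len(partitions[0])
    n_partitions = len(partitions)
    frac = {}
    for i, j in combinations(range(n_points), 2):
        together = sum(1 for labels in partitions if labels[i] == labels[j])
        frac[(i, j)] = together / n_partitions
    return frac


def partially_observed(frac, low=0.2, high=0.8):
    """Map each pair to 1, 0, or None (unobserved).

    The thresholds are illustrative assumptions, not values from the paper.
    """
    observed = {}
    for pair, f in frac.items():
        if f >= high:
            observed[pair] = 1      # most partitions put the pair together
        elif f <= low:
            observed[pair] = 0      # most partitions separate the pair
        else:
            observed[pair] = None   # uncertain pair, left for matrix completion
    return observed


# Four toy base partitions of five points; point 2 switches sides half of the time.
partitions = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
]
frac = coassociation(partitions)
sim = partially_observed(frac)
```

In this toy example every pair involving point 2 is co-clustered in exactly half of the partitions, so those entries stay unobserved, while the remaining pairs are fixed to 0 or 1.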
                          A boosting-based hierarchical clustering ensemble algorithm called Bob-Hic is
                       proposed by (Rashedi & Mirzaei, 2013) as an improved version of the method suggested
                       by (Rashedi & Mirzaei, 2011). Bob-Hic includes several boosting steps. In each step,
                       weighted random sampling is first applied to the data, and then a single hierarchical
                       clustering is created on the selected samples. At the end, the individual
                       hierarchical clusterings are combined to obtain the final partition. The diversity and the
                       quality of the combined partitions are critical properties for a strong ensemble.
                       Validity indexes are used by (Naldi et al., 2013) to select high-quality partitions
                       among the produced ones. In that study, the quality of a partition is measured by a
                       single index or by a combination of indexes. APMM, proposed by (Alizadeh,
                       Minaei-Bidgoli, & Parvin, 2014), is another criterion for determining the quality of
                       a partition. This criterion is also used to select a subset of all the produced
                       partitions. A consensus
                       particle swarm clustering algorithm based on particle swarm optimization (PSO)
                       (Kennedy, 2011) is proposed by (Esmin & Coelho, 2013). According to the results of
                       that study, the PSO-based algorithm produces results as good as or better than other
                       well-known consensus clustering algorithms.
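
The idea of keeping only high-quality base partitions, as in the validity-index selection discussed above, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of (Naldi et al., 2013): the mean silhouette width stands in for whichever validity index is used, and the function names, the toy data, and the `keep` parameter are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width; higher means tighter, better separated clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treated as 0 in this sketch
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def select_partitions(points, partitions, keep):
    """Return the `keep` partitions with the highest validity score."""
    ranked = sorted(partitions, key=lambda labels: silhouette(points, labels), reverse=True)
    return ranked[:keep]


# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
partitions = [
    [0, 0, 0, 1, 1, 1],  # matches the two visible groups
    [0, 1, 0, 1, 0, 1],  # mixes the groups
    [0, 0, 1, 1, 1, 1],  # one point misplaced
]
best = select_partitions(points, partitions, keep=2)
```

Only the selected partitions would then be passed to the consensus step. A combination of several indexes, as mentioned in the text, could be obtained by replacing the sort key with an aggregate of several scores.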