Page 28 - Data Science Algorithms in a Week

Unsupervised Ensemble Learning                       13

                              process is the main logic of these kinds of methods. Examples in the literature
                              include sCSPA, sMCLA, and sHBGF (Punera & Ghosh, 2008).


                                    RECENT STUDIES IN CONSENSUS CLUSTERING

                          Studies in the literature focus either on developing new consensus clustering
                       methods or on applying existing ones. In this section, some relatively recent
                       and related works are summarized. Many different terms are used for
                       consensus clustering frameworks, so the search for this study is limited to the
                       following terms:

                            Consensus clustering
                            Ensemble clustering
                            Unsupervised ensemble learning

                          Ayad and Kamel proposed the cumulative voting-based aggregation algorithm
                       (CVAA), formulated as a multi-response regression problem (Ayad & Kamel, 2010).
                       The CVAA was later enhanced by assigning a weight to each individual clustering
                       method used to generate the consensus, based on the mutual information associated
                       with that method, which is measured by entropy (Saeed, Ahmed, Shamsir, & Salim,
                       2014). Weighted partition consensus via kernels (WPCK) was proposed by Vega-Pons
                       et al. (2010). This method uses an intermediate step called Partition Relevance
                       Analysis to assign each partition a weight that represents its significance in the
                       ensemble. It also defines consensus clustering as a median partition problem, using
                       a kernel function as the similarity measure between partitions. Unlike partitional
                       clustering methods, whose results can be represented by vectors, hierarchical
                       clustering methods produce a more complex solution, represented by dendrograms or
                       trees. This makes using hierarchical clustering in a consensus framework more
                       challenging. A hierarchical ensemble clustering method was proposed by Yu, Liu,
                       and Wang (2014) to handle this difficulty. The algorithm combines both partitional
                       and hierarchical clusterings and yields a hierarchical consensus clustering as
                       output.
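The idea of weighting each partition before forming the consensus can be illustrated with a weighted co-association matrix. The sketch below is neither CVAA nor WPCK. It is a simplified stand-in that assumes the partition weights are already known; the function name `weighted_coassociation` and the weight values are hypothetical.

```python
import numpy as np

def weighted_coassociation(partitions, weights):
    """Weighted co-association matrix for label vectors of equal length.

    Entry (i, j) is the total normalised weight of the partitions that
    place samples i and j in the same cluster.
    """
    partitions = np.asarray(partitions)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()      # normalise so entries lie in [0, 1]
    n = partitions.shape[1]
    co = np.zeros((n, n))
    for labels, w in zip(partitions, weights):
        # Boolean matrix: True where two samples share a cluster label.
        same = labels[:, None] == labels[None, :]
        co += w * same
    return co

# Three base partitions of four samples (toy data).
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
# Hypothetical relevance scores; a real method would derive these,
# for example from an entropy-based or kernel-based relevance analysis.
weights = [2.0, 1.0, 1.0]

co = weighted_coassociation(partitions, weights)
print(co)
```

A consensus partition could then be obtained by running any similarity-based clustering algorithm on `co`; that step is omitted here.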
                          Link-based clustering ensemble (LCE) was proposed as an extension of the hybrid
                       bipartite graph formulation (HBGF) technique (Iam-On, Boongoen, Garrett, & Price,
                       2012; Iam-On & Boongoen, 2012). The authors applied a graph-based consensus
                       function to an improved similarity matrix instead of the conventional one. The main
                       difference between the proposed method and HBGF is this similarity matrix. While
                       the traditional similarity matrix represents the association between samples with
                       the binary values {0, 1}, the improved one replaces the unknown relationships (the
                       0 entries) with approximate values. This is accomplished through a link-based
                       similarity measure called ‘Weighted Connected Triple’ (WCT).
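The refinement of a binary matrix can be sketched as follows. This is a minimal illustration under several assumptions not stated in the text: the matrix is taken to be a binary sample-by-cluster association matrix, edge weights between clusters are Jaccard overlaps of their members, a connected triple through a shared neighbour contributes the smaller of its two edge weights, and a decay factor `dc = 0.9` scales the result. All function names are hypothetical.

```python
import numpy as np

def cluster_membership(partitions):
    """Binary sample-by-cluster matrix, one column per cluster of each base clustering."""
    cols, owner = [], []
    for m, labels in enumerate(partitions):
        labels = np.asarray(labels)
        for c in np.unique(labels):
            cols.append((labels == c).astype(float))
            owner.append(m)                # remember which clustering owns the column
    return np.column_stack(cols), np.array(owner)

def wct_similarity(B, dc=0.9):
    """Cluster-to-cluster similarity from weighted connected triples (sketch)."""
    inter = B.T @ B                        # number of shared members per cluster pair
    sizes = B.sum(axis=0)
    union = sizes[:, None] + sizes[None, :] - inter
    w = inter / union                      # Jaccard edge weights (assumption)
    np.fill_diagonal(w, 0.0)               # a cluster is not its own neighbour
    # wct[x, y] = sum over neighbours k of min(w[x, k], w[y, k])
    wct = np.minimum(w[:, None, :], w[None, :, :]).sum(axis=2)
    off_diag = wct[~np.eye(len(wct), dtype=bool)]
    wmax = off_diag.max()
    sim = dc * wct / wmax if wmax > 0 else np.zeros_like(wct)
    np.fill_diagonal(sim, 1.0)
    return sim

def refine(B, owner, sim):
    """Replace each 0 entry with the similarity to the sample's own cluster."""
    R = B.copy()
    for i in range(B.shape[0]):
        for m in np.unique(owner):
            cols = np.where(owner == m)[0]
            home = cols[B[i, cols] == 1][0]    # cluster of sample i in clustering m
            R[i, cols] = sim[home, cols]
    return R

partitions = [[0, 0, 1, 1], [0, 0, 0, 1]]      # toy base clusterings
B, owner = cluster_membership(partitions)
R = refine(B, owner, wct_similarity(B))
print(B)
print(R)
```

In this toy case the 0 entries of `B` become values strictly between 0 and 1 in `R`, while the original 1 entries are preserved. A graph-based consensus function would then be applied to `R` rather than to `B`.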