Page 25 - Data Science Algorithms in a Week
        P. 25
     10                               Ramazan Ünlü
                              produce multiple partitions. They aimed to reduce variability in the partitioning
                              based  algorithm  result  by  averaging.  And,  they  successfully  produced  more
                              accurate clusters than an application of a single algorithm.
                          The consensus function is the crucial and leading step of any consensus clustering
                       algorithm. These functions are used to combine a set of labels produced by individual
                       clustering algorithms in the previous step. The combined labels - or final partition- can be
                       considered  as  a  result  of  another  clustering  algorithm.  Foundation  or  definition  of  a
                       consensus function can profoundly impact the goodness of final partition which is the
                       product of any consensus clustering. However, the way of the combination of multiple
                       partitions is not the same in all cases. A sharp -but well-accepted- division of consensus
                       functions are (1) objects co-occurrence and (2) median partition approaches.
                          The  idea  of  objects  co-occurrence  methods  works  based  on  similar  and  dissimilar
                       objects. If two data points are in the same cluster, those can be considered as similar,
                       otherwise they are dissimilar. Therefore, in objects co-occurrence methods it should be
                       analyzed that how many times data samples belong to one cluster. In median partition
                       approach, the final partition is obtained by solving an optimization problem which is the
                       problem  of  finding  the  median  partition  concerning cluster  ensemble.  Now  the  formal
                       version of the median partition problem can be defined. Given a set of    partitions and a
                                                                                                 ∗
                       similarity measure such as distance   (, ) between two partitions, a set of partition     can
                       be found such that:
                                           
                            ∗
                             =              ∑   (   ,   )
                                                 
                                         
                                          =1
                       It can be found the detailed review of consensus functions, and taxonomy of principal
                       consensus  functions  in  different  studies  such  as  (Ghaemi,  Sulaiman,  Ibrahim,  &
                       Mustapha,  2009;  Topchy  et  al.,  2004;  Vega-Pons  &  Ruiz-Shulcloper,  2011;  D.  Xu  &
                       Tian,  2015).  Also,  relations  among  different  consensus  functions  can  be  found  in  (Li,
                       Ogihara, & Ma, 2010). some of the main functions are summarized as follows:
                            Based  on  relabeling  and  voting:  These  methods  are  based  on  two  important
                              steps. At the first step, the labeling correspondence problem needs to be solved.
                              The label of each sample is symbolic; a set of the label given by an algorithm
                              might be different than labels given by another algorithm. However, both sets of
                              labels correspond to the same partition. Solving this problem makes the partitions
                              ready  for  the  combination  process.  If  the  labeling  correspondence  problem  is
                              solved,  then  at  the  second  step  voting  procedure  can  be  applied.  The  voting
                              process finds how many times a sample is labeled with the same label. To apply





