Page 23 - Data Science Algorithms in a Week

8                                Ramazan Ünlü

• Robustness: The consensus clustering may have better overall performance than the
  majority of the individual clustering methods.
• Consistency: The consensus partition is similar to each of the individual partitions
  that were combined.
• Stability: The consensus clustering shows less variability across iterations than the
  individual algorithms that were combined.

With respect to properties like these, consensus clustering can produce better partitions
than most individual clustering methods. It cannot be expected to give the best result in
every case, since exceptions exist. What can be claimed is that consensus clustering
outperforms most of the combined single algorithms on some of these properties, under the
assumption that a combination of the good characteristics of various partitions is more
reliable than any single algorithm.

Over the past years, many different algorithms have been proposed for consensus
clustering (Al-Razgan & Domeniconi, 2006; Ana & Jain, 2003; Azimi & Fern, 2009; de
Souto, de Araujo, & da Silva, 2006; Hadjitodorov, Kuncheva, & Todorova, 2006; Hu,
Yoo, Zhang, Nanavati, & Das, 2005; Huang, Lai, & Wang, 2016; Li & Ding, 2008; Li,
Ding, & Jordan, 2007; Naldi, Carvalho, & Campello, 2013; Ren, Domeniconi, Zhang, &
Yu, 2016). As mentioned earlier, the literature shows that the consensus clustering
framework can enhance the robustness and stability of clustering analysis. Consequently,
consensus clustering has found many real-world applications, such as gene classification,
image segmentation (Hong, Kwong, Chang, & Ren, 2008), video retrieval, and so on (Azimi,
Mohammadi, & Analoui, 2006; Fischer & Buhmann, 2003; A. K. Jain et al., 1999). From a
combinatorial optimization point of view, the task of combining different partitions has
been formulated as a median partitioning problem, which is known to be NP-complete
(Křivánek & Morávek, 1986). Even with recent breakthroughs, this exact approach cannot
handle datasets larger than several hundred samples (Sukegawa, Yamamoto, & Zhang, 2013).
For a comprehensive review of 0-1 linear programming formulations of the consensus
clustering problem, readers can refer to Xanthopoulos (2014).
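
As a hedged illustration, the median partitioning problem mentioned above is commonly
written in the following form. The symbols are assumptions introduced for this sketch and
do not appear in the text: $X$ is the dataset, $P_1, \dots, P_M$ are the given partitions,
$\mathcal{P}_X$ is the set of all partitions of $X$, and $d$ is a chosen dissimilarity
between two partitions (for example, the number of pairs of samples on which the two
partitions disagree).

```latex
P^{*} = \arg\min_{P \in \mathcal{P}_X} \sum_{m=1}^{M} d(P, P_m)
```

The search space $\mathcal{P}_X$ grows very quickly with the number of samples (its size
is the Bell number of $|X|$), which gives some intuition for why exact approaches are
limited to small datasets.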

The consensus clustering problem can be stated informally as follows: given multiple
partitions of a dataset, find a combined clustering model (or final partition) that is of
better quality with respect to some of the aspects pointed out above. Accordingly, every
consensus clustering method generally consists of two steps: (1) generation of multiple
partitions and (2) a consensus function, as shown in Figure 6 (Topchy, Jain, & Punch,
2003; Topchy et al., 2004; D. Xu & Tian, 2015).
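
The two steps can be sketched in a few lines of Python. This is only a minimal
illustration under stated assumptions, not the method of any cited paper: step (1) is
simulated by running a tiny k-means with different random seeds, and step (2) uses one
possible consensus function, a co-association matrix thresholded at 0.5 and split into
connected components. All names (kmeans_labels, co_association, consensus_labels) and the
toy data are hypothetical.

```python
import random

def kmeans_labels(points, k, seed, n_iter=20):
    """Tiny Lloyd-style k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]

def consensus_labels(coassoc, threshold=0.5):
    """Assumed consensus function: link pairs above threshold, label components."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (5.0, 5.0), (5.3, 5.2), (4.9, 5.4)]

# Step 1: generation of multiple partitions (different seeds, same algorithm).
partitions = [kmeans_labels(points, k=2, seed=s) for s in range(10)]

# Step 2: consensus function applied to the generated partitions.
final = consensus_labels(co_association(partitions))
print(final)
```

Because the co-association matrix only records whether two samples share a cluster, the
arbitrary label numbering of each base partition does not affect the combined result.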

Generation of multiple partitions is the first step of consensus clustering. It aims to
create the multiple partitions that will later be combined. This step can be critical for
some problems, because the final partition depends on the partitions produced here.
Several methods have been proposed in the literature to create multiple partitions, as
follows: