Page 24 - Data Science Algorithms in a Week

Unsupervised Ensemble Learning                        9

                       Figure 6. Process of consensus clustering.

                            For the same dataset, employ different traditional clustering methods: Using
                              different clustering algorithms is probably the most common way to create
                              multiple partitions of a given dataset. Although there is no particular rule
                              for choosing which conventional algorithms to apply, it is advisable to use
                              methods that can capture more information about the data in general. However,
                              it is not easy to know in advance which methods will suit a particular
                              problem. Therefore, an expert opinion can be very useful (Strehl & Ghosh,
                              2002; Vega-Pons & Ruiz-Shulcloper, 2011; D. Xu & Tian, 2015).
                            For the same dataset, employ different traditional clustering methods with
                              different initializations or parameters: Running algorithms with different
                              parameters or initializations is another efficient method (Ailon, Charikar, &
                              Newman, 2008). Even a simple algorithm can produce several informative
                              partitions of the data, and these can yield an effective consensus in
                              conjunction with a suitable consensus function. For example, A. L. Fred &
                              Jain (2005) generated different partitions by running the k-means algorithm
                              with different random initial centers and different numbers of clusters.
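The k-means example above can be sketched in a few lines. This is only a minimal illustration in plain Python, not the exact procedure of A. L. Fred & Jain (2005): the function names `kmeans` and `generate_partitions`, the toy two-blob data, and the parameter values (`k_values`, `runs_per_k`, `n_iter`, the seeds) are all assumptions made for the example.

```python
import random


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Random initial centers drawn from the data (differs on every call).
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def generate_partitions(points, k_values, runs_per_k, seed=0):
    """Build an ensemble by varying both k and the random initialization."""
    rng = random.Random(seed)
    ensemble = []
    for k in k_values:
        for _ in range(runs_per_k):
            ensemble.append(kmeans(points, k, rng))
    return ensemble


# Toy data (assumed for illustration): two Gaussian blobs in 2-D.
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(20)]
points += [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(20)]

k_values = [2, 3, 4]
runs_per_k = 3
ensemble = generate_partitions(points, k_values, runs_per_k)
print(len(ensemble), "partitions of", len(points), "points")
```

Each entry of `ensemble` is one partition (a list of labels, one per data point). A consensus function would then take this collection as its input; which function to use is a separate choice that this sketch does not make.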
                            Using weak clustering algorithms: Weak clustering algorithms can also be used
                              in the generation step. These methods produce a set of partitions of the
                              data using a very straightforward methodology. Despite their simplicity,
                              weak clustering algorithms have been observed to provide high-quality
                              consensus clustering when combined with a proper consensus function (Luo,
                              Jing, & Xie, 2006; Topchy et al., 2003; Topchy, Jain, & Punch, 2005).
                            Data resampling: Resampling techniques such as bagging and boosting are
                              another useful way to create multiple partitions (Dudoit & Fridlyand, 2003;
                              Hong et al., 2008). Dudoit and Fridlyand applied a partitioning clustering
                              method (e.g., Partitioning Around Medoids) to a set of bootstrap learning data to