Page 19 - Data Science Algorithms in a Week

4                                Ramazan Ünlü
















                       Figure 3. Clustering process.
                             C_i \neq \emptyset  for  i = 1, \dots, k
                             \bigcup_{i=1}^{k} C_i = X
                             C_i \cap C_j = \emptyset  for  i, j = 1, \dots, k,  i \neq j
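
The three conditions above (no empty cluster, the clusters together cover the whole data set, and no two clusters overlap) can be checked mechanically. The following is a minimal sketch, not part of the original chapter; the function name is_hard_partition and the representation of clusters as collections of sample identifiers are assumptions made for illustration.

```python
from typing import Hashable, Iterable, Sequence


def is_hard_partition(
    data: Iterable[Hashable],
    clusters: Sequence[Iterable[Hashable]],
) -> bool:
    """Return True if `clusters` is a hard partition of `data`.

    Hypothetical helper for illustration only: samples are assumed
    to be hashable identifiers (for example, row indices).
    """
    universe = set(data)
    cluster_sets = [set(c) for c in clusters]

    # Condition 1: every cluster is non-empty.
    if any(len(c) == 0 for c in cluster_sets):
        return False

    # Condition 2: the union of all clusters equals the data set.
    union = set().union(*cluster_sets)
    if union != universe:
        return False

    # Condition 3: clusters are pairwise disjoint. If any sample were
    # shared, the summed cluster sizes would exceed the union size.
    return sum(len(c) for c in cluster_sets) == len(union)


samples = range(6)
valid = is_hard_partition(samples, [{0, 1}, {2, 3, 4}, {5}])
overlapping = is_hard_partition(samples, [{0, 1}, {1, 2, 3, 4}, {5}])
incomplete = is_hard_partition(samples, [{0, 1}, {2, 3}])
```

Here only the first call describes a valid partition; the second shares sample 1 between two clusters and the third leaves samples 4 and 5 unassigned.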

                          Through this clustering process, clusters are created based on dissimilarities and
                       similarities between samples. Those dissimilarities and similarities are assessed based on
                       the feature values describing the objects and are relevant to the purpose of the study,
                       domain-specific assumptions, and prior knowledge of the problem (Grira, Crucianu, &
                       Boujemaa, 2005). Since similarity is an essential part of the definition of a cluster, the
                       measure of similarity between two objects is crucial in clustering algorithms. This measure
                       must be chosen very carefully because the quality of a clustering model depends on it.
                       Instead of a similarity measure, the dissimilarity between two samples is also commonly
                       used. Dissimilarity is typically quantified by a distance measure defined on the feature
                       space, such as the Euclidean distance, the Minkowski distance, or the City-block distance
                       (Kantardzic, 2011).
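
The three distance measures named above are closely related: the Minkowski distance of order p reduces to the City-block distance for p = 1 and to the Euclidean distance for p = 2. The sketch below illustrates this using only the Python standard library; the function names are chosen here for illustration and do not come from the chapter.

```python
from typing import Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance of order p (p >= 1) between two feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    """City-block (Manhattan) distance: Minkowski distance with p = 1."""
    return minkowski(a, b, 1)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: Minkowski distance with p = 2."""
    return minkowski(a, b, 2)


u = (0.0, 0.0)
v = (3.0, 4.0)
d_city = city_block(u, v)   # |3| + |4|
d_eucl = euclidean(u, v)    # sqrt(3^2 + 4^2)
d_p3 = minkowski(u, v, 3)   # lies between the max coordinate gap and d_eucl
```

For the same pair of samples the distance shrinks as p grows, which is one reason the choice of measure can change which samples a clustering algorithm treats as neighbours.
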
                          The standard process of clustering can be divided into several steps. The structure
                       of these steps is depicted in Figure 3, inspired by R. Xu & Wunsch (2005). Several
                       taxonomies of clustering methods have been proposed by researchers (Nayak, Naik, &
                       Behera, 2015; D. Xu & Tian, 2015; R. Xu & Wunsch, 2005). It is not easy to capture the
                       wide diversity of clustering methods in a single taxonomy because the methods differ in
                       their starting points and criteria. A rough but widely agreed categorization classifies
                       clustering methods as hierarchical clustering and partitional clustering, based on the
                       properties of the clusters generated (R. Xu & Wunsch, 2005). However, a more detailed
                       taxonomy, inspired by the one suggested in D. Xu & Tian (2015), is put forward in
                       Table 1 below.
                          In this study, the details of the algorithms categorized in Table 1 are not discussed.
                       We refer the reader to D. Xu & Tian (2015) for a detailed explanation of these clustering
                       algorithms. However, a brief overview of ensemble-based clustering is given, with a
                       detailed discussion in the section below.