Page 19 - Data Science Algorithms in a Week

4 Ramazan Ünlü

Figure 3. Clustering process.
$C_i \neq \emptyset$ for $i = 1, \dots, K$

$\bigcup_{i=1}^{K} C_i = X$

$C_i \cap C_j = \emptyset$ for $i, j = 1, \dots, K$ and $i \neq j$
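
The three conditions above (every cluster is non-empty, the clusters together cover the whole data set, and no two clusters share a sample) can be checked mechanically. The following is a minimal sketch, assuming samples are identified by hashable values such as integer indices; the function name `is_hard_partition` and its arguments are illustrative and do not come from the original text.

```python
from itertools import combinations


def is_hard_partition(clusters, data):
    """Return True if `clusters` satisfies the three hard-partition conditions.

    `clusters` is an iterable of collections of sample identifiers and
    `data` is the collection of all sample identifiers (both hypothetical
    representations chosen for this sketch).
    """
    cluster_sets = [set(c) for c in clusters]

    # Condition 1: every cluster is non-empty.
    if any(len(c) == 0 for c in cluster_sets):
        return False

    # Condition 2: the union of all clusters equals the data set.
    if set().union(*cluster_sets) != set(data):
        return False

    # Condition 3: clusters are pairwise disjoint.
    return all(a.isdisjoint(b) for a, b in combinations(cluster_sets, 2))
```

For example, `is_hard_partition([[0, 1], [2, 3]], range(4))` holds, whereas assigning sample 1 to both clusters violates the third condition.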

Through this clustering process, clusters are created based on the dissimilarities and
similarities between samples. These are assessed from the feature values describing the
objects, and they depend on the purpose of the study, domain-specific assumptions, and
prior knowledge of the problem (Grira, Crucianu, & Boujemaa, 2005). Since similarity is
an essential part of what defines a cluster, the measure of similarity between two
objects is crucial in clustering algorithms. This measure must be chosen carefully
because the quality of a clustering model depends on it. Instead of a similarity
measure, the dissimilarity between two samples is also commonly used. Dissimilarity is
typically quantified by a distance measure defined on the feature space, such as the
Euclidean distance, the Minkowski distance, or the city-block distance
(Kantardzic, 2011).
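
The three distance measures named above are closely related: the Minkowski distance of order p reduces to the city-block distance for p = 1 and to the Euclidean distance for p = 2. The sketch below illustrates this with plain Python sequences as feature vectors; the function names are illustrative and are not taken from the cited references.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1 for a valid distance")
    # Sum the p-th powers of the absolute coordinate differences,
    # then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean_distance(x, y):
    """Euclidean distance: the Minkowski distance with p = 2."""
    return minkowski_distance(x, y, 2)


def city_block_distance(x, y):
    """City-block (Manhattan) distance: the Minkowski distance with p = 1."""
    return minkowski_distance(x, y, 1)


if __name__ == "__main__":
    a, b = (0, 0), (3, 4)
    print(euclidean_distance(a, b))   # 5.0
    print(city_block_distance(a, b))  # 7.0
```

The same pair of samples is 5 units apart under one measure and 7 under the other, which is why the choice of measure can change which samples end up grouped together.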
The standard clustering process can be divided into several steps. The structure of
these steps is depicted in Figure 3, inspired by R. Xu & Wunsch (2005). Several
taxonomies of clustering methods have been proposed by researchers (Nayak, Naik, &
Behera, 2015; D. Xu & Tian, 2015; R. Xu & Wunsch, 2005). It is not easy to capture the
wide diversity of clustering methods in a single taxonomy, because the proposed
taxonomies use different starting points and criteria. A rough but widely agreed
categorization divides clustering methods into hierarchical clustering and partitional
clustering, based on the properties of the clusters generated (R. Xu & Wunsch, 2005).
A more detailed taxonomy, inspired by the one suggested in D. Xu & Tian (2015), is put
forward in Table 1 below.
In this study, the details of the algorithms categorized in Table 1 are not discussed;
we refer the reader to D. Xu & Tian (2015) for a detailed explanation of these
clustering algorithms. However, a brief overview of ensemble-based clustering is given
here, and a detailed discussion follows in the section below.
