Data Science Algorithms in a Week
Ramazan Ünlü
Mainly, after some base partitions have been created, an improved similarity matrix is
built to obtain an optimal partition by using spectral clustering. An improved version of
LCE is proposed by (Iam-On, Boongoen, & Garrett, 2010) with the goal of exploiting
additional information by implementing 'Weighted Triple Uniqueness (WTU)'. An
iterative consensus clustering is applied to complex networks by (Lancichinetti &
Fortunato, 2012). Lancichinetti and Fortunato stress that there may be noisy connections
in the consensus graph that should be removed. Thus, they refine the consensus graph by
removing the edges whose weight is lower than a threshold value and reconnecting each
disconnected node to its closest neighbor until a block-diagonal matrix is obtained.
Finally, a graph-based algorithm is applied to the consensus graph to obtain the final
partition. To efficiently find the
similarity between two data points, which can be interpreted as the probability of their
being in the same cluster, a new index called the Probabilistic Rand Index (PRI) is
developed by (Carpineto & Romano, 2012). According to the authors, it yields better
results than existing methods. One possible problem in the consensus framework is an
inability to handle uncertain data points, which are assigned to the same cluster in about
half of the partitions and to different clusters in the rest. This can yield a final partition of
poor quality. To overcome this limitation, (Yi, Yang, Jin, Jain, & Mahdavi, 2012)
proposed an ensemble clustering method based on the technique of matrix completion.
The proposed algorithm constructs a partially observed similarity matrix from the pairs of
samples that are assigned to the same cluster by most of the base clusterings. The
similarity matrix therefore contains three kinds of entries: 0, 1, and unobserved. A matrix
completion algorithm is then used to fill in the unobserved entries, and the final data
partition is obtained by applying a spectral clustering algorithm to the completed matrix
(Yi et al., 2012).
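The matrix-completion idea can be sketched as follows. This is a minimal illustration, not the implementation of (Yi et al., 2012): the toy data, the 0.8/0.2 confidence thresholds, the rank-2 SVD imputation, and the simple Fiedler-vector split are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): two well-separated 2-D blobs.
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])
n = len(X)

def kmeans_labels(data, k, rng):
    """Minimal Lloyd's k-means, used only to generate base partitions."""
    centers = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(25):
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = data[labels == j].mean(0)
    return labels

# 1) Base partitions; mixing k = 2 and k = 3 creates genuine disagreement,
#    so some pairs are "uncertain" in the sense discussed above.
partitions = [kmeans_labels(X, kk, rng) for kk in [2] * 5 + [3] * 5]

# 2) Co-association: fraction of partitions placing points i and j together.
A = np.mean([(p[:, None] == p[None, :]).astype(float) for p in partitions],
            axis=0)

# 3) Only confident pairs are "observed" (entries 0 or 1); the rest stay
#    unobserved, giving the three kinds of entries described in the text.
observed = (A >= 0.8) | (A <= 0.2)
S = (A >= 0.8).astype(float)

# 4) Crude completion: alternate a rank-2 SVD approximation with
#    re-imposing the observed entries (a soft-impute-style iteration).
M = np.where(observed, S, 0.5)
for _ in range(50):
    U, sing, Vt = np.linalg.svd(M, full_matrices=False)
    M = (U[:, :2] * sing[:2]) @ Vt[:2]
    M[observed] = S[observed]
M = np.clip(M, 0.0, 1.0)

# 5) Simple spectral step on the completed matrix: split on the sign of the
#    Fiedler vector (a tiny uniform weight keeps the graph connected).
W = M + 1e-6
L = np.diag(W.sum(1)) - W
eigvals, eigvecs = np.linalg.eigh(L)
final = (eigvecs[:, 1] > 0).astype(int)
```

On this toy problem the disagreement between the k = 2 and k = 3 base runs leaves some entries unobserved, and after completion the within-blob similarities are far higher than the cross-blob ones, so the spectral step recovers the two groups.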
A boosting-theory-based hierarchical clustering ensemble algorithm called Bob-Hic is
proposed by (Rashedi & Mirzaei, 2013) as an improved version of the method suggested
by (Rashedi & Mirzaei, 2011). Bob-Hic includes several boosting steps: in each step, a
weighted random sampling is first applied to the data, and a single hierarchical clustering
is then built on the selected samples. Finally, the results of the individual hierarchical
clusterings are combined to obtain the final partition. The diversity and the
quality of the combined partitions are critical properties of a strong ensemble. Validity
indexes are used by (Naldi et al., 2013) to select high-quality partitions among the
produced ones. In this study, the quality of a partition is measured by using a single index
or a combination of indexes. APMM, proposed by (Alizadeh, Minaei-Bidgoli, & Parvin,
2014), is another criterion for determining the quality of a partition; it is likewise used to
select some partitions among all the produced ones. A consensus particle swarm
clustering algorithm based on particle swarm optimization (PSO) (Kennedy, 2011) is
proposed by (Esmin & Coelho, 2013). According to the results of this study, the PSO
algorithm produces results as good as or better than other well-known consensus
clustering algorithms.
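As a rough illustration of how PSO can drive clustering (a sketch under assumed settings, not the algorithm of (Esmin & Coelho, 2013)): each particle encodes a full set of k centroids, fitness is the within-cluster sum of squared errors, and the standard inertia-weight velocity update moves the swarm toward the best centroid sets found so far. The data, swarm size, and coefficients below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data (assumption): two 2-D blobs, so the optimal 2-centroid
# solution is easy to recognise.
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(4.0, 0.3, (20, 2))])
k, dim, n_particles, n_iters = 2, 2, 15, 100

def sse(flat_centers):
    """Clustering fitness: sum of squared distances to the nearest centroid."""
    centers = flat_centers.reshape(k, dim)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.min(1).sum()

# Each particle encodes k centroids as one flat position vector.
pos = rng.uniform(X.min(), X.max(), (n_particles, k * dim))
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([sse(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed)
for _ in range(n_iters):
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    # Standard PSO velocity update: inertia + cognitive + social terms.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([sse(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

# Final partition: assign each point to its nearest optimised centroid.
labels = ((X[:, None, :] - gbest.reshape(k, dim)[None, :, :]) ** 2
          ).sum(-1).argmin(1)
```

The swarm's best position should end up well below the trivial baseline of placing both centroids at the data mean, which is the sense in which PSO "optimises" the clustering objective here.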