Page 26 - Data Science Algorithms in a Week
P. 26
Unsupervised Ensemble Learning 11
these methods, each produced partition should have the same number of the
cluster with final partition (Topchy et al., 2005; Vega-Pons & Ruiz-Shulcloper,
2011). On the other hand, the strength of this method is easy to understand and
employ. Plurality Voting (PV) (Fischer & Buhmann, 2003), Voting-Mergin
(VM) (Weingessel, Dimitriadou, & Hornik, 2003), Voting for fuzzy clustering
(Dimitriadou, Weingessel, & Hornik, 2002), Voting Active Cluster (VAC)
(Tumer & Agogino, 2008). and Cumulative Voting (CV) (Ayad & Kamel,
2008)can be given as examples.
Based on co-association matrix: Algorithms based on the co-association matrix
is used to avoid the labeling correspondence problem. The main idea of this
approach is to create a co-association matrix in which each element is computed
based on how many times two particular samples are in the same cluster. A
clustering algorithm is necessary to produce the final partition. One of the
deficiencies of this kind of algorithm is that the computational complexity of the
methods is quadratic in the number of samples. Therefore, it is not suitable for
large datasets. On the other hand, they are very easy to understand and employ.
Evidence accumulation in conjunction with Single Link (EA-CL) or Complete
Link algorithms (EA-CL) (A. Fred, 2001) can be given as examples.
Based on graph partition: This kind of methods transform the combination of
multiple partitions into graph or hypergraph partitioning problem (Vega-Pons &
Ruiz-Shulcloper, 2011). All partitions in ensemble procedure can be represented
by a hyperedge, and final partition is obtained by implementing a graph-based
clustering algorithm. Three graph partitioning algorithms, Cluster-based
Similarity Partitioning Algorithm (CSPA), Hypergraph Partitioning Algorithm
(HGPA), and Meta-CLustering Algorithm (MCLA), are proposed by (Strehl &
Ghosh, 2002). In CSPA, a similarity matrix is created from a hypergraph. Each
element of this matrix shows how many times two points are assigned to the
same cluster. Final partition can be obtained by applying a graph similarity-based
algorithm such as spectral clustering or METIS. In HGPA, the hypergraph is
directly clustered by removing the minimum number of hyperedges. To get the
final partition from the hypergraph, an algorithm which is suitable to cluster
hypergraph such as HMETIS (Karypis, Aggarwal, Kumar, & Shekhar, 1999) is
used. In MCLA, the similarity between two clusters is defined based on the
number of common samples by using Jaccard index. The similarity matrix
between the clusters is the adjacency matrix of the graph whose nodes are the
clusters and edge is the similarity between the clusters. METIS algorithm is used
to recluster that graph. Computational and storage complexity of CSPA is
quadratic in the number of sample n, while HGPA and MCLA are linear.
Another graph based method is Hybrid Bipartite Graph Formulation (HBGF) is
proposed by (Fern & Brodley, 2004). As different from the previous methods,