Page 28 - Data Science Algorithms in a Week
process is the main logic of these kinds of methods. sCSPA, sMCLA, and
sHBGF (Punera & Ghosh, 2008) are examples from the literature.
RECENT STUDIES IN CONSENSUS CLUSTERING
Studies in the literature focus either on developing new consensus clustering methods
or on applying existing ones. This section summarizes some relatively recent and related
works. Many different terms are used to refer to consensus clustering frameworks. For this
reason, the search for this study is limited to the following terms:
Consensus clustering
Ensemble clustering
Unsupervised ensemble learning
Ayad and Kamel proposed the cumulative voting-based aggregation algorithm
(CVAA), formulated as a multi-response regression problem (Ayad & Kamel, 2010). The
CVAA was later enhanced by assigning weights to the individual clustering methods used
to generate the consensus. These weights are based on the mutual information associated
with each method, which is measured by the entropy (Saeed, Ahmed, Shamsir, & Salim,
2014). Weighted partition consensus via kernels (WPCK) was proposed by Vega-Pons et al.
(2010). This method uses an intermediate step called Partition Relevance Analysis to
assign each partition a weight that represents its significance in the ensemble. It also
defines consensus clustering as a median partition problem, using a kernel function as the
similarity measure between the clusterings. Unlike partitional clustering methods,
whose results can be represented by vectors, hierarchical clustering methods produce a
more complex solution, which is represented by dendrograms or trees. This makes using
hierarchical clustering in a consensus framework more challenging. A hierarchical
ensemble clustering method was proposed by Yu, Liu, and Wang (2014) to handle this
difficult problem. The algorithm combines partitional and hierarchical clustering and
yields a hierarchical consensus clustering as its output.
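The idea of weighting ensemble members before aggregating them can be illustrated with a
small sketch. This is not the CVAA or WPCK algorithm. It is a simplified stand-in that
assumes each partition is weighted by its mean normalized mutual information (NMI) with
the other partitions, and that the consensus information is collected in a weighted
co-association matrix. The function names and the toy ensemble are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two label vectors of equal length."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information: MI / sqrt(H(a) * H(b))."""
    denom = sqrt(entropy(a) * entropy(b))
    return mutual_information(a, b) / denom if denom > 0 else 0.0


def partition_weights(partitions):
    """Assumed weighting scheme: mean NMI with the other partitions, scaled to sum to 1."""
    m = len(partitions)
    if m < 2:
        return [1.0] * m
    raw = [
        sum(nmi(partitions[i], partitions[j]) for j in range(m) if j != i) / (m - 1)
        for i in range(m)
    ]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else [1.0 / m] * m


def weighted_coassociation(partitions, weights):
    """Entry (i, j) is the total weight of the partitions that co-cluster samples i and j."""
    n = len(partitions[0])
    return [
        [
            sum(w for p, w in zip(partitions, weights) if p[i] == p[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy ensemble: three partitions of six samples (hypothetical data).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, with the label names swapped
    [0, 1, 0, 1, 0, 1],  # a partition that largely disagrees with the other two
]
weights = partition_weights(partitions)
coassoc = weighted_coassociation(partitions, weights)
```

In this toy ensemble the third partition receives a smaller weight than the first two, so
it contributes less to the co-association matrix. Any similarity-based clustering method
could then be applied to the matrix to extract a consensus partition.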
Link-based clustering ensemble (LCE) was proposed as an extension of the hybrid bipartite
graph formulation (HBGF) technique (Iam-On, Boongoen, Garrett, & Price, 2012; Iam-On &
Boongoen, 2012). The authors applied a graph-based consensus function to an improved
similarity matrix instead of the conventional one. The main difference between the
proposed method and HBGF is this similarity matrix. In the traditional similarity matrix,
associations are represented by the binary values 0 and 1. In the improved matrix, the
zero entries, which denote unknown relationships, are replaced by approximate values.
This is accomplished through a link-based similarity measure called 'Weighted Connected
Triple (WCT)'.
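The contrast between the binary matrix and the improved one can be sketched as follows.
This is a simplified reading of the WCT idea, not the authors' implementation, and it
rests on several assumptions: the matrix is sample-by-cluster; the link weight between two
clusters is the Jaccard overlap of their member sets; the WCT score of a cluster pair is
the sum, over every third cluster, of the smaller of the two link weights; and scores are
scaled by the maximum WCT score and a decay constant `dc`. The function names and the
value of `dc` are hypothetical; the cited papers give the exact definition.

```python
def cluster_members(partitions):
    """Map each cluster, keyed by (partition index, label), to its set of sample indices."""
    members = {}
    for t, labels in enumerate(partitions):
        for i, lab in enumerate(labels):
            members.setdefault((t, lab), set()).add(i)
    return members


def binary_matrix(partitions):
    """Sample-by-cluster matrix with 1 where the sample belongs to the cluster, else 0."""
    clusters = sorted(cluster_members(partitions))
    rows = [
        [1 if partitions[t][i] == lab else 0 for (t, lab) in clusters]
        for i in range(len(partitions[0]))
    ]
    return clusters, rows


def wct_similarity(members, dc=0.9):
    """Assumed WCT-style similarity between every ordered pair of distinct clusters."""
    clusters = sorted(members)
    # Link weight between two clusters: Jaccard overlap of their member sets.
    w = {
        (a, b): len(members[a] & members[b]) / len(members[a] | members[b])
        for a in clusters for b in clusters if a != b
    }
    # A third cluster k linked to both a and b contributes the weaker of its two links.
    wct = {
        (a, b): sum(min(w[a, k], w[b, k]) for k in clusters if k not in (a, b))
        for a in clusters for b in clusters if a != b
    }
    wct_max = max(wct.values(), default=0.0)
    return {
        pair: (score / wct_max) * dc if wct_max > 0 else 0.0
        for pair, score in wct.items()
    }


def refined_matrix(partitions, dc=0.9):
    """Keep the 1 entries; replace each 0 by the similarity between that cluster and
    the cluster the sample actually belongs to in the same partition."""
    members = cluster_members(partitions)
    clusters = sorted(members)
    sim = wct_similarity(members, dc)
    rows = []
    for i in range(len(partitions[0])):
        row = []
        for (t, lab) in clusters:
            own = (t, partitions[t][i])
            row.append(1.0 if own == (t, lab) else sim[own, (t, lab)])
        rows.append(row)
    return clusters, rows


# Toy ensemble: three partitions of six samples (hypothetical data).
partitions = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 1, 1, 1, 2, 2],
]
clusters, bm = binary_matrix(partitions)
_, rm = refined_matrix(partitions, dc=0.9)
```

Every entry equal to 1 in `bm` stays 1 in `rm`, while each 0 is replaced by a value
between 0 and `dc`. A graph-based consensus function would then be run on `rm` in place
of `bm`.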