Figure 6. Process of consensus clustering.
For the same dataset, employ different traditional clustering methods: Using
different clustering algorithms is perhaps the most common way to create
multiple partitions of a given dataset. Although there is no particular rule
for choosing which conventional algorithms to apply, it is advisable to use
methods that can extract more information about the data. However, it is not
easy to know in advance which methods will suit a particular problem, so
expert opinion can be very useful (Strehl & Ghosh, 2002; Vega-Pons &
Ruiz-Shulcloper, 2011; D. Xu & Tian, 2015).
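As a rough illustration of this strategy, the sketch below runs two different conventional algorithms on the same toy dataset and collects their label vectors as an ensemble. The choice of k-means and single-linkage agglomerative clustering, the toy points, and the function names are assumptions made for this example; the text does not prescribe particular algorithms.

```python
import math
import random

def kmeans(points, k, seed=0, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def single_linkage(points, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]

# The ensemble is simply the list of partitions, one per algorithm.
ensemble = [kmeans(points, k=2, seed=1), single_linkage(points, k=2)]
```

Cluster labels are arbitrary, so the two partitions may number the same groups differently; a consensus function has to compare groupings rather than raw label values.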
For the same dataset, employ different traditional clustering methods with
different initializations or parameters: Running the algorithms with different
parameters or initializations is another efficient method (Ailon, Charikar, &
Newman, 2008). Even a simple algorithm can produce different, informative
partitions of the data, and it can yield an effective consensus in conjunction
with a suitable consensus function. For example, A. L. Fred and Jain (2005)
used the k-means algorithm with different random initial centers and numbers
of clusters to generate different partitions.
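The k-means example can be sketched as follows: the same algorithm is run with several random seeds and several values of k, and the resulting partitions are summarized by the fraction of runs in which each pair of points shares a cluster. This co-association summary is only one possible input to a consensus function and is an assumption of this sketch, as are the toy data, the values of k, and the function names.

```python
import math
import random

def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers depend on the seed
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def generate_partitions(points, ks, seeds):
    """One partition per (k, seed) combination."""
    return [kmeans(points, k, seed) for k in ks for seed in seeds]

def co_association(partitions):
    """Fraction of partitions in which points i and j share a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]

partitions = generate_partitions(points, ks=(2, 3), seeds=range(5))
coassoc = co_association(partitions)
```

Pairs that are grouped together across many runs receive values close to 1, so the matrix can be treated as a similarity measure and clustered again to obtain a consensus partition.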
Using weak clustering algorithms: Weak clustering algorithms are also used in
the generation step. These methods produce a set of partitions of the data
using a very straightforward methodology. Despite the simplicity of such
methods, weak clustering algorithms have been observed to provide high-quality
consensus clustering when combined with a proper consensus function (Luo, Jing,
& Xie, 2006; Topchy et al., 2003; Topchy, Jain, & Punch, 2005).
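To make the idea of a weak clusterer concrete, the sketch below projects the data onto a random direction and splits the points at the median of the projected values. This particular rule is an assumption chosen for illustration and is not necessarily the procedure used in the cited works; the toy data and the name `weak_partition` are likewise hypothetical.

```python
import random

def weak_partition(points, seed):
    """Split the data in two using a random 1-D projection (a weak clusterer)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Draw a random direction; each seed gives a different view of the data.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Project every point onto that direction.
    proj = [sum(d * x for d, x in zip(direction, p)) for p in points]
    # Use the upper median of the projections as the split threshold.
    threshold = sorted(proj)[len(proj) // 2]
    return [int(v >= threshold) for v in proj]

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]

# Many cheap, individually unreliable partitions form the ensemble.
weak_ensemble = [weak_partition(points, seed) for seed in range(20)]
```

Each partition on its own says little about the structure of the data; the expectation is that a suitable consensus function extracts a useful grouping from many such splits.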
Data resampling: Data resampling, such as bagging and boosting, is another
useful method to create multiple partitions (Dudoit & Fridlyand, 2003; Hong et
al., 2008). Dudoit and Fridlyand applied a partitioning clustering
method (e.g., Partitioning Around Medoids) to a set of bootstrap learning data to