ECLECTIC, March 2022
Therefore, the application of prioritization in the data-driven approach towards justified investment for low voltage calls is given below:

➤ Location-based clustering of the calls by their consumer number to identify the proximity of the calls, i.e., low voltage pockets

➤ Sorting of the pockets in descending order based on power consumption, revenue collection, frequency of calls, long-pending complaints, etc.

➤ Ranking given to the higher values of these factors to recognize the most critical pockets

➤ Selection of the top-ranked pocket for investment as per the allocated budget

Clustering Technique

Clustering
Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to one another than to data points in other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters.

Requirements of Clustering in Data Mining

➤ Interpretability: The result of clustering should be usable, understandable and interpretable.

➤ Helps in dealing with messed-up data: Grouping can give some structure to the data by organizing it into groups of similar data objects.

➤ High dimensionality: Data clustering is able to handle high-dimensional data as well as data of small size.

➤ Arbitrary-shape clusters are discovered: Clustering algorithms can detect clusters of arbitrary shape; small clusters with spherical shapes can also be found.

➤ Algorithm usability with multiple data types: Many different kinds of data can be used with clustering algorithms, such as binary, categorical and interval-based data.

Types of Clustering

K-Means Clustering: k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center or cluster centroid), serving as a prototype of the cluster.

Hierarchical Clustering: In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:

➤ Agglomerative: This is a “bottom-up” approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

➤ Divisive: This is a “top-down” approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

The basic differences are tabulated below:

Table 2: K-Means Clustering vs Hierarchical Clustering

| Sl. No. | Clustering Model | Pros | Cons |
|---|---|---|---|
| 1 | K-Means Clustering | Simple to understand, easily adaptable; works well on small or large datasets; fast, efficient and performant | Need to choose the number of clusters |
| 2 | Hierarchical Clustering | The optimal number of clusters can be obtained by the model itself; practical visualization with the dendrogram | Not appropriate for large datasets |

We will approach the clustering problem by implementing the k-means algorithm. K-means is a distance-based method that iteratively updates the location of k cluster centroids until convergence. The main user-defined ingredients of the k-means algorithm are the distance function (often Euclidean distance) and the number of clusters k; this parameter needs to be set according to the application or problem domain. In a nutshell, k-means groups the data by minimizing the sum of squared distances between the data points and their respective closest centroids.
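The alternating assignment and update steps of k-means can be sketched in a few lines of Python. This is a minimal illustration, not the article's production code: the sample points and k = 2 are assumed for demonstration, and the centroids are seeded from the first k points for determinism (practical implementations use random or k-means++ initialization).

```python
import math

def kmeans(points, k, iters=100):
    """Minimal k-means: alternate assignment and centroid update until stable."""
    centroids = list(points[:k])  # simplistic deterministic seeding (assumption)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its assigned points,
        # which is what minimizes the within-cluster sum of squared distances.
        new_centroids = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

# Two well-separated groups of 2-D points (hypothetical data)
pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
cents, cls = kmeans(pts, k=2)
```

Even with both seeds starting in the same group, the update step pulls the centroids apart until each settles at the mean of one group.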
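For comparison, the agglomerative ("bottom-up") strategy can be sketched the same way: every observation starts as its own cluster, and the closest pair of clusters is merged until the desired number remains. Single-linkage distance, the sample points, and the stopping count are assumptions made for this illustration.

```python
import math

def agglomerative(points, target_k):
    """Bottom-up hierarchical clustering: start with singleton clusters and
    repeatedly merge the closest pair (single linkage) until target_k remain."""
    clusters = [[p] for p in points]  # each observation starts in its own cluster
    while len(clusters) > target_k:
        # Single-linkage distance between two clusters is the minimum
        # distance between any member of one and any member of the other.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # merge the closest pair
    return clusters

pts = [(0.0, 0.0), (0.5, 0.1), (0.2, 0.4), (9.0, 9.0), (9.3, 8.8)]
groups = agglomerative(pts, target_k=2)
```

The quadratic pairwise-distance search in each merge is what makes the plain agglomerative approach impractical for large datasets, as noted in Table 2.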
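The prioritization steps listed at the start of this section (group calls into pockets, score each pocket on its factors, rank in descending order, then select within the allocated budget) can be sketched as follows. The pocket records, scoring weights, costs and budget figure are all hypothetical and stand in for real field data.

```python
# Hypothetical low-voltage pockets with the ranking factors named in the text.
pockets = [
    {"pocket": "P1", "power_kwh": 52000, "revenue": 410000, "calls": 18, "cost": 60},
    {"pocket": "P2", "power_kwh": 75000, "revenue": 690000, "calls": 31, "cost": 90},
    {"pocket": "P3", "power_kwh": 23000, "revenue": 150000, "calls": 9,  "cost": 40},
]

def score(p):
    """Composite criticality score: higher consumption, revenue collection and
    call frequency make a pocket more critical (weights are illustrative)."""
    return 0.4 * p["power_kwh"] / 1000 + 0.4 * p["revenue"] / 10000 + 0.2 * p["calls"]

# Ranking step: sort pockets in descending order of the composite score.
ranked = sorted(pockets, key=score, reverse=True)

# Selection step: fund top-ranked pockets while the allocated budget lasts.
budget = 150  # hypothetical budget, same units as "cost"
selected = []
for p in ranked:
    if p["cost"] <= budget:
        selected.append(p["pocket"])
        budget -= p["cost"]
```

The greedy pass over the ranked list mirrors the "selection of the top-ranked pocket for investment as per the allocated budget" step; a real deployment would calibrate the weights against actual consumption and complaint data.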