Page 114 - Data Science Algorithms in a Week
P. 114

5









                              Clustering into K Clusters





            Clustering is a technique to divide the data into clusters so that features in the same cluster
            are in a certain sense similar.
            In this chapter you will learn:

                      The k-means clustering algorithm on example about household incomes
                      An example about gender classification to classify features by clustering them
                      first with the features with the known classes
                      To implement k-means clustering algorithm in Python in section Implementation of
                      k-means clustering algorithm
                      An example about house ownership and how to choose an appropriate number
                      of clusters for your analysis
                      Using the example about house ownership how to scale given data appropriately
                      to improve the accuracy of the classification by a clustering algorithm
                      An example about document clustering to understand how a different number of
                      clusters alters the meaning of the dividing boundary between the clusters




            Household incomes - clustering into k

            clusters

            For example let us take households with the yearly earnings in USD dollars 40k, 55k, 70k,
            100k, 115k, 130k, 135k. Then if we require to cluster the households into the two clusters
            taking their earnings as a measure of similarity, then the first cluster would have the
            households earning 40k, 55k, 70k; the second cluster would have the households earning
            100k, 115k, 130k, 135k.
   109   110   111   112   113   114   115   116   117   118   119