Page 290 - Understanding Machine Learning

Clustering

                 Of course, this is not what we want to achieve in clustering, as clusters should be
                 reasonably large groups of points.
                    Several solutions to this problem have been suggested. The simplest solution is
                 to normalize the cut and define the normalized mincut objective as follows:
\[
\mathrm{RatioCut}(C_1,\dots,C_k) \;=\; \sum_{i=1}^{k} \frac{1}{|C_i|} \sum_{r \in C_i,\; s \notin C_i} W_{r,s}.
\]
                 The preceding objective assumes smaller values if the clusters are not too small.
                 Unfortunately, introducing this balancing makes the problem computationally hard
                 to solve. Spectral clustering is a way to relax the problem of minimizing RatioCut.
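To make the balancing effect concrete, the objective can be computed directly from a weight matrix. The following sketch (using NumPy; the weight matrix and the two candidate clusterings are made-up illustrations, not from the text) shows that a balanced split across a weak edge attains a much smaller RatioCut than splitting off a single point:

```python
import numpy as np

def ratio_cut(W, clusters):
    """RatioCut: for each cluster, the weight of edges leaving it
    divided by the cluster size, summed over clusters."""
    m = W.shape[0]
    total = 0.0
    for C in clusters:
        inside = np.zeros(m, dtype=bool)
        inside[C] = True
        total += W[np.ix_(inside, ~inside)].sum() / len(C)
    return total

# Two dense triangles joined by one weak edge (weight 0.1).
W = np.array([[0.0, 1.0, 1.0, 0.1, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]])

balanced = ratio_cut(W, [np.array([0, 1, 2]), np.array([3, 4, 5])])
lonely   = ratio_cut(W, [np.array([0]), np.array([1, 2, 3, 4, 5])])
# balanced cuts only the weak edge; lonely isolates a single point
# and pays for all of its strong edges, so balanced < lonely.
```

The 1/|C_i| factor is what penalizes the singleton cluster: its entire (large) boundary weight is divided by 1, while the balanced clusters divide a tiny boundary weight by 3.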


                 22.3.2 Graph Laplacian and Relaxed Graph Cuts
                 The main mathematical object for spectral clustering is the graph Laplacian matrix.
                 There are several different definitions of graph Laplacian in the literature, and in
                 the following we describe one particular definition.
Definition 22.2 (Unnormalized Graph Laplacian). The unnormalized graph Laplacian is the $m \times m$ matrix $L = D - W$, where $D$ is a diagonal matrix with $D_{i,i} = \sum_{j=1}^{m} W_{i,j}$. The matrix $D$ is called the degree matrix.
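Constructing the Laplacian from a weight matrix is a one-liner; this sketch (NumPy, with a made-up path-graph weight matrix for illustration) also shows the defining property that every row of $L$ sums to zero, so the all-ones vector lies in its null space:

```python
import numpy as np

def unnormalized_laplacian(W):
    """L = D - W, where D is the diagonal degree matrix
    with D_ii = sum_j W_ij (Definition 22.2)."""
    return np.diag(W.sum(axis=1)) - W

# Weight matrix of the path graph 0 - 1 - 2.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = unnormalized_laplacian(W)
# L is symmetric and each row sums to zero: L @ ones = 0.
```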
                    The following lemma underscores the relation between RatioCut and the
                 Laplacian matrix.
Lemma 22.3. Let $C_1,\dots,C_k$ be a clustering and let $H \in \mathbb{R}^{m,k}$ be the matrix such that
\[
H_{i,j} = \frac{1}{\sqrt{|C_j|}}\, 1_{[i \in C_j]}.
\]
Then, the columns of $H$ are orthonormal to each other and
\[
\mathrm{RatioCut}(C_1,\dots,C_k) = \operatorname{trace}(H^\top L H).
\]
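Both claims of the lemma can be checked numerically. This sketch (NumPy; the random symmetric weight matrix and the particular 3-cluster partition are arbitrary choices for illustration) builds the cluster-indicator matrix $H$ and compares $\operatorname{trace}(H^\top L H)$ against RatioCut computed directly from the cut weights:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 8
A = rng.random((m, m))
W = (A + A.T) / 2            # arbitrary symmetric nonnegative weights
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W   # unnormalized graph Laplacian

clusters = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]

# H_ij = 1/sqrt(|C_j|) if i is in C_j, else 0 (Lemma 22.3).
H = np.zeros((m, len(clusters)))
for j, C in enumerate(clusters):
    H[C, j] = 1.0 / np.sqrt(len(C))

# The columns of H are orthonormal...
assert np.allclose(H.T @ H, np.eye(len(clusters)))

# ...and trace(H^T L H) equals RatioCut computed from the cut weights.
ratio_cut = sum(
    W[np.ix_(C, np.setdiff1d(np.arange(m), C))].sum() / len(C)
    for C in clusters)
assert np.isclose(np.trace(H.T @ L @ H), ratio_cut)
```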

Proof. Let $h_1,\dots,h_k$ be the columns of $H$. The fact that these vectors are orthonormal is immediate from the definition. Next, by standard algebraic manipulations, it can be shown that $\operatorname{trace}(H^\top L H) = \sum_{i=1}^{k} h_i^\top L h_i$ and that for any vector $v$ we have
\[
v^\top L v \;=\; \frac{1}{2}\left( \sum_r D_{r,r} v_r^2 \;-\; 2\sum_{r,s} v_r v_s W_{r,s} \;+\; \sum_s D_{s,s} v_s^2 \right) \;=\; \frac{1}{2} \sum_{r,s} W_{r,s} (v_r - v_s)^2.
\]
Applying this with $v = h_i$ and noting that $(h_{i,r} - h_{i,s})^2$ is nonzero only if $r \in C_i, s \notin C_i$ or the other way around, we obtain that
\[
h_i^\top L h_i = \frac{1}{|C_i|} \sum_{r \in C_i,\; s \notin C_i} W_{r,s}.
\]
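The key identity in the proof, $v^\top L v = \frac{1}{2}\sum_{r,s} W_{r,s}(v_r - v_s)^2$, can likewise be verified for an arbitrary vector (a NumPy sketch; the random weight matrix and test vector are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6
A = rng.random((m, m))
W = (A + A.T) / 2            # arbitrary symmetric nonnegative weights
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W   # unnormalized graph Laplacian

v = rng.standard_normal(m)
quad = v @ L @ v                  # left-hand side: v^T L v
pairwise = 0.5 * sum(W[r, s] * (v[r] - v[s]) ** 2
                     for r in range(m) for s in range(m))
assert np.isclose(quad, pairwise)
```

Because the right-hand side is a sum of nonnegative terms, the identity also shows that $L$ is positive semidefinite.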
   Therefore, to minimize RatioCut we can search for a matrix $H$ whose columns are orthonormal and such that each $H_{i,j}$ is either $0$ or $1/\sqrt{|C_j|}$. Unfortunately, this is an integer programming problem which we cannot solve efficiently. Instead, we