Page 290 - Understanding Machine Learning
P. 290
Clustering
272
Of course, this is not what we want to achieve in clustering, as clusters should be
reasonably large groups of points.
Several solutions to this problem have been suggested. The simplest solution is
to normalize the cut and define the normalized mincut objective as follows:
k
1
RatioCut(C 1 ,...,C k ) = W r,s .
|C i |
i=1 r∈C i ,s /∈C i
The preceding objective assumes smaller values if the clusters are not too small.
Unfortunately, introducing this balancing makes the problem computationally hard
to solve. Spectral clustering is a way to relax the problem of minimizing RatioCut.
22.3.2 Graph Laplacian and Relaxed Graph Cuts
The main mathematical object for spectral clustering is the graph Laplacian matrix.
There are several different definitions of graph Laplacian in the literature, and in
the following we describe one particular definition.
Definition 22.2 (Unnormalized Graph Laplacian). The unnormalized graph Lapla-
cian is the m × m matrix L = D − W where D is a diagonal matrix with
m
D i,i = W i, j . The matrix D is called the degree matrix.
j=1
The following lemma underscores the relation between RatioCut and the
Laplacian matrix.
Lemma 22.3. Let C 1 ,...,C k be a clustering and let H ∈ R m,k be the matrix such that
1
H i, j = √ 1 [i∈C j ] .
|C j |
Then, the columns of H are orthonormal to each other and
RatioCut(C 1 ,...,C k ) = trace(H LH).
Proof. Let h 1 ,...,h k be the columns of H. The fact that these vectors are orthonor-
mal is immediate from the definition. Next, by standard algebraic manipulations, it
k
can be shown that trace(H LH) = h Lh i and that for any vector v we have
i=1 i
1 1
2 2 2
v Lv = D r,r v − 2 v r v s W r,s + D s,s v s = W r,s (v r − v s ) .
r
2 2
r r,s s r,s
2
Applying this with v = h i and noting that (h i,r −h i,s ) is nonzero only if r ∈ C i ,s /∈ C i
or the other way around, we obtain that
1
h Lh i = W r,s .
i
|C i |
r∈C i ,s /∈C i
Therefore, to minimize RatioCut we can search for a matrix H whose columns
are orthonormal and such that each H i, j is either 0 or 1/ |C j |. Unfortunately, this
is an integer programming problem which we cannot solve efficiently. Instead, we