Page 290 - Understanding Machine Learning

P. 290

Clustering
272

Of course, this is not what we want to achieve in clustering, as clusters should be
reasonably large groups of points.
Several solutions to this problem have been suggested. The simplest solution is
to normalize the cut and deﬁne the normalized mincut objective as follows:
k
1

RatioCut(C 1 ,...,C k ) = W r,s .
|C i |
i=1 r∈C i ,s /∈C i
The preceding objective assumes smaller values if the clusters are not too small.
Unfortunately, introducing this balancing makes the problem computationally hard
to solve. Spectral clustering is a way to relax the problem of minimizing RatioCut.

22.3.2 Graph Laplacian and Relaxed Graph Cuts
The main mathematical object for spectral clustering is the graph Laplacian matrix.
There are several different deﬁnitions of graph Laplacian in the literature, and in
the following we describe one particular deﬁnition.
Deﬁnition 22.2 (Unnormalized Graph Laplacian). The unnormalized graph Lapla-
cian is the m × m matrix L = D − W where D is a diagonal matrix with
m

D i,i = W i, j . The matrix D is called the degree matrix.
j=1
The following lemma underscores the relation between RatioCut and the
Laplacian matrix.
Lemma 22.3. Let C 1 ,...,C k be a clustering and let H ∈ R m,k be the matrix such that
1
H i, j = √ 1 [i∈C j ] .
|C j |
Then, the columns of H are orthonormal to each other and
RatioCut(C 1 ,...,C k ) = trace(H LH).

Proof. Let h 1 ,...,h k be the columns of H. The fact that these vectors are orthonor-
mal is immediate from the deﬁnition. Next, by standard algebraic manipulations, it
k

can be shown that trace(H LH) = h Lh i and that for any vector v we have
i=1 i

1 1

2 2 2
v Lv = D r,r v − 2 v r v s W r,s + D s,s v s = W r,s (v r − v s ) .
r
2 2
r r,s s r,s
2
Applying this with v = h i and noting that (h i,r −h i,s ) is nonzero only if r ∈ C i ,s /∈ C i
or the other way around, we obtain that
1

h Lh i = W r,s .
i
|C i |
r∈C i ,s /∈C i
Therefore, to minimize RatioCut we can search for a matrix H whose columns

are orthonormal and such that each H i, j is either 0 or 1/ |C j |. Unfortunately, this
is an integer programming problem which we cannot solve efﬁciently. Instead, we

285 286 287 288 289 290 291 292 293 294 295