supremum is not affected by replacing $a$ and $a'$. Therefore,
$$
m R(A_1) \;\le\; \mathbb{E}_{\sigma_2,\dots,\sigma_m} \frac{1}{2} \sup_{a,a' \in A} \left[ a_1 - a'_1 + \sum_{i=2}^{m} \sigma_i a_i + \sum_{i=2}^{m} \sigma_i a'_i \right]. \tag{26.13}
$$
But, using the same equalities as in Equation (26.12), it is easy to see that the right-hand side of Equation (26.13) exactly equals $m R(A)$, which concludes our proof.
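To spell out that last step (a brief expansion of the book's one-line remark, mirroring the splitting used to derive Equation (26.12)): the supremum over pairs decouples into two independent suprema, which fold back into an expectation over $\sigma_1$,
$$
\begin{aligned}
\mathbb{E}_{\sigma_2,\dots,\sigma_m} \frac{1}{2} \sup_{a,a' \in A} \left[ a_1 - a'_1 + \sum_{i=2}^{m} \sigma_i a_i + \sum_{i=2}^{m} \sigma_i a'_i \right]
&= \mathbb{E}_{\sigma_2,\dots,\sigma_m} \frac{1}{2} \left[ \sup_{a \in A} \Big( a_1 + \sum_{i=2}^{m} \sigma_i a_i \Big) + \sup_{a' \in A} \Big( -a'_1 + \sum_{i=2}^{m} \sigma_i a'_i \Big) \right] \\
&= \mathbb{E}_{\sigma_1,\dots,\sigma_m} \sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i \;=\; m R(A),
\end{aligned}
$$
since $\sigma_1$ takes the values $\pm 1$ with probability $\frac{1}{2}$ each.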
26.2 RADEMACHER COMPLEXITY OF LINEAR CLASSES
In this section we analyze the Rademacher complexity of linear classes. To simplify
the derivation we first define the following two classes:
$$
\mathcal{H}_1 = \{ x \mapsto \langle w, x \rangle : \|w\|_1 \le 1 \}, \qquad \mathcal{H}_2 = \{ x \mapsto \langle w, x \rangle : \|w\|_2 \le 1 \}. \tag{26.14}
$$
The following lemma bounds the Rademacher complexity of $\mathcal{H}_2$. We allow the
x i to be vectors in any Hilbert space (even infinite dimensional), and the bound
does not depend on the dimensionality of the Hilbert space. This property becomes
useful when analyzing kernel methods.
Lemma 26.10. Let $S = (x_1, \dots, x_m)$ be vectors in a Hilbert space. Define: $\mathcal{H}_2 \circ S = \{ (\langle w, x_1 \rangle, \dots, \langle w, x_m \rangle) : \|w\|_2 \le 1 \}$. Then,
$$
R(\mathcal{H}_2 \circ S) \;\le\; \frac{\max_i \|x_i\|_2}{\sqrt{m}}.
$$
Proof. Using the Cauchy-Schwarz inequality we know that for any vectors $w, v$ we have $\langle w, v \rangle \le \|w\|\,\|v\|$. Therefore,
$$
\begin{aligned}
m\,R(\mathcal{H}_2 \circ S) &= \mathbb{E}_{\sigma} \left[ \sup_{a \in \mathcal{H}_2 \circ S} \sum_{i=1}^{m} \sigma_i a_i \right] \\
&= \mathbb{E}_{\sigma} \left[ \sup_{w : \|w\| \le 1} \sum_{i=1}^{m} \sigma_i \langle w, x_i \rangle \right] \\
&= \mathbb{E}_{\sigma} \left[ \sup_{w : \|w\| \le 1} \Big\langle w, \sum_{i=1}^{m} \sigma_i x_i \Big\rangle \right] \\
&\le \mathbb{E}_{\sigma} \left[ \Big\| \sum_{i=1}^{m} \sigma_i x_i \Big\|_2 \right]. \tag{26.15}
\end{aligned}
$$
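A side remark (ours, not the book's): the Cauchy-Schwarz step is in fact an equality here, since the supremum of a linear functional over the unit ball is attained at $w = v / \|v\|_2$ (for $v \neq 0$):
$$
\sup_{w : \|w\| \le 1} \langle w, v \rangle \;=\; \|v\|_2, \qquad \text{with } v = \sum_{i=1}^{m} \sigma_i x_i.
$$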
Next, using Jensen’s inequality we have that
$$
\mathbb{E}_{\sigma} \left\| \sum_{i=1}^{m} \sigma_i x_i \right\|_2 \;=\; \mathbb{E}_{\sigma} \left[ \left( \left\| \sum_{i=1}^{m} \sigma_i x_i \right\|_2^2 \right)^{1/2} \right] \;\le\; \left( \mathbb{E}_{\sigma} \left\| \sum_{i=1}^{m} \sigma_i x_i \right\|_2^2 \right)^{1/2}. \tag{26.16}
$$
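As a quick numerical sanity check (a sketch of ours, not part of the book; the sample below is arbitrary synthetic data), the bound of Lemma 26.10 can be verified by Monte Carlo, using the identity from the proof that the supremum over the unit $\ell_2$ ball equals $\|\sum_i \sigma_i x_i\|_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 50, 10
X = rng.normal(size=(m, d))  # the sample S: rows are x_1, ..., x_m

n_trials = 10_000
sigma = rng.choice([-1.0, 1.0], size=(n_trials, m))  # Rademacher draws
# Each row of sigma @ X is sum_i sigma_i x_i for one draw of sigma.
norms = np.linalg.norm(sigma @ X, axis=1)
estimate = norms.mean() / m  # R(H2 o S) = (1/m) E_sigma ||sum_i sigma_i x_i||_2

bound = np.linalg.norm(X, axis=1).max() / np.sqrt(m)  # max_i ||x_i||_2 / sqrt(m)
print(f"Monte Carlo estimate: {estimate:.4f}")
print(f"Lemma 26.10 bound:    {bound:.4f}")  # estimate should not exceed bound
```

The estimate should come out below the bound; the gap reflects both the Jensen step and the replacement of each $\|x_i\|_2$ by the maximum.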