supremum is not affected by replacing $a$ and $a'$. Therefore,
$$
m R(A_1) \;\le\; \mathbb{E}_{\sigma_2,\dots,\sigma_m} \frac{1}{2} \sup_{a,a' \in A} \left[ a_1 - a'_1 + \sum_{i=2}^{m} \sigma_i a_i + \sum_{i=2}^{m} \sigma_i a'_i \right]. \tag{26.13}
$$
But, using the same equalities as in Equation (26.12), it is easy to see that the right-hand side of Equation (26.13) exactly equals $m R(A)$, which concludes our proof.
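To spell out that last step (a brief expansion of the book's one-line remark, mirroring the splitting used to derive Equation (26.12)): the supremum over pairs decouples into two independent suprema, which fold back into an expectation over $\sigma_1$,
$$
\begin{aligned}
\mathbb{E}_{\sigma_2,\dots,\sigma_m} \frac{1}{2} \sup_{a,a' \in A} \left[ a_1 - a'_1 + \sum_{i=2}^{m} \sigma_i a_i + \sum_{i=2}^{m} \sigma_i a'_i \right]
&= \mathbb{E}_{\sigma_2,\dots,\sigma_m} \frac{1}{2} \left[ \sup_{a \in A} \Big( a_1 + \sum_{i=2}^{m} \sigma_i a_i \Big) + \sup_{a' \in A} \Big( -a'_1 + \sum_{i=2}^{m} \sigma_i a'_i \Big) \right] \\
&= \mathbb{E}_{\sigma_1,\dots,\sigma_m} \sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i \;=\; m R(A),
\end{aligned}
$$
since $\sigma_1$ takes the values $\pm 1$ with probability $\frac{1}{2}$ each.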
26.2 RADEMACHER COMPLEXITY OF LINEAR CLASSES
In this section we analyze the Rademacher complexity of linear classes. To simplify
the derivation we first define the following two classes:
$$
\mathcal{H}_1 = \{ x \mapsto \langle w, x \rangle : \|w\|_1 \le 1 \}, \qquad \mathcal{H}_2 = \{ x \mapsto \langle w, x \rangle : \|w\|_2 \le 1 \}. \tag{26.14}
$$
The following lemma bounds the Rademacher complexity of $\mathcal{H}_2$. We allow the
x i to be vectors in any Hilbert space (even infinite dimensional), and the bound
does not depend on the dimensionality of the Hilbert space. This property becomes
useful when analyzing kernel methods.
Lemma 26.10. Let $S = (x_1, \dots, x_m)$ be vectors in a Hilbert space. Define: $\mathcal{H}_2 \circ S = \{ (\langle w, x_1 \rangle, \dots, \langle w, x_m \rangle) : \|w\|_2 \le 1 \}$. Then,
$$
R(\mathcal{H}_2 \circ S) \;\le\; \frac{\max_i \|x_i\|_2}{\sqrt{m}}.
$$
Proof. Using the Cauchy-Schwarz inequality we know that for any vectors $w, v$ we have $\langle w, v \rangle \le \|w\|\,\|v\|$. Therefore,
$$
\begin{aligned}
m\,R(\mathcal{H}_2 \circ S) &= \mathbb{E}_{\sigma} \left[ \sup_{a \in \mathcal{H}_2 \circ S} \sum_{i=1}^{m} \sigma_i a_i \right] \\
&= \mathbb{E}_{\sigma} \left[ \sup_{w : \|w\| \le 1} \sum_{i=1}^{m} \sigma_i \langle w, x_i \rangle \right] \\
&= \mathbb{E}_{\sigma} \left[ \sup_{w : \|w\| \le 1} \Big\langle w, \sum_{i=1}^{m} \sigma_i x_i \Big\rangle \right] \\
&\le \mathbb{E}_{\sigma} \left[ \Big\| \sum_{i=1}^{m} \sigma_i x_i \Big\|_2 \right]. \tag{26.15}
\end{aligned}
$$
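A side remark (ours, not the book's): the Cauchy-Schwarz step is in fact an equality here, since the supremum of a linear functional over the unit ball is attained at $w = v / \|v\|_2$ (for $v \neq 0$):
$$
\sup_{w : \|w\| \le 1} \langle w, v \rangle \;=\; \|v\|_2, \qquad \text{with } v = \sum_{i=1}^{m} \sigma_i x_i.
$$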
Next, using Jensen’s inequality we have that
$$
\mathbb{E}_{\sigma} \left\| \sum_{i=1}^{m} \sigma_i x_i \right\|_2 \;=\; \mathbb{E}_{\sigma} \left[ \left( \left\| \sum_{i=1}^{m} \sigma_i x_i \right\|_2^2 \right)^{1/2} \right] \;\le\; \left( \mathbb{E}_{\sigma} \left\| \sum_{i=1}^{m} \sigma_i x_i \right\|_2^2 \right)^{1/2}. \tag{26.16}
$$
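As a quick numerical sanity check (a sketch of ours, not part of the book; the sample below is arbitrary synthetic data), the bound of Lemma 26.10 can be verified by Monte Carlo, using the identity from the proof that the supremum over the unit $\ell_2$ ball equals $\|\sum_i \sigma_i x_i\|_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 50, 10
X = rng.normal(size=(m, d))  # the sample S: rows are x_1, ..., x_m

n_trials = 10_000
sigma = rng.choice([-1.0, 1.0], size=(n_trials, m))  # Rademacher draws
# Each row of sigma @ X is sum_i sigma_i x_i for one draw of sigma.
norms = np.linalg.norm(sigma @ X, axis=1)
estimate = norms.mean() / m  # R(H2 o S) = (1/m) E_sigma ||sum_i sigma_i x_i||_2

bound = np.linalg.norm(X, axis=1).max() / np.sqrt(m)  # max_i ||x_i||_2 / sqrt(m)
print(f"Monte Carlo estimate: {estimate:.4f}")
print(f"Lemma 26.10 bound:    {bound:.4f}")  # estimate should not exceed bound
```

The estimate should come out below the bound; the gap reflects both the Jensen step and the replacement of each $\|x_i\|_2$ by the maximum.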