Page 349 - Understanding Machine Learning
P. 349
26.1 The Rademacher Complexity 331
and therefore
m 2
a i 2
mR(A ) ≤ log exp = log exp a /2
2
a∈A i=1 a∈A
2
2
≤ log |A |maxexp a /2 = log(|A |) + max( a /2).
a∈A a∈A
1
Since R(A) = R(A ) we obtain from the equation that
λ
2
2
log(|A|) + λ max a∈A ( a /2)
R(A) ≤ .
λm
2
Setting λ = 2log(|A|)/max a∈A a and rearranging terms we conclude our
proof.
The following lemma shows that composing A with a Lipschitz function does not
blow up the Rademacher complexity. The proof is due to Kakade and Tewari.
Lemma 26.9 (Contraction Lemma). For each i ∈ [m],let φ i : R → R be a ρ-Lipschitz
m
function; namely, for all α,β ∈ R we have |φ i (α) − φ i (β)|≤ ρ |α − β|.For a ∈ R let
φ(a) denote the vector (φ 1 (a 1 ),...,φ m (y m )).Let φ ◦ A ={φ(a): a ∈ A}. Then,
R(φ ◦ A) ≤ ρ R(A).
Proof. For simplicity, we prove the lemma for the case ρ = 1. The case ρ =
1
1 will follow by defining φ = φ and then using Lemma 26.6.Let A i =
ρ
{(a 1 ,...,a i−1 ,φ i (a i ),a i+1 ,...,a m ): a ∈ A}. Clearly, it suffices to prove that for any
set A and all i we have R(A i ) ≤ R(A). Without loss of generality we will prove the
latter claim for i = 1 and to simplify notation we omit the subscript from φ 1 . We have
m
mR(A 1 ) = E sup σ i a i
σ
a∈A 1 i=1
m
= E supσ 1 φ(a 1 ) + σ i a i
σ
a∈A
i=2
m m
1
= E sup φ(a 1 ) + σ i a i + sup −φ(a 1 ) + σ i a i
2 σ 2 ,...,σ m a∈A a∈A
i=2 i=2
m m
1
= E sup φ(a 1 ) − φ(a ) + σ i a i + σ i a i
1
2 σ 2 ,...,σ m a,a ∈A
i=2 i=2
m m
1
≤ E sup |a 1 − a |+ σ i a i + σ i a i , (26.12)
1
2 σ 2 ,...,σ m
a,a ∈A i=2 i=2
where in the last inequality we used the assumption that φ is Lipschitz. Next, we
note that the absolute value on |a 1 − a | in the preceding expression can be omitted
1
since both a and a are from the same set A and the rest of the expression in the