Page 349 - Understanding Machine Learning

P. 349

26.1 The Rademacher Complexity 331

and therefore

m 2
a i 2

mR(A ) ≤ log exp = log exp a /2
2
a∈A i=1 a∈A

2
2

≤ log |A |maxexp a /2 = log(|A |) + max( a /2).

a∈A a∈A
1
Since R(A) = R(A ) we obtain from the equation that

λ
2
2
log(|A|) + λ max a∈A ( a /2)
R(A) ≤ .
λm

2
Setting λ = 2log(|A|)/max a∈A a and rearranging terms we conclude our
proof.
The following lemma shows that composing A with a Lipschitz function does not
blow up the Rademacher complexity. The proof is due to Kakade and Tewari.
Lemma 26.9 (Contraction Lemma). For each i ∈ [m],let φ i : R → R be a ρ-Lipschitz
m
function; namely, for all α,β ∈ R we have |φ i (α) − φ i (β)|≤ ρ |α − β|.For a ∈ R let
φ(a) denote the vector (φ 1 (a 1 ),...,φ m (y m )).Let φ ◦ A ={φ(a): a ∈ A}. Then,
R(φ ◦ A) ≤ ρ R(A).
Proof. For simplicity, we prove the lemma for the case ρ = 1. The case ρ =
1
1 will follow by deﬁning φ = φ and then using Lemma 26.6.Let A i =
ρ
{(a 1 ,...,a i−1 ,φ i (a i ),a i+1 ,...,a m ): a ∈ A}. Clearly, it sufﬁces to prove that for any
set A and all i we have R(A i ) ≤ R(A). Without loss of generality we will prove the
latter claim for i = 1 and to simplify notation we omit the subscript from φ 1 . We have
m

mR(A 1 ) = E sup σ i a i
σ
a∈A 1 i=1

= E supσ 1 φ(a 1 ) + σ i a i
σ
a∈A
i=2
m m
1
= E sup φ(a 1 ) + σ i a i + sup −φ(a 1 ) + σ i a i
2 σ 2 ,...,σ m a∈A a∈A
i=2 i=2
m m
1

= E sup φ(a 1 ) − φ(a ) + σ i a i + σ i a i
1
2 σ 2 ,...,σ m a,a ∈A

i=2 i=2
m m
1

≤ E sup |a 1 − a |+ σ i a i + σ i a i , (26.12)
1
2 σ 2 ,...,σ m
a,a ∈A i=2 i=2
where in the last inequality we used the assumption that φ is Lipschitz. Next, we
note that the absolute value on |a 1 − a | in the preceding expression can be omitted

since both a and a are from the same set A and the rest of the expression in the

344 345 346 347 348 349 350 351 352 353 354