Lemma 26.7. Let $A$ be a subset of $\mathbb{R}^m$ and let $A' = \left\{ \sum_{j=1}^{N} \alpha_j a^{(j)} : N \in \mathbb{N},\ \forall j,\ a^{(j)} \in A,\ \alpha_j \ge 0,\ \|\alpha\|_1 = 1 \right\}$. Then, $R(A') = R(A)$.
Proof. The main idea follows from the fact that for any vector $v$ we have
\[
\sup_{\alpha \ge 0 : \|\alpha\|_1 = 1} \ \sum_{j=1}^{N} \alpha_j v_j \;=\; \max_{j} v_j ,
\]
since a linear function over the probability simplex attains its supremum at one of the vertices.
Therefore,
\begin{align*}
m R(A') &= \mathbb{E}_{\sigma} \left[ \sup_{\alpha \ge 0 : \|\alpha\|_1 = 1} \ \sup_{a^{(1)}, \dots, a^{(N)}} \ \sum_{i=1}^{m} \sigma_i \sum_{j=1}^{N} \alpha_j a_i^{(j)} \right] \\
&= \mathbb{E}_{\sigma} \left[ \sup_{\alpha \ge 0 : \|\alpha\|_1 = 1} \ \sum_{j=1}^{N} \alpha_j \sup_{a^{(j)}} \sum_{i=1}^{m} \sigma_i a_i^{(j)} \right] \\
&= \mathbb{E}_{\sigma} \left[ \sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i \right] \\
&= m R(A),
\end{align*}
and we conclude our proof.
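The identity can also be checked numerically. The following Python sketch (a minimal illustration, not part of the text; the helper name, sample sizes, and example data are chosen here for convenience) estimates the Rademacher complexity of a small finite set $A$ and of a set obtained by adding convex combinations of its elements; since every element of $A$ is itself a trivial convex combination, the two Monte Carlo estimates coincide, as the lemma predicts.

```python
import numpy as np

def rademacher_complexity(A, n_samples=20000, seed=None):
    """Monte Carlo estimate of R(A) = E_sigma[ sup_{a in A} (1/m) <sigma, a> ],
    where A is given as an (N, m) array of vectors."""
    rng = np.random.default_rng(seed)
    N, m = A.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_samples, m))  # Rademacher vectors
    sups = (sigma @ A.T).max(axis=1)  # sup of <sigma, a> over the rows of A
    return sups.mean() / m

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 10))  # five vectors in R^10 (arbitrary example data)

# A' contains the original vectors plus random convex combinations of them.
weights = rng.dirichlet(np.ones(5), size=200)  # nonnegative rows summing to 1
A_prime = np.vstack([A, weights @ A])

print(rademacher_complexity(A, seed=1))        # estimate of R(A)
print(rademacher_complexity(A_prime, seed=1))  # estimate of R(A'), identical
```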
The next lemma, due to Massart, states that the Rademacher complexity of a
finite set grows logarithmically with the size of the set.
Lemma 26.8 (Massart Lemma). Let $A = \{a_1, \dots, a_N\}$ be a finite set of vectors in $\mathbb{R}^m$. Define $\bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i$. Then,
\[
R(A) \;\le\; \max_{a \in A} \|a - \bar{a}\| \, \frac{\sqrt{2 \log(N)}}{m} .
\]
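As a quick sanity check (again a minimal sketch, not from the text; the set $A$ and the sample sizes below are arbitrary illustrative choices), one can compare a Monte Carlo estimate of $R(A)$ for a random finite set against the bound $\max_{a \in A} \|a - \bar{a}\| \sqrt{2 \log(N)} / m$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 8, 30
A = rng.normal(size=(N, m))                   # arbitrary example data
a_bar = A.mean(axis=0)

# Monte Carlo estimate of R(A) = E_sigma[ max_{a in A} (1/m) <sigma, a> ].
sigma = rng.choice([-1.0, 1.0], size=(20000, m))
estimate = (sigma @ A.T).max(axis=1).mean() / m

# Massart's bound: max_a ||a - a_bar|| * sqrt(2 log N) / m.
bound = np.linalg.norm(A - a_bar, axis=1).max() * np.sqrt(2 * np.log(N)) / m
print(estimate, bound)                        # estimate should not exceed bound
```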
Proof. On the basis of Lemma 26.6, we can assume without loss of generality that $\bar{a} = 0$. Let $\lambda > 0$ and let $A' = \{\lambda a_1, \dots, \lambda a_N\}$. We upper bound the Rademacher complexity as follows:
\begin{align*}
m R(A') &= \mathbb{E}_{\sigma} \left[ \max_{a \in A'} \langle \sigma, a \rangle \right]
 = \mathbb{E}_{\sigma} \left[ \log \left( \max_{a \in A'} e^{\langle \sigma, a \rangle} \right) \right] \\
&\le \mathbb{E}_{\sigma} \left[ \log \left( \sum_{a \in A'} e^{\langle \sigma, a \rangle} \right) \right] \\
&\le \log \left( \mathbb{E}_{\sigma} \left[ \sum_{a \in A'} e^{\langle \sigma, a \rangle} \right] \right) && \text{// Jensen's inequality} \\
&= \log \left( \sum_{a \in A'} \prod_{i=1}^{m} \mathbb{E}_{\sigma_i} \left[ e^{\sigma_i a_i} \right] \right),
\end{align*}
where the last equality occurs because the Rademacher variables are independent.
Next, using Lemma A.6 we have that for all $a_i \in \mathbb{R}$,
\[
\mathbb{E}_{\sigma_i} \left[ e^{\sigma_i a_i} \right] \;=\; \frac{\exp(a_i) + \exp(-a_i)}{2} \;\le\; \exp(a_i^2 / 2),
\]