Lemma 26.7. Let $A$ be a subset of $\mathbb{R}^m$ and let $A' = \left\{ \sum_{j=1}^{N} \alpha_j a^{(j)} : N \in \mathbb{N},\ \forall j,\ a^{(j)} \in A,\ \alpha_j \ge 0,\ \|\alpha\|_1 = 1 \right\}$. Then, $R(A') = R(A)$.
Proof. The main idea follows from the fact that for any vector $v$ we have
\[
\sup_{\alpha \ge 0 : \|\alpha\|_1 = 1} \ \sum_{j=1}^{N} \alpha_j v_j \;=\; \max_{j} v_j ,
\]
since a linear function over the probability simplex attains its supremum at one of the vertices.
Therefore,
\begin{align*}
m R(A') &= \mathbb{E}_{\sigma} \left[ \sup_{\alpha \ge 0 : \|\alpha\|_1 = 1} \ \sup_{a^{(1)}, \dots, a^{(N)}} \ \sum_{i=1}^{m} \sigma_i \sum_{j=1}^{N} \alpha_j a_i^{(j)} \right] \\
&= \mathbb{E}_{\sigma} \left[ \sup_{\alpha \ge 0 : \|\alpha\|_1 = 1} \ \sum_{j=1}^{N} \alpha_j \sup_{a^{(j)}} \sum_{i=1}^{m} \sigma_i a_i^{(j)} \right] \\
&= \mathbb{E}_{\sigma} \left[ \sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i \right] \\
&= m R(A),
\end{align*}
and we conclude our proof.
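The identity can also be checked numerically. The following Python sketch (a minimal illustration, not part of the text; the helper name, sample sizes, and example data are chosen here for convenience) estimates the Rademacher complexity of a small finite set $A$ and of a set obtained by adding convex combinations of its elements; since every element of $A$ is itself a trivial convex combination, the two Monte Carlo estimates coincide, as the lemma predicts.

```python
import numpy as np

def rademacher_complexity(A, n_samples=20000, seed=None):
    """Monte Carlo estimate of R(A) = E_sigma[ sup_{a in A} (1/m) <sigma, a> ],
    where A is given as an (N, m) array of vectors."""
    rng = np.random.default_rng(seed)
    N, m = A.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_samples, m))  # Rademacher vectors
    sups = (sigma @ A.T).max(axis=1)  # sup of <sigma, a> over the rows of A
    return sups.mean() / m

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 10))  # five vectors in R^10 (arbitrary example data)

# A' contains the original vectors plus random convex combinations of them.
weights = rng.dirichlet(np.ones(5), size=200)  # nonnegative rows summing to 1
A_prime = np.vstack([A, weights @ A])

print(rademacher_complexity(A, seed=1))        # estimate of R(A)
print(rademacher_complexity(A_prime, seed=1))  # estimate of R(A'), identical
```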
The next lemma, due to Massart, states that the Rademacher complexity of a
finite set grows logarithmically with the size of the set.
Lemma 26.8 (Massart Lemma). Let $A = \{a_1, \dots, a_N\}$ be a finite set of vectors in $\mathbb{R}^m$. Define $\bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i$. Then,
\[
R(A) \;\le\; \max_{a \in A} \|a - \bar{a}\| \, \frac{\sqrt{2 \log(N)}}{m} .
\]
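As a quick sanity check (again a minimal sketch, not from the text; the set $A$ and the sample sizes below are arbitrary illustrative choices), one can compare a Monte Carlo estimate of $R(A)$ for a random finite set against the bound $\max_{a \in A} \|a - \bar{a}\| \sqrt{2 \log(N)} / m$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 8, 30
A = rng.normal(size=(N, m))                   # arbitrary example data
a_bar = A.mean(axis=0)

# Monte Carlo estimate of R(A) = E_sigma[ max_{a in A} (1/m) <sigma, a> ].
sigma = rng.choice([-1.0, 1.0], size=(20000, m))
estimate = (sigma @ A.T).max(axis=1).mean() / m

# Massart's bound: max_a ||a - a_bar|| * sqrt(2 log N) / m.
bound = np.linalg.norm(A - a_bar, axis=1).max() * np.sqrt(2 * np.log(N)) / m
print(estimate, bound)                        # estimate should not exceed bound
```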
Proof. On the basis of Lemma 26.6, we can assume without loss of generality that $\bar{a} = 0$. Let $\lambda > 0$ and let $A' = \{\lambda a_1, \dots, \lambda a_N\}$. We upper bound the Rademacher complexity as follows:
\begin{align*}
m R(A') &= \mathbb{E}_{\sigma} \left[ \max_{a \in A'} \langle \sigma, a \rangle \right]
 = \mathbb{E}_{\sigma} \left[ \log \left( \max_{a \in A'} e^{\langle \sigma, a \rangle} \right) \right] \\
&\le \mathbb{E}_{\sigma} \left[ \log \left( \sum_{a \in A'} e^{\langle \sigma, a \rangle} \right) \right] \\
&\le \log \left( \mathbb{E}_{\sigma} \left[ \sum_{a \in A'} e^{\langle \sigma, a \rangle} \right] \right) && \text{// Jensen's inequality} \\
&= \log \left( \sum_{a \in A'} \prod_{i=1}^{m} \mathbb{E}_{\sigma_i} \left[ e^{\sigma_i a_i} \right] \right),
\end{align*}
where the last equality occurs because the Rademacher variables are independent.
Next, using Lemma A.6 we have that for all $a_i \in \mathbb{R}$,
\[
\mathbb{E}_{\sigma_i} \left[ e^{\sigma_i a_i} \right] \;=\; \frac{\exp(a_i) + \exp(-a_i)}{2} \;\le\; \exp(a_i^2 / 2),
\]