Page 351 - Understanding Machine Learning



Finally, since the variables $\sigma_1,\ldots,\sigma_m$ are independent, we have that $\mathbb{E}[\sigma_i \sigma_j] = \mathbb{E}[\sigma_i]\,\mathbb{E}[\sigma_j] = 0$ whenever $i \neq j$, while $\mathbb{E}[\sigma_i^2] = 1$. Therefore,
$$
\begin{aligned}
\mathbb{E}_{\sigma}\left[\Bigl\|\sum_{i=1}^m \sigma_i \mathbf{x}_i\Bigr\|_2^2\right]
&= \mathbb{E}_{\sigma}\left[\sum_{i,j} \sigma_i \sigma_j \langle \mathbf{x}_i, \mathbf{x}_j \rangle\right] \\
&= \sum_{i \neq j} \langle \mathbf{x}_i, \mathbf{x}_j \rangle\, \mathbb{E}_{\sigma}[\sigma_i \sigma_j]
 + \sum_{i=1}^m \langle \mathbf{x}_i, \mathbf{x}_i \rangle\, \mathbb{E}_{\sigma}\bigl[\sigma_i^2\bigr] \\
&= \sum_{i=1}^m \|\mathbf{x}_i\|_2^2 \;\le\; m \max_i \|\mathbf{x}_i\|_2^2 .
\end{aligned}
$$
Combining this with Equation (26.15) and Equation (26.16) we conclude our proof.
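As a sanity check, the identity $\mathbb{E}_\sigma\bigl[\|\sum_i \sigma_i \mathbf{x}_i\|_2^2\bigr] = \sum_i \|\mathbf{x}_i\|_2^2$ can be verified numerically by averaging over random sign vectors. The following sketch uses illustrative data (the matrix `X` and all dimensions are assumptions, not from the text):

```python
import numpy as np

# Monte Carlo check of E_sigma ||sum_i sigma_i x_i||_2^2 = sum_i ||x_i||_2^2.
# X and the dimensions below are illustrative, not taken from the text.
rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.normal(size=(m, n))            # rows are x_1, ..., x_m

trials = 200_000
sigma = rng.choice([-1.0, 1.0], size=(trials, m))  # i.i.d. Rademacher signs
sums = sigma @ X                        # each row is sum_i sigma_i x_i
empirical = np.mean(np.sum(sums**2, axis=1))       # estimate of the expectation

exact = np.sum(X**2)                    # sum_i ||x_i||_2^2
print(empirical, exact)                 # the two values agree up to sampling noise
```

The cross terms $\langle \mathbf{x}_i, \mathbf{x}_j \rangle$ for $i \neq j$ average out over the random signs, which is exactly why only the diagonal terms survive in the derivation above.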
Next we bound the Rademacher complexity of $H_1 \circ S$.

Lemma 26.11. Let $S = (\mathbf{x}_1,\ldots,\mathbf{x}_m)$ be vectors in $\mathbb{R}^n$. Then,
$$
R(H_1 \circ S) \;\le\; \max_i \|\mathbf{x}_i\|_\infty \sqrt{\frac{2\log(2n)}{m}} .
$$
Proof. Using Hölder's inequality we know that for any vectors $\mathbf{w},\mathbf{v}$ we have $\langle \mathbf{w},\mathbf{v}\rangle \le \|\mathbf{w}\|_1 \|\mathbf{v}\|_\infty$. Therefore,
$$
\begin{aligned}
m\,R(H_1 \circ S) &= \mathbb{E}_{\sigma}\left[\sup_{\mathbf{a} \in H_1 \circ S} \sum_{i=1}^m \sigma_i a_i\right] \\
&= \mathbb{E}_{\sigma}\left[\sup_{\mathbf{w} : \|\mathbf{w}\|_1 \le 1} \sum_{i=1}^m \sigma_i \langle \mathbf{w}, \mathbf{x}_i \rangle\right] \\
&= \mathbb{E}_{\sigma}\left[\sup_{\mathbf{w} : \|\mathbf{w}\|_1 \le 1} \Bigl\langle \mathbf{w}, \sum_{i=1}^m \sigma_i \mathbf{x}_i \Bigr\rangle\right] \\
&\le \mathbb{E}_{\sigma}\left[\Bigl\|\sum_{i=1}^m \sigma_i \mathbf{x}_i\Bigr\|_\infty\right]. \qquad (26.17)
\end{aligned}
$$
For each $j \in [n]$, let $\mathbf{v}_j = (x_{1,j},\ldots,x_{m,j}) \in \mathbb{R}^m$. Note that $\|\mathbf{v}_j\|_2 \le \sqrt{m}\,\max_i \|\mathbf{x}_i\|_\infty$. Let $V = \{\mathbf{v}_1,\ldots,\mathbf{v}_n, -\mathbf{v}_1,\ldots,-\mathbf{v}_n\}$. The right-hand side of Equation (26.17) is $m\,R(V)$. Using Massart's lemma (Lemma 26.8) we have that
$$
R(V) \;\le\; \max_i \|\mathbf{x}_i\|_\infty \sqrt{2\log(2n)/m} ,
$$
which concludes our proof.
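The bound of Lemma 26.11 can also be probed empirically: estimate $R(H_1 \circ S) = \frac{1}{m}\,\mathbb{E}_\sigma\bigl[\|\sum_i \sigma_i \mathbf{x}_i\|_\infty\bigr]$ by Monte Carlo and compare it with $\max_i \|\mathbf{x}_i\|_\infty \sqrt{2\log(2n)/m}$. A minimal sketch, with illustrative data not taken from the text:

```python
import numpy as np

# Estimate R(H_1 ∘ S) = (1/m) E_sigma ||sum_i sigma_i x_i||_inf and compare
# with the bound of Lemma 26.11. X and the dimensions are illustrative.
rng = np.random.default_rng(1)
m, n = 50, 10
X = rng.uniform(-1.0, 1.0, size=(m, n))   # rows are x_1, ..., x_m

trials = 20_000
sigma = rng.choice([-1.0, 1.0], size=(trials, m))  # i.i.d. Rademacher signs
# For each sign vector, the sup over ||w||_1 <= 1 equals the inf-norm of sum_i sigma_i x_i.
rademacher = np.mean(np.max(np.abs(sigma @ X), axis=1)) / m

bound = np.max(np.abs(X)) * np.sqrt(2 * np.log(2 * n) / m)
print(rademacher, bound)                  # the estimate stays below the bound
```

Note that the estimate uses the equality established in the proof: for a fixed sign vector, the supremum over the $\ell_1$ ball is attained at a signed standard basis vector, so it equals the $\ell_\infty$ norm of $\sum_i \sigma_i \mathbf{x}_i$.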
26.3 GENERALIZATION BOUNDS FOR SVM
In this section we use Rademacher complexity to derive generalization bounds for generalized linear predictors with a Euclidean norm constraint. We will show how this leads to generalization bounds for hard-SVM and soft-SVM.