Finally, since the variables $\sigma_1,\ldots,\sigma_m$ are independent we have
\[
\mathbb{E}_{\sigma}\Big\|\sum_{i=1}^{m}\sigma_i \mathbf{x}_i\Big\|_2^2
= \mathbb{E}_{\sigma}\sum_{i,j}\sigma_i\sigma_j\langle \mathbf{x}_i,\mathbf{x}_j\rangle
= \sum_{i\neq j}\langle \mathbf{x}_i,\mathbf{x}_j\rangle\,\mathbb{E}[\sigma_i\sigma_j]
 + \sum_{i=1}^{m}\langle \mathbf{x}_i,\mathbf{x}_i\rangle\,\mathbb{E}\big[\sigma_i^2\big]
= \sum_{i=1}^{m}\|\mathbf{x}_i\|_2^2 \;\le\; m\,\max_i \|\mathbf{x}_i\|_2^2 .
\]
Combining this with Equation (26.15) and Equation (26.16) we conclude our proof.
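The computation above is easy to sanity-check numerically. Below is a minimal Monte Carlo sketch (an illustration added here, not part of the text; the dimensions, sample sizes, and variable names are arbitrary choices) that estimates $\mathbb{E}_{\sigma}\|\sum_i \sigma_i \mathbf{x}_i\|_2^2$ for random vectors and compares it to $\sum_i\|\mathbf{x}_i\|_2^2$ and to the bound $m\max_i\|\mathbf{x}_i\|_2^2$.

```python
import numpy as np

# Monte Carlo check of
#   E_sigma || sum_i sigma_i x_i ||_2^2 = sum_i ||x_i||_2^2 <= m * max_i ||x_i||_2^2
# for Rademacher variables sigma_i.  All sizes below are illustrative.
rng = np.random.default_rng(0)
m, n = 20, 5
X = rng.normal(size=(m, n))                        # rows are x_1, ..., x_m

trials = 200_000
sigma = rng.choice([-1.0, 1.0], size=(trials, m))  # i.i.d. Rademacher signs
sums = sigma @ X                                   # each row is sum_i sigma_i x_i
lhs = np.mean(np.sum(sums ** 2, axis=1))           # estimate of the expectation
rhs_exact = np.sum(X ** 2)                         # sum_i ||x_i||_2^2
bound = m * np.max(np.sum(X ** 2, axis=1))         # m * max_i ||x_i||_2^2

print(lhs, rhs_exact, bound)                       # lhs ~ rhs_exact <= bound
```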
Next we bound the Rademacher complexity of $\mathcal{H}_1 \circ S$.

Lemma 26.11. Let $S = (\mathbf{x}_1,\ldots,\mathbf{x}_m)$ be vectors in $\mathbb{R}^n$. Then,
\[
R(\mathcal{H}_1 \circ S) \;\le\; \max_i \|\mathbf{x}_i\|_\infty \sqrt{\frac{2\log(2n)}{m}} .
\]
Proof. Using Hölder's inequality we know that for any vectors $\mathbf{w},\mathbf{v}$ we have $\langle\mathbf{w},\mathbf{v}\rangle \le \|\mathbf{w}\|_1\,\|\mathbf{v}\|_\infty$. Therefore,
\[
\begin{aligned}
m\,R(\mathcal{H}_1 \circ S) &= \mathbb{E}_{\sigma}\Big[\sup_{\mathbf{a}\in\mathcal{H}_1\circ S}\sum_{i=1}^{m} \sigma_i a_i\Big] \\
&= \mathbb{E}_{\sigma}\Big[\sup_{\mathbf{w}:\|\mathbf{w}\|_1\le 1}\sum_{i=1}^{m} \sigma_i \langle\mathbf{w},\mathbf{x}_i\rangle\Big] \\
&= \mathbb{E}_{\sigma}\Big[\sup_{\mathbf{w}:\|\mathbf{w}\|_1\le 1}\Big\langle\mathbf{w},\sum_{i=1}^{m} \sigma_i \mathbf{x}_i\Big\rangle\Big] \\
&\le \mathbb{E}_{\sigma}\Big[\Big\|\sum_{i=1}^{m} \sigma_i \mathbf{x}_i\Big\|_\infty\Big]. \qquad (26.17)
\end{aligned}
\]
For each $j \in [n]$, let $\mathbf{v}_j = (x_{1,j},\ldots,x_{m,j}) \in \mathbb{R}^m$. Note that $\|\mathbf{v}_j\|_2 \le \sqrt{m}\,\max_i \|\mathbf{x}_i\|_\infty$. Let $V = \{\mathbf{v}_1,\ldots,\mathbf{v}_n,-\mathbf{v}_1,\ldots,-\mathbf{v}_n\}$. The right-hand side of Equation (26.17) is $m\,R(V)$. Using Massart's lemma (Lemma 26.8) we have that
\[
R(V) \;\le\; \max_i \|\mathbf{x}_i\|_\infty \sqrt{2\log(2n)/m},
\]
which concludes our proof.
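To see the lemma in action, here is a small Monte Carlo sketch (an illustration added here, not part of the text; the dimensions and data distribution are arbitrary). It uses the fact that Hölder's inequality is tight, $\sup_{\|\mathbf{w}\|_1\le 1}\langle\mathbf{w},\mathbf{z}\rangle=\|\mathbf{z}\|_\infty$, to estimate $R(\mathcal{H}_1\circ S)$ directly and compare it with the bound $\max_i\|\mathbf{x}_i\|_\infty\sqrt{2\log(2n)/m}$.

```python
import numpy as np

# Empirical check of Lemma 26.11.  Since sup_{||w||_1 <= 1} <w, z> = ||z||_inf,
# we have R(H_1 o S) = E_sigma [ (1/m) || sum_i sigma_i x_i ||_inf ],
# which we estimate by sampling Rademacher signs.
rng = np.random.default_rng(1)
m, n = 50, 30
X = rng.uniform(-1.0, 1.0, size=(m, n))            # sample S = (x_1, ..., x_m)

trials = 100_000
sigma = rng.choice([-1.0, 1.0], size=(trials, m))  # i.i.d. Rademacher signs
rademacher = np.mean(np.max(np.abs(sigma @ X), axis=1)) / m   # empirical R(H_1 o S)

bound = np.max(np.abs(X)) * np.sqrt(2 * np.log(2 * n) / m)    # max_i ||x_i||_inf * sqrt(2 log(2n)/m)
print(rademacher, bound)                           # rademacher <= bound (up to Monte Carlo error)
```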
26.3 GENERALIZATION BOUNDS FOR SVM
In this section we use Rademacher complexity to derive generalization bounds for generalized linear predictors with a Euclidean norm constraint. We will show how this leads to generalization bounds for hard-SVM and soft-SVM.