Theorem 21.15. The Online Gradient Descent algorithm enjoys the following regret bound for every $w^\star \in H$,
\[
\mathrm{Regret}_A(w^\star, T) \;\le\; \frac{\|w^\star\|^2}{2\eta} + \frac{\eta}{2} \sum_{t=1}^{T} \|v_t\|^2 .
\]
If we further assume that $f_t$ is $\rho$-Lipschitz for all $t$, then setting $\eta = 1/\sqrt{T}$ yields
\[
\mathrm{Regret}_A(w^\star, T) \;\le\; \frac{1}{2}\left(\|w^\star\|^2 + \rho^2\right)\sqrt{T} .
\]
If we further assume that $H$ is $B$-bounded and we set $\eta = \frac{B}{\rho\sqrt{T}}$, then
\[
\mathrm{Regret}_A(H, T) \;\le\; B\,\rho\,\sqrt{T} .
\]
Proof. The analysis is similar to the analysis of Stochastic Gradient Descent with projections. Using the projection lemma, the definition of $w^{(t+\frac{1}{2})}$, and the definition of subgradients, we have that for every $t$,
\begin{align*}
\|w^{(t+1)} - w^\star\|^2 - \|w^{(t)} - w^\star\|^2
&= \|w^{(t+1)} - w^\star\|^2 - \|w^{(t+\frac{1}{2})} - w^\star\|^2 \\
&\quad + \|w^{(t+\frac{1}{2})} - w^\star\|^2 - \|w^{(t)} - w^\star\|^2 \\
&\le \|w^{(t+\frac{1}{2})} - w^\star\|^2 - \|w^{(t)} - w^\star\|^2 \\
&= \|w^{(t)} - \eta v_t - w^\star\|^2 - \|w^{(t)} - w^\star\|^2 \\
&= -2\eta \left\langle w^{(t)} - w^\star, v_t \right\rangle + \eta^2 \|v_t\|^2 \\
&\le -2\eta \left( f_t(w^{(t)}) - f_t(w^\star) \right) + \eta^2 \|v_t\|^2 .
\end{align*}
Summing over $t$ and observing that the left-hand side is a telescopic sum, we obtain that
\[
\|w^{(T+1)} - w^\star\|^2 - \|w^{(1)} - w^\star\|^2 \;\le\; -2\eta \sum_{t=1}^{T} \left( f_t(w^{(t)}) - f_t(w^\star) \right) + \eta^2 \sum_{t=1}^{T} \|v_t\|^2 .
\]
Rearranging the inequality and using the fact that $w^{(1)} = 0$, we get that
\begin{align*}
\sum_{t=1}^{T} \left( f_t(w^{(t)}) - f_t(w^\star) \right)
&\le \frac{\|w^{(1)} - w^\star\|^2 - \|w^{(T+1)} - w^\star\|^2}{2\eta} + \frac{\eta}{2} \sum_{t=1}^{T} \|v_t\|^2 \\
&\le \frac{\|w^\star\|^2}{2\eta} + \frac{\eta}{2} \sum_{t=1}^{T} \|v_t\|^2 .
\end{align*}
This proves the first bound in the theorem. The second bound follows from the assumption that $f_t$ is $\rho$-Lipschitz, which implies that $\|v_t\| \le \rho$.
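Indeed, substituting $\|v_t\| \le \rho$ and the stated step sizes into the first bound yields the remaining two bounds: with $\eta = 1/\sqrt{T}$,
\[
\frac{\|w^\star\|^2}{2\eta} + \frac{\eta}{2}\,\rho^2 T \;=\; \frac{\|w^\star\|^2 \sqrt{T}}{2} + \frac{\rho^2 \sqrt{T}}{2} \;=\; \frac{1}{2}\left(\|w^\star\|^2 + \rho^2\right)\sqrt{T},
\]
and, since $B$-boundedness gives $\|w^\star\| \le B$ for every $w^\star \in H$, with $\eta = \frac{B}{\rho\sqrt{T}}$,
\[
\frac{\|w^\star\|^2}{2\eta} + \frac{\eta}{2}\,\rho^2 T \;\le\; \frac{B^2 \rho \sqrt{T}}{2B} + \frac{B \rho \sqrt{T}}{2} \;=\; B\,\rho\,\sqrt{T}.
\]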
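For concreteness, the following is a minimal sketch of the update the theorem analyzes: a subgradient step $w^{(t+\frac{1}{2})} = w^{(t)} - \eta v_t$ followed by Euclidean projection onto $H$. The code assumes, for illustration only, that $H$ is the Euclidean ball of radius $B$ (so the projection is a simple rescaling) and that subgradients are supplied by a user-provided oracle; neither assumption comes from the text.

import numpy as np

def online_gradient_descent(subgradient_oracle, d, T, eta, B):
    """Online Gradient Descent with Euclidean projection (sketch).

    subgradient_oracle(t, w) returns v_t, a subgradient at w of the
    loss f_t revealed on round t.  H is assumed to be the Euclidean
    ball of radius B; for another convex set, replace `project`.
    """
    def project(w):
        # Euclidean projection onto {u : ||u|| <= B}.
        norm = np.linalg.norm(w)
        return w if norm <= B else (B / norm) * w

    w = np.zeros(d)                   # w^(1) = 0, as in the proof
    iterates = []
    for t in range(1, T + 1):
        iterates.append(w)            # play w^(t)
        v = subgradient_oracle(t, w)  # v_t, a subgradient of f_t at w^(t)
        w = project(w - eta * v)      # gradient step, then project onto H
    return iterates

For example, with the absolute loss $f_t(w) = |\langle w, x_t \rangle - y_t|$, the oracle may return $\mathrm{sign}(\langle w, x_t \rangle - y_t)\,x_t$, so $\|v_t\| \le \|x_t\|$ and the theorem applies with $\rho = \max_t \|x_t\|$.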
21.4 THE ONLINE PERCEPTRON ALGORITHM
The Perceptron is a classic online learning algorithm for binary classification with the hypothesis class of homogeneous halfspaces, namely, $H = \{x \mapsto \mathrm{sign}(\langle w, x \rangle) : w \in \mathbb{R}^d\}$.