and obtain
$$\begin{aligned}
\log\frac{Z_{t+1}}{Z_t} &\le -\sum_i w_i^{(t)}\left(\eta v_{t,i} - \eta^2 v_{t,i}^2/2\right) \\
&= -\eta\,\langle \mathbf{w}^{(t)}, \mathbf{v}_t\rangle + \eta^2 \sum_i w_i^{(t)} v_{t,i}^2/2 \\
&\le -\eta\,\langle \mathbf{w}^{(t)}, \mathbf{v}_t\rangle + \eta^2/2.
\end{aligned}$$
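The last inequality merits a word: since in this setting each cost satisfies $v_{t,i} \in [0,1]$ and $\mathbf{w}^{(t)}$ is a probability vector, we have
$$\sum_i w_i^{(t)} v_{t,i}^2 \;\le\; \sum_i w_i^{(t)} \;=\; 1,$$
which bounds the second term by $\eta^2/2$.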
Summing this inequality over t we get
$$\log(Z_{T+1}) - \log(Z_1) = \sum_{t=1}^{T}\log\frac{Z_{t+1}}{Z_t} \le -\eta\sum_{t=1}^{T}\langle \mathbf{w}^{(t)}, \mathbf{v}_t\rangle + \frac{T\eta^2}{2}. \tag{21.3}$$
Next, we lower bound $Z_{T+1}$. For each $i$, we can rewrite $\tilde{w}_i^{(T+1)} = e^{-\eta\sum_t v_{t,i}}$ and we get that
$$\log Z_{T+1} = \log\sum_i e^{-\eta\sum_t v_{t,i}} \ge \log\max_i e^{-\eta\sum_t v_{t,i}} = -\eta\min_i\sum_t v_{t,i}.$$
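The inequality above uses only the fact that a sum of nonnegative terms dominates each of its terms,
$$\sum_i e^{-\eta\sum_t v_{t,i}} \;\ge\; \max_i e^{-\eta\sum_t v_{t,i}},$$
together with the monotonicity of the logarithm.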
Combining the preceding with Equation (21.3) and using the fact that $\log(Z_1) = \log(d)$ we get that
$$-\eta\min_i\sum_t v_{t,i} - \log(d) \le -\eta\sum_{t=1}^{T}\langle \mathbf{w}^{(t)}, \mathbf{v}_t\rangle + \frac{T\eta^2}{2},$$
which can be rearranged as follows:
$$\sum_{t=1}^{T}\langle \mathbf{w}^{(t)}, \mathbf{v}_t\rangle - \min_i\sum_t v_{t,i} \le \frac{\log(d)}{\eta} + \frac{\eta T}{2}.$$
Plugging the value $\eta = \sqrt{2\log(d)/T}$ into the right-hand side gives $\frac{\log(d)}{\eta} + \frac{\eta T}{2} = \sqrt{2\log(d)\,T}$, which concludes our proof.
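As a concrete companion to the analysis, here is a minimal sketch in Python/NumPy of the weight update studied in Theorem 21.11; the function name, the toy random cost sequence, and the regret check are illustrative and not part of the text:

```python
import numpy as np

def weighted_majority_weights(costs, eta):
    """Exponentially weighted forecaster over d experts.

    costs: array of shape (T, d), where costs[t, i] = v_{t,i} is assumed
    to lie in [0, 1].  The whole cost matrix is passed for convenience,
    but each weight vector w^(t) only uses costs from rounds before t.
    Returns an array of shape (T, d) whose t-th row is w^(t).
    """
    T, d = costs.shape
    w_tilde = np.ones(d)                      # unnormalized weights, w~^(1) = (1, ..., 1)
    weights = np.empty((T, d))
    for t in range(T):
        weights[t] = w_tilde / w_tilde.sum()              # w^(t) = w~^(t) / Z_t
        w_tilde = w_tilde * np.exp(-eta * costs[t])       # multiplicative update
    return weights

# Illustrative check of the regret bound of Theorem 21.11 on random costs:
# sum_t <w^(t), v_t> - min_i sum_t v_{t,i} should be at most sqrt(2 log(d) T).
T, d = 1000, 10
rng = np.random.default_rng(0)
costs = rng.random((T, d))
eta = np.sqrt(2 * np.log(d) / T)
W = weighted_majority_weights(costs, eta)
regret = np.sum(W * costs) - costs.sum(axis=0).min()
print(regret, "<=", np.sqrt(2 * np.log(d) * T))
```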
Proof of Theorem 21.10
Equipped with the Weighted-Majority algorithm and Theorem 21.11, we are ready to
prove Theorem 21.10. We start with the simpler case, in which H is a finite class,
and let us write $H = \{h_1, \ldots, h_d\}$. In this case, we can refer to each hypothesis, $h_i$, as an expert, whose advice is to predict $h_i(x_t)$, and whose cost is $v_{t,i} = |h_i(x_t) - y_t|$. The prediction of the algorithm will therefore be $p_t = \sum_i w_i^{(t)} h_i(x_t) \in [0,1]$, and the loss is
$$|p_t - y_t| = \left|\sum_{i=1}^{d} w_i^{(t)} h_i(x_t) - y_t\right| = \left|\sum_{i=1}^{d} w_i^{(t)} \bigl(h_i(x_t) - y_t\bigr)\right|.$$
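To make the reduction concrete, here is a minimal sketch of a single round in which the hypotheses of a finite class play the role of the experts; the threshold hypotheses and the specific example shown are illustrative assumptions, not taken from the text:

```python
import numpy as np

def online_round(hypotheses, w, x_t, y_t, eta):
    """One round of predicting with a finite class H = {h_1, ..., h_d},
    treating each hypothesis as an expert of the Weighted-Majority scheme."""
    advice = np.array([h(x_t) for h in hypotheses])   # expert advice h_i(x_t)
    p_t = float(np.dot(w, advice))     # prediction p_t = sum_i w_i^(t) h_i(x_t) in [0, 1]
    loss = abs(p_t - y_t)              # |p_t - y_t|, the quantity analyzed above
    v_t = np.abs(advice - y_t)         # expert costs v_{t,i} = |h_i(x_t) - y_t|
    w_next = w * np.exp(-eta * v_t)    # multiplicative weight update
    return p_t, loss, w_next / w_next.sum()

# Toy example: threshold hypotheses on [0, 1] (illustrative only).
hypotheses = [lambda x, a=a: float(x >= a) for a in np.linspace(0.0, 1.0, 10)]
w = np.ones(len(hypotheses)) / len(hypotheses)
p_t, loss, w = online_round(hypotheses, w, x_t=0.7, y_t=1.0, eta=0.1)
```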
Now, if $y_t = 1$, then for all $i$, $h_i(x_t) - y_t \le 0$. Therefore, the above equals $\sum_i w_i^{(t)} |h_i(x_t) - y_t|$. If $y_t = 0$, then for all $i$, $h_i(x_t) - y_t \ge 0$, and the above also equals