“strong” classifier that is based on a weighted sum of all the weak hypotheses. The pseudocode of AdaBoost is presented below.
AdaBoost

input:
    training set $S = (x_1, y_1), \ldots, (x_m, y_m)$
    weak learner WL
    number of rounds $T$
initialize $D^{(1)} = \left(\frac{1}{m}, \ldots, \frac{1}{m}\right)$.
for $t = 1, \ldots, T$:
    invoke weak learner $h_t = \mathrm{WL}(D^{(t)}, S)$
    compute $\epsilon_t = \sum_{i=1}^{m} D_i^{(t)} \mathbb{1}_{[y_i \neq h_t(x_i)]}$
    let $w_t = \frac{1}{2} \log\left(\frac{1}{\epsilon_t} - 1\right)$
    update $D_i^{(t+1)} = \frac{D_i^{(t)} \exp(-w_t y_i h_t(x_i))}{\sum_{j=1}^{m} D_j^{(t)} \exp(-w_t y_j h_t(x_j))}$ for all $i = 1, \ldots, m$
output the hypothesis $h_s(x) = \mathrm{sign}\left(\sum_{t=1}^{T} w_t h_t(x)\right)$.
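To make the pseudocode concrete, here is a minimal NumPy sketch of the loop above. The decision-stump weak learner and the toy interval data are illustrative assumptions (the text leaves WL abstract), and the sketch assumes each round returns an imperfect hypothesis, i.e. $0 < \epsilon_t < 1/2$.

import numpy as np

def stump_weak_learner(D, X, y):
    # Exhaustive decision stump: best (feature, threshold, sign) under distribution D.
    # This weak learner is an assumption for illustration; WL is abstract in the text.
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            for sign in (1.0, -1.0):
                pred = sign * np.where(X[:, j] <= theta, 1.0, -1.0)
                err = np.sum(D * (pred != y))
                if err < best_err:
                    best_err, best = err, (j, theta, sign)
    j, theta, sign = best
    return lambda Z: sign * np.where(Z[:, j] <= theta, 1.0, -1.0)

def adaboost(X, y, T):
    m = X.shape[0]
    D = np.full(m, 1.0 / m)                  # D^(1) = (1/m, ..., 1/m)
    hs, ws = [], []
    for t in range(T):
        h = stump_weak_learner(D, X, y)      # h_t = WL(D^(t), S)
        pred = h(X)
        eps = np.sum(D * (pred != y))        # epsilon_t, assumed in (0, 1/2)
        w = 0.5 * np.log(1.0 / eps - 1.0)    # w_t = (1/2) log(1/eps_t - 1)
        D = D * np.exp(-w * y * pred)        # numerator of the D^(t+1) update
        D = D / np.sum(D)                    # normalize by the sum over j
        hs.append(h)
        ws.append(w)
    # h_s(x) = sign(sum_t w_t h_t(x))
    return lambda Z: np.sign(sum(w * h(Z) for w, h in zip(ws, hs)))

# Toy usage: the labels form an interval, which no single stump can realize.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0, -1.0])
h_s = adaboost(X, y, T=10)
print(h_s(X))  # boosted predictions on the training set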
The following theorem shows that the training error of the output hypothesis
decreases exponentially fast with the number of boosting rounds.
Theorem 10.2. Let $S$ be a training set and assume that at each iteration of AdaBoost, the weak learner returns a hypothesis for which $\epsilon_t \le 1/2 - \gamma$. Then, the training error of the output hypothesis of AdaBoost is at most
$$L_S(h_s) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}_{[h_s(x_i) \neq y_i]} \le \exp(-2\gamma^2 T).$$
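To get a feel for the rate (an illustrative calculation, not part of the text): since $L_S(h_s)$ takes values in $\{0, 1/m, 2/m, \ldots\}$, the bound forces $L_S(h_s) = 0$ as soon as $\exp(-2\gamma^2 T) < 1/m$, that is, once $T > \frac{\log m}{2\gamma^2}$. For example, with edge $\gamma = 0.1$ and $m = 1000$ examples, $T > 50 \log(1000) \approx 346$ rounds already guarantee zero training error.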
Proof. For each $t$, denote $f_t = \sum_{p \le t} w_p h_p$. Therefore, the output of AdaBoost is $f_T$. In addition, denote
$$Z_t = \frac{1}{m} \sum_{i=1}^{m} e^{-y_i f_t(x_i)}.$$
Note that for any hypothesis we have that $\mathbb{1}_{[h(x) \neq y]} \le e^{-y h(x)}$: the right-hand side is always positive, and whenever $h(x) \neq y$ we have $y\,h(x) \le 0$, so the right-hand side is at least $1$. Therefore, $L_S(f_T) \le Z_T$, so it suffices to show that $Z_T \le e^{-2\gamma^2 T}$. To upper bound $Z_T$ we rewrite it as
$$Z_T = \frac{Z_T}{Z_0} = \frac{Z_T}{Z_{T-1}} \cdot \frac{Z_{T-1}}{Z_{T-2}} \cdots \frac{Z_2}{Z_1} \cdot \frac{Z_1}{Z_0}, \qquad (10.2)$$
where we used the fact that $Z_0 = 1$ because $f_0 \equiv 0$. Therefore, it suffices to show that for every round $t$,
$$\frac{Z_{t+1}}{Z_t} \le e^{-2\gamma^2}, \qquad (10.3)$$
since then (10.2) immediately yields $Z_T \le \left(e^{-2\gamma^2}\right)^T = e^{-2\gamma^2 T}$.
To do so, we first note that using a simple inductive argument, for all $t$ and $i$,
$$D_i^{(t+1)} = \frac{e^{-y_i f_t(x_i)}}{\sum_{j=1}^{m} e^{-y_j f_t(x_j)}}.$$
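The induction itself is one line: $D^{(1)}$ is uniform, and each round multiplies $D_i$ by $e^{-w_t y_i h_t(x_i)}$ and renormalizes, so the unnormalized weight after round $t$ is proportional to $e^{-y_i f_t(x_i)}$. The following small numeric check of this identity is illustrative only (the random $\pm 1$ predictions stand in for the $h_t(x_i)$; nothing here is from the text).

import numpy as np

rng = np.random.default_rng(0)
m, T = 5, 4
y = rng.choice([-1.0, 1.0], size=m)          # labels y_i
preds = rng.choice([-1.0, 1.0], size=(T, m)) # stand-ins for h_t(x_i)
ws = rng.uniform(0.1, 1.0, size=T)           # stand-ins for the weights w_t

D = np.full(m, 1.0 / m)                      # D^(1) is uniform
f = np.zeros(m)
for t in range(T):
    f += ws[t] * preds[t]                    # f_t = sum_{p <= t} w_p h_p
    D = D * np.exp(-ws[t] * y * preds[t])    # recursive update from the pseudocode
    D = D / np.sum(D)
    closed = np.exp(-y * f) / np.sum(np.exp(-y * f))
    assert np.allclose(D, closed)            # matches the closed form above
print("recursive update matches the closed form")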