Page 444 - Deep Learning
Notes to Pages 173–175 427
for which the average time to completion on the first training trial is 10 minutes,
or 600 seconds. In this case, if the learning rate parameter a is equal to .25, then
it takes approximately 15 training trials to cut the time to task completion in half,
to 300 seconds. If the learning rate parameter is instead .75, it takes only 3 training
trials to cut the time in half. Few other task variables yield behavioral effects
of this magnitude.
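The figures above can be checked with a short calculation. This excerpt does not state the functional form of the learning curve, so the sketch below assumes the power law of practice, T(N) = T1 · N^(−a), where T1 is the time on the first trial, N is the trial number, and a is the learning rate parameter; this form reproduces the numbers in the note. Under that assumption the time falls to half of T1 at trial N = 2^(1/a), whatever the value of T1. The function names are hypothetical, chosen for illustration.

```python
def time_on_trial(n: float, t1: float, a: float) -> float:
    """Time to completion on trial n, assuming the power law T(N) = T1 * N**(-a)."""
    return t1 * n ** (-a)


def trial_at_half_time(a: float) -> float:
    """Trial number N at which T(N) = T1 / 2 under the assumed power law.

    Solving N**(-a) = 1/2 gives N = 2**(1/a), which does not depend on T1.
    """
    return 2.0 ** (1.0 / a)


T1 = 600.0  # seconds on the first training trial (10 minutes)

for a in (0.25, 0.75):
    n_half = trial_at_half_time(a)
    t_half = time_on_trial(n_half, T1, a)
    print(f"a = {a}: time halves at trial {n_half:.2f} (T = {t_half:.0f} s)")
```

With a = .25 the halfway point falls at trial 16, that is, 15 trials after the first; with a = .75 it falls between trials 2 and 3. Both agree with the approximate figures given in the note.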
10. See Campitelli and Gobet (2005) and Holding (1985, Chap. 3) for cognitive
research on chess in general and blindfolded chess in particular. See, e.g., Chase
and Ericsson (1981) for a particularly striking case study of a person whose
performance improved 10-fold with extended practice on a task that is often
considered to measure a stable characteristic of a person’s cognition.
11. The learning curve (or practice curve) is constructed by plotting performance
on a task (e.g., time to task completion) as a function of the amount of practice
(e.g., number of practice problems completed, total time in the relevant
task environment, etc.). The first researchers to display data on the acquisition
of a complex skill in this way were probably Edward L. Thorndike (1898), in his
study of animals learning to escape from problem boxes, and Bryan and
Harter (1897, 1899), in an influential study of telegraph operators. Ebbinghaus
(1885/1964) had displayed data from the memorization of lists of syllables in
this way more than a decade earlier. Learning curves are typically jagged when plotted for
a single learner but become smooth when averaged across
learners. The shape of the curve emerged early as a research issue. The work
by Bryan and Harter (1897, 1899) prompted researchers to search for plateaus, periods
during which the learner appears to make no improvement but which are
followed by periods of rapid improvement. This supposed phenomenon invited
the interpretation that the learner was revising his skill internally and that the
advantages of the revision could not be realized in performance until the revision was
complete. Intriguing as this hypothesis is, subsequent research did not support the
existence of plateaus (Keller, 1958). Most researchers have concluded that
learning/practice curves exhibit uniform negative acceleration (“Almost any learning
curve shows … a negative acceleration; it flattens out as practice advances; the
rate of improvement decreases”; Woodworth, 1938, p. 164), but some are more
reluctant to let go of the plateau than others (see, e.g., Stadler, Vetter, Haynes &
Kruse, 1996). Interestingly, the learning curve is also a topic of research in
economics. Economists discovered that entire factories exhibit such curves when
unit cost (a measure of economic performance) is plotted as a function of the
length of the production run (a measure of amount of practice). Economists are
more prepared to accept different shapes of empirical learning curves (see Note 5,
Chapter 8), and they are not as convinced as psychologists that plateaus in the
sense of Bryan and Harter are a pseudo-phenomenon (although it should be
noted that the word “plateau” is often used in economic contexts to refer to the
final leveling off of a learning curve, which is properly called the asymptote). The
difference between psychologists and economists on this point likely arises because
the former typically study learning in one-hour experiments, so their
learning curves describe short-term effects, while the latter study
improvement processes in economic organizations that last for months and years and that