Page 455 - Deep Learning


438 Notes to Pages 257–260

4. The learning curve (or the practice curve) is constructed by plotting performance
on a task (e.g., time to task completion) as a function of the amount of practice
(e.g., number of practice problems completed, total time in the relevant task environment).
The first researchers to display data on the acquisition of a complex
skill in this way were probably Edward L. Thorndike (1898) in his study of animals
learning to escape from problem boxes, and Bryan and Harter (1897, 1899) in an
influential study of telegraph operators. (Ebbinghaus, 1885/1964, had displayed data
from the memorization of lists of syllables in this way more than a decade earlier.) Learning
curves are typically jagged when plotted for a single learner but become regular
when plotted in terms of averages across learners. The shape of the learning curve
emerged early as a research issue. The results of Bryan and Harter (1897, 1899)
made researchers search for plateaus, periods during which the learner appears
to make no improvements but which are followed by periods of rapid improvement.
This supposed phenomenon invited the interpretation that the learner was
revising his skill internally, but that the advantages of the revisions could not be
realized in performance until they were complete. Intriguing as this hypothesis is,
subsequent research did not support the existence of plateaus (Keller, 1958). Most
researchers have concluded that learning/practice curves exhibit uniform negative
acceleration (“Almost any learning curve shows … a negative acceleration; it
flattens out as practice advances; the rate of improvement decreases”; Woodworth,
1938, p. 164), but some are more reluctant to let go of the plateau than others (see,
e.g., Stadler, Vetter, Haynes & Kruse, 1996). Interestingly, the learning curve is
also a topic of research in economics where it was discovered that entire factories
exhibit such curves when unit cost – a measure of performance – is plotted as
a function of the length of the production run – a measure of amount of practice.
Economists are more prepared than psychologists to accept different types
of equations as accurate descriptions of the shapes of empirical learning curves,
and they are not as convinced as psychologists that plateaus in the sense of Bryan
and Harter are a pseudo-phenomenon (although it should be noted that the word
“plateau” is often used in economics to refer to the final leveling off of a learning
curve, which is properly called its asymptote). The difference between psychologists
and economists on this point is likely due to the fact that the former typically
study learning in one-hour experiments and their learning curves hence describe
short-term effects in single individuals, while the latter study improvement processes
in economic organizations that last for months and years. Obviously, both
types of effects have to asymptote, but the appearance of intermediate plateaus is
more likely in temporally extended cases.
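What "negative acceleration" means can be illustrated with a small sketch. The power-function form and the parameter values a = 60 and b = 0.5 are assumptions chosen purely for illustration; they are not taken from any of the studies cited here, and other equations would show the same qualitative pattern. The sketch computes the gain from each practice trial to the next and checks that every gain is positive but smaller than the one before it.

```python
# Hypothetical illustration of a negatively accelerated learning curve.
# The functional form and the parameter values are assumptions, not data.

def time_per_trial(n, a=60.0, b=0.5):
    """Assumed task completion time on trial n: T(n) = a * n**(-b)."""
    return a * n ** (-b)


def improvements(trials):
    """Reduction in completion time from each trial to the next."""
    times = [time_per_trial(n) for n in trials]
    return [earlier - later for earlier, later in zip(times, times[1:])]


gains = improvements(range(1, 11))

# Negative acceleration: performance keeps improving (every gain is positive),
# but the rate of improvement decreases (each gain is smaller than the last).
is_negatively_accelerated = all(g > 0 for g in gains) and all(
    g1 > g2 for g1, g2 in zip(gains, gains[1:])
)
print(is_negatively_accelerated)
```

A curve containing an intermediate plateau in the sense of Bryan and Harter would fail this check, because a stretch of near-zero gains would be followed by larger ones.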
5. The search for a mathematical equation for the learning curve began in the
first decades of the 20th century (Woodworth, 1938, pp. 170–173). Snoddy
(1926) may have been the first person to report that learning curves fit power
law equations, but others followed (Stevens & Savin, 1962). The modern formulation
of the power law interpretation of the learning curve and a review of
some supporting evidence are available in Lane (1987), Newell and Rosenbloom
(1981) and Delaney, Reder, Staszewski and Ritter (1998). Although most data
sets fit power law equations at least marginally better than other types of equations,
there is a long-standing debate whether some other type of equation is
