Page 455 - Deep Learning

438                    Notes to Pages 257–260

              4.  The learning curve (or the practice curve) is constructed by plotting performance
               on a task (e.g., time to task completion) as a function of the amount of practice
               (e.g., number of practice problems completed, total time in the relevant task
               environment). The first researchers to display data on the acquisition of a complex
               skill in this way were probably Edward L. Thorndike (1898) in his study of animals
               learning to escape from problem boxes, and Bryan and Harter (1897, 1899) in an
               influential study of telegraph operators. (Ebbinghaus, 1885/1964, had displayed data
               from the memorization of lists of syllables in this way more than a decade earlier.) Learning
               curves are typically jagged when plotted for a single learner but become regular
               when plotted in terms of averages across learners. The shape of the learning curve
               emerged early as a research issue. The results of Bryan and Harter (1897, 1899)
               prompted researchers to search for plateaus: periods during which the learner appears
               to make no improvement but which are followed by periods of rapid
               improvement. This supposed phenomenon invited the interpretation that the learner was
               revising his skill internally, but that the advantages of the revisions could not be
               realized in performance until they were complete. Intriguing as this hypothesis is,
               subsequent research did not support the existence of plateaus (Keller, 1958). Most
               researchers have concluded that learning/practice curves exhibit uniform negative
               acceleration (“Almost any learning curve shows … a negative acceleration; it
               flattens out as practice advances; the rate of improvement decreases”; Woodworth,
               1938, p. 164), but some are more reluctant to let go of the plateau than others (see,
               e.g., Stadler, Vetter, Haynes & Kruse, 1996). Interestingly, the learning curve is
               also a topic of research in economics, where it was discovered that entire factories
               exhibit such curves when unit cost – a measure of performance – is plotted as
               a function of the length of the production run – a measure of amount of practice.
               Economists are more prepared than psychologists to accept different types
               of equations as accurate descriptions of the shapes of empirical learning curves,
               and they are not as convinced as psychologists that plateaus in the sense of Bryan
               and Harter are a pseudo-phenomenon (although it should be noted that the word
               “plateau” is often used in economics to refer to the final leveling off of a learning
               curve, which is properly called its asymptote). The difference between
               psychologists and economists on this point probably arises because the former
               typically study learning in one-hour experiments, so their learning curves
               describe short-term effects in single individuals, while the latter study
               improvement processes in economic organizations that last for months and years.
               Both types of effects must eventually asymptote, but intermediate plateaus are
               more likely to appear in temporally extended cases.
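The two regularities described in note 4, jagged curves for single learners that become regular when averaged, and uniform negative acceleration, can be illustrated with a small simulation. This is a hypothetical sketch, not data from any study cited here: it assumes that each simulated learner's time to completion on practice trial n is 60 · n^(−0.5) multiplied by random noise, and the function names and parameter values (a, b, noise, 1000 learners, 10 trials) are invented for the example.

```python
import random


def individual_curve(n_trials, a, b, noise, rng):
    """One hypothetical learner: time to completion on practice trial n is
    a * n**(-b), multiplied by uniform noise in [1 - noise, 1 + noise].
    The generating function is an illustrative assumption, not a fitted model."""
    return [a * n ** (-b) * rng.uniform(1 - noise, 1 + noise)
            for n in range(1, n_trials + 1)]


def average_curve(curves):
    """Mean time to completion at each trial, averaged across learners."""
    return [sum(trial_times) / len(trial_times) for trial_times in zip(*curves)]


def improvements(curve):
    """Reduction in completion time from each trial to the next
    (positive values mean the learner got faster)."""
    return [earlier - later for earlier, later in zip(curve, curve[1:])]


def reversals(curve):
    """Number of trials on which performance got worse (time went up)."""
    return sum(1 for gain in improvements(curve) if gain < 0)


rng = random.Random(42)
learners = [individual_curve(10, a=60.0, b=0.5, noise=0.3, rng=rng)
            for _ in range(1000)]
mean_curve = average_curve(learners)
# Noise-free version of the same hypothetical curve, for comparison.
expected = [60.0 * n ** -0.5 for n in range(1, 11)]

# Single learners are jagged: most show at least one trial where they got slower.
jagged = sum(1 for curve in learners if reversals(curve) > 0)
print(f"learners with at least one reversal: {jagged} of {len(learners)}")
print("averaged curve:", [round(t, 1) for t in mean_curve])
print("gains per trial:", [round(g, 2) for g in improvements(mean_curve)])
```

Under these assumptions, nearly every simulated learner has at least one trial on which performance got worse, while the averaged curve declines on every trial and the gain from one trial to the next is far larger early in practice than late. The power function is used only to generate plausible-looking data; it is an assumption of the sketch, not a claim made in this note.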
              5.  The search for a mathematical equation for the learning curve began in the
               first decades of the 20th century (Woodworth, 1938, pp. 170–173). Snoddy
               (1926) may have been the first person to report that learning curves fit power
               law equations, and others followed (Stevens & Savin, 1962). The modern
               formulation of the power law interpretation of the learning curve and a review of
               some supporting evidence are available in Lane (1987), Newell and Rosenbloom
               (1981) and Delaney, Reder, Staszewski and Ritter (1998). Although most data
               sets fit power law equations at least marginally better than other types of
               equations, there is a long-standing debate over whether some other type of equation is