Page 444 - Deep Learning

Notes to Pages 173–175                427

                for which the average time to completion on the first training trial is 10 minutes,
                or 600 seconds. In this case, if the learning rate parameter a is equal to .25, then
                it takes approximately 15 training trials to cut the time to task completion in half,
                to 300 seconds. If the learning rate parameter is instead .75, it only takes 3 train-
                ing trials to cut the time in half. There are few other task variables that will yield
                behavioral effects of this magnitude.
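
The figures in this note can be checked with a short calculation. The sketch below assumes the power-law form T(N) = T(1) * N^(-a), where T(N) is the time to completion on trial N; that equation is not written out in this note, and the function names are hypothetical, chosen only for illustration.

```python
import math


def time_on_trial(first_trial_time: float, rate: float, trial: int) -> float:
    """Predicted completion time on `trial`, assuming T(N) = T(1) * N**(-rate)."""
    return first_trial_time * trial ** (-rate)


def trial_reaching_half(rate: float) -> int:
    """Smallest trial number at which the predicted time is at most half of T(1).

    N**(-rate) <= 0.5 is equivalent to N >= 2**(1 / rate), so we round that
    bound up. The small epsilon guards against floating-point overshoot when
    the bound is a whole number.
    """
    return math.ceil(2 ** (1 / rate) - 1e-9)


if __name__ == "__main__":
    first = 600.0  # seconds on the first training trial, as in the note
    for a in (0.25, 0.75):
        n = trial_reaching_half(a)
        print(f"a = {a}: trial {n}, predicted time {time_on_trial(first, a, n):.0f} s")
```

Under this assumed form, a = .25 reaches the halfway point on trial 16, that is, 15 trials after the first, and a = .75 reaches it by trial 3 (the exact bound is 2^(4/3), about 2.52). Both results are in line with the approximate figures given in the note.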
              10. See Campitelli and Gobet (2005) and Holding (1985, Chap. 3) for cognitive
                research on chess in general and blindfolded chess in particular. See, e.g., Chase
                and Ericsson (1981) for a particularly striking case study of a person whose per-
                formance improved 10-fold with extended practice on a task that is often consid-
                ered as measuring a stable characteristic of a person’s cognition.
              11.  The learning curve (or practice curve) is constructed by plotting performance
                on a task (e.g., time to task completion) as a function of the amount of practice
                (e.g., number of practice problems completed, total time in the relevant
                task environment, etc.). The first researchers to display data on the acquisition
                of a complex skill in this way were probably Edward L. Thorndike (1898) in his
                study of animals learning how to escape from problem boxes, and Bryan and
                Harter (1897, 1899) in an influential study of telegraph operators. Ebbinghaus
                (1885/1964) had displayed data from the memorization of lists of syllables in
                this way more than a decade earlier. Learning curves are typically jagged when plotted for
                a single learner but become regular when plotted in terms of averages across
                learners. The shape of the curve emerged early as a research issue. The work
                by Bryan and Harter (1897, 1899) made researchers search for plateaus, periods
                during which the learner appears to make no improvements but which are followed
                by periods of rapid improvement. This supposed phenomenon invited
                the interpretation that the learner was revising his skill internally, and that the
                advantages of the revision could not be realized in performance until it was com-
                plete. Intriguing as this hypothesis is, subsequent research did not support the
                existence of plateaus (Keller, 1958). Most researchers have concluded that learn-
                ing/practice curves exhibit uniform negative acceleration (“Almost any learning
                curve shows … a negative acceleration; it flattens out as practice advances; the
                rate of improvement decreases”; Woodworth, 1938, p. 164), but some are more
                reluctant to let go of the plateau than others (see, e.g., Stadler, Vetter, Haynes &
                Kruse, 1996). Interestingly, the learning curve is also a topic of research in eco-
                nomics. Economists discovered that entire factories exhibit such curves when
                unit cost (a measure of economic performance) is plotted as a function of the
                length of the production run (a measure of amount of practice). Economists are
                more prepared to accept different shapes of empirical learning curves (see Note 5,
                Chapter 8), and they are not as convinced as psychologists that plateaus in the
                sense of Bryan and Harter are a pseudo-phenomenon (although it should be
                noted that the word “plateau” is often used in economic contexts to refer to the
                final leveling off of a learning curve, which is properly called the asymptote). The
                difference between psychologists and economists on this point is likely due to the
                fact that the former typically study learning in one-hour experiments and their
                learning curves hence describe short-term effects, while the latter study improve-
                ment processes in economic organizations that last for months and years and that