Page 278 - Deep Learning

Error Correction in Context             261

            replaced by their logarithms, the curve becomes a straight line. To decide
            whether a particular data set conforms to a power law it is sufficient to plot
            it with logarithmic coordinates and verify that the data points cluster along a
            straight line. Panel (b) in Figure 8.1 shows the same data as in panel (a), plotted
            with logarithmic coordinates on both axes. What appears as a curve in panel
            (a) appears as a straight line in panel (b).
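
The log-log check described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce Figure 8.1: the power-law form P(t) = B * t**(-a), the parameter values, and the helper names `power_law` and `loglog_slopes` are all assumptions made for the example. If the data follow a power law, the slope between consecutive points is the same everywhere once both axes are replaced by their logarithms.

```python
import math

def power_law(t, B=10.0, a=0.5):
    """Hypothetical power-law curve P(t) = B * t**(-a); B and a are made-up values."""
    return B * t ** (-a)

def loglog_slopes(ts, ps):
    """Slopes between consecutive points after taking logarithms of both axes."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(p) for p in ps]
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

trials = [1, 2, 4, 8, 16, 32]
performance = [power_law(t) for t in trials]

# For power-law data every slope equals -a, i.e. the points lie on one straight line.
slopes = loglog_slopes(trials, performance)
print(slopes)
```

With real, noisy data the slopes would only be approximately equal, so a straight-line fit to the log-transformed points is the more usual test.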
               Some researchers have claimed that empirical learning curves exhibit bet-
            ter fits to exponential equations, that is, equations of the general form

                                     P(t) = A + Be^{-at}
            in which the symbols have the same meaning as before and e is the base of the
            natural logarithm. Exponential curves have the mathematical property of falling on a
            straight line when plotted with logarithmic coordinates on the y-axis (but not
            the x-axis). Power laws and exponential equations generate similar curves, so
            empirical data that fit one type of equation well tend to fit the other type
            equally well, or nearly so. Which type of equation best represents the true
            shape of improvement has been debated for the better part of a century.
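
The semi-log property of the exponential form can be checked the same way. The sketch below is illustrative only: the parameter values and the helper names `exponential` and `semilog_slopes` are assumptions, not taken from the text. It also makes one caveat explicit: the points fall exactly on a straight line only when the asymptote A is zero or has been subtracted from the data first. With a nonzero A left in, the semi-log plot flattens as the curve approaches A.

```python
import math

def exponential(t, A=2.0, B=10.0, a=0.3):
    """Exponential curve P(t) = A + B * exp(-a * t); parameter values are made up."""
    return A + B * math.exp(-a * t)

def semilog_slopes(ts, ps, asymptote=0.0):
    """Slopes between consecutive points with a logarithmic y-axis and a linear x-axis."""
    ys = [math.log(p - asymptote) for p in ps]
    return [(ys[i + 1] - ys[i]) / (ts[i + 1] - ts[i]) for i in range(len(ts) - 1)]

trials = [1, 2, 3, 5, 8, 13]
performance = [exponential(t) for t in trials]

# After subtracting A, every slope equals -a: the points lie on one straight line.
slopes_minus_asymptote = semilog_slopes(trials, performance, asymptote=2.0)

# Without subtracting A, the slopes drift toward zero as P(t) approaches A.
slopes_raw = semilog_slopes(trials, performance)

print(slopes_minus_asymptote)
print(slopes_raw)
```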
               The curve in Figure 8.2 is an exponential curve, not a power law curve.
            This is demonstrated in panel (b), which shows that the data points fall on a
            straight line when the y-axis (but not the x-axis) is plotted with logarithmic
            coordinates. Hence, error correction does not by itself explain why empirical
            data so often are found to fit power laws. But according to the Nine Modes
            Theory proposed in Chapter 6, we should not expect learning from error or
            any other learning mechanism to provide the complete explanation for the
            learning curve. The learning curve is a statistical construct that aggregates the
            effects of a large number of learning events. For example, consider a training
            study in which 25 subjects learn a skill that requires, on average, 50 learning
            events for mastery. The total number of learning events behind the resulting
            learning curve, if plotted in terms of averages across subjects, is then 50 * 25 =
            1,250 events. According to the Nine Modes Theory, a person learns in multiple
            ways during practice, so those 1,250 learning events will be of diverse kinds.
            Some, but only some, will be error-correction events. Any observed behav-
            ioral regularity is a cumulative result of the interactions among all the different
            learning mechanisms.
               To predict the expected shape of change would require a simulation
            model that learns in all nine ways described in Chapter 6. No such model
            exists. However, an approximation can be constructed by focusing on the