Page 278 - Deep Learning
Error Correction in Context 261
replaced by their logarithms, the curve becomes a straight line. To decide
whether a particular data set conforms to a power law, it is sufficient to plot
it with logarithmic coordinates and verify that the data points cluster along a
straight line. Panel (b) in Figure 8.1 shows the same data as in panel (a), plotted
with logarithmic coordinates on both axes. What appears as a curve in panel
(a) appears as a straight line in panel (b).
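The log-log test can also be run numerically instead of by eye. The following is a minimal sketch, not the procedure used to produce Figure 8.1. It assumes hypothetical practice data generated from $P(t) = 20t^{-0.5}$ with the asymptote taken to be zero, and the helper names `linear_fit` and `loglog_fit` are illustrative only. It fits an ordinary least-squares line to the logarithms of both coordinates: a power law yields a line whose slope is the (negative) exponent and whose $R^2$ is close to 1.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def loglog_fit(trials, values):
    """Fit a straight line after taking logarithms of both coordinates."""
    return linear_fit([math.log(t) for t in trials], [math.log(v) for v in values])


# Hypothetical practice data from P(t) = 20 * t**-0.5 (asymptote assumed to be zero).
trials = list(range(1, 51))
times = [20.0 * t ** -0.5 for t in trials]

slope, intercept, r_squared = loglog_fit(trials, times)
# The slope recovers the exponent -a, exp(intercept) recovers B, and r_squared is ~1.
print(round(slope, 3), round(math.exp(intercept), 3), round(r_squared, 6))
```

With real data the points scatter around the line, and a nonzero asymptote bends the curve in log-log coordinates, so in practice the check is approximate.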
Some researchers have claimed that empirical learning curves exhibit better
fits to exponential equations, that is, equations of the general form
$$P(t) = A + Be^{-at}$$
in which the symbols have the same meaning as before and e is the base of the
natural logarithm. Exponential curves have the mathematical property of falling
on a straight line when plotted with logarithmic coordinates on the y-axis (but
not the x-axis). Researchers have debated for the better part of a century which
type of equation best represents the true shape of improvement. The question is
hard to settle because power laws and exponential equations generate similar
curves, so empirical data that fit one type of equation well tend to also fit the
other type equally well, or nearly so.
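Why the two families straighten out on different kinds of plots can be seen by taking logarithms. This short derivation assumes that the power law introduced earlier in the chapter has the form $P(t) = A + Bt^{-a}$ (that page is not part of this excerpt), with the same symbols as the exponential equation:

```latex
\begin{aligned}
\text{Power law:}\quad   & P(t) = A + Bt^{-a}  &&\Rightarrow\quad \ln\bigl(P(t) - A\bigr) = \ln B - a\ln t \\
\text{Exponential:}\quad & P(t) = A + Be^{-at} &&\Rightarrow\quad \ln\bigl(P(t) - A\bigr) = \ln B - a\,t
\end{aligned}
```

The first right-hand side is linear in $\ln t$, so a straight line requires logarithmic coordinates on both axes; the second is linear in $t$ itself, so only the y-axis needs to be logarithmic. Both statements hold exactly only when the asymptote $A$ is zero or has been subtracted from the data.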
The curve in Figure 8.2 is an exponential curve, not a power law curve.
This is demonstrated in panel (b), which shows that the data points fall on a
straight line when the y-axis (but not the x-axis) is plotted with logarithmic
coordinates. Hence, error correction does not by itself explain why empirical
data are so often found to fit power laws. But according to the Nine Modes
Theory proposed in Chapter 6, we should not expect learning from error or
any other learning mechanism to provide the complete explanation for the
learning curve. The learning curve is a statistical construct that aggregates the
effects of a large number of learning events. For example, consider a training
study in which 25 subjects learn a skill that requires, on average, 50 learning
events for mastery. The total number of learning events behind the resulting
learning curve, if plotted in terms of averages across subjects, is then 50 × 25 =
1,250 events. According to the Nine Modes Theory, a person learns in multiple
ways during practice, so those 1,250 learning events will be of diverse kinds.
Some, but only some, will be error-correction events. Any observed behavioral
regularity is a cumulative result of the interactions among all the different
learning mechanisms.
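A toy calculation can make the aggregation point concrete. The sketch below is illustrative only. It is not the simulation behind Figure 8.2, whose details are not given here, and it does not model the Nine Modes Theory. It assumes a hypothetical learner whose only mechanism is error correction and who fixes a fixed fraction of the remaining errors on every trial, and 25 such learners with hypothetical rates from 0.02 to 0.50. Each individual curve is exactly exponential, meaning its trial-to-trial steps in log(errors) are constant. The trial-by-trial group average, however, is not exponential: its log steps grow shallower as the slow learners come to dominate the mean.

```python
import math


def learner_errors(initial_errors, fraction_fixed, n_trials):
    """Hypothetical error-correction learner: each trial fixes a fixed fraction of remaining errors."""
    errors = [float(initial_errors)]
    for _ in range(n_trials - 1):
        errors.append(errors[-1] * (1.0 - fraction_fixed))
    return errors


def log_steps(curve):
    """Trial-to-trial differences of log(errors); constant steps mean a straight line on a semilog plot."""
    logs = [math.log(v) for v in curve]
    return [b - a for a, b in zip(logs, logs[1:])]


EVENTS_PER_SUBJECT = 50
N_SUBJECTS = 25
total_events = EVENTS_PER_SUBJECT * N_SUBJECTS  # 1250, as in the text

N_TRIALS = 40
# Hypothetical spread of individual learning rates: 0.02, 0.04, ..., 0.50.
fractions = [0.02 * (i + 1) for i in range(N_SUBJECTS)]
curves = [learner_errors(100, f, N_TRIALS) for f in fractions]

# One individual curve: every log step equals log(1 - 0.02), so it is exactly exponential.
single = log_steps(curves[0])

# Average across subjects, trial by trial, as when plotting a group learning curve.
average = [sum(c[t] for c in curves) / N_SUBJECTS for t in range(N_TRIALS)]
aggregate = log_steps(average)

print(total_events)
print(round(min(single), 4), round(max(single), 4))   # equal: constant log steps
print(round(aggregate[0], 4), round(aggregate[-1], 4))  # differ: the average bends on a semilog plot
```

Whether such averaging pushes the group curve toward a power law depends on the spread of individual rates, and this sketch does not establish that. It only shows that a group learning curve need not share the shape of the individual curves it aggregates.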
Predicting the expected shape of change would require a simulation
model that learns in all the nine ways described in Chapter 6. No such model
exists. However, an approximation can be constructed by focusing on the