Page 276 - Deep Learning

Error Correction in Context 259

or irregular curve? The consequence of a fault in practical knowledge is
floundering, that is, backtracking, hesitation and repetitions. Faulty practical
knowledge produces unnecessary steps and to learn is, to some extent, to
excise those steps. Performance speeds up when a fault is corrected because
the amount of floundering decreases. The unnecessary cognitive work caused
by a faulty strategy or skill varies in magnitude from fault to fault, but
performance measures such as solution time refer to entire performances and
hence effectively average the savings produced by individual error correction
events. At the outset of practice the learner makes many errors in each practice
trial, so there are many learning opportunities per trial. As the learner
approaches mastery, the number of errors per trial decreases because the faults
in the underlying rules are successively eliminated. Consequently, there are
fewer learning opportunities and hence fewer error-correcting events per trial,
so the rate of improvement slows down.

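The argument above can be made concrete with a small sketch. The toy model below is an illustration under stated assumptions, not the author's model: a learner starts with a fixed number of independent faults, each remaining fault produces an error on a given trial with probability p_trigger, and each error leads to a correction with probability p_fix. The names simulate_learner and average_curve and all parameter values are hypothetical.

```python
import random


def simulate_learner(n_faults=20, p_trigger=0.6, p_fix=0.5, n_trials=15, seed=0):
    """Return the number of errors made on each trial by one simulated learner.

    All parameters are hypothetical illustration values, not estimates
    taken from the text.
    """
    rng = random.Random(seed)
    remaining = n_faults  # faults still present in the learner's rules
    errors_per_trial = []
    for _ in range(n_trials):
        errors = 0
        corrected = 0
        for _ in range(remaining):
            if rng.random() < p_trigger:  # the fault produces an error on this trial
                errors += 1
                if rng.random() < p_fix:  # the error leads to a correction
                    corrected += 1
        remaining -= corrected  # corrected faults never recur
        errors_per_trial.append(errors)
    return errors_per_trial


def average_curve(n_learners=200, n_trials=15, **params):
    """Average errors per trial across several simulated learners."""
    totals = [0] * n_trials
    for learner in range(n_learners):
        run = simulate_learner(n_trials=n_trials, seed=learner, **params)
        for trial, errors in enumerate(run):
            totals[trial] += errors
    return [total / n_learners for total in totals]


if __name__ == "__main__":
    for trial, mean_errors in enumerate(average_curve(), start=1):
        print(f"trial {trial:2d}: {mean_errors:5.2f} errors on average")
```

Under these assumptions the expected number of errors on trial t is n_faults * p_trigger * (1 - p_trigger * p_fix)^(t - 1), so each trial removes a fixed fraction of the remaining faults and the absolute gain per trial shrinks. This decline is geometric, so the sketch illustrates only why improvement slows down, not any particular mathematical form of the curve.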
This qualitative argument is consistent with the results of computer
simulations. Figure 8.2 shows a simulation result achieved with the HS
simulation model described in Chapter 7. The model learned to construct structural
formulas, so-called Lewis structures. In this particular simulation, the model
worked on nine different Lewis structures taken from a college chemistry text.
The problems were presented to the model multiple times in random order
until each one had been mastered. The values shown in the figure are averages
across multiple simulated students. (Empirical learning curves are typically
constructed by aggregating data from multiple human learners.) The model’s
learning curve exhibits the same gradually declining rate of improvement as
do the learning curves of human learners. No aspect of the constraint-based
error-correcting mechanism was specifically designed to achieve this result.
The shape of the resulting learning curve emerges out of the interactions
among the processes postulated in the model.

Since the beginning of the 20th century, researchers have debated whether
the shape of the practice curve follows a particular mathematical form and, if
so, which one. The leading hypothesis is that it conforms to the shape described
by power laws – equations of the general form

$$P(t) = A + Bt^{-a}$$
where P is a measure of performance, usually the time to perform the target
task; t is the amount of practice, usually measured in number of training trials;
and P(t) is the performance on trial t. The parameter A is the asymptote, the
best possible performance, while B is the amount by which performance on the
first trial exceeds that asymptote, since P(1) = A + B. The
