Page 282 - Deep Learning
back toward zero eventually. The obvious hypothesis is that this is due to the
competitive evaluation of the new strategy vis-à-vis any previous strategy. If the
new strategy is more powerful, in the sense of executing more efficiently, apply-
ing more broadly or producing better outcomes, then it will win out during
conflict resolution, accrue strength and eventually dominate the prior strategies
until some other, yet more powerful strategy is discovered. Each time a new,
better strategy is discovered, the gradual increase in its probability of use drives
down the probability of use for the prior strategies for that task.
Neither strategy evaluation nor competitive conflict resolution derives
directly from the specific properties of any particular mode of learning. The
multiple overlapping waves pattern is therefore an example of a long-term pat-
tern in the growth of competence that owes nothing to the internal mechan-
ics of the responsible cognitive change mechanisms. The pattern will appear if
new, more powerful cognitive strategies can be discovered (somehow), and if
the probability of use is determined (somehow) by competitive conflict resolu-
tion. At the level of life-span growth, it does not matter how new strategies are
discovered or how conflicts between competing strategies are resolved. The only
properties that punch through to the life-span time band are the very capabilities
of discovering better strategies and of assessing their relative cognitive utility.
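The argument above can be illustrated with a minimal simulation sketch. Everything specific in it is an assumption made for illustration and is not taken from the text: the function name simulate_waves, the multiplicative update rule, the rate and entry_share parameters, and the example utilities and discovery trials. Discovery is simply stipulated by a schedule, and conflict resolution is reduced to a proportional reweighting of the probabilities of use, precisely because the claim is that the details of both mechanisms do not matter for the pattern.

```python
def simulate_waves(utilities, discovery_trials, n_trials, rate=0.1, entry_share=0.01):
    """Return history[t][i], the probability that strategy i is used on trial t.

    All names and the update rule are hypothetical; this is one of many
    ways to realize "competitive conflict resolution".
    """
    n = len(utilities)
    probs = [0.0] * n
    history = []
    for trial in range(n_trials):
        # Discovery (mechanism left unspecified): a strategy scheduled for
        # this trial enters the competition with a small share of use.
        for i in range(n):
            if discovery_trials[i] == trial:
                if sum(probs) == 0.0:
                    probs[i] = 1.0  # the first strategy has no competitor
                else:
                    probs = [p * (1.0 - entry_share) for p in probs]
                    probs[i] = entry_share
        history.append(list(probs))
        # Competitive evaluation: each strategy's share grows in proportion
        # to its utility, then the shares are renormalized to sum to one.
        weights = [p * (1.0 + rate * u) for p, u in zip(probs, utilities)]
        total = sum(weights)
        if total > 0.0:
            probs = [w / total for w in weights]
    return history


if __name__ == "__main__":
    # Three strategies of increasing utility, discovered at trials 0, 40 and 120.
    history = simulate_waves([1.0, 2.0, 4.0], [0, 40, 120], 300)
    for i in range(3):
        curve = [row[i] for row in history]
        print("strategy", i, "peaks at trial", curve.index(max(curve)))
```

Under these assumptions, each strategy's probability of use rises after its discovery and then declines once a more powerful strategy enters, which is the overlapping waves shape. Replacing the update rule with a different competitive scheme should change the timing of the waves but not their existence.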
Long-term growth of complexity
Strategy discovery cannot, in practice, continue indefinitely. In some cases, no
strategy of higher effectiveness remains to be discovered. Strategy discovery is
therefore only a partial answer to the question of how improvements in
performance can continue for 10 years or longer.
A second possible answer is that the bigger the skill, the bigger the learn-
ing curve. Greater complexity of the target skill implies a greater value for the
B parameter in the power law equation, which implies that learning begins
at a higher point on the y-axis and that more practice trials are required to
reach asymptotic performance. Simple skills asymptote in a few practice trials;
complex skills asymptote only after a long series of trials. The magnitude of
the change from the first trial to asymptotic performance is greater in absolute
terms for a more complex skill, so the process takes longer.
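The "big curve" reasoning can be checked with a short numerical sketch. It assumes the common three-parameter form of the power law, T(N) = A + B * N^(-alpha), where T is the time to complete trial N, A is the asymptote and alpha is the learning rate. Only the B parameter is named in the passage above; the exact form of the equation, the parameter names a, b and alpha, the function names, and the example values are assumptions made for illustration. "Reaching asymptote" is interpreted here as coming within a fixed absolute tolerance of A.

```python
import math


def power_law_time(n, a, b, alpha):
    """Time to complete trial n under the assumed form T(N) = A + B * N**(-alpha)."""
    return a + b * n ** (-alpha)


def trials_to_reach(b, alpha, tolerance):
    """Smallest trial number at which T(N) is within `tolerance` of the asymptote.

    Solves B * N**(-alpha) <= tolerance for N, which gives
    N >= (B / tolerance) ** (1 / alpha).
    """
    return max(1, math.ceil((b / tolerance) ** (1.0 / alpha)))


if __name__ == "__main__":
    # Hypothetical values: same asymptote and learning rate, different B.
    a, alpha, tolerance = 2.0, 0.5, 1.0
    for label, b in (("simple skill", 10.0), ("complex skill", 100.0)):
        print(label,
              "starts at", power_law_time(1, a, b, alpha),
              "and needs", trials_to_reach(b, alpha, tolerance), "trials")
```

With the learning rate held constant, a tenfold increase in B raises the starting point of the curve and multiplies the number of trials needed by 10^(1/alpha), so the larger skill takes disproportionately longer to master under this assumed form.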
Some empirical findings support this “big curve” view. I once analyzed
the learning curve for writing books, based on data from the late science
fiction writer Isaac Asimov, who wrote a total of 500 books in his lifetime.¹¹
The time to completion for his Nth book turned out to be a power law function
of N. A famous study by E. R. F. W. Crossman on cigar rolling in Cuba exhib-
ited power law improvement over millions of trials representing many years of