Page 282 - Deep Learning

Error Correction in Context             265

            back toward zero eventually. The obvious hypothesis is that this is due to the
            competitive evaluation of the new strategy vis-à-vis any previous strategy. If the
            new strategy is more powerful, in the sense of executing more efficiently, apply-
            ing more broadly or producing better outcomes, then it will win out during
            conflict resolution, accrue strength and eventually dominate the prior strategies
            until some other, yet more powerful strategy is discovered. Each time a new,
            better strategy is discovered, the gradual increase in its probability of use drives
            down the probability of use for the prior strategies for that task.
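The rise and fall of successive strategies described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the model behind the text: the multiplicative strength update, the `rate` and `entry_share` parameters, and the function name `simulate_waves` are all hypothetical choices made for illustration. Each strategy's probability of use is taken to be its share of total strength, and strength grows in proportion to an assumed utility value.

```python
def simulate_waves(utilities, intro_times, steps=300, rate=0.1, entry_share=0.01):
    """Toy model of competing strategies (illustrative assumptions only).

    utilities[i]   -- assumed power of strategy i (higher is better)
    intro_times[i] -- step at which strategy i is first discovered
    Returns a list with one probability-of-use vector per step.
    """
    n = len(utilities)
    strengths = [0.0] * n
    history = []
    for t in range(steps):
        # A newly discovered strategy enters with a small share of strength.
        for i in range(n):
            if intro_times[i] == t:
                total = sum(strengths)
                strengths[i] = entry_share * total if total > 0 else 1.0
        total = sum(strengths)
        # Probability of use = share of total strength (assumed rule).
        probs = [s / total if total > 0 else 0.0 for s in strengths]
        history.append(probs)
        # Stronger strategies accrue strength faster; storing the updated
        # shares keeps the numbers bounded without changing the ratios.
        strengths = [p * (1.0 + rate * u) for p, u in zip(probs, utilities)]
    return history


if __name__ == "__main__":
    waves = simulate_waves([1.0, 2.0, 3.0], [0, 40, 120])
    for step in (0, 60, 119, 200, 299):
        print(step, [round(p, 3) for p in waves[step]])
```

Under these assumptions each strategy's probability of use rises after its discovery and falls again once a more powerful strategy appears, regardless of how the strategies were discovered.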
               Neither strategy evaluation nor competitive conflict resolution derives
            directly from the specific properties of any particular mode of learning. The
            multiple overlapping waves pattern is therefore an example of a long-term pat-
            tern in the growth of competence that owes nothing to the internal mechan-
            ics of the responsible cognitive change mechanisms. The pattern will appear if
            new, more powerful cognitive strategies can be discovered (somehow), and if
            the probability of use is determined (somehow) by competitive conflict resolu-
            tion. At the level of life-span growth, it does not matter how new strategies are
            discovered or how conflicts between competing strategies are resolved. The only
            properties that punch through to the life-span time band are the very capabilities
            of discovering better strategies and of assessing their relative cognitive utility.

            Long-term growth of complexity
            Strategy discovery cannot, in practice, continue indefinitely. In some cases, no
            strategy of higher effectiveness remains to be discovered. Strategy discovery is
            therefore only a partial answer to the question of how improvements in
            performance can continue for 10 years or longer.
               A second possible answer is that the bigger the skill, the bigger the learn-
            ing curve. Greater complexity of the target skill implies a greater value for the
            B parameter in the power law equation, which implies that learning begins
            at a higher point on the y-axis and that more practice trials are required to
            reach asymptotic performance. Simple skills asymptote in a few practice trials;
            complex skills asymptote only after a long series of trials. The magnitude of
            the change from the first trial to asymptotic performance is greater in absolute
            terms for a more complex skill, so the process takes longer.
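The "big curve" argument can be made concrete with a small calculation. The text does not give the exact form of the power law equation, so the sketch below assumes the common three-parameter form T(N) = A + B * N^(-alpha), where A is the asymptote, B the amount of improvement available, and alpha the learning rate. The function names and the `tolerance` parameter are hypothetical, introduced only for this illustration.

```python
import math


def power_law_time(n, a, b, alpha):
    """Assumed power law: time on trial n is A + B * n**(-alpha)."""
    return a + b * n ** (-alpha)


def trials_to_near_asymptote(b, alpha, tolerance):
    """Smallest trial number n at which B * n**(-alpha) <= tolerance.

    Solving B * n**(-alpha) = tolerance gives n = (B / tolerance)**(1 / alpha);
    the loops below only correct for floating-point rounding.
    """
    n = max(1, math.ceil((b / tolerance) ** (1.0 / alpha)))
    while b * n ** (-alpha) > tolerance:
        n += 1
    while n > 1 and b * (n - 1) ** (-alpha) <= tolerance:
        n -= 1
    return n


if __name__ == "__main__":
    # Same asymptote, learning rate and tolerance; only B differs.
    for label, b in (("simple skill", 2.0), ("complex skill", 20.0)):
        first = power_law_time(1, a=1.0, b=b, alpha=0.5)
        n = trials_to_near_asymptote(b, alpha=0.5, tolerance=0.5)
        print(label, "first trial:", first, "trials to near asymptote:", n)
```

With the other parameters held fixed, a larger B raises the starting point T(1) = A + B and increases the number of trials needed to come within a given distance of the asymptote, which is the sense in which a bigger skill has a bigger learning curve.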
               Some empirical findings support this “big curve” view. I once analyzed
            the learning curve for writing books, based on data from the late science
            fiction writer Isaac Asimov, who wrote a total of 500 books in his lifetime.¹¹
            The time to completion for his Nth book turned out to be a power law function
            of N. A famous study by E. R. F. W. Crossman on cigar rolling in Cuba exhib-
            ited power law improvement over millions of trials representing many years of