Page 280 - Deep Learning

Error Correction in Context             263

            parameters. If those parameters vary from learner to learner, it follows that
            empirical data will sometimes fit one type of equation better than the other,
            and sometimes the reverse.
               In the end, then, the answer to whether learning curves follow power law
            equations or exponential equations is “neither.” Either type of equation is an
            equally arbitrary description of the observed behavior. The underlying learn-
            ing mechanisms are not intrinsically connected to either type of equation, or
            indeed to any type of equation, and the learning curves they produce if they
            were to operate in isolation might not be the same as the curve generated by
            the interactions among the entire set of mechanisms. Furthermore, the learn-
            ing curve is not a behavior but a statistical construct, created by aggregating
            data from multiple performances. The exact shape of any one learning curve
            emerges in the aggregation process, and the outcome depends on the mixture
            of learning modes, and on the rate associated with each mode, in the particular
            learning process studied. What remains constant is the negatively accelerated
            shape of the short-term learning curve.
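The aggregation argument can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that each hypothetical learner improves according to a single exponential equation with its own rate. It pools the curves by averaging trial by trial, then asks how well each curve is described by an exponential (log T is a straight line in n) and by a power law (log T is a straight line in log n), using the R² of a least-squares line. The rates, trial counts and function names are hypothetical and are not taken from the text.

```python
import math


def learning_curve(rate, trials, start=1.0):
    """Hypothetical single-mechanism curve: time on trial n is start * exp(-rate * n)."""
    return [start * math.exp(-rate * n) for n in range(1, trials + 1)]


def aggregate(curves):
    """Average several curves trial by trial, as when data are pooled across learners."""
    return [sum(values) / len(values) for values in zip(*curves)]


def r_squared(xs, ys):
    """R^2 of an ordinary least-squares line y = a + b * x (the squared correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def exponential_fit_r2(curve):
    # T = a * exp(-b * n)  <=>  log T is a straight line in n
    ns = range(1, len(curve) + 1)
    return r_squared([float(n) for n in ns], [math.log(t) for t in curve])


def power_fit_r2(curve):
    # T = a * n ** (-b)  <=>  log T is a straight line in log n
    ns = range(1, len(curve) + 1)
    return r_squared([math.log(n) for n in ns], [math.log(t) for t in curve])


def is_negatively_accelerated(curve):
    """True if every trial improves on the last, but by less than the previous gain did."""
    gains = [a - b for a, b in zip(curve, curve[1:])]
    return all(g > 0 for g in gains) and all(g1 > g2 for g1, g2 in zip(gains, gains[1:]))


# Hypothetical data: one learner with rate 0.2, and a pool of three learners.
single = learning_curve(0.2, 30)
pooled = aggregate([learning_curve(rate, 30) for rate in (0.05, 0.2, 1.0)])
for label, curve in (("single learner", single), ("pooled", pooled)):
    print(label,
          "exponential R^2 =", round(exponential_fit_r2(curve), 4),
          "power R^2 =", round(power_fit_r2(curve), 4),
          "negatively accelerated:", is_negatively_accelerated(curve))
```

With these made-up rates, the single learner's curve is fit exactly by the exponential, while the pooled curve is not, even though every curve that went into it was exponential. Both curves remain negatively accelerated. Changing the mixture of rates changes the two R² values and therefore how the fitted equations compare.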

            Multiple overlapping waves
            Learning curves like the one in Figure 8.1 are typically obtained in short-term
            laboratory experiments or training studies. It is not immediately obvious
            how such short-term practice effects are related to practice in the long term.
            If a learning curve can reach asymptote within an hour of practice, how can
            improvements continue for 10 years or longer?
               One answer is that strategies are replaced by qualitatively different and
            more effective strategies. The phenomenon of strategy discovery for already
            mastered tasks is particularly salient in the study of cognitive development.
            Since Jean Piaget’s monumental contribution, developmental psychologists
            have struggled to describe the progressive growth of competence through the
            first 15 years of life.⁷ No theorist now subscribes to the sequence of
            developmental stages that Piaget proposed. Careful empirical studies have shown
            that children’s cognitive competence does not grow in such a lock-step manner.⁸
            Competence is domain-specific and the rate of growth varies from domain
            to domain and from child to child, so a child might have reached some level
            of competence in domain X without being at the same level of competence
            in some other domain Y. What Piaget called décalages and cast as exceptions
            turned out to be the normal case.
               A developmental progression is more likely to consist of a succession
            of ever more powerful strategies for any one task, with mastery of qualita-
            tively different tasks progressing more or less independently of each other.
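The replacement account of long-term improvement can be sketched as a toy model. It assumes that solution time on each strategy follows its own negatively accelerated curve toward its own asymptote, and, as a simplification, that each newly discovered strategy immediately replaces the previous one; it does not let several strategies coexist, as the heading "multiple overlapping waves" suggests they do. The strategy names echo simple-addition strategies, but the `Strategy` class, `solution_time` function, discovery trials, solution times and rates are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    discovered_at: int    # trial on which the strategy is discovered (hypothetical)
    initial_time: float   # solution time on its first use (hypothetical units)
    asymptote: float      # floor that practice on this strategy alone can reach
    rate: float           # speed of the within-strategy practice effect

    def time_after(self, uses: int) -> float:
        # Negatively accelerated approach to this strategy's own asymptote.
        return self.asymptote + (self.initial_time - self.asymptote) * math.exp(-self.rate * uses)


def solution_time(strategies, trial):
    """Solution time on a trial, assuming the newest discovered strategy replaces older ones."""
    current = max((s for s in strategies if s.discovered_at <= trial),
                  key=lambda s: s.discovered_at)
    return current.time_after(trial - current.discovered_at)


# Hypothetical parameters for three successively discovered strategies.
STRATEGIES = [
    Strategy("count-all", 0, 10.0, 6.0, 0.1),
    Strategy("count-on", 100, 5.5, 3.0, 0.1),
    Strategy("retrieval", 1000, 2.5, 1.0, 0.1),
]

curve = [solution_time(STRATEGIES, t) for t in range(5000)]
print([round(curve[t], 2) for t in (99, 100, 999, 1000, 4999)])  # [6.0, 5.5, 3.0, 2.5, 1.0]
```

Each strategy's own curve flattens within a few dozen trials, yet overall solution time keeps dropping at each discovery. This is one way improvement can continue long after any single learning curve has reached asymptote.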