Page 283 - Deep Learning
P. 283

266                         Adaptation

                   12
            practice.  If a skill is complex, the absolute values on the axes are correspond-
            ingly greater, but the shape of the learning curve is invariant.
               The  “big  curve”  way  of  scaling  from  short-term  to  long-term  practice
            effects overlooks the fact that the large knowledge base of an expert is a set
            of partially overlapping but distinct skills rather than a single, integrated skill.
            Consider cooking: Chopping vegetables is a different skill from filleting a fish,
            and sautéing sea scallops is not exactly the same as scrambling eggs. These
            skills are complex and there is some overlap in their components: Both chop-
            ping  and  filleting  require  careful  handling  of  the  knife;  both  sautéing  and
            scrambling depend on keeping the pan at the right temperature. Nevertheless,
            it is possible to master each of those four skills without mastering any of the
            other three, so they are distinct. A professional chef has obviously mastered
            all four. The total competence of the chef consists of hundreds of connected,
            partially overlapping but nevertheless distinct subskills. The same description
            applies to the skill sets of coffee shop managers, engineers, fighter pilots and
            physicians. In general, competence has a clumpy, granular structure. Rules
            form loosely integrated clusters that encode distinct subskills. A task is likely
            to exercise some subskills more than others.
               The  clumpy  character  of  practical  knowledge  invites  the  view  that  the
            learning curve applies to each subskill separately instead of to the learner’s
            competence as a whole. There is no direct proof of this. But researchers have
            observed evidence for the closely related fact that when an expert discovers
            multiple strategies for one and the same subtask, the standard learning curve
            applies to each successive strategy. So it is plausible that the acquisition of
            multiple distinct but related subskills moves simultaneously down multiple
            learning curves, each shaped by how much practice the person experiences
            on the particular subtask, as opposed to the overall task. That is, the level of
            mastery of subskill X is a function of how many times the novice has encoun-
            tered a task that requires that particular subskill, not how much experience he
            has with the overall task. Skill at chopping only improves when the chef-to-be
            makes a dish that requires chopping.
               The learning curves for the subskills should not be thought of as synchro-
            nized. If the target competence takes 10 years to acquire, all relevant subskills are
            not introduced on the first day of training. The novice faces challenges of grad-
            ually increasing complexity. This happens in part because novices migrate from
            peripheral to central roles in learning-on-the-job contexts and hence assume
            more and more responsibility, and in part because instructors and trainers delib-
            erately design training sequences so that they lead the novice up the complexity
            gradient. The musical prodigy first learns to play simple tunes, then moves on
   278   279   280   281   282   283   284   285   286   287   288