Page 283 - Deep Learning
practice. If a skill is complex, the absolute values on the axes are correspond-
ingly greater, but the shape of the learning curve is invariant.
The “big curve” way of scaling from short-term to long-term practice
effects overlooks the fact that the large knowledge base of an expert is a set
of partially overlapping but distinct skills rather than a single, integrated skill.
Consider cooking: Chopping vegetables is a different skill from filleting a fish,
and sautéing sea scallops is not exactly the same as scrambling eggs. These
skills are complex and there is some overlap in their components: Both chop-
ping and filleting require careful handling of the knife; both sautéing and
scrambling depend on keeping the pan at the right temperature. Nevertheless,
it is possible to master each of those four skills without mastering any of the
other three, so they are distinct. A professional chef has obviously mastered
all four. The total competence of the chef consists of hundreds of connected,
partially overlapping but nevertheless distinct subskills. The same description
applies to the skill sets of coffee shop managers, engineers, fighter pilots and
physicians. In general, competence has a clumpy, granular structure. Rules
form loosely integrated clusters that encode distinct subskills. A task is likely
to exercise some subskills more than others.
The clumpy character of practical knowledge invites the view that the
learning curve applies to each subskill separately instead of to the learner’s
competence as a whole. There is no direct proof of this. But researchers have
observed evidence for the closely related fact that when an expert discovers
multiple strategies for one and the same subtask, the standard learning curve
applies to each successive strategy. So it is plausible that the acquisition of
multiple distinct but related subskills moves simultaneously down multiple
learning curves, each shaped by how much practice the person experiences
on the particular subtask, as opposed to the overall task. That is, the level of
mastery of subskill X is a function of how many times the novice has encoun-
tered a task that requires that particular subskill, not how much experience he
has with the overall task. Skill at chopping only improves when the chef-to-be
makes a dish that requires chopping.
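
The idea of separate curves can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not a model taken from the text: the power-law form of the curve, the parameter values, and the dish-to-subskill mapping are all hypothetical. The sketch counts practice per subskill rather than per overall task, so each subskill sits at its own point on its own curve.

```python
from collections import Counter

def time_per_trial(n_practice, first_trial=60.0, rate=0.4):
    """Assumed power-law learning curve: time falls as practice accumulates.

    The functional form and both parameters are illustrative assumptions.
    """
    return first_trial * (n_practice + 1) ** -rate

# Hypothetical mapping from dishes to the subskills each one exercises.
DISHES = {
    "stir_fry": ["chopping", "sauteing"],
    "fish_fillet": ["filleting", "sauteing"],
    "scrambled_eggs": ["scrambling"],
}

def practice(menu):
    """Count how often each subskill is exercised by a sequence of dishes."""
    counts = Counter()
    for dish in menu:
        for subskill in DISHES[dish]:
            counts[subskill] += 1
    return counts

# A chef-to-be who cooks many stir-fries, a few eggs, and no fish.
menu = ["stir_fry"] * 50 + ["scrambled_eggs"] * 10
counts = practice(menu)

for subskill in ["chopping", "filleting", "sauteing", "scrambling"]:
    n = counts[subskill]
    print(f"{subskill}: {n} practice trials, {time_per_trial(n):.1f} s per trial")
```

In this sketch the learner has cooked 60 dishes overall, yet filleting remains at its starting level because no dish exercised it, while chopping and sautéing have each moved well down their own curves.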
The learning curves for the subskills should not be thought of as synchronized.
If the target competence takes 10 years to acquire, not all relevant subskills
are introduced on the first day of training. The novice faces challenges of
gradually increasing complexity. This happens in part because novices migrate from
peripheral to central roles in learning-on-the-job contexts and hence assume
more and more responsibility, and in part because instructors and trainers delib-
erately design training sequences so that they lead the novice up the complexity
gradient. The musical prodigy first learns to play simple tunes, then moves on