Page 456 - Deep Learning
Notes to Pages 260–263
theoretically better motivated. Mazur and Hastie (1978) compared exponential
with hyperbolic functions because these functions imply that learning is either
replacement or accumulation, respectively, and concluded in favor of the latter, but
Heathcote, Brown and Mewhort (2000) nevertheless champion the exponential
alternative. The debate about the exact shape of the practice curve is compli-
cated by the fact that many published curves were produced by averaging across
subjects. Averaging across behaviors that do not follow power laws might yield a
power law, in which case the law might be a statistical artifact (Estes, 1956). This
debate is ongoing (Brown & Heathcote, 2003; Haider & Frensch, 2002; Stratton
et al., 2007). It is unclear what the proponents of the artifact view make of data
from individual subjects that exhibit near-perfect fit to a power law (Newell &
Rosenbloom, 1981, Figure 1.3, Figure 1.5 and Figure 1.6; also Stevens & Savin,
1962). Confusingly, there is a structurally similar debate about power laws in
memorization studies (Myung, Kim & Pitt, 2000), but it concerns the shape of
forgetting curves for the free recall of memorized lists rather than the improve-
ment curves for skills. One would hardly expect forgetting of, for example, word
lists and improvement in, for example, geometry proof finding to depend on
the same cognitive mechanism or to generate the same behavioral phenomena,
so this second debate is not as relevant for the shape of practice curves as the
structural similarity of the arguments suggests. Economic analysts consider a
wider spectrum of possible learning curve shapes than do psychologists, per-
haps for the reasons stated in the previous note. For example, Uzumeri and
Nembhard (1998), following Mazur and Hastie (1978), proposed a hyperbolic
function, asserting that it is “known to reflect the way in which individuals learn
both conceptual and motor skills” (p. 518); this claim is a surprise to cognitive
psychologists who study skill acquisition.
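The averaging argument in this note can be illustrated with a small simulation. The sketch below is not Estes's (1956) analysis; it is a minimal illustration under stated assumptions: every simulated subject follows a pure exponential curve exp(-r*t), and the individual rates r are drawn from a gamma distribution (the shape and scale values, the number of subjects, and all function names are hypothetical choices made for this example). Under that assumption the expected group curve is (1 + scale*t)^(-shape), a hyperbolic-type function that approaches a power law for large t, even though no individual curve has that form.

```python
import math
import random


def average_of_exponentials(trials, n_subjects=20000, shape=2.0, scale=0.5, seed=0):
    """Mean curve over simulated subjects, each with its own exponential curve.

    Assumption for illustration only: rates are gamma-distributed.
    """
    rng = random.Random(seed)
    rates = [rng.gammavariate(shape, scale) for _ in range(n_subjects)]
    return [sum(math.exp(-r * t) for r in rates) / n_subjects for t in trials]


def r_squared(xs, ys):
    """R^2 of a least-squares straight line through the points (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


trials = list(range(1, 21))
mean_curve = average_of_exponentials(trials)

# Expected group curve for gamma-distributed rates (shape 2.0, scale 0.5).
closed_form = [(1 + 0.5 * t) ** -2.0 for t in trials]
max_gap = max(abs(a - b) for a, b in zip(mean_curve, closed_form))

# A power law is a straight line in log-log coordinates;
# an exponential is a straight line in semi-log coordinates.
log_y = [math.log(y) for y in mean_curve]
fit_power = r_squared([math.log(t) for t in trials], log_y)
fit_exponential = r_squared(trials, log_y)

print(f"largest gap to (1 + 0.5 t)^-2: {max_gap:.4f}")
print(f"R^2, power-law fit:   {fit_power:.3f}")
print(f"R^2, exponential fit: {fit_exponential:.3f}")
```

With these settings the averaged curve is described better by a straight line in log-log coordinates than in semi-log coordinates, although every simulated individual is exactly exponential. The sketch says nothing about real subjects; it only shows why a good power-law fit to group averages cannot by itself settle the shape of individual practice curves.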
6. Ohlsson and Jewett (1995, 1997).
7. Flavell (1963) is the best summary and overview of Piaget’s research and theoreti-
cal system. See also Furth (1969). Piaget’s own works are so numerous that it is
difficult to single out any one text as more central than others; Piaget (1950) and
Piaget (1985) are as good as any.
8. See Klahr and Wallace (1976) and Young (1976) for examples of transitional works
that took a critique of Piaget’s stage theory as the starting point for pioneering
an information-processing approach to cognitive development. A decade later,
Siegler (1986) summarized this period. Siegler (1987, 1989) performed detailed,
response-by-response analyses to demonstrate the variety of strategies used by any
one child in the domain of number knowledge, thereby undermining the notion
of cognitive stages. There have been later attempts to rescue the idea of cognitive
stages by defining them within formalisms other than the pseudo-logic used by
Piaget himself – see, e.g., Commons et al. (1998) and van der Maas and Molenaar
(1992) – and by being rigorous about the empirical criteria for the existence of
stages (Dawson-Tunik, Commons, Wilson & Fischer, 2005). However, it is a fact
that children (e.g., Luwel, Verschaffel, Onghena & De Corte, 2003; Ohlsson &
Bee, 1991) as well as young adults (Nokes, 2009; Nokes & Ohlsson, 2004, 2005)
shift flexibly among multiple strategies within task domains, so it is not clear what
meaning can be attached to the notion of cognitive stages.