Page 448 - Deep Learning

Notes to Pages 197–207

              62.  There are several different computational models of generalization of practical
                knowledge; see, e.g., Anderson (1983, 1987), Lewis (1988), Mitchell (1982), Sun,
                Merrill and Peterson (2001) and Sun, Slusarz and Terry (2005).
              63.  Examples of such learning mechanisms are found in, among other works, Elio
                and Scharf (1990), Jones and VanLehn (1994), Larkin (1981), Neches (1987),
                Ohlsson (1987b), Ruiz and Newell (1993), Shrager and Siegler (1998) and Siegler
                and Araya (2005).
              64.  Altmann and Burns (2005), Gray and Boehm-Davis (2000), Gray, Schoelles and
                Sims (2005) and Schooler and Hertwig (2005). This form of learning was originally
                studied by the behaviorists under the label statistical learning theory (Estes,
                1950), and the main phenomenon was called probability matching (Grant, Hake &
                Hornseth, 1951).
              65.  Logan (1998).
              66.  There is a persistent itch among cognitive theorists to reach for a description of
                cognition at a level of abstraction above that of process models. Indeed, there
                is a need for a type of description that constrains and specifies the system that
                a process model is to instantiate and hence explain. Such formulations include
                Noam Chomsky’s distinction between linguistic competence and performance
                (Chomsky, 1964; Pylyshyn, 1973); Zenon W. Pylyshyn’s distinction between the
                functional architecture and cognitive processes (Pylyshyn, 1980, 1986); David Marr’s
                distinction between the computational and the algorithmic levels in vision (Marr,
                1982); Allen Newell’s distinction between the knowledge level and the symbol level
                in the description of an information processing system (Newell, 1982); and John
                R. Anderson’s rationality principle, which says that a first approximation model
                of human cognition can assume that the latter is maximally efficient (Anderson,
                1989, 1990). The Principle of Maximally Efficient Learning is a rationality
                principle of this sort, and it was inspired by these prior formulations but differs
                from them in its exclusive focus on skill acquisition.
              67.  Anderson (1989, 1990) and Newell (1990, p. 33).
              68.  Ohlsson and Jewett (1997).


                Chapter 7.  Error Correction: The Specialization Theory
              1.   Norman (1981, p. 3).
              2.   Bruner (1970, p. 67).
              3.   Cavalli-Sforza (2000), Olson (2002) and Stringer and McKie (1997).
              4.   These two proverbs regarding learning from error have long appeared in
                European sources. The Web page
                http://www.answers.com/topic/a-burnt-child-dreads-the-fire gives sources as far
                back as A.D. 1250 for the one about dreading the fire, and the Web page
                http://www.answers.com/topic/once-bitten-twice-shy gives multiple 19th-century
                sources for the second one. The two proverbs appear to have been fused in the
                American variant “once burned, twice shy.”
              5.   Thorndike (1927).
              6.   Thorndike (1898, p. 45).
              7.   James (1890, vol. 1, pp. 24–27).
              8.   James (1890, vol. 1, p. 25).