Page 22 - Understanding Machine Learning

Introduction

                         digitally recorded data, it becomes obvious that there are treasures of
                         meaningful information buried in data archives that are way too large and too
                         complex for humans to make sense of. Learning to detect meaningful
                         patterns in large and complex data sets is a promising domain in which the
                         combination of programs that learn with the almost unlimited memory
                         capacity and ever increasing processing speed of computers opens up new
                         horizons.
                 Adaptivity. One limiting feature of programmed tools is their rigidity – once the
                     program has been written down and installed, it stays unchanged. However,
                     many tasks change over time or from one user to another. Machine learning
                     tools – programs whose behavior adapts to their input data – offer a solution to
                     such issues; they are, by nature, adaptive to changes in the environment they
                     interact with. Typical successful applications of machine learning to such
                     problems include programs that decode handwritten text, where a fixed program can
                     adapt to variations between the handwriting of different users; spam detection
                     programs, adapting automatically to changes in the nature of spam e-mails; and
                     speech recognition programs.


                 1.3 TYPES OF LEARNING
                 Learning is, of course, a very wide domain. Consequently, the field of machine
                 learning has branched into several subfields dealing with different types of learning
                 tasks. We give a rough taxonomy of learning paradigms, aiming to provide some
                 perspective of where the content of this book sits within the wide field of machine
                 learning.
                    We describe four parameters along which learning paradigms can be classified.

                 Supervised versus Unsupervised. Since learning involves an interaction between the
                     learner and the environment, one can divide learning tasks according to the
                     nature of that interaction. The first distinction to note is the difference between
                     supervised and unsupervised learning. As an illustrative example, consider the
                     task of learning to detect spam e-mail versus the task of anomaly detection.
                     For the spam detection task, we consider a setting in which the learner receives
                     training e-mails for which the label spam/not-spam is provided. On the basis of
                     such training the learner should figure out a rule for labeling a newly arriving
                     e-mail message. In contrast, for the task of anomaly detection, all the learner
                     gets as training is a large body of e-mail messages (with no labels) and the
                     learner’s task is to detect “unusual” messages.
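The contrast between the two settings can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not a method from the text: the function names (`train_spam_rule`, `find_unusual`), the word-count scoring rule, and the rarity score are all hypothetical choices made for the example. The point is the difference in what each learner receives: labeled pairs in the first case, bare messages in the second.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a message into lowercase words."""
    return text.lower().split()


def train_spam_rule(labeled_emails):
    """Supervised: learn a labeling rule from (text, is_spam) pairs.

    Hypothetical toy rule: a word votes "spam" if it appeared more often
    in spam training messages than in non-spam ones.
    """
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in labeled_emails:
        (spam_counts if is_spam else ham_counts).update(tokenize(text))

    def predict(text):
        score = sum(spam_counts[w] - ham_counts[w] for w in tokenize(text))
        return score > 0  # True means "spam"

    return predict


def find_unusual(emails, top_k=1):
    """Unsupervised: no labels; return the top_k messages built from rare words.

    Hypothetical toy score: average negative log document frequency of the
    words in a message, computed over the same unlabeled collection.
    """
    doc_freq = Counter()
    for text in emails:
        doc_freq.update(set(tokenize(text)))
    n = len(emails)

    def rarity(text):
        words = tokenize(text)
        if not words:
            return 0.0
        return sum(-math.log(doc_freq[w] / n) for w in words) / len(words)

    return sorted(emails, key=rarity, reverse=True)[:top_k]


# Supervised setting: every training message carries a spam/not-spam label.
labeled = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
predict = train_spam_rule(labeled)

# Unsupervised setting: the learner sees only the messages themselves.
unlabeled = [
    "meeting agenda for monday",
    "lunch on monday",
    "agenda for the team meeting",
    "zxqv kkrt plmw",
]
unusual = find_unusual(unlabeled)
```

Here `predict` can be applied to a newly arriving message, whereas `find_unusual` only ranks the messages it was given, which mirrors the absence of a separate labeled training phase in the anomaly detection task.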
                       More abstractly, viewing learning as a process of “using experience to gain
                     expertise,” supervised learning describes a scenario in which the “experience,”
                     a training example, contains significant information (say, the spam/not-spam
                     labels) that is missing in the unseen “test examples” to which the learned
                     expertise is to be applied. In this setting, the acquired expertise aims to predict
                     that missing information for the test data. In such cases, we can think of the
                     environment as a teacher that “supervises” the learner by providing the extra
                     information (labels). In unsupervised learning, however, there is no distinction
                     between training and test data. The learner processes input data with the goal