Page 36 - FINAL CFA II SLIDES JUNE 2019 DAY 3

READING 8: MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MODULE 8.10: SUPERVISED AND UNSUPERVISED MACHINE LEARNING

SUPERVISED AND UNSUPERVISED MACHINE LEARNING


       Multiple regression and other linear tools are often inadequate for modeling the complex relationships in Big Data, because
       the underlying relationships are often nonlinear and indirect, and linear models cannot capture them.
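
       To illustrate the point, here is a minimal sketch, assuming made-up data in which Y depends on X only through X squared.
       The helper names (ols_fit, r_squared) are hypothetical and written for this example. A one-variable least-squares line
       explains essentially none of the variation, even though Y is fully determined by X.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Illustrative data (an assumption, not from the slides): y = x^2 exactly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope, intercept = ols_fit(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(slope, intercept, r2)
```

       The fitted slope and R-squared are both approximately zero here, because the positive and negative halves of the
       parabola cancel out in a straight-line fit.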

       Big Data: Very large data sets which may include both structured (e.g., spreadsheet) data and unstructured (e.g., emails,
       text, or pictures) data.


       Data analytics uses computer-based algorithms to analyze Big Data and draw meaningful inferences, by:
       • Measuring correlations between variables;
       • Making predictions about some variable of interest;
       • Making causal inferences;
       • Classifying data into distinct categories;
       • Clustering observations into relatively homogeneous groups (i.e., groups of observations with similar traits); and
       • Reducing the dimension (i.e., the number of data attributes) by discarding redundant and insignificant attributes of objects and entities.


       Machine learning (ML) refers to computer programs that learn from their errors and refine predictive models to improve their
       predictive accuracy over time. ML is one method used to extract useful information from Big Data. ML terms:
       • Target variable or tag variable is the dependent variable (Y). Target variables can be continuous, categorical, or ordinal;
       • Features are the independent variables (X); and
       • Feature engineering is curating (collecting and organizing) a dataset of features for ML processing.
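
       The terms above can be mapped onto a small dataset. This is a sketch only: the loan records, the field names
       (debt_to_income, years_employed, defaulted), and the helper split_features_target are all hypothetical and chosen
       for illustration. The target here is categorical (default or no default).

```python
# Hypothetical loan records; field names and values are illustrative only.
records = [
    {"debt_to_income": 0.45, "years_employed": 1, "defaulted": 1},
    {"debt_to_income": 0.20, "years_employed": 8, "defaulted": 0},
    {"debt_to_income": 0.35, "years_employed": 3, "defaulted": 0},
]


def split_features_target(rows, feature_names, target_name):
    """Separate the features (X) from the target variable (Y)."""
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target_name] for row in rows]
    return X, y


# Features are the independent variables; the target is the dependent variable.
X, y = split_features_target(
    records, ["debt_to_income", "years_employed"], "defaulted"
)
print(X)
print(y)
```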

       Candidates are expected to have a high-level understanding of the terminology and techniques used here!

       LOS 8.p: Distinguish between supervised and unsupervised machine learning.

       Supervised learning uses labeled training data (observations whose target values are already known) to guide the ML program toward superior forecasting accuracy.


       In unsupervised learning, the ML program is NOT given labeled training data. In the absence of any tagged data, the program
       seeks out structure or interrelationships in the data (e.g., through clustering).
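
       The contrast can be sketched on a single made-up feature. The techniques used here are stand-ins chosen for brevity,
       not methods named in the reading: a nearest-centroid classifier for the supervised case and a two-cluster k-means
       for the unsupervised case. All function names and data values are assumptions made for this example.

```python
# Illustrative one-feature data set (values are made up).
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]  # used only when supervised


def fit_nearest_centroid(xs, ys):
    """Supervised: use the labels to compute one mean (centroid) per class."""
    groups = {}
    for x, label in zip(xs, ys):
        groups.setdefault(label, []).append(x)
    return {label: sum(members) / len(members) for label, members in groups.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def kmeans_1d(xs, n_iter=10):
    """Unsupervised: find two clusters without ever seeing a label."""
    centers = [min(xs), max(xs)]  # simple deterministic starting points
    assignments = []
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        assignments = [
            0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs
        ]
        # Move each center to the mean of its assigned points.
        for k in (0, 1):
            members = [x for x, a in zip(xs, assignments) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assignments, centers


centroids = fit_nearest_centroid(values, labels)
supervised_prediction = predict(centroids, 4.0)

cluster_ids, cluster_centers = kmeans_1d(values)
print(supervised_prediction, cluster_ids, cluster_centers)
```

       The supervised model returns a named class because it was trained on labels. The unsupervised model returns only
       anonymous cluster numbers, and it is left to the analyst to interpret what each cluster represents.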