
READING 8: MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MODULE 8.11: MACHINE LEARNING ALGORITHMS

     LOS 8.q: Describe machine learning algorithms used in prediction, classification, clustering, and dimension reduction.


Supervised learning algorithms: Used for prediction (i.e., regression) and classification (a minimal code sketch follows this list). When the y-variable is:
• Continuous, the appropriate approach is regression (using the term in its broader, machine learning sense).
• Categorical (i.e., belonging to a category/classification) or ordinal (i.e., ordered/ranked), a classification model is used.
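
As an illustration of the two supervised tasks, the sketch below fits a regression model to a continuous target and a classification model to a categorical one. The use of scikit-learn and the synthetic data are assumptions for illustration only; this is not an example from the curriculum.

```python
# Minimal sketch: regression vs. classification in supervised learning.
# Library choice (scikit-learn) and all data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(seed=42)
X = rng.normal(size=(200, 3))            # three features (independent variables)

# Continuous y-variable -> regression
y_cont = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)
reg = LinearRegression().fit(X, y_cont)
print("regression R^2:", reg.score(X, y_cont))

# Categorical y-variable (e.g., default / no default) -> classification
y_cat = (y_cont > 0).astype(int)
clf = LogisticRegression().fit(X, y_cat)
print("classification accuracy:", clf.score(X, y_cat))
```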


Regression Models: Linear and nonlinear regression models can be used to generate forecasts. Penalized regression, a special case of the generalized linear model (GLM), seeks to minimize forecasting errors by reducing the problem of overfitting:


• Overfitting results when a large number of features (i.e., independent variables) are included in the model. The resulting model can fit the “noise” in the sample data to improve in-sample fit, which decreases the accuracy of model forecasts on other (out-of-sample) data. To reduce the problem of overfitting, researchers may impose a penalty based on the number of features used by the model.

• Penalized regression models seek to minimize the sum of squared errors plus a penalty value. This penalty value increases with the number of independent variables (features). Imposing such a penalty can exclude features that do not meaningfully contribute to out-of-sample prediction accuracy (thus making the model more parsimonious). In summary, penalized regression models seek to reduce the number of features included in the model while retaining as much predictive information in the data as possible (see the sketch after this list).
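
One widely used concrete form (not named on this slide) is LASSO, which minimizes the sum of squared errors plus a penalty λ × Σ|b̂k| on the absolute sizes of the estimated coefficients; as λ increases, coefficients on features that add little predictive value are driven exactly to zero, which is one way of implementing the feature-reducing penalty described above. The sketch below, using scikit-learn's Lasso on synthetic data, is an illustrative assumption, not the slide's own example.

```python
# Minimal LASSO sketch (illustrative): five features, only two of which
# actually drive y. The L1 penalty should shrink the coefficients on the
# three irrelevant features toward, and eventually exactly to, zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(seed=7)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

for alpha in (0.01, 0.1, 1.0):           # alpha plays the role of lambda
    model = Lasso(alpha=alpha).fit(X, y)
    print(f"lambda={alpha}: coefficients =", np.round(model.coef_, 3))
```

In the printout, raising λ (alpha in scikit-learn) zeroes out the three uninformative coefficients first, leaving a more parsimonious model that retains the two features carrying the predictive information.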