
READING 8: MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MODULE 8.11: MACHINE LEARNING ALGORITHMS

     LOS 8.q: Describe machine learning algorithms used in prediction, classification, clustering, and dimension reduction.


Supervised learning algorithms: Used for prediction (i.e., regression) and classification (a minimal code sketch follows this list). When the y-variable is:
• Continuous, the appropriate approach is regression (using the term in its broader, machine learning sense).
• Categorical (i.e., belonging to a category/classification) or ordinal (i.e., ordered/ranked), a classification model is used.
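
As an illustration of the two supervised tasks, the sketch below fits a regression model to a continuous target and a classification model to a categorical one. The use of scikit-learn and the synthetic data are assumptions for illustration only; this is not an example from the curriculum.

```python
# Minimal sketch: regression vs. classification in supervised learning.
# Library choice (scikit-learn) and all data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(seed=42)
X = rng.normal(size=(200, 3))            # three features (independent variables)

# Continuous y-variable -> regression
y_cont = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)
reg = LinearRegression().fit(X, y_cont)
print("regression R^2:", reg.score(X, y_cont))

# Categorical y-variable (e.g., default / no default) -> classification
y_cat = (y_cont > 0).astype(int)
clf = LogisticRegression().fit(X, y_cat)
print("classification accuracy:", clf.score(X, y_cat))
```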


Regression Models: Linear and nonlinear regression models can be used to generate forecasts. Penalized regression, a special case of the generalized linear model (GLM), seeks to minimize forecasting errors by reducing the problem of overfitting:


• Overfitting results when a large number of features (i.e., independent variables) are included in the model. The resulting model can fit the “noise” in the sample data to improve in-sample fit, which decreases the accuracy of model forecasts on other (out-of-sample) data. To reduce the problem of overfitting, researchers may impose a penalty based on the number of features used by the model.

• Penalized regression models seek to minimize the sum of squared errors plus a penalty value. This penalty value increases with the number of independent variables (features). Imposing such a penalty can exclude features that do not meaningfully contribute to out-of-sample prediction accuracy (thus making the model more parsimonious). In summary, penalized regression models seek to reduce the number of features included in the model while retaining as much predictive information in the data as possible (see the sketch after this list).
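
One widely used concrete form (not named on this slide) is LASSO, which minimizes the sum of squared errors plus a penalty λ × Σ|b̂k| on the absolute sizes of the estimated coefficients; as λ increases, coefficients on features that add little predictive value are driven exactly to zero, which is one way of implementing the feature-reducing penalty described above. The sketch below, using scikit-learn's Lasso on synthetic data, is an illustrative assumption, not the slide's own example.

```python
# Minimal LASSO sketch (illustrative): five features, only two of which
# actually drive y. The L1 penalty should shrink the coefficients on the
# three irrelevant features toward, and eventually exactly to, zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(seed=7)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

for alpha in (0.01, 0.1, 1.0):           # alpha plays the role of lambda
    model = Lasso(alpha=alpha).fit(X, y)
    print(f"lambda={alpha}: coefficients =", np.round(model.coef_, 3))
```

In the printout, raising λ (alpha in scikit-learn) zeroes out the three uninformative coefficients first, leaving a more parsimonious model that retains the two features carrying the predictive information.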