Page 36 - FINAL CFA II SLIDES JUNE 2019 DAY 3
READING 8: MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
SUPERVISED AND UNSUPERVISED MACHINE LEARNING
MODULE 8.10: SUPERVISED AND UNSUPERVISED MACHINE LEARNING
Multiple regression and other traditional tools are often inadequate for modeling the complex relationships in Big Data, because
the underlying relationships are frequently nonlinear and indirect, which linear models cannot capture.
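To see why a linear model can miss a real relationship, here is a minimal sketch using hypothetical toy data where Y depends perfectly on X, but quadratically (Y = X², with X symmetric around zero). The helper `ols_fit` is an illustrative name, not part of the curriculum. The fitted straight line has a slope of zero and explains none of the variation.

```python
# Minimal sketch: simple linear regression fitted to a purely nonlinear relationship.
# Assumption: toy data y = x**2 on x values symmetric around zero (illustrative only).

def ols_fit(xs, ys):
    """Return (intercept, slope, r_squared) for a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is fully determined by x, but not linearly
intercept, slope, r2 = ols_fit(xs, ys)
print(f"slope={slope:.2f}, R^2={r2:.2f}")  # slope=0.00, R^2=0.00
```

The linear model concludes there is "no relationship," even though Y is perfectly predictable from X.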
Big Data: Very large data sets which may include both structured (e.g., spreadsheet) data and unstructured (e.g., emails,
text, or pictures) data.
Data analytics uses computer-based algorithms to analyze Big Data and draw meaningful inferences by:
• Measuring correlations between variables;
• Making predictions about some variable of interest;
• Making causal inferences;
• Classifying data into distinct categories;
• Clustering data into groups of relatively homogeneous observations (i.e., observations with similar traits); and
• Reducing dimension (the number of data attributes) by discarding redundant and insignificant attributes of objects and entities.
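Two of the tasks above, measuring correlations and reducing dimension, can be sketched together. This is a minimal sketch under stated assumptions: the data are hypothetical, the 0.95 cutoff is arbitrary, and the "drop a feature if it is highly correlated with one already kept" rule is just one simple heuristic for discarding redundant attributes (it is not PCA or any method prescribed by the reading). `pearson` and `drop_redundant` are illustrative names.

```python
# Minimal sketch: measure pairwise correlations, then discard redundant attributes.
# Assumptions: toy data, a hypothetical 0.95 cutoff, and a simple
# "drop highly correlated features" heuristic chosen for illustration.
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def drop_redundant(features, cutoff=0.95):
    """Keep a feature only if |correlation| with every already-kept feature is below cutoff."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, kv)) < cutoff for kv in kept.values()):
            kept[name] = values
    return kept

features = {
    "sales_usd": [10, 12, 15, 11, 18],
    "sales_eur": [9.2, 11.04, 13.8, 10.12, 16.56],  # hypothetical: same sales in another currency
    "rainfall": [3, 1, 4, 1, 5],
}
reduced = drop_redundant(features)
print(sorted(reduced))  # ['rainfall', 'sales_usd']
```

`sales_eur` is a rescaled copy of `sales_usd`, so it adds no new information and is dropped; `rainfall` is only moderately correlated with sales and is kept.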
Machine learning (ML) refers to computer programs that learn from their errors and refine predictive models to improve their
predictive accuracy over time. ML is one method used to extract useful information from Big Data. ML terms:
• The target variable (or tag variable) is the dependent variable (Y). Target variables can be continuous, categorical, or ordinal;
• Features are the independent variables (X); and
• Feature engineering is curating (collecting and organizing) a dataset of features for ML processing.
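The terms above, and the idea of a program that "learns from its errors," can be tied together in a small sketch. Everything here is an assumption for illustration: the raw records and field names are hypothetical, and the learning loop is plain gradient descent on mean squared error, which is only one of many ways an ML program can refine a model using its errors.

```python
# Minimal sketch of ML terminology, assuming a hypothetical toy dataset:
#   feature (X) = independent variable, target/tag (Y) = dependent variable.
# The learning loop is ordinary gradient descent on mean squared error,
# used here only as one illustrative way to "learn from errors".

raw_records = [  # hypothetical raw data before feature engineering
    {"ticker": "AAA", "ad_spend": "1.0", "revenue": 3.0},
    {"ticker": "BBB", "ad_spend": "2.0", "revenue": 5.0},
    {"ticker": "CCC", "ad_spend": "3.0", "revenue": 7.0},
    {"ticker": "DDD", "ad_spend": "4.0", "revenue": 9.0},
]

# Feature engineering: collect and organize usable features
# (parse text into numbers; leave out the ticker, an identifier with no predictive content).
X = [float(r["ad_spend"]) for r in raw_records]  # feature
y = [r["revenue"] for r in raw_records]          # target (tag) variable

def mse(w, b):
    """Mean squared prediction error of the model y_hat = w * x + b."""
    return sum((w * x + b - t) ** 2 for x, t in zip(X, y)) / len(X)

w, b, lr = 0.0, 0.0, 0.05
history = [mse(w, b)]
for epoch in range(2000):
    # Errors of the current model on each training observation
    errors = [w * x + b - t for x, t in zip(X, y)]
    # Nudge the parameters in the direction that reduces the squared error
    w -= lr * 2 * sum(e * x for e, x in zip(errors, X)) / len(X)
    b -= lr * 2 * sum(errors) / len(X)
    history.append(mse(w, b))

print(round(w, 2), round(b, 2))  # close to the true relationship revenue = 2 * ad_spend + 1
```

Each pass uses the current prediction errors to adjust the model, so the error recorded in `history` shrinks over time, which is the sense in which predictive accuracy "improves over time."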
Candidates are expected to have a high-level understanding of the terminology and techniques used here!
LOS 8.p: Distinguish between supervised and unsupervised machine learning.
Supervised learning uses labeled training data (observations for which the value of the target variable is known) to guide the ML program towards superior forecasting accuracy.
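A minimal sketch of supervised learning follows, assuming hypothetical labeled data (two features per firm, such as a debt ratio and an interest coverage ratio, each tagged "default" or "no default") and k-nearest neighbours (k-NN) as the illustrative algorithm. The key point is that every training observation carries a known label, and the program uses those labels to predict the label of new observations. `knn_predict` is an illustrative name.

```python
# Minimal sketch of supervised learning with k-nearest neighbours (k-NN).
# Assumptions: hypothetical labeled data and k=3; requires Python 3.8+ for math.dist.
from collections import Counter
from math import dist

# Labeled training data: (features, target label). Values are hypothetical.
training = [
    ((0.20, 8.0), "no default"),
    ((0.30, 6.5), "no default"),
    ((0.25, 7.0), "no default"),
    ((0.80, 1.2), "default"),
    ((0.70, 1.5), "default"),
    ((0.90, 0.8), "default"),
]

def knn_predict(x, data, k=3):
    """Predict the label of x by majority vote among its k nearest labeled neighbours."""
    nearest = sorted(data, key=lambda row: dist(x, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((0.75, 1.0), training))  # default
print(knn_predict((0.30, 7.5), training))  # no default
```

Because the features are unscaled, the second feature dominates the distance here; in practice features are usually standardized before applying a distance-based method like k-NN.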
In unsupervised learning, the ML program is NOT given labeled training data. In the absence of any tagged data, the program
seeks out structure or interrelationships in the data (e.g., clustering).
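For contrast, here is a minimal sketch of unsupervised learning: k-means clustering on points that carry no labels at all. Assumptions: the 2-D data are hypothetical, k=2 is chosen by hand, and the first k points serve as starting centroids (a deliberately simple choice; real implementations usually use smarter initialization). The algorithm discovers the two groups on its own.

```python
# Minimal sketch of unsupervised learning: k-means clustering on unlabeled points.
# Assumptions: toy 2-D data, k=2, first k points as initial centroids; Python 3.8+ for math.dist.
from math import dist

points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]  # no labels

def kmeans(points, k, iterations=10):
    """Return (centroids, clusters) after alternating assignment and update steps."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its assigned points
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

centroids, clusters = kmeans(points, k=2)
for c in clusters:
    print(c)
# [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8)]
# [(8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
```

Even though both starting centroids fall in the same natural group, the repeated assign-and-update steps move them apart until each centroid sits in the middle of one group of similar observations.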