Statistics
Cross-validation
Cross-validation is a method for validating an estimated hypothesis on data. At the beginning of the analysis process, the data is split into learning data and testing data. A hypothesis is fit to the learning data, and its actual error is then measured on the testing data. This way, we can estimate how well the hypothesis may perform on future data. Setting testing data aside, and thereby reducing the amount of learning data, can also be beneficial in the end, as it reduces the chance of accepting an over-fit hypothesis, that is, a hypothesis trained to one particular narrow subset of the data.
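As a minimal sketch of the split-fit-measure cycle, assuming scikit-learn is available; the synthetic dataset, the linear model, and the mean squared error metric are illustrative choices, not prescribed by the text:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset standing in for the analyzed data
X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
                       random_state=0)

# Split into learning (training) data and testing data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit the hypothesis to the learning data only
hypothesis = LinearRegression().fit(X_train, y_train)

# Measure the actual error on testing data the hypothesis has never seen
test_error = mean_squared_error(y_test, hypothesis.predict(X_test))
print(test_error)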
K-fold cross-validation
The original data is partitioned randomly into k folds. In each of k rounds, one fold is held out for validation and the remaining k-1 folds are used for hypothesis training; the k validation errors are then averaged to estimate the hypothesis's performance.
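A minimal sketch of the procedure, assuming scikit-learn's KFold and the same kind of synthetic data as above (k = 5 is an illustrative choice):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
                       random_state=0)

k = 5
fold_errors = []
# Each fold serves once as the validation set; the other k-1 folds train
for train_idx, val_idx in KFold(n_splits=k, shuffle=True,
                                random_state=0).split(X):
    h = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_errors.append(
        mean_squared_error(y[val_idx], h.predict(X[val_idx])))

# The average validation error over the k folds estimates performance
print(sum(fold_errors) / k)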
A/B Testing
A/B testing is the validation of two hypotheses on the data, usually on real data. The hypothesis with the better result (a lower estimation error) is then chosen as the estimator for future data.
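A minimal sketch of such a comparison in the sense used here, assuming scikit-learn; the two candidate hypotheses (an ordinary and a ridge-regularized linear regression) are illustrative choices, not prescribed by the text:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
                       random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Two competing hypotheses fit to the same learning data
hypothesis_a = LinearRegression().fit(X_train, y_train)
hypothesis_b = Ridge(alpha=1.0).fit(X_train, y_train)

error_a = mean_squared_error(y_test, hypothesis_a.predict(X_test))
error_b = mean_squared_error(y_test, hypothesis_b.predict(X_test))

# Keep the hypothesis with the lower estimation error for future data
chosen = hypothesis_a if error_a <= error_b else hypothesis_b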