Statistics
Cross-validation
Cross-validation is a method for validating an estimated hypothesis on data. At the beginning of the analysis process, the data is split into learning data and testing data. A hypothesis is fit to the learning data, and its actual error is then measured on the testing data. This way, we can estimate how well the hypothesis may perform on future data. Setting testing data aside, and thereby reducing the amount of learning data, can also be beneficial in the end, as it reduces the chance of accepting an over-fit hypothesis, that is, a hypothesis trained to one particular narrow subset of the data.
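As a minimal sketch of the split-fit-measure cycle, assuming scikit-learn is available; the synthetic dataset, the linear model, and the mean squared error metric are illustrative choices, not prescribed by the text:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset standing in for the analyzed data
X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
                       random_state=0)

# Split into learning (training) data and testing data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit the hypothesis to the learning data only
hypothesis = LinearRegression().fit(X_train, y_train)

# Measure the actual error on testing data the hypothesis has never seen
test_error = mean_squared_error(y_test, hypothesis.predict(X_test))
print(test_error)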
K-fold cross-validation
The original data is partitioned randomly into k folds. In each of k rounds, one fold is held out for validation and the remaining k-1 folds are used for hypothesis training; the k validation errors are then averaged to estimate the hypothesis's performance.
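A minimal sketch of the procedure, assuming scikit-learn's KFold and the same kind of synthetic data as above (k = 5 is an illustrative choice):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
                       random_state=0)

k = 5
fold_errors = []
# Each fold serves once as the validation set; the other k-1 folds train
for train_idx, val_idx in KFold(n_splits=k, shuffle=True,
                                random_state=0).split(X):
    h = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_errors.append(
        mean_squared_error(y[val_idx], h.predict(X[val_idx])))

# The average validation error over the k folds estimates performance
print(sum(fold_errors) / k)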
A/B Testing
A/B testing is the validation of two hypotheses on the data, usually on real data. The hypothesis with the better result (a lower estimation error) is then chosen as the estimator for future data.
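A minimal sketch of such a comparison in the sense used here, assuming scikit-learn; the two candidate hypotheses (an ordinary and a ridge-regularized linear regression) are illustrative choices, not prescribed by the text:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
                       random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Two competing hypotheses fit to the same learning data
hypothesis_a = LinearRegression().fit(X_train, y_train)
hypothesis_b = Ridge(alpha=1.0).fit(X_train, y_train)

error_a = mean_squared_error(y_test, hypothesis_a.predict(X_test))
error_b = mean_squared_error(y_test, hypothesis_b.predict(X_test))

# Keep the hypothesis with the lower estimation error for future data
chosen = hypothesis_a if error_a <= error_b else hypothesis_b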