Page 185 - Data Science Algorithms in a Week

Statistics


            Cross-validation

            Cross-validation is a method of validating an estimated hypothesis on data. At the
            beginning of the analysis, the data is split into training data and testing data. A
            hypothesis is fitted to the training data, and its actual error is then measured on the
            testing data. In this way, we can estimate how well the hypothesis may perform on future
            data. Setting some of the data aside for testing is also beneficial in the end, as it lets
            us detect hypothesis over-fitting – a hypothesis being trained too closely to a particular
            narrow subset of the data.
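            The split described above can be sketched in plain Python; the function name
            `train_test_split` and the 70/30 ratio are illustrative assumptions, not part of the text:

```python
import random

def train_test_split(data, test_ratio=0.3, seed=0):
    # Shuffle a copy of the data, then cut it into training and testing parts.
    # (Illustrative helper; the split ratio is a common but arbitrary choice.)
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

points = list(range(10))
train, test = train_test_split(points)
print(len(train), len(test))  # 7 3
```

            A hypothesis would then be fitted only to `train`, and its error measured only on `test`.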


            K-fold cross-validation

            The original data is partitioned randomly into k folds. One fold is used for validation,
            and the remaining k-1 folds are used for hypothesis training. The procedure is repeated k
            times so that each fold serves as the validation set exactly once, and the k error
            measurements are averaged.
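            A minimal sketch of the k-fold partitioning, again in plain Python (the generator name
            `k_fold_splits` is an assumption for illustration):

```python
import random

def k_fold_splits(data, k, seed=0):
    # Shuffle once, deal the items into k folds, then yield each
    # (training, validation) pair: fold i validates, the rest train.
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation

for training, validation in k_fold_splits(range(20), k=5):
    print(len(training), len(validation))  # 16 4, five times
```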



            A/B Testing

            A/B testing is the validation of two hypotheses on the data – usually on real data. The
            hypothesis with the better result (the lower estimation error) is then chosen as the
            estimator for future data.
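            The selection step can be sketched as follows; the two candidate hypotheses and the
            sample data are invented for illustration, and mean squared error is one common choice
            of error measure:

```python
def mean_squared_error(hypothesis, data):
    # Average squared difference between predictions and observed values.
    return sum((hypothesis(x) - y) ** 2 for x, y in data) / len(data)

# Two hypothetical candidate hypotheses: a constant and a linear model.
hypothesis_a = lambda x: 5.0
hypothesis_b = lambda x: 2 * x + 1

# Hypothetical (x, y) observations used to compare the two hypotheses.
test_data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)]

error_a = mean_squared_error(hypothesis_a, test_data)
error_b = mean_squared_error(hypothesis_b, test_data)

# Keep the hypothesis with the lower estimation error for future data.
best = hypothesis_b if error_b < error_a else hypothesis_a
```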































