Page 202 - Data Science Algorithms in a Week

Glossary of Algorithms and Methods in Data Science

Time series analysis: The analysis of data dependent on time; it mainly includes
the analysis of trend and seasonality.
Support vector machines: A classification algorithm that finds the hyperplane
that separates the training data into the given classes with the largest
margin. This hyperplane is then used to classify new data.
Principal component analysis: A linear transformation of the given data onto
new orthogonal axes (the principal components), ordered by the amount of
variance they capture. It is commonly used as a preprocessing step to reduce
the number of dimensions by keeping only the first few components.
Text mining: The search and extraction of text and its possible conversion to
numerical data used for data analysis.
Neural networks: A machine learning algorithm consisting of a network of
simple classifiers making decisions based on the input or the results of the other
classifiers in the network.
Deep learning: Machine learning with neural networks that have many layers,
where each layer learns a more abstract representation of the input.
A priori association rules: Rules of the form "if X, then Y" that are observed
in the training data (for example, mined by the Apriori algorithm) and on the
basis of which future data can be classified.
PageRank: A link analysis algorithm used to rank search results. It assigns
the greatest relevance to the page that has the most incoming web links from
other highly relevant pages. In mathematical terms, PageRank
calculates a certain eigenvector representing these measures of relevance.
Ensemble learning: A method of learning where several learning algorithms or
models are combined to make a final prediction.
Bagging (bootstrap aggregating): A method of classifying a data item by the
majority vote of classifiers, each trained on a random sample of the training
data drawn with replacement.
Genetic algorithms: Search and optimization algorithms inspired by natural
selection, for example, an evolution where the classifiers with the best
accuracy are selected, recombined, and mutated to form the next generation.
Inductive inference: A machine learning method that learns the rules that
produced the actual data.
Bayesian networks: A directed acyclic graph model representing random
variables and their conditional dependencies.
Singular value decomposition: A factorization of a matrix that generalizes
eigendecomposition to arbitrary rectangular matrices; used, for example, in
least squares methods.
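Boosting: A machine learning meta-algorithm that mainly decreases the bias
(and often also the variance) of the estimation by training weak classifiers
in sequence, each concentrating on the data misclassified by its
predecessors, and combining them into an ensemble.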
Expectation maximization: An iterative method to find the parameters of a
model with unobserved (latent) variables that maximize the likelihood of the
observed data.
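
The PageRank entry above describes the relevance scores as an eigenvector.
One common way to approximate such an eigenvector is power iteration. The
following is a minimal sketch under stated assumptions, not a definitive
implementation: the function name pagerank, the damping factor of 0.85, the
fixed iteration count, and the four-page toy_web graph are all illustrative
choices that do not come from the glossary itself.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    damping=0.85 is an assumed, commonly used value.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of the total rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked to by A, B, and D; nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page C collects links from three pages and ends up with the
highest score, while page D, which has no incoming links, keeps only the
baseline share. The scores sum to 1, so they can be read as a distribution of
relevance over the pages.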
