Appendix E
Recent Technical Advances in Data Science
We have taken the position throughout this book that data scientists must do much more than
technical work. Still, in our view, it is critically important that data science builds on solid
theoretical and technical foundations. So, while a full review is beyond our scope, a few remarks
on technical aspects of data science are in order.
Fisher (1922) laid the foundations for statistics as a discipline. He considered the object
of statistical methods to be reducing data into the essential statistics, and he identified three
problems that arise in doing so:
1. specification – choosing the right mathematical model for a population;
2. estimation – methods to calculate, from a sample, estimates of the parameters of the
hypothetical population; and
3. distribution – properties of statistics derived from samples.
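The three problems can be sketched in a few lines of code. The following is a minimal
illustration, not Fisher's own procedure. It assumes a normal model for the population
(specification), computes maximum‐likelihood estimates of the mean and standard deviation
from a simulated sample (estimation), and approximates the sampling distribution of the mean
by bootstrap resampling (distribution). The bootstrap is a much later technique, swapped in
here for convenience. The function names, the simulated data, and the parameter values are
all hypothetical.

```python
import random
import statistics


def estimate_normal(sample):
    """Estimation: maximum-likelihood estimates under an assumed normal model."""
    mu_hat = statistics.fmean(sample)
    # The ML estimate of the standard deviation divides by n, which is what pstdev does.
    sigma_hat = statistics.pstdev(sample, mu_hat)
    return mu_hat, sigma_hat


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Distribution: approximate the sampling distribution of the sample mean
    by resampling the data with replacement (bootstrap, not Fisher's method)."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


# Specification: we assume (hypothetically) that the population is normal.
# The data are simulated here, with illustrative parameters mean=10 and sd=2.
data_rng = random.Random(42)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(200)]

mu_hat, sigma_hat = estimate_normal(sample)
boot_means = bootstrap_means(sample)

# Two views of the standard error of the mean: empirical (bootstrap) and
# theoretical under the normal model (sigma / sqrt(n)).
se_bootstrap = statistics.stdev(boot_means)
se_theory = sigma_hat / len(sample) ** 0.5

print(f"estimated mean: {mu_hat:.3f}")
print(f"estimated standard deviation: {sigma_hat:.3f}")
print(f"bootstrap standard error of the mean: {se_bootstrap:.3f}")
print(f"model-based standard error of the mean: {se_theory:.3f}")
```

If the specification is reasonable, the two standard errors should be close. A large gap
would be one hint that the chosen model deserves a second look.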
Since then, others, some quoted in this book, have built on these foundations. Of particular
relevance here, Tukey (1962) envisioned a data‐centric development of statistics. Huber
(2011) and Donoho (2017) celebrated the 50th anniversary of Tukey’s paper by looking ahead
at the role of statistics, with reference to data science. Data science, which pulls together
domain knowledge, computer science/IT, and statistics, builds further still. Today’s data
scientist has a large variety of powerful methods at his or her disposal, including regression,
ANOVA, visualization, Bayesian methods, statistical control, machine learning (e.g. neural
networks), bootstrapping and cross validation, cluster analysis, text analytics, logistic
regression, structural equation models, time‐series analysis, decision trees, and association
rules.
We are hopeful that data science will continue to grow, with better and better methods for
analyzing data; extracting the essential information; interpreting, presenting, and summarizing
results; and making valid inferences. These in turn will require better technical methods for
accessing, manipulating, storing, searching, and sorting data.
The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations,
First Edition. Ron S. Kenett and Thomas C. Redman.
© 2019 Ron S. Kenett and Thomas C. Redman. Published 2019 by John Wiley & Sons Ltd.
Companion website: www.wiley.com/go/kenett-redman/datascience