
Appendix E







             Recent Technical Advances in Data Science







             We have taken the position throughout this book that data scientists must do much more than
             technical work. Still, in our view, it is critically important that data science builds on solid
             theoretical and technical foundations. So, while a full review is beyond our scope, a few remarks
             on technical aspects of data science are in order.
               Fisher (1922) laid the foundations for statistics as a discipline. He considered the object
             of statistical methods to be reducing data into the essential statistics, and he identified three
             problems that arise in doing so:

             1.  specification – choosing the right mathematical model for a population;
             2.  estimation – methods to calculate, from a sample, estimates of the parameters of the
               hypothetical population; and
             3.  distribution – properties of statistics derived from samples.
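
As a rough illustration of how these three problems fit together, the sketch below works through them for a single simple case. Everything specific in it is an assumption made for illustration, not something taken from Fisher: the normal model, the parameter values, and the helper names `estimate_normal` and `simulate_sampling_distribution` are all hypothetical.

```python
import math
import random
import statistics


def estimate_normal(sample):
    """Estimation: maximum-likelihood estimates of (mu, sigma) under a normal model."""
    mu_hat = statistics.fmean(sample)
    # The maximum-likelihood estimate of sigma divides by n, not n - 1.
    sigma_hat = statistics.pstdev(sample, mu=mu_hat)
    return mu_hat, sigma_hat


def simulate_sampling_distribution(mu, sigma, n, replications, seed=0):
    """Distribution: draw repeated samples and collect the estimate of mu from each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimate_normal(sample)[0])
    return estimates


if __name__ == "__main__":
    # Specification: assume the population is normal (illustrative values only).
    true_mu, true_sigma, n = 10.0, 2.0, 50

    # Estimation: compute parameter estimates from one simulated sample.
    rng = random.Random(42)
    sample = [rng.gauss(true_mu, true_sigma) for _ in range(n)]
    mu_hat, sigma_hat = estimate_normal(sample)
    print(f"estimated mu = {mu_hat:.3f}, estimated sigma = {sigma_hat:.3f}")

    # Distribution: compare the simulated spread of the estimator of mu
    # with the theoretical standard error sigma / sqrt(n).
    estimates = simulate_sampling_distribution(true_mu, true_sigma, n, 2000)
    print(f"simulated standard error   = {statistics.stdev(estimates):.4f}")
    print(f"theoretical standard error = {true_sigma / math.sqrt(n):.4f}")
```

Simulation is used for the third step only to keep the sketch short. Under the assumed normal model the sampling distribution of the mean is known exactly, which is why the two standard errors can be compared directly.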

             Since then, others, some quoted in our book, have built on these foundations. Of particular
             relevance here, Tukey (1962) envisioned a data‐centric development of statistics. Huber
             (2011) and Donoho (2017) celebrated the 50th anniversary of Tukey’s paper with an outlook
             on the role of statistics and reference to data science. Data science, which pulls together
             domain knowledge, computer science/IT, and statistics, builds further still, and today’s data
             scientist has a large variety of powerful methods at his or her disposal, such as regression,
             ANOVA, visualization, Bayesian methods, statistical control, machine learning (e.g. neural
             networks), bootstrapping and cross validation, cluster analysis, text analytics, logistic
             regression, structural equation models, time‐series analysis, decision trees, and association
             rules.
               We are hopeful that data science will continue to grow, with better and better methods for
             analyzing data; extracting the essential information; interpreting, presenting, and summarizing
             results; and making valid inferences. These in turn will require better technical methods for
             accessing, manipulating, storing, searching, and sorting data.


             The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations,
             First Edition. Ron S. Kenett and Thomas C. Redman.
             © 2019 Ron S. Kenett and Thomas C. Redman. Published 2019 by John Wiley & Sons Ltd.
             Companion website: www.wiley.com/go/kenett-redman/datascience