The Real Work of Data Science: Turning Data Into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman



             With respect to our topic, a breakthrough came when statisticians realized that data analysis
           involves a range of concerns, beyond mathematical properties of statistical tools. To quote
           John Tukey:

             For a long time I have thought I was a statistician, interested in inferences from the particular to
             the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and
             to doubt. … All in all I have come to feel that my central interest is in data analysis, which I take
             to include, among other things: procedures for analyzing data, techniques for interpreting the
             results of such procedures, ways of planning the gathering of data to make its analysis easier,
             more precise or more accurate, and all the machinery and results of (mathematical) statistics
             which apply to analyzing data.

             An amazing insight, now over 50 years old. Continuing, Tukey notes that:

             data analysis is a very difficult field. It must adapt itself to what people can and need to do with
             data. In the sense that biology is more complex than physics, and the behavioral sciences are more
             complex than either, it is likely that the general problems of data analysis are more complex than
             all three. It is too much to ask for close and effective guidance for data analysis from any highly
             formalized structure, either now or in the near future. Data analysis can gain much from formal
             statistics, but only if the connection is kept adequately loose. (Tukey 1962)

             Going back still further, as early as the 1930s, W. Edwards Deming wrote in the preface of
           Shewhart’s book The Economic Control of Quality of Manufactured Product:

             Tests of variables that affect a process are useful only if they predict what will happen if this or
             that variable is increased or decreased. Statistical theory, as taught in the books, is valid and leads
             to operationally verifiable tests and criteria for an enumerative study. Not so with an analytic
             problem, as the conditions of the experiment will not be duplicated in the next trial. Unfortunately,
             most problems in industry are analytic. (Deming 1931)

             The most important parts of data science are analytic (e.g. predictive), addressing the
           concern voiced by Deming.
             These are the foundations on which we build in this book.

           A Bridge to the Future
           Our goal is to help data scientists navigate the individual and organizational complexities.
           Experienced data scientists will recognize the points made in these 18 chapters. We hope
           not to have scared newcomers off – but be aware that your technical know-how is just table
           stakes.
             The book explores, in short chapters, the things data scientists aren’t usually taught in class,
           but that the giants of statistics knew were essential. Of course, solid analyses are essential, but
           it is other, more complex steps that data scientists must take to ensure their analyses are given
           their due, lead to good decisions, and produce results. This stuff is messy, and we’ve
           introduced several models to help simplify. We proposed a life‐cycle model and the organizational
           ecosystem right up front, in Chapter 1, and explored each step in subsequent chapters. For
           example, we urge data scientists to invest in understanding the business they serve and the real
           problems they must confront, attack data quality proactively, bring both hard and soft data to