
13







             Evaluating Data Science Outputs More Formally






             In the last chapter, we focused on teaching your colleagues some basics and providing a starter
             set of questions for decision‐makers. Of course, the work of helping decision‐makers become
             better consumers of data science never ends. As they gain experience, you need to provide
             them with a more formal template based on the eight dimensions of the information quality
             (InfoQ) model (Kenett and Shmueli 2016a). This will help them go deeper, facilitate
             discussions regarding trade‐offs, and improve the quality of information generated in their
             organizations. Breiman (2001) depicts two cultures in the use of statistical modeling to
             reach conclusions from data: data modeling and algorithmic modeling. The InfoQ framework
             addresses outputs from both approaches, in the context of business, academic, services,
             and industrial work.

             Assessing Information Quality

             The InfoQ framework provides a structured approach for evaluating analytic work. InfoQ
             is defined as the utility, U, derived by conducting a certain analysis, f, on a given data set, X,
             with respect to a given goal, g. For the mathematically inclined:


                         \[ \mathrm{InfoQ}(U, f, X, g) = U\bigl(f(X \mid g)\bigr). \]

               As an example, consider cellular operators who want to reduce churn by launching a
             customer retention campaign. Their goal, g, is to correctly identify customers with high
             potential for churn, the logical target of the campaign. The data, X, consists of customer
             usage, lists of customers who have changed operators, traffic patterns, and problems
             reported to the call center. The data scientist plans to use a decision tree, f, which will
             help him or her define business rules that identify groups of customers with similar churn
             probabilities. The utility, U, is the increase in profit from targeting the campaign only
             at customers with high churn potential.
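
To make the four components concrete, the churn example can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' method: the customer records, the `value_saved` and `contact_cost` figures, and all function names are hypothetical, a one-split "stump" on call-center complaints stands in for a full decision tree, and the utility is computed on the same data used to fit the rule, purely for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    monthly_minutes: float  # usage (not used by the simple rule below)
    complaints: int         # problems reported to the call center
    churned: bool           # True if the customer changed operators


# X: a tiny, made-up data set (values are illustrative, not real).
X = [
    Customer(620, 0, False),
    Customer(540, 1, False),
    Customer(130, 4, True),
    Customer(210, 3, True),
    Customer(480, 0, False),
    Customer(90, 5, True),
    Customer(300, 2, False),
    Customer(160, 3, True),
]


def goal(customer):
    """g: the outcome the analysis should identify (here, churn)."""
    return customer.churned


def decision_stump(data, g):
    """f: a one-split rule on complaints, a stand-in for a full decision tree.

    Picks the threshold t that misclassifies the fewest customers under the
    rule "complaints >= t means churn", then returns the customers it flags.
    """
    best_t, best_errors = None, None
    for t in sorted({c.complaints for c in data}):
        errors = sum((c.complaints >= t) != g(c) for c in data)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return [c for c in data if c.complaints >= best_t]


def contact_everyone(data, g):
    """A baseline 'analysis' that ignores the goal and flags every customer."""
    return list(data)


def campaign_profit(targeted, value_saved=100.0, contact_cost=10.0):
    """U: profit of a campaign aimed at the targeted customers.

    Assumes every contact costs contact_cost and every contacted churner is
    retained, which is worth value_saved (both figures are hypothetical).
    """
    return sum(value_saved * c.churned - contact_cost for c in targeted)


def info_q(U, f, X, g):
    """InfoQ(U, f, X, g) = U(f(X | g))."""
    return U(f(X, g))


targeted_profit = info_q(campaign_profit, decision_stump, X, goal)
baseline_profit = info_q(campaign_profit, contact_everyone, X, goal)
print(targeted_profit, baseline_profit)  # 360.0 320.0
```

On this toy data the targeted campaign yields a higher utility than contacting everyone. Holding U, X, and g fixed while swapping f is one simple way to compare analyses on the same footing.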


             The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations,
             First Edition. Ron S. Kenett and Thomas C. Redman.
             © 2019 Ron S. Kenett and Thomas C. Redman. Published 2019 by John Wiley & Sons Ltd.
             Companion website: www.wiley.com/go/kenett-redman/datascience