The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman



             InfoQ is determined by eight dimensions that can be assessed individually in the context of
           the specific problem and goal. These dimensions include the following:
           1.  Data resolution. Are the measurement scale, measurement uncertainty, and level of data
              aggregation appropriate relative to the goal?
           2.  Data structure. Are the available data sources (including both structured and unstructured
              data) comprehensive with respect to the goal?
           3.  Data integration. Are the possibly disparate data sources properly integrated together?
              Note: this step may involve resolving poor and confusing data definitions, different units of
              measure, and varying time stamps.
           4.  Temporal relevance. Is the time frame in which the data was collected relevant to the goal?
           5.  Generalizability. Are results relevant in a wider context? In particular, is the inference from
              the sample population to the target population appropriate (statistically generalizable –
              Chapter 8)? Can other considerations be used to generalize the findings?
           6.  Chronology of data and goal. Are the analyses and the needs of the decision‐maker
              synchronized in time?
           7.  Operationalization. Are results presented in terms that can drive action?
           8.  Communication. Are results presented to decision‐makers at the right time and in the right
              way (as described in Chapter 7)?

             See Appendix C for a detailed list of questions used in InfoQ assessments.
             Importantly, InfoQ helps structure discussions about trade‐offs, strengths, and weaknesses.
           Consider the cellular operator noted above and a second potential data set, X*. X*
           includes everything X has, plus data on credit‐card churn, but that additional data won’t be
           available for two months. Data resolution (the first dimension) goes up, while temporal
           relevance (the fourth) goes down. Or suppose a new machine learning analysis, f*, has been
           conducted in parallel, but results from f and f* don’t quite line up. “What to do?” These are
           the most
           important discussions for decision‐makers, data scientists, and CAOs.
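
The X versus X* comparison above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the 1-to-5 rating scale, the function names `validate` and `trade_offs`, and every rating value are hypothetical and do not come from the text. The sketch records one rating per InfoQ dimension for each data set and reports only the dimensions on which the two differ, which is the material for the trade-off discussion.

```python
from typing import Dict, Mapping

# The eight InfoQ dimensions, in the order listed above.
DIMENSIONS = (
    "Data resolution",
    "Data structure",
    "Data integration",
    "Temporal relevance",
    "Generalizability",
    "Chronology of data and goal",
    "Operationalization",
    "Communication",
)


def validate(ratings: Mapping[str, int]) -> None:
    """Check that every dimension has a rating on the assumed 1-5 scale."""
    if set(ratings) != set(DIMENSIONS):
        raise ValueError("ratings must cover exactly the eight InfoQ dimensions")
    for name, value in ratings.items():
        if not 1 <= value <= 5:
            raise ValueError(f"rating for {name!r} must be between 1 and 5")


def trade_offs(base: Mapping[str, int], alt: Mapping[str, int]) -> Dict[str, int]:
    """Return alt-minus-base differences for the dimensions that changed."""
    validate(base)
    validate(alt)
    return {d: alt[d] - base[d] for d in DIMENSIONS if alt[d] != base[d]}


# Hypothetical ratings: X is rated 3 on every dimension.
x = dict.fromkeys(DIMENSIONS, 3)
# X* adds data (higher resolution) but arrives two months later
# (lower temporal relevance); the size of each change is made up.
x_star = {**x, "Data resolution": 4, "Temporal relevance": 2}

print(trade_offs(x, x_star))
```

A positive difference marks a dimension where X* is stronger and a negative one where it is weaker. The sketch deliberately stops short of combining the ratings into a single score; weighing one dimension against another is left to the discussion among decision‐makers, data scientists, and CAOs.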
             Further, the InfoQ framework can be used in a variety of settings, not just for helping decision‐
           makers become more sophisticated. It can also be used to assist in the design of a data science
           project, as a midproject assessment, and as a postmortem to sort out lessons learned. See Kenett
           and Shmueli (2016a) for a comprehensive discussion of InfoQ and its applications in risk
           management, health care, customer surveys, education, and official statistics.

           A Hands‐On Information Quality Workshop

           This workshop uses InfoQ to help an entire team understand the importance of clear goals and
           what it takes to achieve information quality with respect to those goals. It combines individual
           work, team discussions, and group presentations, using the InfoQ framework.

           Phase I: Individual Work
           Please complete the following four steps and document the results of each for further
           discussion.

           Step 1: The Background
           Pick an organization to focus on. It should be one that you know reasonably well, such as your
           current or previous place of employment, a school, hospital, or restaurant.