The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman
InfoQ is determined by eight dimensions that can be assessed individually in the context of
the specific problem and goal. These dimensions include the following:
1. Data resolution. Are the measurement scale, measurement uncertainty, and level of data
aggregation appropriate relative to the goal?
2. Data structure. Are the available data sources (including both structured and unstructured
data) comprehensive with respect to the goal?
3. Data integration. Are the possibly disparate data sources properly integrated?
Note: this step may involve resolving poor and confusing data definitions, different units of
measure, and varying time stamps.
4. Temporal relevance. Is the time frame in which the data was collected relevant to the goal?
5. Generalizability. Are results relevant in a wider context? In particular, is the inference from
the sample population to the target population appropriate (statistically generalizable –
Chapter 8)? Can other considerations be used to generalize the findings?
6. Chronology of data and goal. Are the analyses and needs of the decision‐maker synched up
in time?
7. Operationalization. Are results presented in terms that can drive action?
8. Communication. Are results presented to decision‐makers at the right time and in the right
way (as described in Chapter 7)?
See Appendix C for a detailed list of questions used in InfoQ assessments.
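As a rough illustration of how ratings on the eight dimensions might be recorded and summarized, here is a minimal Python sketch. The 1-5 rating scale, the rescaling to the range 0 to 1, the geometric-mean summary, and the name `infoq_summary` are all assumptions made for this example; this section does not prescribe a scoring formula.

```python
from math import prod

# The eight InfoQ dimensions, in the order listed above.
DIMENSIONS = (
    "data resolution",
    "data structure",
    "data integration",
    "temporal relevance",
    "generalizability",
    "chronology of data and goal",
    "operationalization",
    "communication",
)


def infoq_summary(ratings):
    """Summarize ratings for all eight dimensions.

    `ratings` maps each dimension name to a number from 1 (poor) to 5
    (excellent). This scale is an assumption made for illustration.
    Returns (score, weakest), where score lies between 0 and 1.
    """
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"unrated dimensions: {missing}")
    for name in DIMENSIONS:
        if not 1 <= ratings[name] <= 5:
            raise ValueError(f"rating for {name!r} must be between 1 and 5")

    # Rescale each rating to [0, 1]: 1 -> 0.0, 3 -> 0.5, 5 -> 1.0.
    scaled = [(ratings[d] - 1) / 4 for d in DIMENSIONS]
    # Geometric mean: one very weak dimension pulls the summary toward zero.
    score = prod(scaled) ** (1 / len(scaled))
    # The lowest-rated dimension (first one listed, in case of ties).
    weakest = min(DIMENSIONS, key=lambda d: ratings[d])
    return score, weakest
```

A geometric mean is used here so that a strong rating on one dimension cannot fully compensate for a failing rating on another. An arithmetic mean would be an equally simple alternative with the opposite property.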
Importantly, InfoQ helps structure discussions about trade‐offs, strengths, and weaknesses.
Returning to the cellular operator noted above, consider a second potential data set, X*. X*
includes everything X has, plus data on credit‐card churn, but that additional data won’t be
available for two months. Resolution (the first dimension) goes up, while temporal relevance
(the fourth) goes down. Or suppose a new machine learning analysis, f*, has been conducted
in parallel, but results from f and f* don’t quite line up. “What to do?” These are the most
important discussions for decision‐makers, data scientists, and CAOs.
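The X versus X* trade-off can be made concrete by laying two assessments side by side. The following Python sketch assumes each dimension has been rated on a hypothetical 1-5 scale. The ratings shown are invented for illustration only, and `compare_assessments` is a name made up for this example.

```python
def compare_assessments(baseline, candidate):
    """Return {dimension: change} for every dimension whose rating differs."""
    return {
        dim: candidate[dim] - baseline[dim]
        for dim in baseline
        if candidate[dim] != baseline[dim]
    }


# Hypothetical ratings (1 = poor, 5 = excellent) for the two dimensions
# discussed above; the numbers are illustrative, not taken from the text.
x = {"data resolution": 3, "temporal relevance": 4}
x_star = {"data resolution": 4, "temporal relevance": 2}

changes = compare_assessments(x, x_star)
gains = sorted(d for d, delta in changes.items() if delta > 0)
losses = sorted(d for d, delta in changes.items() if delta < 0)

print("X* improves:", gains)
print("X* worsens:", losses)
```

The sketch deliberately stops short of declaring a winner. Its purpose is only to show where the two options differ, so that decision‐makers and data scientists can discuss whether the gain is worth the two-month delay.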
Further, the InfoQ framework can be used in a variety of settings, not just for helping decision‐
makers become more sophisticated. It can also be used to assist in the design of a data science
project, as a midproject assessment, and as a postmortem to sort out lessons learned. See Kenett
and Shmueli (2016a) for a comprehensive discussion of InfoQ and its applications in risk
management, health care, customer surveys, education, and official statistics.
A Hands‐On Information Quality Workshop
This workshop uses InfoQ to help an entire team understand the importance of clear goals and
what it takes to achieve information quality with respect to those goals. It combines individual
work, team discussions, and group presentations, using the InfoQ framework.
Phase I: Individual Work
Please complete the following four steps and document the results of each for further
discussion.
Step 1: The Background
Pick an organization to focus on. It should be one that you know reasonably well, such as your
current or previous place of employment, a school, hospital, or restaurant.