P{I} = Probability the Solution Is Actually Implemented
It is all well and good to propose grand solutions that look good in theory. But can they
succeed in practice? Overcoming resistance to change is often the most difficult part of
data science. A high value of P{I} implies that a proper match between management approach
and analytic methods has been established. More on this in Chapter 16.
T{I} = Time the Solution Stays Implemented
Problems have a tendency to recur. This is why we emphasize holding the gains in any
process improvement. Take the billing example – suppose that in the first year the company
saves $700,000. With tight controls in place, the original problem stays solved, and the
company saves more than $2 million in only three years. More generally, a high T{I} reflects
a problem that stays solved for a long time.
E{R} = Expected Number of Replications
Good data science solutions are often replicable beyond their initial focus. A high E{R}
represents a very large number of potential replications of the solution.
A PSE assessment does not have to be a big, formal exercise – it can even be done qualitatively,
with a verbal description of the eight elements. One can also apply scores (say, 1–5)
to each element and aggregate these using a multiplication formula or a geometric mean, as
sketched below. What’s important is structuring a full discussion of the various elements
affecting the true impact of a specific project or collection of projects.
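To make this concrete, here is a minimal sketch in Python of such a scoring exercise. Only P{I}, T{I}, and E{R} are named above; the remaining dictionary keys are placeholders for the other PSE elements, and all 1–5 scores are invented for illustration.

```python
# A minimal sketch of a qualitative PSE scoring exercise, assuming each of the
# eight elements has been scored 1-5 in a team discussion. Only P{I}, T{I},
# and E{R} are named in the text; the other keys are illustrative placeholders.
from math import prod

scores = {
    "P{I}": 4,        # probability the solution is actually implemented
    "T{I}": 5,        # time the solution stays implemented
    "E{R}": 3,        # expected number of replications
    "element_4": 4,   # placeholders for the remaining PSE elements
    "element_5": 3,
    "element_6": 4,
    "element_7": 5,
    "element_8": 2,
}

product_score = prod(scores.values())                 # multiplication formula
geometric_mean = product_score ** (1 / len(scores))   # geometric mean, back on the 1-5 scale

print(f"Product of scores: {product_score}")
print(f"Geometric mean:    {geometric_mean:.2f} (out of 5)")
```

The geometric mean is convenient because it keeps the aggregate on the original 1–5 scale while still penalizing any single element that scores very low.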
Using Data Science to Perform Impact Analysis
One of the most important uses of data science is helping others understand the
impact of policy, legal, and methodological changes in both government and business. These
can have enormous political overtones, but data scientists should not back away.
An example is the Australian Bureau of Statistics’ assessment of changes made to reduce
the cost of its longitudinal surveys. Part of the value of such surveys
lies in generating time‐series data, which makes them useful for social, economic, and
environmental analyses and policy‐making. Changing the survey methodology, platform, or
questions can affect the continuity of the time series, making them harder to use in inter-
preting the impact of policy decisions, as one cannot know whether observed changes are due
to the policy or to the new methodology. One approach to assessing the impact of methodological
changes is to run the old method and the new one in parallel for a time. Zhang et al. (2018)
describe exactly how to do so.
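As a rough illustration of the parallel-run idea (not the specific design described by Zhang et al. 2018), the sketch below simulates the same outcome measured under the old and the new survey instrument for the same period and tests whether the methodological change shifts the estimate; all numbers are invented.

```python
# A simplified sketch of a parallel run: the same statistic is estimated under
# the old and the new methodology for the same period, and the difference is
# tested. This is only an illustration, not the design-based comparison of
# Zhang et al. (2018); all values are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2018)

# Simulated respondent-level outcomes (e.g., reported monthly expenditure)
old_method = rng.normal(loc=1000, scale=150, size=500)   # existing instrument
new_method = rng.normal(loc=1010, scale=150, size=500)   # cheaper redesigned instrument

res = stats.ttest_ind(old_method, new_method, equal_var=False)
shift = new_method.mean() - old_method.mean()

print(f"Estimated shift between methods: {shift:.1f}")
print(f"Welch t-test p-value:            {res.pvalue:.3f}")
# A small, non-significant shift supports splicing the new series onto the old one;
# a large shift would need to be adjusted for before the time series is continued.
```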
This parallel testing approach is essential, as national statistical offices all over the
world face the same efficiency improvement challenges, and so do organizations carrying out
customer or employee satisfaction surveys. The same idea underlies what is known as “A/B testing,”
which is widely used in web applications.
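In the web setting, the same comparison typically reduces to testing two conversion rates. A minimal sketch, with made-up counts and a standard two-proportion z-test (here via statsmodels), looks like this:

```python
# A hedged sketch of the same logic in a web-application A/B test: variant A is
# the current page, variant B the proposed change, and conversion rates are
# compared with a two-proportion z-test. The counts below are made up.
from statsmodels.stats.proportion import proportions_ztest

conversions = [180, 210]   # conversions under A and B
visitors = [2000, 2000]    # visitors exposed to A and B

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)

print(f"Conversion A: {conversions[0] / visitors[0]:.1%}, "
      f"Conversion B: {conversions[1] / visitors[1]:.1%}")
print(f"Two-proportion z-test p-value: {p_value:.3f}")
```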
In terms of the impact assessment, monetary savings are measured easily enough. But the
Zhang et al. work is of great importance because it scores high on E{R}, the expected number
of replications, across government and industry.
Now consider, as another example, the busing of school children, a topic as controversial as
any. In one district of Tel Aviv, an experiment allowed some children to choose their middle
and high schools, while others were not given this choice. Lavy (2010), a labor economist,