The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman
V{P} = value of the problem to be solved
V{PS} = value of the problem actually solved
P{S} = probability that the problem actually gets solved
P{I} = probability that the solution is actually implemented
T{I} = time the solution stays implemented
E{R} = expected number of replications
Let’s explore each in turn.
V{D} = Value of the Collected Data
The application of data science depends on data, so obtaining the right data of the right quality
is critical. A high V{D} corresponds to data that are highly relevant to the problem, trusted,
clearly understood by the relevant stakeholders, and collected comprehensively and without
bias. We discussed this in Chapter 6.
V{M} = Value of the Analytic Methods Employed
This concept is closest to the original notion of statistical efficiency in mathematical
statistics: the method should be as efficient as possible. As an example, suppose a manager
wishes to reduce billing errors and must first obtain an accurate baseline error rate, and
suppose there are two candidate methods, A and B. Method A is more efficient than method B if
it requires a smaller sample to estimate the error rate with the same prespecified error. More
generally, a high V{M} is assigned to methods with proven mathematical properties, such as
unbiasedness and consistency.
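To make the sample-size comparison concrete, here is a minimal sketch, assuming the baseline error rate is estimated as a proportion with a normal-approximation confidence interval. The baseline rate of 5%, the margin of 0.01, and the "design effect" of 2 for method B are hypothetical values chosen for illustration, not figures from the text; the function name `required_sample_size` is likewise hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(variance_per_unit: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n whose normal-approximation half-width is at most `margin`.

    Uses n = z^2 * variance_per_unit / margin^2, where variance_per_unit is the
    variance contributed by one sampled bill under a given sampling method.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil(z ** 2 * variance_per_unit / margin ** 2)


# Hypothetical baseline billing error rate (an assumption, not from the text).
p = 0.05
margin = 0.01  # the same prespecified error for both methods

# Method A: simple random sample of bills; variance per bill is p(1 - p).
n_a = required_sample_size(p * (1 - p), margin)

# Method B (hypothetical): a sampling scheme with a design effect of 2,
# i.e., twice the variance per sampled bill, as clustered sampling can produce.
design_effect_b = 2.0
n_b = required_sample_size(design_effect_b * p * (1 - p), margin)

# Ratio of sample sizes; a value above 1 means method A is more efficient.
relative_efficiency = n_b / n_a
print(n_a, n_b, relative_efficiency)
```

Under these assumptions method A needs about 1,825 bills and method B about 3,650, so A reaches the same prespecified error with half the sample.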
V{P} = Value of the Problem to Be Solved
Data scientists sometimes forget this part of the equation. Some choose problems for their
technical depth rather than for the value of solving them. To illustrate, one of us spent time
figuring out how to reduce billing errors, a problem worth more than $700,000/year. That figure
was crucial to management, even though solving the problem was not particularly difficult. A high
V{P} is assigned to problems of strategic importance to the organization.
V{PS} = Value of the Problem Actually Solved
Usually no single method solves the entire problem, only part of it, so this part of the
equation is expressed as a fraction of V{P}. In the billing example, the manager expected to
reduce billing errors from 24,000 to 3,000 per billing cycle, a success rate of
(24,000 - 3,000)/24,000 = 87.5%. Problems with high V{P} that are fully solved get a high V{PS}.
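The arithmetic behind the 87.5% figure can be sketched as follows. The helper name `fraction_solved` is hypothetical, and converting the fraction into money by multiplying it with the $700,000/year figure assumes that figure is this same problem's V{P}; the text does not state this explicitly, so treat that step as illustrative only.

```python
def fraction_solved(errors_before: float, errors_after: float) -> float:
    """Share of the problem removed, as a fraction of the starting level."""
    if errors_before <= 0:
        raise ValueError("errors_before must be positive")
    return (errors_before - errors_after) / errors_before


# Billing example from the text: errors per billing cycle before and after.
share = fraction_solved(24_000, 3_000)
print(f"{share:.1%}")  # 87.5%

# Hypothetical reading: treat $700,000/year as V{P} for this same problem and
# scale it by the share solved to express V{PS} in money terms.
v_p = 700_000
v_ps = share * v_p
print(v_ps)  # 612500.0
```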
P{S} = Probability the Problem Actually Gets Solved
This is both a statistical question and a management question. Did the method work and lead
to a working solution, and were the data, information, and resources needed to solve the
problem available? Part of this PSE component relates to buy-in from management and technical
personnel and to their commitment to meeting the challenge the problem poses. Such buy-in is
achieved by getting the relevant stakeholders to play an active role in specifying the problem
and interpreting the results. A high value of P{S} implies that proper planning and effective
execution have been carried out.