Page 64 - The Real Work Of Data Science Turning Data Into Information, Better Decisions, And Stronger Organizations by Ron S. Kenett, Thomas C. Redman (z-lib.org)_Neat
P. 64
52 The Real Work of Data Science
3. anchoring
4. representativeness
5. availability
6. regression toward the mean
7. spurious correlation
8. framing.
To illustrate, consider the data from two experiments (Tversky and Kahneman 1981). The
value of N listed in brackets represents the number of respondents in these experiments, who
were randomly assigned to Problem 1 or Problem 2.
Problem 1
(N = 152): Imagine that the United States is preparing for the outbreak of an unusual disease,
which is expected to kill 600 people. Two alternative programs to combat the disease have
been proposed. Assume that the exact scientific estimates of the consequences of the programs
are as follows:
• If Program A is adopted, 200 people will be saved (72% made this selection).
• If Program B is adopted, the probability that all 600 people will be saved is one‐third and
the probability that no one will be saved is two‐thirds (28% made this one).
Which of the two programs would you favor?
Now consider an alternative formulation:
Problem 2
(N = 155): Imagine that the United States is preparing for the outbreak of an unusual disease,
which is expected to kill 600 people. Two alternative programs to combat the disease have
been proposed. Assume that the exact scientific estimates of the consequences of the programs
are as follows:
• If Program C is adopted, 400 people will die (22%).
• If Program D is adopted, the probability that no one will die is one‐third, and the probability
that 600 people will die is two‐thirds (78%).
The expected number of deaths is the same in both problems, but people ignore this. The majority
choice in Problem 2 shows a willingness to take a risk: the certain death of 400 people is less
acceptable than the two‐in‐three chance that 600 will die. The preferences in Problems 1 and 2
illustrate a common pattern: choices involving possible benefits more often elicit risk‐averse
decisions, and those involving possible losses more often elicit risk‐taking decisions.
The implications for data scientists are profound – small changes in the ways you present
findings can have enormous consequences. Be conscious of your own biases, and make sure
they do not intrude.
Building on this theme, Figure 11.1 is based on a famous Muller–Lyer optical illusion. On
the left, the lower horizontal line seems longer. On the right, with a frame, we clearly see that
the lines are equal. Just so, data scientists should make sure they frame their results fairly.