The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman

52                                                  The Real Work of Data Science


           3.  anchoring
           4.  representativeness
           5.  availability
           6.  regression toward the mean
           7.  spurious correlation
           8.  framing.

             To illustrate, consider the data from two experiments (Tversky and Kahneman 1981). The
           value of N listed in parentheses is the number of respondents in each experiment; respondents
           were randomly assigned to Problem 1 or Problem 2.

           Problem 1
           (N = 152): Imagine that the United States is preparing for the outbreak of an unusual disease,
           which is expected to kill 600 people. Two alternative programs to combat the disease have
           been proposed. Assume that the exact scientific estimates of the consequences of the programs
           are as follows:

              • If Program A is adopted, 200 people will be saved (72% made this selection).
              • If Program B is adopted, the probability that all 600 people will be saved is one‐third and
             the probability that no one will be saved is two‐thirds (28% made this one).

           Which of the two programs would you favor?
           Now consider an alternative formulation:

           Problem 2
           (N = 155): Imagine that the United States is preparing for the outbreak of an unusual disease,
           which is expected to kill 600 people. Two alternative programs to combat the disease have
           been proposed. Assume that the exact scientific estimates of the consequences of the programs
           are as follows:

              • If Program C is adopted, 400 people will die (22%).
              • If Program D is adopted, the probability that no one will die is one‐third, and the probability
             that 600 people will die is two‐thirds (78%).

           The two problems describe exactly the same situation. Program A is identical to Program C,
           Program B is identical to Program D, and every program carries an expected toll of 400 deaths.
           Only the wording differs, yet the preferences reverse. The majority choice in Problem 2 shows a
           willingness to take a risk: the certain death of 400 people is less acceptable than the two‐in‐three
           chance that 600 will die. The preferences in Problems 1 and 2 illustrate a common pattern: choices
           involving possible benefits more often elicit risk‐averse decisions, and those involving possible
           losses more often elicit risk‐taking decisions.
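The equivalence of the two problems is simple arithmetic. The following minimal Python sketch is our own illustration, not part of the original study. It assumes each program can be written as a list of (probability, deaths) outcomes, converts the "lives saved" wording of Problem 1 into deaths out of 600, and checks two things: that all four programs have the same expected number of deaths, and that A and C (and B and D) describe identical outcomes. The names programs, expected_deaths, same_distribution, and certain_option_share are hypothetical. The choice shares are the percentages reported in the text.

```python
from fractions import Fraction

TOTAL_AT_RISK = 600  # deaths expected if no program is adopted

# Hypothetical encoding: each program is a list of (probability, deaths) outcomes.
# Problem 1 is worded in lives saved, so deaths = TOTAL_AT_RISK - saved.
programs = {
    "A": [(Fraction(1), TOTAL_AT_RISK - 200)],      # 200 saved for certain
    "B": [(Fraction(1, 3), TOTAL_AT_RISK - 600),    # all 600 saved w.p. 1/3
          (Fraction(2, 3), TOTAL_AT_RISK - 0)],     # no one saved w.p. 2/3
    "C": [(Fraction(1), 400)],                      # 400 die for certain
    "D": [(Fraction(1, 3), 0),                      # no one dies w.p. 1/3
          (Fraction(2, 3), 600)],                   # 600 die w.p. 2/3
}

# Share of respondents choosing the certain option in each frame (from the text).
certain_option_share = {"gain frame (A)": 0.72, "loss frame (C)": 0.22}


def expected_deaths(outcomes):
    """Probability-weighted average number of deaths, kept exact with Fraction."""
    return sum(p * deaths for p, deaths in outcomes)


def same_distribution(x, y):
    """True if two programs assign the same probabilities to the same death counts."""
    return sorted(x) == sorted(y)


for name, outcomes in programs.items():
    print(f"Program {name}: expected deaths = {expected_deaths(outcomes)}")

print("A same as C:", same_distribution(programs["A"], programs["C"]))
print("B same as D:", same_distribution(programs["B"], programs["D"]))
for frame, share in certain_option_share.items():
    print(f"Chose the certain option, {frame}: {share:.0%}")
```

Running the sketch reports 400 expected deaths for every program and confirms both pairings. Even so, the share of respondents choosing the certain option falls from 72% to 22% when that same option is described in terms of deaths rather than lives saved. Exact Fraction arithmetic keeps the one‐third and two‐thirds probabilities free of floating‐point rounding.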
             The implications for data scientists are profound: small changes in the way you present
           findings can have enormous consequences. Be conscious of your own biases, and make sure
           they do not intrude.
             Building on this theme, Figure 11.1 is based on the famous Müller-Lyer optical illusion. On
           the left, the lower horizontal line seems longer. On the right, with a frame, we clearly see that
           the lines are equal. Just so, data scientists should make sure they frame their results fairly.