
Teach, Teach, Teach


             The Starter Kit of Questions to Ask Data Scientists

             Quite naturally, decision‐makers do not fully trust an analysis, its results, or its full implications when their understanding of data science is weak. Many learn by asking tough,
             penetrating questions but, for data science, most simply do not know where to start. You can
             help them by providing them this eight‐question “starter kit” (Redman and Sweeney 2013a).
             These questions will also help you be better prepared!

             1.  What problem are you trying to solve? Does it align with my own? It is far too easy for data
               scientists (and others for that matter) to go on extended “fishing expeditions,” seeking
               “interesting insights” that are not tethered to the business. While a certain amount of explo-
                ration is healthy, most innovation is of the small‐scale, one‐improvement‐at‐a‐time variety, even with data. Encourage your data scientists to focus initially on known issues
               and opportunities as well as more tangible insights. As your confidence in them (or at least
               a few individuals) grows, give them freer rein. At the same time, you should develop a keen
               eye for the difference between “exploring a difficult path” and “wallowing around.”
             2.  Do you have a deep understanding of what the data really means? We discussed the nuance
               and subtleties in data quality in Chapter 6. Unfortunately, too often people gather data
               without a complete understanding of the wider context in which the data was created, and
               misunderstandings find ways to hide themselves until it is too late. All data, even well‐
                known quantities like “force,” are subtle and nuanced. NASA (which truly has “rocket scientists”) lost the Mars Climate Orbiter because one team used the English unit “pounds of force” while another used the metric unit “newtons” (Pollack 1999). The potential for such problems only grows as the data becomes less familiar (social media, the IoT, automatic measurement devices, etc.) and as more intermediaries touch the data.
             3.  Should we trust the data? As also discussed in Chapter 6, untrustworthy, inaccurate data is
               the norm. Just as a car can be no better than its parts, so, too, analyses can be no better than
                the data. Some data is inherently inaccurate (GDP forecasts); other data becomes inaccurate through processing errors (Barrett 2003). All too often, data collection is just not up to
               snuff. For example, far too many credit reports contain inaccuracies (Bernard 2011). Unless
                there is a solid quality program in place, expect the data to be poor! Demand that data scientists explain how they’ve identified and dealt with data issues, and that they be fully transparent about whether the data used in their analyses really is “good enough.”
             4.  How did the analytic work go? Some analyses proceed quickly and easily: there are few integration issues; it is obvious which few analytic techniques are best, and they yield similar results; good graphics seem to suggest themselves; and further uses of the results come easily to mind. Other times, everything about the work is an enormous chore: the data scientist had to make too many choices about data resolution, integration took longer
               than expected, and so forth. Demand that data scientists be fully transparent about their work,
               their level of confidence, and their intuitions about implications beyond the stated goal.
             5.  Are there “big factors,” preconceived notions, hidden assumptions, or conflicting data that
                could compromise your analyses? There is much going on here. First, it’s natural to expect a return on your investment in data and analytics, but there’s a sneaky side effect: people will “find” what they think you want. Saying upfront that you expect a 10% uptick in
               revenue can cause people to find a short‐term 10% growth that’s not there in the long term
               or to be so busy looking for the 10% that they miss a potential 100% gain.
                 Second, advanced analytics involves considerable judgment. Data scientists may have
               included some data sets in their analyses and excluded others. This affects the structure of