The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman
Teach, Teach, Teach
The Starter Kit of Questions to Ask Data Scientists
Quite naturally, decision‐makers do not fully trust an analysis, its results, or its full implications
when their understanding of data science is weak. Many learn by asking tough,
penetrating questions but, for data science, most simply do not know where to start. You can
help them by providing them with this eight‐question “starter kit” (Redman and Sweeney 2013a).
These questions will also help you be better prepared!
1. What problem are you trying to solve? Does it align with my own? It is far too easy for data
scientists (and others for that matter) to go on extended “fishing expeditions,” seeking
“interesting insights” that are not tethered to the business. While a certain amount of explo-
ration is healthy, most innovation is of the small‐scale, one improvement at a time
variety – even with data. Encourage your data scientists to focus initially on known issues
and opportunities as well as more tangible insights. As your confidence in them (or at least
a few individuals) grows, give them freer rein. At the same time, you should develop a keen
eye for the difference between “exploring a difficult path” and “wallowing around.”
2. Do you have a deep understanding of what the data really means? We discussed the nuances
and subtleties in data quality in Chapter 6. Unfortunately, too often people gather data
without a complete understanding of the wider context in which the data was created, and
misunderstandings find ways to hide themselves until it is too late. All data, even well‐known
quantities like “force,” are subtle and nuanced. NASA (which truly has “rocket
scientists”) lost a Mars orbiter because one team used the English unit of force,
“pound‐force,” and another used the metric unit, “newtons” (Pollack 1999). The potential
for such problems only grows as the data becomes less familiar (especially social media,
the IoT, automatic measurement devices, etc.) and as more intermediaries touch the data.
3. Should we trust the data? As also discussed in Chapter 6, untrustworthy, inaccurate data is
the norm. Just as a car can be no better than its parts, so, too, analyses can be no better than
the data. Some data is inherently inaccurate (GDP forecasts); other data becomes inaccu-
rate through processing errors (Barrett 2003). All too often, data collection is just not up to
snuff. For example, far too many credit reports contain inaccuracies (Bernard 2011). Unless
there is a solid quality program in place, expect the data to be poor! Demand that data
scientists explain how they’ve identified and dealt with the issues and are fully transparent
about whether the data used in analyses really is “good enough.”
4. How did the analytic work go? Some analyses proceed quickly and easily – there are
few integration issues; it is obvious what the few best analytic techniques are, and
they yield similar results; good graphics seem to suggest themselves; and further uses of the
results come easily to mind. Other times, everything about the work is an enormous chore – the
data scientist had to make too many choices about the data resolution, integration took longer
than expected, and so forth. Demand that data scientists be fully transparent about their work,
their level of confidence, and their intuitions about implications beyond the stated goal.
5. Are there “big factors,” preconceived notions, hidden assumptions, or conflicting data that
could compromise your analyses? There is much going on here. First, it’s natural to expect
a return from our investment in data and analytics, but there’s a sneaky side effect. People
will “find” what they think you want. Saying upfront that you expect a 10% uptick in
revenue can cause people to find a short‐term 10% growth that’s not there in the long term
or to be so busy looking for the 10% that they miss a potential 100% gain.
Second, advanced analytics involves considerable judgment. Data scientists may have
included some data sets in their analyses and excluded others. This affects the structure of