The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman
     The Difference Between a Good Data Scientist and a Great One
               It rarely works out that way. As Jeff Hooper, of Bell Labs, liked to say, “Data do not give
               up their secrets easily. They must be tortured to confess.”
                 This is a really big deal. Even under the best of circumstances, too much data is poorly
               defined and simply wrong, and most turns out to be irrelevant to the problem at hand.
               Sifting through this noisy data is arduous, frustrating work. Even good data scientists may
               move on to the next problem. Great data scientists stick with it.
                 Great data scientists also persist in making themselves heard. Dealing with a
               recalcitrant bureaucracy can be even more frustrating than dealing with noisy data. Continuing
               the vignette from above, the intern spent his summer defending his discovery. Whichever
               group made the error took great offense, even attacking him personally. Others reacted
               with glee as they celebrated the ignorance of their peers. And he was caught in the
               middle. Great data scientists know how to handle such situations, persisting through
               thick and thin.
             4.  Finally, they have raw statistical muscle. The abilities to access and analyze data using all
               the newest tools (including classic packages and newer ones such as machine learning)
               are obviously important. But these can be learned – of bigger concern is the ability to bring
               statistical rigor to bear. At the risk of oversimplifying, there are two kinds of analyses –
               descriptive and predictive. Descriptive analyses are tough enough. But the really profitable
               analyses involve prediction, which is inherently uncertain (Shmueli 2010).
                 Great data scientists embrace uncertainty. They recognize when a prediction rests on
               solid foundations and when it is merely wishful thinking. They are simply outstanding at
               describing what has to go right for the prediction to hold, what will really foul it up, and
               which unknowns keep them awake at night. When they can, they quantify the
               uncertainty, and they are good at suggesting simple experiments to confirm or deny
               assumptions, reduce uncertainty, explore the next set of questions, etc.
                 To say this in a different way, there are some who opine that, for big data, it is enough to
               understand “correlation” without getting into the complexities of “causation.” There are
               surely some problems for which this is true. But not the really important ones! Understanding
               causation leads to better predictions. The great data scientists will work to establish the
               causative links.
                 This requires them to generalize on a higher level. Focusing only on the data at hand can
               result in “overfitting,” producing models that are too complex to hold up in future use. Scientific
               generalization invokes domain‐specific knowledge, general principles, and intuition, far
               beyond cross‐validations or comparison of training‐set and hold‐out‐set results (Kenett and
               Shmueli 2016a).
                 To be clear, this ability is not “that certain quantitative knack.” It is trained,
               sophisticated, disciplined inferential horsepower, practiced and honed by both success and failure.
                 Some of this is covered in data science curricula (De Veaux et al. 2017; Coleman and
               Kenett 2017). Most is not.
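
The overfitting point in item 4 can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the book: the data-generating process (a straight line plus Gaussian noise), the sample sizes, the random seed, and all function names are assumptions chosen for the example. It compares a simple least-squares line with a one-nearest-neighbor rule that memorizes the training data, and reports the error of each on the training set and on a separate hold-out set.

```python
import random


def fit_line(xs, ys):
    """Fit an ordinary least-squares line and return it as a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbor(xs, ys):
    """Return a predictor that copies the y of the closest training x.

    This model memorizes the training data, noise included.
    """
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mean_squared_error(model, xs, ys):
    """Average squared prediction error of model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simulate(n, rng):
    """Hypothetical data: y = 2x + standard Gaussian noise, x in [0, 10]."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = simulate(200, rng)
hold_x, hold_y = simulate(200, rng)

line = fit_line(train_x, train_y)
nn = fit_nearest_neighbor(train_x, train_y)

line_train = mean_squared_error(line, train_x, train_y)
line_holdout = mean_squared_error(line, hold_x, hold_y)
nn_train = mean_squared_error(nn, train_x, train_y)
nn_holdout = mean_squared_error(nn, hold_x, hold_y)

print(f"line:             train={line_train:.2f}  hold-out={line_holdout:.2f}")
print(f"nearest neighbor: train={nn_train:.2f}  hold-out={nn_holdout:.2f}")
```

The memorizing model fits the training data perfectly yet does worse than the simple line on data it has not seen, which is the signature of overfitting. Note that this is exactly the narrow training-set versus hold-out-set comparison the text says is not sufficient on its own: it cannot tell you whether the relationship will still hold when conditions change, which is where domain knowledge and causal understanding come in.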
             Implications
             To conclude, the real work of a data scientist is to continually become more effective. You
             probably cannot teach yourself “that certain quantitative knack.” But you can work to develop
             outside interests, read extensively, build a wider, more diverse network, develop a thick skin,
             and study statistical inference. You should start doing so immediately.





