The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman
[Figure: diagram. Labels: laws of nature, mechanistic models, hard data, statistical generalization, predictive analytics, domain generalization, transportability, soft data, intuition. Decisions about: the population from which the data were drawn, the future, a related population.]
Figure 8.1 Modes of generalization.
Modes of Generalization
The best, most reliable form of generalization involves the laws of nature. These include
conservation of mass, conservation of energy, conservation of momentum, Newton’s laws, the
principle of least action, the laws of thermodynamics, and Maxwell’s equations. Sometimes
these are called “mechanistic models of modes of action.” These laws started as empirical
laws that were embraced as laws of nature and have stood the test of time. They have been
verified time and again, and today, we do not need further data to invoke them, only knowledge
of physics, chemistry, biology, or other scientific disciplines.
Mathematics, the queen of the sciences, offers a unique context. Paul Erdős, the famous
mathematician, used to talk about The Book, in which God maintains the perfect proofs of
mathematical theorems (Aigner and Ziegler 2000). The laws of nature build on The Book.
Now consider statistical generalizability. Sorting it out requires deep understanding of the
goals (Chapter 4). In making inference about a population parameter from a sample, statistical
generalizability and sampling bias are the focus, and the question of interest is, “What
population does the sample represent?” (Rao 1985). In contrast, for predicting the values of
new observations, the question is whether the analysis captures associations in the training
data (i.e. the data used in model building) that generalize to the to‐be‐predicted situations.
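The prediction question can be made concrete with a small sketch. The example below is an illustration, not a method from the text: the data values, the helper names `fit_line` and `mean_abs_error`, and the choice of a straight-line model are all assumptions. It fits a line to training data and then compares the prediction error on new observations from the same regime with the error on observations from a regime where the underlying relationship has shifted.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mean_abs_error(model, xs, ys):
    """Average absolute gap between predictions and observed values."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))

# Hypothetical training data: roughly y = 2x + 1 with small noise.
train_x = [1, 2, 3, 4, 5]
train_y = [3.1, 4.9, 7.0, 9.1, 10.9]
model = fit_line(train_x, train_y)

# New observations from the same regime as the training data.
similar_x, similar_y = [6, 7], [13.0, 15.1]
# New observations from a shifted regime (assumed: roughly y = 2x + 5).
shifted_x, shifted_y = [6, 7], [17.0, 19.1]

similar_error = mean_abs_error(model, similar_x, similar_y)
shifted_error = mean_abs_error(model, shifted_x, shifted_y)
print(f"error, same regime:    {similar_error:.2f}")
print(f"error, shifted regime: {shifted_error:.2f}")
```

In this sketch, the association learned from the training data carries over to the first set of new observations but not to the second, which is the sense in which a model may or may not generalize to the to-be-predicted situations.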
Control charts present a good example. The logic goes like this: “Assuming the process
remains stable, we expect performance to vary within the upper and lower control limits. We
further expect average performance to be close to the center line.”
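The control chart logic can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' procedure: it uses an individuals chart, estimates the process standard deviation from the average moving range divided by the standard constant 1.128, and places the limits three standard deviations from the center line. The function names and the data values are hypothetical.

```python
from statistics import mean

def individuals_control_limits(values):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 1.128 is the d2 constant for ranges of two observations.
    sigma_hat = mean(moving_ranges) / 1.128
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat

def out_of_control(values, lcl, ucl):
    """Indices of observations falling outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical baseline collected while the process was judged stable.
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 10.1, 9.9]
lcl, center, ucl = individuals_control_limits(baseline)

# New observations are judged against limits derived from the baseline.
new_points = [10.0, 10.2, 11.5, 9.9]
flagged = out_of_control(new_points, lcl, ucl)
print(f"limits: {lcl:.2f} to {ucl:.2f}, center line {center:.2f}")
print("flagged indices:", flagged)
```

The generalization is explicit here: limits computed from past data are applied to future observations, and that step is justified only so long as the process remains stable.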