Page 87 - The Real Work Of Data Science Turning Data Into Information, Better Decisions, And Stronger Organizations by Ron S. Kenett, Thomas C. Redman (z-lib.org)


shallow, because they focus only on immediately available data, and they lack insight, because
management does not give data scientists an opportunity to reflect on their findings.
Inspection provides a way out of firefighting. To prevent problems from reaching the
customer, companies inspect every product and activity. Inspection data – for example, from
the IoT, such as usage tracking and in‐line process control applications – helps determine the
quality of a product or service. When collected over time, inspection data provides a rearview
mirror perspective.
Organizations at the inspection maturity level have much data to analyze. Business intelligence
platforms, such as Power BI, Tableau, and the pivoting capabilities in Excel, allow data
scientists to visualize this data from various perspectives and in various slices. The typical
report in such organizations is based on dashboards that present descriptive statistics in bar
charts and pie charts.
Just as it is impossible to drive a car by looking in the mirror, so too it is difficult to run an
organization with historical data only. Drivers need to look ahead, through the windshield, and
organizations require a similar look‐forward capability. The science of good prediction took a
big leap forward with the control chart, invented by Walter Shewhart at Bell Laboratories in
1924. Shewhart explicitly embraced variation in his formulation. Critically, the chart triggers
an alarm when a process is out of control, provides a platform for improvement, and often
helps identify where to improve.
Imagine you are in 1924 and you work for a company developing, producing, installing,
and maintaining the telephone system in America. Your boss asks you to give the factory
floor a tool for managing the telephone assembly line. Instead of relying on mass
inspection, you are to develop a tool that helps control the production process. In
this context, Walter Shewhart wrote to his boss on May 16, 1924: “The attached form of report
is designed to indicate whether or not the observed variations in the percent of defective
apparatus of a given type are significant; that is, to indicate whether or not the product is
satisfactory.”
Shewhart did not stop there: “The theory underlying the method of determining the significance
of the variations in the value of p is somewhat involved when considered in such a
form as to cover practically all types of problems.” The control chart proposed by Shewhart
extends to many other domains (Shewhart 1926; Kenett et al. 2014). It is a great example of
the contributions data scientists and statisticians can make. The control chart addresses a real
problem, provides a practical tool for operators and managers, and is theoretically sound.
Figure 16.1 is an example from a modern web‐based system.
In this spirit, in 2016, at a major semiconductor company, data scientists integrated data
from the wafer production line with testing data to determine how much additional testing was
required. Thus, chips whose supporting data indicate they are more likely to be defective are
tested in greater depth, and those deemed less likely to be defective undergo less scrutiny.
It’s a “win‐win‐win,” saving time, improving quality, and contributing to the bottom line. This
reflects 90 years of advances in industrial statistics.
Organizations with such a process view require data collection, data analysis, and data presentation
that include predictive analytics and online monitoring. This requires a data scientist
with significantly greater capabilities. The control chart is based on the concept of a statistical
distribution representing the performance of a stable process. It indicates when the underlying
distribution has changed and the process has gone “out of control.” The data scientist, in such
an environment, works with probability distributions in the background.
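
To make the idea concrete, here is a minimal sketch of a chart for the percent defective (a p chart), the quantity Shewhart's 1924 memo refers to. It is an illustration under stated assumptions, not a reconstruction of Shewhart's original form: it assumes a constant sample size, estimates the stable-process proportion from a baseline period, and uses the conventional three-sigma limits p ± 3·sqrt(p(1 − p)/n). The function names and the defect counts are hypothetical.

```python
import math


def p_chart_limits(defectives, sample_size, sigma=3.0):
    """Return (center, lcl, ucl) for a p chart with a constant sample size.

    `defectives` holds the number of defective units found in each baseline
    sample. The three-sigma default is a convention, not a requirement.
    """
    total_inspected = sample_size * len(defectives)
    p_bar = sum(defectives) / total_inspected  # center line
    spread = sigma * math.sqrt(p_bar * (1.0 - p_bar) / sample_size)
    # A proportion cannot leave [0, 1], so clip the limits.
    return p_bar, max(0.0, p_bar - spread), min(1.0, p_bar + spread)


def out_of_control(defectives, sample_size, limits):
    """Return the indices of samples whose proportion falls outside the limits."""
    _, lcl, ucl = limits
    return [
        i
        for i, count in enumerate(defectives)
        if not lcl <= count / sample_size <= ucl
    ]


# Hypothetical baseline: 10 samples of 100 units from a stable process.
baseline = [4, 6, 5, 3, 7, 5, 4, 6, 5, 5]
limits = p_chart_limits(baseline, sample_size=100)

# Hypothetical new samples monitored against the baseline limits.
new_counts = [5, 4, 14, 6]
alarms = out_of_control(new_counts, sample_size=100, limits=limits)
print(limits, alarms)
```

In this made-up example the third new sample (14 defectives in 100) falls above the upper limit and triggers an alarm, while the others are consistent with the variation expected from a stable process. In practice, sample sizes often vary and additional run rules are used; both are left out of this sketch.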
