The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman



           shallow, because they focus only on immediately available data, and they lack insight, because
           management does not give data scientists an opportunity to reflect on their findings.
             Inspection provides a way out of firefighting. To prevent problems from reaching the
           customer, companies inspect every product and activity. Inspection data – for example, from
           the IoT, such as usage tracking and in‐line process control applications – helps determine the
           quality of a product or service. When collected over time, inspection data provides a rearview
           mirror perspective.
             Organizations at the inspection maturity level have much data to analyze. Business
           intelligence tools, such as Power BI, Tableau, and the pivoting capabilities in Excel, allow
           data scientists to view this data from various perspectives and in various slices. The typical
           report in such organizations is a dashboard that presents descriptive statistics in bar charts
           and pie charts.
             Just as it is impossible to drive a car by looking only in the rearview mirror, so too it is
           difficult to run an organization with historical data only. Drivers need to look ahead, through
           the windshield, and
           organizations require a similar look‐forward capability. The science of good prediction took a
           big leap forward with the control chart, invented by Walter Shewhart at Bell Laboratories in
           1924. Shewhart explicitly embraced variation in his formulation. Critically, the chart triggers
           an alarm when a process is out of control, provides a platform for improvement, and often
           helps identify where to improve.
             Imagine it is 1924 and you work for a company that develops, produces, installs, and
           maintains the telephone system in America. Your boss asks you to give the factory floor a
           tool for managing the telephone assembly line. Instead of relying on mass inspection, you are
           to develop a tool that helps control the production process. In
           this context, Walter Shewhart wrote to his boss on May 16, 1924: “The attached form of report
           is designed to indicate whether or not the observed variations in the percent of defective
           apparatus of a given type are significant; that is, to indicate whether or not the product is
           satisfactory.”
             Shewhart did not stop there: “The theory underlying the method of determining the
           significance of the variations in the value of p is somewhat involved when considered in such a
           form as to cover practically all types of problems.” The control chart proposed by Shewhart
           extends to many other domains (Shewhart 1926; Kenett et al. 2014). It is a great example of
           the contributions data scientists and statisticians can make. The control chart addresses a real
           problem, provides a practical tool for operators and managers, and is theoretically sound.
           Figure 16.1 is an example from a modern web‐based system.
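
To make the idea concrete, here is a minimal sketch of a chart for the fraction defective p, the quantity Shewhart's memo refers to. It uses the textbook three-sigma limits for a proportion and is not a reconstruction of Shewhart's original form. The function names and the sample data are hypothetical, chosen only for illustration.

```python
import math


def p_chart_limits(defectives, sample_sizes):
    """Return the center line p_bar and per-sample (lcl, ucl) limits.

    Uses the usual three-sigma limits for a proportion:
    p_bar +/- 3 * sqrt(p_bar * (1 - p_bar) / n), clipped to [0, 1].
    """
    p_bar = sum(defectives) / sum(sample_sizes)
    limits = []
    for n in sample_sizes:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - 3 * sigma)
        ucl = min(1.0, p_bar + 3 * sigma)
        limits.append((lcl, ucl))
    return p_bar, limits


def out_of_control(defectives, sample_sizes):
    """Return the indices of samples whose fraction defective is outside the limits."""
    _, limits = p_chart_limits(defectives, sample_sizes)
    flagged = []
    for i, (d, n) in enumerate(zip(defectives, sample_sizes)):
        lcl, ucl = limits[i]
        if not lcl <= d / n <= ucl:
            flagged.append(i)
    return flagged


# Hypothetical data: defective units found in ten samples of 100 units each.
sample_defectives = [5, 4, 6, 5, 20, 5, 4, 6, 5, 5]
sample_sizes = [100] * 10
flagged_samples = out_of_control(sample_defectives, sample_sizes)
```

With this made-up data the fifth sample (index 4) is the only one flagged. In practice the limits would usually be estimated from a period when the process was known to be stable, not from the same data being judged.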
             In this spirit, in 2016, at a major semiconductor company, data scientists integrated data
           from the wafer production line with testing data to determine how much additional testing was
           required. Thus, chips whose supporting data indicate they are more likely to be defective are
           tested in greater depth, and those deemed less likely to be defective undergo less scrutiny.
           It’s a “win‐win‐win,” saving time, improving quality, and contributing to the bottom line. This
           reflects 90 years of advances in industrial statistics.
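
The routing logic described above can be sketched in a few lines. This is an illustration only: the source does not say how the company estimated defect risk or chose its test levels, so the function name, the probability thresholds, and the three depth labels are all assumptions.

```python
def assign_test_depth(defect_probability, low=0.01, high=0.10):
    """Map an estimated defect probability to a test depth.

    The thresholds `low` and `high` and the depth labels are hypothetical;
    a real system would set them from cost and quality targets.
    """
    if not 0.0 <= defect_probability <= 1.0:
        raise ValueError("defect_probability must be between 0 and 1")
    if defect_probability >= high:
        return "extended"   # higher risk: test in greater depth
    if defect_probability >= low:
        return "standard"
    return "reduced"        # lower risk: less scrutiny


# Hypothetical risk estimates for three chips, e.g. from a model fitted
# to wafer production-line data.
depths = [assign_test_depth(p) for p in (0.20, 0.05, 0.001)]
```

The defect probability would come from a predictive model that combines production-line and testing data; that model is outside the scope of this sketch.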
             Organizations with such a process view require data collection, data analysis, and data
           presentation that include predictive analytics and online monitoring. This requires a data
           scientist with significantly greater capabilities than the inspection level demands. The control
           chart is based on the concept of a statistical
           distribution representing the performance of a stable process. It indicates when the underlying
           distribution has changed and the process has gone “out of control.” The data scientist, in such
           an environment, works with probability distributions in the background.
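
As a rough illustration of working with the distribution of a stable process, the sketch below estimates control limits from a baseline period and flags later observations that fall outside them. It assumes an individuals chart, with sigma estimated from the average moving range divided by 1.128, the standard constant for ranges of two consecutive points. The function names and measurements are hypothetical.

```python
from statistics import mean


def individuals_limits(baseline):
    """Estimate (lcl, center, ucl) from measurements of a stable process."""
    center = mean(baseline)
    # Moving ranges: absolute differences between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / 1.128
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def signals(baseline, new_points):
    """Return the new observations that fall outside the baseline limits."""
    lcl, _, ucl = individuals_limits(baseline)
    return [x for x in new_points if x < lcl or x > ucl]


# Hypothetical measurements from a period when the process was stable.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 10.0]
alarms = signals(baseline, [10.1, 9.6, 11.2, 10.4])
```

Here only the value 11.2 lies beyond the estimated limits, which suggests the underlying distribution may have shifted. A single point beyond three sigma is just one of several signal rules used in practice.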