The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman



                 Validation sampling
                     1. A subset of quarterly discharge medical records, originally abstracted by the
                        primary data collection staff, for a given measure should be sampled for
                        reabstraction by a second staff responsible for data validation.
                          •  Approximately 5% of the abstracted records should be targeted for
                             reabstraction for a given measure in a given quarter.
                          •  The minimum quarterly sampling requirement for reabstraction is nine
                             sampled cases per measure.
                          •  If the originally abstracted quarterly medical record size is less than 180
                             cases, then the minimum sample requirement for reabstraction would
                             be nine cases.
                                 Figure 6.2  JCI data validation guidelines.
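
The sampling rule in Figure 6.2 reduces to a small calculation: take roughly 5% of the abstracted records, but never fewer than nine. The sketch below is one possible reading of that rule, not JCI's own procedure. The function names are hypothetical, rounding the 5% target up is an assumption (the guideline says only "approximately 5%"), and so is capping the sample at the number of records available when fewer than nine exist.

```python
import random

TARGET_PERCENT = 5   # "approximately 5%" of abstracted records (Figure 6.2)
MINIMUM_SAMPLE = 9   # minimum sampled cases per measure per quarter (Figure 6.2)


def reabstraction_sample_size(n_records: int) -> int:
    """Number of records to reabstract for one measure in one quarter."""
    if n_records < 0:
        raise ValueError("n_records must be non-negative")
    # Assumption: round the 5% target up (integer arithmetic avoids float error).
    target = (n_records * TARGET_PERCENT + 99) // 100
    # Assumption: a sample cannot exceed the number of records that exist.
    return min(n_records, max(MINIMUM_SAMPLE, target))


def draw_reabstraction_sample(record_ids, seed=None):
    """Draw a simple random sample of record IDs for the second abstractor."""
    ids = list(record_ids)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(ids, reabstraction_sample_size(len(ids)))
```

With 180 records, the 5% target and the nine-case minimum coincide, which is why the guideline names 180 as the threshold. Below that, the minimum governs; above it, the 5% target does.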


             This admonition applies to entire industries as well. Here, standards can help. For example,
           the Joint Commission International (JCI), which accredits hospital procedures all over the
           world, has devised guidelines to ensure data quality. Some involve definitions, such as what
           qualifies as an “infection” in counts of infection. Others involve controls, such as a data
           validation step whereby two independent and qualified people retrieve data from a hospital’s
           system and compare results. Figure 6.2 is an extract from the data validation guidelines
           (Joint Commission International 2018).
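
The comparison step in such a control can be sketched as follows. This is a minimal illustration rather than part of the JCI guidelines. It assumes each abstractor's results are held as a mapping from a record ID to the value abstracted for one measure, and the function name and data layout are hypothetical. The excerpt above does not state what level of agreement counts as acceptable, so the sketch reports the rate and the disagreeing records without applying a threshold.

```python
def agreement_rate(original: dict, reabstracted: dict):
    """Compare two abstractors' values on the records both have abstracted.

    Returns (share of records that agree, sorted IDs of records that disagree).
    """
    shared_ids = original.keys() & reabstracted.keys()
    if not shared_ids:
        raise ValueError("the two abstractions share no records")
    mismatches = sorted(
        rid for rid in shared_ids if original[rid] != reabstracted[rid]
    )
    rate = 1 - len(mismatches) / len(shared_ids)
    return rate, mismatches


# Hypothetical data: whether each record counts as an "infection".
first_pass = {"r1": True, "r2": False, "r3": True, "r4": False}
second_pass = {"r1": True, "r2": True, "r3": True, "r4": False}
rate, disputed = agreement_rate(first_pass, second_pass)
```

The list of disputed records is usually more useful than the rate itself, since each disagreement points to a definition or a collection step that the two abstractors interpreted differently.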

           Implications

           Data quality is probably the toughest issue data scientists face. Worse, it impacts your entire
           organization. Thus, the real work of data scientists involves stepping up to the near‐term issues
           and addressing them in a coordinated, professional manner. And the real work of CAOs
           involves clarifying the larger issues for the rest of the company and helping start programs that
           get to the root causes of these issues.