The Real Work of Data Science: Turning Data Into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman
Validation sampling
1. A subset of quarterly discharge medical records, originally abstracted by the
primary data collection staff, for a given measure should be sampled for
reabstraction by a second staff responsible for data validation.
• Approximately 5% of the abstracted records should be targeted for
reabstraction for a given measure in a given quarter.
• The minimum quarterly sampling requirement for reabstraction is nine
sampled cases per measure.
• If the originally abstracted quarterly medical record size is less than 180
cases, then the minimum sample requirement for reabstraction would
be nine cases.
Figure 6.2 JCI data validation guidelines.
This admonition applies to entire industries as well. Here, standards can help. For example,
the Joint Commission International (JCI), which accredits hospital procedures all over the
world, has devised guidelines to ensure data quality. Some involve definitions, such as what
qualifies as an “infection” when infections are counted. Others involve controls, such as a data
validation step whereby two independent and qualified people retrieve data from a hospital’s
system and compare results. Figure 6.2 is an extract from the data validation guidelines
(Joint Commission International 2018).
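The sampling rule in Figure 6.2 reduces to simple arithmetic, so a short sketch may make it concrete. The Python below is an illustration only and is not part of the JCI guidelines. The function names reabstraction_sample_size and draw_validation_sample are hypothetical. Rounding the 5% target up is an assumption (the guideline says only "approximately 5%"), as is capping the sample at the number of records available when fewer than nine were abstracted.

```python
import random


def reabstraction_sample_size(n_records, percent=5, minimum=9):
    """Number of records to reabstract for one measure in one quarter.

    Sketch of the Figure 6.2 rule: target about 5% of the abstracted
    records, but never fewer than nine cases.
    Assumptions (not stated in the guideline): the 5% target is rounded
    up, and the sample cannot exceed the records that exist.
    """
    if n_records < 0:
        raise ValueError("n_records must be non-negative")
    # Integer ceiling of n_records * percent / 100, avoiding float rounding.
    target = (n_records * percent + 99) // 100
    return min(n_records, max(minimum, target))


def draw_validation_sample(record_ids, seed=None):
    """Draw a simple random sample of record IDs for the second abstractor.

    Simple random sampling without replacement is an assumption; the
    extract in Figure 6.2 does not prescribe a selection method.
    """
    ids = list(record_ids)
    k = reabstraction_sample_size(len(ids))
    rng = random.Random(seed)  # seed only to make the draw repeatable
    return rng.sample(ids, k)


if __name__ == "__main__":
    for n in (5, 100, 180, 181, 1000):
        print(n, reabstraction_sample_size(n))
```

For a quarter with 180 abstracted records, the 5% target and the nine-case minimum coincide, which is consistent with the 180-case threshold in the third bullet of Figure 6.2. Below that size the minimum governs; above it the 5% target does.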
Implications
Data quality is probably the toughest issue data scientists face. Worse, it affects your entire
organization. Thus, the real work of data scientists involves stepping up to the near-term issues
and addressing them in a coordinated, professional manner. And the real work of CAOs
involves clarifying the larger issues for the rest of the company and helping start programs that
get to the root causes of these issues.