Page 40 - Clinical Small Animal Internal Medicine

8  Section 1  Evaluation and Management of the Patient

    concordant. The latter can be influenced by how different pathologists measure, count, and ultimately determine MI, and whether such determinations are so precise as to be perfectly replicable by others. Such studies are then relegated to determining average effects across potentially heterogeneous patient populations even within a community, much less between countries. Thus, patient characteristics such as age, breed, sex, and owner propensity for diagnosis (including biopsy) would be expected to vary, perhaps substantially, between institutions such as ours and Elston et al.’s [5].

These illustrations should presumably compel veterinarians to be circumspect about relying too heavily on any single study, particularly one conducted in a geographically restricted region, and for authors to exercise restraint in advocating wider than justified implications for a single study. Although the heterogeneity existing across populations may have a negligible impact on the universality of study findings, a cautious approach to generalizability is often warranted.

Internal Validity

Unlike external validity, which relies on information not usually obtained in a study to make inferences beyond the bounds of a study population, internal validity depends on the relative presence or absence of two sources of error pervasive to all clinical studies: imprecision (random error) and invalidity (systematic error).

Imprecision has multiple formal statistical definitions and interpretations, but can best be understood as how variable the results are expected to be from multiple studies measuring the same quantity. As an example, if a veterinarian were to repeatedly weigh a patient using a scale, precision would imply obtaining similar weights each time; conversely, imprecision implies considerable disagreement in weights with each attempt. Standard deviations, standard errors, variances, and ranges are all statistical measures for quantifying imprecision. We can make two broad qualitative statements about variability without resorting to mathematical formulas: (1) the further apart observations are from each other, the more imprecise any summary statistic based on them will be; and (2) the larger a study is, the more precise any statistic derived from it will be. This has an intuitive appeal: the findings from a large study are more likely to be accepted as more precise than those from a small study. Provided there are no biases, the source of statistical variability in a study is typically ascribed to random error.

It would be a grievous error, however, to accept a study’s outcomes solely on the basis of high precision. Invalidity, another source of error in studies and equivalently called bias, is a distinctly different statistical measure, and represents the disparity between what one empirically measures in a study and what one strives to measure (namely, factual truth). Returning to the previous example, if a hospital scale is miscalibrated, then regardless of how precise or imprecise the replicate measurements are, the average weight will invariably be incorrect (biased), with the degree of the bias proportional to the degree of miscalibration.

All medical research is potentially susceptible to imprecision and bias, and it is only through circumspect study design that these errors can be prevented or controlled. Such errors exist on a quantitative continuum (parenthetically often reduced to crude descriptors, such as “very” imprecise or “small” bias), so it becomes important to recognize that all studies must be judged by both error criteria and not just one (or neither). Such lack of circumspection is pervasive, even in academic settings, and helps explain the contradictory results one so often reads about in the popular media [6]. It also underscores the often unappreciated theme that even very large‐sample studies are susceptible to bias, and despite their exceedingly high precision can conceivably be as misleading as a much smaller study.

Hypothesis Testing

Perhaps the most obvious manifestation of the use of statistical analysis in clinical research is through much misunderstood hypothesis testing. Although investigators and clinicians typically want to know if the differences between study groups are real, or if the association between risk factors and health outcomes is real, hypothesis tests are unable to answer these questions. Moreover, the P‐values so ubiquitously reported in clinical articles fail to inform readers about the probabilities that measured differences or associations are real. In reality, their interpretation is surprisingly counterintuitive.

Consider the example of comparing the average (mean) alanine aminotransferase (ALT) levels between young dogs and geriatric dogs. The average in young dogs is 40 U/L, and in geriatric dogs is 80 U/L. The null hypothesis is that the averages in the two groups are equal; the two‐sided (meaning that either group could have a higher/lower average than the other) alternative hypothesis is that they are unequal. A two‐group Student’s t‐test is performed, and a P‐value of 0.10 is calculated. If the level of significance (α) is 0.05, then the difference is not statistically significant. What can one conclude from this?
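
As a rough illustration of the arithmetic behind the two‐group Student’s t‐test in the ALT example, the following Python sketch computes the test statistic and a two‐sided P‐value from summary statistics. Only the two group means (40 and 80) come from the example; the group sizes (8 dogs each) and the common standard deviation (46) are purely hypothetical values chosen so that the P‐value lands near 0.10. The function names are likewise illustrative, and the P‐value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistics library.

```python
import math


def student_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance (Student's) two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=20000):
    """Two-sided P-value: 1 minus the area under the density between -|t| and +|t|.

    The area from 0 to |t| is computed with Simpson's rule (steps must be even)
    and doubled, because the t density is symmetric about zero.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total
    return max(0.0, 1 - central_area)


# Means are taken from the example; SDs and group sizes are ASSUMED for illustration.
young_mean, geriatric_mean = 40.0, 80.0
assumed_sd, assumed_n = 46.0, 8

t_stat, df = student_t_statistic(young_mean, assumed_sd, assumed_n,
                                 geriatric_mean, assumed_sd, assumed_n)
p_value = two_sided_p(t_stat, df)
alpha = 0.05

print(f"t = {t_stat:.3f}, df = {df}, two-sided P = {p_value:.3f}")
print("statistically significant" if p_value < alpha else "not statistically significant")
```

Under these assumed inputs the P‐value exceeds α = 0.05, matching the "not statistically significant" verdict in the example. Different assumed standard deviations or group sizes would give a different P‐value for the same two means, so the sketch shows only how the calculation proceeds, not how the P‐value in the example was actually obtained.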