Page 40 - Clinical Small Animal Internal Medicine
8 Section 1 Evaluation and Management of the Patient
concordant. The latter can be influenced by how different pathologists measure, count, and ultimately determine MI, and whether such determinations are so precise as to be perfectly replicable by others. Such studies are then relegated to determining average effects across potentially heterogeneous patient populations even within a community, much less between countries. Thus, patient characteristics such as age, breed, sex, and owner propensity for diagnosis (including biopsy) would be expected to vary, perhaps substantially, between institutions such as ours and Elston et al.’s. [5]

These illustrations should presumably compel veterinarians to be circumspect about relying too heavily on any single study, particularly one conducted in a geographically restricted region, and compel authors to exercise restraint in advocating wider than justified implications for a single study. Although the heterogeneity existing across populations may have a negligible impact on the universality of study findings, a cautious approach to generalizability is often warranted.

Internal Validity

Unlike external validity, which relies on information not usually obtained in a study to make inferences beyond the bounds of a study population, internal validity depends on the relative presence or absence of two sources of error pervasive in all clinical studies: imprecision (random error) and invalidity (systematic error). Imprecision has multiple formal statistical definitions and interpretations, but can best be understood as how variable the results are expected to be from multiple studies measuring the same quantity. As an example, if a veterinarian were to repeatedly weigh a patient using a scale, precision would imply obtaining similar weights each time; conversely, imprecision implies considerable disagreement in weights with each attempt. Standard deviations, standard errors, variances, and ranges are all statistical measures for quantifying imprecision. We can make two broad qualitative statements about variability without resorting to mathematical formulas: (1) the further apart observations are from each other, the more imprecise any summary statistic based on them will be; and (2) the larger a study is, the more precise any statistic derived from it will be. This has an intuitive appeal: the findings from a large study are more likely to be accepted as precise than those from a small study. Provided there are no biases, the source of statistical variability in a study is typically ascribed to random error.

It would be a grievous error, however, to accept a study’s outcomes solely on the basis of high precision. Invalidity, another source of error in studies and equivalently called bias, is a distinctly different statistical measure, and represents the disparity between what one empirically measures in a study and what one strives to measure (namely, factual truth). Returning to the previous example, if a hospital scale is miscalibrated, then regardless of how precise or imprecise the replicate measurements are, the average weight will invariably be incorrect (biased), with the degree of the bias proportional to the degree of miscalibration.

All medical research is potentially susceptible to imprecision and bias, and it is only through circumspect study design that these errors can be prevented or controlled. Such errors exist on a quantitative continuum (parenthetically often reduced to crude descriptors, such as “very” imprecise or “small” bias), so it becomes important to recognize that all studies must be judged by both error criteria and not just one (or neither). Failure to do so is pervasive, even in academic settings, and helps explain the contradictory results one so often reads about in the popular media [6]. It also underscores the often unappreciated theme that even very large‐sample studies are susceptible to bias, and despite their exceedingly high precision can conceivably be as misleading as a much smaller study.

Hypothesis Testing

Perhaps the most obvious manifestation of the use of statistical analysis in clinical research is the much misunderstood practice of hypothesis testing. Although investigators and clinicians typically want to know if the differences between study groups are real, or if the association between risk factors and health outcomes is real, hypothesis tests are unable to answer these questions. Moreover, the P‐values so ubiquitously reported in clinical articles fail to inform readers about the probabilities that measured differences or associations are real. In reality, their interpretation is surprisingly counterintuitive.

Consider the example of comparing the average (mean) alanine aminotransferase (ALT) levels between young dogs and geriatric dogs. The average in young dogs is 40 U/L, and in geriatric dogs is 80 U/L. The null hypothesis is that the averages in the two groups are equal; the two‐sided (meaning that either group could have a higher/lower average than the other) alternative hypothesis is that they are unequal. A two‐group Student’s t‐test is performed, and a P‐value of 0.10 is calculated. If the level of significance (α) is 0.05, then the difference is not statistically significant. What can one conclude from this?
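
For readers who wish to see the mechanics of the test just described, the following is a minimal sketch of a two‐group (pooled‐variance) Student’s t‐test using only the Python standard library. The text gives no sample sizes or individual ALT values, so the two lists below are hypothetical numbers, chosen only so that the group means equal the 40 and 80 quoted above. The function names (`t_pdf`, `two_sided_p`, `pooled_t_test`) are likewise illustrative. The P‐value is obtained by numerically integrating the t density, an approximation used here in place of a statistics library routine, and it need not equal the 0.10 quoted in the example.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value: 1 minus the area under the density on [-|t|, |t|].

    The area on [0, |t|] is approximated with Simpson's rule (steps must be even).
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3
    return max(0.0, 1.0 - 2.0 * half_area)


def pooled_t_test(a, b):
    """Two-group Student's t-test assuming equal variances in both groups."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(b) - mean(a)) / std_err
    return t_stat, df, two_sided_p(t_stat, df)


# Hypothetical ALT values (U/L); NOT data from the text.
young = [20, 30, 40, 50, 60]        # mean 40
geriatric = [35, 50, 70, 95, 150]   # mean 80
ALPHA = 0.05                        # level of significance

t_stat, df, p_value = pooled_t_test(young, geriatric)
print(f"mean young = {mean(young)}, mean geriatric = {mean(geriatric)}")
print(f"t = {t_stat:.3f}, df = {df}, two-sided P = {p_value:.3f}")
print("statistically significant" if p_value < ALPHA
      else "not statistically significant")
```

With these made‐up values the P‐value exceeds 0.05, so the code reports the difference as not statistically significant even though one mean is twice the other. The code reports only the outcome of the test; it does not give the probability that the difference is real.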