Page 41 - Clinical Small Animal Internal Medicine
2 Statistical Interpretation for Practitioners 9
First, it would be an error to state that there was “no difference” between the groups. Clearly, there was a difference (of 40 μ/L). It would, however, be correct to state that because the null hypothesis of no group difference was not rejected, there was no significant difference between the groups, assuming the assumptions of the statistical model are correct (because (P = 0.10) > (α = 0.05)).

Second, the P‐value does not provide a quantitative assay of the probabilities that the null or alternative hypotheses are correct. Conventional (superiority) hypothesis testing is predicated on the veracity of the null hypothesis, and so does not provide any assessment of its truth. Instead, the P‐value addresses an entirely different question: how likely (probabilistically speaking) is it that one would observe differences at least as large as the one found in the study (40 μ/L) when the null hypothesis is true? In other words, instead of the P‐value equaling the probability that the null hypothesis is true given the data observed in the study, it provides the probability of observing the difference in the data observed (or more extreme) given the null hypothesis is true. It follows that “large” P‐values indicate substantial concordance with the null hypothesis, while “small” P‐values indicate poor concordance with the null hypothesis (presumably motivating its rejection in favor of its alternative).

Third, a P‐value is only numerically correct under a particular statistical model. With the Student’s t‐test example, the underlying model assumes that the ALT values in each group are approximately normally distributed. If the assumption of normality is violated, however, the P‐value will no longer be correct; the greater the departure of the data distribution from normality, the more incorrect the P‐value will be.

Fourth, another assumption is that the study data are independent, meaning that knowing one individual’s ALT value does not allow the ability to predict another individual’s ALT value. In this example, such an assumption is reasonable when each individual contributes only one ALT value. However, when replicates from a single individual are included, it is plausible to assume that knowing an individual’s value at one time can better predict the same individual’s value at another time than the value from a different individual. Such a violation of the data independence assumption leads to the estimation of an incorrect P‐value; typically, the use of correlated (nonindependent or dependent) data underestimates P‐values, and hence more type I statistical errors (improperly rejecting the null hypothesis) arise.

Fifth, it is important to recognize that statistical analysis is more than the analysis of single numbers (i.e., point estimates, as in averages) in groups; instead, it is more correctly described as the analysis of variability. For this reason, the presence or absence of statistical significance is more than just reflective of the magnitude of differences or associations and the chosen level of significance; it is also a function of the variance of the point estimates. Because such variances are inversely proportional to sample sizes, two studies with identical differences or associations can have completely different P‐values: the smaller study’s differences could be nonsignificant, while the larger study’s differences may be significant.

Sixth, it directly follows that any group differences or associations can eventually be made statistically significant if the study size becomes sufficiently large. To illustrate this, two random samples of 25 individuals each were created, one assuming blood hemoglobin was normally distributed with a mean of 15 g/dL and standard deviation of 2 g/dL, and the other with a mean and standard deviation of 15.1 and 2 g/dL, respectively. No one would seriously argue that a hemoglobin difference of 0.1 g/dL is clinically important, and indeed it is not significant at α = 0.05 (P = 0.79). However, if the two groups were constructed to have 2500 individuals each, with the same means and standard deviations, this same clinically unimportant difference (0.1 g/dL) becomes statistically significant (P = 0.023). This underscores an important distinguishing principle of statistical analysis: statistical significance is distinctly different from and does not imply medical importance. In a large enough study, even trivial and unimportant differences can become statistically significant; in a small study, differences that appear to be worthy of medical pursuit may be statistically insignificant. In recognition of this principle, alternative methods of hypothesis testing have been developed that instead of examining superiority of one group over another, evaluate whether groups are noninferior or equivalent based on a determination of what constitutes an acceptably important difference or association [7].

Finally, each decision to reject or not reject a null hypothesis following hypothesis testing is prone to error. What is often unappreciated is that the more tests that are performed, the greater the probability that at least one decision will be incorrect. In a clinical setting, this is perhaps most manifest when performing clinical laboratory testing panels to screen for hematologic or chemical abnormalities in blood. Reference ranges for blood parameters are typically constructed to encompass approximately 95% of normal animals, implying that 5% of normal animals will have values falling outside the ranges.

When a reference range is appropriate for a patient’s age, sex, and any other factors that can influence a particular blood parameter, it is reassuring that a normal animal’s value will fall within the reference range 95% of the time. However, a universally accepted practice is to run laboratory panels simultaneously evaluating many parameters. Suppose, for example, that a blood chemistry