The American Journal of Sports Medicine 28:918-919 (2000)
© 2000 American Orthopaedic Society for Sports Medicine


Letters to the Editor

Letter

Stephen Lyman, PhD

Birmingham, Alabama

I was surprised to hear that the authors of "The Effect of Neuromuscular Training on the Incidence of Knee Injury in Female Athletes: A Prospective Study" (Hewett et al., November/December 1999, pages 699–706; see "Letters to the Editor" July/August 1999, pp 615–616) continue to deny that their statistical analysis was suboptimal. The rationale for the use of the chi-square test is that it is both easy to calculate and relatively flexible. The primary weakness of the chi-square test is that it is an approximate method. This means, quite simply, that the chi-square test represents an approximation of the results that would be found using an exact test (such as Fisher’s exact test). The limitation is that an expected cell frequency of five or greater is needed to obtain a chi-square estimate that is an accurate approximation of the exact probability.1 I could refer to hundreds of statistical texts, but in the interest of space I have mentioned one.
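The gap between the approximate and the exact method can be illustrated with a short sketch. The 2x2 table below is hypothetical (it is not data from the study) and was chosen so that the expected number of injuries in each group is below five. The helper names `chi_square_p` and `fisher_exact_p` are introduced here for illustration only; the chi-square version is the uncorrected Pearson test, and the two-sided exact P value uses the common "sum of all tables no more likely than the observed one" rule, which is an assumption about how "two-sided" is defined.

```python
from math import comb, erfc, sqrt

def chi_square_p(a, b, c, d):
    """Two-sided P value of the Pearson chi-square test (1 df, no
    continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2 = a + b, c + d
    c1, c2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2))

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact P value: sum the hypergeometric probabilities
    of every table with the same margins that is no more likely than the
    observed table."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(k):
        # Probability that the first row contains k of the c1 events.
        return comb(r1, k) * comb(r2, c1 - k) / total

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# HYPOTHETICAL table: rows are two groups of 20, columns are injured / uninjured.
table = (1, 19, 6, 14)
min_expected = 20 * 7 / 40          # smallest expected cell count: 3.5
chi_p = chi_square_p(*table)
exact_p = fisher_exact_p(*table)
print(f"smallest expected count: {min_expected}")
print(f"chi-square P: {chi_p:.4f}   Fisher exact P: {exact_p:.4f}")
```

For this hypothetical table the approximate P value falls below 0.05 while the exact one does not, which is the kind of disagreement the expected-frequency rule is meant to guard against.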

In the study in question, the expected number of knee injuries in each group was five or less. Additional analysis focused on noncontact knee injuries and noncontact ACL injuries. These subgroups had expected cell frequencies of approximately three and two, respectively. In none of these instances can a reasonable case be made for the chi-square test being a stable and accurate approximation of the Fisher’s exact test. It takes a desktop computer with SAS or SPSS mere seconds to calculate a chi-square or Fisher’s exact test. The chi-square test is useful in a multitude of analyses in which adequate cell sizes are present, but this is not one of them. Why would a statistician use an approximate method when the exact probability is at his fingertips?

I am curious as to how the authors, in their response to Dr. Clancy’s letter, were able to calculate a Fisher’s exact test for three groups. The Fisher’s exact test is only capable of calculating exact probabilities for two independent samples.2 I will refrain from addressing the art of one-tailed tests as well. I hope this serves as a lesson to the readers of your esteemed Journal. It is imperative to understand and adhere to the assumptions of the statistics that we use to let our data speak.

REFERENCES

  1. Daniel WW: Biostatistics: A Foundation for Analysis in the Health Sciences. New York, John Wiley & Sons, 1995, p 337
  2. Kuhn JE, Greenfield MLVH, Wojtys EM: A statistics primer: Statistical tests for discrete data. Am J Sports Med 25:585–586, 1997

 

Letter

Sandy Kirkley, MD, FRCSC

London, Ontario, Canada

Editor’s Note: The following "Letters to the Editor" illustrate the problem that some of us may have with statistics. This is an honest attempt on the part of authors and critics to come to an agreement that will be helpful to readers. In contrast to most previous such letters and controversies, we have not definitely resolved this issue. The letter from Dr. Kirkley was sent to Dr. Clancy; the letter by Ashikaga is from an independent and competent statistics person who comments on the previous letters as well as one being published here for the first time.

I have reviewed the article by Hewett et al. and would like to point out that this type of discussion is a great opportunity to learn from each other’s thoughts and expertise.

I would agree that the chi-square test of association is a very approximate test, yielding a P value that is somewhat optimistic, and I would agree that, for the analysis of discrete outcomes, the exact test for association is better given by the Fisher’s exact test. However, both methods of testing are valid for analysis of this type of data as long as the assumptions or limitations for each test are recognized. In this article, the total number of events is 14, 10 for the untrained female group, 2 for the untrained male group, and 2 for the trained female group. The chi-square test for association is inaccurate when the expected frequency in any cell is small, or less than 5. When considering the chi-square comparison of all three groups, the expected value for the trained group is 4 and the expected value for the male group is 4.79, and thus the test becomes inaccurate. In fact, when testing these data using the statistics program SAS (Version 6.0), the printout contains a warning that reads "33% of the cells have expected counts less than 5. Chi-square may not be a valid test."
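The expected counts behind that warning can be sketched in a few lines. The injury counts (10, 2, 2) are the ones given above; the group sizes (463, 366, 434) are an assumption made for this illustration, since they are not stated in these letters, and should be checked against the original article. The function name `expected_counts` is likewise introduced here only for the example.

```python
def expected_counts(table):
    """Expected cell counts under independence:
    (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# (injuries, group size). Injuries are from the letter;
# group sizes are ASSUMED for illustration.
groups = {
    "untrained female": (10, 463),
    "trained female": (2, 366),
    "untrained male": (2, 434),
}

# Build the 3x2 table of injured / uninjured athletes.
table = [[injured, size - injured] for injured, size in groups.values()]
expected = expected_counts(table)

cells = [value for row in expected for value in row]
small = [value for value in cells if value < 5]
share_small = len(small) / len(cells)

for name, row in zip(groups, expected):
    print(f"{name:17s} expected injured = {row[0]:.2f}")
print(f"{share_small:.0%} of the cells have expected counts less than 5")
```

Under these assumed group sizes, two of the six cells (the expected injuries in the trained female and untrained male groups) fall below 5, which is consistent with the 33% reported in the SAS printout.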

For the remaining analyses comparing two different groups, the Fisher’s exact test can be used and should be used for the same reason stated above: several of these comparisons will contain cells with an expected count less than 5. However, the P values given by the Fisher’s exact test for these comparisons, when considering the one-tailed value, are very close to the values given by the chi-square test and certainly do not change the finding that the groups are significantly different.

The argument as to whether a one-tailed or two-tailed test should be applied is really one of preference. In most cases, to be conservative, a two-tailed test is used. However, if stated a priori, and given a suitable explanation, a one-tailed test can be used. In this case, the authors have clearly stated in their hypothesis that they expected a decrease in injury rates in female athletes and they have cited various studies throughout the literature to support their theory. These authors have also indicated in their explanation of the statistical analysis that a one-way or one-tailed chi-square test was used in the analysis. As long as these authors have clearly stated their intent, the issue of whether a one- or two-tailed analysis is more appropriate is left up to the reader. If the reader believes that a two-tailed test is a more appropriate analysis, or prefers a more conservative approach to be convinced that these findings are real, then he will be influenced less by this article than by other articles contained within the literature that are more conservative. Or he may decide that he is willing to accept the analysis as it has been performed.
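How much the choice of tail matters can be seen in a minimal sketch of the Fisher’s exact test for the trained versus untrained female comparison. The injury counts (2 and 10) are from this letter; the group sizes (366 and 463) are an assumption for illustration and are not stated in the letters. The function `fisher_tails` is a name introduced here, and its two-tailed value uses the common rule of summing all tables no more likely than the observed one.

```python
from math import comb

def fisher_tails(a, b, c, d):
    """Return (one_tailed, two_tailed) Fisher exact P values for the
    2x2 table [[a, b], [c, d]].

    one_tailed: probability of `a` or fewer events in the first row.
    two_tailed: total probability of all tables with the same margins
                that are no more likely than the observed table.
    """
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Hypergeometric probability of each possible count in the first cell.
    probs = {k: comb(r1, k) * comb(r2, c1 - k) / total
             for k in range(lo, hi + 1)}
    one_tailed = sum(p for k, p in probs.items() if k <= a)
    two_tailed = sum(p for p in probs.values()
                     if p <= probs[a] * (1 + 1e-9))
    return one_tailed, two_tailed

# Row 1: trained females (2 injured); row 2: untrained females (10 injured).
# Group sizes 366 and 463 are ASSUMED for illustration.
one, two = fisher_tails(2, 366 - 2, 10, 463 - 10)
print(f"one-tailed P = {one:.3f}, two-tailed P = {two:.3f}")
```

With these assumed group sizes the one-tailed value lands just under the conventional 0.05 threshold and the two-tailed value just above it, so the a priori choice of tail decides which side of the convention the result falls on.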

Another discussion that is relevant to this article is the issue of the conventional alpha level of 0.05. A P value is the probability that the results of the experiment could be obtained (concluding that the groups are different) if in fact the null hypothesis is true (the groups are not really different). In other words, if the groups are found to be statistically different at an alpha of 0.05, the probability that a difference this large would occur by chance alone, when no difference really exists, is at most 5 times out of 100, or 1 in 20. When considering the significance of study results, one has to keep in mind that this threshold is arbitrary and not absolute. For example, if the P value is found to be equal to 0.08, as it is in this study (when a two-tailed Fisher’s exact test is used), then there is an 8% probability that a result like this would occur by chance alone. A P value of 0.08, although not significant by convention, should still be considered for what it is, a very strong trend that approaches significance.

Lastly, the definition of knee injury as given by these authors is simply that, defined by these authors. A study is valid if the definition of the event is clearly stated and, as with one- or two-tailed tests or issues concerning the conventional P value, the generalizability of this paper to one’s own practice is individual. If the definition is not believed to be representative of the reader’s clinical population, then the results of the paper may not be useful to him. The same applies to the training program or any intervention implemented for a study. If the reader believes that it is inappropriate, he may decide to write a more appropriate protocol, perform a study, report his own results, and discuss the pros and cons associated with different interventions. However, just because a study is not believed to be generalizable to all clinical practices does not make the study invalid or mean that we cannot learn from it.

Again, I would like to reiterate how important these discussions are. We can learn so much from what others "bring to the table."




