Statistical Fallacies Compiled and Explained by Darrel Raines

The following categories delineate common fallacies that are present in statistical discussions. This web page is an attempt to capture some of the more common fallacies that show up in newspaper articles and everyday discussion. It is hoped that the information contained here will help in a critical analysis of such discussions.

One of the most basic, yet easily missed, fallacies is one of definition. If I don't know how you define something, then your measurement is meaningless to me, and so is any statistic built on it.

Example: The city of Houston is the 4th largest city in the United States of America.

Fallacy: The term "largest" is not defined and is ambiguous. Does largest mean "largest acreage" (9th), or "greatest number of people within the city limits" (4th), or "greatest number of people within the city and all suburbs" (6th), or "highest density of people per square mile within the city limits" (119th)? Without an adequate definition of the term "largest", the statement is not useful, and it certainly does not provide a useful statistic.
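To see how much the choice of definition matters, here is a small Python sketch that ranks the same three cities under three definitions of "largest". The cities and their figures are invented for illustration (they are not Houston's data), and the helper names are hypothetical.

```python
# Hypothetical figures for three made-up cities (not real data).
cities = {
    "City A": {"area_sq_mi": 650, "population": 2_300_000},
    "City B": {"area_sq_mi": 300, "population": 2_700_000},
    "City C": {"area_sq_mi": 200, "population": 1_500_000},
}

# Each definition of "largest" is just a different sort key.
definitions = {
    "largest area": lambda c: c["area_sq_mi"],
    "largest population": lambda c: c["population"],
    "highest density": lambda c: c["population"] / c["area_sq_mi"],
}

def rank_of(city, key):
    """Return the 1-based rank of `city` when all cities are sorted by `key`, largest first."""
    ordered = sorted(cities, key=lambda name: key(cities[name]), reverse=True)
    return ordered.index(city) + 1

for label, key in definitions.items():
    print(f"City A by {label}: #{rank_of('City A', key)}")
# City A by largest area: #1
# City A by largest population: #2
# City A by highest density: #3
```

The same city comes out first, second, or third depending only on which definition is used.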

Avoiding Problems with Definitions

  • Ensure that all terms used are clearly defined so that there is no confusion about the meaning of the statistic.
  • Look for bias on the part of the entity providing a statistic. Does an undefined term work to the advantage of the one giving the data?
  • Look for unusual accuracy in the information provided. Too many significant digits should be a red flag that the presentation is meant to appear more accurate than is justified.
  • Look for proxy measurements (one quantity measured in place of another). If you detect one, ensure that the proxy is appropriate for the quantity it stands in for.

Estimates are given all the time by experts in a field of study. The difficulty arises when those estimates, whether intentionally or not, are put forward without being scrutinized for believability.

Example: Recent studies show the homeless population in the United States to be 671,859 in 2007.

Fallacy: A person who is homeless is not going to stand around waiting to be counted. It is almost impossible to get an accurate statistic in a situation that is so fluid. Many people who are counted as homeless are without a home for a short period of time and then find a place to live. And in most cases, they do not raise their hands to be counted. People move to new areas to find work, food, and shelter.

So, is the figure given too low? Too high? How do we tell? The key to this estimate is that it is probably an unknowable statistic. We will have to satisfy ourselves with the fact that we can only obtain an educated guess at the true number. However, that guess should not be given in the guise of an accurate estimate, or (even worse) an exact count.

A side note on the example: The "accuracy" of the population count does not seem to warrant the number of significant digits used in the statement. Can we really know, down to the person, how many people are homeless? This does not seem likely.
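One way to act on this is to round a published estimate to the precision the method can actually support. The Python sketch below rounds to a chosen number of significant digits; keeping two digits for the homeless count is an assumption made here for illustration, since the right number depends on how uncertain the estimate really is.

```python
import math

def round_to_sig_figs(value, sig_figs):
    """Round `value` to `sig_figs` significant digits."""
    if value == 0:
        return 0
    # Position of the leading digit: 5 for a six-digit number such as 671,859.
    magnitude = math.floor(math.log10(abs(value)))
    # A negative ndigits rounds to tens, hundreds, thousands, and so on.
    return round(value, sig_figs - 1 - magnitude)

print(round_to_sig_figs(671_859, 2))  # 670000
print(round_to_sig_figs(671_859, 1))  # 700000
```

Reporting "about 670,000" conveys the same information without implying a person-by-person count.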

Avoiding Problems with Estimates

  • Some estimates are attempts to guess at a statistic that is not knowable. A measurement may be impossible to take because there is a physical barrier to gathering the data, or people do not know (or will not say) the true count, or people do not report all incidents. Beware this category of estimates.
  • Is the estimate based on an eccentric theory? Then the estimate may be suspect as well.
  • Is the estimate immediately preposterous? Any statistic that, upon closer examination, produces results that are easy to disprove will fall into this category. Many times, the root cause of an immediately preposterous estimate is extrapolation of a small sample to the whole population. (See "projection of trends" below.)
  • Look to see if the estimate is a buildup from a dubious cluster. A sample set that does not represent the whole is inappropriate to serve as the basis for an estimate.
  • Ensure that the estimate is not an uncritical projection of a trend to the future.
  • Use common sense to question estimates. An estimate is, by its very nature, not as reliable as a measured value. Critical examination of any estimate is appropriate.
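The "dubious cluster" problem can be shown with simple arithmetic. In this Python sketch, the five region counts are made-up numbers, and the hypothetical estimator naively assumes that every region looks like the single region that was sampled.

```python
# Hypothetical counts for five regions of equal size (made-up numbers for illustration).
region_counts = [120, 80, 95, 400, 105]
true_total = sum(region_counts)  # 800

def estimate_from_one_region(count_in_region, number_of_regions):
    """Naive build-up: assume every region looks like the one that was sampled."""
    return count_in_region * number_of_regions

for i, count in enumerate(region_counts):
    estimate = estimate_from_one_region(count, len(region_counts))
    print(f"sampled region {i}: estimate {estimate} vs. true total {true_total}")
```

Depending on which region happened to be sampled, the build-up estimate ranges from 400 to 2,000 against a true total of 800. The region with an unusual concentration produces the most dramatic, and most wrong, number.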

Illuminating examples of cheating charts will be shown here.

Better than average discussion of averages will appear here.

Percentages are used in a wide variety of ways to provide information comparing one item to others or measuring a rate of change in a single item. Fallacies can arise in a number of different areas, but the most common problems stem from the improper use of

Most people have an intuitive feel for probability. They are therefore stunned to learn that their understanding of probability, when applied to everyday situations, is often faulty. The fallacy lies in the inability of most people to accept that their intuition will sometimes lead them astray.

Example: Joey Homer is a baseball player who normally has a .320 batting average. He has produced zero (0) base hits in his past 20 at-bats. Joey is due for a hit! He will probably get multiple hits today.

Fallacy: The notion that the next few samples must conform to the average is faulty thinking. Any small random sample will be just that: random. Joey may get 4 hits today or he may not get any. Over the course of a whole season, he can be expected to produce about 320 hits for every 1000 at-bats. But any 4 or 5 at-bats will not necessarily conform to the average, because that is such a small sample size.
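Treating each at-bat as an independent event with a .320 chance of a hit (a simplifying assumption; real at-bats are not perfectly independent), the binomial formula gives the chances for today's game. The figure of 4 at-bats per game is also an assumption for illustration. Note that the function has no input for the previous 20 at-bats: under independence, the slump does not change today's chances.

```python
from math import comb

def prob_hits(k, at_bats, avg):
    """Binomial probability of exactly k hits, assuming independent at-bats with hit probability `avg`."""
    return comb(at_bats, k) * avg**k * (1 - avg) ** (at_bats - k)

AVG = 0.320   # Joey's long-run batting average
AT_BATS = 4   # assumed number of at-bats in today's game

p_no_hit = prob_hits(0, AT_BATS, AVG)
p_multi = 1 - prob_hits(0, AT_BATS, AVG) - prob_hits(1, AT_BATS, AVG)
print(f"P(no hits today)    = {p_no_hit:.3f}")  # 0.214
print(f"P(two or more hits) = {p_multi:.3f}")   # 0.384
```

So even for a good hitter, "multiple hits today" is less likely than not (about 38%), slump or no slump.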

People have a hard time with this concept. They think that for an average to "work out", the likelihood of an event is increased when the recent samples have not conformed to the average. Their intuition leads them astray when applied to small samples.

A simple example is a coin flip. A fair coin comes up heads 50% of the time. If I have just flipped the coin 5 times and each time it has come up tails, what is the chance that the next flip will be heads? The answer is still 50%. Yet we want to believe that the coin should have some preference for heads simply because the last 5 flips have been tails. We seem to think that the coin suddenly has a 60% or 70% chance of coming up heads. Obviously this is an incorrect belief, but it is very difficult to remove this notion from people's minds.
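A quick simulation makes the coin example concrete. The Python sketch below flips a simulated fair coin many times and, whenever the previous five or more flips were all tails, records whether the next flip came up heads. The function name and the fixed seed are arbitrary choices for illustration.

```python
import random

def heads_rate_after_tails(streak, flips, seed=1):
    """Fraction of flips that come up heads when the previous `streak` (or more) flips were all tails."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    tails_run = 0      # length of the current run of consecutive tails
    followed = 0       # flips that came right after a long-enough run of tails
    heads_after = 0    # ...and how many of those came up heads
    for _ in range(flips):
        heads = rng.random() < 0.5  # fair coin
        if tails_run >= streak:
            followed += 1
            heads_after += heads
        tails_run = 0 if heads else tails_run + 1
    return heads_after / followed

print(f"{heads_rate_after_tails(streak=5, flips=1_000_000):.3f}")
```

The printed fraction should land very close to 0.500; the run of tails gives the next flip no preference for heads.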

Sometimes statistical information is given and then compared to similar information taken from another set. This will be a fallacious statistic if the comparison is improper.

Example: More people die in car accidents every year than die in aircraft accidents. Therefore, it is safer to fly than it is to travel by car.

Fallacy: The problem is that the two sets being compared are very different in size. The absolute number of deaths due to car accidents will be larger than the number of deaths due to aircraft accidents, in part, because a very large number of people are transported by car. A relatively small number of people are transported by aircraft.

In this case, a comparison of proportional numbers (for example, deaths per trip for each mode) would have been a valid comparison. Another valid approach would be to list the full number of deaths vs. trips for each mode of transportation. The second method has the advantage of giving the reader more exact information.
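The difference between absolute counts and rates is easy to show with numbers. The Python sketch below uses invented totals for two unnamed travel modes (they are not real car or aircraft statistics) and converts each to deaths per million trips. Trips are only one possible denominator; deaths per passenger-mile is another, and whichever is used should be stated.

```python
# Hypothetical totals for two travel modes (made-up numbers, not real safety data).
modes = {
    "Mode A": {"deaths": 1_000, "trips": 1_000_000_000},
    "Mode B": {"deaths": 50, "trips": 10_000_000},
}

def deaths_per_million_trips(deaths, trips):
    """Convert an absolute death count into a rate per million trips."""
    return deaths * 1_000_000 / trips

for name, m in modes.items():
    rate = deaths_per_million_trips(m["deaths"], m["trips"])
    print(f"{name}: {m['deaths']} deaths, {rate:.1f} per million trips")
```

Mode A has twenty times as many deaths as Mode B, yet its rate is one fifth as high (1.0 versus 5.0 per million trips).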

Improper comparisons are often the result of wanting to draw analogies between data sets. Careful reasoning will inform the observer that, in fact, the comparison is not correct.