Biostatistics for the Clinician

1.7 Distributions

1.7.1 Why Important?

Why do you need to know about frequency distributions, usually referred to simply as "distributions"? Again, the answer is that they occur repeatedly in the medical research literature. To understand that literature, and to evaluate reported research studies critically enough to apply their results, you need to know about distributions.

What is meant by a distribution? A frequency distribution is simply a table, chart or graph that pairs each distinct value obtained with the number or proportion of times it occurs. So, any time you have a set of values, you can plot each value against the number or proportion of times it occurs, using a graph with the values on the horizontal axis and the counts or proportions on the vertical axis. Such a graph is a very convenient way to represent a frequency distribution (see any of the figures below).
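The pairing of values with counts and proportions can be sketched in a few lines of Python. This is only an illustration: the pain scores are made-up data, and the function name `frequency_distribution` is a hypothetical helper, not something from the lesson.

```python
from collections import Counter

def frequency_distribution(values):
    """Pair each distinct value with its count and its proportion."""
    counts = Counter(values)
    total = len(values)
    return {v: (counts[v], counts[v] / total) for v in sorted(counts)}

# Hypothetical data: pain scores (0 to 4) reported by 10 patients.
pain_scores = [2, 3, 3, 1, 2, 3, 4, 2, 3, 0]
dist = frequency_distribution(pain_scores)

# Crude text "graph": one row per value, bar length equal to the count.
for value, (count, proportion) in dist.items():
    print(f"{value}: {'#' * count:<5} count={count}, proportion={proportion:.1f}")
```

Each row of the printout corresponds to one point on the kind of graph described above, with the value on one axis and its count or proportion on the other.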

It turns out that some distributions are particularly important because they occur frequently in clinical situations. Some of the most important distributions are the Gaussian, the binomial and the Poisson distributions.

Distributions
Practice
Exercise 1:
You need to know about distributions to:

Evaluate medical research studies
Diagnose illness
Compute statistics
None of the above

1.7.2 Gaussian

The family of Gaussian distributions or bell curves (also known as normal distributions) is by far the most important set of distributions. When a value is obtained by summing over many independent random outcomes, which tends to be the case with many variables encountered in the clinic, the sum tends to assume a Gaussian distribution. So, Gaussian distributions occur extremely frequently and turn out to be the basis for many inferential statistical tests.
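A small simulation can make the "summing over random outcomes" idea concrete. The sketch below is an illustration under stated assumptions, not part of the lesson: each value is the sum of 12 independent uniform random draws, and the sample sizes and the fixed seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sum_of_outcomes(n_outcomes):
    """One value formed by adding n_outcomes independent uniform(0, 1) draws."""
    return sum(random.random() for _ in range(n_outcomes))

# 20,000 simulated values, each the sum of 12 random outcomes (arbitrary sizes).
sums = [sum_of_outcomes(12) for _ in range(20000)]
mean = statistics.mean(sums)
sd = statistics.pstdev(sums)

# Share of simulated values within 1 and 2 standard deviations of the mean.
within_1sd = sum(abs(s - mean) <= sd for s in sums) / len(sums)
within_2sd = sum(abs(s - mean) <= 2 * sd for s in sums) / len(sums)
print(f"mean={mean:.2f}, sd={sd:.2f}")
print(f"within 1 SD: {within_1sd:.3f}, within 2 SD: {within_2sd:.3f}")
```

Although each individual draw is flat (uniform) rather than bell-shaped, the shares within 1 and 2 standard deviations should come out close to 68% and 95%, the proportions expected under a Gaussian curve.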

The Gaussian distribution gives a precise mathematical formulation to the "law of errors". When measurements are made there will typically be errors. Most of the errors will be small, so most measurements will fall close to the actual value. On the other hand, there will be some measurements having greater error. But as the size of the errors of measurement increases, the number of such errors decreases. The Gaussian distribution or Gaussian curve shows the precise relationship between the size of the error and how often the error is likely to occur (the frequency of the error). The figure below shows what a Gaussian distribution looks like.

Gaussian Distribution
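The relationship between the size of an error and its frequency can also be computed directly. The following is a minimal sketch using the standard Gaussian density formula; the function names and the choice to express errors in standard deviation units are assumptions made for illustration.

```python
import math

def gaussian_density(x, mean=0.0, sd=1.0):
    """Height of the Gaussian curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def prob_error_within(k):
    """Probability that an error lies within k standard deviations of zero."""
    return math.erf(k / math.sqrt(2))

# Errors are expressed in standard deviation units (an illustrative choice).
for k in (0, 1, 2, 3):
    print(f"error of {k} SD: relative frequency (density) = {gaussian_density(k):.4f}")

for k in (1, 2, 3):
    print(f"P(error within {k} SD) = {prob_error_within(k):.4f}")
```

The density falls steadily as the error grows, which is the "law of errors" in numerical form: small errors are common and large ones are rare.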

Practice
Exercise 2:
Gaussian distributions are particularly important because they:

Occur frequently
Represent a law of errors
Support inferential tests
All of the above

1.7.3 Binomial

The family of binomial distributions is relevant when independent trials occur, each of which can be categorized as having two possible outcomes that could be described as "success" or "failure", and a known probability is associated with each outcome. For example, someone who does not know the correct answers to true-false questions and simply guesses has equal probabilities of getting each answer right or wrong. A question to be answered by the binomial distribution might then be, "What is the probability of getting at least 7 correct on a 10 item true/false test?"
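The true/false question can be worked out with the standard binomial formula. The sketch below assumes pure guessing (probability 0.5 per item); the helper names `binomial_pmf` and `prob_at_least` are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k_min, n, p):
    """Probability of k_min or more successes in n independent trials."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))

# 10 true/false items, each guessed correctly with probability 0.5.
p_at_least_7 = prob_at_least(7, 10, 0.5)
print(f"P(at least 7 of 10 correct by guessing) = {p_at_least_7:.4f}")  # 0.1719
```

So a pure guesser would get 7 or more items right on about 17% of such tests (176 of the 1,024 equally likely answer patterns).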

Suppose you wanted to know the probability, when throwing a die 5 times, that exactly one six, or exactly two sixes, would come up. Here the probability of a "success" on each throw is 1/6. The probability of "failure" is 5/6. The binomial distribution again would describe the probabilities associated with various numbers of "successes" and "failures" in such situations.
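The die example can be computed exactly with fractions. This sketch reads the question as asking for exactly one six and for exactly two sixes in 5 throws, which is an interpretation; the helper name `binomial_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_six = Fraction(1, 6)  # probability of a "success" (a six) on one throw

p_one_six = binomial_pmf(1, 5, p_six)   # exactly one six in 5 throws
p_two_sixes = binomial_pmf(2, 5, p_six)  # exactly two sixes in 5 throws

print(f"P(exactly 1 six)  = {p_one_six} = {float(p_one_six):.4f}")
print(f"P(exactly 2 sixes) = {p_two_sixes} = {float(p_two_sixes):.4f}")
print(f"P(1 or 2 sixes)   = {float(p_one_six + p_two_sixes):.4f}")
```

Using `Fraction` keeps the arithmetic exact, so the results can be checked by hand: 5 x (1/6) x (5/6)^4 = 3125/7776 for one six.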

Finally, as another example, assume that you wanted to determine the probability that a genetically based defect will occur in the children of families of various sizes, given the presence of the characteristic in one of the parents. The binomial distribution would describe the probabilities that any given number of children in a family would be expected to inherit the defect. All of these situations involve dichotomous (two-valued) variables, and the number of "successes" counted over multiple trials can be expected to assume a binomial distribution (see the figure below).

Binomial Distributions
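For the inheritance example, the binomial distribution gives the chance of each possible number of affected children in a family. The text does not state a transmission probability, so the sketch below assumes, purely for illustration, that each child independently inherits the defect with probability 0.5; the names `P_INHERIT` and `affected_distribution` are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# ASSUMPTION (not from the text): each child independently inherits the
# defect with probability 0.5.
P_INHERIT = 0.5

def affected_distribution(n_children, p=P_INHERIT):
    """Probabilities of 0, 1, ..., n_children affected children."""
    return [binomial_pmf(k, n_children, p) for k in range(n_children + 1)]

for n_children in range(1, 5):
    probs = affected_distribution(n_children)
    p_any = 1 - probs[0]  # at least one affected child
    row = ", ".join(f"{k}: {pr:.3f}" for k, pr in enumerate(probs))
    print(f"{n_children} children -> {row} | at least one affected: {p_any:.3f}")
```

With a different transmission probability, only the value of `P_INHERIT` would change; the structure of the calculation stays the same.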

Practice
Exercise 3:
Binomial distributions describe probabilities of a given number of "successful" trials out of a greater number of trials when each trial may have:

One outcome
Two outcomes
Multiple outcomes
None of the above

1.7.4 Poisson

Another important family of discrete distributions is the Poisson. It is useful to think of the Poisson distribution as a special case of the binomial distribution, where the number of trials is very large and the probability is very small. More specifically, the Poisson is often used to model situations where the number of trials is indefinitely large, but the probability of a particular event at each trial approaches zero. The number of bacteria on a petri plate can be modeled as a Poisson distribution. Tiny areas on the plate can be viewed as trials, and a bacterium may or may not occur in such an area. The probability of a bacterium being within any given area is very small, but there are a very large number of such areas on the plate. Similar cases are encountered when counting the number of red cells that fall in a square on a hemocytometer grid, when looking at the number of individuals in America killed by lightning strikes in one year, or when counting the occurrence of HIV-associated needle sticks in US hospitals each year.
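The Poisson probabilities for a count such as red cells per grid square can be computed from the standard Poisson formula, which needs only the average count. The sketch below is an illustration: the average of 2 cells per square is a made-up figure, and `poisson_pmf` is a hypothetical helper name.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# ASSUMPTION (illustrative only): an average of 2 red cells per grid square.
MEAN_CELLS_PER_SQUARE = 2.0

for k in range(7):
    prob = poisson_pmf(k, MEAN_CELLS_PER_SQUARE)
    print(f"P({k} cells in a square) = {prob:.4f}")
```

Note that the number of "trials" (the tiny areas within the square) never appears in the calculation; only the expected count per square is needed.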

So, the Poisson can be viewed as an approximation to the binomial distribution. The approximation is good enough to be useful even when the sample size (N) is only moderately large (say N > 50) and the probability (p) is only relatively small (p < .2) (Hayes, 1981) (see the figure below). The advantage of the Poisson distribution, of course, is that if N is large you need only know the expected number of events (the product Np) to determine the approximate distribution of events. With the binomial distribution you need to know N and p separately.

Poisson Distributions
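How close the approximation is can be checked numerically by placing binomial and Poisson probabilities side by side, with the Poisson mean set to Np. The values N = 100 and p = 0.05 below are hypothetical choices made for illustration, and the helper names are not from the lesson.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical values chosen for illustration.
N, P = 100, 0.05
LAM = N * P  # Poisson mean matched to the binomial mean

print(" k  binomial  Poisson")
for k in range(11):
    print(f"{k:2d}  {binomial_pmf(k, N, P):.4f}    {poisson_pmf(k, LAM):.4f}")

# Largest disagreement between the two distributions over all possible counts.
max_gap = max(abs(binomial_pmf(k, N, P) - poisson_pmf(k, LAM))
              for k in range(N + 1))
print(f"largest difference in any single probability: {max_gap:.4f}")
```

Rerunning the sketch with a larger p (or a smaller N) shows the two columns drifting apart, which is why the approximation is recommended only when N is large and p is small.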

Practice
Exercise 4:
The simpler Poisson distribution may be used as an approximation to the binomial distribution when:

Samples are large
Variability is large
The probability is small
Samples are large & probability is small
