Biostatistics for the Clinician Hypertext Glossary Section I

George Oser, Ph.D., Craig W. Johnson, Ph.D., Allan J. Abedor, Ph.D.

Term: Alpha
Alpha is the probability of rejecting a true null hypothesis. It is the probability that the investigator will conclude that a relationship exists between independent and dependent variables when no such relationship exists. It is represented by the lowercase Greek letter alpha.
Related terms: Null Hypothesis, Statistical Significance, Type I Error, Type II Error
Term: Beta
Beta is the probability of retaining (failing to reject) a false null hypothesis. It is the probability that the investigator will conclude that no relationship exists between independent and dependent variables when such a relationship does exist. It is represented by the lowercase Greek letter beta.
Related terms: Alpha, Null Hypothesis, Statistical Significance, Type I Error, Type II Error
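
As a rough illustration of alpha and beta, the sketch below works through a hypothetical 20-trial study. The null hypothesis is a 50% success rate, and the made-up decision rule rejects it when 5 or fewer, or 15 or more, successes are observed. The study size, the cutoffs, and the 80% true success rate used for beta are all assumptions chosen for illustration, not values from this glossary.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def rejection_probability(n: int, p: float, lower: int, upper: int) -> float:
    """P(successes <= lower or successes >= upper) when the true rate is p."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if k <= lower or k >= upper)


N_TRIALS = 20          # hypothetical study size
LOWER, UPPER = 5, 15   # hypothetical rule: reject H0 outside 6..14 successes

# Alpha: chance of rejecting H0 when H0 is true (success rate really is 0.5).
alpha = rejection_probability(N_TRIALS, 0.5, LOWER, UPPER)

# Beta: chance of retaining H0 when it is false (assumed true rate of 0.8).
beta = 1 - rejection_probability(N_TRIALS, 0.8, LOWER, UPPER)

print(f"alpha = {alpha:.4f}")
print(f"beta  = {beta:.4f}")
```

Under these assumptions alpha comes out near 0.041 and beta near 0.196. Widening the retention region would lower alpha but raise beta, which is the usual trade-off between the two error probabilities.
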
Term: Binomial Distribution
The family of binomial distributions is a category of discrete frequency distributions describing events that have two possible outcomes, like success or failure. If you know the probability of success on any given trial, binomial distributions can be used to predict the probabilities of given numbers of successes in given numbers of trials. This means that a researcher can determine whether an empirical distribution deviates significantly from what would typically be expected. Binomial distributions are important to the clinician because they can be used to represent many situations which have two possible outcomes (e.g., success/failure, life/death, improved/not improved, pregnant/not pregnant).
Related terms: Gaussian Distribution, Poisson Distribution
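
The prediction described above can be sketched in a few lines. The example assumes a hypothetical treatment with a 30% success probability given to 10 patients; both numbers are made up for illustration.

```python
from math import comb


def binomial_pmf(successes: int, trials: int, p_success: float) -> float:
    """Probability of exactly `successes` successes in `trials` independent trials."""
    return (
        comb(trials, successes)
        * p_success**successes
        * (1 - p_success) ** (trials - successes)
    )


def binomial_at_least(min_successes: int, trials: int, p_success: float) -> float:
    """Probability of `min_successes` or more successes."""
    return sum(
        binomial_pmf(k, trials, p_success) for k in range(min_successes, trials + 1)
    )


P_SUCCESS = 0.3   # hypothetical per-patient success probability
N_PATIENTS = 10   # hypothetical number of patients

exactly_three = binomial_pmf(3, N_PATIENTS, P_SUCCESS)
at_least_five = binomial_at_least(5, N_PATIENTS, P_SUCCESS)

print(f"P(exactly 3 successes)  = {exactly_three:.4f}")
print(f"P(at least 5 successes) = {at_least_five:.4f}")
```

If an observed result has a very small probability under the assumed success rate, that is a hint that the assumed rate may not describe the data.
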
Term: Box Plot
A box plot, or box-and-whiskers plot, is a graphical way of representing the salient features of a distribution. It can be used with either Gaussian or non-Gaussian distributions. The box plot shows a rectangle stretching from the first to the third quartile of the distribution. These quartiles, the edges of the box, are called "hinges". The box displays in pictorial fashion the variability in the data, and it contains the middle 50% of the data in the distribution. A line inside the box shows the approximate position of the median. If the median is not in the middle of the box, the distribution is skewed; the further the median is from the middle, the more skewed the distribution. Lines (the whiskers) extend from the box toward the smallest and largest values, and values lying unusually far from the box are marked as outliers or extreme values.
Related terms: Exploratory Data Analysis, Interquartile Range, Median, Standard Deviation
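
The numbers behind a box plot can be computed directly. The sketch below uses made-up measurements and assumes one common convention that this glossary does not itself specify: values more than 1.5 interquartile ranges beyond the box are flagged as outliers. Quartile definitions also differ slightly between software packages; the "inclusive" method of Python's statistics module is used here.

```python
from statistics import quantiles

values = sorted([2, 4, 4, 5, 6, 7, 8, 9, 30])  # made-up measurements

# Hinges (edges of the box) and the median line inside the box.
q1, median, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Assumed convention: fences at 1.5 * IQR beyond the hinges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [v for v in values if lower_fence <= v <= upper_fence]
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# Whiskers reach to the most extreme values that are not outliers.
whisker_low, whisker_high = inside[0], inside[-1]

print(f"box: {q1} to {q3}, median {median}")
print(f"whiskers: {whisker_low} to {whisker_high}")
print(f"outliers: {outliers}")
```
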
Term: Causal Relationship
A causal relationship among variables is one in which changes in one variable produce changes in another variable. Equivalently, changes in the second variable depend on changes in the first.
Related terms: Relationships among Variables, Variable
Term: Central Tendency
Measures of central tendency are summary statistics or descriptive statistics used to indicate the central location of a group of data values. The three most commonly used measures of central tendency are the mean, the median, and the mode.
Related terms: Mean, Median, Mode, Summary Statistics
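
A minimal sketch of the three measures, using Python's standard statistics module on a made-up, right-skewed set of hospital stays (the data are hypothetical):

```python
from statistics import mean, median, multimode

lengths_of_stay = [1, 2, 2, 3, 4, 7, 9]  # hypothetical hospital stays, in days

average = mean(lengths_of_stay)       # sum of the values divided by their count
middle = median(lengths_of_stay)      # 50th percentile
most_common = multimode(lengths_of_stay)  # list of the most frequent value(s)

print(f"mean   = {average}")
print(f"median = {middle}")
print(f"mode   = {most_common}")
```

Here the few long stays pull the mean above the median, which is the typical pattern in a right-skewed distribution.
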
Term: Chi-Square Tests
Chi-square tests are frequently used to detect significant relationships between two variables measured on nominal scales, or to determine whether a distribution differs significantly from expectations. Chi-square tests belong to the class of statistical inferential procedures known as nonparametric or distribution-free tests.
Related terms: Inferential Statistics, Population, t-Tests
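
As an illustration, the sketch below computes the Pearson chi-square statistic for a made-up 2x2 table (treatment group by outcome). The counts are hypothetical, no continuity correction is applied, and the p-value shortcut shown is valid only for one degree of freedom; larger tables would need a chi-square distribution function from a statistics library.

```python
from math import erfc, sqrt


def chi_square_statistic(observed: list[list[int]]) -> float:
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are treatments A and B,
# columns are improved / not improved.
observed = [[30, 10], [20, 40]]

chi2 = chi_square_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# For exactly 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, df = {degrees_of_freedom}, p = {p_value:.6f}")
```
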
Term: Correlation
Correlation refers to the degree of relationship among variables. A correlation coefficient is a measure of the degree of relationship among variables. There are many correlation coefficients. Two of the most important are the Pearson Product Moment Correlation Coefficient (by far the most frequently used) and the Spearman Rank Correlation Coefficient (often used with ordinal measures and/or non-Gaussian variables). The Pearson coefficient is a parametric measure of relationship and the Spearman coefficient is a nonparametric one.
Related terms: Causal Relationship, Chi-square Tests, Inferential Statistics, Population, Relationships among Variables, t-Tests
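
The difference between the two coefficients can be seen on a small made-up data set in which y always rises with x, but not in a straight line. The sketch below is a hand-rolled illustration, with the Spearman coefficient computed as the Pearson coefficient of the ranks (tied values share their average rank).

```python
from math import sqrt
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mean_x, mean_y = fmean(x), fmean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


dose = [1, 2, 3, 4, 5]          # hypothetical values
response = [1, 4, 9, 16, 25]    # rises with dose, but not linearly

r_pearson = pearson(dose, response)
r_spearman = spearman(dose, response)

print(f"Pearson  r = {r_pearson:.4f}")
print(f"Spearman r = {r_spearman:.4f}")
```

Because the ordering of the two variables agrees perfectly, the Spearman coefficient is 1, while the Pearson coefficient falls slightly short of 1 because the relationship is curved.
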
Term: Dependent Variable
In an experimental setting, the dependent variables are the variables that are observed by the experimenter. More generally, the values of dependent variables depend upon the values of independent variables.
Related terms: Independent Variable, Qualitative Variable, Types of Variables, Variable
Term: Distribution
A distribution or, more formally, a frequency distribution is simply a table, chart or graph which pairs each different value obtained from a sample or population with the number or proportion of times it occurs. So, any time a set of values is obtained from a sample, each value may be plotted against the number or proportion of times it occurs using a graph having the values on the horizontal axis and the counts or proportions on the vertical axis. Such a graph is a very convenient way to represent a frequency distribution.
Related terms: Binomial Distribution, Poisson Distribution, Sampling Distribution, Standard Deviation, Standard Error
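
A frequency distribution in table form can be built by counting. The sketch below pairs each distinct value in a made-up sample with its count and its proportion, and prints a crude text chart (drawn sideways, with one asterisk per occurrence).

```python
from collections import Counter

sample = [2, 3, 3, 5, 3, 2, 5, 5, 5, 1]  # made-up sample values

counts = Counter(sample)
proportions = {value: count / len(sample) for value, count in counts.items()}

print("value  count  proportion")
for value in sorted(counts):
    bar = "*" * counts[value]
    print(f"{value:>5}  {counts[value]:>5}  {proportions[value]:>10.2f}  {bar}")
```
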
Term: Effect Size
Effect size refers to the size of the effect produced by the independent variable on the dependent variable in a research study. For example, if the study compares the effectiveness of two treatments by comparing the means of the two treatments on the dependent variable, the difference between the means is the effect size. Medical researchers try to estimate effect size before conducting a research study because the size of the effect plays an important role in determining the statistical power, and thus the optimal sample size for conducting the research.
Related terms: Sample Size, Statistical Power
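
The two-treatment example can be sketched as follows, using made-up blood pressure readings. The raw difference between means is the effect size as defined in this entry. The standardized version (the difference divided by the pooled standard deviation, often called Cohen's d) is not part of this glossary and is included only as a commonly used companion measure.

```python
from math import sqrt
from statistics import fmean, variance

treatment_a = [120, 125, 130, 135, 140]  # hypothetical readings, mm Hg
treatment_b = [110, 115, 120, 125, 130]  # hypothetical readings, mm Hg

# Effect size as the difference between the two treatment means.
mean_difference = fmean(treatment_a) - fmean(treatment_b)

# Pooled standard deviation from the two sample variances.
n_a, n_b = len(treatment_a), len(treatment_b)
pooled_variance = (
    (n_a - 1) * variance(treatment_a) + (n_b - 1) * variance(treatment_b)
) / (n_a + n_b - 2)
pooled_sd = sqrt(pooled_variance)

# Standardized effect size (an addition not defined in the glossary).
standardized_difference = mean_difference / pooled_sd

print(f"difference between means = {mean_difference:.1f} mm Hg")
print(f"standardized difference  = {standardized_difference:.3f}")
```
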
Term: Empirical
An empirical effort or process is one which is data-based. The fundamental difference between scientific research and other methods of inquiry is that scientific research is data-based. A fundamental tenet of the scientific method is that an outcome or result is not regarded as valid until there is a substantial body of hard evidence or data to support it. Another way to say this is that empirical studies are ones which make use of hypothesis testing or reality-testing to determine whether assertions, hypotheses, or theoretical frameworks will be regarded as valid.
Related terms: Inferential Statistics, Null Hypothesis, Research Hypothesis
Term: Exploratory Data Analysis
Exploratory data analysis (EDA) provides a simple way to obtain a big-picture look at the data, and a quick way to check data for mistakes to prevent contamination of subsequent analyses. Exploratory data analysis can be thought of as preliminary to more in-depth statistical data analysis. Box plots are a primary tool in exploratory data analysis.
Related terms: Box Plot, Interquartile Range, Median, Standard Deviation
Term: Gaussian Distribution
The family of Gaussian or normal distributions is a category of frequency distributions fitting a precise mathematical model. When plotted on a graph, they are characterized by continuous, symmetrical, bell-shaped curves. These curves represent the mathematical law of errors. The curves precisely describe the phenomenon that measurements often include small errors, and that as errors become larger they decrease in number. Gaussian distributions are important to the clinician because they represent many situations where a condition is the result of a variety of factors summing together.
Related terms: Binomial Distribution, Poisson Distribution, Sampling Distribution, Standard Deviation
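
The idea of many factors summing together can be explored with a small simulation. In the sketch below, each simulated individual's value is the sum of 12 independent random factors. The number of factors, the uniform shape of each factor, and the sample size are arbitrary assumptions. The share of values falling within one standard deviation of the mean is then compared with the share a Gaussian distribution predicts.

```python
import random
from statistics import NormalDist, fmean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

N_FACTORS = 12      # assumed number of contributing factors
N_PEOPLE = 10_000   # assumed number of simulated individuals

# Each value is the sum of several independent uniform(0, 1) factors.
totals = [sum(rng.random() for _ in range(N_FACTORS)) for _ in range(N_PEOPLE)]

center = fmean(totals)
spread = pstdev(totals)

# Observed share of values within one standard deviation of the mean.
share_within_one_sd = sum(abs(t - center) <= spread for t in totals) / N_PEOPLE

# Share predicted by a Gaussian distribution (about 68%).
gaussian_share = NormalDist().cdf(1) - NormalDist().cdf(-1)

print(f"simulated share within 1 SD: {share_within_one_sd:.3f}")
print(f"Gaussian share within 1 SD:  {gaussian_share:.3f}")
```

Although each individual factor is flat rather than bell-shaped, the two shares should agree closely, which is the behavior the definition describes.
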
Term: Independent Variable
In an experimental setting, the independent variables are the variables that are manipulated by the investigator. More generally, independent variables are the causes or causal factors in medical research studies.
Related terms: Dependent Variable, Qualitative Variable, Types of Variables, Variable
Term: Inferential Statistics
Inferential statistics is the branch of statistics whose primary focus is generalizing from samples to populations with known degrees of accuracy and known probabilities. Inferential statistical methods allow us to compare small random samples and then to make statements, with known probabilities of being true, about the much larger populations they represent. Inferential methods typically take the form of statistical tests. EXAMPLES: Chi-square Tests, t-Tests, Analysis of Variance, Kruskal-Wallis Test, z-Tests, etc.
Related terms: Alpha, Mean, Population, Power, Random Sample, Sampling Distribution, Standard Deviation, Summary Statistics

Term: Interquartile Range
The interquartile range of a distribution is one of the measures of variability of a distribution. It is the difference between the value at the 3rd quartile (75th percentile) and the value at the 1st quartile (25th percentile) of a distribution.
Related terms: Range, Standard Error, Standard Deviation
Term: Interval Variables
Interval variables, the third level of measurement, have all the properties of ordinal variables, but in addition have the property that equal differences between measures represent equal differences in the values of the variable. A variable must be measured at least at the interval level for its average (mean) to be meaningful.
Related terms: Level of Measurement, Nominal Variables, Ordinal Variables, Qualitative Variable, Ratio Variables, Types of Variables, Variable
Term: Lesson
Each Main Menu option represents a self-instructional lesson that can be chosen and used to learn, present, practice or review statistics concepts and skills. Use the arrow keys or mouse to select the desired lesson. Lessons typically have a number of sections, also accessed from a menu. The browser interface is consistent with typical graphical user interface standards. Click on the "Back" button to return to previous documents or higher-level menus.
Related terms: Instructions, Table of Contents
Term: Level of Measurement
"Level of measurement" refers to the four hierarchically ordered types of variables: nominal, ordinal, interval, and ratio.
Related terms: Interval Variables, Nominal Variables, Ordinal Variables, Ratio Variables, Types of Variables, Variable
Term: Mean
The mean is the arithmetic average. It is one of the most useful measures of central tendency. The mean is calculated by finding the sum of the measures and dividing by the number of measures.
Related terms: Central Tendency, Median, Mode, Summary Statistics, Standard Error
Term: Measurement
Measurement is a systematic process of assigning names, labels, or numbers to the different values of a variable.
Related terms: Interval Variables, Nominal Variables, Ordinal Variables, Ratio Variables, Types of Variables
Term: Median
The median is one of the three most commonly used measures of central tendency. It is the middle value or 50th percentile in a distribution. It is often used in nonparametric statistical procedures. It also appears in box plots. The median is often preferable to the mean as a measure of central tendency when distributions are skewed.
Related terms: Central Tendency, Mean, Mode, Summary Statistics
Term: Mode
The mode is a summary statistic and is one of the three most commonly used measures of central tendency. It is the most frequent value in a distribution. The mode may be preferable to the mean as a measure of central tendency, particularly with multimodal (many-mode) distributions.
Related terms: Central Tendency, Mean, Median, Summary Statistics
Term: Nominal Variables
Nominal variables are the lowest-level qualitative variables and represent the lowest level of measurement. Nominal measures simply name, group, type, classify or categorize values of a variable.
Related terms: Interval Variables, Level of Measurement, Ordinal Variables, Qualitative Variable, Ratio Variables, Types of Variables, Variable
Term: Nonparametric Tests
Nonparametric statistical procedures are sometimes referred to as distribution-free procedures. In general, these procedures can be used with nominal or ordinal measures and do not assume that the distributions of variables have particular shapes (in contrast to parametric procedures, which typically assume normal distributions and interval or ratio measures). Examples of nonparametric procedures include the Chi-square Tests and the Spearman Rank Correlation Coefficient.
Related terms: Chi-square Tests, Inferential Statistics, Parametric, Population, t-Tests
Term: Null Hypothesis
The null hypothesis is a statement that there is no difference between population parameters. That is, there is no relationship between independent and dependent variables in the population under study. Typically, this is not the anticipated outcome of an experiment. Usually the investigator conducts an experiment because he/she has reason to believe manipulation of the independent variable will influence the dependent variable. So, rejection of the null hypothesis is interpreted as a significant finding.
Related terms: Research Hypothesis
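
To make the idea of rejecting a null hypothesis concrete, the sketch below runs a simple two-sided z-test. The test is chosen for illustration only. All of the numbers are hypothetical: the null hypothesis states that the population mean is 120, the population standard deviation is assumed known at 15, and a sample of 36 patients has a mean of 126. The 0.05 cutoff for alpha is a common convention, not a value taken from this glossary.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 120.0         # population mean claimed by the null hypothesis (assumed)
SIGMA = 15.0         # population standard deviation, assumed known
N = 36               # hypothetical sample size
SAMPLE_MEAN = 126.0  # hypothetical observed sample mean
ALPHA = 0.05         # conventional significance level (assumed)

# How many standard errors the sample mean lies from the null value.
standard_error = SIGMA / sqrt(N)
z = (SAMPLE_MEAN - MU_0) / standard_error

# Two-sided p-value: chance of a result at least this extreme if H0 is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < ALPHA

print(f"z = {z:.2f}, p = {p_value:.4f}")
print("reject the null hypothesis" if reject_null else "retain the null hypothesis")
```

With these made-up numbers the p-value falls below 0.05, so the null hypothesis would be rejected and the finding reported as statistically significant.
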