HI5353 - Multiple Regression Summary
Data Analysis: Multiple Regression and Correlation
When to Consider Use
When you have measures obtained from a sample on one or more
independent variables (predictors) and a dependent (criterion) variable
with normally distributed errors of prediction (residuals),
and you wish to describe the importance of all or individual predictors,
describe the relationship(s) between the predictors and the criterion,
or make predictions about the criterion variable for the larger
population from which the sample was obtained.
Null Hypotheses
Standard: There is no relationship between independent and dependent variables.
Sequential or Stepwise: There is no relationship between additional
independent variables and the dependent variable.
Assumptions & Some Relevant Tests & Assessment Tools
- Random selection/assignment
- Interval or ratio dependent measures
- Independent errors (residuals) or data observations
- Normal errors (residuals) or data observations for dependent variable -
skewness, kurtosis, Kolmogorov-Smirnov test, normality plots
- Linearity - scatterplots, tests for nonlinear trend
- Homoscedasticity, homogeneity of variance of residuals around predicted
values - scatterplots
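As a rough illustration of the skewness and kurtosis checks listed above, the sketch below computes both statistics for a set of residuals using only the Python standard library. The function name and the residual values are hypothetical, and the moment-based formulas used here are one common definition; statistical packages may apply small-sample corrections that give slightly different numbers.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from population moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis


# Hypothetical residuals (observed minus predicted criterion scores).
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, kurt = skewness_and_kurtosis(residuals)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

Values near zero on both statistics are consistent with normal residuals; large departures suggest inspecting a normality plot as well.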
Consequences of Violation of Assumptions
If samples are not random, validity of inferences about
populations sampled is brought into question.
If dependent variables are not measured on either interval or
ratio scales, results of the test are suspect.
Problematical Issues
(And How to Handle Them)
- Normality (Transform variables, e.g., square root, log, inverse, etc.)
- Linearity (Check scatterplots - transform variables, use nonlinear or
logistic regression, sometimes test for nonlinear trends.)
- Homoscedasticity (Problem exists if ratio of largest to smallest SD > 3:1 -
reduces statistical power - transform troublesome variables)
- Multicollinearity (Program options: check the correlation matrix and delete redundant
variables; use multicollinearity diagnostics, i.e., if a row's condition index > 30 and 2 or
more variance proportions in that row > .50, a multicollinearity problem exists (Belsley et al., 1980).)
- Singularity - must have more cases than dependent variables;
covariates and dependent variables must be linearly independent
(Get larger samples for first, delete redundancies for second.)
- Reliability - reliable dependent variables and covariates - r > .80
(Obtain more reliable measures.)
- Outliers (Identify univariate (generally p < .001; |z| > 3.29) & multivariate
(Mahalanobis distance, chi-square p < .001 with df = # of IV's) outliers, then check for
reasons and correct/change scores, transform variables, or eliminate from analysis.)
- Categorical dependent variable (Use logistic regression or discriminant analysis)
- Sample Size - Have at least 10 times as many cases as predictors (Roscoe, 1974),
preferably 50 + 8m (where "m" is the number of predictors) for testing R and
104 + m for testing individual predictors (Tabachnick & Fidell, 2000, 1996),
or consult Cohen 1982, 1988 power tables.
- Missing Data (Replace blanks with a constant, preferably the mean,
and treat missingness as a variable to be controlled
(Cohen & Cohen, 1983).)
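The sample-size rules of thumb in the list above reduce to simple arithmetic. The sketch below tabulates them for a given number of predictors; the function name and dictionary keys are hypothetical labels chosen for illustration, and a formal power analysis remains the better guide when effect sizes are known.

```python
def minimum_cases(m):
    """Rule-of-thumb minimum sample sizes for m predictors."""
    return {
        "ten_per_predictor": 10 * m,                # Roscoe (1974)
        "testing_multiple_R": 50 + 8 * m,           # Tabachnick & Fidell
        "testing_individual_predictors": 104 + m,   # Tabachnick & Fidell
    }


# Example: a study planned with 5 predictors.
rules = minimum_cases(5)
for rule, n_cases in rules.items():
    print(f"{rule}: at least {n_cases} cases")
```

When both the overall R and the individual predictors are to be tested, taking the larger of the two Tabachnick & Fidell figures is the conservative choice.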
Procedural Checklist
(Adapted from Tabachnick & Fidell, 2000, 1996)
I. Handle Assumptions
- Assure adequate power & ratio of cases to IV's during
design of study (use Cohen & Cohen, 1982; Cohen, 1988)
- Attend to normality, linearity & homoscedasticity of residuals using
significance tests or appropriate graphs (use transformations
where needed, e.g., square root, log, inverse, square,
1 - log, etc.)
- Examine outliers for anomalies, interpret & delete
- Assess multicollinearity & singularity
- Use a hierarchical strategy for controlling alpha inflation
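The univariate outlier screen from the checklist can be sketched as a z-score filter using the |z| > 3.29 criterion (p < .001, two-tailed). The function name and the scores below are hypothetical, and the sketch covers only the univariate case; multivariate screening with Mahalanobis distance needs a statistics package.

```python
import statistics


def flag_univariate_outliers(scores, cutoff=3.29):
    """Return indices of scores whose |z| exceeds the cutoff."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 denominator)
    return [i for i, x in enumerate(scores) if abs((x - mean) / sd) > cutoff]


# Hypothetical scores: 20 ordinary cases plus one extreme value.
scores = [48, 49, 50, 51, 52] * 4 + [120]
outliers = flag_univariate_outliers(scores)
print("Flagged case indices:", outliers)
```

Flagged cases should be examined for data-entry or measurement problems before any decision to change, transform, or delete them.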
II. Conduct Major Analyses of Importance
- Multiple R, R2, overall (standard reg.) and/or
incremental (sequential or hierarchical reg.) F ratio
- Adjusted multiple R2 (proportion of variance
accounted for)
- Significance of regression coefficients
- Squared semipartial correlations (equal R2 changes)
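The quantities in this step follow from R2, the number of cases (n), and the number of predictors (m). The sketch below uses the standard textbook formulas for adjusted R2, the overall F ratio, and the incremental F ratio for an R2 change; the function names and the example figures are hypothetical.

```python
def adjusted_r2(r2, n, m):
    """Adjusted R2 for n cases and m predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)


def overall_f(r2, n, m):
    """F ratio for the full model, with df = (m, n - m - 1)."""
    return (r2 / m) / ((1.0 - r2) / (n - m - 1))


def incremental_f(r2_full, r2_reduced, n, m_full, m_added):
    """F ratio for the R2 change when m_added predictors enter,
    with df = (m_added, n - m_full - 1)."""
    change = r2_full - r2_reduced  # sum of squared semipartials for the added set
    return (change / m_added) / ((1.0 - r2_full) / (n - m_full - 1))


# Hypothetical study: 53 cases, 2 predictors, R2 = .50;
# the first predictor alone gave R2 = .40.
adj = adjusted_r2(0.50, n=53, m=2)
f_all = overall_f(0.50, n=53, m=2)
f_step = incremental_f(0.50, 0.40, n=53, m_full=2, m_added=1)
print(f"adjusted R2 = {adj:.2f}, overall F = {f_all:.1f}, incremental F = {f_step:.1f}")
```

Each F ratio is then compared with the critical value for its degrees of freedom, or converted to a p value by the analysis program.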
III. Consider Additional Analyses
- Post hoc significance of correlations
- Unstandardized (B) regression coefficients, confidence limits
- Standardized (beta) weights, confidence limits
- Unique versus shared variability
- Suppressor variables
- Prediction equation
- Cross validation (for stepwise or setwise reg.)
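To illustrate the prediction equation and the link between unstandardized (B) and standardized (beta) weights, the sketch below applies an already-estimated equation to one case and converts each B to a beta using beta = B x (SD of predictor / SD of criterion). All names, coefficients, and standard deviations are hypothetical values chosen for illustration.

```python
def predict(intercept, b_weights, case):
    """Predicted criterion score: intercept + sum of B * predictor score."""
    return intercept + sum(b * case[name] for name, b in b_weights.items())


def standardized_weights(b_weights, sd_predictors, sd_criterion):
    """Convert each unstandardized B to a standardized beta weight."""
    return {
        name: b * sd_predictors[name] / sd_criterion
        for name, b in b_weights.items()
    }


# Hypothetical fitted equation: Y' = 10 + 2.0(x1) - 0.5(x2)
b_weights = {"x1": 2.0, "x2": -0.5}
predicted = predict(10.0, b_weights, {"x1": 1.0, "x2": 2.0})
betas = standardized_weights(b_weights, {"x1": 3.0, "x2": 4.0}, sd_criterion=12.0)
print("Predicted score:", predicted)
print("Beta weights:", betas)
```

Because beta weights are expressed in standard deviation units, they allow predictors measured on different scales to be compared, which the B weights alone do not.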
Reporting Results
(Report what happened. Keep it simple and clear.
Typically include the following.)
- Univariate F's, significance levels or source tables
- Measures of association (e.g., R, R2, difference in R2, correlations)
- Regression equations if relevant
- Regression coefficients, standard errors, and sample sizes
- Graphs (Use where it substantially helps communicate results.)

cwj-2/13/05.