Correlation-Based Variable Selection

Read It: Correlation-Based Variable Selection

One way to selet the optimal variables for multiple linear regression is through a correlation analysis. By determining which explanatory variables are most highly correlated with the response, you can get a better idea of important variables, thus implement models with higher adjusted R² values. Below, we demonstrate how to implement this process in Python.

Watch It: Video - Correlation Variable Selection (10:13 minutes)

Click here for a transcript.

Credit: © Penn State is licensed under CC BY-NC-SA 4.0(link is external)

Try It: DataCamp - Apply Your Coding Skills

Using the pre-coded variables below. Calculate and print the correlation matrix. Are any of the variables highly correlated? Which would you include in your multiple linear regression model?

1
2
# libraries
import pandas as pd
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

1/0 0/0

Correlation-Based Variable Selection