
Correlation-Based Variable Selection
Read It: Correlation-Based Variable Selection
One way to select the optimal variables for multiple linear regression is through correlation analysis. By determining which explanatory variables are most highly correlated with the response, you can identify the variables most likely to be important and therefore build models with higher adjusted R² values. Below, we demonstrate how to implement this process in Python.
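As a rough illustration of the idea, the sketch below ranks explanatory variables by the absolute value of their Pearson correlation with the response. Everything in it is assumed for illustration only: the data are synthetic, the column names (`x1`, `x2`, `x3`, `y`) are made up, and the 0.3 cutoff is an arbitrary choice rather than a recommended rule.

```python
import numpy as np
import pandas as pd

# Synthetic data (illustration only): y depends on x1 and x2, but not on x3.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
df["y"] = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.5, size=n)

# Pearson correlation of each explanatory variable with the response.
corr_with_y = df.corr()["y"].drop("y")

# Rank by absolute correlation, since strong negative correlations matter too.
ranked = corr_with_y.abs().sort_values(ascending=False)

# Keep variables above a hypothetical cutoff (0.3 is arbitrary).
threshold = 0.3
selected = ranked[ranked > threshold].index.tolist()

print(ranked)
print("Selected variables:", selected)
```

Taking the absolute value matters here because `x2` is negatively related to `y`; without it, a strongly negative predictor would be ranked last. Any cutoff is a judgment call, so it is worth checking the resulting model's adjusted R² rather than relying on the correlation ranking alone.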
Watch It: Video - Correlation Variable Selection (10:13 minutes)
Click here for a transcript.
Credit: © Penn State is licensed under CC BY-NC-SA 4.0
Try It: DataCamp - Apply Your Coding Skills
Using the pre-coded variables below, calculate and print the correlation matrix. Are any of the variables highly correlated? Which would you include in your multiple linear regression model?
# libraries
import pandas as pd