EME 210
Data Analytics for Energy Systems


Correlation-Based Variable Selection



Read It: Correlation-Based Variable Selection

One way to select the optimal variables for multiple linear regression is through a correlation analysis. By determining which explanatory variables are most highly correlated with the response, you can get a better idea of which variables are important and use them to build models with higher adjusted R² values. Below, we demonstrate how to implement this process in Python.
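The workflow described above can be sketched in pandas as follows. This is a minimal, hypothetical example, not the code used in the video. The dataset is randomly generated, and the column names `x1` to `x4` and `y` are placeholders. The sketch computes the correlation matrix with `DataFrame.corr()`, ranks the explanatory variables by the absolute value of their correlation with the response, and then adds them one at a time while recording adjusted R², defined as 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p explanatory variables. The least-squares fit uses `numpy.linalg.lstsq` so the example stays self-contained. The course may use a different fitting library.

```python
import numpy as np
import pandas as pd

# Hypothetical, randomly generated data for illustration only; x1..x4 and y
# are placeholder names, not variables from the course dataset.
rng = np.random.default_rng(seed=0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
    "x4": rng.normal(size=n),
})
# The response depends strongly on x1, moderately on x2, and not at all on x3 or x4
df["y"] = 3.0 * df["x1"] + 1.5 * df["x2"] + rng.normal(scale=1.0, size=n)

# Pearson correlation matrix of all columns
corr = df.corr()
print(corr.round(2))

# Rank explanatory variables by |correlation with y|, strongest first
ranked = corr["y"].drop("y").abs().sort_values(ascending=False)
print(ranked)


def adjusted_r2(X: pd.DataFrame, y: pd.Series) -> float:
    """Fit ordinary least squares with an intercept and return adjusted R^2."""
    n_obs, p = X.shape
    # Design matrix: a column of ones for the intercept, then the predictors
    A = np.column_stack([np.ones(n_obs), X.to_numpy()])
    coef, *_ = np.linalg.lstsq(A, y.to_numpy(), rcond=None)
    resid = y.to_numpy() - A @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y.to_numpy() - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - p - 1)


# Add variables in order of correlation strength and compare adjusted R^2
results = {}
for k in range(1, len(ranked) + 1):
    chosen = list(ranked.index[:k])
    results[tuple(chosen)] = adjusted_r2(df[chosen], df["y"])
    print(chosen, round(results[tuple(chosen)], 3))
```

Comparing the printed adjusted R² values shows where adding another weakly correlated variable stops improving the model.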

Watch It: Video - Correlation Variable Selection (10:13 minutes)


Credit: © Penn State is licensed under CC BY-NC-SA 4.0

 Try It: DataCamp - Apply Your Coding Skills

Using the pre-coded variables below, calculate and print the correlation matrix. Are any of the variables highly correlated? Which would you include in your multiple linear regression model?

# libraries
import pandas as pd
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
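When answering "Are any of the variables highly correlated?", it can help to list every pair of explanatory variables whose correlation exceeds a cutoff. Such pairs carry overlapping information. One common approach is to keep only the member of the pair that is more strongly correlated with the response. The sketch below uses hypothetical data rather than the exercise's pre-coded variables. The column names are placeholders, and the |r| > 0.8 threshold is an assumed value, not one taken from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical data: x2 is built as a near-copy of x1, so that pair should be flagged
rng = np.random.default_rng(seed=1)
n = 100
x1 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=n),
    "x3": rng.normal(size=n),
})

corr = df.corr()
print(corr.round(2))

# Assumed cutoff; adjust to the threshold your analysis calls for
THRESHOLD = 0.8

# Walk the upper triangle (j > i) so each pair is checked once and the
# diagonal (each variable with itself, r = 1) is skipped
cols = corr.columns
high = [
    (cols[i], cols[j], corr.iloc[i, j])
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
    if abs(corr.iloc[i, j]) > THRESHOLD
]
for a, b, r in high:
    print(f"{a} and {b} are highly correlated (r = {r:.2f})")
```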


 Assess It: Check Your Knowledge

Knowledge Check