EME 210
Data Analytics for Energy Systems

Variable Importance: Regression


Variable Importance for Regression

Read It: Variable Importance for Regression

The final way to interpret the results of a random forest regression is to evaluate variable importance, that is, to determine which explanatory variables contributed most to an accurate prediction. There are several ways to calculate variable importance, but the most common approach for regression models is to measure how much the prediction error increases when a variable is either removed or randomized, so that the relationship between the response and that explanatory variable is broken. A variable whose removal or randomization causes a large increase in error is important; one that causes little or no change is not. Below, we demonstrate how to find the variable importance for a random forest regression model in Python.
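The randomization idea described above can be sketched in a few lines. This is a minimal illustration, not the course's notebook code: it assumes scikit-learn is the modeling library, uses synthetic data in place of the course dataset, and the helper name `permutation_mse_increase` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration, not the course dataset):
# y depends strongly on x0, weakly on x1, and not at all on x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a random forest regression model on the training split.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Error of the intact model on held-out data.
baseline_mse = mean_squared_error(y_test, model.predict(X_test))


def permutation_mse_increase(model, X, y, column, baseline, rng):
    """Hypothetical helper: increase in MSE after shuffling one column of X."""
    X_shuffled = X.copy()
    # Shuffling breaks the link between this variable and the response.
    X_shuffled[:, column] = rng.permutation(X_shuffled[:, column])
    return mean_squared_error(y, model.predict(X_shuffled)) - baseline


names = ["x0", "x1", "x2"]
increases = [
    permutation_mse_increase(model, X_test, y_test, j, baseline_mse, rng)
    for j in range(len(names))
]
for name, inc in zip(names, increases):
    print(f"{name}: MSE increase = {inc:.3f}")

# scikit-learn also stores impurity-based importances on the fitted model.
# These are computed differently (from split quality, not from error increase).
print(dict(zip(names, model.feature_importances_)))
```

A larger MSE increase means the model relied more heavily on that variable. scikit-learn also provides `sklearn.inspection.permutation_importance`, which repeats the shuffle several times and averages the result; the single-shuffle version here is only meant to make the mechanism visible.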

Watch It: Video - Regression Variable Importance (4:27 minutes)


Credit: © Penn State is licensed under CC BY-NC-SA 4.0

Try It: GOOGLE COLAB

Note: You must be logged into your PSU Google Workspace to access the file.

  1. Open the Google Colab file used in the video.
  2. In the Colab file, click "File", then "Save a copy in Drive". This creates a new Colab file that you can edit in your own Google Drive account.
  3. Once the copy is saved in your Drive, edit the following code to print the variable importance. Remember to load the data and rerun the model code first:

# import pandas for data loading
import pandas as pd

# load the dataset
recs = pd.read_csv(...)

# rerun the model here

# summarize model to view variable importance
...

Once you have implemented this code on your own, come back to this page to test your knowledge.


Assess It: Check Your Knowledge

Knowledge Check