Variable Importance
Read It: Evaluating Variable Importance in Classification
The final way to interpret the results from random forest classification is to evaluate the variable importance. That is, determine which of the explanatory variables contributed most to achieving an accurate prediction. There are a number of ways to calculate variable importance, but the most common is to measure how much the accuracy decreases when a variable is either removed or randomized (its values shuffled), so that the relationship between the response and that explanatory variable is broken. The larger the drop in accuracy, the more important the variable. Below, we demonstrate how to find the variable importance for a random forest model in Python.
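As a rough illustration of the "randomize a variable and watch the accuracy drop" idea, here is a minimal sketch using scikit-learn. It is not the code from the video or the Colab file: the data are synthetic (generated with `make_classification`), and the feature names `informative_*` and `noise_*` are made up for this example. It prints both the impurity-based importances stored on the fitted forest and the permutation importances computed on held-out data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the course dataset).
# With shuffle=False, the first two columns carry the signal and the
# remaining four are pure noise.
X, y = make_classification(
    n_samples=1000,
    n_features=6,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    shuffle=False,
    random_state=0,
)
feature_names = ["informative_0", "informative_1",
                 "noise_0", "noise_1", "noise_2", "noise_3"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the random forest classifier.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importance: computed during training, sums to 1.
impurity = pd.Series(model.feature_importances_, index=feature_names)
impurity = impurity.sort_values(ascending=False)

# Permutation importance: shuffle one column at a time on the test set
# and record the average drop in accuracy over several repeats.
result = permutation_importance(
    model, X_test, y_test, scoring="accuracy", n_repeats=10, random_state=0
)
permutation = pd.Series(result.importances_mean, index=feature_names)
permutation = permutation.sort_values(ascending=False)

print("Impurity-based importance:")
print(impurity)
print("\nPermutation importance (mean drop in accuracy):")
print(permutation)
```

In a sketch like this, the noise columns should end up with permutation importances near zero, since shuffling them does not change what the model can learn from the data. Computing the permutation scores on the test set rather than the training set is a design choice that keeps the ranking tied to predictive performance on unseen data.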
Watch It: Video - Classification Variable Importance (5:43 minutes)
Try It: GOOGLE COLAB
- Click the Google Colab file used in the video here.
- Go to the Colab file and click "File", then "Save a copy in Drive". This will create a new Colab file that you can edit in your own Google Drive account.
- Once you have it saved in your Drive, try to edit the following code to print the variable importance. Remember to load the data and rerun the model code:

    # load the dataset
    recs = pd.read_csv(...)

    # rerun the model code here

    # summarize model to view variable importance
    ...

Note: You must be logged into your PSU Google Workspace in order to access the file.
Once you have implemented this code on your own, come back to this page to test your knowledge.