Evaluating Model Parameters: Regression

Read It: Evaluating Model Parameters for Regression

Another way to interpret regression results is to evaluate how the root mean squared error (RMSE) changes as the number of trees increases. Generally, as you add more trees to your model, the RMSE decreases. However, more trees also increase the time your model takes to run. There is usually a point beyond which adding more trees yields only minimal decreases in RMSE. Stopping at this point balances a low RMSE against computational time. Below, we demonstrate how to create a plot of RMSE versus the number of trees in Python.
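To make the idea of a stopping point concrete, here is a minimal sketch in plain Python. It is not the code from the video or the Colab file. The helper names `rmse` and `first_plateau`, the 1% threshold, and the example numbers are all assumptions made up for illustration. The sketch computes RMSE from its definition, then scans a list of RMSE values for the first tree count after which the next step improves RMSE by less than the chosen fraction.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def first_plateau(num_trees, rmses, min_gain=0.01):
    """Return the first tree count after which the next step lowers RMSE
    by less than `min_gain` (a relative fraction, an assumed threshold)."""
    for i in range(len(rmses) - 1):
        if rmses[i] == 0:
            return num_trees[i]  # already perfect; nothing left to gain
        gain = (rmses[i] - rmses[i + 1]) / rmses[i]
        if gain < min_gain:
            return num_trees[i]
    return num_trees[-1]  # RMSE was still improving at the last step


# Hypothetical values for illustration only, not results from any real model.
example_trees = [1, 10, 50, 100, 200, 300]
example_rmse = [5.0, 3.2, 2.5, 2.3, 2.29, 2.285]

stop_at = first_plateau(example_trees, example_rmse)
print(stop_at)
```

With these made-up numbers, going from 100 to 200 trees lowers RMSE by less than 1%, so the sketch reports 100 trees as a reasonable stopping point. In practice, the threshold is a judgment call that depends on how much extra run time you can afford.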

Watch It: Video - Regression Evaluate Parameters (2:47 minutes)


Credit: © Penn State is licensed under CC BY-NC-SA 4.0

Try It: GOOGLE COLAB

Click the Google Colab file used in the video here.
Go to the Colab file and click "File", then "Save a copy in Drive". This will create a new Colab file that you can edit in your own Google Drive account.
Once you have it saved in your Drive, try to edit the following code to create a plot of the model's RMSE as a function of the number of trees. Make sure to upload the dataset and rerun the model code:

Note: You must be logged into your PSU Google Workspace in order to access the file.

# load the dataset
recs = pd.read_csv(...)

# rerun model here

# store training logs
logs = ...

# convert logs object into dataframe
logs_df = pd.DataFrame({'num_trees': ..., 
                        'rmse': [log.evaluation.rmse for log in logs]})

# plot the training logs data
...

Once you have implemented this code on your own, come back to this page to test your knowledge.
