One way to selet the optimal variables for multiple linear regression is through a correlation analysis. By determining which explanatory variables are most highly correlated with the response, you can get a better idea of important variables, thus implement models with higher adjusted R2 values. Below, we demonstrate how to implement this process in Python.
Using the pre-coded variables below. Calculate and print the correlation matrix. Are any of the variables highly correlated? Which would you include in your multiple linear regression model?