EME 210
Data Analytics for Energy Systems

Interpreting the Output from Multiple Linear Regression



Read It: Interpreting Output from Multiple Linear Regression

After implementing multiple linear regression in Python, the primary source of interpretation will be the output of the model summary, an example of which is shown below.

[Figure: Example output of implementing multiple linear regression in Python. This output is critical for interpreting your multiple linear regression model.]
Credit: © Penn State is licensed under CC BY-NC-SA 4.0

There are three key areas to focus on in this output. The first is the F-statistic and its associated p-value. As in Lesson 8, these values can be used to assess overall model effectiveness with the F-statistic hypothesis test. For multiple linear regression, however, the alternative hypothesis is slightly different, as shown below.

F-statistic and p-value for the whole model:

H₀: the model is ineffective

Hₐ: at least one predictor is effective

Notice how the alternative hypothesis tells us that at least one explanatory variable (predictor) is effective, rather than focusing on the model as a whole. To figure out which predictors are effective, you need to look at the lower half of the output, where the coefficients, t-statistics, and associated p-values are listed. These p-values tell you how significant a given explanatory variable is, with values < 0.05 (or your chosen significance level) indicating significance. You can also use these p-values to test the significance of each coefficient using the hypothesis tests below; notice that they are similar to those you learned in Lesson 8 for the hypothesis test for slope.

t-statistic and p-value for individual predictors:

H₀: βᵢ = 0

Hₐ: βᵢ ≠ 0

Finally, the last piece of critical information in the model summary is the Adjusted R² value. This is the value you will use to determine the "goodness of fit" for any multiple linear regression model. In particular, the Adjusted R² value accounts for model complexity, as well as the difference between the predicted and actual values. In this sense, adding explanatory variables that don't contribute to model accuracy can actually reduce your Adjusted R². Mathematically, the Adjusted R² is given below. You may notice in the above output that the Adjusted R² and regular R² appear the same (to the reported precision); this happens when there are no insignificant explanatory variables in your model. That being said, it is much more common to have a regular R² that is greater than your Adjusted R².

R²adj = 1 − (1 − R²) × (n − 1)/(n − p)

where n is the number of observations and p is the number of fitted parameters (including the intercept).


 Assess It: Check Your Knowledge

Knowledge Check