EME 210
Data Analytics for Energy Systems

Hypothesis Test for Slope

Read It: Hypothesis Test for Slope

So far in this lesson, you have learned about a hypothesis test for correlation and have been introduced to simple linear regression. Here, we will go one step further in our analysis to evaluate the significance of the slope of a line. For this analysis, we will return to our one-line linear regression command: stats.linregress. However, before we can start the analysis, we need to define the hypotheses. For a hypothesis test of slope, the parameter of interest is β_1. Additionally, the null hypothesis will always state β_1 = 0, while the alternative will be some inequality (e.g., β_1 > 0, β_1 < 0, or β_1 ≠ 0). Below is an example of a set of hypotheses for slope, which we will test in the video demonstration.

H_0: β_1 = 0

H_a: β_1 > 0
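As a minimal sketch of how the hypotheses above could be tested in code, the example below runs stats.linregress with a one-sided alternative. The data are synthetic and purely an assumption for illustration (they are not the dataset used in the video), as are the seed and the 0.05 significance level. The `alternative` argument is assumed to be available, which requires a reasonably recent SciPy release.

```python
import numpy as np
import scipy.stats as stats

# Synthetic data (an assumption for illustration, not the course dataset):
# y rises with x on average, plus random noise.
rng = np.random.default_rng(seed=210)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(loc=0.0, scale=1.0, size=x.size)

# H_0: beta_1 = 0  versus  H_a: beta_1 > 0
output = stats.linregress(x, y, alternative='greater')

alpha = 0.05  # significance level (assumed)
print('slope:  ', output.slope)
print('p-value:', output.pvalue)

# Decision rule: reject H_0 when the p-value falls below alpha.
if output.pvalue < alpha:
    print('Reject H_0: evidence that the slope is greater than zero.')
else:
    print('Fail to reject H_0: not enough evidence that the slope is greater than zero.')
```

Switching `alternative` to 'less' or 'two-sided' would test β_1 < 0 or β_1 ≠ 0 instead; the choice should match the stated H_a.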

 Watch It: Video -  Hypothesis Test Slope (4:40 minutes)

Transcript:

Hello, and welcome back to another video in our linear regression lesson. In this video, I'm going to talk about how we can conduct a hypothesis test to test for the significance of our slope value. And so, this is another great way to add sort of a statistical inference aspect to your linear regression analysis. So you can do correlation, you can look at slope, and later we'll get into additional tests that you can do.

So the first thing that we do with our hypothesis test is state our hypotheses. And I have them stated here. So our H naught is beta 1 equals zero. Like always, it's got an equal sign, and in this case, we're always assuming that it's zero. And then we have our alternative, where beta 1 is not equal to zero. Now alternatively, you could have beta 1 less than zero, or beta 1 greater than zero. But here I'm going to do that two-sided alternative, where we say that beta 1 is simply not equal to zero.

Now, unlike some of the previous hypothesis testing that we have done, there's no randomization procedure for the slope. In fact, we're going to use the linregress function that we used to do our one-line linear regression in a previous video. So we say stats dot linregress, and we give one variable, then the next variable, and we state our alternative. This time we're doing two-sided. Alternatively, we could do greater or less, as long as it follows our alternative hypothesis. Then we can print the output. And this output is what we have seen before: slope, intercept, r-value, p-value, etc.

When we're trying to figure out what conclusion to draw from our hypothesis test, we need the p-value, which is given here. So we can print that in particular, with output dot pvalue. We can do it in a print statement like we have before, so print 'p-value', then output dot pvalue, and run that. And so, we get this p-value of 0.001519, which is below the standard significance level of 0.05. Therefore, we reject the null hypothesis in favor of the alternative that the slope is statistically significantly different from zero.

And this is interesting, because if you look up here, our slope is barely above zero, zero point three one three. But that is enough for it to be statistically significantly different from zero, because these values are so correlated that even a small slope is significantly different from no slope. And so, this is an example of when you might see that small value and think that it can't be that different from zero. But in reality, because of units and because of the correlation between these two variables, it is statistically significantly different from zero.

Credit: © Penn State is licensed under CC BY-NC-SA 4.0
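The video's closing point, that a numerically small slope can still be significant because of units, can be illustrated with a short sketch. The data below are synthetic, and the variable names and the kWh/MWh units are assumptions chosen only for illustration. The same x values are expressed in two different units; the slope changes by the conversion factor, while the two-sided p-value stays the same.

```python
import numpy as np
import scipy.stats as stats

# Synthetic data (assumed for illustration): x is recorded in kilowatt-hours.
rng = np.random.default_rng(seed=0)
x_kwh = rng.uniform(0, 1000, size=100)
y = 5.0 + 0.003 * x_kwh + rng.normal(0.0, 0.5, size=100)

# Same data, with x converted to megawatt-hours.
x_mwh = x_kwh / 1000.0

# H_0: beta_1 = 0  versus  H_a: beta_1 != 0, fitted in each unit system.
fit_kwh = stats.linregress(x_kwh, y, alternative='two-sided')
fit_mwh = stats.linregress(x_mwh, y, alternative='two-sided')

print('slope per kWh:', fit_kwh.slope, ' p-value:', fit_kwh.pvalue)
print('slope per MWh:', fit_mwh.slope, ' p-value:', fit_mwh.pvalue)
```

The slope per MWh is 1000 times the slope per kWh, yet both fits lead to the same conclusion, so the size of the slope on its own says little about its significance.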

 Try It: DataCamp - Apply Your Coding Skills

Edit the following code to conduct a hypothesis test for the slope between x and y. Print the slope and p-value.

Starter code:

    # libraries
    import pandas as pd
    import numpy as np
    import scipy.stats as stats

    # create some data
    df = pd.DataFrame({'x': np.random.randint(0,100,100),
                       'y': np.random.randint(-50,150,100)})

    # one-line test
    output = ...

    # print results
    print('slope: ', ...)
    print('p-value: ', ...)

Solution:

    # libraries
    import pandas as pd
    import numpy as np
    import scipy.stats as stats

    # create some data
    df = pd.DataFrame({'x': np.random.randint(0,100,100),
                       'y': np.random.randint(-50,150,100)})

    # one-line test
    output = stats.linregress(df['x'], df['y'])

    # print results
    print('slope: ', output.slope)
    print('p-value: ', output.pvalue)
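To show where the reported p-value comes from, the sketch below reproduces it by hand. It assumes the default two-sided test, in which the test statistic is the estimated slope divided by its standard error and is compared against a t-distribution with n - 2 degrees of freedom. The seed and the names `t_stat`, `dof`, and `p_manual` are additions for illustration and are not part of the exercise.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Same kind of random data as the exercise; the seed is an added assumption
# so that the numbers are reproducible.
np.random.seed(210)
df = pd.DataFrame({'x': np.random.randint(0, 100, 100),
                   'y': np.random.randint(-50, 150, 100)})

# One-line test (two-sided by default).
output = stats.linregress(df['x'], df['y'])

# Test statistic: estimated slope divided by its standard error.
t_stat = output.slope / output.stderr

# Degrees of freedom for simple linear regression: n - 2.
dof = len(df) - 2

# Two-sided p-value: probability of a t value at least this extreme under H_0.
p_manual = 2 * stats.t.sf(abs(t_stat), dof)

print('t statistic:       ', t_stat)
print('manual p-value:    ', p_manual)
print('linregress p-value:', output.pvalue)
```

Because x and y here are generated independently of each other, the p-value will often be large, in which case the correct conclusion is to fail to reject the null hypothesis.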


 Assess It: Check Your Knowledge

Knowledge Check