EME 210
Data Analytics for Energy Systems

Hypothesis Test for Difference of Means


If you want to compare the mean of one population to the mean of another population, and you have samples from each population, then you want to perform a difference of means test.

 Read It: Hypothesis Test for Difference of Means

In a difference of means test, your statistical hypotheses will take the following form, and your statistic, computed on both the original sample and the randomized samples, is $\bar{x}_A - \bar{x}_B$, where A and B refer to two separate samples, belonging to two separate populations:

Test: Difference of Means
Hypotheses: $H_0: \mu_A - \mu_B = 0$
            $H_a: \mu_A - \mu_B < 0$, or $\mu_A - \mu_B > 0$, or $\mu_A - \mu_B \neq 0$
Statistic: Difference of sample means, $\bar{x}_A - \bar{x}_B$
Randomization Procedure: Reallocate observations between samples A and B

Typically, the null value here is 0, since the null hypothesis is usually that the two population means are equal, and thus their difference is zero.
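For instance, here is a minimal sketch of computing the observed statistic with pandas. The DataFrame and its column names ('sample' and 'value') are made-up placeholders for illustration, not data from this lesson.

```python
import pandas as pd

# Made-up placeholder data: one row per observation, with a sample ID
# ('A' or 'B') and the observed value.
df = pd.DataFrame({
    'sample': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value':  [4.1, 3.8, 4.5, 5.0, 5.3, 4.9]
})

# Observed statistic: difference of sample means, x-bar_A minus x-bar_B
group_means = df.groupby('sample')['value'].mean()
obs_diff = group_means['A'] - group_means['B']
print(obs_diff)
```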

Randomization Procedure

The randomization procedure that we will use for a difference of means test is called "reallocation". Reallocation involves randomly swapping values between the two samples, where each value has the same probability of moving to the other sample or staying in its original sample. The reasoning behind this strategy is that, if the null hypothesis is true and there is no difference between the two populations, then it shouldn't matter from which population the sample data originate. If we have our sample data organized into a DataFrame similar to the table depicted in the figure below, then reallocation can be accomplished efficiently by randomly drawing values (in the "Value" column) without replacement, in other words, by simply re-ordering the values in that column.

Figure: Reallocation
Credit: © Penn State is licensed under CC BY-NC-SA 4.0

The pseudo-code for this procedure is:

  1. Obtain two samples of size n and m, respectively. 

  2. Organize the values of both samples into a DataFrame, with the sample ID in one column and their respective values in another.

  3. For i in 1, 2, ... N

    1. Randomly sample the value column without replacement.

    2. Calculate the sample means for each sample (e.g., by grouping by the sample ID). 

    3. Find the difference of the sample means. Make sure to do this in the same order as stated in the hypotheses!

  4. Combine all N randomized statistics into the randomization distribution.

Note that the two sample sizes, n and m, do not need to be equal. However, you may run into problems if one sample is very small and the other is large, since the mean of the small sample is then estimated from only a few values.
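For reference, here is one way the pseudo-code above could be written in Python. The function name, column names, and example data below are illustrative assumptions; the video that follows walks through the course's own implementation.

```python
import numpy as np
import pandas as pd

def reallocation_test(df, id_col, value_col, group_a, group_b, N=1000):
    """Randomization distribution for a difference-of-means test via reallocation.

    Returns the observed statistic (mean of group_a minus mean of group_b)
    and an array of N statistics computed on reallocated samples.
    """
    # Observed statistic, in the same order as stated in the hypotheses
    means = df.groupby(id_col)[value_col].mean()
    obs_diff = means[group_a] - means[group_b]

    sim = df.copy()                      # shuffle a copy, keep the original intact
    xbar_diff = np.zeros(N)
    for i in range(N):
        # Reallocation: re-order the value column by sampling without replacement
        sim[value_col] = np.random.choice(df[value_col], size=len(df), replace=False)
        sim_means = sim.groupby(id_col)[value_col].mean()
        xbar_diff[i] = sim_means[group_a] - sim_means[group_b]
    return obs_diff, xbar_diff

# Example with made-up data; the two samples need not be the same size
df = pd.DataFrame({
    'sample': ['A'] * 8 + ['B'] * 12,
    'value':  np.random.normal(0, 1, 20)
})
obs, dist = reallocation_test(df, 'sample', 'value', 'A', 'B')
p_value = np.mean(dist <= obs)           # left-tailed alternative: mu_A - mu_B < 0
print(obs, p_value)
```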

The following video demonstrates how to implement this procedure in Python, using the 2021 Texas Power Crisis as an example.

 Watch It: Video - Hypothesis Test for Difference of Means (12:21 minutes)

Click here for a transcript.

But in this video, we're going to go over how to do a difference of means test, or a comparison of means for two samples. So, let's get to it. So, we're going to continue on using this percent deficit that we calculated in the last video. And our hypothesis test for these two samples in the comparison of means, or difference in means test, is going to be: the null is going to be that the mean for gas, minus the mean for wind, is equal to zero. Or, in other words, that these two means are equal. The alternative here is that the mean for natural gas is less than that for wind. Or, in other words, the difference between those two is less than zero. The reason I set it up this way is because, just by looking at this visualization that we developed in the last video, natural gas is consistently below zero percent deficit. I have the suspicion that its mean will be less than that for wind. So, I set up the hypothesis based on that. But, let's go ahead and calculate our sample statistic here. And we can confirm whether the means of the data conform to that. So first, let's pull out our wind sample, and recall that our data frame is genpiv, that we're working with here. We've calculated percent deficit in gen cap, so we want to pull out the values from that variable here. So, genpiv.loc. And anywhere where genpiv has a fuel value that's equal to wind, we want to pull out the associated percent deficit values. Okay, and that'll be our wind sample. Same thing for natural gas, we'll just replace wind with gas.

So, we'll just pull out those two samples. And then our sample difference is going to be our sampgas.mean, so the mean of sampgas, minus, sorry, sampwind.mean. So, here's our sample statistic: the difference between the mean of the gas percent deficit and the mean of the wind percent deficit. Note that this order is important here. It's got to conform to what we have in the hypothesis test here. It's really critical. Let's print some values here: so let's print our mean gas value, let's also print our mean wind value, this is just for inspection. And then finally, the difference between those. We see that our average percent deficit for our gas sample is negative 28 percent. The average percent deficit for our wind sample is negative 15.5 percent. So indeed, from the samples, the gas is less than the wind. And so, this would justify having the hypothesis set up this way. And then the difference is indeed negative, or less than zero. So again, justifying this setup here. But still, the question remains: is this negative twelve and a half percent statistically significantly below zero? In other words, is this completely out of the realm of possibilities, even if the null hypothesis were true? So that's what we want to test with the randomization distribution and the resulting hypothesis test here.

So, moving on to a randomization distribution. We already have imported numpy previously, but let's do it again anyway, just as a reminder that numpy is needed for this. One thing that we're going to do here is we're going to copy the genpiv data frame over to something called sim. So, we're going to make a new data frame, sim, that's just going to be an identical copy of genpiv. The reason being is that we're going to be scrambling up some of the values there, and we don't want to mess up our original data frame. Again, just a reminder, capital N will be a thousand, as we'll consistently use through these hypothesis tests, although you could do more when you're doing this on your own. Our sample size is going to be the number of rows in genpiv, or in sim, here. So, this is going to be the sample size of wind, plus the sample size of gas. This shape function gives first the number of rows, then the number of columns. By putting the index here, we're extracting the number of rows from that. And then, as we did in the test for single means, we want to store our resulting statistics from the randomization distribution into some variable. So, we'll have xbardiff, this is just going to be an empty variable with capital N, 1000, blank spaces in it. And as we go through this for loop we will fill in those blanks.

So from our sim data set, data frame, what we're going to do is we're going to scramble up this percent deficit column. So, we're going to make a new percent deficit column, or overwrite the percent deficit column, and we're going to use random choice to do the scrambling. So, we saw random choice first in the hypothesis test for a single mean, where we were drawing with replacement from that wind shift variable here. We're going to draw from our original percent deficit column, and we want to use size equals n, so the same sample size that we have originally, and here we don't want to do this with replacement. So, we're going to do replace equals false. The reason being is that if we go back up to our data frame, really we just want to scramble up these values and reorder them in the same column, and we don't want to repeat any values; we really just want to reallocate these values across the two groups, wind and natural gas, and do that in a random sense. So, some will stay within wind, some will stay within natural gas, but we're going to switch up the order and potentially regroup some of them, because under our null condition, if there isn't any difference between these two groups, then it wouldn't really matter what group the observations fall into. So, that's the philosophy under the null hypothesis here.

So, once we do that scrambling, well then, we just need to execute our calculation of the statistic again. So, I'm going to copy and paste everything from step one into here. And really, we just need to replace all these samps with, say, sim, for example. So, I'll have our simulated wind sample coming from sim, replacing genpiv with sim, after we've scrambled up percent deficit here. And then the difference will be the difference in these means. So, the same steps that we had before in calculating the sample statistic from the data, we'll use after we've done the randomization and scrambled up this percent deficit column here. Okay, now. Oh, sorry, I should call this xbardiff. And we want to, of course, store this difference. Okay. So, we can go ahead and run our randomization distribution and then visualize it. So, I'm going to go back up and just copy and paste our visualization from the test for a single proportion and bring it back down here. It's the same structure, except we have different names of things. But still, we want to turn our randomization distribution into a data frame, and our x-intercept should be our sample statistic, sampdiff, here.

So, we can go ahead and visualize that. So, under our null condition, if the null hypothesis were true, if there really wasn't any difference between the two groups, we would see a difference of mean percent deficit that ranges from about negative 10 to about positive 10. So, pretty big swing, but nevertheless, our actual sample statistic, so, from our data, what we observed in reality, is quite a bit less than that range, falls outside of that range. So what do you think our p-value is? You're thinking zero, you've got it. But we can go ahead and calculate it here. Again, just replace our p-hat with this xbardiff, the series, not the data frame. Our sampdiff replaces wind p-hat, and the direction of this inequality should be less than, because that's our alternative hypothesis. And we have zero here. So, our concluding statement: again, since the p-value is very low, 0.0, and less than the significance level of 0.05, that's the default value, we can reject the null hypothesis that there is no difference between the means of the two groups. The two groups being percent deficit of gas versus percent deficit of wind. Again, the critical language here is rejecting the null hypothesis because our p-value is less than 0.05. But it's always good to contextualize that result in the actual problem. So what do we learn from rejecting this particular null hypothesis? Well, we know then that there actually is a difference between these two groups. So, there is evidence that natural gas underperformed more than wind. Okay. So, that's our test for a difference of means.

Credit: © Penn State is licensed under CC BY-NC-SA 4.0

The Google Colab Notebook used in the above video is available here, and the data are here. For the Colab file, remember to click "File" then "Save a copy in Drive". For the data, it is recommended to save it to your Google Drive.
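If you just want the overall structure of what is done in the video, here is a rough sketch assembled from the transcript. The genpiv data frame below is filled with made-up placeholder values, and the column names ('Fuel' and 'Percent Deficit') are guesses at the notebook's layout, so treat this as an outline rather than a copy of the Colab notebook.

```python
import numpy as np
import pandas as pd

# Placeholder stand-in for genpiv; in the video this comes from the
# 2021 Texas Power Crisis data linked above.
genpiv = pd.DataFrame({
    'Fuel': ['Wind'] * 5 + ['Natural Gas'] * 5,
    'Percent Deficit': [-18.0, -12.5, -16.0, -14.0, -17.0,
                        -30.0, -25.0, -27.5, -29.0, -28.5]
})

# Step 1: pull out the two samples and compute the observed statistic
sampwind = genpiv.loc[genpiv['Fuel'] == 'Wind', 'Percent Deficit']
sampgas = genpiv.loc[genpiv['Fuel'] == 'Natural Gas', 'Percent Deficit']
sampdiff = sampgas.mean() - sampwind.mean()    # order matches H_a: mu_gas - mu_wind < 0

# Step 2: randomization distribution by reallocating the percent deficit values
N = 1000
n = genpiv.shape[0]
xbardiff = np.zeros(N)
sim = genpiv.copy()                            # scramble a copy, not the original
for i in range(N):
    sim['Percent Deficit'] = np.random.choice(genpiv['Percent Deficit'], size=n, replace=False)
    simgas = sim.loc[sim['Fuel'] == 'Natural Gas', 'Percent Deficit']
    simwind = sim.loc[sim['Fuel'] == 'Wind', 'Percent Deficit']
    xbardiff[i] = simgas.mean() - simwind.mean()

# Step 3: left-tailed p-value
p_value = np.mean(xbardiff <= sampdiff)
print(sampdiff, p_value)
```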


Try It: Comparing Tesla Ranges Across Different Mileages

The following example somewhat continues from the example in Hypothesis Test for a Single Mean, but it doesn't depend on that example or its results. Plug In America collects self-reported survey data on EV performance, including the Model S. In this example, we'll look at the 70 kWh, dual motor ("70D") version of the Model S, with a rated range of 240 miles. Let's further break these cars down into two categories: "Low" mileage when the odometer reads less than or equal to 10,000 miles, and "High" mileage when the odometer reads greater than 10,000 miles. 10,000 miles is an arbitrary threshold here, so feel free to repeat this analysis with your own definition of low versus high mileage.

Our question here is "Do the actual ranges of the Model S 70D differ from low to high mileage?" Our hypothesis could be: "Yes, the range decreases at high mileage." Thus, our statistical hypotheses are:

  • $H_0: \mu_{Low} - \mu_{High} = 0$
  • $H_a: \mu_{Low} - \mu_{High} > 0$

Develop Python code below to test these hypotheses and calculate a p-value. The knowledge check will then ask about the conclusion of this test.
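If you'd like a starting point, here is one possible skeleton. The file name and column names ('Odometer' and 'Range') are placeholders; adjust them to whatever is actually in the survey data.

```python
import numpy as np
import pandas as pd

# Placeholder file and column names; replace with the actual survey data.
ev = pd.read_csv('model_s_70d.csv')

# Split into Low (<= 10,000 mi) and High (> 10,000 mi) mileage samples
samp_low = ev.loc[ev['Odometer'] <= 10000, 'Range']
samp_high = ev.loc[ev['Odometer'] > 10000, 'Range']
samp_diff = samp_low.mean() - samp_high.mean()   # order matches H_a: mu_Low - mu_High > 0

# Randomization distribution by reallocation
N = 1000
ranges = pd.concat([samp_low, samp_high]).to_numpy()
n_low = len(samp_low)
xbar_diff = np.zeros(N)
for i in range(N):
    shuffled = np.random.choice(ranges, size=len(ranges), replace=False)
    xbar_diff[i] = shuffled[:n_low].mean() - shuffled[n_low:].mean()

# Right-tailed p-value, since H_a is mu_Low - mu_High > 0
p_value = np.mean(xbar_diff >= samp_diff)
print(samp_diff, p_value)
```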


 Assess It: Check Your Knowledge

Knowledge Check