EME 210
Data Analytics for Energy Systems

Hypothesis Test for Mean of Differences (Paired Comparison)



Read It: Hypothesis Test for Mean of Differences (Paired Comparison)

In the situation where you want to compare the mean of one population to the mean of another population, and you have samples from each population whose values you can pair on some rational basis, it is better to do a paired comparison than to perform the difference of means test from the previous page. The key here is that you have a good reason to pair the values from one sample to the other sample. For example:

  • You want to see whether temperature affects the range (miles it can drive on a single charge) of Tesla Model S cars. So you take 5 cars and measure their range on a summer day, and then again on a winter day, with the same 5 cars. Your two samples are then "warm range" and "cold range". It is then advisable to pair the warm range and cold range for each individual car.
Figure: Difference of Means
Credit: © Penn State is licensed under CC BY-NC-SA 4.0

In a mean of differences test, your statistical hypotheses will take the following form, and your statistic, computed on both the original sample and the randomized samples, is x̄_(A-B), the mean of the paired differences between sample A and sample B:

Test: Mean of Differences (paired comparison)
Hypotheses: H₀: μ_(A-B) = 0
Hₐ: μ_(A-B) < 0, or μ_(A-B) > 0, or μ_(A-B) ≠ 0
Statistic: Mean of differences in samples, x̄_(A-B)
Randomization Procedure: Reallocate between paired observations (multiply each paired difference by 1 or -1, by random chance)

As before, the null value here is 0, since the null hypothesis is usually that the two population means are equal, and thus their difference is zero.

Randomization Procedure

As with the difference of means randomization procedure, we will also use reallocation for the mean of differences. However, instead of reallocating across the entire samples, we will restrict our reallocation to the pairs of values. In other words, we will enforce the pairing of observations across the two samples, and each pair of values has the same chance of randomly swapping between samples. Thus, the reasoning that we had for reallocation before still holds: if the null hypothesis is true and there is no difference between the two populations, then it shouldn't matter from which population the sample data originate.

The figure below illustrates how we can organize our DataFrame to maintain pairing across samples. Here, we have a variable generically called "pairing" to denote that it serves as the basis for pairing values across Sample A and Sample B.

Figure: Paired Comparison
Credit: © Penn State is licensed under CC BY-NC-SA 4.0

It turns out that, instead of physically reallocating paired values across Sample A and Sample B and then taking their differences again, it is computationally easier to simply find the difference of the pairs once, and then randomly multiply these differences by 1 or -1 (which effectively reverses the order of the terms in the difference). The pseudo-code for this procedure is:

  1. Obtain two samples, each of size n, with paired observations.

  2. Find the differences of the paired observations.

  3. For i in 1, 2, ..., N:

    1. Randomly multiply each difference by 1 or -1.

    2. Calculate the mean of the adjusted differences.

  4. Combine all N randomized statistics into the randomization distribution.
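The steps above can be sketched in Python with NumPy. Note that the sample values below are made up purely for illustration (loosely following the Tesla range example), not real data, and the one-sided p-value direction is just one of the possible alternative hypotheses.

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: two samples of size n with paired observations
# (hypothetical warm-day and cold-day ranges, in miles, for the same 5 cars)
sample_a = np.array([230.0, 245.0, 250.0, 238.0, 242.0])  # warm range
sample_b = np.array([185.0, 200.0, 210.0, 190.0, 198.0])  # cold range

# Step 2: differences of the paired observations
diff = sample_a - sample_b
n = len(diff)
observed = np.mean(diff)  # the sample statistic, x-bar of (A - B)

# Steps 3-4: build the randomization distribution
N = 1000
xbar_diff = np.zeros(N)
for i in range(N):
    # randomly flip the sign of each paired difference
    multiplier = rng.choice([1, -1], size=n)
    xbar_diff[i] = np.mean(multiplier * diff)

# One-sided p-value: proportion of randomized statistics at least as
# extreme as the observed mean of differences
p_value = np.mean(xbar_diff >= observed)
```

Flipping the sign of a difference is equivalent to swapping that pair's two values between samples, which is why this shortcut produces the same randomization distribution as physically reallocating within pairs.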

The following video demonstrates how to implement this procedure in Python, using the 2021 Texas Power Crisis as an example.

 Watch It: Video - Hypothesis Test for Mean of Differences (10:37 minutes)

Click here for a transcript.

Hi. In this video we're going to do a hypothesis test for two samples comparing means. But this time we're going to look at the mean of differences, that is, a paired comparison. This is in contrast to the difference of means that we did in the last video. So, let's get to it. So, the reason why we're doing a paired comparison here, or we're able to do a paired comparison, is because at each individual time here we have both a measurement of wind percent deficit and natural gas percent deficit. And so, we can compare what's going on at each individual time as we go along. We don't have to just compare the entirety of one curve versus the entirety of another curve. There may be valuable information in constraining it, constraining that comparison at the individual time steps that we have here. So, how do we do this?

First thing is we have to reorganize our data a little bit. So, let's just create a new data object called df2 from our original genpiv. We need to pivot this data frame. So, let's do this, and it'll start to make sense after we've done it. So, our index is going to be date time. Our columns will be created from the fuel column that we have in genpiv. And our values are going to come from percent deficit. So, let's see what this looks like, and it'll be clear what's being done here. So, what was the date time column in genpiv has now become an index. We had a fuel column that contained either natural gas or wind, our two groups, and those are now two separate columns here. And the values underneath are the percent deficit. Now, the reason why we organize it as such is because now we have these pairs at each time step spread over two columns here, so this allows us to very easily then calculate the difference of each of these pairs.
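The pivot step described here might look like the following sketch. The DataFrame contents and the column names ("Datetime", "Fuel", "Percent Deficit") are assumptions based on the narration, not the actual course dataset.

```python
import pandas as pd

# Hypothetical long-format data resembling the genpiv DataFrame in the video:
# one row per (time, fuel) combination
genpiv = pd.DataFrame({
    "Datetime": ["2021-02-15 00:00", "2021-02-15 00:00",
                 "2021-02-15 01:00", "2021-02-15 01:00"],
    "Fuel": ["Natural Gas", "Wind", "Natural Gas", "Wind"],
    "Percent Deficit": [-30.0, -5.0, -28.0, -7.0],
})

# Pivot so each timestamp becomes one row with a column per fuel type;
# this keeps the natural gas and wind values paired by time
df2 = genpiv.pivot(index="Datetime", columns="Fuel", values="Percent Deficit")
print(df2)
```

After the pivot, each row holds one pair, so the paired difference is just a column subtraction.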

So, I'll make a new variable, difference, that's simply our natural gas minus our wind. And why am I doing natural gas minus wind and not wind minus natural gas? Well, this is because of our hypothesis. So, our hypothesis is about the mean of the difference between gas and wind, and this is following the same order that we had in our first two-sample comparison of means, looking at the difference in means. So, our order there was gas minus wind, so we're preserving that same order. Again, this is based upon the supposition that we see from the data visualization that gas is underperforming relative to wind. But the goal of this is to test that supposition, test that hypothesis. So, there we have, there's our difference. We can see that it's been calculated. So, now we have a difference column that's simply the difference between those two things. And then finally, we can get our sample statistic, which we can do down here. Our meandiff, we'll call it, is from this difference column, calculating the mean.

So, we can see what that value is. Lo and behold, minus twelve and a half percent. Note that this is identical, to the greatest degree of precision that we have here, to what we had above with the difference of means, minus 12.5628, blah, blah, blah, blah. They're mathematically the same thing. So we have the exact same sample statistic here, just calculated slightly differently. However, the randomization test is going to be done considerably differently. So again, just as a reminder, we do need numpy here; include that to be complete. We're going to also copy df2 to something called sim here, so that we don't mess up our original data frame. It's just good practice. Capital N again will be a thousand, and little n again will be the number of rows that we have in our data frame of interest. We do need an empty vector here to store our statistics in after we do the randomization. And again, we'll fill this up as we march through our for loop.

So, what goes into our for loop? Well, first we need to do our randomizing here. How we'll do it is we'll calculate this multiplier. The multiplier here is just going to be, and this should be in square brackets, either one or minus one. That is, we either flip the sign of the difference, or we don't, because flipping the sign of the difference is equivalent to swapping the values across these two groups, or AKA across these two columns here. So, this is quite simple in terms of code.

We just use random choice again, and we can simply randomly select from the vector 1 or -1. So, we're just going to randomly choose one or minus one, we're going to do it n times, and we will do that with replacement, of course. So, every time we go through this whole thing, we're going to end up with a vector here, or a column, the same length as percent deficit, I'm sorry, same length as difference, filled with some random assortment of ones and minus ones. Okay. And then, our new simulated randomized difference column is going to be simply this sim multiplier times our same difference. So either we flip the sign, or we don't. And then finally we store our sample statistic in this xbardiff vector, this empty vector here, and this will just be difference, and we'll find the mean of it. Okay.
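The loop narrated in this part of the video might look something like the sketch below. The df2 values are made-up stand-ins for the Texas data, and while the variable names (sim, multiplier, xbardiff) follow the narration, the exact code in the notebook may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for the pivoted DataFrame from the video (values are illustrative)
df2 = pd.DataFrame({"Natural Gas": [-30.0, -28.0, -35.0, -25.0],
                    "Wind": [-5.0, -7.0, -10.0, -4.0]})

sim = df2.copy()  # work on a copy so we don't mess up the original data frame
sim["Difference"] = sim["Natural Gas"] - sim["Wind"]

N = 1000          # number of randomizations
n = len(sim)      # number of paired rows
xbardiff = np.zeros(N)  # empty vector to store the randomized statistics

for i in range(N):
    # randomly choose 1 or -1, n times, with replacement
    multiplier = rng.choice([1, -1], size=n)
    # flip (or not) the sign of each paired difference, then take the mean
    sim["Randomized"] = multiplier * sim["Difference"]
    xbardiff[i] = sim["Randomized"].mean()
```

Each pass through the loop produces one plausible mean of differences under the null hypothesis, and the N stored values form the randomization distribution.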

So, there we go. And then we can of course visualize the randomization distribution, xbardiff. We're still using that variable name, we're recycling that variable name, but we're not looking at the difference in means, we're looking at the mean of differences. So, we'll change that name up. And our sample statistic is actually meandiff, so we'll call that something else here. Let's see what this looks like.

There we go. So immediately you might say, okay, well the meandiff, which is equal to this minus 12.5 percent, is still outside the realm of our randomization distribution, right? Again, this randomization distribution is the suite of possible events that we could see if the null hypothesis were true. And so, again, you can immediately tell from this that our p-value is still going to be zero. But there is one difference I want to tease out of this. So, in comparison to what we did with the difference of means, take a look at this randomization distribution. Most of it's concentrated between about, you know, minus seven and a half to positive seven and a half. Whereas we had a bit of a wider sweep of possible events when we did the difference of means.

So, in other words, this randomization distribution from the mean of differences is a little bit more concentrated around the null hypothesis value, in this case a zero percent deficit. Now, the reason for that is because we're constraining the randomization to take place within each row. Before, with the difference of means, essentially these values could swap rows; they could end up anywhere. But here they can only stay within the rows. So, because we're maintaining that pairing, because we're saying that there's added information somehow in keeping these values at the same times, that there's something valuable about that time information, we're getting a narrower range of possibilities in this randomization distribution, and thus a more definitive answer in our hypothesis test, even though the p-value is still zero. I encourage you to go take a look at a subsequent video, effective sample size, in which we'll compare these. If we have a more limited sample size, you can see some interesting differences there. Okay. Just to round out the discussion here, let's calculate our p-value and arrive at a conclusion. We'll just borrow our p-value calculation code up here. It's not xbardiff anymore. Oh, it is still xbardiff, but here we have meandiff, and again, it's still less than, and again, we get a p-value of zero. And of course, our concluding statement would be: the p-value is low. It's zero. It's less than 0.05. We will reject the null hypothesis. Same conclusion that we had under the difference of means. Okay. Thank you.
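The p-value step described at the end of the transcript can be sketched as follows. The randomization distribution here is a synthetic stand-in (a normal distribution of roughly the spread described in the video), not the actual Texas data, and the observed statistic is hard-coded at the -12.5 percent mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the randomization distribution built in the loop;
# with the real data, xbardiff would come from the randomization itself
xbardiff = rng.normal(loc=0.0, scale=3.5, size=1000)

# Observed sample statistic (mean of differences), as quoted in the video
meandiff = -12.5

# Left-tailed test (H_a: mu_(A-B) < 0): proportion of randomized
# statistics less than or equal to the observed statistic
p_value = np.mean(xbardiff <= meandiff)
print(p_value)
```

Because the observed statistic lies far outside the stand-in distribution, the computed proportion is essentially zero, mirroring the conclusion in the video.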

Credit: © Penn State is licensed under CC BY-NC-SA 4.0

The Google Colab Notebook used in the above video is available here, and the data are here. For the Colab file, remember to click "File" then "Save a copy in Drive". For the data, it is recommended to save them to your Google Drive.


 Assess It: Check Your Knowledge
