EME 210
Data Analytics for Energy Systems

Difference of Means One-Line Test

Read It: Difference of Means One-Line Test

To conduct the one-line test for a two-sample difference of means hypothesis test, you will use a t-test for two independent samples (stats.ttest_ind in SciPy). The video below demonstrates how to implement it.
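As a rough illustration of what that call can look like, here is a minimal sketch. The arrays sample_gas and sample_wind are synthetic stand-ins invented for this example (they are not the course data), and the choices equal_var=False and alternative="less" assume a left-tailed test with unequal variances, as in the gas-minus-wind hypotheses discussed in the video.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-ins for the two independent samples
# (assumed values for illustration only, not the course data)
rng = np.random.default_rng(seed=0)
sample_gas = rng.normal(loc=-0.60, scale=0.10, size=200)
sample_wind = rng.normal(loc=-0.40, scale=0.20, size=150)

# One-line test: t-test for two independent samples.
# The sample order matches the alternative hypothesis (gas - wind < 0).
# equal_var=False requests the unequal-variance (Welch) version of the test.
results = stats.ttest_ind(sample_gas, sample_wind,
                          equal_var=False, alternative="less")

print(results.statistic, results.pvalue)
```

Swapping the order of the two samples would flip the sign of the test statistic, so with alternative="less" the order has to match the way the alternative hypothesis was written.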

Watch It: Video - Difference of Means One-Line Test (2:03 minutes)

Transcript:

In this video, we're going to go over the one-line test that can be used for a two-sample mean hypothesis, a comparison of the means. When we wrote these hypotheses, our null hypothesis was that the mean of gas minus the mean of wind was equal to zero; that is, there was no difference between the two groups. And our alternative is that gas minus wind is less than zero, suggesting that the mean for gas is lower than the mean for wind.

And here we calculated the percent deficit as generation minus capacity, divided by capacity. And so, we did all of this prep work, and then we got into the actual randomization procedure, where we calculated our sample difference, did our reallocation procedure where we sampled from percent deficit with replace equal to False, and recalculated these differences, and ultimately got a p-value of zero, which suggested that we reject the null hypothesis.

In terms of this one-liner test, I'm going to call the results, results. And again, using that scipy.stats library, we can say stats.ttest_ind. In this case, we say ind, for independent samples. And then we give it our first sample, which is sample gas. A key point here is to make sure that you're matching the order that you wrote your alternative hypothesis in. We did gas minus wind, so our first sample is gas. And I want to also point out that we are using the actual data here. So our sample is taken from the samp.loc selection, which we calculated in step one. We're not using any of the reallocation samples, which we calculated in step three. So our first sample is sample gas, our second sample, sample wind.

Here we need to say equal_var equals False. This just tells it that the variance is not equal between the two data frames or variables, so it'll conduct a slightly different version of the t-test than if they had equal variance. And then we specify alternative equals less, since we are using a left-tailed test. And then after that, we can say results.pvalue to print the p-value.

And so, here we can see that we are very, very close to zero, following that same pattern that we have seen in the earlier two tests. It still has the same conclusion as our randomization procedure, but with more specificity. So again, we reject the null hypothesis.

Credit: © Penn State is licensed under CC BY-NC-SA 4.0
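To make the comparison in the video concrete, the following is a minimal sketch that runs a reallocation (randomization) procedure next to the one-line t-test. It is only an approximation of the workflow described in the transcript: the gas and wind arrays are synthetic percent-deficit values made up for this example, the number of simulations (n_sims) is an arbitrary choice, and shuffling the pooled values with rng.permutation is assumed here as a stand-in for sampling without replacement.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(seed=1)

# Hypothetical percent-deficit samples, (generation - capacity) / capacity
# (synthetic values for illustration only, not the course data)
gas = rng.normal(-0.60, 0.10, size=200)
wind = rng.normal(-0.40, 0.20, size=150)

# Observed sample difference, in the same order as the hypotheses (gas - wind)
observed = gas.mean() - wind.mean()

# Reallocation: shuffle the pooled values without replacement,
# split them back into two groups, and recompute the difference each time
pooled = np.concatenate([gas, wind])
n_gas = len(gas)
n_sims = 2000  # assumed number of reallocations
diffs = np.empty(n_sims)
for i in range(n_sims):
    shuffled = rng.permutation(pooled)
    diffs[i] = shuffled[:n_gas].mean() - shuffled[n_gas:].mean()

# Left-tailed p-value: share of reallocated differences at or below the observed one
p_random = np.mean(diffs <= observed)

# One-line test on the actual samples (not the reallocated ones)
p_ttest = stats.ttest_ind(gas, wind, equal_var=False, alternative="less").pvalue

print(p_random, p_ttest)
```

The randomization p-value can only take values in steps of 1/n_sims, so a very small p-value shows up as exactly zero, while the t-test reports a small nonzero number. This is one way to read the remark that the one-line test reaches the same conclusion with more specificity.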

Try It: Google Colab

  1. Click the Google Colab file used in the video here.
  2. Go to the Colab file and click "File", then "Save a copy in Drive". This will create a new Colab file that you can edit in your own Google Drive account.
  3. Once you have it saved in your Drive, try to edit the following code to run the difference of means one-line test. Remember to import the libraries and run the code to create some data.

Note: You must be logged into your PSU Google Workspace in order to access the file.

# Libraries
import numpy as np
import scipy.stats as stats
 
# create some data
x = np.random.randint(0,100,1000)
y = np.random.randint(0,200,1000)
 
# implement one-line test
results = ...
results.pvalue

Once you have implemented this code on your own, come back to this page to test your knowledge.


Assess It: Check Your Knowledge