In this last one-line hypothesis test, we're going to get into a difference of proportions. Here our null hypothesis is that there's no difference between the proportion of gas that was below capacity and the proportion of wind that was below capacity. And our alternative is that the difference, gas minus wind, is greater than zero. Before we can get into the one-line test, we do need to do some prep work. So I'm going to create a new column in final_gen called below capacity. This is essentially going to be based on the deficit, but we're going to use a special function called an apply function to quickly apply a conditional statement to the deficit. So we say .apply, and then we give it this keyword, lambda, which effectively tells Python that we're about to create an inline function. And we say x, colon, True if x is less than zero, else False. Here x is just a placeholder, so it just needs to be the same name everywhere it appears in the lambda. And it's saying set below capacity to True if x, x being the deficit, is less than zero. Otherwise, if x is greater than or equal to zero, set below capacity to False. And so, if we print out the first five rows of our data set, we can see that we have our date time column, we have our fuel, generation, capacity, deficit, and percent deficit columns that we've seen before, and now we have below capacity, which will be True any time the deficit is negative, and False any time the deficit is zero or positive.
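As a rough sketch of the step just described, the apply call might look like the following. The small data frame is a made-up stand-in for final_gen, and the column names `deficit` and `below_capacity` are assumptions based on how they are spoken in the video, not confirmed spellings.

```python
import pandas as pd

# Hypothetical stand-in for final_gen; the real data set has more columns and rows.
final_gen = pd.DataFrame({
    "fuel": ["wind", "wind", "gas", "gas"],
    "deficit": [-12.5, 3.0, 0.0, -1.2],
})

# Inline (lambda) function applied to every deficit value:
# True when the deficit is negative, False when it is zero or positive.
final_gen["below_capacity"] = final_gen["deficit"].apply(
    lambda x: True if x < 0 else False
)

# Show the first five rows (here there are only four).
print(final_gen.head())
```

The same column could also be built without apply by writing `final_gen["deficit"] < 0`, which gives the same True/False values.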
So then once we have our data as it needs to be, we need to set up our samples. Similar to our difference of means, where we needed to calculate the sample means, now we need to calculate the sample proportions. In this case we can see sample wind prop is just final_gen.loc, where final_gen.fuel equals wind, and then below capacity. We're only interested in the below capacity column, so we're just going to leave it at that. So we're essentially just going to extract the below capacity column for wind. I'm going to copy this, paste it, and change both instances of wind to gas, so that we pull out the natural gas data. And then we're going to calculate the proportion. Our p wind samp is the length of sample wind prop, where we only keep the rows where sample wind prop equals True, divided by the total length of sample wind prop. And I'm going to copy that, and change wind to gas in all four places. So these will give us the proportion of the time that wind and natural gas were below capacity. And then I'm going to get diff prop which, following our alternative hypothesis up here, is gas minus wind. So it's p gas samp minus p wind samp.
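The sample proportions described here can be sketched roughly as follows. The data are invented for illustration, and the variable names, the `below_capacity` column name, and the fuel labels `"wind"` and `"gas"` are assumptions; the real data set may spell them differently.

```python
import pandas as pd

# Hypothetical stand-in for final_gen with the below_capacity column already added.
final_gen = pd.DataFrame({
    "fuel": ["wind"] * 4 + ["gas"] * 4,
    "below_capacity": [True, False, False, False, True, True, True, False],
})

# Pull out only the below_capacity column for each fuel.
samp_wind_prop = final_gen.loc[final_gen["fuel"] == "wind", "below_capacity"]
samp_gas_prop = final_gen.loc[final_gen["fuel"] == "gas", "below_capacity"]

# Proportion = number of True values divided by the total number of values.
p_wind_samp = len(samp_wind_prop[samp_wind_prop == True]) / len(samp_wind_prop)
p_gas_samp = len(samp_gas_prop[samp_gas_prop == True]) / len(samp_gas_prop)

# Follow the order of the alternative hypothesis: gas minus wind.
diff_prop = p_gas_samp - p_wind_samp
print(p_wind_samp, p_gas_samp, diff_prop)
```

Because the column holds True/False values, `samp_wind_prop.mean()` would give the same proportion in one step.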
So we can run all of that and see that the difference in proportions was 0.35. So then we can move on to the one-line test. We won't actually do a randomization procedure for the proportion. But in theory, you could follow the procedure for a difference of means, and just substitute in your difference of proportions. In order to do this one-line test, we need a new library. So we're going to import statsmodels.stats.proportion, and we're going to call it prop. This is a module that's used for tests on proportions, including a difference of proportions. And in order to use the test that we're going to do, we need to define our success number, which is very similar to what we did for a single proportion, where we determined our success rate. In this case we need to determine the number of successes for our first data set, samp gas, and the number of successes for our second data set, samp wind. And we're going to do that within square brackets. So again, the key here is to follow the order that you used in your hypotheses. So we want the length of sample gas prop, where sample gas prop equals True. This is the exact same thing that we did up here to calculate p wind or p gas. And then I'm going to copy everything from the len command on, add a comma, and paste. And I'm going to edit the copy so that those two instances of gas are changed to wind. So these are the number of successes for both gas and wind. And then we also need to define the sample size, which is just the length of sample gas prop, and then the length of sample wind prop. This is, in effect, the denominator of our proportion up here. And so, once we have these values, we can implement the command, which comes from the prop library that we defined up here. So it's prop.proportions_ztest. This is the command.
We need to give it an argument called count, which is our success number, and we need to give it an argument called nobs, which is our sample size. And then we need to give it the alternative, and in this case, we follow a slightly different form of alternative. We're using a right-tailed test, so it's a greater than sign. But for this particular command we need to say larger. So instead of saying greater or less, we say larger. And in particular, we want the p-value, which is the second value the command returns, so the value at index 1 of the results. And so here we can see this p-value is very close to zero, much less than our significance level. So we can reject the null hypothesis in favor of the alternative that the difference in proportions is greater than zero.
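Putting the pieces of the one-line test together, a minimal sketch might look like this. The counts below are invented for illustration and are not the values from the video; the names `success_num` and `samp_size` are also assumptions. The function `proportions_ztest` and its `count`, `nobs`, and `alternative` arguments come from statsmodels.

```python
import statsmodels.stats.proportion as prop

# Hypothetical numbers of "below capacity" observations, in hypothesis order:
# gas first, then wind.
success_num = [60, 30]

# Hypothetical sample sizes, in the same order (gas, then wind).
samp_size = [100, 100]

# Right-tailed test: the alternative is that p_gas - p_wind is larger than zero.
results = prop.proportions_ztest(
    count=success_num, nobs=samp_size, alternative="larger"
)

# The function returns the z statistic first and the p-value second.
z_stat = results[0]
p_value = results[1]
print(z_stat, p_value)
```

With real data, the two lists would be filled from the lengths computed earlier, keeping gas before wind so that the order matches the alternative hypothesis.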