Two Sample Comparison: Wind vs. Natural Gas

Read It: Two Sample Comparison

The next few hypothesis tests we'll introduce are for comparing one sample to another, and in particular for inferring a relationship between their respective population parameters.

Continuing with the motivating example from the 2021 Texas Power Crisis, we will now compare samples of wind versus natural gas power generation. The following video shows the additional data processing steps in Python that enable this two-sample comparison. In particular, since these two sources have very different levels of power generation, we need to calculate a different variable in order to compare the samples fairly. This variable is the "percent deficit", which captures how far each source's generation falls above or below its forecasted peak capacity, expressed as a percentage of that capacity.

Watch It: Video - Two Sample Comparison: Wind vs. Natural Gas (6:00 minutes)


Hi. For the remainder of the lessons and videos on hypothesis testing, we're going to look at two-sample comparisons: comparing the wind sample that we have here, the red bumpy curve, versus the natural gas sample, the blue bumpy curve. Now, before we get into any actual hypothesis tests, I want to talk a little bit about an additional variable that we're going to calculate from this data. So, let's get to it. The reason for calculating an additional variable here is to have something by which we can compare wind versus natural gas on an even basis. We can see clearly here that natural gas is producing a lot more energy than wind, and it also has a much larger forecasted peak capacity than wind. These natural differences arise because natural gas is a much larger system in the state of Texas than wind is. In other words, Texas relies way more on natural gas energy than on wind energy. But because of that, we're kind of comparing apples to oranges here. We want to level the playing field and come up with a metric by which we can fairly compare wind versus natural gas and directly answer the question: which one has underperformed more? If we just looked at generation alone, we'd say wind is underperforming simply because it's not making as much energy as natural gas is. But that's not fair, because there aren't as many wind turbines, so there isn't as much wind energy to begin with in the state of Texas as there is natural gas. In other words, Texas is way more dependent on natural gas. So, in order to facilitate a fair comparison between wind and natural gas, we're going to look at the percent deficit.

This percent deficit is the generation, so that bumpy curve, minus the forecasted peak capacity, divided by the forecasted peak capacity. In other words, it looks at how far each energy source has gone above or below that dashed line, proportional to its capacity level. So, how are we going to calculate this? Let's make a couple of new variables here. Recall that the data object we're working with is genpav. That's our data frame. We're going to use the .loc indexer to define some new values.

So, anywhere the fuel is equal to wind, we're going to make a new variable called capacity and set it equal to 6.1. At the same time, we'll do the same thing when the fuel is natural gas, except that under this new capacity variable we're going to set it to 48.4. These are the forecasted peak capacities in gigawatts that we talked about in the first video. With those in place, we can calculate a new variable called deficit that is simply our generation in gigawatts minus this capacity. That's our deficit, the numerator of the percent deficit. Then we want to turn this into a percent deficit.

We can take our Deficit, with a capital D, divide it by capacity, and multiply by 100 percent. So, let's see what this looks like. There we go. Here's our percent deficit. In many instances it's negative, which means generation is below the forecasted peak capacity. I think some of the rows that are omitted here, for wind, would be positive. Let's visualize it. To start off with our visualization, let's go back up to our previous visualization and borrow from that. We'll still do a line plot, and we'll still have the same colors, except we don't want to plot generation; we want to plot our percent deficit. We'll color by fuel type, still using blue and red. Now, since we're looking at percent deficit, the capacity lines aren't as relevant anymore, so let's delete those. Instead, we want to compare to zero percent, right? Is there a zero percent deficit? Let's draw this as a dashed black line for both. I'm making it black because, regardless of whether it's wind or natural gas, we want to compare it to zero percent deficit.

There we go. So, red is wind, blue is natural gas, and the black dashed line is at zero percent deficit. We see that natural gas is always in the negative, and wind is below zero sometimes, perhaps most of the time: 64 percent of the time, as we learned from the test of a single proportion. Okay, so that's what we're going to use. We're going to use the percent deficit in the remaining two-sample tests: comparison of means, both paired and unpaired, and comparison of proportions.
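The calculation walked through in the video can be sketched in pandas as follows. This is a minimal illustration, not the course notebook: it keeps the data frame name genpav and the Deficit column from the video, but the column names fuel, generation, and percent_deficit, the fuel labels "Wind" and "Natural Gas", and the four generation values are hypothetical stand-ins. The last step computes the sample proportion of wind observations below zero percent deficit, the kind of statistic behind the 64 percent figure mentioned in the video (the toy data will not reproduce that figure).

```python
import pandas as pd

# Toy stand-in for the course's genpav data frame (HYPOTHETICAL values).
# The column names "fuel" and "generation" and the fuel labels are assumptions.
genpav = pd.DataFrame({
    "fuel": ["Wind", "Wind", "Natural Gas", "Natural Gas"],
    "generation": [3.05, 7.0, 24.2, 40.0],  # gigawatts
})

# Forecasted peak capacities in gigawatts, as given in the video.
# Assigning through .loc with a boolean mask creates the new "capacity" column.
genpav.loc[genpav["fuel"] == "Wind", "capacity"] = 6.1
genpav.loc[genpav["fuel"] == "Natural Gas", "capacity"] = 48.4

# Deficit: generation minus forecasted peak capacity (negative = below capacity).
genpav["Deficit"] = genpav["generation"] - genpav["capacity"]

# Percent deficit: the deficit as a percentage of the forecasted peak capacity.
genpav["percent_deficit"] = genpav["Deficit"] / genpav["capacity"] * 100

# Sample proportion of wind observations with a negative percent deficit.
wind_below_zero = (genpav.loc[genpav["fuel"] == "Wind", "percent_deficit"] < 0).mean()

print(genpav)
print(f"Wind below zero: {wind_below_zero:.0%}")
```

Because each row is scaled by its own source's capacity, the wind and natural gas percent deficits land on a common scale, so they can be compared directly in the two-sample tests that follow.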

The Google Colab notebook used in the above video is available here, and the data are here. For the Colab file, remember to click "File" then "Save a copy in Drive". For the data, it is recommended that you save them to your Google Drive.
