EME 210
Data Analytics for Energy Systems

Motivating Example: 2021 Texas Power Crisis

PrintPrint

Motivating Example: 2021 Texas Power Crisis

Before we progress into some of the common types of hypothesis tests, let's introduce the data and problem that will be used to exemplify these tests. As an example of hypothesis testing, we'll examine wind and natural gas power generation during the Texas Power Crisis of Feb. 14-15, 2021. An excellent article about the event is in Gold, R. (2022) The Texas Electric Grid Failure Was a Warm-up. Texas Monthly, Feb. 2022. Data are from ERCOT and available in L11_IntGenbyFuel2021.csv, accessed from ERCOT on Sep. 19, 2021. This contains power generation from all fuel types for all of Feb. 2021, reported in 15-minute intervals.

In addition to these power generation data, ERCOT forecasted the peak capacity from wind as 6.1 GW and from natural gas as 48.4 GW. These are their predictions of how much each energy source would need to generate in order to meet demand during the unusually cold winter storms associated with this event.

Some important terminology that is used in this example includes:

  • Capacity - the maximum amount of power that a generator can produce at a particular time

  • Forecasted peak capacity - the anticipated amount of power needed to meet peak demand

  • Generation - the power produced from a source (generator)

Also, bear in mind that for units of power we'll use:

1,000,000,000 W (watts) = 1,000,000 kW (kilowatts) = 1,000 MW (megawatts) = 1 GW (gigawatt)

and for energy:

1 Wh (watt-hour) is 1 watt spent for 1 hour.

 Watch It: Video -  2021 Texas Power Crisis (29:05 minutes)

Click here for a transcript.

Hi, in this video I'm going to go over the data set that's going to be used for the next several hypothesis testing videos and used throughout all the different demonstrations of different forms of hypothesis testing that's coming up. So, this data set comes from ERCOT and contains the power generation for different sources for two days from the Texas power crisis in February of 2021, those two days are February 14th through 15th. It's sort of at the height of the crisis in on the 14th ERCOT started shedding load basically in response to the winter storms. And so, we can see how power generation fluctuates during that time. In this data set, the data are containing 15-minute intervals throughout that time period so a lot of data here and aside from this data set we know from other sources that the forecasted peak capacity during this time for wind was 6.1 gigawatts, and that forecast to peak capacity for natural gas was 48.4 gigawatts. What this forecasted peak capacity means is, this is basically ERCOT's prediction of how much power or energy would be needed from these two different sources when did natural gas during this time frame to meet demand during this unseasonably cold winter for Texas. So, through the remainder of this video, I want to go through the processing of this data set. So, let's get to it.

So, we'll start by importing the data. I've got to contain the Google Drive. So, I've already mounted my Google Drive, and and before I import it, it's always good to click on the CSV and open it up either Google Sheets or Excel of course, as we've done before. And so, here we see the the raw data format of this data set. We've got our dates in the first column, the fuel type, biomass, coal, gas. So, a lot of different sources here, not just wind and natural gas. And here we see two different forms of natural gas. So, there's natural gas and then combined cycle natural gas, settlement type, total energy, or total power I should say, and then times that throughout the day. So, starting basically at midnight and going out throughout the day in which power was generated, and then in the cells here we've got the different amounts of power generated. And we know from our cut that these values are in megawatt hours. So, let's get down to the processing steps. First off, we'll import the data set using using the usual command here. Also, good to look at just a preview of the data we can do gen.head to see what we have here. Look at the first five rows and last five rows, and we see what we saw in the csv, basically. So, it looks like everything was imported correctly. And also, it's not showing all the columns, here. So we're sticking out, you know, some of the middle columns. But we can see that it goes up from, you know, 15 minutes after midnight, all the way up to midnight of each day. Let's take a look at the column headers here. So, gen. columns.

Always take a look at the column names again date, fuel, settlement type, and then all the times throughout the day. Okay, all well and good. So, our first order of business is to filter out some of these rows. So we want to focus on just those two days every 14th and 15th. And additionally, we only want to focus on two power sources, wind and natural gas. So, we can filter based upon those two criteria. So, let's make a new data frame. We'll call it gen.fil for filter. Well, we can call it anything here. And our data frame is currently called gen standing for generation, power generation that is. And then we want to filter this. So, we'll have one filter or set of filters for date, and then we'll have another set of filters for power source. And so, in date we'll have we have two dates to filter. And we'll separate those by and/or statement. So, it could be February 14th or February 15th. And so here we just need to say gen.date equals, and then we need to make sure we get the formatting exactly right here. And here it's useful to have this preview in here it's given by head here. We know that it takes on this format, so month, year. And we want to edit this to be either the 14th of 2021, or the 15th of 2020. Right. I'll do the power source on the next line here. Let me organize this a little bit better. And so, our power sources as we saw in the file could be wind. So, our fuel could either equal wind with a capital W, or it could be just gas. Or we have that third type of gas dash CC for combined cycle. Okay so let's make sure that that worked. Make sure we go back to preserve indenting here, keep it organized. Oh, we have an error here, series object is not... Oops, I forgot an or statement here. There we go.

Okay then, we have it. So now our fuel is only limited to gas, gas-CC, and wind. Okay, that's all well and good. Next step is that we need to rename one of our columns here. Now this is just a particular set for this data set, but it turns out that what ERCOT calls night, zero, zero, zero this is going to be, they allocate this to say, for example, February 14th when really in reality you know midnight belongs to February 15th, right? So, let's just change this to to
11:59 p.m. We're 23:59 when you're using military time. So, that it's attributed to the right day. The reason for this is that the built-in date time tools in Python are going to think that zero zero zero should belong to the 15th, when really that data value that we have, there is attributable to the 14th, for example. So, we'll just make this quick edit, gen.fill. And we just need to rename. And we'll rename columns. And so, our existing name zero colon zero. Zero should instead become 23 59. So, even though we're you know, taking it back one minute, that's really not substantial enough to ruin our data set.

So there we have it. Oh this should be encased in squiggly brackets, of course. There we go. So, moving on here, we've you're done with that, now we can remove some superfluous columns here. So, fuel, sorry not fuel but settlement type in total we don't need. So, let's go ahead and get rid of those. So, we'll continue just writing over gen.fill. Instead, we'll say we'll either a drop command to drop from gen fill some columns that are equal to and in squiggly brackets settlement type and total, so we don't need those for the rest of the analysis, and we'll just check on this always good to check and make sure that that worked and there we go we don't have settlement type we don't have total now this next step is a little bit tricky but um essentially we need to think about so these first steps for you know fairly straightforward we can get rid of some extra stuff here and um you know filter out the things we don't need basically But ultimately, we need to think about how we want this data organized. So. And to do that, a good place to start is to think about what are the real cases here. Well, the case, or each individual case, is the power generation. And this is power generation from each source, on each day, at each time. So, for following our standard format of the data frame having one row for each case. We need to have these generation values all in one column. And we can have a separate column that indicates the field type in a separate column that indicates the date, and a separate column that indicates the time. Turns out we're going to combine date and time so that we don't have, so we have this in one column here. But basically we need to distinguish each value here by those other columns, the fuel, the date, the time. Okay. So in order to get it into that right format, to reshape what we currently have here into something that has those columns, we're going to use this melt function. All right, so how we do this is, again, we'll just write over genfil. We'll replace this and we'll call our dataframes genfil.melt. So, we're going to use our melt function here.

We start with our id_vars, so the vars that are identifying the cases. And so, in square brackets, this will be the date, and the fuel, of var_name will be time, so that'll be unique for each one, and then our values. So, our value name is going to be, sorry the time is going to be a new variable that's going to create, that's going to take all of these current column headers a different time and put them into this new variable called time. And then value name we can call this generation. And we know from beforehand that these values are in megawatt hours. So, we'll keep the units there. So, it's going to take all the values that are under these time columns and put them into this generation card. So, in other words just to summarize, so these id_vars are the current ID variables: date, and fuel. So those are static. We're going to keep those. All the other columns are going to go into a new variable called time. And the values underneath those are going to go into a new variable called generation, defined by value, meaning okay. So, let's take a look and see how this has worked.

And there we have it. We've got our new structure. So instead of a wide fat table, we've got a skinny, narrow table. We've got date, we've got the fuel type, we've got time, and we have generation. So, each of our cases there has a single row. And that's exactly what we want. Now, as I said before, we can combine date and time so that we have something called DateTime that's going to be unique for every case here.So, we need to first import a package datetime. Then we will create a new variable. We'll just call it DateTime. We could call it anything we wanted but DateTime is pretty straightforward. And then we'll say from this pandas package, we need to, we're going to use this function to_datetime, so transform something to a datetime object. And the thing we're going to transform is genfil dot date column, date here, plus, and here I'm going to put in a blank space and another plus and then our time. So, here we're we're not only pulling in the values under date and under time, those current columns here, but we're also specifying the format. We're going to take those and we're going to separate them by a single space. So, let's see what this looks like here. Go ahead and run it. And there we go. We've got a new column, DateTime, that's got the date, space, time, which is handy.

Okay, so if we wanted to, we could remove date and time from this but that's not necessary. One other thing we need to do is we need to convert our generation, which is actually in units of energy here, so megawatt hours, and convert this to power. And ideally, we want this in gigawatts. The reason being is because we have our forecasted peak capacity in gigawatts here, and what we're going to want to do is compare the actual generation to these peak capacity values. So we need to have them in the same units. So let's go ahead and convert megawatt hours to gigawatts. And this is going to be here, we're basically using a state of the known formulas or unit conversions that we're going to enable here. Okay, so we'll start by taking these existing generation values, and we can see the first thing we need to do is we need to get rid of these columns, the commas that are in here. So if we took these directly, Python's not going to recognize them as numbers. We need to remove the commas in order for it to recognize it as a numeric variable. In order to do that, we will replace these strings. We're going to replace the commas with nothing. So the way this replace function works is we take our variable, we say dot str. This is referencing a string package and we're going to use this replace function. We're going to take what we want to replace, the commas, and we're going to give it a second argument of what we want to replace it with, in this case nothing. So that'll just simply remove the commas. We can go ahead and make sure that worked. And there you have it.

Now we can convert this to be numeric. I didn't do the, didn't look at D types before, data types before, but if you did that you'd see that this is actually a character column, character variable. So we'll say to_numeric, so we can convert that to a number now. Because ultimately we're going to want to do some quantitative operations on this so we need it as a number. All right, so now that it's a number, now we can actually do the unit conversion. So we'll start here, instead of generation and megawatt hours, and we can say the units. You're not going to be gigawatts. And what we'll do is we'll take our existing variable generation to megawatt hours and we will divide by 0.25, because there's, 15 minutes is a quarter of an hour. So we're dividing by these hours to take it out of the numerator, here. And also additionally, we're going to divide by a thousand because there's a thousand megawatts in a gigawatt.

So we can make sure that that worked. And there we go. So we've got our generation in gigawatts here and then we can now, we can clean up a little bit. Also, we can drop some of these unneeded columns now. So we'll drop columns. We can get rid of date, we can get rid of time, and we can also get rid of generation in megawatt hours. This is going to give me some problems because I've already read that, but anyway we can, here we can put these in a new chunk of code I'll call columns. And there we go. So anyway, now we have our cleaned up data set. The reason I got this error before is because I had already run these. And so now it can't find, I can't do this replacement because I already replaced it. So anyway we've got our cleanup data set here with just the variables we want; the field type, the datetime and generation. One other thing we need to do is we've got gas and gas combined cycle, let's combine these into just natural gas, because these are two different forms of natural gas. We're going to lump them together because ultimately that forecast to feed capacity is considering both forms of natural gas, and the generation data is segregating those two sources. So, in order to do this we need to first pivot our data frame. So let's make a new object called genpiv, and what we're going to do now is we're going to, we have something  that's tall and skinny here, and we're going to make it a bit fatter. We're going to have separate columns for a fuel type here. So we're going to essentially do the reverse of that melt so that we can then do some operations on the two different columns. So we'll say we'll define an index here. Our index is DateTime, sorry, index is DateTime because that's a unique identifier for these rows basically, well, combined with the fuel type, I should say. and then we're going to make some new columns that are based on fuel. And then the values for this new data frame are going to come from our generation in gigawatts. Okay, so let's take a look at what that looks like.

So there we go. Now we have a data frame that has indices. So this is not a column right now. These are indices of DateTime. And then we have gas, gas combine cycle, and wind. And the values that are all in generation in units with gigawatts. Okay. So now we can combine gas, and gas CC, simply by adding those two columns together. So I'll make a new column here. Let's call it Natural Gas, and we'll take our gas column and our gas CC column and we're just simply going to add those two values together. So we're going to aggregate them. Now after we do that, the gas and gas CC are going to be extra, so we can get rid of those. So get rid of gas and gas CC since those are no longer needed, and let's see what we have now. So there we go. We've got now just two columns; wind and natural gas. And you know, we see that gas, gas CC, 7.5, 26.2, and add those together 33.7. That's about right. So there we go. Now we want to get it back into the formula we had before where fuel was a column, not two separate columns. So, let's melt again. Before we melt, we need to re-establish DateTime as a column.

So what we're going to say is, we're going to say from genpiv, take to index and make it a view column called DateTime. Let's see what that looks like here. So we still have DateTime as an index but we've also separately made it as a column here. And then from genpiv, we can say melt again, our id_vars in this case are going to be, is going to be just DateTime and our values are going to be in generation gigawatts. Go ahead and run that. And there we go. That's what we want. So we have a column of DateTime that has a case in the times, fuel type, and then our generation values.

Lastly, let's go ahead and visualize this. This is an important step for hypothesis testing, and seeing what the data looked like this gives us a good idea of how to set up our subsequent hypotheses. So, let's make a line plot here, where we will have a line, one line for each energy source and plotted over time here. So, GG plot, we'll take our data frame genpiv and we'll have a line where X is going to be our DateTime and Y is going to be our Generation. We need to get this name exactly right so it matches the column there, and also we want to specify the color in this case as fuel. So let's see what this looks like. There we go. So we've got our different fuel sources; natural gas, and red and blue plotted over DateTime here. We can see natural gas is a lot higher than wind, so there's a lot more energy coming from natural gas than wind just because there's a larger natural gas infrastructure in Texas. Although Texas does have comparatively large wind infrastructure compared to other states.

We will, let's clean this up a little bit. I don't really like how you can't see the date and times here very well. So, let's adjust our theme and make our axis text on the x-axis a little more readable. We'll say element text, and change the angle to be 75 degrees. You can see the effect of that. There we go. So now we can read the dates here a bit better. But it's cutting off the time, so let's before we do the theme, I like putting the theme at the end. Before we do that, let's adjust the the date labels. And we can control the formatting of this as such, so we can say we want month and day, so we're only going to extract the month and day because they're all the same year, it's all 2021. So it's also all the same month. We could get rid of the months, but let's keep the month and day. And we also want to include our, and separated by colon, the minute. We'll put a plus here. So let's see the effect of that. There we go. So, now we've got some time information there. So, we have have the hours and also the minutes displayed.

So that's great. Let's also put in the capacities. So, we have the actual production plotted here. Let's see how he stacked up against our capacities. And so, we can insert those capacities, since those are essentially constant values over the whole time frame here. We'll put those in as horizontal lines, and if we want to do an H line, we need a y-intercept argument. So basically, what's the y-value for that horizontal line? And so, for natural gas this is going to be 48.4, and our color here can be red and, in our line, type will be dashed. Let's make it a dashed line. And then we'll also do this for wind at 6.1 gigawatts, and let's make this blue. Let's go ahead and see that, so there we go now. Yeah, I don't really like how these are different shades of red and different shades of blue. So, I'm pretty picky here. If we wanted to control that a little bit better, we can say we can define our color scale for the curves, so for the lines, and we'll set some values let's switch this up. Let's say, so we'll have natural gas be the blue and wind be the red. It doesn't really matter. There you go. So now the colors match perfectly in terms of shade. Just how I like it. So, but the main takeaway here is that, well, we can see natural gas is consistently below its capacity over this two-day time frame. Wind is below its capacity sometimes, and sometimes it's above it. So, that's interesting. So, let's retain that thought as we go forward in the subsequent hypothesis testing. Okay?

Credit: © Penn State is licensed under CC BY-NC-SA 4.0

The Google Colab Notebook use in the above video is available here, and the data are here. For the Colab file, remember to click "File" then "Save a copy in Drive". For the data, it is recommended to save to your Google Drive.

The general question we will ask with this dataset is:

"Between wind and natural gas power sources, which did not meet its respective forecasted peak capacity?" 
 

We will answer this question in different ways, using different forms of hypothesis testing, in the subsequent pages.