EME 210
Data Analytics for Energy Systems

Working with Time Series


  Read It: Working with Time Series

Working with time series requires us to work with DateTime objects, a special data type in Python. The three videos below walk you through several ways to plot time series and show how to create and work with DateTime objects.

  Watch It: Video - Basic Line Plots (2:56 minutes)


For the final visualization lecture of this lesson, we're going to be walking you through how to work with a time series. Now, this will involve yet another data set. But we'll also show you how to work with date-times, which can be very useful in your own work as well as later on in the semester. And as we have done in the previous videos, we'll be using plotnine, or ggplot, to do this plot.

So, we are here in a Google Colab file. We've got the libraries that we're going to be using, and we've got our data. I've already mounted my drive, I'm using option one from lecture three, and I'm going to go ahead and read this data in. We can see from here the columns that we've got: date, time, kilowatt hours produced, and cumulative kilowatt hours. If we print the first five rows, we can see that this data set shows the total kilowatt hours produced by Dr. Morgan's solar panels in 2021.

And so, from these first five rows, we can see that we've got early morning hours and no kilowatt hour production, which makes sense: there's no sunlight. We're going to use this data set to explore some additional types of plots, and in particular we're going to focus on line plots. So, again working with ggplot, our data set is the one I called solar.

If we wanted to do a line plot, it's just geom_line. And again we need to give it an x value, usually a time or date, and a y value, and in this case we'll say produced kilowatt hours.

So, we can run that. Sometimes it does take a long time, but it completed, and we can see that this isn't the most well-thought-out plot. We can't read the dates. The line plot looks more like bars, which really isn't ideal. So, perhaps it might be better if we narrowed down what we wanted to plot. Maybe we just want to focus on September 8th.

Credit: © Penn State is licensed under CC BY-NC-SA 4.0

  Watch It: Video - Creating and Working with DateTime Objects (7:01 minutes)


So, maybe we just want to focus on September 8th. Before we can get into that, we need to create a new column. To create a new column in a data frame, we just specify its name within quotes and square brackets, and in particular we're going to use pd.to_datetime (you can see it's prompting me to click it). What this does is take the date and the time and combine them into a single date-time object. To do that, we give it our date, then a quoted space (" "), and then our time. That space just keeps the time from running up against the date.

And so, if we look at the top five rows, we can now see that we've got this date-time column, which is a combination of the date and the time. Additionally, if we look at the data types of our new data set, we can see that the date-time column is this special type of data called a date-time object, which means that we can use it in Boolean or conditional statements to find dates that are greater than or less than our date of interest.

And so, to go ahead and get that data, we can create a new variable called solar subset, in which we want the solar date-time that is greater than 2021-09-08 and less than 2021-09-09. What this will do is extract the data that falls after midnight at the start of September 8th but before midnight at the start of September 9th, so we'll effectively get all 24 hours of September 8th. And then, we don't really need all of the subset, so I'm going to extract a few columns. We still want all the rows, but, to show you, we can extract just date-time and produced kilowatt hours. I forgot the print statement here. And so, that is how we could extract just the date-time and just the produced kilowatt hours. But it's not necessary, because we can always just call out specific columns within our ggplot.

So, this is what our new data set looks like, and we can see that now it's just September 8th. Then we can create our line plot. We say ggplot with the subset of data, because we're just plotting the eighth at this point, and then geom_line with aes, x equals date-time as before and y equals produced kilowatt hours. And so, here we can see that this is looking a lot better. We can see the early hours of September 8th here and the late hours there. We can see that production spiked sometime in the middle of the day, as solar power is likely to do. But we can improve how this looks. I'm just going to copy it, paste it down here, and we can use the theme command, as we did with the box plots early on, to actually change this axis.

So, we can say axis_text_x. Before, we said element_blank because we were removing the labels, but in this case we're changing them. So, we say element_text and tell it what we're changing, and here we can just change the angle to 90. And so, now we can see that this is a little bit more readable. We can see where this spike happens, but the plot hasn't attached the time to the labels, so it's not terribly useful because that bit is cut off. But we can assume that if the time were shown, it would be an even better representation.
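The date-plus-space-plus-time step can be sketched as follows. This is an illustration on invented rows, not the course data: the column name 'Time' and the string formats are assumptions, while 'Date', 'DateTime', and 'Produced (kWh)' follow the exercise at the end of this page.

```python
import pandas as pd

# Hypothetical rows; the real file's date and time formats may differ.
solar_demo = pd.DataFrame({
    "Date": ["2021-09-08", "2021-09-08", "2021-09-09"],
    "Time": ["00:00", "13:00", "00:00"],
    "Produced (kWh)": [0.0, 2.7, 0.0],
})

# Join date and time with a single space, then parse the combined strings.
solar_demo["DateTime"] = pd.to_datetime(solar_demo["Date"] + " " + solar_demo["Time"])

# The new column has a datetime dtype rather than plain text.
print(solar_demo.dtypes)
```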
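A minimal sketch of that subsetting step, assuming hourly readings, is shown below. The data are synthetic and the names solar_demo and mask are invented for illustration. Each condition sits in its own parentheses, and the two are joined with &, the element-wise "and".

```python
import pandas as pd

# Synthetic hourly timestamps covering Sept 7 through Sept 9 (values are made up).
solar_demo = pd.DataFrame({
    "DateTime": pd.date_range("2021-09-07", periods=72, freq=pd.Timedelta(hours=1)),
})
solar_demo["Produced (kWh)"] = 1.0

# Keep rows strictly after midnight on Sept 8 and strictly before midnight on Sept 9.
mask = (solar_demo["DateTime"] > pd.Timestamp("2021-09-08")) & (
    solar_demo["DateTime"] < pd.Timestamp("2021-09-09")
)
solar_subset = solar_demo[mask]

# Select two columns while keeping every row of the subset.
print(solar_subset[["DateTime", "Produced (kWh)"]].head())
```

Because both comparisons are strict, a reading stamped exactly at midnight on September 8th is excluded. With the hourly synthetic data here the subset has 23 rows, and switching the first comparison to >= would keep the midnight reading as well.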

Credit: © Penn State is licensed under CC BY-NC-SA 4.0

 Watch It: Video - Using Statistics Within a Time Series Plot (5:29 minutes)


So, this is the actual production over a single day. But something that we'll often want to do is to plot averages or some other statistic. And so, in this last bit, we will go over how to plot the average hourly production. To do that, I'm going to create another new variable called solar hour, which is equal to the date-time, but just the hour value. One of the benefits of date-time objects is that you can use a special set of commands that extract bits of time or dates for you, without having to do some sort of complicated reading of the string to figure out what that first number is. So now we can see the hour is over here.

And so we can go ahead and plot this. We say solar again, do a geom_line with aes, and here say x equals hour, and y is still produced kilowatt hours. This is essentially showing the kilowatt hours produced at each hour of the day across the entire year. So we haven't gotten to the average yet, which is our goal. To do that, we need to use a special plotting tool within ggplot. We still tell it which data to use, but now instead of saying geom we say stat_summary, and it still works in much the same way. We still have an aes statement in which we have the x and y that we've been using. But outside of that aes statement, we need to specify the geom. So we've just done things a little bit backwards: if you remember, with the bar plot we specified geom_bar and then the stat inside that, but in order to work with the line plot, we need to specify the stat outside and line inside. And then we also need to give it the function that we're applying to the y-axis, for which we are going to use np.mean.

And so, now we can see that this looks a lot more like we would expect. We can see the average kilowatt hours produced. It's less than three for each hour of the day, given all of the data in 2021. We can see that there was a spike just after six, but by and large, on average, three o'clock tends to be the best time to generate solar power here in central Pennsylvania. If we wanted to show a confidence interval on this, I'm just going to copy this and paste it down here. We can add a second stat_summary and maintain the same aes values, but our geom can then be ribbon, and our function isn't just on the y-axis, now it's on the data. And it's mean_cl_boot, a bootstrapped confidence limit on the mean; we will get into what that actually means in the next lesson. But for now, this is the way that we can get a 95% confidence interval, and I'm also going to specify alpha so that we can see the line through the ribbon. And so, now we can see how this ribbon will follow the data. This is a way that you can work with time series and create line plots that have different functionality, whether that's showing the actual data, the average, or the average plus a confidence interval.
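Extracting the hour from a date-time column can be sketched as below, on a few invented timestamps. The column names 'Hour' and 'Month' are assumptions chosen for illustration.

```python
import pandas as pd

# A few made-up timestamps.
solar_demo = pd.DataFrame({
    "DateTime": pd.to_datetime(
        ["2021-09-08 06:00", "2021-09-08 15:00", "2021-12-01 15:00"]
    ),
})

# The .dt accessor exposes date-time components without any string parsing.
solar_demo["Hour"] = solar_demo["DateTime"].dt.hour
solar_demo["Month"] = solar_demo["DateTime"].dt.month

print(solar_demo)
```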
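To make the per-hour averaging concrete, the sketch below computes the same kind of summary directly in pandas: one mean of y for each distinct x value. It is an illustration of the idea on invented readings, not the code used in the video, and the column name 'Hour' is an assumption.

```python
import pandas as pd

# Made-up readings: two days, three hours each.
solar_demo = pd.DataFrame({
    "Hour": [6, 12, 15, 6, 12, 15],
    "Produced (kWh)": [0.2, 2.0, 2.5, 0.4, 3.0, 1.5],
})

# One mean per hour of day, pooling every day in the data.
hourly_mean = solar_demo.groupby("Hour")["Produced (kWh)"].mean()
print(hourly_mean)
```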

Credit: © Penn State is licensed under CC BY-NC-SA 4.0

  Try It: Apply Your Coding Skills in Google Colab

  1. The Google Colab file used in the video is here.
  2. Go to the Colab file and click "File", then "Save a copy in Drive". This will create a new Colab file that you can edit in your own Google Drive account.
  3. Once you have it saved in your Drive, use the partial code below to create line plots using plotnine/ggplot.

Note: You must be logged into your PSU Google Workspace in order to access the file. 

from plotnine import * # import the library
import numpy as np # needed for np.mean below
 
# fill in the blank (...) to plot a line plot with 'Date' on the x-axis and 'Produced (kWh)' on the y-axis
ggplot(solar) + ...

# fill in the blank (...) to subset all times > '2021-09-08' and < '2021-09-09'
solar_subset = solar[(solar['DateTime'] > ...) ... (solar['DateTime'] < ...)]

# fill in the blank (...) to plot a line plot with 'DateTime' on the x-axis and 'Produced (kWh)' on the y-axis
ggplot(solar_subset) + ... 

# fill in the blank (...) to plot the average hourly 'Produced (kWh)' with a 95% confidence interval ribbon
(ggplot(solar) +
 stat_summary(..., geom = ..., fun_y = np.mean) +
 stat_summary(..., geom = ..., fun_data = 'mean_cl_boot', alpha = 0.5)
 )

Once you have implemented this code on your own, come back to this page to test your knowledge. 


  Assess It: Check Your Knowledge

Knowledge Check