Hello. In this set of videos, we're going to talk about how we can add a second, a third, and sometimes even a fourth variable to our plots. But before we get into the visualization, I'm going to take one video to show you how we can use the melt function to transform our data into a more plottable format.
So, here we are in a new Google Colab file for this part of the lesson. We've got the libraries that we've used before, but we are now adding NumPy, which we have nicknamed np. So, let's get into the data. I've already mounted my Google Drive using option one from earlier, and I'm going to go ahead and read the data in. The data that we are working with is from emission scenarios. I'm actually going to cut that name short to make it a little bit easier to type later on. Essentially, these emission scenarios are published by the US, and they say what percent change we might have in emissions under different scenarios. So, in 2005 we assumed 100 percent fossil fuel emissions. And then, when we look at the reference in 2035, those emissions drop because we're actually having negative emissions from zero carbon and solar, and we've got differences here based on these different scenarios.
Now, we could go ahead and make a variety of different plots from this, but it's very difficult to work with what we call wide data sets, meaning there are many columns that each hold the values for a different category, instead of the data being long, with only a few columns. In order to get this data set into what we call long form, we need to use the melt command. So, I'm going to create a new data frame called emsc_melt, and I'm just going to type our original data frame, dot melt.
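To picture what "wide" means here, the following is a minimal sketch using a small made-up table. The name `emsc_wide`, the scenario labels, the source column names, and every number are hypothetical stand-ins, not the actual emission scenarios file from the video.

```python
import pandas as pd

# Hypothetical stand-in for the emission scenarios data (values are invented).
# Wide form: one row per scenario, one column per source of electricity.
emsc_wide = pd.DataFrame({
    "category": ["2005", "Reference 2035", "Scenario A 2035"],
    "fossil_fuel": [100.0, 80.0, 60.0],
    "zero_carbon": [0.0, -10.0, -20.0],
    "solar": [0.0, -5.0, -15.0],
})

# Every column except "category" holds the same kind of number (emissions),
# split across columns by source. That spread is what makes the table wide.
value_columns = [col for col in emsc_wide.columns if col != "category"]
print(emsc_wide)
print("Columns holding emissions values:", value_columns)
```

In long form, the same numbers would sit in one column, with a second column saying which source each number belongs to.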
There are several things that go into melting the data set. The first is that we need to tell it which variables we want to maintain as they are now. In this case, we want to maintain category, so this is the column name up here. We want these variables to be the same in our new melted data set. What we want, though, is for all of these numbers to be in a single column, and we want these column names to become their own categorical variable. So, we need to provide a name for our new variable, which holds whatever your column names currently are. We'll just name it source, because that's the source of electricity up here. Then we also need to provide a value name, and this is for whatever your actual numbers are. We will just call that emissions. And then if we print this, we can see that this data is now in what we call long form. All the numbers are in a single column, and these categories, instead of being individual columns, are now within a single column. This really helps us when we go to plot, which we will do in the upcoming videos.
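The melt call described above can be sketched as follows. This assumes the original data frame is named `emsc` (the video only gives the name `emsc_melt` for the result) and uses a small invented table in place of the real file; the column names `category`, `source`, and `emissions` follow the transcript, while the scenario labels, source names, and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical wide-form stand-in for the emission scenarios data.
# "emsc" is an assumed name for the original data frame; values are invented.
emsc = pd.DataFrame({
    "category": ["2005", "Reference 2035", "Scenario A 2035"],
    "fossil_fuel": [100.0, 80.0, 60.0],
    "zero_carbon": [0.0, -10.0, -20.0],
    "solar": [0.0, -5.0, -15.0],
})

emsc_melt = emsc.melt(
    id_vars="category",      # column(s) to keep as they are
    var_name="source",       # new column holding the old column names
    value_name="emissions",  # new column holding all of the numbers
)

print(emsc_melt)
```

With three scenarios and three source columns, the long-form result has nine rows and three columns: `category`, `source`, and `emissions`.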