EME 210
Data Analytics for Energy Systems


Box Plots

  Read It: Box Plots

In this lesson, we demonstrate how to create box plots using ggplot/Plotnine.

  Watch It: Video - Box Plots (6:24 minutes)

Transcript:

Hello. Welcome back. In this video we're going to continue to talk about how we can create visualizations in Python using Plotnine, or ggplot, and in particular we're going to focus on box plots. All right, so we are back in the same visualization Colab file. We've loaded our libraries and are continuing to use Plotnine. We have our data that we went through in the first video to clean up and rename a number of columns, and now we're going to get into box plots. Once again, we're using ggplot, and I'm going to create these parentheses so I can enter down between options. We start off with what we do for every ggplot object: ggplot with survey, and then some kind of geom. Here our geom is going to be geom_boxplot, and then we need to have our aes. Box plots in ggplot are a little bit different than some other plots: they require both an x and a y variable within the aes parentheses. However, you can just state a single value, such as 0, for the x-axis. Then we need a quantitative variable for the y-axis, for which we will use Credits. And then outside of the aes, we can add a fill. Let's just fill this one with pink. So, we can run this, and we can see this box plot.

And so, we've got credits, and we can see where our median is, where our first and third quartiles are, as well as the min and max. But we've got this sort of weird x-axis that isn't necessarily the best to include, because it has nothing to do with the data itself. If we wanted to change that, we would need to work with a function called theme. You can use theme to do just about anything: you can change legends, you can change text, you can change the background. Here, we're going to use it to remove the x-axis label, the x-axis text, and the tick marks. We'll start off with the title, so we can say axis_title_x. Here we could set any number of things, but what I'm going to do is say element_blank. This just tells it to make the x-axis title blank.

The next thing we'll do is the text. This is actually these numbers down here: negative 0.4, negative 0.2, 0, and so forth. So, axis_text_x, and again element_blank. The last thing we wanted to remove was the tick marks, so we can say axis_ticks_major_x. You can start to see the pattern here. We tell it what part of the plot we're working with, so the axis; then we tell it what part of the axis we're working with, here the title, the text, or the major ticks; and then we tell it which axis we're doing. When we run this, we can see that we've taken away everything that was on the x-axis. It looks a little cleaner without that additional set of numbers, which comes from the fact that we are forced to provide some x value.

So, this is how you can make a single box plot, but say you wanted to break up this box plot by major. Similar to our bar plot from that video, we want to look at how credits are separated by major. Here we can once again say ggplot with the survey data, and then geom_boxplot with aes. Now, because we actually have two variables, we can give our categorical variable on the x-axis and our quantitative variable on the y-axis, and we can still fill with pink. Here we can see how credit hours vary within each of these different majors. We can see that PG and E had the greatest variation in credit hours being taken, whereas "Other" and these two majors each represent a single data point, so there is no variation there. You can see that they had less.

Credit: © Penn State is licensed under CC BY-NC-SA 4.0

  Try It: Apply Your Coding Skills in Google Colab

  1. The Google Colab file used in the video is linked here.
  2. Go to the Colab file and click "File", then "Save a copy in Drive". This will create a new Colab file that you can edit in your own Google Drive account.
  3. Once you have it saved in your Drive, use the partial code below to create box plots using plotnine/ggplot. Alternatively, you can continue to build off of the code used in the bar plots example.

Note: You must be logged into your PSU Google Workspace in order to access the file.

from plotnine import * # import the library
  
# fill in the blank (...) to plot a box plot of 'Credits'
ggplot(survey) + ...
  
# fill in the blank (...) to plot a box plot with 'Major' on the x-axis and 'Credits' on the y-axis
ggplot(survey) + ...

Once you have implemented this code on your own, come back to this page to test your knowledge. 


  Assess It: Check Your Knowledge

Knowledge Check