EME 210
Data Analytics for Energy Systems


Confidence Intervals: The Percentile Method


  Read It: Confidence Intervals: The Percentile Method

For some confidence levels, there is no easy standard error equation. In these situations, we can use the percentile method to estimate the confidence interval. In this method, we assume that the bootstrap distribution is approximately normal, so that the confidence interval can be defined by the percentage of bootstrap statistics that fall outside it, split evenly between the two tails. For example, with the 95% confidence interval, we can use the percentile method to say that the lower bound is at the 2.5% mark and the upper bound is at the 97.5% mark, with 95% of the bootstrap statistics in between. To calculate this, we follow these steps:

  1. 100 - 95 = 5
  2. 5/2 = 2.5
  3. Lower Bound: 0 + 2.5 = 2.5
  4. Upper Bound: 100 - 2.5 = 97.5
  5. Thus, the CI is: [quantile(0.025), quantile(0.975)]
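
The same arithmetic works for any confidence level, such as 80% or 75%. A minimal sketch is shown here; the function name `percentile_bounds` is hypothetical and is not part of the course notebook.

```python
def percentile_bounds(confidence_level):
    """Return the (lower, upper) percentiles, in percent, for a two-sided CI.

    confidence_level is given in percent, e.g. 95 for a 95% interval.
    """
    if not 0 < confidence_level < 100:
        raise ValueError("confidence_level must be between 0 and 100")
    # Steps 1 and 2: the percentage left outside the interval, split over two tails
    tail = (100 - confidence_level) / 2
    # Steps 3 and 4: lower and upper bounds as percentiles
    return tail, 100 - tail


for level in (95, 80, 75):
    lower, upper = percentile_bounds(level)
    # Step 5: quantile() expects decimals, so divide the percentiles by 100
    print(f"{level}% CI: [quantile({lower / 100}), quantile({upper / 100})]")
```

Dividing by 100 at the end matters because `quantile` in pandas takes a fraction between 0 and 1, while the steps above work in percent.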

This is demonstrated below.

  Watch It: Video - Introduction to Sampling Distributions (07:11 minutes)

Click here for a transcript.

So far in this lesson, we have been working with the standard error method for determining confidence intervals. This is the method, you know, where we come through and find the mean and the standard error, and then we do a calculation. However, we really only have those values (one, two, three) for certain specific intervals. Occasionally you might want to do an 80 percent confidence interval or maybe a 75% confidence interval, and there are no easy calculations in terms of the standard error in that sense. That's where this percentile method comes in.

And so with this method, we make the assumption that the data is perfectly normal. And so we can just take the confidence interval that we're interested in, and just assume that it's centered on the median, and it's just plus or minus however much percentage on either side. And so with the 95 percent confidence interval, to calculate this we do 100 minus 95 equals 5, divided by 2 is 2.5. And so our confidence interval is 2.5 on the lower end and 100 minus 2.5, or 97.5, on the upper end. And we're going to use a function called quantile in pandas, but you can also use a function called np.percentile from NumPy, if you would like.

So there are three steps to this. The first is to get the lower percentile, LP. And again, we're working with the bootstraps, so you still need to do the bootstrap sampling in order to get to this point. But instead of calculating the mean, the X bar, and the standard error, we say dot quantile. And we give it our quantile in decimal form. So 2.5 percent is 0.025. And then we can just print the lower bound, rounded. So 5.1. Then we calculate the upper bound, the upper percentile. And again it functions very similarly, so we say bootstrap sample means dot quantile and, in this case, we do 0.975 to get the upper bound. And so then we can print this and then, technically, this is our confidence interval. So we can then add, say, the 95 percent. And then we can give it LP, and do just a space in there to separate LP from UP. And so here we have our confidence interval. If we want to plot this, however, we also need to get the median.

And so, unlike the standard error method where we center around X bar, in this case we center around the median, or the 50th percentile. And so, we could use the median command to get this, but it's the same as the 50th quantile. And so we have 6.4 as our median that this is centered around. And so, if we wanted to continue to build up our plot here, I'm just going to copy this and paste it down here. And so this plot contains two confidence intervals right now. The red confidence interval is our bootstrap standard error. The green is our data standard error. And now we're going to add a third one, which is our percentile method. So again, geom_errorbarh, aes. In this case, we will set y to 0.75, xmin is just going to be LP, xmax is UP, and then we'll make it purple, and we'll give it a height of 0.25. And then we'll add our point with the aes: y equals 0.75, x equals the median this time. The color is purple, and the size is three.

So we can run this and we can see our final completed plots. We can see the difference between the three confidence intervals. And in this case, because our data is actually fairly normally distributed, we can see that the percentile method does match fairly well with the standard error method. Sometimes you'll see that the percentile method is much larger or much smaller simply because of the assumptions that we make when we use the percentile method. So that concludes our bootstrapping and confidence interval lectures. Now you can use all of these skills in the lesson below.

Credit: © Penn State is licensed under CC BY-NC-SA 4.0
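
The workflow described in the video can be sketched end to end as follows. This is an illustration under stated assumptions, not the course notebook: the data are synthetic, the variable names other than `LP` and `UP` are hypothetical, and the multiplier of 2 for the 95% standard error interval is assumed from the "one, two, three" values mentioned in the video.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the course data set (assumption: not the real data)
rng = np.random.default_rng(seed=210)
values = rng.normal(loc=6.4, scale=3.0, size=50)

# Bootstrap: resample with replacement and store the mean of each resample
n_boot = 1000
bootstrap_sample_means = pd.Series(
    [rng.choice(values, size=values.size, replace=True).mean() for _ in range(n_boot)]
)

# Percentile method: read the bounds directly off the bootstrap distribution
LP = bootstrap_sample_means.quantile(0.025)
UP = bootstrap_sample_means.quantile(0.975)
median = bootstrap_sample_means.quantile(0.5)  # center used for plotting

# NumPy alternative: np.percentile takes percents (0 to 100), not fractions
LP_np, UP_np = np.percentile(bootstrap_sample_means, [2.5, 97.5])

# Standard error method for comparison (assumed multiplier of 2 for 95%)
x_bar = bootstrap_sample_means.mean()
se = bootstrap_sample_means.std()
se_lower, se_upper = x_bar - 2 * se, x_bar + 2 * se

print("95% CI (percentile):    ", round(LP, 2), round(UP, 2))
print("95% CI (standard error):", round(se_lower, 2), round(se_upper, 2))
```

With roughly normal data like this, the two intervals should land close to each other; with skewed data they can differ noticeably, which is the situation the video warns about.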

  Try It: Apply Your Coding Skills in Google Colab

  1. The Google Colab file used in the video is here.
  2. Go to the Colab file and click "File", then "Save a copy in Drive". This will create a new Colab file that you can edit in your own Google Drive account.
  3. Once you have it saved in your Drive, use the partial code below to calculate the 95% confidence interval using the percentile method.

Note: You must be logged into your PSU Google Workspace in order to access the file. 

# step 1: calculate lower percentile (this is the lower bound of the CI)
LP = ...
 
# step 2: calculate the upper percentile (this is the upper bound of the CI)
UP = ...
 
print('the 95% confidence interval is ', ...)

  Assess It: Check Your Knowledge

Knowledge Check
