So far in this lesson, we have been working with the standard error method for determining confidence intervals. This is the method where we find the mean and the standard error and then do a calculation. However, we really only have those values, one, two, and three standard errors, for certain specific intervals. Occasionally you might want an 80 percent confidence interval, or maybe a 75 percent confidence interval, and there is no easy calculation in terms of the standard error for those. That's where the percentile method comes in.
With this method, we make the assumption that the data is perfectly normal. So we can take the confidence interval that we're interested in, assume that it's centered on the median, and extend it by the same percentage on either side. For a 95 percent confidence interval, we calculate 100 minus 95 equals 5, and 5 divided by 2 is 2.5. So our confidence interval runs from the 2.5th percentile on the lower end to 100 minus 2.5, or the 97.5th percentile, on the upper end. We're going to use a function called quantile in pandas, but you can also use a function called np.percentile from NumPy, if you would like.
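The arithmetic described here works for any confidence level, not just 95 percent. A minimal sketch, assuming a hypothetical helper named percentile_bounds (it is not part of pandas or NumPy, and not from the lecture):

```python
def percentile_bounds(confidence):
    """Return the (lower, upper) percentiles, in percent, of a central interval.

    `confidence` is the confidence level in percent, e.g. 95.
    This helper name is made up for illustration.
    """
    if not 0 < confidence < 100:
        raise ValueError("confidence must be between 0 and 100")
    tail = (100 - confidence) / 2  # percentage left out on each side
    return tail, 100 - tail


for level in (95, 80, 75):
    lower, upper = percentile_bounds(level)
    print(f"{level}% interval: {lower}th to {upper}th percentile")
```

To pass these values to the pandas quantile function, divide by 100 first, so 2.5 becomes 0.025 and 97.5 becomes 0.975.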
So there are three steps to this. The first is to get the lower percentile, LP. Again, we're working with the bootstraps, so you still need to do the bootstrap sampling in order to get to this point. But instead of calculating the mean, X bar, and the standard error, we call .quantile on the bootstrap sample means and give it our quantile in decimal form, so 2.5 percent is 0.025. Then we can print the lower bound, rounded: 5.1. Then we calculate the upper bound, the upper percentile, UP. It works very similarly: we call .quantile on the bootstrap sample means again, and in this case we pass 0.975 to get the upper bound. Then we can print this, and technically this is our confidence interval. So we can label it as the 95 percent interval, print LP, add a space to separate LP from UP, and then print UP. And so here we have our confidence interval. If we want to plot this, however, we also need to get the median.
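The lower and upper percentile steps can be sketched as follows. This is an illustration under stated assumptions, not the lecture's notebook: the sample is randomly generated, the variable name bootstrap_sample_means is a guess at the name spoken in the video, and the printed bounds will not match the 5.1 quoted above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Made-up sample, for illustration only (not the lesson's data)
sample = rng.normal(loc=6.4, scale=2.0, size=50)

# Bootstrap: resample with replacement and record the mean of each resample
bootstrap_sample_means = pd.Series(
    [rng.choice(sample, size=sample.size, replace=True).mean() for _ in range(2000)]
)

# pandas quantile takes the quantile in decimal form
LP = bootstrap_sample_means.quantile(0.025)  # lower percentile
UP = bootstrap_sample_means.quantile(0.975)  # upper percentile
print("95%:", round(LP, 2), round(UP, 2))

# np.percentile takes the same positions in percent instead of decimals
LP_np, UP_np = np.percentile(bootstrap_sample_means, [2.5, 97.5])
print("95% (NumPy):", round(LP_np, 2), round(UP_np, 2))
```

Both functions interpolate linearly by default, so the two pairs of bounds should agree.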
The median is the 50th percentile. Unlike the standard error method, where we center around X bar, in this case we center around the median. We could use the median command to get this, but it's the same as the 0.5 quantile. And so we have 6.4 as the median that this interval is centered around. If we want to continue to build up our plot, I'm just going to copy the plot code and paste it down here. This plot contains two confidence intervals right now. The red confidence interval is our bootstrap standard error. The green is our data standard error. And now we're going to add a third one, which is our percentile method. So again, geom_errorbarh with aes. In this case, we will set y to 0.75, xmin is just going to be LP, xmax is UP, and then we'll make it purple and give it a height of 0.25. Then we'll add our points. Within the aes, y equals 0.75 and x equals the median this time. The color is purple, and the size is three.
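The point about the median and the 50th percentile can be checked directly. A small sketch with made-up numbers (not the lesson's bootstrap means):

```python
import pandas as pd

# Made-up values, for illustration only
values = pd.Series([5.1, 5.8, 6.2, 6.4, 6.9, 7.3, 7.7])

median_from_quantile = values.quantile(0.5)  # 50th percentile, in decimal form
median_from_method = values.median()         # the dedicated median command

print(median_from_quantile, median_from_method)
```

Either call gives the center point to plot for the percentile interval.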
So we can run this and see our final completed plot. We can see the difference between the three confidence intervals. In this case, because our data is actually fairly normally distributed, the percentile method matches fairly well with the standard error method. Sometimes you'll see that the percentile interval is much larger or much smaller, simply because of the assumptions that we make when we use the percentile method. So that concludes our bootstrapping and confidence interval lectures. Now you can use all of these skills in the lesson below.