Welcome back to our series on bootstrapping. We left off the previous video after having made this plot, where we show the bootstrapping distribution compared to the actual data in the original sample means. What we primarily use bootstrapping for is statistical inference. We want a larger sample size so that we can feel more confident in our final statistical inference, and one form of that inference is calculating confidence intervals. A confidence interval is generally going to be two numbers: the mean minus something times the standard error, and the mean plus something times the standard error. Depending on the confidence level, that calculation changes. We're going to focus on the 95 percent confidence interval, as this is by far the most common confidence level that you will see, in practice as well as in this course.
There are three steps to calculating the confidence interval with the bootstrap samples. The first is to calculate x bar, that is, the average of the sample means. It is critical to note that everything we're doing with confidence intervals is on the bootstrapped sampling distribution. So we're using boot means, which we calculated up here and plotted right here. Essentially we want boot means, our bootstrap sample means, dot mean: the average of the sampling distribution. Then we can print "x bar is". This is the mean that goes into these equations. The next step is to calculate the standard error. Recall from our previous video that we are working with the standard error because we're working with a sampling distribution. So we say se is boot means, our bootstrap sample means, dot std. Then we can also print that. We can say "se is".
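The first two steps can be sketched as follows. This is a minimal sketch, not the notebook from the video: the data are synthetic stand-ins, and the names `sample`, `boot_means`, `xb`, and `se` are guesses at the spoken variable names.

```python
import numpy as np

# Hypothetical stand-in data: the video's actual sample is not in the transcript.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=6.0, size=30)

# Bootstrap: resample with replacement and record the mean of each resample.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(1000)
])

# Step 1: x bar, the average of the bootstrap sample means.
xb = boot_means.mean()

# Step 2: the standard error, i.e. the standard deviation of the
# sampling distribution of the mean.
se = boot_means.std()

print("x bar is", xb)
print("se is", se)
```

Note that `.std()` on a NumPy array divides by n by default (`ddof=0`). With 1000 bootstrap means, the difference from the n - 1 version is negligible.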
So now we have the standard error, and those are the only two things we need in order to calculate the confidence interval with the bootstrap distribution. We can create an array using square brackets, where the first value is xb minus 2 times se, and the second value is xb plus 2 times se. Then we can print that. We could just print CI, similar to what we've been doing up here, but to show you a slightly different way to print things, still using the print function, instead of just typing out the variable name here I'm going to use a mini function. I'm going to say round y to the third decimal place for all y in CI. Similar to our for loops, this y is just a placeholder. The name just needs to be the same in both places, and then it applies y to each of our values in CI in turn. If we run that, we can see the 95 percent confidence interval is 5.058, 7.75.
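The third step and the "mini function" (a list comprehension) might look like the sketch below. The values of `xb` and `se` are placeholders back-computed from the interval quoted above, not numbers shown in the video, and the variable names are assumptions.

```python
# Placeholder values (assumption): chosen so the interval matches the one
# quoted in the transcript. In practice these come from boot_means.
xb = 6.404   # mean of the bootstrap sample means
se = 0.673   # standard error of the bootstrap sample means

# 95 percent confidence interval: mean plus or minus 2 standard errors.
# (2 is the usual rounding of the normal-distribution multiplier 1.96.)
CI = [xb - 2 * se, xb + 2 * se]

# List comprehension: round each value y in CI to three decimal places.
print("The 95% confidence interval is", [round(y, 3) for y in CI])
```

The comprehension builds a new list and leaves `CI` itself unrounded, so later steps still use the full-precision endpoints.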
And that is all you need to do in order to get the confidence interval. But if we want to visualize it, I'm going to come up here, take this plot from our previous video, and add it here for reference. We want to add the 95 percent confidence interval to this, so we add a new geom. This one is called error bar h, for horizontal. Inside the aes, we give it a y value. This is just the position on the y-axis, based on the numbers we see here. I'm going to use 0.5 to put it halfway. Then we need to give it the x min of our error bar, which is the first value of CI, accessed with the zeroth index. We do the same for x max, here with CI 1. Outside the aes we can give it a color and say red. Then we're also going to add a single point. We use the same y, but our x is going to be x bar, so that we show where the mean of the sampling distribution is in reference to our confidence interval. Again, outside the aes we say color is red, and we give it a sufficiently large size so that it stands out. If we run this, we can see that we've added the confidence interval to our plot, and we've got our x bar here in the middle, which lines up with the middle of our normal distribution. With our confidence interval, essentially we're saying that we're 95 percent confident that the true mean lies within this interval. So this is how you can add the confidence interval to the plot after calculating it.