EME 210
Data Analytics for Energy Systems

Another Example of Chi-Square Tests

Read It: Another Example of a Chi-Square Test

So far in this lesson, we have been working with an example where all the expected proportions were equal. Often, however, the proportions we work with are not expected to be equal. This makes the chi-square test slightly more involved, because the expected counts array must list the categories in the same order as the observed counts. Below, we demonstrate how to implement a chi-square test with unequal groups.
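As a rough illustration of what changes when the proportions are unequal, the sketch below computes the expected counts and the chi-square statistic by hand. The category labels, null proportions, and observed counts are made-up placeholders, not the course data.

```python
# Hypothetical categories, null proportions, and observed counts (illustration only)
categories = ["A", "B", "C"]
null_props = [0.5, 0.3, 0.2]   # null-hypothesis proportions; must sum to 1
observed = [40, 35, 25]        # observed counts, in the same order as categories

n = sum(observed)              # sample size

# Expected count for each category: sample size times its null proportion
expected = [n * p for p in null_props]

# Chi-square statistic: sum over categories of (observed - expected)^2 / expected
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for name, o, e in zip(categories, observed, expected):
    print(name, o, e)
print("chi-square statistic:", round(chi2_stat, 3))
```

If the proportions were listed in a different order than the observed counts, each observed value would be compared against the wrong expected value, and the statistic would be meaningless.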

 Watch It: Video - Chi-Square Test Part 2 (9:02 minutes)

Click here for a transcript.

Welcome back to the videos in chi-squared testing. In this lesson, in this video, we're going to go over another way that you might encounter chi-square testing. So, in the previous set of videos, we had a deck of cards that was all that had equal proportions across each of the suits. But oftentimes our populations won't have equal proportions across all groups. And so, in this video I'm going to show you how you can do a chi-square test where we have different populations across groups.

So here in this code, I'm calling it chi-square test part two, we've got some data on undergraduate enrollment in the College of Earth and Mineral Sciences in the Fall of 2022. And our goal is to compare the enrollment in the whole College of Earth and Mineral Sciences to enrollment in one of the courses we taught during the academic year. And so, our null hypothesis is just each proportion based off the total enrollment in EMS. So, we can see that each major, EBF, Energy Engineering, Environmental Systems, and so on, has a different proportion of students. And then our alternative hypothesis is that at least one of these is not as specified. And so, we can go ahead and run the code that I've already got set up, which is essentially to read in the first week survey data. We can see that we've extracted just the majors from that survey. And then we can start by figuring out the counts. So, before, we already knew the counts right off the bat. But in this case, we need to actually extract the counts. So, we use this dot value counts command to do that. And we can see that we've got some "Other" data. And we've got this "Energy Engineering, Other" entry that isn't going to work with our existing data set, because we don't have any proportion specified for other, or for dual majors. And so, we can go ahead and update this. First, we're going to account for this one dual major, here. We still want them to count towards the Energy Engineering count, but we don't need the other part of the dual major, so we can say dot replace. And so essentially, we're just going to replace "comma Other" with nothing, an empty pair of quotation marks. So, we can do that. And then we also need to remove these "Other" majors. We don't know what major they are, so we can't count them towards any of the existing majors. So, we can just say that we want only the majors that are not equal to "Other".
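The cleaning steps described above might look roughly like the following sketch. The survey responses here are invented stand-ins for the first-week survey, and the use of `Series.str.replace` is an assumption about how the "dot replace" step could be written in pandas; the notebook in the video may do it differently.

```python
import pandas as pd

# Hypothetical survey responses standing in for the first-week survey data
majors = pd.Series([
    "Energy Engineering", "Energy Engineering, Other", "EBF",
    "Other", "Environmental Systems", "Energy Engineering", "Other",
])

# Fold the dual major into its primary major by stripping ", Other"
majors = majors.str.replace(", Other", "", regex=False)

# Keep only responses that match a category named in the null hypothesis
majors = majors[majors != "Other"]

# Observed counts per major (value_counts sorts from most to least frequent)
counts = majors.value_counts()
print(counts)
```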

So that ran, and then we can recount the data, this time storing it in a counts variable. So we can say majors, needs to be plural there, dot value counts. Go ahead and print the counts. So here we can see how our counts actually work now that we've removed the other and added that dual major into Energy Engineering. So, this is our observed counts. The next step in our chi-square procedure is to develop the expected counts, so counts_exp is just counts dot sum, so this is the sample size of our observed data, times the array of proportions. And now, before, our array of proportions was just four times 0.25, but now that our proportions are different for each category, we need to manually input each of those. And a very critical point here: the order has to be the same as the order up here, because we ultimately are going to be comparing these two sets of values. And so, we need to make sure that the order is the same across both observed and expected counts. And so, I've listed the order here that is based off of this order. So we can do 0.493, [unclear] 0.172, 0.163, and 0.04. And these values also match what we have up here in our null hypothesis.
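One way to reduce the risk of a mismatched order is to key the null proportions by label and let pandas align them to the observed counts. This is a sketch of an alternative to typing the array by hand, not the approach used in the video; the major names, counts, and proportions are hypothetical.

```python
import pandas as pd

# Hypothetical observed counts, in the order value_counts() might return them
counts = pd.Series({"Major B": 30, "Major A": 15, "Major C": 5})

# Hypothetical null proportions keyed by label, listed in any order
null_props = pd.Series({"Major A": 0.25, "Major B": 0.50, "Major C": 0.25})

# Reorder the proportions so they follow the same label order as the counts
props_aligned = null_props.reindex(counts.index)

# Expected counts: observed sample size times each null proportion
counts_exp = counts.sum() * props_aligned
print(counts_exp)
```

Because `reindex` looks each label up by name, a label missing from `null_props` would show up as `NaN` in `counts_exp`, which makes a mismatch easy to spot before running the test.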

So we can run that. We get our expected counts and then we can run the chi-square test. And for this purpose, I'm just going to do the one-liner. So we already know how to do the randomization procedure, but in this case, we'll just do the one-liner. So we say stats dot chi square. Our observations, we call counts. Our expected values, we call counts_exp. And then we can print the p-value, which is just results dot p-value.
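A minimal sketch of the one-liner, assuming the `stats` module in the video is `scipy.stats` and the function is `chisquare`. The counts and proportions below are hypothetical; in SciPy the result attribute is spelled `pvalue`.

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts and null proportions, in the same category order
counts = np.array([30, 15, 5])
counts_exp = counts.sum() * np.array([0.50, 0.25, 0.25])

# One-line chi-square goodness-of-fit test (observed first, expected second)
results = stats.chisquare(f_obs=counts, f_exp=counts_exp)

print("statistic:", results.statistic)
print("p-value:", results.pvalue)
```

Note that `chisquare` expects the observed and expected counts to have the same total, which holds automatically when the expected counts are the sample size times proportions that sum to 1.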

And so, then we can see the p-value is .017. And so we can write our conclusion. The p-value is less than 0.05, our significance level, so we reject the null hypothesis in favor of the alternative, that at least one of the proportions is not as specified. However, this chi-squared test doesn't tell us which one is different. It just tells us that at least one is different. And so, in order to actually get an idea about which one is different, we can just subtract our observed counts from our expected counts. And so, here we can see that the most extreme difference is PNGE, which had 11 more actual attendees than expected. Since the value is negative, that means the observation is greater than expected. Meanwhile, the Environmental Systems group had nearly six fewer than expected. And so, we can then start to have a discussion about which of these majors are primarily responsible for the rejection of our null hypothesis.
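The follow-up comparison can be sketched as below, using hypothetical counts rather than the course data. The per-category contribution to the chi-square statistic is an extra step not shown in the video; it is included here because it puts the differences on a comparable scale.

```python
import pandas as pd

# Hypothetical observed and expected counts sharing the same labels
counts = pd.Series({"Major B": 30, "Major A": 15, "Major C": 5})
counts_exp = pd.Series({"Major B": 25.0, "Major A": 12.5, "Major C": 12.5})

# Expected minus observed: a negative value means more observed than expected
diff = counts_exp - counts

# Each category's contribution to the chi-square statistic
contrib = (counts - counts_exp) ** 2 / counts_exp

print(diff)
print(contrib.sort_values(ascending=False))
```

Categories with the largest contributions are the ones most responsible for a small p-value.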

Credit: © Penn State is licensed under CC BY-NC-SA 4.0

Try It: GOOGLE COLAB

  1. Click the Google Colab file used in the video here.
  2. Go to the Colab file and click "File", then "Save a copy in Drive". This will create a new Colab file that you can edit in your own Google Drive account.
  3. Once you have it saved in your Drive, try to edit the following code to conduct a one-line chi-square test with unequal groups. Remember to match the order of your expected counts to that of your observed counts!

Note: You must be logged into your PSU Google Workspace in order to access the file.

# Determine the count of each major
counts = ...

# calculate expected counts 
counts_exp = ...

# run the chi-square test
results = ...
print('p-value: ', ...)

Once you have implemented this code on your own, come back to this page to test your knowledge.


 Assess It: Check Your Knowledge

Knowledge Check