Astrology
From this skeptical analysis of some astrology data, listing the numbers of famous rich people in each sign, we see the use of the chi-squared goodness of fit test. The data are:

| Sign | Number of People |
| --- | --- |
| Aries | 95 |
| Taurus | 104 |
| Gemini | 110 |
| Cancer | 80 |
| Leo | 84 |
| Virgo | 88 |
| Libra | 87 |
| Scorpio | 79 |
| Sagittarius | 84 |
| Capricorn | 92 |
| Aquarius | 91 |
| Pisces | 73 |
| Total | 1067 |
To apply the chi-squared test, we simply compare the above numbers to the counts expected if signs were completely random, which is 1067 people / 12 = 88.9 people per sign, according to:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ are the observed counts and $E_i$ are the expected counts. Once we have the chi-squared value and the degrees of freedom (11 in this case, one fewer than the number of categories), we can look up in tables to get the p-value:
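A minimal sketch of this calculation in Python, as a check (using scipy.stats.chisquare, which defaults to a uniform expected distribution):

```python
from scipy.stats import chisquare

# Observed counts of famous rich people per sign, from the table above
observed = [95, 104, 110, 80, 84, 88, 87, 79, 84, 92, 91, 73]

# With no expected counts given, chisquare assumes a uniform expectation:
# sum(observed) / 12 = 1067 / 12 = 88.9 people per sign
chi2, p = chisquare(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # roughly chi2 = 13.57, p = 0.257
```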
Normally, this might be the end of the story, given that the result is not even close to significant (the usual cut-off is around p = 0.05).
Subset of the Data
So, if we take only the extreme values, say:

| Sign | Number of People |
| --- | --- |
| Gemini | 110 |
| Pisces | 73 |
| Total | 183 |
then we calculate a different chi-squared, now with 1 degree of freedom, and get a much more significant result; a sketch of the computation follows.
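The same check for the two extreme signs (a minimal sketch; scipy again assumes a uniform expectation, here 183/2 = 91.5 per category):

```python
from scipy.stats import chisquare

# Pretend the two extreme signs form a 2-category problem
observed = [110, 73]  # Gemini, Pisces
chi2, p = chisquare(observed)  # expected 183/2 = 91.5 in each category
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # roughly chi2 = 7.48, p = 0.0062
```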
Now this is pretty silly: of course, if you take the extreme values of 12 numbers and pretend that they came from a 2-category situation, then the result will appear more significant. What about lumping 6 signs together, say Capricorn to Gemini (the first part of the year) versus the second part? In this case we aren't cherry-picking, and the sums should be less significant than the individual data. We then have:
| Signs | Number of People |
| --- | --- |
| Capricorn-Gemini | 565 |
| Cancer-Sagittarius | 502 |
| Total | 1067 |
And we expect 533.5 people in each category. Notice that we went from (at the most extreme) a roughly 20-person difference from expected out of about 100 to a roughly 30-person difference out of more than 500, proportionally closer to the expected value. What do we get from our chi-squared test?
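Sketching the lumped version the same way (expected counts are 1067/2 = 533.5 per half):

```python
from scipy.stats import chisquare

# Six signs lumped into each half of the year
observed = [565, 502]  # Capricorn-Gemini, Cancer-Sagittarius
chi2, p = chisquare(observed)  # expected 1067/2 = 533.5 in each half
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # roughly chi2 = 3.72, p = 0.054
```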
The test says that this is more significantly different from random than the full, individual data! At least the goodness-of-fit measure itself, the chi-squared value, went down, denoting a closer fit to the expected counts, but the reduction in the number of categories (and thus degrees of freedom) changes the significance of the test quite a lot.
A different measure
E. T. Jaynes suggests in his book Probability Theory: The Logic of Science using a different measure of goodness of fit, the ψ measure, closely related to the log-likelihood:

$$\psi = 10 \sum_i O_i \log_{10}\left(\frac{O_i}{E_i}\right)$$
Using this measure on the above examples, we get:
- All data: ψ = 28.9
- Extreme data: ψ = 39.1
- Lumped data: ψ = 8.1
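These values can be reproduced with a short sketch. One assumption on my part: matching the quoted ψ = 39.1 for the extreme pair requires keeping the expected count at the full-sample rate of 1067/12 ≈ 88.9 per sign, rather than 183/2:

```python
from math import log10

def psi(observed, expected):
    """Jaynes' psi measure: 10 * sum of O_i * log10(O_i / E_i)."""
    return 10 * sum(o * log10(o / e) for o, e in zip(observed, expected))

counts = [95, 104, 110, 80, 84, 88, 87, 79, 84, 92, 91, 73]
E = sum(counts) / 12  # 1067 / 12 = 88.9 expected per sign

print(psi(counts, [E] * 12))         # roughly 28.9 (all data)
print(psi([110, 73], [E] * 2))       # roughly 39.1 (extreme pair, full-sample E)
print(psi([565, 502], [533.5] * 2))  # roughly 8.1  (lumped halves)
```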
An elementary question about the motivation for the chi-square formula:
If I were to create something like the chi-squared statistic from scratch, I'd base it on the square of the "z-score" of the observed data in a bin. If there are N observations and each has a probability p of landing in the bin, then the z-score when O observations are in the bin would be (O - Np)/sqrt(Np(1-p)), and the square of that would be (O - E)^2/(E(1-p)), where E = Np is the expected number of observations in the bin. So there would be a factor of (1-p) in the denominator that is missing from the chi-squared statistic. Is my algebra wrong? Or is there an intuitive explanation of why the factor of (1-p) is left out?
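The commenter's algebra for a single bin can be checked numerically; here is a minimal simulation sketch (my own illustration, not from the post) showing that the plain chi-squared term (O - E)^2/E averages to 1-p for one binomial bin, while the (1-p)-corrected version averages to 1, like a squared z-score:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1067, 1 / 12  # one bin, sized like the astrology example
E = N * p            # expected count, E = Np

# Simulate the observed count in a single bin many times
O = rng.binomial(N, p, size=100_000)

print(np.mean((O - E) ** 2 / E))              # roughly 1 - p = 0.917
print(np.mean((O - E) ** 2 / (E * (1 - p))))  # roughly 1, a squared z-score
```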